
Hi there
Thanks for the amazing library!
I was wondering whether where there's anything I can do to improve the performance of the AdvancedComplexMath.Erf function. My application's performance is mostly limited by this function.
I could live with reduced with precision.
Cheers,
Simon


Coordinator
Jul 23, 2014 at 7:52 PM

It's great to know which functions are being used and where performance improvements are needed. I will look into the performance of this function. It would be helpful to know what regime (e.g. imaginary parts around 1000 and real parts around +1, or
whatever) is of interest to you. Thanks!



Thanks for the quick reply!
I'm calling the function in the range 100 .. 100 + i * (0.5 .. 0.5).
Also, I'm toying with various caching approaches & LUTs which seem to alleviate the performance problem somewhat. And I'm still working on a reformulation of my problem that wouldn't involve Erf.


Coordinator
Jul 23, 2014 at 9:53 PM

I spent a little time investigating and I wanted to give you an update.
I measured the performance of various complex functions in the region you specified. On my laptop, simple operations on complex numbers like log or sine typically take 0.3 microseconds. Since the implementation of these simple functions reduces to a few highly
optimized builtins, you can take this as a lower bound on the evaluation time for more complicated complex functions; no matter how great the code, I am very unlikely to get any complex function evaluation faster than this. Evaluation of the complex error
function currently takes 4 microseconds, so about 10 times slower than a basic function. For the complex gamma function, which uses a welloptimized algorithm that has been the industry standard for 1015 years, evaluation takes 2 microseconds, so about 5
times slower than a basic operation.
Realistically, then, any performance improvement we are likely to get out of the complex error function is pretty unlikely to speed it up by more than a factor of 2. We would be proud of a 50% improvement and a factor of 4 would be beyond our wildest expectations.
I had hoped that you had uncovered a region in which our algorithm was simply terrible and we could just replace it with a better one for that region and get an ordersofmagnitude improvment. (For example, I know that for orders larger than a few thousand,
our Bessel function implementation is terrible in this way. So, by the way, is Mathematica's.) But it looks more like our performance is pretty much what you would expect for a complicated complex function and your application's demands are simply very high.
I am not saying to abandon all hope: we will look at this algorithm and if a small speedup would satisfy your requirements we may be able to meet them. But looking at caching and lookup tables is probably a good idea. Might I also suggest parallel computation?
This function is certainly threadsafe and most machines today are multicore.



Thank you very much for taking your time and looking into this!
I'm already using multithreading (& the new SIMD/RyuJIT stuff) for most parts of my algorithm.

