This project has moved. For the latest updates, please go here.

Performance of AdvancedComplexMath.Erf

Jul 23, 2014 at 3:49 PM
Hi there

Thanks for the amazing library!

I was wondering whether where there's anything I can do to improve the performance of the AdvancedComplexMath.Erf function. My application's performance is mostly limited by this function.

I could live with reduced with precision.

Jul 23, 2014 at 7:52 PM
It's great to know which functions are being used and where performance improvements are needed. I will look into the performance of this function. It would be helpful to know what regime (e.g. imaginary parts around -1000 and real parts around +1, or whatever) is of interest to you. Thanks!
Jul 23, 2014 at 7:58 PM
Thanks for the quick reply!

I'm calling the function in the range -100 .. 100 + i * (-0.5 .. 0.5).

Also, I'm toying with various caching approaches & LUTs which seem to alleviate the performance problem somewhat. And I'm still working on a reformulation of my problem that wouldn't involve Erf.
Jul 23, 2014 at 9:53 PM
I spent a little time investigating and I wanted to give you an update.

I measured the performance of various complex functions in the region you specified. On my laptop, simple operations on complex numbers like log or sine typically take 0.3 microseconds. Since the implementation of these simple functions reduces to a few highly optimized built-ins, you can take this as a lower bound on the evaluation time for more complicated complex functions; no matter how great the code, I am very unlikely to get any complex function evaluation faster than this. Evaluation of the complex error function currently takes 4 microseconds, so about 10 times slower than a basic function. For the complex gamma function, which uses a well-optimized algorithm that has been the industry standard for 10-15 years, evaluation takes 2 microseconds, so about 5 times slower than a basic operation.

Realistically, then, any performance improvement we are likely to get out of the complex error function is pretty unlikely to speed it up by more than a factor of 2. We would be proud of a 50% improvement and a factor of 4 would be beyond our wildest expectations. I had hoped that you had uncovered a region in which our algorithm was simply terrible and we could just replace it with a better one for that region and get an orders-of-magnitude improvment. (For example, I know that for orders larger than a few thousand, our Bessel function implementation is terrible in this way. So, by the way, is Mathematica's.) But it looks more like our performance is pretty much what you would expect for a complicated complex function and your application's demands are simply very high.

I am not saying to abandon all hope: we will look at this algorithm and if a small speed-up would satisfy your requirements we may be able to meet them. But looking at caching and look-up tables is probably a good idea. Might I also suggest parallel computation? This function is certainly thread-safe and most machines today are multi-core.
Marked as answer by deiruch on 7/25/2014 at 2:54 AM
Jul 25, 2014 at 9:54 AM
Thank you very much for taking your time and looking into this!

I'm already using multithreading (& the new SIMD/RyuJIT stuff) for most parts of my algorithm.