This article focuses on how to improve the performance of computationally intensive tasks through a series of small techniques: functional programming, kernel parallelization, compilation, and ultimately GPU acceleration with CUDA.

Linking Mathematica with C++ is also a great way to boost performance, but we will not cover it here, as Luyan Yu has authored another excellent article on this topic, available at:

Linking C++ code with Mathematica using LibraryLink - Luyan Yu

Let’s begin with a simple task: find the sum of all the digits of every number from 1 to 1,000,000.

For example, the sum of all the digits of the numbers from 1 to 12 is:

1+2+3+4+5+6+7+8+9+1+0+1+1+1+2=51.
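The small example above can be checked directly. The snippet below is only a sanity check for small inputs, not one of the implementations benchmarked in this article, and `digitSumUpTo` is a hypothetical helper name introduced here for illustration.

```mathematica
(* digitSumUpTo is a hypothetical helper name, not part of the article's code. *)
(* It splits every integer from 1 to n into decimal digits and adds them all up. *)
digitSumUpTo[n_Integer?Positive] := Total[Flatten[IntegerDigits /@ Range[n]]]

digitSumUpTo[12]
(* 51 *)
```

The numbers 1 through 9 contribute 45, and 10, 11, and 12 contribute 1, 2, and 3, which gives 51.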

Solving the problem itself is fairly easy. First, pad every number from 0 to 999,999 with leading zeros, giving 000,000 through 999,999. Each digit (0 through 9) then appears equally often in each position: 100,000 times among the 1,000,000 numbers. The ten digits sum to 0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = 45, so the digits in any single position sum to 45 * 100,000 = 4,500,000. Since there are six positions, the total digit sum from 000,000 to 999,999 is 6 * 4,500,000 = 27,000,000. Finally, the number 1,000,000 itself contributes one more 1, giving the result 27,000,001.

The same argument gives a general formula for the digit sum of all numbers from 1 to 10^x: x*45*10^(x-1)+1, or

$$ f(x)=\frac{9}{2}\, x \cdot 10^{x}+1 $$
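As a quick check of the closed form, it can be compared with a direct count for small exponents. This is a minimal sketch: `closedForm` and `bruteForce` are illustrative names that do not appear elsewhere in this article, and the brute-force version is meant only for small x.

```mathematica
(* closedForm and bruteForce are illustrative names, not part of the article's code. *)

(* Formula from the text: x * 45 * 10^(x-1) + 1 *)
closedForm[x_Integer?Positive] := 45*x*10^(x - 1) + 1

(* Direct count: add every decimal digit of every integer from 1 to 10^x *)
bruteForce[x_Integer?Positive] := Total[Flatten[IntegerDigits /@ Range[10^x]]]

Table[closedForm[x] == bruteForce[x], {x, 1, 4}]
(* {True, True, True, True} *)
```

For x = 1, both sides give 46: the digits 1 through 9 sum to 45, and 10 adds one more 1.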

However, solving this specific problem is not our focus here, nor is the algorithm itself. This article aims to teach you techniques for improving the performance of any function or algorithm you might write in the future. We will do so by comparing different implementations of the same algorithm in Mathematica. The algorithm is therefore kept deliberately unchanged throughout, so that the implementation is the only variable in each comparison.

Let's start with the simplest and most basic approach: C-style coding. We will use this as our baseline performance test algorithm.

If you are a C programmer, you might write Mathematica code like this:

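cDigitSum[nn_] := Module[{sum = 0, n = nn},
    (* Peel off the last decimal digit until nothing is left *)
    While[
        n > 0
        ,
        sum += Mod[n, 10];       (* add the lowest digit *)
        n = Quotient[n, 10];     (* drop the lowest digit *)
    ];
    sum
]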

AbsoluteTiming[
    total = 0;
    For[n = 0, n <= 10^6, n++,
        total += cDigitSum[n]
    ];
    total
]

{11.9438, 27000001}

That takes around 12 seconds, which is not very efficient. But don’t be discouraged. Let’s see what Mathematica can do.

Surprisingly, Mathematica has tons of magical built-in functions, including one for exactly this task: DigitSum. Even more surprising, it is slower than our humble and trivial cDigitSum function.