Simple Computation Benchmarks

I know that the EV3 system is not made for CPU intensive work. Nevertheless, this might be interesting for some programs, and I could do this without disturbing the physical setup of my son. My first program was a simple Mandelbrot computation, and part of my motivation to do this was because this turned out to be much slower than I thought it should be.

First of all, the EV3 is not slow. You are likely to see any delay in computations only when you do thousands or more operations in quick succession. The Mandelbrot program fills the screen pixel by pixel, and does up to 10 iterations for each of them, which is why you can actually see the delay in computation. My initial version filled the screen in something like 5 minutes, and I suspected I could do better. It turned out I was right, but the biggest boost came from changing an unsuspected part of the program.

In order to measure the time the EV3 takes to execute something I enclose this something in a loop with a fixed loop counter. Just before the loop I reset a timer, which I read just after the loop finishes. I run each of these loops three times, calculate the average and display the time on the EV3 display. The inner loop I run five times, with 1000, 3000, 10000, 30000 and 100000 iterations each. I change this by hand in between invocations. Using a variable and another loop over these five cases turned out to change the results too much. The whole program, with an empty inner loop, is shown here.

benchmark setup

One word to measurement errors: While I did an average over three runs for each number, the random errors are still noticeable, I would estimate at least 5 percent. Also, I ran this on an EV3 without motors or sensors connected, as this tends to change the results in a noticeable way. I didn't measure this yet, but three motors and one sensor slowed each computation down by about 10 percent. In addition, I left the USB cable attached to the brick while taking measurements, for convenience. Running the benchmarks without USB connection tended to give about 5 to 10 percent shorter times than the ones noted below.

Empty loop

empty loop
iterationstime (s)
1000 0.064
3000 0.184
10000 0.646
30000 1.928
100000 6.492

I started with the simplest case: an empty loop, mainly to have an idea how expensive the loop itself is. The numbers show a pretty decent linear trend, which should be expected.

Loop interrupt

loop interrupt
iterationstime (s)
1000 0.116
3000 0.314
10000 1.006
30000 2.975
100000 9.971

One simple example of a simple, non-empty loop is a loop with a loop interrupt block inside. This was a block I used in my Mandelbrot implementation and I wondered if this is causing trouble, and indeed, it was. I assumed that what a loop interrupt block is doing would be to either directly go to the end of the specified loop and continue execution there, or alternatively, to treat all blocks inside of the specified loop as no-op (doing nothing) until execution comes to the end of this loop, and then immediately exits. Thus, I expected the times for this construct to be very close to zero, and independent of the specified loop count since it is supposedly aborting the loop within the first iteration. However, as you can see, times increase for higher loop counts. Indeed, what the loop interrupt block seems to be really doing is converting all blocks within the loop to no-ops, but still going through their sequence, and most of all, going through all loop iterations that way. This can also be seen by printing the loop counter after such a loop. I am not sure why the Lego developers did the loop interrupt block this way, but I find this behavior very unexpected and undesirable.

I later got the behavior I expected from the loop interrupt block by replacing the counter loop by a logic loop with my condition for the interrupt block feeding in to the loop condition. I can now get the Mandelbrot picture in a few seconds instead of minutes.

Logic loop

logic loop
iterationstime (s)
1000 0.113
3000 0.337
10000 1.089
30000 3.322
10000011.078

Another try with different loops was a logic loop, using it's own loop counter as break condition. The times show that this is worse than an empty counter loop, with roughly 50 percent longer execution time for the one additional block.

Variable read/write

variable read/write
iterationstime (s)
1000 0.109
3000 0.312
10000 1.063
30000 3.193
10000010.704

Another simple loop body is a variable read, followed immediately by a variable write to the same variable. Times are comparable to the logic loop, indicating that only one of the blocks has 'computational time' associated with it. And indeed, when running the program attached to the software, only the variable write block is indicated as 'active'.

Simple math increment

simple math increment
iterationstime (s)
1000 0.151
3000 0.447
10000 1.516
30000 4.368
10000014.535

Adding a bit complexity, I next incremented a variable for each loop iteration. This means using one more block (a simple math block in this case), increasing runtime by about 40 percent compared to the simple read-write case.

Simple math increment using constant

simple math increment using constant block
iterationstime (s)
1000 0.152
3000 0.456
10000 1.508
30000 4.377
10000014.559

This case only shows that using a constant block is essentially identical to using a '1' directly in the input of the math block, not only in the result, but also at runtime. Reading variables (or constants) seems to be pretty cheap.

Advanced math increment

advanced math increment
iterationstime (s)
1000 0.124
3000 0.371
10000 1.124
30000 3.616
10000011.110

Now I come to the second surprise of this whole exercise. I exchanged the simple math block with it's advanced version, using '1' as one of the inputs and simple incrementing a variable as in the previous cases. I expected either the same, or slower execution than the simple block. To my surprise, the advanced block was much faster than the simple block, even for such a simple operation. My conclusion: always use the advanced math block if execution time is important.

Display pixel, without and with screen erase

pixel without blank
iterationstime (s)
1000 0.272
3000 0.799
10000 2.658
30000 7.977
10000026.613
pixel with blank
iterationstime (s)
1000 0.301
3000 0.916
10000 2.953
30000 8.890
10000029.513

The next two cases measure the EV3s ability to quickly manipulate it's display, pixel by pixel. I did this because that is what is needed to show the result of the Mandelbrot computation. The only difference between the two cases above is whether I clear the screen before drawing each pixel. It can be seen that drawing a pixel is about five times more expensive than simple blocks like a variable write or a simple math block, and that clearing the display is actually not that expensive (considering that this touches all pixels). This means that the brick itself can connect to the display quite fast, and most of the time visible by the users is overhead from using the blocks.

Conclusion

While this was just a tiny experiment, it shows quite nicely that the brick hardware is very likely not the limiting factor for the computational speed seen using the Lego software. Much more likely is that the speed is actually determined by implementation overhead of the blocks, or maybe even by intended delays within each block.