Simple Computation Benchmarks

I know that the EV3 system is not made for CPU intensive work. Nevertheless, this might be interesting for some programs, and I could do this without disturbing the physical setup of my son. My first program was a simple Mandelbrot computation, and part of my motivation to do this was because this turned out to be much slower than I thought it should be.

First of all, the EV3 is not slow. You are likely to see any delay in computations only when you do thousands or more operations in quick succession. The Mandelbrot program fills the screen pixel by pixel, and does up to 10 iterations for each of them, which is why you can actually see the delay in computation. My initial version filled the screen in something like 5 minutes, and I suspected I could do better. It turned out I was right, but the biggest boost came from changing an unsuspected part of the program.

In order to measure the time the EV3 takes to execute something I enclose this something in a loop with a fixed loop counter. Just before the loop I reset a timer, which I read just after the loop finishes. I run each of these loops three times, calculate the average and display the time on the EV3 display. The inner loop I run five times, with 1000, 3000, 10000, 30000 and 100000 iterations each. I change this by hand in between invocations. Using a variable and another loop over these five cases turned out to change the results too much. The whole program, with an empty inner loop, is shown here.

One word to measurement errors: While I did an average over three runs for each number, the random errors are still noticeable, I would estimate at least 5 percent. Also, I ran this on an EV3 without motors or sensors connected, as this tends to change the results in a noticeable way. I didn't measure this yet, but three motors and one sensor slowed each computation down by about 10 percent. In addition, I left the USB cable attached to the brick while taking measurements, for convenience. Running the benchmarks without USB connection tended to give about 5 to 10 percent shorter times than the ones noted below.

Empty loop

	iterations	time (s)

	1000	0.064
	3000	0.184
	10000	0.646
	30000	1.928
	100000	6.492

I started with the simplest case: an empty loop, mainly to have an idea how expensive the loop itself is. The numbers show a pretty decent linear trend, which should be expected.

Loop interrupt

	iterations	time (s)

	1000	0.116
	3000	0.314
	10000	1.006
	30000	2.975
	100000	9.971

One simple example of a simple, non-empty loop is a loop with a loop interrupt block inside. This was a block I used in my Mandelbrot implementation and I wondered if this is causing trouble, and indeed, it was. I assumed that what a loop interrupt block is doing would be to either directly go to the end of the specified loop and continue execution there, or alternatively, to treat all blocks inside of the specified loop as no-op (doing nothing) until execution comes to the end of this loop, and then immediately exits. Thus, I expected the times for this construct to be very close to zero, and independent of the specified loop count since it is supposedly aborting the loop within the first iteration. However, as you can see, times increase for higher loop counts. Indeed, what the loop interrupt block seems to be really doing is converting all blocks within the loop to no-ops, but still going through their sequence, and most of all, going through all loop iterations that way. This can also be seen by printing the loop counter after such a loop. I am not sure why the Lego developers did the loop interrupt block this way, but I find this behavior very unexpected and undesirable.

I later got the behavior I expected from the loop interrupt block by replacing the counter loop by a logic loop with my condition for the interrupt block feeding in to the loop condition. I can now get the Mandelbrot picture in a few seconds instead of minutes.

Logic loop

	iterations	time (s)

	1000	0.113
	3000	0.337
	10000	1.089
	30000	3.322
	100000	11.078

Another try with different loops was a logic loop, using it's own loop counter as break condition. The times show that this is worse than an empty counter loop, with roughly 50 percent longer execution time for the one additional block.

Variable read/write

	iterations	time (s)

	1000	0.109
	3000	0.312
	10000	1.063
	30000	3.193
	100000	10.704

Another simple loop body is a variable read, followed immediately by a variable write to the same variable. Times are comparable to the logic loop, indicating that only one of the blocks has 'computational time' associated with it. And indeed, when running the program attached to the software, only the variable write block is indicated as 'active'.

Simple math increment

	iterations	time (s)

	1000	0.151
	3000	0.447
	10000	1.516
	30000	4.368
	100000	14.535

Adding a bit complexity, I next incremented a variable for each loop iteration. This means using one more block (a simple math block in this case), increasing runtime by about 40 percent compared to the simple read-write case.

Simple math increment using constant

	iterations	time (s)

	1000	0.152
	3000	0.456
	10000	1.508
	30000	4.377
	100000	14.559

This case only shows that using a constant block is essentially identical to using a '1' directly in the input of the math block, not only in the result, but also at runtime. Reading variables (or constants) seems to be pretty cheap.

Advanced math increment

	iterations	time (s)

	1000	0.124
	3000	0.371
	10000	1.124
	30000	3.616
	100000	11.110

Now I come to the second surprise of this whole exercise. I exchanged the simple math block with it's advanced version, using '1' as one of the inputs and simple incrementing a variable as in the previous cases. I expected either the same, or slower execution than the simple block. To my surprise, the advanced block was much faster than the simple block, even for such a simple operation. My conclusion: always use the advanced math block if execution time is important.

Display pixel, without and with screen erase

	iterations	time (s)

	1000	0.272
	3000	0.799
	10000	2.658
	30000	7.977
	100000	26.613

	iterations	time (s)

	1000	0.301
	3000	0.916
	10000	2.953
	30000	8.890
	100000	29.513

The next two cases measure the EV3s ability to quickly manipulate it's display, pixel by pixel. I did this because that is what is needed to show the result of the Mandelbrot computation. The only difference between the two cases above is whether I clear the screen before drawing each pixel. It can be seen that drawing a pixel is about five times more expensive than simple blocks like a variable write or a simple math block, and that clearing the display is actually not that expensive (considering that this touches all pixels). This means that the brick itself can connect to the display quite fast, and most of the time visible by the users is overhead from using the blocks.

Conclusion

While this was just a tiny experiment, it shows quite nicely that the brick hardware is very likely not the limiting factor for the computational speed seen using the Lego software. Much more likely is that the speed is actually determined by implementation overhead of the blocks, or maybe even by intended delays within each block.