Friday, February 7, 2014

Flood fill in Java - Part 2 - Optimized Version


Introduction


Periodically I would check which pages of my blog are getting more visitors. I found that one of the pages that get the most hits is How to do a flood fill operation in Java ?

Since I wish to know who is viewing my page I do a simple search "macteki flood fill" in google. Soon I find out that my flood filler was used as a starting point of a student assignment in Eastern Washsington University in 2012. The instructor was Dr. Edwin Armstrong and the details of the course is still on the web as of the day I write this article: CS 370 GUI Programming


When students started to complete their assignments, some questions arose and they asked for help in the web. Hence my code started spreading in different programming forums. This brings more visitors here. When I looked at those forums recently, there were some opinions that this flood filler is slow and without error checking.


In the original version, I didn't do a range check before inserting points in the examination list. This will cause an exception if the caller cannot guarantee that the seeding point was inside a closed region. It is simple to avoid such exception by checking the x-y bounds before inserting the points. I didn't do that just because I wanted to make the demonstration program as simple as possible. And I have warned in the source code : "make sure the initial point is inside the circle"
If the seeding point is outside the closed region, it would not work.


Be Reluctant to Optimize


Since this blog is focused on readability, if there are two ways of doing the same work, one is more readable and one is faster, I would often choose the more readable one. However, since I saw quite a lot suggestions on how to make a faster implementation, as the original author, I decided to make an optimized version myself.


Intensive optimization is the enemy of source code readability. Since this blog preferred readability over performance. This article will be a very exceptional one.


Target : a 100 times speed up


Before we optimize, we need to set a target. Most of the time you don't need a 100 times speed up, all you need is a reasonable response time for the user. If your flood filler is used in a painting program, I would say the original version was already fast enough for the purpose. If your flood filler is used as part of image processing such as OCR, then you may want a faster version.

Since I am not targeting any specific applications, so I just set a high target. I think a 100 times speed up is impressive enough.


Assume nothing: Time your code

Before we made any changes, we need to have some timing functions to measure our improvements. Timing for a single flood fill is not a good idea because the elapsed time would be so short that the error margin becomes very high. Therefore I measure the elapsed time for 1000 flood fills. However, we cannot fill an already yellow circle with yellow again. Because the flood fill algorithm assumes that the original color in the region is different from the color to be filled. Therefore we need to fill the circle with yellow then with cyan alternately. On my machine, it took 9.999 second for the 1000 operations. You may want to reduce the number of loops if your computer is slow. For example, you may loop 100 times instead of 500 times.


Optimization 1: Knows your API


If you look at the original version, you will see it is using an ArrayList to store the points to be tested. However, since it always adds to the tail and removes from the head, obviously an ArrayList is not the fastest data structure for these operations. If you have some idea in data structure, you may know a faster data structures for adding and removing for both ends is a LinkedList. So our first optimization include a one line change only, using java.util.LinkedList instead of java.util.ArrayList. The only line changed was line 4. It is now using a LinkedList instead of an ArrayList. Now we measure the elapsed time again : it is now 5.615s. (was 9.999s) A single line change results in a speed up of over 40%. That is not bad



Optimization 2: Check before adding

Looking at the original version, we always add all the four neighbors to the list. However, we know that if the neighbor color is not the same as the initial color, we don't need to add them to the list. As I said, optimization tends to be the enemy of source code readability. This optimization would start to kill simplicity and readability. New elapse time is 4.196s (was 5.615s)
This is a 25% speedup. We still have a long way to go. Let's make a big jump in next step.

Optimization 3: Removing Function Calls

The original version frequently calls the image API getRGB() and setRGB(). We may remove these API calls by using direct access to the image buffer. The data structure of the image buffer is a one dimensional array in row major order. The index of the array is computed by :
  index=y*w+x;  // where w=width of image
Hence the API
  initialColor=image.getRGB(x,y); 
would become
  initialColor=pixels[y*w+x];
And the API
  image.setRGB(x,y-1,fillColor);
would become
  pixels[(y-1)*w+x]=fillColor;
The new version of flood filler becomes : As I said, this step is a big jump. The new elapse time is : 0.53s (was 4.196s)

If we compares the original elapsed time 9.999s with the new elapsed time 0.53s, we would see a 18 times speed up.
Note: It is a 18 TIMES speed up, not a 18 PERCENT speed up. It is a big speed up already.

Optimization 4: Use Integer List instead of Point List

This optimization helps to reduce the complexity of data structure. This step itself does not decrease the elapsed time. You may treat this as a code refactoring that makes further optimizations possible. The new elapsed time is : 0.531s (was 0.53s)
As I said, this step is a code refactoring. Hence it doesn't improve performance.

Optimization 5: Use simple data structure instead of LinkedList


Since we have used list of integers to replace list of points. We may now use a much simpler structure to replace the LinkedList. The simplest data structure that fits our application is a linear queue implemented as integer array. Reducing the complexity of data structure helps a lot. The new elapsed time is now 0.156s (was : 0.531s) Comparing with the original 9.999s, it is a 64 times speed up now.

Optimization 6: Avoid memory allocation

Java memory allocation is much slower than you think, especially when you are allocating a big array. The next optimization avoid allocating the queue[] array every time the flood filler is called. It would try to reused an existing array if the array is already big enough. The new elapsed time is 0.109s (was 0.156s) Comparing with the original 9.999s, it is a 91 times speed up now. We are getting closed.


Optimization 7: Removing redundant operations

Looking at the above version, we see many redundant calculations. For example, index-1 appears three times, index-w appears three times, index+w appears three times. The following optimization removes the index recalculation, with the sacrifice of readability again. The new elapsed time is : 0.094s (was : 0.109s).
Comparing with the original 9.999s again, a 106 times speed up.
Finally we made it !


I think over a hundred times speed up is impressive enough, and we would stop here.


Conclusion

We have just completed an optimization journey. As you may see, the optimized version is obviously less readable and less intuitive than the original version. That was the reason why the original version was presented. If you ask me : Can you optimize further ? I would say : definitely yes. If you ask me : Can you make another 100 times speedup ? I would say : absolutely not.(I spoke for me only. Someone else may be possible to speed it up for another 100 times.) If you ask my general opinions on optimization, I would say don't do it unless it is absolutely necessary. And stop it as soon as your target is reached. After doing the optimization, be sure to keep the non-optimized version as a control for unit testing. The final result and the full source code of the optimized version follows :