Wednesday, April 13, 2011

Sorting in Java - Part 6

Concurrent MergeSort


This is part 6 of the series. It is suggested to read the whole series from the beginning.

The need for concurrency

In the past, parallel processing is a luxury for home user. Nowadays most newly made home computers are equipped with multi-processors. Mastering the technique of concurrent programming is becoming more and more important.

Multi-threading in Java

Suppose we have two processors in our system, how would we control which CPU to use ? Actually, we don't have to bother about this. The thread scheduling system should evenly distribute the running threads to the available processors. All we have to do is to create a thread for each of the independent task that can be executed concurrently.

Designing concurrent algorithm

If an algorithm is sequential in nature, multi-threading won't help to improve performance. The best way to learn concurrent programming is to try to implement an algorithm which is concurrent in nature. That is, the sub-problems are independent to each other and the sub-problems are of similar complexity. Merge Sort falls in this category.

Simple Threading in Java

  1. Implement a Runnable interface or extend a Thread class.
  2. Implement the run() method, which is the thread entry point.
  3. Start the thread.
  4. Wait for the running thread to complete.

/******************************************************************************
* File : SimpleThread.java
* Author : http://java.macteki.com/
* Description :
*   Simple Thread Demonstration.
* Tested with : JDK 1.6
******************************************************************************/

class SimpleThread extends Thread  // step 1, extends the Thread class
{
  // step 2, implements the run() method
  public void run()
  {
    System.out.println("Count down thread started.");
    String delimiter="";
    for (int i=10;i>=0;i--)
    {     
      System.out.print(delimiter+i);
      delimiter="...";
      try {Thread.sleep(1000); } catch (Exception ignore) {}      
    }
    System.out.println("\nThread end !!!");
  }

  public static void main(String[] args) throws Exception
  {
    // step 3, start the thread
    SimpleThread thread=new SimpleThread();  
    thread.start();
    
    // step 4, wait for the thread to complete,
    thread.join();

    System.exit(0);
  }
}

Writing two concurrent threads

Actually, almost all Java programs are multi-thread in nature. Even we don't start any thread explicitly, a running Java program will start many System Threads such as the IO thread and the GC thread.

System threads are handled by the system. As a multi-threading programmer, we are interested in user thread only. The main thread is the first user thread for all Java programs, its entry point is the main() method. Inside the main thread, we may start another thread. In this case, the main thread is known as a parent thread and the newly started thread is known as the child thread. And they are running in parallel.

To demonstrate this, we could rewrite the above sample so that the main thread is not just waiting. While the child thread is counting down, we may let the main thread counts up, in a faster step.

/******************************************************************************
* File : SimpleThread.java (version 2)
* Author : http://java.macteki.com/
* Description :
*   Simple Threads Demonstration.
* Tested with : JDK 1.6
******************************************************************************/

class SimpleThread extends Thread  // step 1, extends the Thread class
{
  // step 2, implements the run() method
  public void run()
  {
    System.out.println("Count down thread started.");
    String delimiter="";
    for (int i=10;i>=0;i--)
    {     
      System.out.print(delimiter+i);
      delimiter="...";
      try {Thread.sleep(1000); } catch (Exception ignore) {}      
    }
    System.out.println("\nThread end !!!");
  }

  public static void main(String[] args) throws Exception
  {
    // step 3, start the thread
    SimpleThread childThread=new SimpleThread();  
    childThread.start();

    // step 3.5, let the main thread do something so that we
    // know we are running parallel threads.
    for (int i=0;i<=10;i++)
    {
      System.out.print("[MAIN:"+i+"]");
      try { Thread.sleep(500); } catch (Exception ignore) {}
    }
    
    // step 4, wait for the child thread to complete,
    childThread.join();

    System.exit(0);
  }
}

Synchronization

In the above sample, the main thread counts up faster than the child thread. Hence the main thread completes it tasks faster and the method childThread.join() would put the main thread in a sleeping state. It will be sleeping until the child thread finish. What if the main thread is doing a more time consuming tasks ? For example, if we change Thread.sleep(500) into Thread.sleep(2000), then the child thread will complete the task much faster, in that case, there is no problem. childThread.join() will return immediately and the main thread will just continue, since it has no more task to do, it will just call System.exit(). The join() method is said to be a synchronization point because after the join() method returns control to the caller, we can be sure both the parent and the child thread have completed their tasks, no matter which thread actually completed the task earlier.

Concurrent Merge Sort

Let's recall the sequential merge sort algorithm in the previous article.

mergeSort(a,low,low+mid-1,helper,depth+1);      // sort the left sub-list
mergeSort(a,low+mid,high,helper,depth+1);       // sort the right sub-list
merge(a,low,high,helper);      // merge the sorted lists
Both sub-list are sorted in the main thread, one after another. Since they are independent tasks, why not start a child thread to sort one of the sublist ?
// sort the left sub-list using a child thread
SortingThread bThread=new SortingThread(a,low,low+mid-1,helper,depth,"bThread ");

// sort the right sub-list in the parent thread
mergeSort(a,low+mid,high,helper,depth+1);      

bThread.join();                // parent-child synchronization

merge(a,low,high,helper);      // merge the sorted lists

One important point, we don't want to start too many threads. For a system with two processors, two concurrent threads is optimal. This can be done by adding a depth variable which is increased by 1 for each recursive call. If we limit the program to start a thread only at depth <=0, then at most 1 child thread will be started.

Source code

/******************************************************************************
* File : ConcurrentMergeSort.java
* Author : http://java.macteki.com/
* Description :
*   Concurrent Merge Sort
* Tested with : JDK 1.6
******************************************************************************/

class ConcurrentMergeSort
{

  public static void printArray(int[] a)
  {
    String delim="";
    for (int i=0;i<a.length;i++)
    {
      System.out.print(delim+a[i]);
      delim=",";   // add delimiter after first element
    }
    System.out.println();
  }


  public static void mergeSort(int[] a,int low,int high,int[] helper,int depth) 
  throws Exception
  {
    int length=high-low+1;
    if (length<=1) return;   // a list with only 1 element is already sorted
    int mid=length/2;

    if (depth<=0)
    {
      // sort the left sub-list using a child thread
      SortingThread bThread=new SortingThread(a,low,low+mid-1,helper,depth,"bThread ");

      // sort the right sub-list in the parent thread
      mergeSort(a,low+mid,high,helper,depth+1);      

      bThread.join();                // parent-child synchronization

      merge(a,low,high,helper);      // merge the sorted lists
    }
    else  // sequential merge sort
    {
      mergeSort(a,low,low+mid-1,helper,depth+1);      // sort the left sub-list
      mergeSort(a,low+mid,high,helper,depth+1);       // sort the right sub-list
      merge(a,low,high,helper);      // merge the sorted lists
    }
  }

  public static void merge(int[] a, int low,int high,int[] helper)
  {
    int mid=(high-low+1)/2;
    int ai=low,bi=low,ci=low+mid;

    while (ai<=high)
    {
      if (bi<=low+mid-1 && ci<=high)  // both lists contains element
      {
        helper[ai++]=(a[bi]<a[ci])?a[bi++]:a[ci++];
      }    
      else if (bi<=low+mid-1)    // only list B contains element
      {
        while (ai<=high) helper[ai++]=a[bi++];
      }
      else                    // only list C contains element
      {
        while (ai<=high) helper[ai++]=a[ci++];      
      }
    }

    // copied back the helper to the input array
    for (int i=low;i<=high;i++)
    {
      a[i]=helper[i];
    }
  }


  public static void main(String[] args) throws Exception
  {
    int[] a={86,29,29,20,91,78,73,13,98,85,90,19,98,66,12,41,36,33,19,94,27,49,23,97,94};

    int[] helper=new int[a.length];
    mergeSort(a,0,a.length-1,helper,0);    
    
    printArray(a);
  }
}


class SortingThread extends Thread
{
  static int threadCount;
  int threadID;
  int[] subArray=null;
  int[] helper=null;
  int depth=0;
  int low=0,high=0;
  String name="null";

  public SortingThread(int[] a, int low,int high,int[] helper,int depth, String name)
  {
    this.subArray = a;
    this.depth = depth;
    this.name = name;
    this.low=low;
    this.high=high;
    this.helper=helper;
    this.threadID=threadCount++;
    this.start();  // start the thread immediately
  } 

  // thread entry point
  public void run()
  {
    try
    {      
      ConcurrentMergeSort.mergeSort(subArray,low,high,helper,depth+1);
      subArray=null;  helper=null;  // garbage collection friendly
    } catch (Exception e)
    {
      e.printStackTrace();
    }
  }
}

Performance

Concurrent Merge Sort does not improve performance in a single processor platform. In a multi-processor platform, it has a good speed up. The following is the result of sorting 20 million integers in my single processor computer (Pentium M 1.73Ghz).

  • (Part 1)JavaSort n=20000000 elapse=7.341s
  • (Part 3)ShellSort n=20000000 elapse=12.438s
  • (Part 4)SimpleMergeSort n=20000000 elapse=31.966s
  • (Part 5)HelperMergeSort n=20000000 elapse=13.149s
  • (Part 6)ConcurrentMergeSort n=20000000 elapse=13.299s
Since the Insertion Sort introduced in Part 2 may takes hours or even days to sort 20 million integers, it is omitted in the test.

From the above test, concurrent merge sort is not faster than the sequential merge sort introduced in Part 5.

Let's start another test on a multiprocessor environment. The following is the result of running the test in my macbook air 2010, which is equipped with 1.86GHz Intel Core 2 Duo processor.

  • (Part 1)JavaSort n=20000000 elapse=4.593s
  • (Part 3)ShellSort n=20000000 elapse=7.925s
  • (Part 4)SimpleMergeSort n=20000000 elapse=14.329s
  • (Part 5)HelperMergeSort n=20000000 elapse=6.515s
  • (Part 6)ConcurrentMergeSort n=20000000 elapse=4.323s

This time concurrent merge sort is running significantly faster than helper merge sort(4.323s vs 6.515s). Of course we cannot expect a double speed up because the final merge() step is still sequential. For such a simple modification, it is worth the effort for this speed up.

1 comment:

  1. The final merge step doesn't need to be sequential. See http://www.drdobbs.com/parallel/parallel-merge/229204454 for a parallel implementation of the merge algorithm.

    You might try depth<1 or 2 if you have more than just 2 cores available.

    ReplyDelete