Sunday, April 10, 2011

Sorting in Java - Part 4

This is part 4 in the series. It is suggested to read the whole series from the beginning.


Simple Merge Sort


Merge Sort is an example of divide and conquer strategy for problem solving. It divides a big problem into smaller sub-problem, until the small problem has a trivial solution. Let's take the following unsorted list as an example.
[86,29,29,20,91,78,73,13]

We divide the list into two smaller sub-lists, by chopping it in the middle.
[86,29,29,20]  [91,78,73,13]

Sort the individual sub-lists with any methods you have on hand. (For example, shell sort in the previous article.)
[20,29,29,86]  [13,73,78,91]

Then merge the sub-lists to create a sorted list. Here is the merging algorithm.
  • Compare the first element of both lists
  • [20,29,29,86]  [13,73,78,91]
    
  • Remove the smaller element and append it to the result list.
    Sublist : [20,29,29,86]  [73,78,91]
    Result  : [13]
    
  • Repeating the above steps, compare the first element of both lists.
    [20,29,29,86]  [73,78,91]
    
  • Remove the smaller element and append it to the result list.
    Sublist : [29,29,86]  [73,78,91]
    Result  : [13 20]
    
  • Repeating the above steps until one of the sub-lists is empty, then append the rest of the other sub-list to the result. That will finally produce a merge sorted list.
    Sublist : [] []
    Result  : [13 20 29 29 73 78 86 91]
    


The merge() method

The above algorithm leads directly to the following implementation.

// merge list B and list C, result stored in list A
  // Note that list B and list C are already sorted.
  public static void merge(int[] a, int[] b, int[] c)
  {
    int ai=0,bi=0,ci=0;  // index of list A,B and C

    while (ai<a.length)
    {
      if (bi<b.length && ci<c.length)  // both lists contains element
      {
        // append the smaller one to the result.
        a[ai++]=(b[bi]<c[ci])?b[bi++]:c[ci++];  
      }    
      else if (bi<b.length)    // only list B contains element
      {
        while (ai<a.length) a[ai++]=b[bi++];
      }
      else                    // only list C contains element
      {
        while (ai<a.length) a[ai++]=c[ci++];      
      }
    }
  }

Merge Sort - the first implementation

Once we have the merge() method ready, we can use the divide and conquer strategy to implements the mergeSort() method. The algorithm is :

  • Divide the list into two halves (two sublists)
  • Sort the two sublist with any sorting method on hand, that of course includes mergeSort() !
  • Merge the two sorted sublist


The problem size is reduced by half for each recursive call, until it reach a trivial problem, a list that contains only 1 element. Then we can stop the recursion there.

public static void mergeSort(int[] a)
  {
    if (a.length<=1) return;   // a list with only 1 element is already sorted
    int mid=a.length/2;        // divided into half
    int r=a.length%2;          // remainder

    // split into two lists
    int[] b=new int[mid];
    int[] c=new int[mid+r];

    for (int i=0;i<mid;i++)
    {
      b[i]=a[i];
      c[i]=a[mid+i];
    }
    
    c[mid+r-1]=a[a.length-1];  // handle the case when a.length is odd

    mergeSort(b);      // sort the sub-list
    mergeSort(c);      // sort the sub-list
    merge(a,b,c);      // merge the sorted lists
  }

Complete Source Code

/******************************************************************************
* File : SimpleMergeSort.java
* Author : http://java.macteki.com/
* Description :
*   A straight forward implementation of Merge Sort
* Tested with : JDK 1.6
******************************************************************************/

class SimpleMergeSort
{

  public static void printArray(int[] a)
  {
    String delim="";
    for (int i=0;i<a.length;i++)
    {
      System.out.print(delim+a[i]);
      delim=",";   // add delimiter after first element
    }
    System.out.println();
  }



  public static void mergeSort(int[] a)
  {
    if (a.length<=1) return;   // a list with only 1 element is already sorted
    int mid=a.length/2;        // divided into half
    int r=a.length%2;          // remainder

    // split into two lists
    int[] b=new int[mid];
    int[] c=new int[mid+r];

    for (int i=0;i<mid;i++)
    {
      b[i]=a[i];
      c[i]=a[mid+i];
    }
    
    c[mid+r-1]=a[a.length-1];  // handle the case when a.length is odd

    mergeSort(b);      // sort the sub-list
    mergeSort(c);      // sort the sub-list
    merge(a,b,c);      // merge the sorted lists
  }

  // merge list B and list C, result stored in list A
  public static void merge(int[] a, int[] b, int[] c)
  {
    int ai=0,bi=0,ci=0;  // index of list A,B and C

    while (ai<a.length)
    {
      if (bi<b.length && ci<c.length)  // both lists contains element
      {
        // append the smaller one to the result.
        a[ai++]=(b[bi]<c[ci])?b[bi++]:c[ci++];  
      }    
      else if (bi<b.length)    // only list B contains element
      {
        while (ai<a.length) a[ai++]=b[bi++];
      }
      else                    // only list C contains element
      {
        while (ai<a.length) a[ai++]=c[ci++];      
      }
    }
  }


  public static void main(String[] args) throws Exception
  {
    int[] a={86,29,29,20,91,78,73,13,98,85,90,19,98,66,12,41,36,33,19,94,27,49,23,97,94};

    mergeSort(a);    
    
    printArray(a);
  }
}

Advantages of Merge Sort

Merge Sort has several advantages :
  1. It is fast. It has n log(n) computation complexity.
  2. It is stable. Elements with equal value don't change their relative position during sorting. This is a desirable feature for multiple keys sorting. For example, sort the list of members by zip code first, then by name. In this case, members with same name should not change relative order because their zip code is already sorted.
  3. It is intuitive and the idea is easy to understand.
  4. The divide and conquer strategy is parallel in nature and we can easily implement a concurrent merge sort in multiprocessor environment.

Disadvantages

  1. merge sort typically requires additional storage space proportional to the length of the original array.
  2. The straight forward implementation repeatedly allocate memory, which is a performance killer. It is also a concurrency killer since memory allocation leads to heap contention problem. That is, multiple threads are competing to access the same heap.

Next

We will present a better version of merge sort which is more concurrency friendly. And a simple concurrent implementation will be shown.

No comments:

Post a Comment