Sunday, April 10, 2011

Sorting in Java - Part 3

This is part 2 in the series. It is suggested to read the whole series from the beginning.


Sorting by phases


First let's define the term "h-sorted list". An h-sorted list is a list of numbers which have the following properties.
  • If we start at any point, highlight every h-th number in the sequence, the highlight list would look sorted.
Consider the following list of numbers.
[86,29,29,20,91,78,73,13]

Let's start at position 0 and look at every 3-rd element.
[86,29,29,20,91,78,73,13]

In order to fulfill the 3-sorted criteria, we sort the highlighted elements now.

[20,29,29,73,91,78,86,13]

The criteria requires that we can start at any point, so we start at position 1 and highlight every 3-rd element.
[20,29,29,73,91,78,86,13]

Sort them :
[20,13,29,73,29,78,86,91]

The next step is highlighting every 3-rd element starting at position 2. Since the highlight list is already sorted, nothing need to be done.
[20,13,29,73,29,78,86,91]

At this point, we have highlight every element once, so we could stop here.
We have just created a 3-sorted list. That means we can start at any point, highlight every 3-rd elements, and the highlight list will look sorted.

2-sorted list

A 2-sorted list can be created in a similar way. Start at position 0 and highlight every 2nd element.
[20,29,29,73,91,78,86,13]

Sort the highlighted elements.
[20,29,29,73,86,78,91,13]

We must also start at position 1 and highlight every 2nd element.
[20,29,29,73,86,78,91,13]

Sorting the highlight element, and we now have a 2-sorted list.
[20,13,29,29,86,73,91,78]

1-sorted list

A 1-sorted list is just a sorted list, that means no matter which point we start, we can highlight every single element, and the highlight list would look sorted.

So, we just highlight every element.
[20,13,29,29,86,73,91,78]

Sort them, this is a 1-sorted list, and this is the desired result
[13,20,29,29,73,78,86,91]


In the sample above, we create h-sorted lists for h={3,2,1}. The sequence 3,2,1 is the h-sequence. We can use any h-sequence as long as it ends with 1. Of course, it must start with a number less than the length of the unsorted list.

Creating h-sorted list is very simple, we may use a modified version of Insertion Sort, which was described in the previous article.

The idea is that we can start at any element, LOOK BACKWARDS at every h-th element, and those elements should look sorted.

// create h-sorted list using insertion sort
  public static void insertionSort(int[] a,int h)
  { 
    for (int i=0;i<a.length;i++)   // we can start at any point in the list
    {
      int element=a[i];     // get the element
      int j=i;              // j : the scanning pointer
      for (j=i;j>=h;j-=h)   // scan backwards, look at every h-th element
      {
        if (element<a[j-h]) // not the correct position yet
        {
            a[j]=a[j-h];    // shift to the right to make room for element
        }
        else break;     // correct position found, break the scanning loop
      }
      a[j] = element;    // insert element in the correct position
    }
  }

Note that when h=1, it is just the normal insertion sort. You may look at the insertionSort() method in the previous article side by side with this one.

Why do we generate 3-sorted list and 2-sorted list before generating the 1-sorted list ? It is because the performance of Insertion Sort is very much depends on the initial ordering of the list. A nearly sorted list performs much better than a randomly order list. Generating 3-sorted and 2-sorted list will improve the initial ordering, with relatively little effort. It is because the scanning process is much faster. Instead of scanning every single element, we just need to scan every 3rd elements.

Shell Sort


Shell sort is name after its inventor, Donald Shell, who published the algorithm in 1959. The idea behind Shell Sort is we can significantly improve the performance of Insert Sort if we can find some ways to improve the ordering of the list. Intuitively, generating h-sorted list is a effective way to improve the ordering.

Shell sort typically uses a decreasing h-sequence which ends with 1. One such sequence is {16,8,4,2,1}. That means it will gradually improve the ordering of the list by first generating the 16-sorted list, then the 8-sorted list, then the 4-sorted list, then the 2-sorted list, and finally the 1-sorted list, which is the desired result.

However, computer scientists found that it is not a particularly good sequence in terms of sorting efficiency. A better sequence typically in use is :
1031612713, 217378076, 45806244, 9651787, 2034035, 428481, 90358, 19001, 4025, 836, 182, 34, 9,1

This sequence is derived from the Fibonacci sequence :

For n = 1, 1, 2, 3, 5, 8, 13, 21,...(Fibonacci sequence)
let g=1.618 (golden ratio)
compute h=n2g  for every n
So h = 1,1,9,34,182,836,... 
Removing the first 1 yields our h-sequence (in reverse order)

For example, for a initial list of 50 elements, we will first generate the 34-sorted list, then the 9-sorted list and finally the 1-sorted list.


Source code

/******************************************************************************
* File : ShellSort.java
* Author : http://java.macteki.com/
* Description :
*   Demonstration of Shell Sort
* Tested with : JDK 1.6
******************************************************************************/

class ShellSort
{

  public static void printArray(int[] a)
  {
    String delim="";
    for (int i=0;i<a.length;i++)
    {
      System.out.print(delim+a[i]);
      delim=",";   // add delimiter after first element
    }
    System.out.println();
  }


  // Fibonacci seq (n) :1,1,2,3,5,8,13,21,34,55,89,144,233,377
  // pow(n, 2*golden_ratio) : 1,1,9,34,182,836,... (this is the h-sequence in reverse order)
  static int[] hTable={
    1031612713, 217378076, 45806244, 9651787, 2034035, 428481, 
    90358, 19001, 4025, 836, 182, 34, 9,1 };

  public static void shellSort(int[] a)
  {
    int n=a.length;
    int k=0;
    for (k=0;k<hTable.length;k++)
    {
      int gap=hTable[k];
      if (gap < n) insertionSort(a,gap);
    }
  }

  public static void insertionSort(int[] a,int h)
  {
    for (int i=0;i<a.length;i++) // we can start at any position in the list
    {
      int element=a[i];     // get the element
      int j=i;              // j : the scanning pointer
      for (j=i;j>=h;j-=h)   // scan backwards, looking at every h-th element
      {
        if (element<a[j-h]) // not the correct position yet
        {
            a[j]=a[j-h];    // shift to the right to make room for element
        }
        else break;     // correct position found, break the scanning loop
      }
      a[j] = element;    // insert element in the correct position
    }
  }

  public static void main(String[] args) throws Exception
  {
    int[] a={86,29,29,20,91,78,73,13,98,85,90,19,98,66,12,41,36,33,19,94,27,49,23,97,94};

    shellSort(a);    
    
    printArray(a);
  }

}

Performance

Although Shell Sort is just a modified version of Insertion Sort, its idea is not as trivial. Does it worth the effort to complicate the algorithm ? The answer is "yes" for most of the time. Because it is much faster than the ordinary insertion sort in many cases. To give a rough idea, in one of my test, Insertion Sort toke 22.8 minutes to sort a million random integers, whereas ShellSort just toke 0.5 second on the same set of input. That is a impressive speed up and therefore ShellSort is a worthy modification of Insertion Sort.

No comments:

Post a Comment