Sunday, May 12, 2013

Java Comment Highlighter

Introduction

I am going to demonstrate how to write a simple comment highlighter in Java that highlights all Java-style comments. The program reads a text file (a.txt) containing Java source code and converts it to a.html, with all comments highlighted. We will use HtmlWriter.java as a starting point, so it is recommended that you read it first.

Replacement with Regular Expression

In HtmlWriter.java, we used the following to escape all HTML special characters in the output.
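The HtmlWriter.java listing itself is not reproduced here. As a minimal sketch, assuming the escaping step replaces &, < and > with their HTML entities, it could look like the code below. The class HtmlEscape and method escape are hypothetical names, not the ones used in HtmlWriter.java.

```java
// Hypothetical helper approximating the escaping step of HtmlWriter.java.
class HtmlEscape {
    static String escape(String src) {
        // '&' must be replaced first; otherwise the '&' inside "&lt;" would be escaped again.
        return src.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Prints: for (int i=0;i&lt;b;i++)
        System.out.println(escape("for (int i=0;i<b;i++)"));
    }
}
```

Escaping has to happen before any highlighting, otherwise the HTML tags we add for comments would themselves be escaped.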

To highlight all comments, all we need to do is add one more replace statement.

For example, the following statement makes all comments in the source file bold.
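The original statement is not shown here. A sketch of such a replace statement, assuming it uses String.replaceAll with the comment pattern derived at the end of this post and wraps each match in <b> tags, might be (class and method names are hypothetical):

```java
// Hypothetical sketch of the "bold all comments" replace statement.
class BoldComments {
    // Block comments /* ... */ or line comments // ..., captured as group 1.
    static final String COMMENT = "(/\\*(.|[\\r\\n])*?\\*/|//.*)";

    static String bold(String escapedSource) {
        // In the replacement string, $1 refers to the text captured by group 1, i.e. the whole comment.
        return escapedSource.replaceAll(COMMENT, "<b>$1</b>");
    }

    public static void main(String[] args) {
        // Prints: int b=10; <b>// just a test</b>
        System.out.println(bold("int b=10; // just a test"));
    }
}
```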

The following is a typical output:
/***********************
*  A Simple Testing
************************/

class A
{
  public static void main(String[] args)
  {
    int b=10;    
    for (int i=0;i<b;i++)  // just a test
      System.out.print("Hello, \"world\"\n");
  }
}


/* End of File */


Painting the Comment in a Different Color

A small modification of the above replace statement paints the comment in a different color.

The output now becomes:
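Only the replacement text needs to change. A sketch, assuming the color is applied with an inline-styled <span> (the color green and the class name are assumptions, not taken from the original):

```java
// Hypothetical sketch: same pattern as before, different replacement text.
class ColorComments {
    static final String COMMENT = "(/\\*(.|[\\r\\n])*?\\*/|//.*)";

    static String color(String escapedSource) {
        // Wrap each comment ($1) in a span whose color is an arbitrary choice.
        return escapedSource.replaceAll(COMMENT, "<span style=\"color:green\">$1</span>");
    }

    public static void main(String[] args) {
        System.out.println(color("int b=10; // just a test"));
    }
}
```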
/***********************
*  A Simple Testing
************************/

class A
{
  public static void main(String[] args)
  {
    int b=10;    // just a test
    for (int i=0;i<b;i++)
      System.out.print("Hello, \"world\"\n");
  }
}


/* End of File */


Explaining the Regular Expression

First, we need to match the start tag /*.

Note that * is a metacharacter in regular expressions (it means "repeat the previous element"). To match a literal star, we need to escape it with a backslash, so the start tag becomes /\* and the end tag */ becomes \*/.

That completes the beginning and the end of our regular expression:

Part of /\*(.|[\r\n])*?\*/    Explanation
/\*                          Match start of comment /*
\*/                          Match end of comment */
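To see why the escape matters, the sketch below compares the unescaped and escaped start tag with java.util.regex.Pattern. Inside a Java string literal the backslash itself must be doubled (see the final section), so the regex /\* is written "/\\*". The class name is hypothetical.

```java
import java.util.regex.Pattern;

class EscapeStar {
    // Unescaped: the regex /* means "zero or more slashes", not the two characters "/*".
    static final Pattern UNESCAPED = Pattern.compile("/*");
    // Escaped: the Java literal "/\\*" is the regex /\* , a slash followed by a literal star.
    static final Pattern START = Pattern.compile("/\\*");
    // The end tag \*/ needs the same treatment.
    static final Pattern END = Pattern.compile("\\*/");

    public static void main(String[] args) {
        System.out.println(UNESCAPED.matcher("/*").matches()); // false
        System.out.println(START.matcher("/*").matches());     // true
        System.out.println(END.matcher("*/").matches());       // true
    }
}
```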


Matching Anything

Anything between the start tag and the end tag is part of the comment.

Normally, to match anything in a regular expression, we may use .*
The dot means "any character", and the star means "zero or more repetitions of the previous element".

However, in Java the dot does not match line terminators by default (it only does so when DOTALL mode is enabled). Hence the dot actually means "any character except a line terminator" such as \n or \r, and a multi-line comment would not be matched.

Therefore, we need to extend our definition of "anything":
(.|[\r\n])*
This means zero or more repetitions of any character, including the line-break characters \r and \n.
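The following sketch checks this behavior on a two-line string. It also shows Pattern.DOTALL, a standard java.util.regex flag that makes the dot match line terminators; for ordinary source files it is an essentially equivalent alternative to the (.|[\r\n]) trick. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchAnything {
    // Default mode: '.' stops at line terminators such as '\n'.
    static boolean plainDot(String s) {
        return Pattern.matches(".*", s);
    }

    // The approach used in this post: explicitly allow '\r' and '\n' as alternatives to '.'.
    static boolean dotOrNewline(String s) {
        return Pattern.matches("(.|[\\r\\n])*", s);
    }

    // Alternative: DOTALL mode lets '.' match line terminators as well.
    static boolean dotAll(String s) {
        return Pattern.compile(".*", Pattern.DOTALL).matcher(s).matches();
    }

    public static void main(String[] args) {
        String twoLines = "line one\nline two";
        System.out.println(plainDot(twoLines));     // false
        System.out.println(dotOrNewline(twoLines)); // true
        System.out.println(dotAll(twoLines));       // true
    }
}
```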

Non-Greedy Match

In Java, the star quantifier is greedy by default; that is, it tries to match as much as possible.

In our example, a greedy match would turn the whole file into a single comment: there is a start tag /* at the beginning of the file and an end tag */ at the end of the file, and the greedy star swallows all the code in between.

The problem can be solved with a non-greedy match. Adding a question mark after the star (*?) makes it non-greedy: it matches as little as possible, so each match stops at the first */ after its /*. That completes the explanation of the regular expression.
Part of /\*(.|[\r\n])*?\*/    Explanation
/\*                          Match start of comment /*
\*/                          Match end of comment */
(.|[\r\n])*?                 Zero or more repetitions of any character (including line breaks), matched non-greedily
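The difference between greedy and non-greedy matching can be checked directly. This sketch finds the first match of each variant in a small input that has two block comments; the helper name firstMatch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String src = "/* header */\nclass A {}\n/* End of File */";
        // Greedy *: runs from the first "/*" to the LAST "*/", so the class is swallowed too.
        System.out.println(firstMatch("/\\*(.|[\\r\\n])*\\*/", src));
        // Non-greedy *?: stops at the first "*/", printing "/* header */".
        System.out.println(firstMatch("/\\*(.|[\\r\\n])*?\\*/", src));
    }
}
```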


Finalizing the Regular Expression

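1. We need to add support for single-line comments: //.* (the dot stops at the end of the line, which is exactly what we want here). The two alternatives are joined with | and wrapped in parentheses, so group 1 captures the whole comment of either kind.
2. We need to escape every backslash inside a Java string literal, which gives:

String comment="(/\\*(.|[\\r\\n])*?\\*/|//.*)";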
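As a quick check of the final pattern, this sketch collects every comment it finds in a short snippet based on the sample file. Printing the constant shows the regular expression after Java has processed the string escapes. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindComments {
    // Each backslash of the regex is doubled inside the Java string literal.
    static final String COMMENT = "(/\\*(.|[\\r\\n])*?\\*/|//.*)";

    // Returns all comments (block or line) in the order they appear.
    static List<String> comments(String src) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(COMMENT).matcher(src);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints the actual regex: (/\*(.|[\r\n])*?\*/|//.*)
        System.out.println(COMMENT);
        String src = "/* block */\nint b=10;    // just a test\n/* End of File */";
        // Prints: [/* block */, // just a test, /* End of File */]
        System.out.println(comments(src));
    }
}
```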


Source Code


Note: You will need Reader.java and Writer.java.
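The full listing depends on Reader.java and Writer.java, which are not reproduced here. For readers who just want something self-contained, here is a sketch of the whole program, assuming java.nio.file.Files in place of those helper classes, a <pre> wrapper to preserve spacing, and green as the comment color. CommentHighlighter and toHtml are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class CommentHighlighter {
    static final String COMMENT = "(/\\*(.|[\\r\\n])*?\\*/|//.*)";

    static String toHtml(String src) {
        // 1. Escape HTML special characters first, so the tags added below are not escaped.
        String html = src.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        // 2. Wrap every comment ($1) in a colored span.
        html = html.replaceAll(COMMENT, "<span style=\"color:green\">$1</span>");
        // 3. <pre> keeps the original indentation and line breaks.
        return "<html><body><pre>\n" + html + "\n</pre></body></html>\n";
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get("a.txt");
        Path out = Paths.get("a.html");
        String src = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, toHtml(src).getBytes(StandardCharsets.UTF_8));
    }
}
```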