Wednesday, June 15, 2011

Arithmetic Evaluator (BNF tokenizer)

Previous Work

This is the third article about arithmetic evaluator. Readers are suggested to read the whole series :

To summarize the previous work, let's consider the following arithmetic expression :
31*(2+4)+16

To evaluate the expression, we need three steps :
  1. Tokenize the expression
  2. Convert the expression into postfix form
  3. Evaluate the postfix expression

Tokenization

The second and the third step have been discussed in the preceding articles. We are going to write a tokenizer in this article.

Informally, we may define a token as an atomic piece of information. This definition highly depends on the context. For the arithmetic expression, a token can be a number, an operator, or a bracket sign. We may manually "tokenize" the expression by adding spaces to separate every token. Hence the tokenized expression become :
31 * ( 2 + 4 ) + 16

This is of course not very desirable because we are forcing the user to add spaces in the expression. As a user, it is reasonable to assume the following are all identical expressions :
31 * ( 2 + 4 ) + 16
31*(2+4)+16
31 * (2+4)         + 16
Hence we must implement a tokenizer such that no matter which of the above is input by the user, repeatedly calling the nextToken() function will return the token in the following order :
"31"   "*"   "("   "2"   "+"   "4"   ")"   "+"   "16"

BNF

BNF stands for "Backus Normal Form", which is named after its inventor Backus Naur. It is used for defining the syntax of a programming language.

For example, consider the following simple expression :
3+4
The above expression is the sum of two terms. We may make a simple definition for expression :
<expression> :=  <term> + <term>

The symbol ":=" means "is defined as".

The above definition has an obvious problems. It only works for expression with exactly two terms. In the real world, an expression can contain one term only. Hence we may redefine the expression as :
<expression> :=  <term>  
OR  
<expression> :=  <term> + <term>
In BNF notation, we use a vertical stroke to represent "OR". Hence we should rewrite the above as :
<expression> :=  <term>  |  <term> + <term>

With a recursive definition, we may extend the above definition such that it defines expression with indefinite number of terms :
<expression> :=  <term>  |  <term> + <expression>

Up to now we didn't define what is a term. Assume we will only handle positive integer in our expressions. We may define term as :

<term> := <positive_integer>
<positive_integer> := <digit> | <digit> <positive_integer>
<digit> := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

We just defined term as a positive integer, and defined positive integer as "either a single digit" or "a single digit followed by a positive integer". This is another recursive definition.

We can extends our definition of expression to include subtraction of terms.

<expression> :=  <term>  |  <term> + <expression> | <term> - <expression>

Or in a more elegant way :
<expression> :=  <term>  |  <term> <additive_operator> <expression> 
<additive_operator> := + | -

We will not handle multiplication and division at the moment. You will see that it is easy to extend the BNF later.

Let's repeat the whole definition of arithmetic expression as follows :
<expression> :=  <term>  |  <term> <additive_operator> <expression> 
<additive_operator> := + | -
<term> := <positive_integer>
<positive_integer> := <digit> | <digit> <positive_integer>
<digit> := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

BNF Parser

Now we may write a parser for the above BNF. It is very easy to implement such a parser using a top down approach. Let's write the skeleton of a simple parser for the first rule :

We may write :
/********************************************************
<expression> :=  <term>  |  <term> <additive_operator> <expression> 
********************************************************/
  void readExpression()
  {
    readTerm();
    if (isAdditiveOperator(peek())
    {
      readAdditiveOperator();
      readExpression();
    }
  }

In order to parse an expression, we must have a function to read the expression character by character. We will name this function as next(). We also need a function to "look" at the next available character, without consuming it, this function is named peek().

For example, if the expression is "3+5", repeatedly calling next() will return "3","+","5", EOL. (End of Line mark)

Whereas repeatedly calling peek() will return "3","3","3","3","3",...

Let's write the skeleton for every BNF rule :
/********************************************************
<expression> :=  <term>  |  <term> <additive_operator> <expression> 
********************************************************/
  void readExpression()
  {
    readTerm();
    if (isAdditiveOperator(peek())
    {
      readAdditiveOperator();
      readExpression();
    }
  }

/********************************************************
<additive_operator> := + | -
********************************************************/
  void readAdditiveOperator()
  {
    token=next();    // read next character (must be a plus or minus sign)
    outputToken();  
  }

/********************************************************
<term> := <positive_integer>
********************************************************/
  void readTerm()
  {
    readPositiveInteger();
  }

/********************************************************
<positive_integer> := <digit> | <digit> <positive_integer>
********************************************************/
  void readPositiveInteger()
  {
    if (isDigit(peek())   // read all digits
    {
      token += next();       // append digit to token   
      readPositiveInteger();
    }
    outputToken();       // no more digits, so output the token
                         // and then reset the token to empty String
  }

/********************************************************
<digit> := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
********************************************************/
  boolean isDigit(String ch)
  {
    String digits="0123456789";
    return (digits.indexOf(ch)>=0);  
  }
  


Multiplication and Division

It is easy to extend our BNF definition to include multiplication. We have defined expression as :
<expression> := <term> | <term> <additive_operator> <expression>

And we have defined term as positive_integer. However, we know that a term can actually be a product one or more factors. Hence we may redefine as follows :
<factor> := <number>
<term> := <factor> | <factor> * <term>
We may extend the multiplicative operator to include division as well :
<factor> := <number>
<term> := <factor> | <factor> <multiplicative_operator> <term>
<multiplicative_operator> := * | /

Factors are not always simple numbers. Some factors are rather complex, for example, consider the following expression :
(3+4)*5*7 + 2*3 + 1

There are three terms in the expression, the first term is (3+4)*5*7, the second term is 2*3 and the third term is 1. The first term contains three factors, (3+4), 5 and 7. Hence a factor can be a bracket expression as well as a simple number.

We now extend the definition of factor such that either it is a number, or it is a bracket expression such as (3+4).
<factor> := <number> | ( <expression> )

And of course we need to define what is a <number>. The full definition is as follows :

<expression> := <term> | <term> <additive_operator> <expression>
<additive_operator> := + | -
<term> := <factor> | <factor> <multiplicative_operator> <term>
<multiplicative_operator> := * | /
<factor> := <number> | <signed_number> | ( <expression> )
<signed_number> := <additive_operator> <number>
<number> := <positive_integer> | <positive_integer> . <positive_integer>
<positive_integer> := <digit> | <digit> <positive_integer>
<digit> := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9


Putting it altogether

We can now write an arithmetic evaluator to evaluate the value of a rather complex expression.
We will use the following as an example in our sample code.
String expr="((-3.14*(2+(-4)))-7)/3+7-2*2*2+9";
You may try to calculate the expression in an Excel spread sheet and you will see the answer is 7.76. We are going to compute this answer with our own evaluator.

We will need four Java source files to achieve the purpose :
  1. CharReader.java
  2. It scans the input String character by character, providing the function next() and peek()
  3. ArithmeticBNF.java
  4. It parses the BNF grammer of arithmetic expression and tokenize it. Spaces are added between each token so that :
    (3.14+41)*2/5 would become "( 3.14 + 41 ) * 2 / 5"
    
  5. PostfixConverter.java
  6. It converts the tokenized expression into postfix form. Hence the expression would become :
    3.14 41 + 2 * 5 /
    
  7. PostfixEvaluator.java
  8. It evaluates the postfix expression. The result of the above postfix would be 17.656

PostfixEvaluator.java and PostfixConverter.java can be found in the following links :

Arithmetic Evaluator(Postfix Evaluator)
Arithmetic Evaluator(Infix to Postfix)

The other two files is listed at the end of this article.

To test, you will need all four Java files in the same folder.
javac *.java
java ArithmeticBNF
The output would be :
((-3.14*(2+(-4)))-7)/3+7-2*2*2+9=7.76

If you wish to compute the other expression, find the following line in main() and change it as needed.
String expr="((-3.14*(2+(-4)))-7)/3+7-2*2*2+9";

/******************************************************************************
* File : CharReader.java
* Author : http://java.macteki.com/
* Description :
*   Scan an input String character by character.
* Tested with : JDK 1.6
******************************************************************************/

class CharReader
{
  private String text;
  private char[] charArray;
  private int charPointer=0;
  public static final String EOL="\n";  // End of Line mark

  public CharReader(String s)
  {
    this.text=s;
    this.charArray = s.toCharArray();
    this.charPointer = 0;
  }

  // read next character, the character is "consumed" and the pointer is advanced
  public String next()
  {
    if (charPointer>=charArray.length) return EOL;
    return ""+this.charArray[charPointer++];
  }

  // "look" at the next character, the character is not "consumed".
  // hence the pointer is not advanced.
  public String peek()
  {
    if (charPointer>=charArray.length) return EOL;
    return ""+this.charArray[charPointer];
  }

  // testing 
  public static void main(String[] args) throws Exception
  {
    String s="apple orange";
    CharReader charReader=new CharReader(s);
    String ch="";
    while ((ch=charReader.next())!=EOL)
    {
      System.out.print("["+ch+"] ");
    }
  }
}

/******************************************************************************
* File : ArithmeticBNF.java
* Author : http://java.macteki.com/
* Description :
*   Parse and tokenize an arithmetic expression according to the BNF syntax.
*   the expression is defined in main() :
*       String expr="((-3.14*(2+(-4)))-7)/3+7-2*2*2+9";
*   The tokenized expression will be :
*      "( ( -3.14 * ( 2 + ( -4 ) ) ) -7 ) / 3 + 7 - 2 * 2 * 2 + 9"
* Tested with : JDK 1.6
******************************************************************************/

/****************************************************************************** 
BNF Definition :
<expression> := <term> | <term> <additive_operator> <expression>
<additive_operator> := + | -
<term> := <factor> | <factor> <multiplicative_operator> <term>
<multiplicative_operator> := * | /
<factor> := <number> | <signed_number> | ( <expression> )
<signed_number> := <additive_operator> <number>
<number> := <positive_integer> | <positive_integer> . <positive_integer>
<positive_integer> := <digit> | <digit> <positive_integer>
<digit> := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
******************************************************************************/

class ArithmeticBNF
{
  CharReader reader;
  String token="";
  String tokenList="";

  public ArithmeticBNF(String expr)
  {
    reader = new CharReader(expr);
  }

  void outputToken()
  {
    System.out.println(token);
    tokenList+=token+" ";
    token="";
    skipBlanks();
  }

  void skipBlanks()
  {
    while (peek().equals(" "))
    {
      next();
    }
  }

  String next()
  {
    return reader.next();
  }
  
  String peek()
  {
    return reader.peek();
  }

  // <expression> := <term> | <term> <additive_operator> <expression>
  void expression()
  {
    term();
    if (isAdditiveOperator(peek()))
    {
      additiveOperator();
      expression();
    }
  }


  // <additive_operator> := + | -
  void additiveOperator()
  {
    if (isAdditiveOperator(peek()))
    {
      token=next();
      outputToken();
    }
    else
    {
      throw new RuntimeException("Syntax error : expecting operator");
    }
  }

  // <multiplicative_operator> := * | /
  void multiplicativeOperator()
  {
    if (isMultiplicativeOperator(peek()))
    {
      token=next();
      outputToken();
    }
    else
    {
      throw new RuntimeException("Syntax error : expecting operator");
    }
  }


  // <number> := <positive_integer> | <positive_integer> . <positive_integer>
  void number()
  {
    positiveInteger();
    if (peek().equals("."))
    {
      decimalPoint();
      positiveInteger();
    }
  }

  void decimalPoint()
  {
    token+=next();
  }

  void sign()
  {
    token+=next();
  }

  // <positive_integer> := <digit> | <digit> <positive_integer>
  void positiveInteger()
  {
    if (isDigit(peek()))
    {
      token+=next();      
      positiveInteger();
    }    
  }

  // <factor> := <number> | <signed_number> | ( <expression> )
  // <signed_number> := <additive_operator> <number>
  void factor()
  {
    if (peek().equals("(")) 
    { 
      bracketExpression();  
    }
    else if (isDigit(peek()))
    {
      number();
      outputToken();
    }
    else if (isAdditiveOperator(peek()))
    {
      sign();
      number();
      outputToken();
    }
    else
      throw new RuntimeException("Unexpected character "+peek());
  }

  void bracketExpression()
  {
    leftBracket();

    expression();

    rightBracket();
  }

  void leftBracket()
  {
    if (peek().equals("("))
    {
      token=next();
      outputToken();
    }
    else
    {
      throw new RuntimeException("Expecting left bracket");
    }
  }

  void rightBracket()
  {
    if (peek().equals(")"))
    {
      token=next();
      outputToken();
    }
    else
    {
      throw new RuntimeException("Expecting right bracket");
    }
  }
  

  // <term> := <factor> | <factor> <multiplicative_operator> <term>
  void term()
  {
    factor();

    if (isMultiplicativeOperator(peek()))
    {
      multiplicativeOperator();
      term();
    }
  }

  // <digit> := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  boolean isDigit(String ch)
  {
    String digits="0123456789";
    return (digits.indexOf(ch)>=0);
  }

  // <additive_operator> := + | -
  boolean isAdditiveOperator(String ch)
  {
    String operators="+-";
    return (operators.indexOf(ch)>=0);
  }

  // <multiplicative_operator> := * | /
  boolean isMultiplicativeOperator(String ch)
  {
    String operators="*/";
    return (operators.indexOf(ch)>=0);
  }

  
  public static void main(String[] args) throws Exception
  {
    String expr="((-3.14*(2+(-4)))-7)/3+7-2*2*2+9";
    ArithmeticBNF bnf=new ArithmeticBNF(expr);

    bnf.expression();
    String postfix=PostfixConverter.convert(bnf.tokenList); 
    double value=PostfixEvaluator.evaluate(postfix);
    
    System.out.println(expr+"="+value);
  }
}

No comments:

Post a Comment