A Calculator with Gregexes

Let's run through the standard parser combinator example and build a simple calculator to introduce the various facilities of gregexes.

For all the examples, we'll assume the imports and enclosing Parser.use{} statements are present:

import gregex.*
import static gregex.Parser.*
Parser.use{
  //put the code from each example here
}

Our first example shows how to parse a positive integer, preceded by spaces. We show sequencing using &, skipping the whitespace result, and modifying the results using the closure, changing the parsed string into a BigInteger in line 1.2 :

  def ws= ~/ */
  def posIntP= -ws& ~/[0-9]+/ ^ {new BigInteger(it)} //line 1.2

  posIntP<< "  7 "
  assert _r == 7g && _p == 3

Next, we expand the example so we can parse a second expression, separated by addition. Lines 2.2 and 2.4 show how to use closures to define and use parameterized parsers. Line 2.5 shows using alternation to make the second expression optional. But there's a little problem:

  def ws= ~/ */
  def symbolP= {String s-> -ws& s} //line 2.2
  def posIntP= -ws& ~/[0-9]+/ ^ {new BigInteger(it)}
  def addExpr= posIntP & -symbolP("+") & posIntP ^{it[0]+it[1]} //line 2.4
  def expr= posIntP | addExpr //line 2.5

  expr<< "  7 + 24 "
  assert _r == 7g && _p == 3

The second expression ' + 24 ' wasn't scanned. That's because backtracking isn't enabled by default, nor is end of input required. For our simple example, the best way to solve the problem is to order the options longest first. Besides fixing this in line 3.5 in the next example, we also add the facility to subtract the second number in line 3.5 :

  def ws= ~/ */
  def symbolP= {String s-> -ws& s}
  def posIntP= -ws& ~/[0-9]+/ ^ {new BigInteger(it)}
  def addExpr= posIntP & (symbolP("+") | symbolP("-")) & posIntP ^{it[1]=='-'? it[0]-it[2]: it[0]+it[2]} //line 3.4
  def expr= addExpr | posIntP //line 3.5

  assert (expr<< " 7 + 24").result == 31g
  assert (expr<< " 7 -  4").result == 3g

We now want to cater for many operands in our calculator. You might think we can use parser recursion here, like in line 4.5, but we can't because of a simple fact about any parser combinators such as gregexes: although we can define parsers left-recursively, they won't run because they'll loop forever. We can, however, put in left-recursion detection if we also use packratting and parse harnessing, as in line 4.6. This is expensive in performance, though, so normally we'd only use it for debugging, or demonstrating our example:

  def ws= ~/ */
  def symbolP= {String s-> -ws& s}
  def posIntP= -ws& ~/[0-9]+/ ^ {new BigInteger(it)}
  def expr //line 4.4
  expr= +{ expr & (symbolP("+") | symbolP("-")) & posIntP ^{it[1]=='-'? it[0]-it[2]: it[0]+it[2]} } //line 4.5
  expr= packrat( expr, true ) //line 4.6

  assert new ParseHarness()(expr, " 7 + 24 +2 - 7") in RecurseFailure

We can cater for many operands using Many-repetition in line 5.4. We also change the regex defining the whitespace to a string with gregex Many-repetition in line 5.1 for the demo:

  def ws= (" ")(Many) //line 5.1
  def symbolP= {String s-> -ws& s}
  def posIntP= -ws& ~/[0-9]+/ ^{new BigInteger(it)}
  def expr= posIntP & ((symbolP("+") | symbolP("-")) & posIntP)(Many) ^{it[1].inject(it[0]){f,i-> i[0]=='-'? f-i[1]: f+i[1]} } //line 5.4

  expr<< " 7 + 24 +2 - 7"
  assert _r == 26g

So where can we use parser recursion without worrying about left recursion? The classic case is when we want to parenthesize expressions. We predefine a variable in line 6.5, use it on the right side of a parse expression in line 6.6, and later assign a parse expression to it in line 6.8. When we use it in line 6.6, we must enclose it inside a closure as shown:

  def ws= (" ")(Many)
  def symbolP= {String s-> -ws& s}
  def posIntP= -ws& ~/[0-9]+/ ^ {new BigInteger(it)}
  def expr //line 6.5
  def parenExpr= -symbolP("(") & { expr } & -symbolP(")") //line 6.6
  def atom= parenExpr | posIntP
  expr= atom & ((symbolP("+") | symbolP("-")) & atom)(Many) ^{it[1].inject(it[0]){f,i-> i[0]=='-'? f-i[1]: f+i[1]} } //line 6.8

  expr<< " 7 - (10 - 6) +  2"
  assert _r == 5g

  expr<< " 7 + (10 -( 7 + 5)) + 2"
  assert _r == 7g

Finally, we complete the calculator by catering for negates in line 7.7 and multiplication in line 7.8. In line 7.10, we also cater for trailing spaces, followed by end-of-file:

  def ws= (" ")(Many)
  def symbolP= {String s-> -ws& s}
  def posIntP= -ws& ~/[0-9]+/ ^ {new BigInteger(it)}
  def expr
  def parenExpr= +{ -symbolP("(") & expr & -symbolP(")") } //line 7.5
  def atom= parenExpr | posIntP
  def unaryExpr= symbolP("-")(Many) &atom ^ {it[0].size() %2 !=0? -it[1]: it[1]} //line 7.7
  def multExpr= unaryExpr & (-symbolP("*") & hide(unaryExpr))(Many) ^{it[1].inject(it[0]){f,i-> f*i} } //line 7.8
  expr= multExpr & ((symbolP("+") | symbolP("-")) & hide(multExpr))(Many) ^{it[1].inject(it[0]){f,i-> i[0]=='-'? f-i[1]: f+i[1]} } //line 7.9
  def exprT= profile("test04-calculator", expr & -ws &'' ) //line 7.10

  [ "3 * - 4 * 5": -60g,
    "7 + ( 8 * (9 - 2*3)) +  2 -1 * 7  ": 26g,
    " 7 + ( 8 * 3) +  2 -- -1 * 7  ": 26g,
  ].each{str, res->
    exprT<< str
    assert _r == res && _p == str.size()
  }

There's many utility parsers available, such as hide() in line 7.8 and profile() in line 7.10. The hide() parser prevents an expansion in the logging, and is useful when there's more than one occurence of a certain parser in a parser definition. In a long list of operators at different precedences, not suppressing such and expansion fills up the log at an exponential rate, so it's essential to use hide() when logging. The profile() parser displays the runtime of the enclosed parser whenever it's run.

Together with the main documentation, this example is a good start in exploring the power of Groovy's Gregexes.

Last edited Mar 17, 2011 at 11:40 PM by gavingrover, version 2

Comments

No comments yet.