Monday, April 29, 2013

Scala Parser Combinators - Part 3

In a previous post on Scala parser combinators, I demonstrated how to use Scala's parser library to write a simple arithmetic parser. This example resulted in a parser that parsed an arithmetic expression and returned the results of the operation. In this tutorial, I'm going to expand on this parser by generating a parse tree, instead of simply evaluating the expression. For the expression "2 * 8 + 6", the parser will output the following parse tree.

      *
    /   \
   2     +
       /   \
      8     6


Notice that every node in our tree is either an expression consisting of an operator and a left and right side, or a single digit. For example, "8+6" is the addition operator (+) with a left side of 8 and a right side of 6. 8 and 6 are the other type of tree node (i.e. number). We'll start by defining these two types of nodes in our parse tree with a case class.

class Expr
case class Number(value: Int) extends Expr
case class Operator(symbol: String, left: Expr, right: Expr) extends Expr


Next, we'll define the rules of our parser. Note that if you read my previous posts on Scala parser combinators, I've simply added the "term" and "factor" class members.

class ExprParser extends RegexParsers {
  val number = "[1-9][0-9]*".r

  def expr: Parser[Int] = (number ^^ { _.toInt }) ~ opt(operator ~ expr) ^^ {
    case a ~ None => a
    case a ~ Some("*" ~ b) => a * b
    case a ~ Some("/" ~ b) => a / b
    case a ~ Some("+" ~ b) => a + b
    case a ~ Some("-" ~ b) => a - b
  }

  def operator: Parser[String] = "+" | "-" | "*" | "/"

  def term: Parser[Expr] = (factor ~ opt(operator ~ term)) ^^ {
    case a ~ None => a
    case a ~ Some(b ~ c) => Operator(b, a, c)
  }
  
  def factor: Parser[Expr] = number ^^ (n => Number(n.toInt))
}

Let's test the parser and evaluate the results.

  def main(args: Array[String]) {
    val parser = new ExprParser
    val result = parser.parseAll(parser.term, "9*8+2")

    println(result.get)
  }

The output from this test is the following parse tree.

Operator(*,Number(2),Operator(+,Number(8),Number(6)))


In order to look at this response and recognize a parse tree, notice that a tree of the form "8+6" is printed as "Operator(+, Number(8), Number(6))". This is because the "+" is the root of the node, so it's listed first. Thus, we have a subtree that looks like this.


         +
       /   \
      8     6



After applying this to the rest of our parse tree response of "Operator(*, Number(2), ...)", our resulting parse tree looks like this.

      *
    /   \
   2     +
       /   \
      8     6


No comments:

Post a Comment