## Monday, April 29, 2013

### Scala Parser Combinators - Part 3

In a previous post on Scala parser combinators, I demonstrated how to use Scala's parser library to write a simple arithmetic parser. This example resulted in a parser that parsed an arithmetic expression and returned the results of the operation. In this tutorial, I'm going to expand on this parser by generating a parse tree, instead of simply evaluating the expression. For the expression "2 * 8 + 6", the parser will output the following parse tree.

```      *
/   \
2     +
/   \
8     6
```

Notice that every node in our tree is either an expression consisting of an operator and a left and right side, or a single digit. For example, "8+6" is the addition operator (+) with a left side of 8 and a right side of 6. 8 and 6 are the other type of tree node (i.e. number). We'll start by defining these two types of nodes in our parse tree with a case class.

```class Expr
case class Number(value: Int) extends Expr
case class Operator(symbol: String, left: Expr, right: Expr) extends Expr
```

Next, we'll define the rules of our parser. Note that if you read my previous posts on Scala parser combinators, I've simply added the "term" and "factor" class members.

```class ExprParser extends RegexParsers {
val number = "[1-9][0-9]*".r

def expr: Parser[Int] = (number ^^ { _.toInt }) ~ opt(operator ~ expr) ^^ {
case a ~ None => a
case a ~ Some("*" ~ b) => a * b
case a ~ Some("/" ~ b) => a / b
case a ~ Some("+" ~ b) => a + b
case a ~ Some("-" ~ b) => a - b
}

def operator: Parser[String] = "+" | "-" | "*" | "/"

def term: Parser[Expr] = (factor ~ opt(operator ~ term)) ^^ {
case a ~ None => a
case a ~ Some(b ~ c) => Operator(b, a, c)
}

def factor: Parser[Expr] = number ^^ (n => Number(n.toInt))
}
```

Let's test the parser and evaluate the results.

```  def main(args: Array[String]) {
val parser = new ExprParser
val result = parser.parseAll(parser.term, "9*8+2")

println(result.get)
}
```

The output from this test is the following parse tree.

```Operator(*,Number(2),Operator(+,Number(8),Number(6)))
```

In order to look at this response and recognize a parse tree, notice that a tree of the form "8+6" is printed as "Operator(+, Number(8), Number(6))". This is because the "+" is the root of the node, so it's listed first. Thus, we have a subtree that looks like this.

```
+
/   \
8     6

```

After applying this to the rest of our parse tree response of "Operator(*, Number(2), ...)", our resulting parse tree looks like this.

```      *
/   \
2     +
/   \
8     6
```