Friday, October 5, 2012

Interpreting an ANTLR Grammar Using a Syntax-Directed Interpreter

There are various types of interpreters we can implement in ANTLR. For very simple DSLs, we can use a syntax-directed interpreter, which runs action code while the parser is matching the grammar rules. This only works for simple DSLs. If you're designing a general programming language, you'll need a much more sophisticated interpreter, and I'll cover that in later posts. To start with, let's look at a syntax-directed interpreter for a simple DSL that we'll design.

Let's create a DSL that will process customer records. The purpose of a DSL is to provide an expressive language, much like a very easy-to-use API. For this example, a script in our customer DSL will look like this.

Create Customer
With
FirstName "Joe"
LastName "Somebody"
DateOfBirth "1974-05-04"


Create Customer
With
FirstName "John"
LastName "Doe"
DateOfBirth "1980-01-19"


We've defined how we want our Customer processing DSL to look. In ANTLR, we can create the following grammar to define our language.

grammar CustomerDSL;

options {
  language = Java;
}

rule
    :
        customer*
    ;
   
customer
    :
        'Create' 'Customer'
        'With'
        firstName
        lastName
        dateOfBirth
    ;
   
firstName
    :
        'FirstName' '"' NAME '"'
    ;

lastName
    :
        'LastName' '"' NAME '"'
    ;

dateOfBirth
    :
        'DateOfBirth' '"' date '"'
    ;
   
date
    :
        DIGIT DIGIT DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT
    ;
   
DIGIT
    :
        '0'..'9'
    ;

NAME
    :
        ('A'..'Z') ('a'..'z' | 'A'..'Z')*
    ;

WS
    :
        (' ' | '\t' | '\n' | '\r') {$channel = HIDDEN;}
    ;
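
To get a feel for which strings the NAME and date rules are meant to accept, it can help to restate them as regular expressions in plain Java. This is only a rough sketch: the TokenShapes class and its methods are hypothetical helpers, not anything ANTLR generates, and the regular expressions approximate the rules rather than reproduce the lexer's behavior exactly.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not generated by ANTLR): mirrors the shape of the
// NAME and date rules with regular expressions so they are easy to try out.
final class TokenShapes {
    // NAME: one uppercase letter followed by zero or more ASCII letters.
    private static final Pattern NAME = Pattern.compile("[A-Z][a-zA-Z]*");

    // date: DIGIT DIGIT DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT
    private static final Pattern DATE = Pattern.compile("[0-9]{4}-[0-9]{2}-[0-9]{2}");

    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    static boolean isDate(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isName("Joe"));        // true
        System.out.println(isName("joe"));        // false: must start with an uppercase letter
        System.out.println(isName("O'Brien"));    // false: only letters are allowed
        System.out.println(isDate("1974-05-04")); // true
        System.out.println(isDate("1980-13-45")); // true: only the shape is checked
    }
}
```

The last line is worth noticing: the grammar checks only the shape of a date, not whether the month and day are in range, so any real validation has to happen in the code that interprets the script.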


Next, we need to interpret our DSL script and do something with it. For a simple DSL like this, we can use a syntax-directed interpreter by embedding Java actions in the grammar rules to process the DSL input. After adding the syntax-directed interpreter, our grammar looks like this.


grammar CustomerDSL;

options {
  language = Java;
}

@header {
    import java.util.ArrayList;
    import java.util.Date;
    import java.text.SimpleDateFormat;
    import java.text.DateFormat;
    import java.text.ParseException;
}

@members {
    ArrayList<Customer> customers = new ArrayList<Customer>();
    ArrayList<Customer> getCustomers() {
        return customers;
    }
}

rule
    :
        customer*
    ;
  
customer
    :
        'Create' 'Customer'
        'With'
        f=firstName
        l=lastName
        d=dateOfBirth
        {
            DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
            try {
                Date dateOfBirth = (Date)(formatter.parse($d.dateOfBirthValue));
                customers.add(new Customer($f.firstNameValue, $l.lastNameValue, dateOfBirth));
            } catch (ParseException e) {
                System.err.println(e.getMessage());
            }
        }
    ;

firstName returns [String firstNameValue]
    :
        'FirstName' '"' f=NAME '"' {$firstNameValue = $f.text;}
    ;

lastName returns [String lastNameValue]
    :
        'LastName' '"' l=NAME '"' {$lastNameValue = $l.text;}
    ;

dateOfBirth returns [String dateOfBirthValue]
    :
        'DateOfBirth' '"' d=date '"' {$dateOfBirthValue = $d.text;}
    ;
  
date
    :
        DIGIT DIGIT DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT
    ;
  
DIGIT
    :
        '0'..'9'
    ;

NAME
    :
        ('A'..'Z') ('a'..'z' | 'A'..'Z')*
    ;

WS
    :
        (' ' | '\t' | '\n' | '\r') {$channel = HIDDEN;}
    ;


Within the @header {...} section, we import any classes our action code needs. Within the @members {...} section, we declare fields and methods that become members of the generated parser class. Throughout the grammar, wherever we need to run Java code, we just wrap it in {...} and ANTLR knows that it's custom action code to execute at that point while parsing the input.
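
The action in the customer rule is ordinary Java, so its date handling can be tried on its own, outside the parser. Here's a minimal sketch of that step. The DateOfBirthParsing class and its methods are hypothetical names used only for illustration. It also shows a detail that's easy to miss: SimpleDateFormat is lenient by default, so an out-of-range month rolls over into the next year instead of raising a ParseException. Calling setLenient(false) is one way to reject such input, if that's the behavior you want.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper (not part of the grammar): isolates the date parsing
// that the customer rule's action performs.
final class DateOfBirthParsing {
    // Parses text with the same "yyyy-MM-dd" pattern used in the action.
    // Returns null instead of throwing when the text cannot be parsed.
    static Date parseOrNull(String text, boolean lenient) {
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
        formatter.setLenient(lenient);
        try {
            return formatter.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    // Parses the text and formats the result back, to show what was understood.
    static String roundTrip(String text, boolean lenient) {
        Date date = parseOrNull(text, lenient);
        return date == null ? null : new SimpleDateFormat("yyyy-MM-dd").format(date);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("1974-05-04", true));  // 1974-05-04
        System.out.println(roundTrip("1980-13-19", true));  // 1981-01-19 (month 13 rolls over)
        System.out.println(roundTrip("1980-13-19", false)); // null (rejected)
    }
}
```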

We're missing two things to tie this all together. The first is a Customer class so that we can create a POJO for each customer record. Note that we could do whatever we want when we interpret our DSL script. For this example, we'll just create a POJO for each customer defined in our script. From there, you could marshal the records to send to some service, insert the records into a database, etc. Here's a simple Customer class.

import java.util.Date;

public class Customer {
    private String firstName;
    private String lastName;
    private Date dateOfBirth;
   
    public Customer() {
    }
   
    public Customer(String firstName, String lastName, Date dateOfBirth) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.dateOfBirth = dateOfBirth;
    }
   
    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public Date getDateOfBirth() {
        return dateOfBirth;
    }

    public void setDateOfBirth(Date dateOfBirth) {
        this.dateOfBirth = dateOfBirth;
    }
   
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append("First Name : " + firstName + ", ");
        sb.append("Last Name : " + lastName + ", ");
        sb.append("Date of Birth : " + dateOfBirth.toString());

        return sb.toString();
    }
}


The next step is to create a simple processor/driver program that will read a file containing customer records, run our lexer over our input, pass the results to our parser, and then do something with our Customer instances that are created by our interpreter. Here's a processor that will process our DSL script and create a POJO for each customer that's defined.

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;

import org.antlr.runtime.ANTLRReaderStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.TokenStream;


public class CustomerDSLProcessor {
    public static void main(String args[]) {
        // Forward slashes work on Windows as well as on Unix-like systems.
        String filePath = "scripts/Customers.dsl";
        Reader reader = null;
      
        try {
            reader = new FileReader(filePath);
        } catch (FileNotFoundException e) {
            System.err.println("Error: Unable to find source code file at " + filePath);
            return;
        }

        try {
            CharStream charStream = new ANTLRReaderStream(reader);
            CustomerDSLLexer lexer = new CustomerDSLLexer(charStream);
            TokenStream tokenStream = new CommonTokenStream(lexer);
            CustomerDSLParser parser = new CustomerDSLParser(tokenStream);

            parser.rule();
            ArrayList<Customer> customers = parser.getCustomers();
            for (Customer customer : customers) {
                System.out.println(customer.toString());
            }
        } catch (IOException e) {
            System.err.println(e.getMessage());
        } catch (RecognitionException e) {
            e.printStackTrace();
        }
    }
}


Syntax-directed interpreters are simple and easy to write, but they're not practical for large or complex languages. They work just fine for simple external DSLs where you can interpret the input as you're parsing it. In more complex languages, however, the context in which a grammar rule is matched affects how it should be interpreted, and this type of interpreter just doesn't have the power to handle that. For those languages, we need to output an AST and walk or visit the tree to process the language. So, in a later post, I'll talk about more sophisticated interpreters and how we can implement them in Java.
