Thursday, October 25, 2012

Writing Basic Language Constructs in an ANTLR Grammar

When designing a new general-purpose programming language, or anything beyond a simple DSL, you'll need grammar rules for basic language constructs such as branches and loops. Here are a few grammar snippets that I reuse just about every time I design a DSL in ANTLR. The syntax is yours to define however you like, but these work as a general template for frequently occurring language constructs.

Credit is due entirely to Scott Stanchfield. I'd recommend checking out his videos as they are excellent tutorials on writing ANTLR grammars. Most of these constructs are taken directly from his tutorials. I usually find myself copying and pasting these basic grammar rules and then modifying the syntax as I need for the language I'm designing.

Expressions

term
    :   
        IDENT
        | '(' expression ')'
        | INTEGER
    ;
   
negation
    :    'not'* term
    ;
   
unary
    :    ('+' | '-')* negation
    ;
   
mult
    :    unary (('*' | '/' | 'mod') unary)*
    ;
   
add
    :    mult (('+' | '-') mult)*
    ;
   
relation
    :    add (('<' | '<=' | '>' | '>=' | '=' | '!=') add)*
    ;
   
expression
    :    relation (('and' | 'or') relation)*
    ;
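
Note that 'and' and 'or' share a single precedence level in the expression rule above, so a or b and c groups left to right as (a or b) and c. If you prefer 'and' to bind tighter than 'or', one possible variation is to split the level in two. This is only a sketch: the rule name conjunction is my own placeholder and is not part of the original grammar.

conjunction
    :    relation ('and' relation)*
    ;

expression
    :    conjunction ('or' conjunction)*
    ;

With this variation, a or b and c groups as a or (b and c), because each operand of 'or' is now a complete chain of 'and' operations.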



Assignment Statements


assignmentStatement
    :    IDENT ':=' expression ';'
    ;



If-Else Statements

ifStatement
    :
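        'if' '(' expression ')' statement+
        // 'else if' is a single literal token, so it must be written with exactly one space
        ('else if' '(' expression ')' statement+)*
        // at most one trailing else branch
        ('else' statement+)?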
        'endif'
    ;



Loops

exitStatement
    :
        'exit' 'when' expression ';'
    ;

loopStatement
    :
        'loop'
        (statement|exitStatement)*
        'endloop'
    ;
   
whileStatement
    :
        'while' '(' expression ')'
        (statement|exitStatement)*
        'endloop'
    ;
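
The rules above all refer to a statement rule that these snippets do not show. Here is a minimal sketch, assuming the language has only the statement kinds defined in this post. Adjust the alternatives to match your own language.

statement
    :    assignmentStatement
    |    ifStatement
    |    loopStatement
    |    whileStatement
    ;

exitStatement is deliberately left out of this sketch, since the loop rules list it as a separate alternative. The consequence is that 'exit when' is only accepted directly inside a loop body, not inside an if nested in a loop. Add it to statement if you want to allow that.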


Fragments and Lexer Rules

fragment LETTER
    :
        ('a'..'z' | 'A'..'Z')
    ;
   
fragment DIGIT
    :
        '0'..'9'
    ;
   
STRING_LITERAL
    :
        '"' ~('\r' | '\n' | '"')* '"'
    ;
   
INTEGER
    :    '0' | ('1'..'9') DIGIT*    // a lone zero, or a number with no leading zero
    ;

IDENT: LETTER (LETTER | DIGIT)*;

WS: (' ' | '\t' | '\r' | '\n' | '\f')+ {$channel = HIDDEN;};

COMMENT
    // Stop before the line break (WS consumes it), so a comment at end of file still matches.
    :    '//' ~('\r' | '\n')* {$channel = HIDDEN;};
   
MULTILINE_COMMENT
    :    '/*' .* '*/' {$channel = HIDDEN;};


