I have chosen to divide part III of this Writing an Interpreter tutorial into four sub-sections as follows:
Part IIIA - this tutorial discusses the Basic grammar for our implementation of a Basic interpreter.
Part IIIB - is concerned with parsing the tokens created as part of tutorial II, into a Syntax Tree.
Part IIIC - is concerned with transposing the Syntax Tree(ST) into an Abstract Syntax Tree(AST).
Part IIID - is concerned with transposing the AST into pseudo-code statements suitable for interpreting.
I have tried to keep the theory behind the following four sections to an absolute minimum, as I have discovered that there is a huge void that lays between the theoretical understanding of compiler/interpreter design, and the actual practical implementation of a compiler/interpreter.
The Basic Grammar
Below, you will find a formal representation of the grammar we will implement. This grammar is not entirely practical, as I have not defined any part of the grammar which will give the programmer the ability to call external functions (that is to say functions that exist outside of the code submitted for interpretation). This has been purposefully omitted as it would almost double the complexity of the code and that defeats the object of these tutorials. I may at some later date extend the grammar to include external function calls, if there is enough interest.
Here then is our Basic grammar.
primary-exp: literal ( expr ) symbol postfix-expr: primary-expr postfix-expr ( expr ) unary-expr: postfix-expr unary-operator unary-expr unary-operator: one of ‘*’ ‘&’ ‘+’ ‘-’ ‘!’ multiply-expr: unary-expr multiply-expr ‘*’ unary-expr multiply-expr ‘/’ unary-expr multiply-expr ‘%’ unary-expr additive-expr: multiply-expr additive-expr ‘+’ multiply-expr additive-expr ‘-’ multiply-expr relational-expr additive-expr relational-expr ‘<’ additive-expr relational-expr ‘>’ additive-expr relational-expr ‘<=’ additive-expr relational-expr ‘>=’ additive-expr equality-expr: relational-expr equality-expr ‘==’ relational-expr equality-expr ‘<>’ relational-expr and-expr: equality-expr and-expr ‘BITAND’ equality-expr excl-or-expr: and-expr excl-or-expr ‘BITXOR’ and-expr incl-or-expr: excl-or-expr incl-or-expr ‘BITOR’ excl-or-expr log-and-expr: incl-or-expr log-and-expr ‘AND’ incl-or-expr log-or-expr: log-and-expr log-or-expr ‘OR’ log-and-expr assig-expr: log-or-expr log-or-expr ‘=’ assign-expr expr: assign-expr expr , assign-expr else_block: ‘ELSE’ statement_blockopt if_stmt: ‘IF’ exprn ‘THEN’ stmt_block[size="1"]opt[/size] else_block[size="1"]opt[/size] ‘END’ ‘IF’ while_stmt: ‘WHILE’ expr ‘DO’ stmt_block[size="1"]opt[/size] ‘END’ ‘WHILE’ do_stmt: ‘DO’ stmt_block ‘WHILE’ expr ‘END’ ‘DO’ for_stmt: ‘FOR’ symbol ‘=’ expr ‘TO’ expr ‘STEP’ expr stmt_block[size="1"]opt[/size] ‘END’ ‘FOR’ call_stmt: ‘CALL’ symbol ‘USING’ expr print_stmt: ‘PRINT’ expr input_stmt: ‘INPUT’ expr let_stmt: ‘LET’[size="1"]opt[/size] expr param_list: symbol param_list ‘,’ symbol using_list: ‘USING’ param_list stmt: if_stmt while_stmt do_stmt for_stmt let_stmt call_stmt print_stmt input_stmt stmt_block: stmt stmt[size="1"]opt[/size] function_stmt: ‘FUNCTION’ symbol using_list[size="1"]opt[/size] stmt_block[size="1"]opt[/size] ‘END’ ‘FUNCTION’ procedure_stmt: ‘PROCEDURE’ symbol using_list[size="1"]opt[/size] stmt_block[size="1"]opt[/size]‘END’ ‘PROCEDURE’ main_stmt: ‘MAIN’ stmt_block program_block: function_stmt procedure_stmt main_stmt program_block : ‘PROGRAM’ program_block ‘END’ ‘PROGRAM’
I have not performed an in-depth check of this grammar, but seems reasonable enough ... 'the proof of the pudding will be in the eating' as we Brits say.
The grammar demystified
Unless you are used to looking at grammar specifications, all of the above will seem a little mind-blowing, so we will look at it in more detail.
All of our Basic programs will start with a PROGRAM statement. This must be the first non-remark line of any program submitted for interpretation. Notice in the example below, I have included text after the word PROGRAM. Any tokens after the PROGRAM token up to the end of the line are ignored.
At the end of all programs, the statement END PROGRAM must be present. So a valid program would be
PROGRAM This program does absolutely nothing REM function blocks, procedure blocks and the main block are placed in here! END PROGRAM
As I have stated in the previous tutorial, the case does not matter. Internally, the interperter transposes all lower case letters to upper case. I have chosen to use upper case to denote keywords for this tutorial.
Within the PROGRAM ... END PROGRAM statements, we can place function blocks, procedure blocks and one main block. So expanding our example above, I am going to introduce a function block, a procedure block and a main block. Notice the main block must be the last block specified between the PROGRAM ... END PROGRAM statements.
PROGRAM This program does absolutely nothing FUNCTION name$ USING p1$, p2$ REM our function body statements live here END FUNCTION PROCEDURE count% USING p1%, p2% REM our procedure body statements live here END PROCEDURE MAIN REM our main program statements live here END PROGRAM
The Function statement
A function is defined between the FUNCTION ... END FUNCTION statements. A function may be passed zero or more arguments. The name of the function determines the type of value the function returns. Name$ is a string variable, so the function returns a string. If we had said name%, then the function would of couse have returned an integer and finally name would have returned a real number.
To return a value in a function, one simply says (LET) name$ = value. Within the function, you can assign any number of values to the function name, it's the last assignment before the function exits that is returned. You can even use the function name variable in statements within the function itself. In the floowing example, we are saying that the return value of the function name$ the value of p1$, followed by a space (the value assigned to name$ in the first statement of the function followed by the value of p2$. So if p1$ contained the string "Hello" and p2$ contained the string "World", name$ would return the string "Hello World". Variables and dim'ed variables defined within a function block are local to that block. They cannot be referred to outside the function's scope.
FUNCTION name$ USING p1$, p2$ name$ = " " name$ = p1$ + name$ + p2$ END FUNCTION
The Procedure statement
A procedure is the same as a function, but does not return a value and therefore an attempt to assign a value to the name of the procedure will cause an error to be reported. Variables and dim'ed variables defined within a procedure block are local to that block. They cannot be referred to outside the procedure's scope.
The main statement
The main statement defines the main statement block in the same way that 'int main()' does in C++. The first non-remark statement after the MAIN is the first line of code to execute when the interpreter is told to run the program. It is the only block-type statement that does not end with an END ... statement.
Varaibles and dim'ed statements defined between the MAIN END PROGRAM statements are global, and may be referred to within any block in the program.
Calling Functions and Procedures
Functions and procedures are invoked by means of the 'CALL' statement. The following example calls the function name$ described above.
CALL name$ using "Hello", "World" PRINT name$
Notice that unlike C++, the return value of the call statement is not assigned to a variable. The name of the function itself is also the name of the variable in which the result is found.
Our Basic interpreter will permit variable arrays of upto three dimensions. To specify a single dimension we say.
This creates space within the interpreter run-time system for 10 strings. A$(0) is the first element of the array, and a$(9) is the last element of the array. (Basic programs normally start at one rather than zero, but as this is a C++ tutorial, and C++ programmers will be trying the interpreter I thought it best to start at zero like C++ does).
You can specify any number of dimensioned statements on the same line as follows.
DIM a$(10), b$(5), c$(20)
Variables are defined in an interpreter dictionary as they are encountered for the first time during the interpreters execution of statements. Once they have been entered in the dictionary they remain there. Each function/procedure called, will create it's own local dictionary which is destroyed when the function/procedure is completed. If there is an overlap of global and local names, then the local name effectively hides the global name.
The remainder of the Basic statements speak for themselves and are modeled on C-type statements to make it easier to understand of C++ programmers.
Writing An Interpreter - Part IIIB -->