Page 1 of 1

Writing an Interpreter - Part IIIA The basic grammar Rate Topic: ***** 1 Votes

#1 Martyn.Rae   User is offline

  • The programming dinosaur
  • member icon

Reputation: 555
  • View blog
  • Posts: 1,436
  • Joined: 22-August 09

Posted 09 February 2010 - 05:28 AM

Writing an Interpreter - Part IIIA

I have chosen to divide part III of this Writing an Interpreter tutorial into four sub-sections as follows:

Part IIIA - this tutorial discusses the Basic grammar for our implementation of a Basic interpreter.
Part IIIB - is concerned with parsing the tokens created as part of tutorial II, into a Syntax Tree.
Part IIIC - is concerned with transposing the Syntax Tree(ST) into an Abstract Syntax Tree(AST).
Part IIID - is concerned with transposing the AST into pseudo-code statements suitable for interpreting.

I have tried to keep the theory behind the following four sections to an absolute minimum, as I have discovered that there is a huge void that lays between the theoretical understanding of compiler/interpreter design, and the actual practical implementation of a compiler/interpreter.

The Basic Grammar

Below, you will find a formal representation of the grammar we will implement. This grammar is not entirely practical, as I have not defined any part of the grammar which will give the programmer the ability to call external functions (that is to say functions that exist outside of the code submitted for interpretation). This has been purposefully omitted as it would almost double the complexity of the code and that defeats the object of these tutorials. I may at some later date extend the grammar to include external function calls, if there is enough interest.

Here then is our Basic grammar.

primary-exp:	literal
		( expr )

postfix-expr:	primary-expr
		postfix-expr ( expr )

unary-expr:	postfix-expr
		unary-operator unary-expr

unary-operator: one of ‘*’ ‘&’ ‘+’ ‘-’ ‘!’

multiply-expr:	unary-expr
		multiply-expr ‘*’ unary-expr
		multiply-expr ‘/’ unary-expr
		multiply-expr ‘%’ unary-expr

additive-expr:	multiply-expr
		additive-expr ‘+’ multiply-expr
		additive-expr ‘-’ multiply-expr

relational-expr	additive-expr
		relational-expr ‘<’ additive-expr
		relational-expr ‘>’ additive-expr
		relational-expr ‘<=’ additive-expr
		relational-expr ‘>=’ additive-expr

equality-expr:	relational-expr
		equality-expr ‘==’ relational-expr
		equality-expr ‘<>’ relational-expr

and-expr:	equality-expr
		and-expr ‘BITAND’ equality-expr

excl-or-expr:   and-expr
		excl-or-expr ‘BITXOR’ and-expr

incl-or-expr:   excl-or-expr
		incl-or-expr ‘BITOR’ excl-or-expr

log-and-expr:	incl-or-expr
		log-and-expr ‘AND’ incl-or-expr

log-or-expr:	log-and-expr
		log-or-expr ‘OR’ log-and-expr

assig-expr:     log-or-expr
		log-or-expr ‘=’ assign-expr

expr:		assign-expr
		expr , assign-expr

else_block:	‘ELSE’ statement_blockopt

if_stmt:	‘IF’ exprn ‘THEN’ stmt_block[size="1"]opt[/size] else_block[size="1"]opt[/size] ‘END’ ‘IF’

while_stmt:	‘WHILE’ expr ‘DO’ stmt_block[size="1"]opt[/size] ‘END’ ‘WHILE’

do_stmt:	‘DO’ stmt_block ‘WHILE’ expr ‘END’ ‘DO’

for_stmt:	‘FOR’ symbol ‘=’ expr ‘TO’ expr ‘STEP’ expr stmt_block[size="1"]opt[/size] ‘END’ ‘FOR’

call_stmt:	‘CALL’ symbol ‘USING’ expr

print_stmt:	‘PRINT’ expr

input_stmt:	‘INPUT’ expr

let_stmt:	‘LET’[size="1"]opt[/size] expr

param_list:	symbol 
		param_list ‘,’ symbol

using_list:	‘USING’ param_list

stmt: 		if_stmt

stmt_block: 	stmt stmt[size="1"]opt[/size]

function_stmt:	‘FUNCTION’ symbol using_list[size="1"]opt[/size] stmt_block[size="1"]opt[/size] ‘END’ ‘FUNCTION’

procedure_stmt:	‘PROCEDURE’ symbol using_list[size="1"]opt[/size] stmt_block[size="1"]opt[/size]‘END’ ‘PROCEDURE’

main_stmt:	‘MAIN’ stmt_block 

program_block:	function_stmt

program_block : ‘PROGRAM’ program_block ‘END’ ‘PROGRAM’

I have not performed an in-depth check of this grammar, but seems reasonable enough ... 'the proof of the pudding will be in the eating' as we Brits say.

The grammar demystified

Unless you are used to looking at grammar specifications, all of the above will seem a little mind-blowing, so we will look at it in more detail.

All of our Basic programs will start with a PROGRAM statement. This must be the first non-remark line of any program submitted for interpretation. Notice in the example below, I have included text after the word PROGRAM. Any tokens after the PROGRAM token up to the end of the line are ignored.

At the end of all programs, the statement END PROGRAM must be present. So a valid program would be

PROGRAM This program does absolutely nothing
REM function blocks, procedure blocks and the main block are placed in here!

As I have stated in the previous tutorial, the case does not matter. Internally, the interperter transposes all lower case letters to upper case. I have chosen to use upper case to denote keywords for this tutorial.

Within the PROGRAM ... END PROGRAM statements, we can place function blocks, procedure blocks and one main block. So expanding our example above, I am going to introduce a function block, a procedure block and a main block. Notice the main block must be the last block specified between the PROGRAM ... END PROGRAM statements.

PROGRAM This program does absolutely nothing

FUNCTION name$ USING p1$, p2$
REM our function body statements live here

PROCEDURE count% USING p1%, p2%
REM our procedure body statements live here

REM our main program statements live here

The Function statement

A function is defined between the FUNCTION ... END FUNCTION statements. A function may be passed zero or more arguments. The name of the function determines the type of value the function returns. Name$ is a string variable, so the function returns a string. If we had said name%, then the function would of couse have returned an integer and finally name would have returned a real number.

To return a value in a function, one simply says (LET) name$ = value. Within the function, you can assign any number of values to the function name, it's the last assignment before the function exits that is returned. You can even use the function name variable in statements within the function itself. In the floowing example, we are saying that the return value of the function name$ the value of p1$, followed by a space (the value assigned to name$ in the first statement of the function followed by the value of p2$. So if p1$ contained the string "Hello" and p2$ contained the string "World", name$ would return the string "Hello World". Variables and dim'ed variables defined within a function block are local to that block. They cannot be referred to outside the function's scope.

FUNCTION name$ USING p1$, p2$
    name$ = " "
    name$ = p1$ + name$ + p2$

The Procedure statement

A procedure is the same as a function, but does not return a value and therefore an attempt to assign a value to the name of the procedure will cause an error to be reported. Variables and dim'ed variables defined within a procedure block are local to that block. They cannot be referred to outside the procedure's scope.

The main statement

The main statement defines the main statement block in the same way that 'int main()' does in C++. The first non-remark statement after the MAIN is the first line of code to execute when the interpreter is told to run the program. It is the only block-type statement that does not end with an END ... statement.
Varaibles and dim'ed statements defined between the MAIN END PROGRAM statements are global, and may be referred to within any block in the program.

Calling Functions and Procedures

Functions and procedures are invoked by means of the 'CALL' statement. The following example calls the function name$ described above.

  CALL name$ using "Hello", "World"
  PRINT name$

Notice that unlike C++, the return value of the call statement is not assigned to a variable. The name of the function itself is also the name of the variable in which the result is found.

Dimensioned Statements

Our Basic interpreter will permit variable arrays of upto three dimensions. To specify a single dimension we say.

  DIM a$(10)

This creates space within the interpreter run-time system for 10 strings. A$(0) is the first element of the array, and a$(9) is the last element of the array. (Basic programs normally start at one rather than zero, but as this is a C++ tutorial, and C++ programmers will be trying the interpreter I thought it best to start at zero like C++ does).

You can specify any number of dimensioned statements on the same line as follows.

  DIM a$(10), b$(5), c$(20)


Variables are defined in an interpreter dictionary as they are encountered for the first time during the interpreters execution of statements. Once they have been entered in the dictionary they remain there. Each function/procedure called, will create it's own local dictionary which is destroyed when the function/procedure is completed. If there is an overlap of global and local names, then the local name effectively hides the global name.

The remainder of the Basic statements speak for themselves and are modeled on C-type statements to make it easier to understand of C++ programmers.

Writing An Interpreter - Part IIIB -->

Is This A Good Question/Topic? 1
  • +

Page 1 of 1