Problems With the Syntax Tree
The problem that we have with the syntax tree, is that it contains the structural representation of all nodes down to the leaf. For most statements, this is not a problem as there are but a few steps in the grammar. If we look at expressions on the other hand, we find a very different story. Let us take a very simple example the statement IF 1 + 2 = 3 THEN. What parsing the syntax tree with have given us is the following.
This provides a detailed syntax tree for the expression, but it doesn't do a great deal if we want to get one step closer to producing pseudo-code.
Converting a Syntax Tree to an Abstract Syntax Tree
In order that we may progress this conversion, all statements in the syntax tree are examined, and statements that contain expressions are reworked into simpler statements which consist of simply operator, lhs and rhs.
So an expression held in a statement whose syntax tree is for '1 + 2 * 3' will be transposed into an abstract syntax tree as shown below:
Now, I know what you are thinking. Why didn't we find a way to do that as part of producing the syntax tree? Well, the answer is that we can. Remember, the syntax tree is used to prove that code conforms to the language being compiled/interpreted. We cannot bypass the process of checking the code against the grammar (I shudder to think what we could end up with ... but it would not be pretty). What we can do, is to produce the Abstract Syntax Tree immediately after recording the Syntax Tree and then throwing away the Syntax Tree. Let's say we have a function that parses expression, and produces a Syntax Tree structure. Immediately upon return from that function, we will have a pointer to that part of the syntax tree. That is passed into our abstract_syntax_tree routine which builds the AST and returns a pointer to it. That in turn is recorded with the statement, and the syntax tree pointer is deleted.
In fact, we won't even stop there, as the next section will discuss, how we take statements and expressions, check their conformity to the grammar, produce an intermediate syntax and abstract syntax tree, and end up with pseudo-code effectively all in the same parse of the source tokens.
Expression Parsing and Function Calls
Part of the power behind any high-level programming language is the ability to write expressions without having to worry too much about intermediate results. If this were not possible, statements such as
IF LEFT$(input$, 3) + "..." + RIGHT$(input$, 3) <> "ABC..XYZ" THEN ... END IF
would have to be written as
temp$ = LEFT$(input$, 3) temp$ = temp$ + "..." temp$ = RIGHT$(input$, 3) IF temp$ <> "ABC" THEN ... END IF temp$ = ""
(It is worthy of note at this point, I actually came across a programmer who actually wrote C code just like that because he thought it was more efficient ROFL!).
The problem with the compiler/interpreter handling intermediate results is not actually the issue here. The issue is the function calls, or more importantly the handling of function call parameters. If you study the Basic grammar I have provided, you will notice that the part that handles comma separated expressions is right at the end of the presedence food chain:
expr: assign-expr expr , assign-expr
Function calls themselves are handled by the
postfix-expr: primary-expr postfix-expr ( expr )
aspect of the grammar.
Now I have stated earlier in this section that an expression such as '1 + 2 * 3' produces an AST of the type shown in diagram 2 above. For a function call (say LEFT$(input$, 3) we would end up with an AST segment as shown in diagram 3 below.
If we assume that when we are reading these diagrams, we follow the right-hand side of the structure first, we can see from diagram 2 that '2 * 3' has to be performed first and that 1 + (the result of 2 * 3) comes afterwards, not 1 + 2, then (the result of 1 + 2) * 3. For parameters to function calls however, this may be an issue because sometimes functions require their parameters read left-to-right whilst other function calls are read right-to-left.
This will not be an issue for the Basic interpreter as we have restricted ourselves to internal function calls, so whilst it does not matter whether we choose left to right or right to left, I have made an executive decision that our parameters are read right-to-left.
(In the next section dealing with pseudo-code generation, parameter stacking is then achieved such that right-hand parameters are stacked before left hand parameters. This means that for LEFT$(input$, 3) the value of input$ is pushed onto the stack after the 3 meaning that input$ is at the top of the stack, and 3 is at the bottom.)
Writing An Interpreter - Part IIID -->