Page 1 of 1

Creating a recursive descent parser, OOP style Rate Topic: -----

#1 ishkabible  Icon User is offline

  • spelling expret
  • member icon





Reputation: 1616
  • View blog
  • Posts: 5,707
  • Joined: 03-August 09

Posted 06 June 2011 - 11:59 AM

Object Orientated Recursive Decent Parsing

This tutorial will cover recursive decent parsing in C++ heavily using OO concepts. As an example we will be parsing and evaluating some simple expressions. This tutorial expects you have a basic understanding of OO, polymorphism, and dynamic memory allocation.

First off we need to define what our parser is going to do; we need to a construct a set of terminals and non-terminals to work with, in other words a “grammar.” Let’s look at the following grammar…

expr -> term ‘+’ term
expr -> term ‘-‘ term
exor -> term
term -> unary ‘*’ unary
term -> unary ‘/’ unary
term -> unary
unary -> ‘-‘ factor
unary -> ‘+’ factor
unary -> factor
factor -> '(' expr ')'
factor -> number

In this case a factor is either a number or an expression enclosed by parenthesis, a factor could be a symbol or string literal, as well. For the purpose of this tutorial we will just use numbers. Also we are using a fairly limited set of operators

Let’s get down to business. First we want to set things up a bit; we want to predefine our classes and create some building blocks.

Parser.h
#ifndef PARSER_H_INCLUDED
#define PARSER_H_INCLUDED

#include <istream>
#include <deque>
#include <exception>

//pre-declare are classes so they can be used in a non-discriminate order
class Production;
class Expr;
class Term;
class Unary;
class Number;
class Factor;
class ParseError;

/**
* the base class for all of our productions
*/
class Production {
public:
	virtual ~Production(); //dose nothing, if we don't need to override it we don't want to
	virtual double getValue()=0; //pure virtual, all children MUST implmnet this function
};

/**
* a simple class to handle parsing errors
*/
class ParseError : public std::exception {
	const char* what() const throw();
};


Parser.cpp
#include <istream>
#include <deque>
#include <locale>

#include "Parser.h"

/**
* static functions
*/

//protptype the functions so compiler wont complain
static void ignoreSpace(std::istream& in);
static char getChar(std::istream& in);

/**
* $1 - an input stream
* pre - none
* post - all spaces up to the next non space char are removed
*/
static void ignoreSpace(std::istream& in) {
	while(isspace(in.peek())) {
		in.get();
	}
}

/**
* $1 - an input stream
* pre - none
* post - all  preceding and following space is removed. 1 non-space char is removed
*/
static char getChar(std::istream& in) {
	ignoreSpace(in); //remove all preceding space
	char ret = in.get();
	ignoreSpace(in); //remove all following space
	return ret;
}

/**
* Production
*/

//dose nothing, if we don't need to override it we don't want to
Production::~Production() {}

///ParseError

const char* ParseError::what() const throw() {
	return "a parsing error occured";
}


Now that we have a basic setup, we are going to use the Production class as the base class for each one of our productions. Polymorphism with the Production class will come in shortly. Now the most basic production we have here is number. It’s a terminal production and can be implemented quite easily. We can take a look at that now.

parser.h
/**
* a class to parse a number from the input stream
*/
class Number : public Production {
private:
	double value;
public:
	Number(std::istream& in);
	///Override
	double getValue();
};


parser.cpp
/**
* Number
*/

Number::Number(std::istream& in) {
	ignoreSpace(in); //remove all preceding space
	in>>value;
	if(!in) {
		throw ParseError();
		return;
	}
	ignoreSpace(in); //remove all following space
}

///Override
double Number::getValue() {
	return value;
}



Now I can explain some of the setup in more detail. First, let’s take a look at the constructor; it is the parsing function for this production. This method of parsing will be used in all of our productions. The next thing to note is how simple it is to retrieve a value from the input stream: it’s just 1 line!! Now take a look at the if statement, this is to check to see if input was valid or if any other error occurred in reading the value from the input stream. If an error did occur then we throw a ParseError, which denotes an error occurred when parsing. The next thing to note is that we plan to use polymorphism in our input. In this way we can pass a stringstream or ifstream to our production and have uniform behavior without the fuss. Lastly take a look at the ignoreSpace calls, which is to remove any surrounding spaces from the token.

In the next function we implement the pure virtual method in Production: getValue. We do this by simply returning the value retrieved from the input stream.

The next production we need to tackle is the Factor production. It’s not the simplest production, but that’s ok, we can mange :).

parser.h
/**
* a class to parse a factor from the input stream
*/
class Factor : public Production {
private:
	Production* expr; //we need to use a Production pointer in order handle the dual behavoir of factors
public:
	Factor(std::istream& in);
	///Override
	~Factor();
	///Override
	double getValue();
};


parser.cpp
/**
* Factor
*/
Factor::Factor(std::istream& in) {
	ignoreSpace(in); //remove all preceding space
	if(in.peek() == '(') { //check to see if a paren was used
		in.get();
		expr = new Expr(in);
		ignoreSpace(in); //remove all following space
		if(in.peek() != ')') { //make sure the paren was matched
			throw ParseError();
		} else {
			in.get();
		}
	} else { //if there is no paren then we just have a number
		expr = new Number(in);
	}
}
///Override
Factor::~Factor() {
	delete expr;
}
///Override
double Factor::getValue() {
	return expr->getValue();
}


To begin with we have another call to ignoreSpace to remove the proceeding space. Next we check to see if the next value in the buffer is a left parenthesis. If it is then we take it out of the buffer and construct an Expr production and match the closing parenthesis. If there was not a parenthesis, then we simply construct Number. I told you polymorphism was just around the corner ;).

In this class we have a destructor. We simply delete the expression, and because the destructor of expr is virtual, the correct destructor will be called to delete it.

In our getValue function we call the virtual method getValue to either retrieve a value from an expression if we constructed an expression or retrieve a value from a number if we constructed a number.

We only have 3 more classes to go!

Next up is the Unary production, which is pretty easy.

parser.h
/**
* a class to parse unary operators and operands from an input stream
*/
class Unary : public Production {
private:
	int sign;
	Factor* value;
public:
	Unary(std::istream& in);
	///Override
	~Unary();
	///Override
	double getValue();
};


parser.cpp
/**
* Unary
*/

Unary::Unary(std::istream& in) {
	sign = 1; //initlize sign to 1
	ignoreSpace(in); //get rid of any preceding space
	while(in.peek() == '-' || in.peek() == '+') { //while we have an operator to parse
		if(getChar(in) == '-') { //if the operator actully has an effect
			sign = -sign;
		}
	}
	value = new Factor(in); //parse the factor following the unary operators
}
///Override
Unary::~Unary() {
	delete value;
}
///Override
double Unary::getValue() {
	return sign * value->getValue();
}


First we initialize sign to 1 to store the information regarding the sign of the operator. Next we take out the preceding space (do you need the term here). Then we have a loop that peeks in the buffer to see if the next value is an operator. So long as we still have an operator to parse, we take it out of the buffer, then change sign accordingly. After there are no more corresponding operators, we parse a factor out of the buffer.

Our destructor is simple, we just delete value and we are done!

The getValue method is simple as well; we take the value returned by value’s getValue method and times it by sign to get the correct value.

Just 2 more to go and they’re almost identical, too! The Term class is up next.

parser.h
/**
* a class to parse binary operators(* and /) and operands from the input stream
*/
class Term : public Production {
private:
	std::deque<Unary*> values;
	std::deque<char> ops;
public:
	Term(std::istream& in);
	///Override
	~Term();
	///Override
	double getValue();
};



parser.cpp
/**
* Term
*/

Term::Term(std::istream& in) {
	values.push_back(new Unary(in)); //construct the first value
	ignoreSpace(in); //ignore preceding space
	while(in.peek() == '*' || in.peek() == '/') {
		ops.push_back(getChar(in)); //push back the operator
		values.push_back(new Unary(in)); //push back the left operand
	}
}
///Override
Term::~Term() {
	for(unsigned int i=0;i<values.size();++i) {
		delete values[i];
	}
}
///Override
double Term::getValue() {
	double ret = values[0]->getValue(); //get the first value
	for(unsigned int i=1;i<values.size();++i) { //loop though the rest of the values
		if(ops[i-1] == '*') { //check to see which operator it is and preform the acoridng action
			ret *= values[i]->getValue();
		} else {
			ret /= values[i]->getValue();
		}
	}
	return ret;
}


The constructor here is almost identical to the unary constructor, except for 2 things. First off, before we take an operator out of the buffer we take a Unary production out. Then we start the loop to take the operators out. Secondly, after we take the operator out of the input stream, we store it and take out another Unary production. Both operators and Unary productions are stored in arrays such that their order is maintained.

In the destructor, we loop though each element of values and delete them as we go.

The getValue method is a bit more complicated, but still reasonably simple. To start, we store the first value in ret. Next we loop though the reaming values and perform the corresponding operation on ret. To find the correct operation, we take the index value i and subtract 1 from it to get the correct index in ops.

Where almost done, all we have left is the expr production. The nice thing about this one is that all the logic from Term is exactly the same. The reason we have separate classes to handle each production is because addition operators have a lower precedence than multiplication operators do. The only difference between the classes is that instead of using the * and / operators we use + and -.

parser.h
/**
* a class to parse binary operators(+ and -) and operands from the input stream, acts as the full expreison parser becuase + and - have the lowest precdence.
* this code works just like Term only it uses the + and - operators instead.
*/
class Expr : public Production {
private:
	std::deque<Term*> values;
	std::deque<char> ops;
public:
	Expr(std::istream& in);
	///Override
	~Expr();
	///Override
	double getValue();
};

#endif // PARSER_H_INCLUDED



note: this also contains the end of the #endif to close off the header guards in parser.h

parser.cpp
/**
* Expr
*/

Expr::Expr(std::istream& in) {
	ignoreSpace(in);
	values.push_back(new Term(in));
	while(in.peek() == '+' || in.peek() == '-') {
		ops.push_back(getChar(in));
		values.push_back(new Term(in));
	}
}
///Override
Expr::~Expr() {
	for(unsigned int i=0;i<values.size();++i) {
		delete values[i];
	}
}
///Override
double Expr::getValue() {
	double ret = values[0]->getValue();
	for(unsigned int i=1;i<values.size();++i) {
		if(ops[i-1] == '+') {
			ret += values[i]->getValue();
		} else {
			ret -= values[i]->getValue();
		}
	}
	return ret;
}


Also let’s look at some test code to see how it all works.

#include <iostream>
#include <sstream>
#include <exception>

#include "Parser.h"

int main() {
	try {
		std::stringstream ss(" (- 3 +  5 ) * - 4 "); //a simple expresion to test
		Expr expr(ss); //construct the parse tree
		std::cout<<expr.getValue()<<std::endl;
	} catch (std::exception& e) {
		std::cerr<<e.what()<<std::endl;
	}
	try {
		std::stringstream ss(" - 3 +  5 * - 4 "); //a simple expresion to test
		Expr expr(ss); //construct the parse tree
		std::cout<<expr.getValue()<<std::endl;
	} catch (std::exception& e) {
		std::cerr<<e.what()<<std::endl;
	}
	try {
		std::stringstream ss(" 3 +  5 * 4  "); //a simple expresion to test
		Expr expr(ss); //construct the parse tree
		std::cout<<expr.getValue()<<std::endl;
	} catch (std::exception& e) {
		std::cerr<<e.what()<<std::endl;
	}
	try {
		std::stringstream ss(" (- 3 +  5  * - 4 "); //a simple expresion to test with an error
		Expr expr(ss); //construct the parse tree
		std::cout<<expr.getValue()<<std::endl;
	} catch (std::exception& e) {
		std::cerr<<e.what()<<std::endl;
	}
	try {
		std::stringstream ss(" (- a +  5  * - 4 "); //a simple expresion to test with an error
		Expr expr(ss); //construct the parse tree
		std::cout<<expr.getValue()<<std::endl;
	} catch (std::exception& e) {
		std::cerr<<e.what()<<std::endl;
	}
    return 0;
}



this should show the correct output for the first 3 and report that there was an error in the last 2.

That’s all! I hope you enjoyed it :)

Is This A Good Question/Topic? 3
  • +

Page 1 of 1