Page 1 of 1

C++ BEGINNER PARSER CLASS TUTORIAL C++ BEGINNER PARSER CLASS TUTORIAL Rate Topic: -----

#1 Elcric  Icon User is offline

  • D.I.C Regular
  • member icon

Reputation: 102
  • View blog
  • Posts: 453
  • Joined: 02-May 09

Posted 18 June 2009 - 05:23 PM

C++ BEGINNER PARSER CLASS TUTORIAL



WHAT YOU WILL LEARN IN THIS TUTORIAL

1. You will learn beginner parser vocabulary.

2. You will learn how to parse an expression.


• I. ASSUMPTIONS

I assume you have never programmed a parser.
I assume you have a C++ compiler.

• II. INTRODUCTION

I love to solve math problems. Many years ago I could solve them in my head, as time went by I had to use paper and pencil. When computers came along computers just did not feel right. Computers would not let me use the logic and math symbols I loved. My equations had no elegance no beauty they did not sing to me. I dreamed of the freedom of my paper and pencil but by now I could not live without the speed of the computer. Therefore, my favorite project became the parser. I have high hopes for the parser. I have not found the right match of keyboard keys and parser classes to fulfill my dreams but I have not given up working on it.

The field of C++ parser programs is an untapped computer science goldmine. Cleverly programmed parsers have tremendous potential in all fields of science. For example, using a chemical engineering C++ parser class, chemists could type chemical engineering equations, containing all of the symbols of their trade, which would model nature and be solved faster than they could be typed. Each field of science, with its own C++ parser class, would be relieved of the tedious computation and programming side of their work and have more time for creative endeavors. C++ does not provide an expression parser class. However, as you will see, they are very easy to program.

• III. BEGINNER PARSER VOCABULARY

What is expression parsing? Expression parsing works according to the rules of algebra. Expressions are composed of numbers; the operators +, -, /, *, ^, %, =, parentheses; and variables. The ^ operator represents exponentiation. All variables are non case sensitive single letters.

What are the rules of precedence? Assume the following precedence for each operator: + - (unary); ^; * / %; + -; =. Operators of equal precedence evaluate from left to right.

What is a token? In order to evaluate expressions, you need to be able to break the expression into its components. Each component of an expression is called a token. Each token represents an indivisible unit of the expression.

• IV. PARSING AN Expression

If we assume expressions can only use +, -, *, / and parentheses, all expressions can be defined with the following rules:

Expression > term [ + term] [ - term]

Term > factor [ * factor] [ / factor]

Factor > variable, number, or (expression)

The square brackets designate an optional element and the > means produces. In fact the rules are usually called production rules of the expression. Therefore, you could say “Term produces factor times factor divided by factor” for the definition of term.

The expression 12 + 7 / D has two terms: 12, and 7/D. The second term contains two factors: 7, and D. These factors consist of one number and one variable.

The expression 15 * (8 + D) has two factors: 15, and (8 + D). The factors consist on one number and one parenthesized expression. The parenthesized expression contains two terms: one number and one variable.

This process forms the basis for a parser, which is a set of mutually recursive functions that work in a chainlike fashion and implement the production rules. At each appropriate step, the parser performs the specified operations in the algebraically correct sequence.

• V. PROGRAM ANALYSIS

This tutorial will be limited to numeric expressions such as (2+3)*3.

To keep the beginner concepts easy to understand, the attached parser class will parse and evaluate floating-point expressions of type double that consist only of constant values. Please copy and paste the attached program into a file your compiler can read. Then build, compile, and debug the program. Following along, back and forth, between the tutorial and your copy of the program will help you understand the parser functionality.

//************************************
// C++ BEGINNER PARSER CLASS TUTORIAL
//************************************
#include <iostream>
#include <cstdlib>
#include <cctype>
#include <cstring>
using namespace std;

enum types { DELIMITER = 1, VARIABLE, NUMBER};	//DEL = 1; VAR = 2; NUM = 3.

class analyzer
{
	private:
	  char *pExpresion;		// Points to the expression.
	  char token[80];		// Holds current token.
	  char tokenType;		// Holds token's type.

	  void recursive1(double &culmination);
	  void recursive2(double &culmination);
	  void recursive3(double &culmination);
	  void recursive4(double &culmination);
	  void recursive5(double &culmination);
	  void bomb(double &culmination);
	  void nextToken();
	  void serror(int error);
	  int isdelim(char c);

	public:
	  analyzer();					   // analyzer constructor.
	  double sniff(char *exp);			// analyzer entry point.
	  ~analyzer();						// analyzer destructor.
};

//************************************
// 13 analyzer constructor.
//************************************
analyzer::analyzer()
{
	cout << endl << endl << "  13 Constructor called." << endl << endl;

	pExpresion = NULL;
  
}

//************************************
// 14 analyzer entry point.
//************************************
double analyzer::sniff(char *exp)
{

	cout << endl << endl << "  14 analyzer entry point called." << endl << endl;

	double culmination;

	pExpresion = exp;

	nextToken();
	if(!*token)
	{
	serror(2);						// No expression present.
	return 0.0;
	}
	recursive1(culmination);
	if(*token) serror(0);			// Last token must be null.
	return culmination;
}

//************************************
// 4 Add or subtract two terms.
//************************************
void analyzer::recursive1(double &culmination)
{

	cout << endl << endl << "  4 Add or subtract two terms called." << endl << endl;

	register char beer;
	double bucket;

	recursive2(culmination);
	while((beer = *token) == '+' || beer == '-')
	{
		nextToken();
		recursive2(bucket);
			switch(beer)
		{
		  case '-':
			culmination = culmination - bucket;
			break;
		  case '+':
			culmination = culmination + bucket;
			break;
		}
	}
}

//************************************
// 5 Multiply or divide two factors.
//************************************
void analyzer::recursive2(double &culmination)
{

	cout << endl << endl << "  5 Multiply or divide two factors called." << endl << endl;

	register char beer;
	double bucket;

	recursive3(culmination);
	while((beer = *token) == '*' || beer == '/' || beer == '%')
	{
		nextToken();
		recursive3(bucket);
		switch(beer)
		{
		  case '*':
			culmination = culmination * bucket;
			break;
		  case '/':
			culmination = culmination / bucket;
			break;
		  case '%':
			culmination = (int) culmination % (int) bucket;
			break;
		}
	}
}

//************************************
// 6 Process an exponent.
//************************************
void analyzer::recursive3(double &culmination)
{

	cout << endl << endl << "  6 Process an exponent called." << endl << endl;


	double bucket, ex;
	register int t;

	recursive4(culmination);
	if(*token == '^')
	{
		nextToken();
		recursive3(bucket);
		ex = culmination;
		if(bucket == 0.0)
		{
			culmination = 1.0;
			return;
		}
		for(t=(int)bucket-1; t>0; --t) culmination = culmination * (double)ex;
	}
}

//************************************
// 7 Evaluate a unary + or -.
//************************************
void analyzer::recursive4(double &culmination)
{

	cout << endl << endl << "  7 Evaluate a unary + or - called." << endl << endl;

	register char  beer;

	beer = 0;
	if((tokenType == DELIMITER) && *token == '+' || *token == '-')
	{
		beer = *token;
		nextToken();
	}
	recursive5(culmination);
	if(beer == '-') culmination = -culmination;
}

//************************************
// 8 Process a parenthesized expression.
//************************************
void analyzer::recursive5(double &culmination)
{

	cout << endl << endl << "  8 Process a parenthesized expression called." << endl << endl;


	if((*token == '('))
	{
		nextToken();
		recursive1(culmination);
		if(*token != ')')
			serror(1);
		nextToken();
	}
	else bomb(culmination);
}

//************************************
// 9 Get the value of a number.
//************************************
void analyzer::bomb(double &culmination)
{

	cout << endl << endl << "  9 Get the value of a number called." << endl << endl;

  switch(tokenType)
  {
	case NUMBER:
	  culmination = atof(token);
	  nextToken();
	  return;
	default:
	  serror(0);
  }
}

//************************************
// 11 Display a syntax error.
//************************************
void analyzer::serror(int error)
{

	cout << endl << endl << "  11 Display a syntax error called." << endl << endl;

	static char *e[]= 
	{
		"Syntax Error",
		"Unbalanced Parentheses",
		"No expression Present"
	};

	cout << e[error] << endl;
}

//************************************
// 10 Obtain the next token.
//************************************
void analyzer::nextToken()
{

	cout << endl << endl << "  10 Obtain the next token called." << endl << endl;

	register char *bucket;

	tokenType = 0;
	bucket = token;
	*bucket = '\0';

	if(!*pExpresion) return; // At end of expression.

	while(isspace(*pExpresion)) ++pExpresion; // Skip over white space.

	if(strchr("+-*/%^=()", *pExpresion))
	{
		tokenType = DELIMITER;
		// Advance to next char.
		*bucket++ = *pExpresion++;
	}
	else if(isalpha(*pExpresion))
	{
		while(!isdelim(*pExpresion)) *bucket++ = *pExpresion++;
		tokenType = VARIABLE;
	}
	else if(isdigit(*pExpresion))
	{
		while(!isdelim(*pExpresion)) *bucket++ = *pExpresion++;
		tokenType = NUMBER;
	}

	*bucket = '\0';
}

//************************************
// 12 Return true if c is a delimiter.
//************************************
int analyzer::isdelim(char c)
{

	cout << endl << endl << "  12 Return true if c is a delimiter called." << endl << endl;

	if(strchr(" +-/*%^=()", c) || c == 9 || c == '\r' || c == 0)
		return 1;
	return 0;
}

//************************************
// 15 analyzer destructor.
//************************************
analyzer::~analyzer()
{
	cout << endl << endl << "  15 Destructor called." << endl << endl;
	  
}

//************************************
// Main function
//************************************
int main(int argc, char* argv[])
{
  char expstr[80];
  
  analyzer equation; // Instantiates equation as an object of type analyzer.

  for(;;)	// Creates an infinite loop.
  {
	cout << endl << endl << endl;
	cout << "  To stop the program" << endl << endl;
	cout << "  enter a period then press return." << endl << endl;
	cout << "  To run the program" << endl << endl;
	cout << "  enter an algebraic express; for example, (12+5)*6 " << endl << endl;
	cout << "  then press return ==> ";

	cin.getline(expstr, 79);

	if(*expstr == '.')
	{
		equation.~analyzer();

		break;		// Infinite loop exited by typing a period.
	}
	
	cout << endl << endl << "  Answer is: " << equation.sniff(expstr) << "\n\n";

	cout << endl << endl << endl;
	cout << "  ";
	system("PAUSE");

	system("CLS");

  };

	cout << endl << endl << endl;
	cout << "  ";
	system("PAUSE");

	return 0;
}

//************************************





The program has a lot of embedded prints to create a console printed audit trail showing when each member function is executed. This audit trail will scroll; therefore, you will have to scroll up to read them all. This approach of having your program print out to the screen when each member function is executed will help you tremendously in learning C++ classes.

The attached sample program consists of the following fifteen member functions.

1. char *pExpresion; private member function points to the first character in the expression string. The parser evaluates expressions that are contained in null-terminated standard ASCII strings pointed to by pExpresion. As the parser executes, it works its way through the string until the null-terminator is encountered.

2. char token[80]; private member function holds the current token received from nextToken().

3. char tokenType; private member function holds the token's type received from nextToken(). To keep the tutorial at a beginner level easy to understand the program uses three types: variable, number and delimiter.

4. void recursive1(double &culmination); private member function adds or subtracts two terms. This function is called by sniff(). This function calls recursive2().

5. void recursive2(double &culmination); private member function will multiply or divide two factors. This function is called by recursive1(). This function calls recursive3().

6. void recursive3(double &culmination); private member function will process an exponent. This function is called by recursive2(). This function calls recursive4().

7. void recursive4(double &culmination); private member function evaluates a unary + or -. This function is called by recursive3(). This function calls recursive5().

8. void recursive5(double &culmination); private member function will process a parenthesized expression. This function is called by recursive4(). recursive1() thru recursive5() are mutually recursive functions that work in a chainlike fashion and implement the production rules. Recursive5() either recursively calls recursive1() as in the case of a parenthesized expression or calls bomb() to find the value of a number.

9. void bomb(double &culmination); private member function will get the value of a number. This function is called by recursive5(). This function calls sniff() and another token is retrieved.

10. void nextToken(); private member function will obtain the next token and put it into member variable token. It puts the type of token into the member variable tokenType. Each token is an individual unit of an expression. The function skips over spaces and tabs and detects the end of the expression.

If there are still more tokens to be retrieve from the expression, nextToken() first skips over any leading spaces. Once the spaces have been skipped, pExpression is pointing toward either a number, a variable, an operator, or if trailing spaces end the expression, a null. If the next character is an operator, it is returned as a string in token, and tokenType is assigned the value variable. If the next character is a digit, the entire number is read and placed in its string form in token and its type is number. Finally if the next character is none of the preceding, it is assumed that the end of the expression has been reached. In this case, token is null, which signals the end of the expression.

11. void serror(int error); private member function handles syntax errors in the expression and will display a syntax error when they occur.

12. int isdelim(char c); private member function is used to dissect the expression into its component parts and returns true if c is a delimiter.

13. analyzer(); public member function parser class constructor.

14. double sniff(char *exp); public member function parser entry point, it gets the first token. If the token is null, the function prints the message, “No expression present.” and returns. If the first token it not null recursive1() is called.

15. ~analyzer(); public member function parser class destructor.

• VI. CONTACT ME

Continue to leave constructive comments and I will make up to date revisions to the tutorial based on your comments.

WHAT YOU HAVE LEARNED IN THIS TUTORIAL

1. You have learned beginner parser vocabulary.

2. You have learned how to parse an expression.

Is This A Good Question/Topic? 0
  • +

Page 1 of 1