Recall last post: [link]
Let's think about tokens.
The distinction is important on INT and REAL because we want integer exponents.
Let's consider some examples:
You get the idea.
We'll have an object which can return the next token in the stream provided.
An example run might be:
Given input is "9x^4 + 14.5"
Seems to work fine.
So, the idea at play here is finding the longest matching substring. Our concern is with this portion of the grammar:
I may not have mentioned this before, but the * is kleene-star, which is matching zero or more occurrences. For more information: https://en.wikipedia...iki/Kleene_star
We get append next character to answer until next is not DIGIT or DOT. If DOT was encountered, token is REAL, and append next character until next is not DIGIT. return (token_type, lexeme).
Alright. It's time to move into some code:
Here's a sample run:
So, now that input is handled, we can start to tokenize. We'll apply the logic to a Lexer class now:
Here are some sample outputs:
So, that seems to work. We'll move onto parsing next.
poly -> term | term + poly | term - poly term -> coef | coef x | coef x ^ DIGIT DIGIT* coef -> INT | REAL INT -> - DIGIT DIGIT* | DIGIT DIGIT* REAL -> - DIGIT DIGIT* . DIGIT DIGIT* | DIGIT DIGIT* . DIGIT DIGIT* DIGIT-> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Let's think about tokens.
+ = PLUS - = MINUS ^ = POWER . = DOT x = X INT REAL EOF = end of file
The distinction is important on INT and REAL because we want integer exponents.
Let's consider some examples:
3x^4 = INT X POWER INT EOF 2.5x^12 = REAL X POWER INT EOF 8x + 5 = INT X PLUS INT EOF -2x^5 - 14 = MINUS INT X POWER INT MINUS INT EOF
You get the idea.
We'll have an object which can return the next token in the stream provided.
An example run might be:
$ poly.exe 9x^4 + 14.5
Given input is "9x^4 + 14.5"
get_token() reads 9 and token is INT. Then reads x, and there is no matching token, so put x back, and return INT. get_token() reads x and token is X. Then reads ^, and there is no matching token, so put ^ back, and return X. get_token() reads ^ and token is POWER. Then reads 4, and there is no matching token, so put 4 back, and return POWER. get_token() reads 4 and token is INT. Then reads +, and there is no matching token, so put + back and return INT. get_token() reads + and token is PLUS. Then reads 1, no match, put back, return PLUS. get_token() reads 1 and token is INT. Then reads 4 and token is INT. Then reads DOT and token is REAL. Then reads 5 and token is REAL. Then reads EOF, and no match so put EOF back and return REAL with value (lexeme) 14.5. get_token() reads EOF, and returns EOF. get_token() reads nothing. returns EOF. Final stream: INT X POWER INT PLUS REAL EOF
Seems to work fine.
So, the idea at play here is finding the longest matching substring. Our concern is with this portion of the grammar:
INT -> - DIGIT DIGIT* | DIGIT DIGIT* REAL -> - DIGIT DIGIT* . DIGIT DIGIT* | DIGIT DIGIT* . DIGIT DIGIT* DIGIT-> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
I may not have mentioned this before, but the * is kleene-star, which is matching zero or more occurrences. For more information: https://en.wikipedia...iki/Kleene_star
We get append next character to answer until next is not DIGIT or DOT. If DOT was encountered, token is REAL, and append next character until next is not DIGIT. return (token_type, lexeme).
Alright. It's time to move into some code:
class Input { private: string buffer; public: Input(string data) { cout << "data=" << data << endl; for(char c : data) { if(c != ' ') buffer += c; } } char pop() { if(buffer.length() == 0) return '\0'; char c = buffer[0]; buffer = buffer.substr(1); return c; } void push(char c) { buffer = c + buffer; } string get_buffer() { return buffer; } }; int main() { string line; getline(cin, line); Input input(line); cout << "input=" << input.get_buffer() << endl; while(input.get_buffer() != "") { cout << input.pop() << endl; } return 0; }
Here's a sample run:
17x^2 + 12x - 17 data=17x^2 + 12x - 17 input=17x^2+12x-17 1 7 x ^ 2 + 1 2 x - 1 7
So, now that input is handled, we can start to tokenize. We'll apply the logic to a Lexer class now:
#include <iostream> #include <cctype> using namespace std; enum TokenType { NONE, PLUS, MINUS, POWER, X, INT, REAL, END_OF_FILE }; struct Token { TokenType type; string lexeme; }; class Input { private: string buffer; public: Input(string data) { cout << "data=" << data << endl; for(char c : data) { if(c != ' ') buffer += c; } } char pop() { if(buffer.length() == 0) return '\0'; char c = buffer[0]; buffer = buffer.substr(1); return c; } void push(char c) { buffer = c + buffer; } string get_buffer() { return buffer; } }; class Lexer { private: Input *input; public: Lexer(string data) { input = new Input(data); } Token get_token() { Token token; token.type = NONE; token.lexeme = ""; char c = input->pop(); switch(c) { case '+': token.type = PLUS; token.lexeme = "+"; break; case '-': token.type = MINUS; token.lexeme = '-'; break; case '^': token.type = POWER; token.lexeme = '^'; break; case 'x': token.type = X; token.lexeme = "x"; break; case '\0': token.type = END_OF_FILE; token.lexeme = ""; break; default: if(isdigit(c) || c == '.') { token.type = INT; token.lexeme += c; bool has_dot = (c == '.'); token.type = has_dot ? REAL : INT; while(isdigit(c) || c == '.') { c = input->pop(); if(c == '.' && !has_dot) { token.type = REAL; token.lexeme += c; has_dot = true; } else if(c == '.') { cout << "ERROR" << endl; exit(1); } else { if(isdigit(c)) token.lexeme += c; } } input->push(c); } else { cout << "ERROR" << endl; exit(1); } break; } return token; } }; int main() { string names[] = { "NONE", "PLUS", "MINUS", "POWER", "X", "INT", "REAL", "EOF" }; string line; getline(cin, line); Lexer *lex = new Lexer(line); Token t; while((t = lex->get_token()).type != END_OF_FILE) cout << names[t.type] << ", " << t.lexeme << endl; return 0; }
Here are some sample outputs:
data=13x^5 + 17x^2 - 0.5 INT, 13 X, x POWER, ^ INT, 5 PLUS, + INT, 17 X, x POWER, ^ INT, 2 MINUS, - REAL, 0.5
data=-7x ^ 3+ -4x^2 + 2 MINUS, - INT, 7 X, x POWER, ^ INT, 3 PLUS, + MINUS, - INT, 4 X, x POWER, ^ INT, 2 PLUS, + INT, 2
data=3.14.15 ERROR
data=12y + 3 INT, 12 ERROR
So, that seems to work. We'll move onto parsing next.
0 Comments On This Entry
← April 2021 →
S | M | T | W | T | F | S |
---|---|---|---|---|---|---|
1 | 2 | 3 | ||||
4 | 5 | 6 | 7 | 8 | 9 | 10 |
11 | 12 | 13 | 14 | 15 | 16 | 17 |
18 | 19 | 20 | 21 | 22 | 23 | 24 |
25 | 26 | 27 | 28 | 29 | 30 |
Tags
My Blog Links
Recent Entries
Recent Comments
Search My Blog
2 user(s) viewing
2 Guests
0 member(s)
0 anonymous member(s)
0 member(s)
0 anonymous member(s)