1 Replies - 16934 Views - Last Post: 09 February 2012 - 02:01 AM Rate Topic: -----

#1 Midwest Product   User is offline

  • D.I.C Head

Reputation: 2
  • View blog
  • Posts: 74
  • Joined: 05-February 10

jFlex Lexical Analyzer and Java

Posted 09 February 2012 - 12:38 AM

I just got my hands on jFlex and finally got it installed correctly, and I'm struggling to understand how to begin to use it. I haven't been given any instruction, but I have searched the internet and read the user manual, some blog pages, and even downloaded 3 or 4 powerpoint presentations from random Universities to try to understand how to create the .flex input file. I know that by doing that correctly, jFlex will create a Java parser for me. I understand how to create regular expressions for the parameters that I need to find, but I don't understand the syntax of the .flex examples I have seen! Here is the goal of my assignment:

I have been given a domain specific language (let's just call it a toy language) and a few tiny programs to parse. I need to parse those using the parser generated by jFlex to separate out the tokens, which doesn't sound hard (in fact, my teacher keeps telling me how easy this assignment it! I feel stupid. ) Here is one of the example programs that we need to parse:

1 manual "abc.res"; 

2 auto 192.168.225.100; 



var ROUND1 := 20;

var INTER1 := 0;

var SWIM := 0;

var TRANS1 :=0;

var ROUND2 := 105;

var INTER2 :=0;

var BIKE := 0;

var TRANS2 :=0;

var ROUND3 := 55;

var INTER3 := 0;

var RUN := 0;



mp[1] -> agnt[1] {
 
  (true) -> upd SWIM;
   
  (true) -> dec ROUND1;

}



mp[2] -> agnt[1] {
  
  (true) -> upd TRANS1;

}



mp[3] -> agnt[2] {

  (true) -> upd INTER2;
  
  (true) -> dec ROUND2;
  
  (ROUND2 == 0) -> upd BIKE;

}



mp[4] -> agnt[2] {
 
  (true) -> upd INTER3;
  
  (ROUND3 == 55) -> upd TRANS2;
  
  (true) -> dec ROUND3;
  
  (ROUND3 == 0) -> upd RUN;

}



We were also given a BNF for the toy language (I have no idea what a BNF is, but I assume it is some sort of grammar)

<program> ::= <agents> <decs> <mes_places>
<agents> ::= <agent> <agents> | epsilon
<agent> ::= int auto ip ; | int manual file ;
<decs> ::= <dec> <decs> | epsilon
<dec> ::= var id := int ;
<mes_places> ::= <mes_place> <mes_places> | <mes_place>
<mes_place> ::= mp [ int ] -> agnt [ int ] { <stmts> }
<stmts> ::= <stmt> ; <stmts> | <stmt> ;
<stmt> ::= dec id | upd id | id := <expr> | ( <lexpr> ) -> <stmt>
<expr> ::= int | id
<lexpr> ::= true | false | <expr> == <expr> | <expr> != <expr>

From this, we are supposed to know what kind of tokens we are looking for.

Here is my problem...I have absolutely no clue how to construct a .flex file to feed into jFlex. Can anyone offer some guidance?

Is This A Good Question/Topic? 0
  • +

Replies To: jFlex Lexical Analyzer and Java

#2 blackcompe   User is offline

  • D.I.C Lover
  • member icon

Reputation: 1159
  • View blog
  • Posts: 2,547
  • Joined: 05-May 05

Re: jFlex Lexical Analyzer and Java

Posted 09 February 2012 - 02:01 AM

*
POPULAR

Quote

I know that by doing that correctly, jFlex will create a Java parser for me.


JFlex is a lexer generator, not a compiler generator. CUP is a compiler generator, but you must have a lexer (generated by JFlex or JLex) to interface with it.

Quote

I need to parse those using the parser generated by jFlex to separate out the tokens


Lexing is separating input into tokens. Parsing is creating a tree structure that represents the syntactical structure of the input.

Quote

We were also given a BNF for the toy language (I have no idea what a BNF is, but I assume it is some sort of grammar)


Be thankful, the grammar is the hardest part! All you really need to do is write the rules in a parser specification, which is pretty easy when you look at the examples. Like I said you need a parser generator first, and you need a lexer.

Quote

I have absolutely no clue how to construct a .flex file to feed into jFlex


I took a look at the user manual. It's pretty straight forward. Did you look at it?

If you decide to use CUP, which I recommend, you need to use %cup in the lexer spec to enable compatibility, or else the symbols you define won't mean anything to CUP. Does your assignment mention anything about using CUP?

Here's a few examples of flex actions (in accordance with the example in the manual) for the keywords in your language:

<YYINITIAL> "var"  { return symbol(sym.VAR); }
<YYINITIAL> "auto"  { return symbol(sym.AUTO); }


Here's a macro and action for an integer:

Integer = [1-9][0-9]*
<YYINITIAL> { 
 
     {Integer} { return symbol(sym.INT); }
}


It's necessary for me to have your assignment spec to know all the aspects of this language. But I can tell you need to define an 'int' token by looking at your BNF. I can also tell you have a 'mp [ int ]' construct.

<mes_place> ::= mp [ int ] -> agnt [ int ] { <stmts> }


In that BNF, 'mp' is a token (or terminal) right? There's no non-terminal rule for it, so it must be. The next token is '['. Then 'int' is a token which we defined above already. Then you have ']', '->', 'agnt', and so on. So you need your lexer to match those. The actions are:

<YYINITIAL> "mp"  { return symbol(sym.MP); }
<YYINITIAL> "["  { return symbol(sym.LBRACKET); }
<YYINITIAL> "->"  { return symbol(sym.ARROW); }
.
.
.


That's pretty much explains how to do the lexer. The manual gives a good example of defining an Identifier (or in your case id) token.

Now for the parser. Have a look at the manual.

If you'll notice above, I call those symbol methods. In the JFlex examples, those methods end up returning a Symbol. That class is java_cup.runtime.Symbol, which is a class defined in the CUP implementation. That's why I said to turn the %cup on and that's why you see the import java_cup.runtime.*; statement in the example. JFlex and CUP were made to work together.

Once you got a working lexer, you need to write the parser spec. In the beginning of the parser spec example they have:

terminal       SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD;



That specifies the tokens (or Symbols) that the parser should expect to receive from the lexer. You need to specify the tokens you've created:

terminal           VAR, AUTO, INT, MP;



Then you need to turn the BNF into CUP specific grammar. If you'll look at the grammar for an addition expression, you see that an instance of Integer is returned.

expr      ::= expr:e1 PLUS expr:e2    
	      {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} 



Basically the parser will build up a parse tree of objects that you define and return the root to you. You can get that Integer by saying Symbol.value. The docs should say that. The docs suck IMO.

In the manual they didn't do a good job of showing you how to build a parse tree of custom classes. They simply made a tree with exactly one Integer instance since it's a parser for simple arithmetic expressions. If your parsing a language and you want a parse tree that's capable of generating some assembler you have to create your own classes to represent your language constructs. E.g., you might represent an assignment statement with the following class. CUP will put these classes in your parse tree if you return them in your grammar actions.

class AssignStmt {

     StringLiteral variableId;
     IntLiteral variableVal;

     AssignStmt(StringLiteral variableId, IntLiteral variableVal) {
            this.variableId = variableId;
            this.variableVal = variableVal;
     }
}

class IntLiteral {
    int val;
    IntLiteral(int val){this.val = val;}
}

class StrLiteral {
    String val;
    StrLiteral(String val){this.val = val;}
}



For the non-terminal 'stmt' (a statement) in your BNF:

<stmt> ::= dec id | upd id | id := <expr> | ( <lexpr> ) -> <stmt> 
<expr> ::= int | id



the spec would be the following for an assignment statement:

<stmt> ::= id := <expr>


/* terminal */

terminal Integer   INT;
terminal String    ID;
terminal           SEMICOLON;
terminal           ASSIGN


/* non-terminals */

non-terminal      stmt;
non-terminal      expr;

/*grammar */

stmt   ::= expr:e1 SEMICOLON ASSIGN expr:e2 
          {: RESULT = new AssignStmt(new StringLiteral(e1.stringValue()), new IntLiteral(e2.intValue())):}

expr   ::= INT:n {: RESULT = n; :} 
          |ID:n {: RESULT = n; :}

 



And, that's pretty much it. That's a comprehensive introduction to creating your parser.

In your code, create the parser as follows:

main() {
     Parser p = new Parser(new Lexer()); 
     Symbol tree = p.parse();
     recursiveWalk(tree);
}



Lexer is the class JFlex generated. Parser is the parser that CUP generates. Make sure you put the CUP runtime and your lexer on the classpath.

Start small. Try successfully lexing a few source code statements first. Same with the parser stuff. If your just getting this assignment, I definitely wouldn't wait to work on this. It can take some time to read through the parser docs and understand what's happening.

In case your interested I have a tutorial on the theoretical aspects of lexical analysis, but I don't think it will help much in using lexer generators.

Good luck! :tup:

This post has been edited by blackcompe: 09 February 2012 - 08:13 AM

Was This Post Helpful? 11
  • +
  • -

Page 1 of 1