Page 1 of 1

Beginning Flex

#1 erik.price

Posted 23 October 2009 - 07:11 PM

Basics of Flex

Well, here we go, my first tutorial. Any feedback would be wonderful.

Starting off: What is Flex?

Flex is a free implementation of lex, the Unix lexical analyzer generator.
The official flex manual describes it like this:

Quote

flex is a tool for generating scanners. A scanner is a program which recognizes lexical patterns in text. The flex program reads the given input files, or its standard input if no file names are given, for a description of a scanner to generate. The description is in the form of pairs of regular expressions and C code, called rules.


What does this all mean? Basically, it means, combined with the right tools (bison tutorial coming soon!), you can create your very own programming language, syntax analyzer, parser et cetera.

I learned flex from a book on the proprietary Unix tools, lex & yacc (published in 1992), so I've had to adopt some bizarre conventions. If you find another source which uses a different syntax or style, it's probably more correct than I am.

DO I HAVE FLEX INSTALLED ALREADY?
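If you are a Linux user, probably. If not, it should be in your distribution's package manager.
If you are a Mac user, probably (if unsure, type "flex --version" into a terminal and see what happens; plain "flex" with no arguments just sits there waiting for input).
If you are a Windows user, not quite.
You can try the download instructions on this page if you want, but I can't guarantee they'll work (I've never tried them).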

Make sure you also have a working C compiler. (I use gcc on both Mac and Linux. Windows hates me though, and broke my compiler.)

REGULAR EXPRESSIONS
The Horror!

Regular expressions: you're either magically gifted at them, or you're like me and tend to stare at things like '/("|\').*?\1/' and cry. (That one isn't valid in flex, by the way, but it's the same idea. I believe it's Perl.)
Honestly, I haven't had to use regular expressions that much while using flex, but it always helps to know some quick whitespace and unterminated-string regexes so that you can handle those cases appropriately.

So, let's say we want to use flex, combined with other tools, to create a simple command line calculator. What would we need regular expressions for?
Well, we would need to be able to find and identify:
  • Integers
  • Real (floating point) numbers
  • Operators
  • Possibly, variables
To do this, we need a basic knowledge of regexes and how to use them first.

.		Matches any character except '\n' (newline)
?		Matches zero or one copy of the preceding expression
*		Matches zero or more copies of the preceding expression
+		Matches one or more copies of the preceding expression
^		Matches the beginning of a line when it is the first
		character of the regular expression. ^"Hello" will match
		"Hello, my name is Erik" but not "My name is Erik. Hello!"
$		Like ^, except it matches the end of a line and must be
		the last character of the regular expression
|		Matches either the expression before it or the one after it (think of OR)
[]		Brackets require some explanation. They match any single
		character from the set of numbers, letters, or symbols inside, so
		[0123456789] matches any digit, 0 through 9. However,
		this is frustrating to type every time you want to match a digit,
		and just imagine trying to match a letter, so there is a shorthand:
		[0-9] is equivalent to the longer approach, and more
		convenient. You can do this with any range: [a-zA-Z] will match
		any upper or lowercase letter. Putting a "^" as the first character
		in brackets changes the meaning to "match any character EXCEPT these"



Above is a list of some useful and commonly used regular expression operators, and although these are the most common (at least to me), there are many others. (Note that the link isn't flex-specific; for flex regular expressions, go here.)
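To see a few of these operators in action, here is a small sketch of a complete flex file. It is not taken from the flex manual, and the output labels are made up purely for illustration. It assumes a reasonably recent flex: "%option noyywrap" is used so that the flex library isn't needed, and the file supplies its own main.

```lex
%{
#include <stdio.h>
%}

%option noyywrap

%%
^"Hello"        { printf("<Hello at line start>"); }
[a-zA-Z]+       { printf("<word: %s>", yytext); }
[^a-zA-Z\n]+    { printf("<not letters>"); }
\n              { printf("\n"); }
%%

int main(void)
{
    yylex();    /* scan standard input until end of file */
    return 0;
}
```

Feed it the two lines from the ^ example above: the first "Hello" should be tagged as a line-start match, while the second should come out as an ordinary word. When two rules match the same amount of text, flex picks the one listed first, which is why ^"Hello" wins over [a-zA-Z]+ at the start of a line.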

So, now that we've got the basics down (hopefully), we can start creating regular expressions to match what we need for our calculator.

First off, integers. We begin our expression with [0-9]+ to match any number with 1 or more digits. Simple enough? Alright, now let's add the ability to use negative numbers. The only options for the leading "-" are to have one or to have nothing at all in front, so we use "?". Our final regular expression for integers is -?[0-9]+

Reals can get tricky. If you really want to do it right, you would use -?(([0-9]+)|([0-9]*\.[0-9]+)([eE][+-]?[0-9]+)?) but by the time you can handle something like that in flex, you won't need some measly tutorial on regular expressions.
So, for now, I'll leave the rule for floating point numbers as [0-9]*\.[0-9]+
You should be able to understand all of this, except maybe the "\.", depending on your favorite programming language. This is an escape sequence: the backslash takes a character that normally has a special meaning and makes it literal. In this case, "." usually means "any character except a newline", and that's not really what we're looking for, so "\." means a literal ".".
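Here is a rough sketch of how the two patterns above could sit in a flex file for the calculator. The operator rule, the whitespace rule, the printed labels and the file name calc.l are all my own assumptions for illustration; a real calculator would hand tokens to a parser instead of printing them.

```lex
%{
#include <stdio.h>
%}

%option noyywrap

%%
-?[0-9]+          { printf("integer:  %s\n", yytext); }
[0-9]*\.[0-9]+    { printf("real:     %s\n", yytext); }
[-+/*]            { printf("operator: %s\n", yytext); }
[ \t\n]+          { /* skip whitespace */ }
.                 { printf("unknown:  %s\n", yytext); }
%%

int main(void)
{
    yylex();    /* scan standard input until end of file */
    return 0;
}
```

To try it, run "flex calc.l" and then compile the generated lex.yy.c with your C compiler. Given the input "42 3.14 +", it should report an integer, a real and an operator. flex always takes the longest match, which is why "3.14" becomes one real instead of the integer "3" followed by ".14". One quirk to be aware of: the simplified real pattern has no leading -?, so "-3.5" is split into the integer "-3" and the real ".5".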

All clear? Let's hope so, because now it's time to begin using flex!

BASIC FLEX SYNTAX

If you've ever used C or C++, you're at a definite advantage here. flex is based on C, and the code that it generates (which you later compile) is true C (but we'll get into that a little later).

Let's do the "hello world" of lexical analysis!

%%
.|\n		ECHO;
%%



This program isn't representative of a real flex source file because of its simplicity, but we need to start somewhere.
Let's break it down.

The part before the first "%%" is called the definitions section (it's empty in this example). Any C code here is copied verbatim to the top of the resulting C output, as long as it is surrounded by %{ and %}. More on that later.

The next section (starting with ".|\n") is called the rules section, and this is where we plug in all our regular expressions and behaviors for flex. Every rule is made up of two parts: a pattern and an action. The pattern is in essence the regular expression, and the action is what to do when that pattern is found. In this case, the pattern is ".|\n" (any character, or a newline) and the action is ECHO (which is a built-in macro. I'll let you guess what this does).

The last section (after the second %%) is the user code section. It is much like the first section in purpose, only code here doesn't have to be surrounded by %{ and %}, and it is copied to the end of the generated program.
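To make the three sections concrete, here is a sketch of the same echo program with every section filled in. The line counter and its name line_count are hypothetical additions of mine, there only to show C code in the first and last sections; "%option noyywrap" is assumed so that the flex library isn't required.

```lex
%{
/* Definitions section: C code between %{ and %} is copied
   to the top of the generated scanner. */
#include <stdio.h>

static int line_count = 0;   /* hypothetical counter, for illustration */
%}

%option noyywrap

%%
\n      { line_count++; ECHO; }
.       { ECHO; }
%%

/* User code section: copied verbatim to the end of the generated file. */
int main(void)
{
    yylex();                                     /* echo input until end of file */
    fprintf(stderr, "lines: %d\n", line_count);  /* report on stderr, not stdout */
    return 0;
}
```

It still echoes its input unchanged, but the rules now bump the counter on every newline, and main prints the total once yylex() returns at end of input.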


So now you know the basics of flex! If you want to keep exploring, unfortunately, the internet doesn't offer that many great tutorials, but I would recommend the book "Flex and Bison" (available from Amazon for under $20).
Hopefully, this tutorial will turn into a series on both flex and bison.

Feel free to PM me with any questions. :pirate:
