
Boost::Spirit Parser

Someone was asking how to parse ini files, which led me to a Boost::Spirit ini parser, which in turn got me interested in learning about Boost::Spirit.

Boost::Spirit is a lexer/parser/translator library based on C++ Template Metaprogramming (see previous post) that lets you define a grammar inline in C++ code. This is in contrast to tools like LEX/YACC, which generate C/C++ source files but require you to define the grammar in a separate language. Spirit uses a format similar to Extended Backus Naur Form (EBNF) to define grammars and builds recursive descent parsers at compile time along with the rest of your source.

As it turns out the newest version of Boost::Spirit (version 2.3) has some pretty good documentation! So I found myself intrigued by how easy it looked and so I jumped in!

I had my troubles. For example, I tried to use C++0x lambdas rather than Boost::Phoenix, because my compiler was conflicted between boost::lambda::ref and std::tr1::ref. However, my attempt to use C++0x lambdas failed, so I fought my way through with Boost::Phoenix and in the end got things working with relative ease.

So here is my first Boost::Spirit parser, designed to parse name-value pairs. It borrows heavily from the examples in the tutorial part of the documentation, so you might think of it as just one more extension of those examples (specifically this one).

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <map>
#include <string>
#include <vector>

namespace client
{
    namespace qi = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;
    namespace phoenix = boost::phoenix;
    
    template <typename Iterator>
    bool parse_nameValuePair(Iterator first, Iterator last, std::map<std::string, std::string>& map)
    {
        using boost::spirit::qi::_1;    
        using boost::spirit::qi::_2;    
        using qi::lexeme;
        using ascii::char_;
        using qi::phrase_parse;
        using ascii::space;
        using phoenix::ref;
        
        std::vector<char> c1;
        std::vector<char> c2;
        
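        bool r = phrase_parse(
            first,                          // start of the input
            last,                           // end of the input
            (+(char_ - '='))[phoenix::ref(c1) = _1 ]            // name: every char before '='
                >> '=' >> *space 
                >> lexeme[(+char_)[phoenix::ref(c2) = _1 ]],    // value: rest of the input, spaces kept
            space                           // skipper: whitespace to ignore between tokens
        );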
        
        
        if (!r || first != last) {
            return false;
        }
        map.insert(std::pair<std::string, std::string>(std::string(c1.begin(), c1.end()), std::string(c2.begin(), c2.end())));
        return r;
    }
}

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<string> names;
    names.push_back("Name1 = value1");
    names.push_back("Number = 12343");
    names.push_back("Ignore Some Whitespace = 0x543323 = 5518115");
    names.push_back("LongString = This is a long string value");

    map<string, string> nvps;

    for(auto iter = names.begin(); iter != names.end(); ++iter) {
        client::parse_nameValuePair(iter->begin(), iter->end(), nvps);
    }


    for(auto iter = nvps.begin(); iter != nvps.end(); ++iter) {
        cout << "\"" << iter->first << "\" = \"" << iter->second << "\"" << endl;
    }

    return 0;
}



The "parser" itself is composed completely of lines 32 - 34 of the listing: the three-line expression passed to phrase_parse. (neat eh!)

First let's look at just the bit of code that "recognizes" our expected structure of "Name = value":

Here I have taken out all of the "semantic actions" which are used to extract the data, leaving just the ability to "recognize" or "match" a Name-Value pair.

(+(char_ - '=')) >> '=' >> *space >> lexeme[(+char_)]


The overall structure is a sequence:

a >> '=' >> *space >> b


Here a is (+(char_ - '=')), a sequence of chars not containing '=':
char_ -- a parser to match a char.
A - B -- matches A unless B matches: char_ - '=' -- match a char but not '='
+A -- one or more of A, so +(char_ - '=') -- match a sequence of chars not including '='
The outer parens are just there to group that as one parser.

and b is lexeme[(+char_)], a sequence of chars in which whitespace is kept:
lexeme[] will stop the "skip" defined by the outer phrase_parse. In my program I used boost::spirit::ascii::space to skip all whitespace, but when I read the value I don't want to skip whitespace, so I used lexeme[].
char_ will match a char and +char_ will match a sequence of one or more chars...

I then used "semantic actions" to capture what the parser found.

When the parser finishes recognizing something it will call the associated "semantic action" (if one exists), which is one way to tell the parser what to do with the data. In this case I just want to save the data in an externally scoped variable (c1 or c2). A repetition such as +A produces a std::vector of A's attribute type, so my char sequences matched above become std::vector<char>, which I captured using a Boost::Phoenix lambda:

(+(char_ - '='))[phoenix::ref(c1) = _1 ] -- capture the sequence of chars and put it in c1;
(+char_)[phoenix::ref(c2) = _1 ] -- capture the sequence and put it in c2;

Then at the bottom of the function, if there was a successful match I will convert these vectors to strings and put them in a std::pair<std::string, std::string> that is then added to a std::map.

Getting up and running was not nearly as hard as the time I have spent with LEX/YACC or other parser generators -- and that includes all of the drama about trying to use C++0x lambdas.

[Note: I did use C++0x auto. This can be replaced with a proper iterator of type std::map<std::string, std::string>::iterator (or std::vector<std::string>::iterator for the first loop), but it was just easier to use auto... I really like auto when it comes to iterators and lambdas.]
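
For compilers without C++0x auto, the printing loop can be written with the iterator type spelled out. A minimal sketch, assuming the same map type as the program above; the helper name dump_pairs is hypothetical, and it returns the text instead of writing to cout so it is easy to check. Because the map is taken by const reference here, the type is const_iterator rather than iterator.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: format every pair the way main() prints them,
// using an explicit iterator type instead of auto.
std::string dump_pairs(const std::map<std::string, std::string>& nvps)
{
    std::ostringstream out;
    for (std::map<std::string, std::string>::const_iterator iter = nvps.begin();
         iter != nvps.end(); ++iter) {
        out << "\"" << iter->first << "\" = \"" << iter->second << "\"\n";
    }
    return out.str();
}
```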

2 Comments On This Entry


taylorc8 

19 July 2010 - 12:15 AM
:bananaman: :tt1: :clap:

this makes me happy..

NickDMax 

20 July 2010 - 08:10 PM
As I have been working with Spirit, one of the problems I have had is figuring out what the attribute type of a particular structure is. Well, it turns out that if you read through the Spirit Blog you can find this post, which contains a function for determining the attribute type (actually you can find a link to the SVN with the function in a nice hpp header and a little sample main() to test with).

So the above program I posted has morphed slightly since posting and the current parser is:
        lexeme[(+alpha >> *alnum )]
                >> '=' 
                >> lexeme[(+(char_ ))]


and the Attribute type is (after cleaning up a bit):
struct boost::fusion::vector2<
    struct boost::fusion::vector2<
        class std::vector<char>, 
        class std::vector<char> 
    >, 
    class std::vector<char>
>


:) Not sure how this helps me yet, but it's good to know nonetheless!