14 Replies - 9881 Views - Last Post: 07 December 2012 - 01:36 PM

#1 teharris

C++ Compiler "grammar"

Posted 28 November 2012 - 06:37 PM

The last time I posted, I had just begun the book "Elements of Computing Systems" and was attempting to learn C++ (and programming in general) and write a basic assembler using it at the same time. Since completing the book, I am interested in delving more into compilers, assemblers etc. Currently I am looking for resources that would provide me with a C++ compiler's "grammar" (this is what ECS calls it, not sure if it is a widely used term). Essentially, this would be a document covering how the compiler reads and parses the code. Any help is appreciated. Thanks.

Note: I want to get a little more in-depth knowledge of the C++ grammar before I grab books on compiler theory / assembly language.


Replies To: C++ Compiler "grammar"

#2 blackcompe

Posted 28 November 2012 - 06:52 PM

Quote

Note: I want to get a little more in-depth knowledge of the C++ grammar before I grab books on compiler theory / assembly language.


I think you've got it backwards. If anything, you want to become comfortable parsing simple languages before delving into C++. I can't think of a harder syntax to work with.

Here's a draft (direct PDF link) of one of the open ISO standards. Look at the grammar section (Annex A, "Grammar summary") towards the end of the PDF. I think it's pretty much the same as the accepted standard. You can buy the latest published standard here.

Some good resources on compiler writing are the Dragon Book, Modern Compiler Implementation in C/Java/ML, and Engineering a Compiler. There's also a Compilers course on Coursera that runs twice a year.

This post has been edited by blackcompe: 28 November 2012 - 07:36 PM

#3 teharris

Posted 28 November 2012 - 07:05 PM

Thanks. That was exactly what I was looking for.

Also, welcome to the ACC.

#4 Xupicor

Posted 29 November 2012 - 12:50 AM

The latest C++11 ISO/IEC working draft is in my signature, if you need that.

If you're into compiler creation though, you should rather start with books focusing on the topic, and not on some language syntax. ;)

#5 jon.kiparsky

Posted 29 November 2012 - 07:48 AM

I don't see anything wrong with learning something about the grammar of the language if you want to think about compilers, but as blackcompe and Xupicor have said, an overview text on compilers will be a great help. Bill Campbell, the fellow I learned about compilers from, has written a pretty good book on compiling Java along with Swami Iyer (another guy I've learned a lot from). Although it's Java and not C++, this might be useful, since it includes a minimal compiler for a subset of Java ("j--") which serves as the basis for instruction: you end up writing code for most of the interesting parts of the compiler, but you don't have to build it all from scratch. It's a nice middle road between passively observing a complete compiler for a real language and actively creating a partial or trivial compiler for an unused language.

#6 Xupicor

Posted 29 November 2012 - 08:56 AM

Oh no no, maybe I barked too fast, but learning more about syntax is certainly NOT a bad thing. It's just that C++ has a quite context-sensitive, somewhat complicated syntax, and I think that if you want to understand how to make a compiler, you should rather start with general concepts and then implement them in practice. The intricate C++ syntax will surely wait until you're ready to tackle it. ;)

Great suggestion by the way, OP - just google "Bill Campbell compilers java".

#7 ishkabible

Posted 29 November 2012 - 12:14 PM

Start with Scheme R5RS if you want to implement a standard; even that's not easy. Even the world's best compiler writers take years to write C++ compilers. Lua would be another neat thing to do, as it has a pretty simple grammar and simple semantics. Even better, you should learn to create your own simple language.

#8 vividexstance

Posted 29 November 2012 - 12:51 PM

Xupicor, on 29 November 2012 - 10:56 AM, said:

Oh no no, maybe I barked too fast, but learning more about syntax is certainly NOT a bad thing. It's just that C++ has a quite context-sensitive, somewhat complicated syntax, and I think that if you want to understand how to make a compiler, you should rather start with general concepts and then implement them in practice. The intricate C++ syntax will surely wait until you're ready to tackle it. ;)

I agree 100%. There's so much you can do with the C and C++ languages that it is very difficult to write a compiler for them. One example is how certain operators do different things depending on the context. Even a simple one, like the minus sign ('-'), can be either a unary or a binary operator depending on the context. Example:
/* Binary minus operator */
int a = 5 - 2;    /* a == 3 */

/* Unary minus operator */
int b = -3;    /* b == -3 */
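To make the unary/binary distinction concrete, here is a minimal sketch of a recursive-descent evaluator that tells the two uses of '-' apart purely by position: a '-' seen where an operand is expected is unary, and a '-' seen after an operand is binary. The MinusParser struct, the evaluate_minus function, and the two-rule grammar in the comment are assumptions made for this illustration, not taken from any real compiler. It works on raw characters, so it does not model the separate '--' token, and it has no error handling.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical grammar used by this sketch:
//   expr  := unary ( '-' unary )*
//   unary := '-' unary | number
struct MinusParser {
    std::string src;
    std::size_t pos;

    void skip_spaces() {
        while (pos < src.size() &&
               std::isspace(static_cast<unsigned char>(src[pos]))) {
            ++pos;
        }
    }

    // An operand is expected here, so a leading '-' must be unary.
    int parse_unary() {
        skip_spaces();
        if (pos < src.size() && src[pos] == '-') {
            ++pos;
            return -parse_unary();
        }
        int value = 0;
        while (pos < src.size() &&
               std::isdigit(static_cast<unsigned char>(src[pos]))) {
            value = value * 10 + (src[pos] - '0');
            ++pos;
        }
        return value;
    }

    // An operand has just been read, so a following '-' must be binary.
    int parse_expr() {
        int result = parse_unary();
        skip_spaces();
        while (pos < src.size() && src[pos] == '-') {
            ++pos;
            result -= parse_unary();
            skip_spaces();
        }
        return result;
    }
};

// Convenience wrapper (hypothetical name).
int evaluate_minus(const std::string& text) {
    MinusParser parser{text, 0};
    return parser.parse_expr();
}
```

With this sketch, evaluate_minus("5 - -2") reads the first '-' as binary and the second as unary, without needing any information beyond where the character appears.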


If you still want to learn about C/C++ compilers, you may want to look into LLVM (Low Level Virtual Machine) and Clang, which is an "LLVM native" C/C++/Objective-C compiler.

This post has been edited by vividexstance: 29 November 2012 - 12:52 PM

#9 jon.kiparsky

Posted 29 November 2012 - 12:59 PM

vividexstance, on 29 November 2012 - 02:51 PM, said:

I agree 100%. There's so much you can do with the C and C++ languages that it is very difficult to write a compiler for them. One example is how certain operators do different things depending on the context. Even a simple one, like the minus sign ('-'), can be either a unary or a binary operator depending on the context. Example:


Sure, and the '-' token can also be part of a prefix or postfix decrement operator, or part of the -= operator. But as I recall, there was nothing very difficult about parsing those, and once you've finished parsing them you don't have to worry about it any more (the operators themselves are obviously distinct entities).

#10 sepp2k

Posted 29 November 2012 - 01:52 PM

-- and -= are one token each - not expressions that contain the token '-'. And yes, there's nothing complicated about parsing subtraction/negation. That's not what makes C++'s grammar context-sensitive.
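The "one token each" point can be illustrated with a small longest-match ("maximal munch") lexer. This is only a sketch under stated assumptions: tokenize_minus_ops is a hypothetical helper that recognises identifiers, numbers, and the operators '-', '--' and '-=', and treats every other non-space character as a one-character token. A real C++ lexer handles far more than this.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a fragment into tokens, always preferring the
// longest operator that starts with '-'.
std::vector<std::string> tokenize_minus_ops(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        if (std::isalnum(c) || c == '_') {
            // Identifier or number: consume the whole run.
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) ||
                    src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
            continue;
        }
        if (c == '-' && i + 1 < src.size() &&
            (src[i + 1] == '-' || src[i + 1] == '=')) {
            // "--" and "-=" each become ONE token, not '-' plus something.
            tokens.push_back(src.substr(i, 2));
            i += 2;
            continue;
        }
        // Any other character, including a lone '-', is a single token.
        tokens.push_back(std::string(1, src[i]));
        ++i;
    }
    return tokens;
}
```

By the time the parser runs, it never sees a '-' inside "--" or "-="; whether a lone '-' is unary or binary is then decided by the grammar, not by the lexer.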

#11 ishkabible

Posted 29 November 2012 - 04:33 PM

There isn't anything difficult about disambiguating whether '-' is being used as unary or binary. -, -= and -- are all different tokens. I could probably teach a sophomore computer science student to tackle these issues.

Examples of where it gets tricky:

Take the arguments of a template: they can be either types or expressions (which are parsed differently). The only way to know is contextual disambiguation based on the kind of template being used.

What about this?
y * x;

Is that a multiplication, or am I declaring a pointer? Again, this is contextual: it depends on whether y names a type.
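Both readings can be shown with ordinary C++. In this sketch the function names parsed_as_expression and parsed_as_declaration are made up for the illustration; the only thing that differs between the two bodies is what the name y refers to at that point.

```cpp
// Case 1: y names a variable, so the tokens "y * x" form a multiplication.
int parsed_as_expression() {
    int y = 6;
    int x = 7;
    return y * x;   // binary '*'
}

// Case 2: y names a type, so the very same tokens form a declaration.
int parsed_as_declaration() {
    typedef int y;  // from here on, y is a type name
    int target = 42;
    y * x;          // declares x as "pointer to int"; nothing is multiplied
    x = &target;
    return *x;
}
```

A parser cannot choose between these from the tokens alone; it has to know what earlier declarations said about y, which is why C and C++ front ends feed declaration information back into parsing.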

Now here is a really tricky one:
x<y>::z

Which is it?
((x) < (y)) > (::z)

Or is x<y> a type and z a static member?

Again, the answer is contextual: it depends on whether x names a template.
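Here is a small sketch in which the identical token sequence x<y>::z compiles under both readings. The namespaces as_template and as_comparison and the value() functions are hypothetical names chosen for the example. A compiler may warn about the chained comparison in the second case, which is rather the point.

```cpp
int z = 0;  // a global variable, reachable as ::z

namespace as_template {
    // Here x is a class template, so x<y> is a type and z is its static member.
    template <int N>
    struct x {
        static const int z = N + 1;
    };
    const int y = 4;

    int value() { return x<y>::z; }   // reads the member z of the type x<4>
}

namespace as_comparison {
    // Here x and y are plain variables, so '<' and '>' are comparisons.
    int x = 1;
    int y = 4;

    bool value() { return x<y>::z; }  // parsed as ((x) < (y)) > (::z)
}
```

The body of value() is character-for-character the same in both namespaces; only the earlier declaration of x decides how it is parsed.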

Templates are the primary source of ambiguities in the language, but by no means the only one.

In C++11 they made it so that '>>' can end two templates, which is a tough one to get right. My best bet is that most implementations will treat it like a single '>' in the case of templates and put a '>' back into the stream. That requires that the parser have a way to communicate with the lexer or the underlying character stream. Some might place more of the burden on the lexer by simply telling it, "if you get a '>>', split it into two '>'s". Some might use a scannerless parser, in which case the parser only requests a '>' and not a '>>', but that's a pretty rare case.
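The "put a '>' back into the stream" idea might look roughly like the following. This is a sketch of that one strategy, not a description of how any particular compiler does it: TokenStream, expect_template_close and count_template_depth are hypothetical names, and the "parser" only understands a name optionally followed by a single bracketed argument.

```cpp
#include <deque>
#include <stdexcept>
#include <string>

// Hypothetical token stream: input is already lexed, with ">>" as ONE token.
struct TokenStream {
    std::deque<std::string> tokens;

    bool empty() const { return tokens.empty(); }
    const std::string& peek() const { return tokens.front(); }
    std::string next() {
        if (tokens.empty()) throw std::runtime_error("unexpected end of input");
        std::string t = tokens.front();
        tokens.pop_front();
        return t;
    }
    void put_back(const std::string& t) { tokens.push_front(t); }
};

// Consume exactly one closing '>' of a template argument list.
void expect_template_close(TokenStream& ts) {
    std::string t = ts.next();
    if (t == ">") return;
    if (t == ">>") {
        // Use the first half now and hand the second half back to the stream.
        ts.put_back(">");
        return;
    }
    throw std::runtime_error("expected '>'");
}

// Parse  name  or  name '<' type '>'  and return the template nesting depth.
int count_template_depth(TokenStream& ts) {
    ts.next();  // the type or template name (not validated in this sketch)
    if (ts.empty() || ts.peek() != "<") return 0;
    ts.next();  // '<'
    int inner = count_template_depth(ts);
    expect_template_close(ts);
    return inner + 1;
}
```

The split only happens where the grammar is asking for a template close, so a '>>' in an ordinary expression would still be seen as a shift operator by whatever code parses expressions.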

Add to this that C++ requires non-trivial look-ahead, is context-sensitive, and has a huge grammar (and language, for that matter), and you have one of the hardest languages to parse in mainstream use, if not the hardest.

This post has been edited by ishkabible: 29 November 2012 - 04:45 PM

#12 code_m

Posted 07 December 2012 - 11:18 AM

I'd think all the uses of '*' and '&' would make it quite hard. Multiply, pointer creation, pointer dereference, iterator dereference... bitwise operations, get-address-of operator...

#13 baavgai

Posted 07 December 2012 - 11:44 AM

C++ is a nasty, complex, multiparadigm beastie. It's probably the last thing I'd want to write a compiler for. Which gets me to thinking, what would the easiest compiler be...

On the list would actually be C. Not C++, not even modern C, but plain, old, K&R (read THAT book) C. It really is a simple language, free of most modern gotchas, including the ones people are concerned with in C++.

The easiest I can think of would be BASIC, the one with the line numbers. A seriously dead language of my youth, it is lean and practically pre-parsed. With GOTO and line numbers, you can translate a lot of it one-to-one to assembly. I've considered doing this for fun, but never got around to it.
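As a rough illustration of the one-to-one idea, the sketch below turns a numbered BASIC line into a label and, for GOTO, a jump to the target line's label. Everything here is an assumption made for the example: the function name translate_basic_line, the "L<number>" label scheme, and the generic "jmp" mnemonic are not tied to any specific BASIC dialect or assembler, and only GOTO is handled.

```cpp
#include <sstream>
#include <string>

// Hypothetical translator: one numbered BASIC line in, a label plus at most
// one pseudo-assembly instruction out.
std::string translate_basic_line(const std::string& line) {
    std::istringstream in(line);
    int number = 0;
    std::string keyword;
    in >> number >> keyword;

    // Every line number becomes a label that jumps can target.
    std::string out = "L" + std::to_string(number) + ":\n";

    if (keyword == "GOTO") {
        int target = 0;
        in >> target;
        out += "    jmp L" + std::to_string(target) + "\n";
    } else {
        // Other statements are left as a comment in this sketch.
        out += "    ; " + keyword + " not handled in this sketch\n";
    }
    return out;
}
```

Because each line already carries its own label and control flow is expressed as jumps between labels, no tree needs to be built for this part of the language.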

#14 jon.kiparsky

Posted 07 December 2012 - 11:57 AM

You know, I've been trying to put together some good Python challenges for the coming year, and I think this might be a good one.

#15 ishkabible

Posted 07 December 2012 - 01:36 PM

I tried implementing Dartmouth BASIC from an old ECMA standard... even as small as that was, it was far from trivial. It was very non-uniform; every command seemed to have its own syntax. I think a simple stack-based language or a Lisp-like syntax would be easier. Stack-based is probably the easiest by far.
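To give a feel for why a stack-based language is so easy to process, here is a minimal sketch of an evaluator for a made-up one. The function name eval_stack_program and the word set (integers, "+", "-", "*", "dup") are assumptions for the illustration and do not follow any existing language. There is no grammar to speak of: the program is a sequence of whitespace-separated words, each executed as soon as it is read.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stack language: integers are pushed, other words act on the stack.
long eval_stack_program(const std::string& program) {
    std::vector<long> stack;
    std::istringstream in(program);
    std::string word;

    while (in >> word) {
        if (word == "+" || word == "-" || word == "*") {
            if (stack.size() < 2) throw std::runtime_error("stack underflow");
            long rhs = stack.back(); stack.pop_back();
            long lhs = stack.back(); stack.pop_back();
            if (word == "+") stack.push_back(lhs + rhs);
            else if (word == "-") stack.push_back(lhs - rhs);
            else stack.push_back(lhs * rhs);
        } else if (word == "dup") {
            if (stack.empty()) throw std::runtime_error("stack underflow");
            long top = stack.back();
            stack.push_back(top);
        } else {
            // Anything else must be an integer literal; std::stol throws
            // std::invalid_argument for unknown words.
            stack.push_back(std::stol(word));
        }
    }

    if (stack.empty()) throw std::runtime_error("program left nothing on the stack");
    return stack.back();
}
```

Note that the unary/binary question from earlier in the thread does not arise here: "-" on its own is always the binary operator, and a negative number such as "-3" is simply a different word.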

This post has been edited by ishkabible: 07 December 2012 - 01:46 PM
