14 Replies - 2605 Views - Last Post: 13 June 2012 - 07:21 AM

#1 Curtis Rutland  Icon User is online

  • (╯□)╯︵ (~ .o.)~
  • member icon


Reputation: 4469
  • View blog
  • Posts: 7,780
  • Joined: 08-June 10

Why allow Undefined Behavior?

Posted 12 June 2012 - 08:45 AM

*
POPULAR

In my few ventures into the "lower-level" style of programming, I've noticed something: undefined behavior. There's a thread on the front page right now that has it:

http://www.dreaminco...i-i-i-i-i-i-i;/

My question is: why do compilers allow for it? Why do specs allow for it? If something is explicitly stated to have undefined behavior, why allow it to compile at all? Why make things that have no concrete implementation in a spec legal syntax?

Understand that I'm coming from a background of high-level languages only. I've never run into undefined behavior in C#, for instance. I don't even know if it exists. Actually, what I've read is that you can only get "undefined behavior" in C# inside an unsafe block, which allows for pointer operations.

What I'm getting at is that I don't understand the reason to allow for it. As I understand, the result of an operation that is undefined will by definition be undefined, and everything after that that relies on it will be undefined as well. It's up to the compiler to determine what to do, and as such, code like that is completely unreliable. If that's the case, why not make the syntax itself illegal rather than just allowing the compiler to do whatever its designers wanted?

Is This A Good Question/Topic? 5
  • +

Replies To: Why allow Undefined Behavior?

#2 jon.kiparsky  Icon User is offline

  • Pancakes!
  • member icon


Reputation: 7649
  • View blog
  • Posts: 12,905
  • Joined: 19-March 11

Re: Why allow Undefined Behavior?

Posted 12 June 2012 - 09:04 AM

C existed for a long time before the ANSI standard did, and a lot of people implemented compilers that implemented weird corner cases (such as the one you point to) in different ways. At that point, there was no way to standardize on one implementation, and therefore "unspecified" was a necessary recourse at times.

P.J. Plauger wrote some essays on the ANSI C standards process, which he was involved in. They're in one of hie three volumes of essays, but I don't know which off the top of my head. He goes into the politics and the practicalities involved, which might help answer your question in more detail than I can.
Was This Post Helpful? 3
  • +
  • -

#3 sepp2k  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 2102
  • View blog
  • Posts: 3,207
  • Joined: 21-June 11

Re: Why allow Undefined Behavior?

Posted 12 June 2012 - 09:07 AM

*
POPULAR

First you need to realize that undefined behavior isn't generally detectable at compile time. Yes, the compiler could detect that i++ + ++i is modifying the same variable twice without a sequence point in between (and in fact gcc does detect that and produce a warning), but if you have something like (cond1 ? i++ : j++) + (cond2 ? ++i : ++j it will only be UB if cond1 == cond2 and the compiler can't statically know whether that will or might be the case.

So categorically making undefined behavior into compile errors is just not possible. So what other ways are there to eliminate UB and why were they not chosen for C(++)?

Some cases of UB can be replaced by compilation errors at the cost of making the compiler reject some valid programs. As an example it's UB in C if the end of a non-void function is reached without finding a return statement. In Java it's a compilation error if there's any possible path through a non-void method that does not end in a return statement. The downside of this is that the following perfectly valid function will fail to compile in Java (but work perfectly fine in C without invoking UB):

int f(int x, int y) {
    if(x == y) return 23;
    if(x != y) return 42;
}


This won't compile in Java because in the (impossible) case that x==y and x!=y are both false, there's no return statement. Of course this can be fixed in this case by turning the second if into an else. I assume the C standard didn't want to make code like this illegal because it generally assumes that the programmer knows what he's doing and it doesn't want to reject valid programs just to make it harder to make mistakes.

Then the only other option to avoid UB is to define the behavior of every program that compiles. That would mean making everything a sequence point and defining the evaluation order of functions and operators. However this would mean that the compiler is no longer able to choose the most efficient implementation for the particular platform because he'd have to do what the standard says. It would also mean doing bounds checks every time you access an array (accessing arrays out of bound is also UB), which would introduce a runtime cost.

Further more removing UB would prevent certain cases of compiler optimization. For example signed integer overflow is undefined. Because of this gcc (with optimizations enabled) optimized expressions like x + 1 > x to true (which wouldn't be legal if integer overflow was defined to wrap around (because then it could be false) or cause a runtime error (because an optimization which turns a crashing program into a not-crashing program is illegal if the crash is defined behavior)).

So a major reason why things are kept undefined is performance.

This post has been edited by sepp2k: 12 June 2012 - 09:08 AM

Was This Post Helpful? 9
  • +
  • -

#4 jon.kiparsky  Icon User is offline

  • Pancakes!
  • member icon


Reputation: 7649
  • View blog
  • Posts: 12,905
  • Joined: 19-March 11

Re: Why allow Undefined Behavior?

Posted 12 June 2012 - 09:42 AM

Quote

I assume the C standard didn't want to make code like this illegal because it generally assumes that the programmer knows what he's doing and it doesn't want to reject valid programs just to make it harder to make mistakes.


The C standard could not change that behavior without making twenty years of programming suddenly invalid. It would have meant rewriting, among other things, UNIX. All of the Unices, of which there were quite a few in 1990, and all of the libraries. That standard would have been ignored by the world, and rightly so, and everyone would have ignored it in favor of the way they were already doing it - in other words, there would have been no standard.

Remember the ANSI standard was issued in 1990, before that the specification was basically K&R. What K&R didn't specify, was left up to the compiler.
Was This Post Helpful? 1
  • +
  • -

#5 ishkabible  Icon User is offline

  • spelling expret
  • member icon




Reputation: 1622
  • View blog
  • Posts: 5,709
  • Joined: 03-August 09

Re: Why allow Undefined Behavior?

Posted 12 June 2012 - 11:19 AM

sepp2k hit the nail on the head; these things can't be checked at compile time. anywhere a language construct(not library) would throw an exception in C# you can expect undefined behavior or a lower level language(becuase these checks are not preformed). in code where you want greater control over behavior, bounds checks, null pointer checks, etc... are not desirable. operating systems do lots of stuff with memory and pointers that bounds checking doesn't even apply to. speed is also a consideration. if every time you dereferenced a pointer or accessed an element of an array you preformed some kind of check like that it would slow the language down(1 reason why Java and C# will never be as fast as C++)

jon also stated that backward compatibility is VERY important in languages like C and C++ where tons of old code would be broken if these things where changed. C++11 caused some minor backward compatibility issues and that was a big deal(but thankfully it allowed the language to move forward).

This post has been edited by ishkabible: 12 June 2012 - 11:23 AM

Was This Post Helpful? 1
  • +
  • -

#6 jon.kiparsky  Icon User is offline

  • Pancakes!
  • member icon


Reputation: 7649
  • View blog
  • Posts: 12,905
  • Joined: 19-March 11

Re: Why allow Undefined Behavior?

Posted 12 June 2012 - 12:37 PM

Quote

sepp2k hit the nail on the head; these things can't be checked at compile time.


Actually, a lot of this stuff can be checked perfectly fine. For example, the behavior of compound postfix assignments is defined in Java - the troublesome ternary (cond1 ? i++ : j++) + (cond2 ? ++i : ++j) causes no trouble in Java. Checking returns is something that Java does obsessively, obviously.

Some of this would have been computationally expensive in 1970, prohibitively so, and that is probably why C inclines to the macho coder stance ("Real Coders don't need a nanny compiler!"), but much of it was because C was not designed before it was implemented. Defining the behavior of the postfix, for example, is not difficult. Dennis Ritchie did it in practice, and so did everyone else who compiled C. No C compiler ever had trouble interpreting those interpretations. Their interpretations are not mutually consistent, but they are internally consistent. The trouble is not that the behavior isn't defineable, it's that nobody set a single definition, so by the time there was a standard, there was no way to choose one.


Digression:

Quote

int f(int x, int y) {
    if(x == y) return 23;
    if(x != y) return 42;
}


This won't compile in Java because in the (impossible) case that x==y and x!=y are both false, there's no return statement.



Of course, this is perfectly well-defined behavior. :) The java compiler requires that all branches lead to a good return statement, and it's not going to try to evaluate whether all "possible" branches are covered. It wants all branches covered.


This is reasonable, in fact. Imagine a slightly different case:

int f(Foo x, Foo y) {
    if(x.getValue() == y.getValue()) return 23;
    if(x.getValue() != y.getValue()) return 42;
}



Do you really want to put your hand on your heart and swear that x and y don't have setters for value that could be accessed in the critical period? Suppose, for example, Foo.getValue() includes a call to user input - oops! (we can also imagine thread interactions, if you like: this code runs on thread A, but some other code on Thread B sneaks in and modifies Y in between steps)
So it's not so impossible for both to return false.
Now it's possible in principle to separate conditionals involving primitives from others, and then work out whether they cover all of the possible branches. This is even possible computationally. It would slow things down, but you could do it. There are two good reasons not to, neither of which has to do with speed. The first is that if we do this, someone then will want you to check to see if foo.getValue() calls an invariant, and if so, apply the same logical tests we use for primitives. And if we concede this, someone else will come up with another case, and so forth. This points to the second reason: complete coverage of possible branches is easily stated, and therefore you can code to it. If you try to make things "easy" usually you end up making a lot more mistakes available to the programmer. "Syntactic sugar leads to cancer of the semicolon".
Was This Post Helpful? 1
  • +
  • -

#7 sepp2k  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 2102
  • View blog
  • Posts: 3,207
  • Joined: 21-June 11

Re: Why allow Undefined Behavior?

Posted 12 June 2012 - 01:01 PM

View Postjon.kiparsky, on 12 June 2012 - 09:37 PM, said:

Quote

sepp2k hit the nail on the head; these things can't be checked at compile time.


Actually, a lot of this stuff can be checked perfectly fine. For example, the behavior of compound postfix assignments is defined in Java - the troublesome ternary (cond1 ? i++ : j++) + (cond2 ? ++i : ++j) causes no trouble in Java.


I think you misunderstood my point. I didn't say it couldn't be made defined behavior (in fact I specifically said that it could be by making everything a sequence point, which is what Java does). I said it couldn't be made into a compilation error (without also hitting some false positives). And as you know, it isn't a compile error in Java.

Quote

Defining the behavior of the postfix, for example, is not difficult. Dennis Ritchie did it in practice, and so did everyone else who compiled C. No C compiler ever had trouble interpreting those interpretations. Their interpretations are not mutually consistent, but they are internally consistent. The trouble is not that the behavior isn't defineable, it's that nobody set a single definition, so by the time there was a standard, there was no way to choose one.


I don't think that's all there is to it. Making everything a sequence point would force the compiler to write to memory more often (or at least make it harder for the compiler to determine when it doesn't have to).

Quote

Quote

int f(int x, int y) {
    if(x == y) return 23;
    if(x != y) return 42;
}


This won't compile in Java because in the (impossible) case that x==y and x!=y are both false, there's no return statement.



Of course, this is perfectly well-defined behavior. :)


Ehrm, yes, of course it is. C has undefined behavior. Java doesn't. In this case Java achieved this by rejecting a well-behaved program that C doesn't reject. That was my point.


Quote

The java compiler requires that all branches lead to a good return statement, and it's not going to try to evaluate whether all "possible" branches are covered.


Right because that's not possible. That's why your only two choices are to either reject some well-behaved programs or to not reject badly-behaving programs. And if you go the latter way (like C did), you'd then have to decide whether a missing return statement should be a runtime error or UB. And C chose the latter because of performance.

Quote

This is reasonable, in fact. Imagine a slightly different case:

int f(Foo x, Foo y) {
    if(x.getValue() == y.getValue()) return 23;
    if(x.getValue() != y.getValue()) return 42;
}



Do you really want to put your hand on your heart and swear that x and y don't have setters for value that could be accessed in the critical period?


I wouldn't have to if the compiler made sure for me, but again I know that that's not possible.

It seems to me as if you think that I think that Java's way of handling missing return statements is worse. I don't. All I said is that you can't do it without rejecting some well-behaved programs and that when in doubt, C errs in favor of assuming that programmer's don't make mistakes.

This post has been edited by sepp2k: 12 June 2012 - 01:06 PM

Was This Post Helpful? 1
  • +
  • -

#8 jon.kiparsky  Icon User is offline

  • Pancakes!
  • member icon


Reputation: 7649
  • View blog
  • Posts: 12,905
  • Joined: 19-March 11

Re: Why allow Undefined Behavior?

Posted 12 June 2012 - 02:10 PM

Quote

It seems to me as if you think that I think that Java's way of handling missing return statements is worse. I don't. All I said is that you can't do it without rejecting some well-behaved programs and that when in doubt, C errs in favor of assuming that programmer's don't make mistakes.


Well, I think my issue is more with the assertion that undefined behavior is a matter of computability. It's not. It's a matter of definition. If C consistently rejected the constructs in question or handled them consistently, then we wouldn't have anything to talk about here. This is not a matter of computability - java specifies the behaviors in question, the behavior is defined. So the existence of undefined behavior in C comes from someplace else.


Quote

C has undefined behavior. Java doesn't. In this case Java achieved this by rejecting a well-behaved program that C doesn't reject.


The fact that the behavior in C is undefined is not a product of its accepting that construct. It's undefined because it doesn't specify how to handle the case of falling off the end of function. That could be (as in java) to reject the program, but it needn't be. It would be trivial to make that behavior well-defined: if you get to the end of a function which declares a return, and you have no return, return a -1, or whatever value you like. Suddenly, the behavior is defined. Why doesn't C do that today? It's not because it would be technically unfeasible.

Quote

Quote

Do you really want to put your hand on your heart and swear that x and y don't have setters for value that could be accessed in the critical period?

I wouldn't have to if the compiler made sure for me, but again I know that that's not possible.


Well, the point I was getting at is that the asserted impossibility of !(A==B || A!=B) only holds if A and B are invariant for the duration of the evaluation, and that's not always the case. Obviously, I don't think this is news to you. This is just nitpicking on my part. :)
Was This Post Helpful? 1
  • +
  • -

#9 ishkabible  Icon User is offline

  • spelling expret
  • member icon




Reputation: 1622
  • View blog
  • Posts: 5,709
  • Joined: 03-August 09

Re: Why allow Undefined Behavior?

Posted 12 June 2012 - 02:43 PM

Quote

Well, I think my issue is more with the assertion that undefined behavior is a matter of computability. It's not. It's a matter of definition. If C consistently rejected the constructs in question or handled them consistently, then we wouldn't have anything to talk about here. This is not a matter of computability - java specifies the behaviors in question, the behavior is defined. So the existence of undefined behavior in C comes from someplace else.


bounds checking, null pointer checks, checking if a function pointer points to a valid item, casting an pointer to an integer then back again after preforming a computation on the integer, etc... can't be checked at compile so there IS some(most in fact) undefined behavior that can't be. some undefined behavior is a matter of definition but not most.

actually in an un-hosted environment almost everything is defined accept for those things that *could* be defined(return at end of function, sequence points in expressions, etc...). there are certainly some exceptions(divide by zero for instance which is handled differently on different hardware) but for the most part undefined code that can only be detected at run time is really actually well defined for OS level code. for instances if you have 4 gigabytes of ram and have a 32-bit pointer literally anytime you dereference a pointer it's totally defined. the effect may not be desired but it's well defined.

This post has been edited by ishkabible: 12 June 2012 - 02:47 PM

Was This Post Helpful? 0
  • +
  • -

#10 sepp2k  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 2102
  • View blog
  • Posts: 3,207
  • Joined: 21-June 11

Re: Why allow Undefined Behavior?

Posted 12 June 2012 - 02:49 PM

View Postjon.kiparsky, on 12 June 2012 - 11:10 PM, said:

Well, I think my issue is more with the assertion that undefined behavior is a matter of computability.


I didn't mean to assert that. I said (or tried to say) that one way to make the behavior defined would be to say that all programs that would show undefined behavior (for some input) should be rejected by the compiler. And that is uncomputable unless you're also willing to reject some programs that would not exhibit UB (as Java does).


Quote

This is not a matter of computability - java specifies the behaviors in question, the behavior is defined. So the existence of undefined behavior in C comes from someplace else.


I don't know why you think that I disagree with you on that. I already said that much myself.

Quote

Quote

C has undefined behavior. Java doesn't. In this case Java achieved this by rejecting a well-behaved program that C doesn't reject.


The fact that the behavior in C is undefined is not a product of its accepting that construct.


I didn't say it was.

Quote

It's undefined because it doesn't specify how to handle the case of falling off the end of function.


Obviously.

Quote

That could be (as in java) to reject the program, but it needn't be.


Again I already said as much.

Quote

It would be trivial to make that behavior well-defined: if you get to the end of a function which declares a return, and you have no return, return a -1, or whatever value you like. Suddenly, the behavior is defined. Why doesn't C do that today? It's not because it would be technically unfeasible.


Okay, what I was thinking of here is that adding runtime checks for this kind of thing would add performance costs. That's not really true in this case though. It would increase the binary size a little bit though. And it'd be a bit more work for the compiler (another thing C doesn't like).


Quote

Well, the point I was getting at is that the asserted impossibility of !(A==B || A!=B) only holds if A and B are invariant for the duration of the evaluation, and that's not always the case. Obviously, I don't think this is news to you. This is just nitpicking on my part. :)


Yes, but if the standard were to say "The compiler shall reject any non-void method with a 'Y U no return?' error message if and only if it would, for some input, terminate without reaching a return statement" the burden to proof whether A and B are invariant for the duration of the evaluation would be on the compiler. Of course that's not possible and that's why the standard doesn't say that (or at least it's one reason why it couldn't say that).
Was This Post Helpful? 0
  • +
  • -

#11 jon.kiparsky  Icon User is offline

  • Pancakes!
  • member icon


Reputation: 7649
  • View blog
  • Posts: 12,905
  • Joined: 19-March 11

Re: Why allow Undefined Behavior?

Posted 12 June 2012 - 03:16 PM

Quote

bounds checking, null pointer checks, checking if a function pointer points to a valid item, casting an pointer to an integer then back again after preforming a computation on the integer, etc... can't be checked at compile so there IS some(most in fact) undefined behavior that can't be. some undefined behavior is a matter of definition but not most.



Quote

I didn't mean to assert that. I said (or tried to say) that one way to make the behavior defined would be to say that all programs that would show undefined behavior (for some input) should be rejected by the compiler. And that is uncomputable unless you're also willing to reject some programs that would not exhibit UB (as Java does).


I think we may be talking at cross purposes here. I was under the impression that we were talking about behavior undefined by the standard, not behavior that can't be predicted by the compiler. That, at least, is the sense in which a complex expression in postincrements (the original spur for this) is undefined: K&R don't specify when the postincrement is to be applied, so if you're writing the compiler, you don't know whether i ++ + i ++ means i+ (i+1) + 1 or i + i + 1 + 1 or what it means for the language, but you end up making it mean something for your particular implementation.
If we're talking about compiler theory, then yes, I agree that you can't know any of that list of stuff at compile time, but I don't think that was the issue.

From the original post:

Quote

Why make things that have no concrete implementation in a spec legal syntax?

Was This Post Helpful? 0
  • +
  • -

#12 sepp2k  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 2102
  • View blog
  • Posts: 3,207
  • Joined: 21-June 11

Re: Why allow Undefined Behavior?

Posted 12 June 2012 - 03:40 PM

View Postjon.kiparsky, on 13 June 2012 - 12:16 AM, said:

I think we may be talking at cross purposes here.


It certainly seems that way.

Quote

I was under the impression that we were talking about behavior undefined by the standard, not behavior that can't be predicted by the compiler.


I was talking about both. Specifically I talked about ways in which behavior that is undefined by the standard, could have been defined by the standard. As one possibility I mentioned that the standard could simply define the behavior to be caught at compile time (in fact the OP mentioned that and I picked it up). And THEN I started talking about what can and can't be predicted by the compiler.


Quote

From the original post:

Quote

Why make things that have no concrete implementation in a spec legal syntax?


Exactly. He asked why the standard doesn't say "If the program does this or that, it shall not compile" instead of "If the program does this or that, the behavior is undefined". To answer that I talked about what the compiler can and can't catch at compile time without rejecting well-behaved programs in the first paragraph of my answer.

After that part I started talking about ways to make undefined behavior defined other than saying it shall not compile. At this point I no longer talked about what the compiler can and can't detect at compile time.
Was This Post Helpful? 0
  • +
  • -

#13 jon.kiparsky  Icon User is offline

  • Pancakes!
  • member icon


Reputation: 7649
  • View blog
  • Posts: 12,905
  • Joined: 19-March 11

Re: Why allow Undefined Behavior?

Posted 12 June 2012 - 05:42 PM

Okay, the pieces begin to come together. Two readings of "why allow undefined behavior", both allowable in English: "why allow a standard to leave behavior undefined?" and "why allow a compiler to pass - that is why does not a compiler prohibit - behavior which is not defined in the standard?". Quite different questions.

It sounded to me like you were arguing that the reason for leaving certain standards undefined was because they would be difficult to handle in the compiler, which seemed quite strange to me, particularly coming from you, who know what you're talking about. Sorry I was so obtuse.

So if I can paraphrase, your position is that it is difficult to correctly cut out all and only the undefined behavior, so the correct response is to allow what the compiler does not declare illegal?
That's a reasonable position to take.
Was This Post Helpful? 0
  • +
  • -

#14 sepp2k  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 2102
  • View blog
  • Posts: 3,207
  • Joined: 21-June 11

Re: Why allow Undefined Behavior?

Posted 13 June 2012 - 04:15 AM

View Postjon.kiparsky, on 13 June 2012 - 02:42 AM, said:

Okay, the pieces begin to come together. Two readings of "why allow undefined behavior", both allowable in English: "why allow a standard to leave behavior undefined?" and "why allow a compiler to pass - that is why does not a compiler prohibit - behavior which is not defined in the standard?". Quite different questions.


If he had only said "Why allow it?" I probably would have understood the question the same way you did. But he said "Why make [it] legal syntax?", the "legal syntax" bit is what made me think he wanted a compilation error (also "why allow it to compile at all?" in the sentence right before the one you quoted).

Quote

So if I can paraphrase, your position is that it is difficult to correctly cut out all and only the undefined behavior, so the correct response is to allow what the compiler does not declare illegal?


My position is that it's impossible to correctly cut out all and only the undefined behavior by replacing each undefined behavior in the standard with compilation errors, yes.

As you said it would be perfectly possible to remove UB in other ways.

PS: I just came up with another reason why it would be difficult for C to define a default return value for functions that don't have a return statement: It would make it impossible to return values using inline assembler (or at least it would require some way of declaring an inline assembly block to set the return value, so the compiler knows not to overwrite it - so it would require implementations to change their inline assembly syntax).
Was This Post Helpful? 0
  • +
  • -

#15 Curtis Rutland  Icon User is online

  • (╯□)╯︵ (~ .o.)~
  • member icon


Reputation: 4469
  • View blog
  • Posts: 7,780
  • Joined: 08-June 10

Re: Why allow Undefined Behavior?

Posted 13 June 2012 - 07:21 AM

Quote

If he had only said "Why allow it?" I probably would have understood the question the same way you did. But he said "Why make [it] legal syntax?", the "legal syntax" bit is what made me think he wanted a compilation error (also "why allow it to compile at all?" in the sentence right before the one you quoted).


To be honest, I was asking both questions. I understand that there are some things that can't be caught statically. So, I was basically asking why not define behavior for what can't, and make what can be illegal syntax. But you guys have explained it quite well.
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1