It has come to my attention, both here on DIC and in the real world, that many programmers do not actually know that there is a difference between a switch statement and a string of if-else clauses. This seems worthy of correction. You may have been told at some point in your path to becoming a developer that a switch statement is "just like a string of if statements". This is simply not the case, and the difference is worth thinking about.
Before I begin, I should point out that this is really a somewhat academic question. In Java, if you are tempted to use either a string of if statements or a switch statement, you have probably made a mistake and should be doing object-oriented programming instead. However, it is an interesting question in that it takes us down into some implementation decisions, and offers you, the beginner, a chance to see the javap disassembler in action and to examine some jvm bytecode, which is well worth the trouble. So, with that disclaimer disclaimed, let's begin.
Here is a simple java class that has equivalent code executed three ways - as a packed switch, a sparse switch, and an if-chain.
Perfectly simple stuff.
If your Java 101 teacher were correct, all three of these methods would compile to the same jvm code. (as will happen, for example, with equivalent for- and while-loops) If we examine the jvm code, however, we see something quite different.
To see the jvm code, use the javap disassembler, which comes with the java compiler. The syntax is detailed in the man page, of course, but all you need to know for this example is:
$ javap -v CLASSNAME
(note that the classname is not the filename - the filename is CLASSNAME.class.)
So what do we see actually happening?
First, let's look at the if chain.
As you might expect, this simply runs down the list of choices. If we hit a match, we return. This should not be surprising.
What about the switch, then? Well, it turns out there are actually two ways your switch might get compiled, and the difference is about whether the cases are close together or spread out. In the first example, they are close together:
This compiles to
What you'll notice here is the "tableswitch" instruction. What is this?
In the unforgettable words of the JVM Spec:
Tedious and incomprehensible? Yeah, I guess. What it means is: the jvm pops your switch index off the stack, and if it's in the range that you've got instructions for, it goes to one of your instructions, finding the correct instruction by simple arithmetic on your keys. If it's outside that range, it goes to the default case. It is permissible for a compiler to insert keys pointing to default in order to allow the use of a tableswitch.
On the other hand, if your cases are more spread out ("sparse" is the CS jargon) then it'll use a lookup table instead. Here we have cases that are all over the map. We surely wouldn't want to generate a complete table for the values from 13 to 70 just to match four of them and send the rest to the default. So we use a different table. This time, it's a sparse list of
What you see happening here is a lookup table. Instead of a continuous range of integers without breaks, a lookup table is an ordered list of key-offset pairs with arbitrary spacing between keys. The jvm evaluates this lookup table by examining values in sequence from low to high, until it can determine that it will not find the key in question, at which point it proceeds to the default case. If the key is found in the lookup, obviously we proceed to the line at the indicated offset and execution continues from there. Or, as the jvm spec puts it:
So you can see that there is a real difference between a series of if statements and a switch statement. In practical terms, these implementation differences have interesting implications for execution. For example, this explains why a series of if statements can be more complex to debug. Since the if statements are evaluated in sequence, it is entirely possible for multiple branches to be executed. In the switch, multiple cases consequences can be evaluated, but only if the execution "falls through" due to a missing break statement. In no case can more than one condition be evaluated. This analysis also demonstrates why a case clause must be a literal, and cannot be a derived value: since the jump table must be calculated at compile time, the final values for the cases must be known at compile time.
And it should be clear that the switch statement can in principle be more efficient than the if-chain, since the compiler can optimize against it, which cannot be done with a series of ifs.
However, and finally, it should be blindingly obvious that this efficiency will almost certainly never be realized, since a program would have to have a densely-packed key space of hundreds of values (minimally) in order to realize any noticeable execution savings. Therefore, one should not let this potential efficiency play a role in implementation decisions, and instead should choose the implementation strategy that most clearly expresses the program's purpose. In java, this will almost always be done using object-oriented dispatch and not by either switches or if-chains.
Before I begin, I should point out that this is really a somewhat academic question. In Java, if you are tempted to use either a string of if statements or a switch statement, you have probably made a mistake and should be doing object-oriented programming instead. However, it is an interesting question in that it takes us down into some implementation decisions, and offers you, the beginner, a chance to see the javap disassembler in action and to examine some jvm bytecode, which is well worth the trouble. So, with that disclaimer disclaimed, let's begin.
Here is a simple java class that has equivalent code executed three ways - as a packed switch, a sparse switch, and an if-chain.
public class SwitchExample { public int packedSwitch (int a) { switch (a) { case 0: return 7; case 1: return 3; case 2: return 4; case 3: return 0; default: return 123; } } public int sparseSwitch (int a) { switch (a) { case 70: return 7; case 13: return 3; case 26: return 4; case 63: return 0; default: return 123; } } public int ifs (int a) { if (a == 70) return 7; else if (a == 13) return 3; else if (a == 26) return 4; else if (a == 63) return 0; else return 123; } }
Perfectly simple stuff.
If your Java 101 teacher were correct, all three of these methods would compile to the same jvm code. (as will happen, for example, with equivalent for- and while-loops) If we examine the jvm code, however, we see something quite different.
To see the jvm code, use the javap disassembler, which comes with the java compiler. The syntax is detailed in the man page, of course, but all you need to know for this example is:
$ javap -v CLASSNAME
(note that the classname is not the filename - the filename is CLASSNAME.class.)
Spoiler
So what do we see actually happening?
First, let's look at the if chain.
public int ifs (int a) { if (a == 70) return 7; else if (a == 13) return 3; else if (a == 26) return 4; else if (a == 63) return 0; else return 123; }
public int ifs(int); Code: Stack=2, Locals=2, Args_size=2 0: iload_1 1: bipush 70 3: if_icmpne 9 6: bipush 7 8: ireturn 9: iload_1 10: bipush 13 12: if_icmpne 17 15: iconst_3 16: ireturn 17: iload_1 18: bipush 26 20: if_icmpne 25 23: iconst_4 24: ireturn 25: iload_1 26: bipush 63 28: if_icmpne 33 31: iconst_0 32: ireturn 33: bipush 123 35: ireturn
As you might expect, this simply runs down the list of choices. If we hit a match, we return. This should not be surprising.
What about the switch, then? Well, it turns out there are actually two ways your switch might get compiled, and the difference is about whether the cases are close together or spread out. In the first example, they are close together:
public int packedSwitch (int a) { switch (a) { case 0: return 7; case 1: return 3; case 2: return 4; case 3: return 0; default: return 123; } }
This compiles to
public int packedSwitch(int); Code: Stack=1, Locals=2, Args_size=2 0: iload_1 1: tableswitch{ //0 to 3 0: 32; 1: 35; 2: 37; 3: 39; default: 41 } 32: bipush 7 34: ireturn 35: iconst_3 36: ireturn 37: iconst_4 38: ireturn 39: iconst_0 40: ireturn 41: bipush 123 43: ireturn
What you'll notice here is the "tableswitch" instruction. What is this?
In the unforgettable words of the JVM Spec:
Quote
A tableswitch is a variable-length instruction. Immediately after the tableswitch opcode, between 0 and 3 null bytes (zeroed bytes, not the null object) are inserted as padding. The number of null bytes is chosen so that the following byte begins at an address that is a multiple of 4 bytes from the start of the current method (the opcode of its first instruction). Immediately after the padding follow bytes constituting three signed 32-bit values: default, low, and high. Immediately following those bytes are bytes constituting a series of high - low + 1 signed 32-bit offsets. The value low must be less than or equal to high. The high - low + 1 signed 32-bit offsets are treated as a 0-based jump table. Each of these signed 32-bit values is constructed as (byte1 << 24) | (byte2 << 16) | (byte3 << 8) | byte4.
The index must be of type int and is popped from the operand stack. If index is less than low or index is greater than high, then a target address is calculated by adding default to the address of the opcode of this tableswitch instruction. Otherwise, the offset at position index - low of the jump table is extracted. The target address is calculated by adding that offset to the address of the opcode of this tableswitch instruction. Execution then continues at the target address.
The target address that can be calculated from each jump table offset, as well as the ones that can be calculated from default, must be the address of an opcode of an instruction within the method that contains this tableswitch instruction.
The index must be of type int and is popped from the operand stack. If index is less than low or index is greater than high, then a target address is calculated by adding default to the address of the opcode of this tableswitch instruction. Otherwise, the offset at position index - low of the jump table is extracted. The target address is calculated by adding that offset to the address of the opcode of this tableswitch instruction. Execution then continues at the target address.
The target address that can be calculated from each jump table offset, as well as the ones that can be calculated from default, must be the address of an opcode of an instruction within the method that contains this tableswitch instruction.
Tedious and incomprehensible? Yeah, I guess. What it means is: the jvm pops your switch index off the stack, and if it's in the range that you've got instructions for, it goes to one of your instructions, finding the correct instruction by simple arithmetic on your keys. If it's outside that range, it goes to the default case. It is permissible for a compiler to insert keys pointing to default in order to allow the use of a tableswitch.
On the other hand, if your cases are more spread out ("sparse" is the CS jargon) then it'll use a lookup table instead. Here we have cases that are all over the map. We surely wouldn't want to generate a complete table for the values from 13 to 70 just to match four of them and send the rest to the default. So we use a different table. This time, it's a sparse list of
public int sparseSwitch (int a) { switch (a) { case 70: return 7; case 13: return 3; case 26: return 4; case 63: return 0; default: return 123; } }
public int sparseSwitch(int); Code: Stack=1, Locals=2, Args_size=2 0: iload_1 1: lookupswitch{ //4 13: 47; 26: 49; 63: 51; 70: 44; default: 53 } 44: bipush 7 46: ireturn 47: iconst_3 48: ireturn 49: iconst_4 50: ireturn 51: iconst_0 52: ireturn 53: bipush 123 55: ireturn
What you see happening here is a lookup table. Instead of a continuous range of integers without breaks, a lookup table is an ordered list of key-offset pairs with arbitrary spacing between keys. The jvm evaluates this lookup table by examining values in sequence from low to high, until it can determine that it will not find the key in question, at which point it proceeds to the default case. If the key is found in the lookup, obviously we proceed to the line at the indicated offset and execution continues from there. Or, as the jvm spec puts it:
Quote
A lookupswitch is a variable-length instruction. Immediately after the lookupswitch opcode, between zero and three null bytes (zeroed bytes, not the null object) are inserted as padding. The number of null bytes is chosen so that the defaultbyte1 begins at an address that is a multiple of four bytes from the start of the current method (the opcode of its first instruction). Immediately after the padding follow a series of signed 32-bit values: default, npairs, and then npairs pairs of signed 32-bit values. The npairs must be greater than or equal to 0. Each of the npairs pairs consists of an int match and a signed 32-bit offset. Each of these signed 32-bit values is constructed from four unsigned bytes as (byte1 << 24) | (byte2 << 16) | (byte3 << 8) | byte4.
The table match-offset pairs of the lookupswitch instruction must be sorted in increasing numerical order by match.
The key must be of type int and is popped from the operand stack. The key is compared against the match values. If it is equal to one of them, then a target address is calculated by adding the corresponding offset to the address of the opcode of this lookupswitch instruction. If the key does not match any of the match values, the target address is calculated by adding default to the address of the opcode of this lookupswitch instruction. Execution then continues at the target address.
The table match-offset pairs of the lookupswitch instruction must be sorted in increasing numerical order by match.
The key must be of type int and is popped from the operand stack. The key is compared against the match values. If it is equal to one of them, then a target address is calculated by adding the corresponding offset to the address of the opcode of this lookupswitch instruction. If the key does not match any of the match values, the target address is calculated by adding default to the address of the opcode of this lookupswitch instruction. Execution then continues at the target address.
So you can see that there is a real difference between a series of if statements and a switch statement. In practical terms, these implementation differences have interesting implications for execution. For example, this explains why a series of if statements can be more complex to debug. Since the if statements are evaluated in sequence, it is entirely possible for multiple branches to be executed. In the switch, multiple cases consequences can be evaluated, but only if the execution "falls through" due to a missing break statement. In no case can more than one condition be evaluated. This analysis also demonstrates why a case clause must be a literal, and cannot be a derived value: since the jump table must be calculated at compile time, the final values for the cases must be known at compile time.
And it should be clear that the switch statement can in principle be more efficient than the if-chain, since the compiler can optimize against it, which cannot be done with a series of ifs.
However, and finally, it should be blindingly obvious that this efficiency will almost certainly never be realized, since a program would have to have a densely-packed key space of hundreds of values (minimally) in order to realize any noticeable execution savings. Therefore, one should not let this potential efficiency play a role in implementation decisions, and instead should choose the implementation strategy that most clearly expresses the program's purpose. In java, this will almost always be done using object-oriented dispatch and not by either switches or if-chains.
2 Comments On This Entry
Page 1 of 1
Page 1 of 1
Trackbacks for this entry [ Trackback URL ]
← January 2021 →
S | M | T | W | T | F | S |
---|---|---|---|---|---|---|
1 | 2 | |||||
3 | 4 | 5 | 6 | 7 | 8 | 9 |
10 | 11 | 12 | 13 | 14 | 15 | 16 |
17 | 18 | 19 | 20 | 21 | 22 | 23 |
24 | 25 | 26 | 27 | 28 | 29 | 30 |
31 |
Tags
My Blog Links
Recent Entries
Recent Comments
Search My Blog
0 user(s) viewing
0 Guests
0 member(s)
0 anonymous member(s)
0 member(s)
0 anonymous member(s)