Subscribe to ...considered harmful        RSS Feed
-----

Switch Statements in Java

Icon 1 Comments
It has come to my attention, both here on DIC and in the real world, that many programmers do not actually know that there is a difference between a switch statement and a string of if-else clauses. This seems worthy of correction. It is simply not the case.

Before I begin, I should point out that this is really a somewhat academic question. In Java, if you are tempted to use either a string of if statements or a switch statement, you have probably made a mistake and should be doing object-oriented programming instead. However, it is an interesting question in that it takes us down into some implementation decisions, and offers you, the beginner, a chance to see the javap disassembler in action and to examine some jvm bytecode, which is well worth the trouble. So, with that disclaimer disclaimed, let's begin.

Here is a simple java class that has equivalent code executed three ways - as a packed switch, a sparse switch, and an if-chain.

public class SwitchExample {
	public int packedSwitch (int a)
 	{
		
		switch (a) {
			case 0:	return 7;
			case 1: 	return 3;
			case 2:	return 4;
			case 3: 	return 0;
			default:	return 123;
		}
 
	}
	public int sparseSwitch (int a)
 	{
		
		switch (a) {
			case 70:		return 7;
			case 13: 	return 3;
			case 26:		return 4;
			case 63: 	return 0;
			default:	return 123;
		}
 
	}
	public int ifs (int a)
 	{
			if (a == 70) 		return 7;
			else if (a == 13) 	return 3;
			else if (a == 26)		return 4;
			else if (a == 63) 	return 0;
			else 	return 123;
 
	}
}




Perfectly simple stuff.

If your Java 101 teacher were correct, all three of these methods would compile to the same jvm code. (as will happen, for example, with equivalent for- and while-loops) If we examine the jvm code, however, we see something quite different.
To see the jvm code, use the javap disassembler, which comes with the java compiler. The syntax is detailed in the man page, of course, but all you need to know for this example is:

$ javap -v CLASSNAME

(note that the classname is not the filename - the filename is CLASSNAME.class.)

Spoiler



So what do we see actually happening?

First, let's look at the if chain.

	public int ifs (int a)
 	{
			if (a == 70) 		return 7;
			else if (a == 13) 	return 3;
			else if (a == 26)		return 4;
			else if (a == 63) 	return 0;
			else 	return 123;
 
	}


public int ifs(int);
  Code:
   Stack=2, Locals=2, Args_size=2
   0:	iload_1
   1:	bipush	70
   3:	if_icmpne	9
   6:	bipush	7
   8:	ireturn
   9:	iload_1
   10:	bipush	13
   12:	if_icmpne	17
   15:	iconst_3
   16:	ireturn
   17:	iload_1
   18:	bipush	26
   20:	if_icmpne	25
   23:	iconst_4
   24:	ireturn
   25:	iload_1
   26:	bipush	63
   28:	if_icmpne	33
   31:	iconst_0
   32:	ireturn
   33:	bipush	123
   35:	ireturn



As you might expect, this simply runs down the list of choices. If we hit a match, we return. This should not be surprising.


What about the switch, then? Well, it turns out there are actually two ways your switch might get compiled, and the difference is about whether the cases are close together or spread out. In the first example, they are close together:

	public int packedSwitch (int a)
 	{
		
		switch (a) {
			case 0:	return 7;
			case 1: 	return 3;
			case 2:	return 4;
			case 3: 	return 0;
			default:	return 123;
		}
 
	}



This compiles to
public int packedSwitch(int);
  Code:
   Stack=1, Locals=2, Args_size=2
   0:	iload_1
   1:	tableswitch{ //0 to 3
		0: 32;
		1: 35;
		2: 37;
		3: 39;
		default: 41 }
   32:	bipush	7
   34:	ireturn
   35:	iconst_3
   36:	ireturn
   37:	iconst_4
   38:	ireturn
   39:	iconst_0
   40:	ireturn
   41:	bipush	123
   43:	ireturn



What you'll notice here is the "tableswitch" instruction. What is this?

In the unforgettable words of the JVM Spec:

Quote

A tableswitch is a variable-length instruction. Immediately after the tableswitch opcode, between 0 and 3 null bytes (zeroed bytes, not the null object) are inserted as padding. The number of null bytes is chosen so that the following byte begins at an address that is a multiple of 4 bytes from the start of the current method (the opcode of its first instruction). Immediately after the padding follow bytes constituting three signed 32-bit values: default, low, and high. Immediately following those bytes are bytes constituting a series of high - low + 1 signed 32-bit offsets. The value low must be less than or equal to high. The high - low + 1 signed 32-bit offsets are treated as a 0-based jump table. Each of these signed 32-bit values is constructed as (byte1 << 24) | (byte2 << 16) | (byte3 << 8) | byte4.

The index must be of type int and is popped from the operand stack. If index is less than low or index is greater than high, then a target address is calculated by adding default to the address of the opcode of this tableswitch instruction. Otherwise, the offset at position index - low of the jump table is extracted. The target address is calculated by adding that offset to the address of the opcode of this tableswitch instruction. Execution then continues at the target address.

The target address that can be calculated from each jump table offset, as well as the ones that can be calculated from default, must be the address of an opcode of an instruction within the method that contains this tableswitch instruction.



Tedious and incomprehensible? Yeah, I guess. What it means is: the jvm pops your switch index off the stack, and if it's in the range that you've got instructions for, it goes to one of your instructions, finding the correct instruction by simple arithmetic on your keys. If it's outside that range, it goes to the default case. It is permissible for a compiler to insert keys pointing to default in order to allow the use of a tableswitch.

On the other hand, if your cases are more spread out ("sparse" is the CS jargon) then it'll use a lookup table instead. Here we have cases that are all over the map. We surely wouldn't want to generate a complete table for the values from 13 to 70 just to match four of them and send the rest to the default. So we use a different table. This time, it's a sparse list of
	public int sparseSwitch (int a)
 	{
		
		switch (a) {
			case 70:	return 7;
			case 13:	return 3;
			case 26:	return 4;
			case 63:	return 0;
			default:	return 123;
		}
 
	}



public int sparseSwitch(int);
  Code:
   Stack=1, Locals=2, Args_size=2
   0:	iload_1
   1:	lookupswitch{ //4
		13: 47;
		26: 49;
		63: 51;
		70: 44;
		default: 53 }
   44:	bipush	7
   46:	ireturn
   47:	iconst_3
   48:	ireturn
   49:	iconst_4
   50:	ireturn
   51:	iconst_0
   52:	ireturn
   53:	bipush	123
   55:	ireturn


What you see happening here is a lookup table. Instead of an continuous range of integers without breaks, a lookup table is an ordered list of key-offest pairs with arbitrary spacing between keys. The jvm evaluates this lookup table by examining values in sequence from low to high, until it can determine that it will not find the key in question, at which point it proceeds to the default case. If the key is found in the lookup, obviously we proceed to the line at the indicated offset and execution continues from there. Or, as the jvm spec puts it:

Quote

A lookupswitch is a variable-length instruction. Immediately after the lookupswitch opcode, between zero and three null bytes (zeroed bytes, not the null object) are inserted as padding. The number of null bytes is chosen so that the defaultbyte1 begins at an address that is a multiple of four bytes from the start of the current method (the opcode of its first instruction). Immediately after the padding follow a series of signed 32-bit values: default, npairs, and then npairs pairs of signed 32-bit values. The npairs must be greater than or equal to 0. Each of the npairs pairs consists of an int match and a signed 32-bit offset. Each of these signed 32-bit values is constructed from four unsigned bytes as (byte1 << 24) | (byte2 << 16) | (byte3 << 8) | byte4.
The table match-offset pairs of the lookupswitch instruction must be sorted in increasing numerical order by match.
The key must be of type int and is popped from the operand stack. The key is compared against the match values. If it is equal to one of them, then a target address is calculated by adding the corresponding offset to the address of the opcode of this lookupswitch instruction. If the key does not match any of the match values, the target address is calculated by adding default to the address of the opcode of this lookupswitch instruction. Execution then continues at the target address.



So you can see that there is a real difference between a series of if statements and a switch statement. In practical terms, these implementation differences have interesting implications for execution. For example, this explains why a series of if statements can be more complex to debug. Since the if statements are evaluated in sequence, it is entirely possible for multiple branches to be executed. In the switch, multiple cases consequences can be evaluated, but only if the execution "falls through" due to a missing break statement. In no case can more than one condition be evaluated. This analysis also demonstrates why a case clause must be a literal, and cannot be a derived value: since the jump table must be calculated at compile time, the final values for the cases must be known at compile time.
And it should be clear that the switch statement can in principle be more efficient than the if-chain, since the compiler can optimize against it, which cannot be done with a series of ifs.

However, and finally, it should be blindingly obvious that this efficiency will almost certainly never be realized, since a program would have to have a densely-packed key space of hundreds of values (minimally) in order to realize any noticeable execution savings. Therefore, one should not let this potential efficiency play a role in implementation decisions, and instead should choose the implementation strategy that most clearly expresses the program's purpose. In java, this will almost always be done using object-oriented dispatch and not by either switches or if-chains.

1 Comments On This Entry

Page 1 of 1

modi123_1 Icon

23 April 2014 - 07:33 AM
Definitely something to chew on.

Posted Image
0
Page 1 of 1

Trackbacks for this entry [ Trackback URL ]

There are no Trackbacks for this entry

Recent Entries

July 2014

S M T W T F S
  12345
6789101112
13141516171819
20212223242526
272829 30 31