cfoley's Blog

s = new String(s);

I recently discovered a memory leak in one of my Java applications.

"What's this?", I hear you say.
"Isn't the point of the garbage collector to keep Java free from memory leaks?"

Well yes, to a point. The garbage collector clears up after you when you stop using old objects, but if you leave a reference to obsolete data, it still counts as being in use and the garbage collector can't free the memory. This is exactly what happened in my application, and the cause was some undocumented behaviour in the String class.
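To make the idea of a lingering reference concrete, here is a minimal sketch. The class name ReachabilityDemo and the static list HOARD are made up for illustration and are not from my application. The WeakReference is only a probe: it does not keep the array alive by itself, so if the probe can still see the array after a collection request, something else must be holding onto it.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

class ReachabilityDemo {
    // Hypothetical long-lived collection standing in for any forgotten reference.
    static final List<char[]> HOARD = new ArrayList<>();

    // Returns true if the array is still reachable after a GC request.
    static boolean survivesWhileReferenced() {
        char[] data = new char[1_000_000];
        HOARD.add(data);                      // the "obsolete" reference we forgot about
        WeakReference<char[]> probe = new WeakReference<>(data);
        data = null;                          // we are finished with it locally
        System.gc();                          // only a request, not a guarantee
        boolean alive = probe.get() != null;  // HOARD still reaches the array
        HOARD.clear();                        // dropping the last reference makes it collectable
        return alive;
    }

    public static void main(String[] args) {
        System.out.println(survivesWhileReferenced()); // prints true
    }
}
```

As long as HOARD holds the array, the collector has to treat it as live, however many times it runs.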



Wastefully Hoarding Characters


I was reading a large number of long strings from a file, extracting only a few characters from each and discarding the rest. Here's a quick example with a single String:

String s = "abcdefghijklmnopqrstuvwxyz";
s = s.substring(3, 6);

Now s contains the String "def", as expected, but the memory consumption was way over the top. I only noticed when I tried to process a 500 MB text file and ran out of memory. Time to look at the inner workings of the String class!



Inner Workings of a String


There are three interesting instance variables:

value, a char[] which stores the characters in the String.
offset, an int for the index of the first character.
count, an int for the length of the String.

This already looks suspicious. Why would you need offset and count? Surely the char array shouldn't contain any extra characters! The answer is that it can contain extra characters, and using the substring() method pretty much guarantees it!

The substring() method creates a new String object that shares the same char array as the original, with an appropriate offset and count (3 and 3 in my example above). This is safe because Strings in Java are immutable: Once created, they can never be changed.

We can prove this with reflection. Here's a method that uses reflection to get the char array from a String.

	// Requires: import java.lang.reflect.Field;
	// Reads the private "value" field, the char array that backs the String.
	static char[] getInnerChars(String s) throws NoSuchFieldException, IllegalAccessException {
		Field innerCharArray = String.class.getDeclaredField("value");
		innerCharArray.setAccessible(true); // bypass the private modifier
		return (char[]) innerCharArray.get(s);
	}

We can use this to analyse the example above:

String s = "abcdefghijklmnopqrstuvwxyz";
s = s.substring(3, 6);
System.out.println(s);
System.out.println(Arrays.toString(getInnerChars(s))); // needs import java.util.Arrays;

Output:

def
[a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z]
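If the reflection trick doesn't work on your JVM, the layout described above can still be explored with a toy model. This is only a sketch: SharedString, backingLength() and trimmedCopy() are names I've invented, and the class is not the real java.lang.String source. It simply mimics the value/offset/count arrangement. As far as I know, JDK releases from Java 7 update 6 onwards copy the characters in substring() instead, so treat this as a model of the behaviour in this post rather than of every release.

```java
import java.util.Arrays;

// Hypothetical toy model of the value/offset/count layout, NOT the real String class.
final class SharedString {
    private final char[] value; // backing array, possibly shared with other instances
    private final int offset;   // index of the first visible character
    private final int count;    // number of visible characters

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Shares the backing array and only adjusts offset and count.
    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    // Copies just the visible characters into a right-sized array.
    SharedString trimmedCopy() {
        return new SharedString(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    // Length of the array this instance keeps alive.
    int backingLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString s = new SharedString("abcdefghijklmnopqrstuvwxyz").substring(3, 6);
        System.out.println(s);                               // prints def
        System.out.println(s.backingLength());               // prints 26
        System.out.println(s.trimmedCopy().backingLength()); // prints 3
    }
}
```

The three-character substring keeps all 26 characters alive until it is copied into an array of its own.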



Topping and Tailing to a Solution


It's not so bad to waste one alphabet like this, but it's a lot of waste to keep a 500 MB file in memory when you only need three chars from each line! Fortunately, the solution is simple: create a new String from the substring.

s = new String(s);
System.out.println(s);
System.out.println(Arrays.toString(getInnerChars(s)));

Output:

def
[d, e, f]

The char array is trimmed down and the old one is free to be garbage collected. Almost perfect! I say almost because this is a workaround for undocumented behaviour. It's possible (though unlikely) for Oracle to change their mind about their String implementation, changing the behaviour of substring() or breaking my workaround. If the workaround is sprinkled throughout all my code in all my projects, this could be a major headache. Better to isolate it in one place. To that end I've placed these methods in a utility class:

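	// Returns a substring backed by its own char array rather than the original's.
	public static String freshSubstring(String s, int beginIndex, int endIndex) {
		return new String(s.substring(beginIndex, endIndex));
	}

	// As above, from beginIndex to the end of the String.
	public static String freshSubstring(String s, int beginIndex) {
		return new String(s.substring(beginIndex));
	}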

Now if future changes to Java break my workaround, I have one small piece of code to update.
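Here's roughly how such a helper slots into the kind of line-by-line extraction that caused my problem. This is a sketch rather than my actual code: FieldExtractor and extract() are invented names, a StringReader stands in for the real file, and skipping lines that are too short is just an assumption I've made to keep the example safe.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class FieldExtractor {
    // Same idea as the utility method above: copy, so the result stands on its own.
    static String freshSubstring(String s, int beginIndex, int endIndex) {
        return new String(s.substring(beginIndex, endIndex));
    }

    // Reads every line and keeps only characters [begin, end) of each one.
    static List<String> extract(Reader source, int begin, int end) {
        List<String> fields = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.length() >= end) { // assumption: silently skip short lines
                    fields.add(freshSubstring(line, begin, end));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        Reader demo = new StringReader("abcdefghij\nklmnopqrst\nxy\n");
        System.out.println(extract(demo, 3, 6)); // prints [def, nop]
    }
}
```

Each long line becomes garbage as soon as the loop moves on, because nothing in the result list refers back to it.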



Only Use the Workaround When It's Essential


One final question remains. Should you and I use this technique every time we want a substring?

No!

The normal way of doing it is very efficient and rarely problematic. The memory leak only happens when we discard the original String, and even then it is only a problem if we are holding onto lots of unused characters (and I do mean lots). It's also easy to imagine the opposite scenario, where you keep the original String and also make lots of substrings from it. There, sharing the char array is definitely the more memory-efficient choice. Then there is the typical usage. How many times have you done something like this:

int x = Integer.parseInt(s.substring(3, 6).trim());

The trim() method behaves similarly to substring() in sharing the char array with the new String (indeed, it makes a call to substring() after it works out where the whitespace is). If these methods copied characters, some would be copied twice before being passed to parseInt(). The way the String class is set up, they are never copied even once. Perfect for a String object which is used once and immediately discarded.
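For a concrete feel of that one-shot pattern, here's a small sketch. The class ParseField, the helper parseField() and the sample record are all invented for illustration.

```java
class ParseField {
    // Hypothetical helper: parses the integer held in columns [begin, end) of a record.
    static int parseField(String record, int begin, int end) {
        // The intermediate Strings are used once and become garbage immediately.
        return Integer.parseInt(record.substring(begin, end).trim());
    }

    public static void main(String[] args) {
        // Columns 3 to 5 hold " 42"; trim() strips the padding before parsing.
        System.out.println(parseField("abc 42 xyz", 3, 6)); // prints 42
    }
}
```

Nothing here outlives the call, so there is no reason to reach for the copying workaround.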



In Conclusion


Here I've discussed a potential memory leak for Java programmers to watch out for, had a look at the implementation of the String class, described an easy solution, and commented on when the solution is applicable. I hope you enjoyed reading and thanks for making it to the end.

12 Comments On This Entry


Dogstopper

12 January 2011 - 07:00 AM
Wow. Nice discovery and great research cfoley. Well done! I plan on using this next time I have long Strings to manage.

cfoley

12 January 2011 - 07:11 AM
Thanks!

Another workaround is the String.intern() method. You have to be careful there too since once a String is interned, it stays there forever. If you're dealing with large numbers of identical substrings it does make sense. I might use it since I'm dealing with element symbols, and the same ~20 elements appear over and over in the biological systems I'm working with.

KYA

12 January 2011 - 08:08 AM
There should be a programming MythBusters-esque show.

Munawwar

12 January 2011 - 12:10 PM
+1. Excellent find. Thanks for sharing.

Locke

12 January 2011 - 12:53 PM

KYA, on 12 January 2011 - 09:08 AM, said:

There should be a programming MythBusters-esque show.


Make one. :D

bronze

12 January 2011 - 05:59 PM
thanks for a good read :D Kind of a genius for tracking this down ;)

cfoley

13 January 2011 - 05:47 AM
Thanks to whoever linked this on dzone.com.
It's nice to log on and see it at the top of the front page. :)

MrLuke187

13 January 2011 - 05:59 AM
Nice :tup:

NickDMax

21 January 2011 - 05:03 PM
nice! good knowledge.

bhandari

02 February 2011 - 10:18 AM
Thanks for the information

http://extreme-java.blogspot.com

blackcompe

12 December 2011 - 07:04 PM
Very Interesting. Great research!

jon.kiparsky

13 February 2012 - 10:45 AM
Good stuff. Bookmarked.