String Split

Odd little function...


6 Replies - 1361 Views - Last Post: 27 April 2009 - 09:13 PM

#1 NickDMax

  • Can grep dead trees!

Reputation: 2250
  • Posts: 9,245
  • Joined: 18-February 07

String Split

Posted 26 April 2009 - 09:07 PM

Case A:
	public static void main(String[] args) {
		String str ="aa";
		String[] lines = str.split("a");
		System.out.println(lines.length);
		
	}
prints 0

Case B:
	public static void main(String[] args) {
		String str =" aa";
		String[] lines = str.split("a");
		System.out.println(lines.length);
		
	}
prints 1...

Case C:
	public static void main(String[] args) {
		String str ="a a";
		String[] lines = str.split("a");
		System.out.println(lines.length);
		
	}
prints 2...

Case D:
	public static void main(String[] args) {
		String str ="aa ";
		String[] lines = str.split("a");
		System.out.println(lines.length);
		
	}
prints 3!

This is not very funny to me right now. I really needed this to work correctly, but it does not, and I just can't get away with saying, "it's a bug in Java" -- I am not even sure that it IS a bug -- reading the documentation, the first condition makes sense and I can even accept the second, but the last one baffles me (in light of the first one).

Funny little thing!! If you find yourself scratching your head and saying "It's a bug in Java!" YOU'RE WRONG (just as I was). The key is this little line from the documentation:

Quote

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero.
When you go to the two-argument version, it tells you:

Quote

...yields the same result as the expression

Pattern.compile(regex).split(str, n)
And here you find your answer:

Quote

If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.


So... if you use str.split("a", -1); you will consistently get an answer of 3...

Of course, it is still a little strange...
"aa" -> {"", "", ""}
" aa" -> {" ", "", ""}
"a a" -> {"", " ", ""}
"aa " -> {"", "", " "}

It's the first three that get on my nerves... why the extra result? -- well, it is returning the "remaining" portion of the string... and since you told it that you DID want empty results, well, you are going to get them.

I am still trying to figure out a way to easily take out that last answer when it is just an empty "what's left".
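
To see the effect of the limit argument side by side, here is a minimal sketch that runs the four inputs through Pattern.compile(regex).split(str, n) with n = 0 and n = -1. The class name SplitLimitDemo and its split helper are made up for illustration; they are not part of the original post.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
    // Hypothetical helper: the documentation says str.split(regex, n) yields
    // the same result as Pattern.compile(regex).split(str, n).
    static String[] split(String input, String regex, int limit) {
        return Pattern.compile(regex).split(input, limit);
    }

    public static void main(String[] args) {
        String[] inputs = { "aa", " aa", "a a", "aa " };
        for (String input : inputs) {
            String[] discarded = split(input, "a", 0); // trailing empty strings discarded
            String[] kept = split(input, "a", -1);     // trailing empty strings kept
            System.out.println("\"" + input + "\": limit 0 -> " + discarded.length
                    + " " + Arrays.toString(discarded)
                    + ", limit -1 -> " + kept.length + " " + Arrays.toString(kept));
        }
    }
}
```

With a limit of 0 the lengths come out as 0, 1, 2 and 3 (cases A to D); with a limit of -1 every input yields 3 pieces.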


Replies To: String Split

#2 SayMoi

  • D.I.C Head

Reputation: 7
  • Posts: 135
  • Joined: 08-April 09

Re: String Split

Posted 27 April 2009 - 12:57 AM

Ok I definitely understand why it works the way it does. But wouldn't just calling trim() before you split the String do the trick? (Except the third case of course, then you'd want replace() or something…)
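
A quick check of the trim() idea, as a rough sketch (TrimBeforeSplitDemo and countAfterTrim are hypothetical names used only here). Trimming removes the space that was the only non-empty piece in cases B and D, so those inputs collapse to the same result as case A.

```java
class TrimBeforeSplitDemo {
    // Number of pieces produced when the input is trimmed before splitting on "a".
    static int countAfterTrim(String input) {
        return input.trim().split("a").length;
    }

    public static void main(String[] args) {
        String[] inputs = { "aa", " aa", "a a", "aa " };
        for (String input : inputs) {
            System.out.println("\"" + input + "\": plain -> " + input.split("a").length
                    + ", trimmed first -> " + countAfterTrim(input));
        }
    }
}
```

After trimming, only "a a" keeps its count of 2; the other three inputs all produce 0 pieces.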

#3 mostyfriedman

  • The Algorithmi

Reputation: 727
  • Posts: 4,473
  • Joined: 24-October 08

Re: String Split

Posted 27 April 2009 - 04:09 AM

Nah, you shouldn't trim it.

#4 NickDMax

Re: String Split

Posted 27 April 2009 - 04:50 AM

Actually, to do what I need, I think that I will need to ditch split and just go to the regex matcher directly. What I am trying to do is to split up a block of text into lines. I have told people over and over to use string.split("\n") or something like string.split("\r\n|\n"), etc. These don't work! I mean, for your average use they are probably OK, but for my use (a print stream) they don't work, since "\n".split("\n") returns no results... but I need it to return 1, since I need the print stream to insert a new line.

str = "\n";
str.split("\n"); -- returns too few
str.split("\n", -1); -- returns too many
str.split("\n", str.length()); -- returns just right but is broken for strings like "hello\n\n"
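
The three calls above can be compared directly. This is only a sketch; NewlineSplitCounts and counts are invented names, and the second input is the "hello\n\n" string mentioned above.

```java
import java.util.Arrays;

class NewlineSplitCounts {
    // Returns the number of pieces for limit 0, limit -1 and limit str.length(), in that order.
    static int[] counts(String str) {
        return new int[] {
            str.split("\n").length,
            str.split("\n", -1).length,
            str.split("\n", str.length()).length
        };
    }

    public static void main(String[] args) {
        String[] inputs = { "\n", "hello\n\n" };
        for (String input : inputs) {
            // Escape the newlines so the input is readable on one line of output.
            System.out.println("\"" + input.replace("\n", "\\n") + "\" -> "
                    + Arrays.toString(counts(input)));
        }
    }
}
```

For "\n" the counts are 0, 2 and 1. Note that the single piece returned with a limit of str.length() is the untouched "\n" itself, because a limit of 1 means the pattern is never applied. For "hello\n\n" the counts are 1, 3 and 3.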

It's just one of those times when the "close enough" solution provided by the API was not good enough. I need str.split("\n", -1/2) -- I need something that does not ignore ALL empty strings on the end, just the last one.
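
One way to approximate that behaviour, as a sketch rather than a finished solution: split with a limit of -1 and then drop only the final piece when it is empty. The names SplitKeepingInnerEmpties and splitDropLastEmpty are hypothetical.

```java
import java.util.Arrays;

class SplitKeepingInnerEmpties {
    // Hypothetical helper: keep every piece that split(regex, -1) returns,
    // except a single empty string at the very end.
    static String[] splitDropLastEmpty(String input, String regex) {
        String[] parts = input.split(regex, -1);
        int last = parts.length - 1;
        if (last >= 0 && parts[last].isEmpty()) {
            return Arrays.copyOf(parts, last); // copy everything but the final element
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] inputs = { "\n", "hello\n\n", "hello" };
        for (String input : inputs) {
            String[] lines = splitDropLastEmpty(input, "\n");
            System.out.println(lines.length + " " + Arrays.toString(lines));
        }
    }
}
```

Under this approach "\n" gives one empty line and "hello\n\n" gives "hello" followed by one empty line, while a string with no trailing delimiter is returned unchanged.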

#5 NickDMax

Re: String Split

Posted 27 April 2009 - 02:43 PM

If anyone is interested, here is a string splitter that works a little differently. This one will return a string for each find, and if the remaining portion of the string is not empty then it will return that as well.

So, to use a string modeled on Java's documentation example, "foo:bar:foo":

if you search for ":" you will get {"foo", "bar", "foo"}

but if you search for "o" you will get {"f","",":bar:f",""}

if you search for "a" in "aaa" you will get { "", "", "" }
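
To see where those pieces come from, here is a small trace of the matcher positions, written as a sketch (MatchTrace and trace are hypothetical names). Each match contributes the text between the end of the previous match and its own start, which is why two adjacent matches produce an empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchTrace {
    // Records, for every match, its [start, end) range and the text since the previous match,
    // followed by one line for whatever is left after the last match.
    static List<String> trace(String input, String regex) {
        List<String> lines = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int index = 0; // end of the previous match
        while (m.find()) {
            String piece = input.substring(index, m.start());
            lines.add("match [" + m.start() + "," + m.end() + ") -> piece \"" + piece + "\"");
            index = m.end();
        }
        lines.add("remainder \"" + input.substring(index) + "\"");
        return lines;
    }

    public static void main(String[] args) {
        for (String line : trace("foo:bar:foo", "o")) {
            System.out.println(line);
        }
    }
}
```

For "foo:bar:foo" and "o" the matches sit at positions 1, 2, 9 and 10, and the remainder after the last match is empty, so it is the only candidate for being left out.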

Most of the time this is probably not important but for my specific needs it was.... might come up again some day:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringSpliter {
	/**
	 * Splits a string according to a regex.
	 * Although similar to <tt>String.split(regex, -1)</tt>, this utility function has slightly different behavior.
	 * It splits the string at each successive match of the pattern. Unlike the split function using a
	 * limit of -1, this function will not return the remaining portion of the string if it is empty. This means
	 * that splitting <tt>"aaa"</tt> on <tt>"a"</tt> yields three empty strings rather than four.<br>
	 *
	 * <p> The input <tt>"boo:and:foo"</tt>, for example, yields the following
	 * results with these parameters:
	 *
	 * <blockquote><table cellpadding=1 cellspacing=0
	 *			  summary="Split examples showing regex and result">
	 * <tr><th><P align="left"><i>Regex&nbsp;&nbsp;&nbsp;&nbsp;</i></th>
	 *	 <th><P align="left"><i>Result&nbsp;&nbsp;&nbsp;&nbsp;</i></th></tr>
	 * <tr><td align=center>:</td>
	 *	 <td><tt>{ "boo", "and", "foo" }</tt></td></tr>
	 * <tr><td align=center>o</td>
	 *	 <td><tt>{ "b", "", ":and:f", "" }</tt></td></tr>
	 * </table></blockquote>
	 *
	 * @param in the string to split
	 * @param regex the delimiting regular expression
	 * @return the text found before each match, followed by the remainder if it is not empty
	 */
	public static String[] splitString(String in, String regex) {
		return splitString(in, Pattern.compile(regex));
	}
	
	/**
	 * Splits a string according to a precompiled pattern.
	 * Although similar to <tt>Pattern.split(input, -1)</tt>, this utility function has slightly different behavior.
	 * It splits the string at each successive match of the pattern. Unlike the split function using a
	 * limit of -1, this function will not return the remaining portion of the string if it is empty. This means
	 * that splitting <tt>"aaa"</tt> on <tt>"a"</tt> yields three empty strings rather than four.<br>
	 *
	 * <p> The input <tt>"boo:and:foo"</tt>, for example, yields the following
	 * results with these parameters:
	 *
	 * <blockquote><table cellpadding=1 cellspacing=0
	 *			  summary="Split examples showing regex and result">
	 * <tr><th><P align="left"><i>Regex&nbsp;&nbsp;&nbsp;&nbsp;</i></th>
	 *	 <th><P align="left"><i>Result&nbsp;&nbsp;&nbsp;&nbsp;</i></th></tr>
	 * <tr><td align=center>:</td>
	 *	 <td><tt>{ "boo", "and", "foo" }</tt></td></tr>
	 * <tr><td align=center>o</td>
	 *	 <td><tt>{ "b", "", ":and:f", "" }</tt></td></tr>
	 * </table></blockquote>
	 *
	 * @param in the string to split
	 * @param regex the compiled delimiting pattern
	 * @return the text found before each match, followed by the remainder if it is not empty
	 */
	public static String[] splitString(String in, Pattern regex) {
		ArrayList<String> matches = new ArrayList<String>(); //used to hold results as they are found
		int index = 0; //used to keep track of the end of the last match found.
		Matcher m = regex.matcher(in);
		while (m.find()) {
			//get everything from end of last match up to (but not including) the match...
			String match = in.substring(index, m.start());
			matches.add(match);
			index = m.end();
		}
		String last = in.substring(index);
		//if the remaining text is not empty then we will add it to the list...
		if (last.length() != 0) { matches.add(last); }
		//the original referred to an undeclared "results" variable here; the list is named matches
		return matches.toArray(new String[matches.size()]);
	}

}


Note: this is essentially the one from the Java API; I just removed all of the limit checking and added the last check to see if the final result should be added.

#6 mostyfriedman

Re: String Split

Posted 27 April 2009 - 04:04 PM

Thanks for sharing this, Nick. Java can be tricky sometimes... btw you rock, dude (Y)

#7 pbl

  • There is nothing you can't do with a JTable

Reputation: 8328
  • Posts: 31,857
  • Joined: 06-March 08

Re: String Split

Posted 27 April 2009 - 09:13 PM

public class NickDMax {

	public static void main(String[] args) {
		// extract all strings delimited by "a"
		String str = "aa";
		String[] lines = str.split("a");
		// every piece is empty and gets discarded: should print 0
		System.out.println(lines.length);

		str = " aa";
		lines = str.split("a");
		// there is 1: " " (the trailing empty strings are discarded)
		System.out.println(lines.length);

		str = "a a";
		lines = str.split("a");
		// there are 2: "" and " " (the trailing empty string is discarded)
		System.out.println(lines.length);

		str = "aa ";
		lines = str.split("a");
		// there are 3: "", "" and " " (nothing trailing to discard)
		System.out.println(lines.length);
	}
}


// you are right, it is not obvious at all... split is badly described
// better to use a StringTokenizer... I wrote millions of lines of Java... never used the String.split() method
// and had to go back to the API to answer your question
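
For comparison, here is a rough sketch of what StringTokenizer reports for the same inputs (TokenizerVsSplit and tokenCount are hypothetical names). StringTokenizer never returns empty tokens, so runs of delimiters collapse and a blank line between two newlines is not reported at all.

```java
import java.util.StringTokenizer;

class TokenizerVsSplit {
    // Number of tokens StringTokenizer reports when splitting on the given delimiter characters.
    static int tokenCount(String input, String delims) {
        return new StringTokenizer(input, delims).countTokens();
    }

    public static void main(String[] args) {
        String[] inputs = { "aa", " aa", "a a", "aa " };
        for (String input : inputs) {
            System.out.println("\"" + input + "\": split -> " + input.split("a").length
                    + ", StringTokenizer -> " + tokenCount(input, "a"));
        }
        // A blank line between two newlines produces no token.
        System.out.println("blank line case -> " + tokenCount("hello\n\nworld", "\n"));
    }
}
```

Whether that is acceptable depends on the use: for the line-splitting case described earlier in the thread, where empty lines matter, the tokenizer drops exactly the pieces that were wanted.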
