10 Replies - 2875 Views - Last Post: 09 November 2011 - 04:27 AM

#1 Mr. Ed

Easy way to remove/substitute from string?

Posted 08 November 2011 - 09:05 AM

I come from a background in other programming and scripting languages, where this is usually fairly straightforward. In Java, however, I'm having some difficulty doing what I consider a pretty trivial task.

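import java.util.regex.Matcher;
import java.util.regex.Pattern;

String url = "http://www.google.com";

Pattern pcleanurl = Pattern.compile("regex"); // "regex" is a placeholder for the real pattern
Matcher mcleanurl = pcleanurl.matcher(url);
url = mcleanurl.replaceAll(""); // replaceAll returns a new String, so keep the result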



How do I remove anything before and after Google? I understand sometimes there won't be http, sometimes there won't be www. But what would be the method of getting just the domain from the URL?

I did a little looking at Pattern and Matcher and found replaceAll. It seems a little out of the ordinary to have to perform matching/substitutions in this manner at all, when in some other languages it's built in.

my $url = "http://www.google.com";
$url =~ s/http:\/\///;
$url =~ s/www\.//;



The above is neither perfect nor complete, but it shows how simple it is to remove anything from a string.


Replies To: Easy way to remove/substitute from string?

#2 Fuzzyness

Posted 08 November 2011 - 09:14 AM

You can use three methods to get the domain:

indexOf()
lastIndexOf()
substring()

Pass the char '.' to indexOf(). This gives you the index of the first '.', the one that ends the http://www. prefix.

Pass the char '.' to lastIndexOf() as well. This returns the index of the last '.' in the URL, which is usually the one in front of com.

Use both results in substring(). For the start, use indexOf('.') + 1 so that you begin at the char after the first '.'. For the end, use lastIndexOf('.'). The end index of substring() is exclusive, so telling it to stop at the '.' gives you everything up to the char before the '.', but not the '.' itself.

So calling substring() with those two values on the URL www.google.com will return google.

Make sense?
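
The three calls described above can be sketched roughly as follows. The class and method names are made up for illustration, and the guard for inputs with fewer than two dots is an added assumption, not part of the description.

```java
class BetweenDots {
    // Returns the text between the first and the last '.' of the input,
    // or the input unchanged when it contains fewer than two dots.
    static String betweenFirstAndLastDot(String s) {
        int first = s.indexOf('.');
        int last = s.lastIndexOf('.');
        if (first < 0 || first == last) {
            return s;
        }
        // +1 skips the first dot; the end index is exclusive, so the last dot is left out
        return s.substring(first + 1, last);
    }

    public static void main(String[] args) {
        System.out.println(betweenFirstAndLastDot("www.google.com")); // prints google
    }
}
```

This only works when the last dot of the whole string is the one in front of the extension, so a URL with a path that contains a dot would need more care.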

#3 cfoley

Posted 08 November 2011 - 09:26 AM

How about:

String result = original.replaceAll("http://", "").replaceAll("www\\.", ""); // replaceAll takes a regex, so the dot is escaped


To clarify, Java is a small language by design. Most of its higher-level functionality comes from its class library, and regex is no exception. Unfortunately, the syntax of Java's regex library is a bit clumsy, but many of the common operations have a simple shortcut in the String class.
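
As a rough illustration of that last point, the sketch below does the same substitution twice: once through the String shortcut and once through Pattern and Matcher. The class name, method names and the exact prefix pattern are assumptions chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StripPrefix {
    // An optional "http://" followed by an optional "www." at the start of the input
    static final String PREFIX = "^(?:http://)?(?:www\\.)?";

    // One-liner using the shortcut on String
    static String viaString(String url) {
        return url.replaceFirst(PREFIX, "");
    }

    // The same substitution spelled out with Pattern and Matcher
    static String viaMatcher(String url) {
        Matcher m = Pattern.compile(PREFIX).matcher(url);
        return m.replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(viaString("http://www.google.com")); // prints google.com
        System.out.println(viaMatcher("www.google.com"));       // prints google.com
    }
}
```

Both versions return a new String and leave the original untouched.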

#4 Fuzzyness

Posted 08 November 2011 - 09:28 AM

That would still leave him with everything after the domain. Take this topic, for example:
http://www.dreaminco...te-from-string/

It would leave him with: dreamincode.net/forums/topic/254827-easy-way-to-removesubsitute-from-string/

He wants just the domain, which would be dreamincode. He can get that with:
String url = "http://www.dreamincode.net/forums/topic/254827-easy-way-to-removesubsitute-from-string/";
String domain = url.substring(url.indexOf('.'), url.lastIndexOf('.'));
System.out.println(domain);


Would print only dreamincode

#5 cfoley

Posted 08 November 2011 - 09:41 AM

Oh yes. Looks like I misread his question. :)

#6 askageek

Posted 08 November 2011 - 10:05 AM

Fuzzyness, on 08 November 2011 - 09:28 AM, said:

That would still leave him with everything after the domain. Take this topic, for example:
http://www.dreaminco...te-from-string/

It would leave him with: dreamincode.net/forums/topic/254827-easy-way-to-removesubsitute-from-string/

He wants just the domain, which would be dreamincode. He can get that with:
String url = "http://www.dreamincode.net/forums/topic/254827-easy-way-to-removesubsitute-from-string/";
String domain = url.substring(url.indexOf('.'), url.lastIndexOf('.'));
System.out.println(domain);


Would print only dreamincode


I don't think that's quite right. My output shows
.google

instead of
google

Also, when I tested the URL http://www.google.co...n/test/ajax.cgi, it showed
.google.com/cgi-bin/test/ajax

instead of just the domain
google
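
To see where output like that comes from, here is a minimal sketch that runs the quoted expression unchanged. The class name and the example.com URL are made up for illustration. The start index points at the first dot itself, so the dot is included, and lastIndexOf finds the dot in the file name at the end of the path rather than the one after the domain.

```java
class DotIndexDemo {
    // The expression from the quoted post, unchanged
    static String quoted(String url) {
        return url.substring(url.indexOf('.'), url.lastIndexOf('.'));
    }

    public static void main(String[] args) {
        // prints .google
        System.out.println(quoted("http://www.google.com"));
        // prints .example.com/cgi-bin/test/ajax
        System.out.println(quoted("http://www.example.com/cgi-bin/test/ajax.cgi"));
    }
}
```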



#7 Fuzzyness

Posted 08 November 2011 - 10:18 AM

So you change it to url.indexOf('.') + 1. It has been a while since I used it, my apologies.

You have to take it one step at a time in case there is a second dot later in the URL:
String url = "http://www.google.co...n/test/ajax.cgi";
// Drop everything up to and including the first '.', i.e. the http://www. prefix
String domain = url.substring(url.indexOf('.') + 1);
// Keep everything before the next '.'
domain = domain.substring(0, domain.indexOf('.'));



The reason for doing it in two steps is that the first substring gets rid of the http://www. prefix, and the second one reassigns the variable, keeping everything from the first letter up to the next dot.
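
Wrapped in a method, the two steps could look like the sketch below. The names are made up for illustration, and returning null when a dot is missing is an added assumption; without that check, substring throws a StringIndexOutOfBoundsException when indexOf returns -1.

```java
class TwoStepDomain {
    // Returns the text between the first and the second '.', or null when
    // the input does not contain two dots (for example "http://google.com").
    static String secondLabel(String url) {
        int firstDot = url.indexOf('.');
        if (firstDot < 0) {
            return null;
        }
        // Step 1: drop everything up to and including the first dot
        String rest = url.substring(firstDot + 1);
        int nextDot = rest.indexOf('.');
        if (nextDot < 0) {
            return null;
        }
        // Step 2: keep everything before the next dot
        return rest.substring(0, nextDot);
    }

    public static void main(String[] args) {
        // prints google
        System.out.println(secondLabel("http://www.google.com/cgi-bin/test/ajax.cgi"));
    }
}
```

The method still assumes a www. style prefix in front of the domain. A URL without one that has a dot in its path would return the wrong piece.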

#8 masijade

Posted 09 November 2011 - 01:22 AM

How about URL, getHost, and replaceFirst?

	// Needs: import java.net.URL;
	public static void main(String[] args) throws Exception {
		String urlString = "http://www.google.co...n/test/ajax.cgi";
		URL url = new URL(urlString);
		// Capture the second dot-separated label of the host (the part after "www.")
		System.out.println(url.getHost().replaceFirst("^[^\\.]+\\.([^\\.]+)\\..*$", "$1"));
	}



#9 cfoley

Posted 09 November 2011 - 02:27 AM

What about http://google.co.uk? These kinds of things are difficult to program.

#10 masijade

Posted 09 November 2011 - 03:14 AM

cfoley, on 09 November 2011 - 11:27 AM, said:

what about http://google.co.uk? These kinds of things are difficult to program.

If that was for me, my example would still work. It returns co, of course, because in that URL google is the hostname rather than the domain name. But when you have a URL object, as you do here, you can always open the connection and get the "real" URL back from it; the URL you gave gets redirected to www.google.co.uk.
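
A small check of how that replaceFirst pattern behaves on different host names. This sketch passes host strings in directly instead of going through URL.getHost, and the class and method names are made up for illustration.

```java
class HostLabel {
    // Same idea as the pattern above: skip the first label, capture the second
    static final String SECOND_LABEL = "^[^.]+\\.([^.]+)\\..*$";

    static String secondLabel(String host) {
        // When the pattern does not match, replaceFirst returns the input unchanged
        return host.replaceFirst(SECOND_LABEL, "$1");
    }

    public static void main(String[] args) {
        System.out.println(secondLabel("www.google.com")); // prints google
        System.out.println(secondLabel("google.co.uk"));   // prints co
        System.out.println(secondLabel("google.com"));     // prints google.com (no match)
    }
}
```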

This post has been edited by masijade: 09 November 2011 - 03:15 AM


#11 g00se

Posted 09 November 2011 - 04:27 AM

Quote

How do I remove anything before and after Google? I understand sometimes there won't be http, sometimes there won't be www. But what would be the method of getting just the domain from the URL?


Your question is really 'how do I capture the domain'. I would suggest the URI class, but it doesn't really like there being no protocol present. You could of course prepend one (http:) and try that, or you could try something like the following. You will need to OR in all the other domain extensions you're likely to encounter.


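	// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
	final String INTERNET_HOST_PATTERN = "(?:http://)?(?:www\\.)?(.*?(?:(\\.com)|(\\.co\\.uk)))";
	Pattern p = Pattern.compile(INTERNET_HOST_PATTERN);
	Matcher m = p.matcher(args[0]);
	if (m.find()) {
	    // Group 1 is the domain including its extension, e.g. google.com
	    System.out.println(m.group(1));
	}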
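
The other option mentioned above, prepending a protocol and letting URI do the parsing, might look like the sketch below. The class and method names are made up, and checking for "://" to decide whether a protocol is already present is a simplifying assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;

class HostFromUri {
    // Returns the host part of the input, prepending "http://" when no
    // protocol is present. Returns null when the input cannot be parsed
    // or no host can be identified.
    static String host(String input) {
        String withScheme = input.contains("://") ? input : "http://" + input;
        try {
            return new URI(withScheme).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(host("www.google.com/search")); // prints www.google.com
        System.out.println(host("http://google.co.uk"));   // prints google.co.uk
    }
}
```

This gives the full host, so the www. prefix and the extension still have to be removed afterwards, for example with the pattern approach shown above.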


