7 Replies - 486 Views - Last Post: 22 February 2014 - 10:21 PM

#1 Ntwiles

RegEx to Detect URL

Posted 20 February 2014 - 01:31 PM

I'm trying to write a regular expression which will detect a URL. For my purposes, a URL needs to meet the following criteria:

-Begins with (http:// or https://) or (www.) or both.
-Then any number of the following characters: a-z 0-9 - _ .
-Then a top-level domain (.com,.net,.org,etc) I'll add common ones manually.
-Then any number of the following characters: a-z 0-9 - _ / + % ? & =. (may be missing a few here)

Here's the regex I've come up with:

(https?:\/\/|www\.)[a-z0-9\-_\.]*(\.com|\.net)[a-z0-9/\-_\.\+\%\&\?\=]*


This seems to do well, but JavaScript's string.match() function returns extra substrings that aren't of use to me. For example, matching against the string 'http://www.google.com' will return:

http://www.google.com
http://
.com



I assume I'm misunderstanding the (|) syntax somehow. Can someone explain to me what's going on here?

This post has been edited by Ntwiles: 20 February 2014 - 02:12 PM



Replies To: RegEx to Detect URL

#2 BetaWar

Re: RegEx to Detect URL

Posted 20 February 2014 - 01:40 PM

match() in JavaScript (when the pattern has no g flag) returns an array for the first match it finds. The item at index 0 (first in the array) is the entire matched substring, not the entire input string; after it you get the text captured by each parenthesized group of the pattern, in separate indexes.

Every capturing (...) group in a regular expression adds a new element to the returned array. The entire URL found is already at index 0; if you also wrap the whole pattern in a set of parentheses, it will be the second item in the array (index 1) as well.
((https?:\/\/|www\.)[a-z0-9\-_\.]*(\.com|\.net)[a-z0-9/\-_\.\+\%\&\?\=]*)



More info here.
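
To make the array layout concrete, here is a small sketch using the pattern from the first post, written as a regex literal with no flags. The variable names (firstPostPattern, firstPostResult) are just for illustration and are not from the thread.

```javascript
// Pattern from the first post, as a regex literal without the "g" flag.
const firstPostPattern = /(https?:\/\/|www\.)[a-z0-9\-_\.]*(\.com|\.net)[a-z0-9/\-_\.\+\%\&\?\=]*/;

const firstPostResult = "http://www.google.com".match(firstPostPattern);

console.log(firstPostResult[0]);     // whole match: "http://www.google.com"
console.log(firstPostResult[1]);     // first (...) group: "http://"
console.log(firstPostResult[2]);     // second (...) group: ".com"
console.log(firstPostResult.length); // 3
```

So the "extra" entries are the two parenthesized groups, listed in the order their opening parentheses appear.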

#3 Ntwiles

Re: RegEx to Detect URL

Posted 20 February 2014 - 02:10 PM

Huh. That seems a little arbitrary. If parentheses are just for grouping, I don't see why it would be helpful to return these sub-patterns. That makes sense though, thanks!

#4 andrewsw

Re: RegEx to Detect URL

Posted 20 February 2014 - 02:20 PM

If you only want the first (complete) match, which would be http://www.google.com from your first post, then you could use exec() and read element 0 of the array it returns.

If you only need to know if there is a match then you could use test(), which returns true or false.
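
A rough sketch of both calls, assuming the pattern from the first post and a made-up sample sentence (sampleText and the other variable names are only illustrative):

```javascript
const execPattern = /(https?:\/\/|www\.)[a-z0-9\-_\.]*(\.com|\.net)[a-z0-9/\-_\.\+\%\&\?\=]*/;
const sampleText = "Search at http://www.google.com today.";

// exec() returns null when nothing matches; otherwise it returns an array
// whose element 0 is the complete match (capture groups follow it).
const execResult = execPattern.exec(sampleText);
const firstUrl = execResult === null ? null : execResult[0];
console.log(firstUrl); // "http://www.google.com"

// test() only reports whether a match exists.
const hasUrl = execPattern.test(sampleText);
const hasNoUrl = execPattern.test("no links here");
console.log(hasUrl);   // true
console.log(hasNoUrl); // false
```

Note that exec() still includes the capture groups after element 0, just like match() without the g flag, so you have to pick out element 0 yourself.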

#5 Ntwiles

Re: RegEx to Detect URL

Posted 20 February 2014 - 02:51 PM

What I actually need is an array of every complete match, so match() I think is what I want.

I finally used the 'g' flag, and this seems to be my solution. This returns all matches, but also for some reason excludes matches of these sub-patterns created by parentheses:

/(https?:\/\/|www\.)[a-z0-9\-_\.]+(\.com|\.net|\.org)[a-z0-9/\-_\.\+\%\&\?\=]*/g


#6 andrewsw

Re: RegEx to Detect URL

Posted 20 February 2014 - 03:00 PM

It would help if you provided an example of a string you are searching, and the matches you want to obtain from it.

#7 Ntwiles

Re: RegEx to Detect URL

Posted 20 February 2014 - 03:06 PM

I think you misunderstood; my post above was the solution to my problem. But for posterity, that regex is meant to pull only the URLs out of any string:

input: This is an example string. It has a link to www.google.com in it, and another one to http://dreamincode.net, too.
output: www.google.com,http://dreamincode.net
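
For anyone wanting to reproduce that input/output pair, this is roughly how the call could look. The regex is the one with the g flag from the earlier post; the variable names and the `|| []` fallback (match() returns null when there are no matches at all) are additions for this sketch.

```javascript
// Regex with the "g" flag: match() then returns every complete match
// and leaves out the capture-group entries.
const allUrlsPattern = /(https?:\/\/|www\.)[a-z0-9\-_\.]+(\.com|\.net|\.org)[a-z0-9/\-_\.\+\%\&\?\=]*/g;

const input = "This is an example string. It has a link to www.google.com in it, and another one to http://dreamincode.net, too.";

// Fall back to an empty array because match() returns null on no match.
const urls = input.match(allUrlsPattern) || [];

console.log(urls.join(",")); // www.google.com,http://dreamincode.net
```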


#8 felgall

Re: RegEx to Detect URL

Posted 22 February 2014 - 10:21 PM

(...) in a regular expression is capturing by default. To make the groups non-capturing add ?: to the front of the group so that it reads (?:...) instead [where ... is the content of the group]
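
As an illustration, here is the pattern from the first post with both groups changed to (?:...). This is only a sketch; the variable names are made up for the example.

```javascript
// Same pattern as the first post, but with non-capturing groups.
const nonCapturingPattern = /(?:https?:\/\/|www\.)[a-z0-9\-_\.]*(?:\.com|\.net)[a-z0-9/\-_\.\+\%\&\?\=]*/;

const nonCapturingResult = "http://www.google.com".match(nonCapturingPattern);

console.log(nonCapturingResult.length); // 1
console.log(nonCapturingResult[0]);     // "http://www.google.com"
```

The alternation still works for grouping, but the array now holds only the complete match, even without the g flag.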