A Programmer's Casual Blog
-----

Regex sources

Well, there's nothing casual about regex. I swear it's gotta be the most confounding thing I've ever encountered, and I've seen FLCL before!

So I'm studying it, and I guess I need a place to keep all of my favorite information sources, so here's that place.

My Favorite Sources of regex information:
http://rubular.com/
http://www.rubyist.n...uby/regexp.html
http://www.regular-e...o/examples.html

Examples:
http://www.dreaminco...3&#entry1360163

Get multiple things at the same time:
http://www.pastie.org/1468271

Get multiple things, plus some expert tips!
http://www.dreaminco...se-into-a-hash/

In my country, it is considered rude to create a blog entry without at least some helpful, original information. Therefore I shall demonstrate the basics of using regex in a few of the languages I know:

C# (presently my favorite language)
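using System.Text.RegularExpressions;   // Regex, Match
using System.Windows.Forms;             // MessageBox

string myText = "Oh god it's good to be back in c#.  My IP address on this comp is 192.168.0.2.  Wanna hear a joke?  \nserver{ \nlocation { \nroot  I'm right behind you!\n  } \n} I didn't say it was a funny joke, I just said it was a joke!";

// Four groups of 1-3 digits separated by literal dots, bounded by word breaks.
Match match = Regex.Match(myText, @"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b");

// Only report a hit if the pattern actually matched.
if (match.Success)
    MessageBox.Show("Found a match for an IP Address!  " + match.Value);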



Ruby (presently my unfavorite, but going to be my new favorite)
my_text = "Oh god it's good to be back in c#.  My IP address on this comp is 192.168.0.2.  Wanna hear a joke?  \nserver{ \nlocation { \nroot  I'm right behind you!\n  } \n} I didn't say it was a funny joke, I just said it was a joke!"

"hello world" =~ /world/   # returns 6, because index 6 is where 'world' starts in the string.
#  =~ is kind of like str.IndexOf(), but for regular expressions, not just 'find'. It returns nil when nothing matches.

puts $~          # prints "world", because $~ holds the last match caught by the =~ operator!
                 # Kind of like magic, isn't it?
                 # The variable is thread-local and method-local, FYI.


my_text =~ /server *{(...).*}/m

$~.to_s   # returns "server{ \nlocation { \nroot  I'm right behind you!\n  } \n}"
          # Holy shit balls that's cool!
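
For reference, here's a minimal sketch of what =~ and $~ hand back, including the no-match case. The names (sample, ip_pattern and friends) are mine, just for illustration; the IP pattern is the same one used in the C# example.

```ruby
sample = "My IP address on this comp is 192.168.0.2."

# Same pattern as the C# example: four groups of 1-3 digits separated by dots.
ip_pattern = /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/

index = sample =~ ip_pattern   # index of the first match, or nil if nothing matched
match = $~                     # MatchData for the last match (same object as Regexp.last_match)

puts index                     # position where the IP address starts
puts match[0]                  # the matched text itself
puts match.pre_match           # everything in the string before the match

# String#[] with a regex returns the matched text directly (or nil).
ip = sample[ip_pattern]
puts ip

# When nothing matches, =~ returns nil and $~ is reset to nil as well.
miss = "no address here" =~ ip_pattern
miss_data = $~
puts miss.inspect
```

Grabbing $~ into a normal variable right away (like match above) is a reasonable habit, since the next regex operation will overwrite it.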




Let me explain that last one through and through. On the right side of the operator:

/server *{(...).*}/m

1) Everything between the '/' characters is part of the search expression.
2) The first '*' makes it look for zero or more occurrences of the preceding character (a space).
3) The '{' is just a literal brace character, the same way the word 'server' is just the word 'server'.
4) The '(...)' is there to say, "Hey! Capture three of any character" (that's what dot stands for, any character). I'm not sure how to make use of this in C#, but with Ruby it's really cool, and I'll elaborate after this.
5) The '.' after the group means to search for any... one... character.
6) The '*' paired with that dot lets us search for ZERO or MORE occurrences of the previously specified character type ('.', i.e. anything).
7) The 'm' flag at the end makes sure that the '.' matches newline characters as well as any other character.
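
To check the breakdown above, here's a small sketch that runs the pattern with and without the 'm' flag. The short config string is my own stand-in for the sample text, and I've escaped the braces (\{ and \}) just to make the literal match explicit; neither comes from the original example.

```ruby
# Hypothetical, shortened stand-in for the sample string.
config = "Wanna hear a joke?  \nserver{ \nlocation { \nroot  hi\n  } \n} The end."

with_m    = /server *\{(...).*\}/m   # with 'm', '.' also matches "\n"
without_m = /server *\{(...).*\}/    # without 'm', '.' refuses to match "\n"

m = with_m.match(config)
puts m[0].inspect   # whole match: from "server" through the last "}"
puts m[1].inspect   # the three characters captured by (...)

# Without 'm' the three dots can't get past the newline after "server{ ",
# so there is no match at all and we get nil back.
no_match = without_m.match(config)
puts no_match.inspect
```

Note that the three captured characters here are a space, a newline and the 'l' of 'location', which only works because of the 'm' flag.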



========================================

Ok, now that that is all explained, I should tell you something a little more advanced. Consider that same string we used in the last example.

We could also use server\s*\{(.*?}) to get what we want, and capture the good stuff! By that I mean, capture everything regarding the 'location' declaration in the sample string (e.g. "location {etc.}").


my_text = "Oh god it's good to be back in c#.  My IP address on this comp is 192.168.0.2.  Wanna hear a joke?  \nserver{ \nlocation { \nroot  I'm right behind you!\n  } \n} I didn't say it was a funny joke, I just said it was a joke!"

my_text =~ /server\s*\{(.*?})/m

$~.captures[0]     # => " \nlocation { \nroot  I'm right behind you!\n  }"



That's cool, right? Let's try the same thing without the new question mark symbol in our regex string.

my_text = "Oh god it's good to be back in c#.  My IP address on this comp is 192.168.0.2.  Wanna hear a joke?  \nserver{ \nlocation { \nroot  I'm right behind you!\n  } \n} I didn't say it was a funny joke, I just said it was a joke!"

my_text =~ /server\s*\{(.*})/m

$~.captures[0]     #   => " \nlocation { \nroot  I'm right behind you!\n  } \n}"




BLAMO! Did you see that? The question mark was modifying the '.*' unit. Yikes, so the dot means "ANY CHARACTER", the * means "zero or more of DOT", and the ? means "Don't be greedy, stop capturing as soon as possible." So when we ran the query without the question mark, it gobbled up the capture right past the first } and stopped at the second one. Cool distinction, definitely something worth remembering.
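
Here's the greedy-versus-lazy difference boiled down to a smaller string, as a quick sketch. The text string and variable names are made up for the demo, not taken from the examples above.

```ruby
# Hypothetical sample with two brace pairs.
text = "a{one}b{two}c"

greedy = text[/\{(.*)\}/, 1]    # .* runs to the LAST "}" it can reach
lazy   = text[/\{(.*?)\}/, 1]   # .*? stops at the FIRST "}" it can reach

puts greedy   # one}b{two
puts lazy     # one

# scan with the lazy version picks up every brace pair separately.
all = text.scan(/\{(.*?)\}/).flatten
puts all.inspect   # ["one", "two"]
```

The second argument to String#[] picks which capture group to return, so there's no need to go through $~ when you only want one piece.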

1 Comment On This Entry

NickDMax 

11 June 2011 - 09:51 AM
One of the major requirements for an editor for me is to have regex search and replace. As a programmer I think it is a must. It is not uncommon to have some data in the form of a table or list that you need to convert into some usable format. One could save the data to a file and then write a program that reads the file and formats the data... or one could just use Search and replace with regex (or a tool like sed).