Page 1 of 1

Basic Regular Expressions

#1 supercorey

Posted 24 September 2011 - 11:34 PM

Basic Regular Expressions Tutorial
By: Supercorey

Hello, everybody. Today I'm going to be telling you how to use basic regular expressions, often abbreviated as "regex" or "regexp". Firstly, what is a regular expression? A regular expression is a pattern describing a certain amount of text. Regular expressions can aid in matching, parsing, validating, and otherwise checking or processing a string. They can save large amounts of time by condensing long algorithms into a single concise string. Most modern programming languages have support for utilizing regular expressions. Even though I mainly use Java, I'll try to provide application implementation examples in as many languages as possible.

To process a regular expression, you need a regular expression engine. Generally, the engine is built into whatever language, SDK, etc. you are using, so you won't have to access it directly. Not all regular expression engines are the same, but the most widely imitated one, and the closest thing to a standard, is the engine in Perl 5. Since it has earned that status, I'll base this tutorial around it.

Firstly, let's learn about the anatomy of a regular expression.

The simplest component of a regular expression is a literal character, which matches that same character.
Let's take the regular expression g. This would match the first occurrence of g in a string you compared it against. The string "Regular expressions rock" would match, while "The character isn't here" would not.
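
As a rough sketch of the example above in Java: the class name LiteralMatch and the helper contains are names made up for this illustration. Java's String.matches() only succeeds when the whole string matches, so the sketch uses Matcher.find() to look for the pattern anywhere in the input.

```java
import java.util.regex.Pattern;

class LiteralMatch {
    // Hypothetical helper: true if the pattern occurs anywhere in the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(contains("g", "Regular expressions rock")); // true
        System.out.println(contains("g", "The character isn't here")); // false
    }
}
```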

Not all characters can be used as literals by themselves. These are special characters. In the Perl 5 engine, there are 11 special characters: [\^$.|?*+(). If you want to use one of these as a literal, you need to escape it with a backslash. For example, to match ps waux | grep, you would have to use ps waux \| grep.
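
Here is a small Java sketch of escaping, assuming the same ps waux example. The class name EscapeSpecial and the helper contains are invented for this illustration. Note that inside a Java string literal the backslash itself has to be doubled, and that Pattern.quote() is a standard way to escape an entire literal at once.

```java
import java.util.regex.Pattern;

class EscapeSpecial {
    // Hypothetical helper: true if the pattern occurs anywhere in the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "ps waux | grep";

        // The regex is ps waux \| grep; the backslash is doubled in Java source.
        System.out.println(contains("ps waux \\| grep", text)); // true

        // Pattern.quote() turns the whole string into a literal pattern.
        System.out.println(contains(Pattern.quote("ps waux | grep"), text)); // true

        // Unescaped, | means "either side", so " grep" alone is enough to match.
        System.out.println(contains("ps waux | grep", "just grep")); // true
        System.out.println(contains("ps waux \\| grep", "just grep")); // false
    }
}
```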

Character sets allow you to match one of several characters. You place the characters you want to match inside square brackets. You can also use ranges with a hyphen inside a character set, such as 0-9, meaning 0, 1, 2, 3, ..., 9. Take the following regular expression: gr[ae]y. This would match either gray or grey. You could match any single digit from 0 through 9 with [0-9].
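
The character set examples above could be tried out in Java roughly like this. CharacterSets and contains are names chosen only for this sketch, and the test strings other than gray and grey are my own.

```java
import java.util.regex.Pattern;

class CharacterSets {
    // Hypothetical helper: true if the pattern occurs anywhere in the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // [ae] matches exactly one character: either a or e.
        System.out.println(contains("gr[ae]y", "gray")); // true
        System.out.println(contains("gr[ae]y", "grey")); // true
        System.out.println(contains("gr[ae]y", "groy")); // false

        // [0-9] matches any single digit.
        System.out.println(contains("[0-9]", "room 7"));    // true
        System.out.println(contains("[0-9]", "no digits")); // false
    }
}
```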

One of the most commonly used special characters is the dot. A . matches any single character except a newline (exactly which characters count as a newline depends on the engine and its options). Inside a character set, a dot is treated as a literal dot and does not need to be escaped. Be careful with the dot and don't use it as a way to be lazy, as it can match characters you didn't mean to match. Use character sets whenever possible.
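
A quick Java sketch of how the dot behaves, using patterns and inputs I picked for illustration (a.c and a[.]c are not from the tutorial itself). DotDemo and contains are hypothetical names. In Java the dot skips line terminators by default, which may differ in other engines or with other flags.

```java
import java.util.regex.Pattern;

class DotDemo {
    // Hypothetical helper: true if the pattern occurs anywhere in the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The dot stands for exactly one character of almost any kind.
        System.out.println(contains("a.c", "abc"));  // true
        System.out.println(contains("a.c", "a-c"));  // true
        System.out.println(contains("a.c", "ac"));   // false
        System.out.println(contains("a.c", "a\nc")); // false (no newline by default)

        // Inside a character set, the dot is just a literal dot.
        System.out.println(contains("a[.]c", "a.c")); // true
        System.out.println(contains("a[.]c", "abc")); // false
    }
}
```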

You can also match the beginning and end of a string using ^ and $ respectively. The regular expression ^word$ would match the string "word" but not "this word is cool".
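
To see the effect of the anchors, here is a minimal Java sketch comparing ^word$ with the unanchored word. The names AnchorDemo and contains are made up for this example.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: true if the pattern occurs anywhere in the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Anchored: the string must start and end with "word".
        System.out.println(contains("^word$", "word"));              // true
        System.out.println(contains("^word$", "this word is cool")); // false

        // Unanchored: "word" may appear anywhere.
        System.out.println(contains("word", "this word is cool"));   // true
    }
}
```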

By using parentheses, you can group part of an expression. This allows you to apply a quantifier, etc. to the entire group, not just to the individual parts.

By using the pipe/bar operator, you can tell the regex engine to match either the left side of it or the right side. Alternatives can also be strung together into three or more options, as in x|y|z, which would match x, y, or z, but not any combination thereof. You can also use it along with parentheses to alternate only part of a regular expression: i am feeling (happy|sad) would match either i am feeling happy or i am feeling sad.
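
The difference the parentheses make can be sketched in Java as follows. AlternationDemo and wholeMatch are illustrative names of my own, and the sketch uses Pattern.matches(), which requires the entire input to match, so that the scope of the | is easy to see.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical helper: true only if the whole input matches the pattern.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String grouped = "i am feeling (happy|sad)";
        System.out.println(wholeMatch(grouped, "i am feeling happy")); // true
        System.out.println(wholeMatch(grouped, "i am feeling sad"));   // true
        System.out.println(wholeMatch(grouped, "sad"));                // false

        // Without parentheses the | splits the entire expression in two:
        // "i am feeling happy" on the left, "sad" on the right.
        String ungrouped = "i am feeling happy|sad";
        System.out.println(wholeMatch(ungrouped, "sad"));              // true
        System.out.println(wholeMatch(ungrouped, "i am feeling sad")); // false
    }
}
```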

Finally, there are quantifiers. Quantifiers specify how many times a particular part of an expression is required to appear. The ? operator makes the preceding item optional: (wo)?man would match either man or woman. The * operator matches the preceding item zero or more times. The + operator matches the preceding item one or more times. You can also require the preceding item an exact number of times with {n}, where n is the number of repetitions (n should be at least 1). Finally, you can use the form {n,m} to specify a range of repetitions for the preceding item, where n is the lower bound (inclusive) and m is the upper bound (inclusive).
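
Each quantifier above can be checked with a short Java sketch like the one below. Apart from (wo)?man, the patterns and inputs are examples I chose, and QuantifierDemo and wholeMatch are hypothetical names. Pattern.matches() is used so that the whole input has to fit the pattern.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true only if the whole input matches the pattern.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // ? : the group (wo) is optional.
        System.out.println(wholeMatch("(wo)?man", "man"));     // true
        System.out.println(wholeMatch("(wo)?man", "woman"));   // true
        System.out.println(wholeMatch("(wo)?man", "wowoman")); // false

        // * : zero or more b's.  + : one or more b's.
        System.out.println(wholeMatch("ab*c", "ac"));  // true
        System.out.println(wholeMatch("ab+c", "ac"));  // false
        System.out.println(wholeMatch("ab+c", "abc")); // true

        // {n} : exactly n times.  {n,m} : between n and m times, inclusive.
        System.out.println(wholeMatch("a{3}", "aaa"));     // true
        System.out.println(wholeMatch("a{3}", "aa"));      // false
        System.out.println(wholeMatch("a{2,4}", "aaaa"));  // true
        System.out.println(wholeMatch("a{2,4}", "aaaaa")); // false
    }
}
```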

The way to invoke the regular expression engine often differs from one programming language to the next. Sometimes the actual syntax of the expression itself is different as well, because engines vary. Next, I will list some common ways to check strings against regular expressions in some common programming languages:

Java:
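matches() only returns true if the entire string matches the expression. To look for a match anywhere in the string, use find() on a Matcher instead.
stringToCheck.matches(regularExpression);
java.util.regex.Pattern.compile(regularExpression).matcher(stringToCheck).find();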



C#:
System.Text.RegularExpressions.Regex.IsMatch(stringToCheck,regularExpression);



PHP:
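The regular expression is passed as a string and must be wrapped in delimiters, such as "/expression/".
preg_match(regularExpression, stringToCheck);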



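JavaScript:
The regular expression is written in the form "/expression/" WITHOUT the quotes. It is not a string literal. match() returns the match details (or null if there is no match); for a simple true/false check, use test().
stringToCheck.match(regularExpression);
regularExpression.test(stringToCheck);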




So, with this, I hope you learned something from this tutorial. Feel free to leave feedback, suggestions, constructive criticism, etc. in the comments. Also, please don't reproduce this tutorial without giving me credit. Regular expressions can be a powerful, time-saving tool in programming if you understand them.

Replies To: Basic Regular Expressions

#2 Tayacan

Posted 25 September 2011 - 10:53 AM

Nice introduction to the topic :D

#3 fromTheSprawl

Posted 27 September 2011 - 05:38 PM

Cool! I know 0% about regex. Thanks for putting this up!