Mixing languages. Now you have two problems.


31 Replies - 3098 Views - Last Post: 15 April 2013 - 08:59 PM

#1 Skydiver

Posted 10 April 2013 - 07:36 AM

Quote

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
- Jamie Zawinski


The quote can be taken out of context at times, but Jeff's blog later reveals that it's the overuse of regular expressions that is the problem, and not necessarily the use of regular expressions.

Anyway, I started this thread because I was responding to another thread in the C# forum, pointing out that they were using SQL within C#.

At times I also find myself doing something like:
var cmd = new SqlCommand("SELECT Id, Name FROM Employee WHERE Id=@id");


or
var node = nodes.SelectSingleNode(String.Format("//Employee[id='{0}']", id));


or
writer.Write("<tr><td>{0}</td></tr>", id);


and not get the same amount of attention as
var id = Regex.Match(input, @"^.*;id=(?<id>\w+);.*$").Groups["id"].Value;
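For comparison, here is a rough Python sketch of the same four kinds of embedded code, using only the standard library. The Employee table, the XML layout and the sample record are invented for illustration. ElementTree only understands a limited subset of XPath (relative paths such as `.//Employee`), and Python spells named groups `(?P<name>...)` rather than .NET's `(?<name>...)`. In every case the guest language lives in a string while the data is bound or escaped separately.

```python
import html
import re
import sqlite3
import xml.etree.ElementTree as ET

# SQL: the query is a string in another language; the id is bound via '?'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Employee (Id, Name) VALUES (?, ?)", (42, "Ada"))
row = conn.execute("SELECT Id, Name FROM Employee WHERE Id = ?", (42,)).fetchone()

# XPath (ElementTree subset): an <Employee> whose <id> child has the text '42'.
doc = ET.fromstring("<Staff><Employee><id>42</id><name>Ada</name></Employee></Staff>")
node = doc.find(".//Employee[id='42']")

# HTML: markup assembled as a string, with the data escaped before insertion.
cell = "<tr><td>{0}</td></tr>".format(html.escape(str(row[0])))

# Regex: pull the id field out of a ';'-delimited record.
match = re.match(r"^.*;id=(?P<id>\w+);.*$", "name=Ada;id=42;dept=R&D")
extracted = match.group("id") if match else None
```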



Somehow it seems that mixing in a regular expression is bad and catches people's eye. But mix in SQL, XPath, HTML, XML, or LINQ and people don't even blink. Heck, throwing in some asm blocks or HLSL doesn't seem to faze some people either.

To me it's still a context shift when you have to jump into another language. It also demands the same level of expertise in that other language.

Is it just because the quote above has become a meme? Or is there something else in play here, where mixing in some languages is okay but others are not? What are the criteria for what is okay to mix in and what is not?


#2 modi123_1

Posted 10 April 2013 - 08:08 AM

So the thrust is why do peeps hate on regex so bad?

Just because I want to be the first to say it: regular expressions are not a programming language. Whew. Now that's out of the way...

There's quite a bit about what is in vogue and what isn't, right? Hell, if I thought I could sneak it through the code reviews I would have mixed Lua into some of my apps just to add some extensible flavoring.

Me, I know enough regex to get by, and when a new project requires some sort of super duper regex it tends to moderately annoy me. Much like using LINQ (though for different reasons). Some of my coworkers avoid regex like the plague and tend to get someone else on the team to do that work. So it's a comfort zone issue.

Me - I have an issue with the brittle nature of regex (well, when applied to too large a scope). People tend to see it as 'all or nothing', but if they do a bit of string manipulation up front they can keep it limber.
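A minimal Python sketch of that idea, assuming a made-up `key=value;key=value` record format like the one in the first post: plain string operations do the splitting, and the regex is left with one small job. The function names are hypothetical.

```python
import re

# The regex now only has to validate one small token.
ID_PATTERN = re.compile(r"\w+")


def parse_fields(record: str) -> dict:
    """Split a 'key=value;key=value' record using plain string operations."""
    fields = {}
    for part in record.split(";"):
        if not part:
            continue  # tolerate empty segments, e.g. a trailing ';'
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def extract_id(record: str):
    """Return the 'id' field if present and made of word characters, else None."""
    value = parse_fields(record).get("id")
    if value is not None and ID_PATTERN.fullmatch(value):
        return value
    return None
```

Unlike the single `^.*;id=(?<id>\w+);.*$` pattern, this version also finds `id` when it is the first or last field, because it no longer depends on a `;` on both sides.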

#3 jon.kiparsky

Posted 10 April 2013 - 08:15 AM

Mixing languages seems quite popular in web development - for example, I'm working on a project where we're constantly echoing lines of HTML and javascript out of a PHP call. Of course, this is a file that might also just output HTML in the usual way. Needless to say, there is also SQL mixed in (including one PHP function which is over 100 lines, all of it devoted to constructing one query) and a little ASP here and there for flavor.

As you can imagine, this creates all sorts of problems - for example, just maintaining reasonable formatting of the source is completely impossible now!

I can't say I see a lot of this in "regular" programming - usually people are pretty good about encapsulating this sort of thing so the overall flow is in the "host" language (for example, ORMs keep you from seeing the SQL).
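As a small illustration of that kind of encapsulation (not an ORM, just a hand-written wrapper), here is a Python sketch using the standard `sqlite3` module. The `EmployeeRepository` class and its method names are hypothetical; the point is that the SQL strings live in one place and callers only see ordinary method calls.

```python
import sqlite3
from typing import Optional


class EmployeeRepository:
    """Hypothetical wrapper: all SQL text stays inside this class."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS Employee (Id INTEGER PRIMARY KEY, Name TEXT)"
        )

    def add(self, emp_id: int, name: str) -> None:
        # Values are bound through '?' placeholders, never formatted into the SQL.
        self._conn.execute(
            "INSERT INTO Employee (Id, Name) VALUES (?, ?)", (emp_id, name)
        )

    def name_of(self, emp_id: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT Name FROM Employee WHERE Id = ?", (emp_id,)
        ).fetchone()
        return row[0] if row else None


# The calling code reads as plain Python; no SQL is visible here.
repo = EmployeeRepository(sqlite3.connect(":memory:"))
repo.add(42, "Ada")
```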

A related concern, for me, is the rather exuberant overloading allowed in Scala. I can see this leading to all sorts of weirdness, where you're almost learning a new syntax to use someone's library. I haven't done enough Scala to know if this is actually a problem, but I can see how it could become problematic.

#4 cfoley

Posted 10 April 2013 - 08:34 AM

Regex gives you a choice: write an unreadable gibberish of symbols that gets the job done or write 10-1000 lines of Java/C#/c++/etc to do the same thing.

I think a lot of developers are like me. I can write simple regex and know where to look when I need something obscure, but once a regex is written it is almost indecipherable. As modi123_1 recommends, I often do a bit of string manipulation before using a simple regex. It keeps the line count down and the gibberish to a minimum. Composed regex can also help:

http://martinfowler....posedRegex.html
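The link above is truncated, so the following is only a generic Python sketch of the composed-regex idea, not the example from that article: small named pieces are assembled into the full pattern. The log-line format and the piece names are invented for illustration.

```python
import re

# Named building blocks (invented log format: "<date> <level> <message>").
DATE = r"(?P<date>\d{4}-\d{2}-\d{2})"
LEVEL = r"(?P<level>INFO|WARN|ERROR)"
MESSAGE = r"(?P<message>.*)"
GAP = r"\s+"

# The full pattern is assembled from the pieces instead of written in one go.
LOG_LINE = re.compile("^" + GAP.join([DATE, LEVEL, MESSAGE]) + "$")


def parse_log_line(line: str):
    """Return the named fields as a dict, or None if the line does not match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None
```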

#5 jon.kiparsky

Posted 10 April 2013 - 08:43 AM

Cool. I had no idea that "Composed regex" was a thing, but yeah, that's exactly the sort of thing I'm talking about in terms of encapsulating "guest language" code, so your overall flow is in the "host language".

#6 xclite

Posted 10 April 2013 - 09:55 AM

I get uneasy when I'm writing another language using the string primitive in my current language. It means I don't get any sort of checking and it usually means I'm counting on somebody updating the string whenever the thing that string manipulates changes.

Regexp are a constant target of this wariness because they're very dense - it makes them powerful but it also makes it easy to forget what they do.

Whenever I write a regexp, I try to write it in situations where the target they're matching against is pretty solidified. I also make sure I comment and explain what it's supposed to do.
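One way to keep that explanation right next to the pattern is Python's `re.VERBOSE` flag, which ignores unescaped whitespace in the pattern and treats `#` as the start of a comment. The date pattern below is just an example. Compiling it at module level also means a typo in the pattern raises `re.error` as soon as the module is imported, rather than on first use.

```python
import re

ISO_DATE = re.compile(
    r"""
    ^
    (?P<year>\d{4})                 # four-digit year
    -
    (?P<month>0[1-9]|1[0-2])        # month 01-12
    -
    (?P<day>0[1-9]|[12]\d|3[01])    # day 01-31 (month length is NOT checked)
    $
    """,
    re.VERBOSE,
)
```

As the comment says, this still accepts impossible dates such as `2013-02-31`; it documents the format, it does not validate the calendar.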

Composed regex seems interesting.

#7 Skydiver

Posted 10 April 2013 - 08:27 PM

xclite, on 10 April 2013 - 12:55 PM, said:

Regexp are a constant target of this wariness because they're very dense - it makes them powerful but it also makes it easy to forget what they do.

+1.

Yet, XPath queries also tend to have the same complexity, density, and odd nuances, but they seem to be preferred over DOM navigation, or SAX style callbacks.
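To make the comparison concrete, here is a Python sketch using the standard `xml.etree.ElementTree` module, which supports a limited XPath subset. Both lookups find the same `<Employee>`; the document layout is made up, and `find_employee` is a hypothetical helper standing in for hand-written DOM navigation.

```python
import xml.etree.ElementTree as ET

XML = (
    "<Staff>"
    "<Employee><id>7</id><name>Grace</name></Employee>"
    "<Employee><id>42</id><name>Ada</name></Employee>"
    "</Staff>"
)
root = ET.fromstring(XML)

# XPath style: one dense expression.
by_xpath = root.find(".//Employee[id='42']")


# Navigation style: the same lookup spelled out step by step.
def find_employee(parent, emp_id):
    for employee in parent.iter("Employee"):
        id_elem = employee.find("id")
        if id_elem is not None and id_elem.text == emp_id:
            return employee
    return None


by_navigation = find_employee(root, "42")
```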

#8 AdamSpeight2008

Posted 11 April 2013 - 04:12 AM

The issue with RegEx is that people think too simply, and forget about the edge cases and the subtle complexities of the domain they are trying to capture.

This is the RegEx for checking the validity of an RFC 822-compliant email address (Perl regex):
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
 \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
 \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
 \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)




Don't get me started on recursive balanced tag matching attempts in RegEx.

Some programmers try to use it to parse code, especially HTML, when it would be better to write a parser, which is what they actually need.
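As a sketch of the parser route for the balanced-tag case, here is a small checker built on Python's `html.parser.HTMLParser` that keeps open tags on a stack instead of trying to match nesting with a regex. It assumes strict, XHTML-style nesting: real HTML allows some end tags (such as `</p>` or `</li>`) to be omitted, which this sketch would report as unbalanced. The set of void elements is deliberately incomplete, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (an incomplete list, for illustration).
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and flags any close tag that does not match."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.ok = False  # closing something that is not the innermost open tag


def is_balanced(markup: str) -> bool:
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    # Balanced means no mismatched closes and nothing left open.
    return checker.ok and not checker.stack
```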

Code syntax highlighters are a good example of this problem.
Want a challenge? Try correctly highlighting VB.NET strings that contain a " inside the string quotation.
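For reference, VB.NET writes a quote inside a string literal as two quotes (`""`), and a `'` outside a string starts a comment. A hand-written scanner handles this in a few lines. The following Python sketch only looks at a single line and ignores `REM` comments, so treat it as a starting point rather than a full VB.NET lexer; `vb_string_literals` is a hypothetical name.

```python
from typing import List, Tuple


def vb_string_literals(line: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of VB.NET string literals in one line of code.

    Inside a string, a doubled quote "" stands for one literal quote.
    Outside any string, a ' starts a comment that runs to the end of the line.
    """
    spans = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch == "'":
            break  # the rest of the line is a comment
        if ch == '"':
            start = i
            i += 1
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        i += 2  # escaped quote, still inside the string
                        continue
                    i += 1  # closing quote
                    break
                i += 1
            spans.append((start, i))  # an unterminated string runs to end of line
            continue
        i += 1
    return spans
```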

This post has been edited by AdamSpeight2008: 11 April 2013 - 04:15 AM


#9 andrewsw

Posted 11 April 2013 - 07:40 AM

Regexper is useful for visualizing regex, and kinda cool!


I would be a little nervous of that composed regex approach. I think it requires quite a bit of confidence to be able to split regex like that, and to be certain that when re-combined the full expression remains valid. [Note: not every regex variant supports in-line comments.]

I prefer to just accept that regex can be complex, but can also be extremely useful. I would rather precede them with a few lines of comment describing the pattern they are trying to match, and any exceptions that I had to account for. I think it is pointless trying to describe them in detail because, if they ever needed revising, I know I will have to start from scratch anyway :)

This post has been edited by andrewsw: 11 April 2013 - 07:45 AM


#10 jon.kiparsky

Posted 11 April 2013 - 07:53 AM

AdamSpeight2008, on 11 April 2013 - 06:12 AM, said:

The issue with RegEx is that people think too simply, and forget about the edge cases and the subtle complexities of the domain they are trying to capture.

This is the RegEx for checking the validity of an RFC 822-compliant email address (Perl regex):
[snip]



Looks like perfectly ordinary perl to me. :)

Quote

Don't get me started on recursive balanced tag matching attempts in RegEx.

Some programmers try to use it to parse code, especially HTML, when it would be better to write a parser, which is what they actually need.


This is just another example of the Turing Trap - the desire to write a sudoku solver in SQL, or to use HTML5/CSS to parse python, or to play tic-tac-toe in assembly, just because you can. This can be cute, but it's never good code. (it might be good for other purposes - showing off, or explaining something about the language, or just having fun - but it's not good code)
I don't think this is a fault in regex, or SQL, or HTML5, or assembly. It's just a case of programmers using the wrong tool for the job.

Quote

I think it is pointless trying to describe them in detail because, if they ever needed revising, I know I will have to start from scratch anyway


Maintainability is a necessary condition for good code - if you can't maintain it and update it as requirements change, it's crap and should be scrapped now, while it's still a potential problem. (and before it's an actual problem) The regexen you describe, by this standard, are crap and should be replaced with something that doesn't suck. Again, you're clearly using the wrong tool for the job.

#11 andrewsw

Posted 11 April 2013 - 08:06 AM

Just to clarify, when I say that I would "start from scratch" I don't mean that I would delete my existing regex, just that I would need to read through it again before I could start to modify it. A few comment lines alongside the regex help with this process.

#12 jon.kiparsky

Posted 11 April 2013 - 08:11 AM

Unfortunately, most regex just isn't that maintainable. Generally, it's great for stuff that's more complex than simple grepping or equality, and less complex than, say, validating an email address. :)

Now I'm starting to wonder what it would take to make regex maintainable - that is, what's the minimal set of changes you'd have to make to existing regex to allow a programmer to express arbitrarily complex expressions in a reasonable and maintainable fashion? Would this be even possible?

#13 andrewsw

Posted 11 April 2013 - 08:42 AM

I suppose a class could be constructed, something like:

rgx.AddWord("something", separated=true)    // \bsomething\b
rgx.AddDigits(0,3)


but as soon as we start to look at groups, etc., it becomes complicated.
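A minimal Python sketch of such a builder. The class and method names (`RegexBuilder`, `add_word`, `add_digits`, `add_whitespace`, `build`) are hypothetical, loosely following the pseudocode above; each method appends one piece and escapes literal text with `re.escape`.

```python
import re


class RegexBuilder:
    """Hypothetical fluent builder: each method appends one pattern piece."""

    def __init__(self) -> None:
        self._parts = []

    def add_word(self, word: str, separated: bool = False) -> "RegexBuilder":
        piece = re.escape(word)  # literal text, so metacharacters are escaped
        if separated:
            piece = r"\b" + piece + r"\b"  # whole word only
        self._parts.append(piece)
        return self

    def add_digits(self, min_count: int, max_count: int) -> "RegexBuilder":
        self._parts.append(r"\d{%d,%d}" % (min_count, max_count))
        return self

    def add_whitespace(self) -> "RegexBuilder":
        self._parts.append(r"\s+")
        return self

    def build(self) -> "re.Pattern":
        return re.compile("".join(self._parts))


# Example: the word "room", some whitespace, then one to three digits.
pattern = (
    RegexBuilder()
    .add_word("room", separated=True)
    .add_whitespace()
    .add_digits(1, 3)
    .build()
)
```

Groups, alternation and backreferences are where this gets awkward, as noted above: each needs its own method and some way to nest builders.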

I think it is a Catch-22: if the regex could be built with a simple class then the expression itself probably isn't that complicated - in which case..

Didn't someone once try to create a natural language version of regex (I vaguely recall)?

#14 cfoley

Posted 11 April 2013 - 08:44 AM

I think that allowing regex functions would be a great help. Maybe:

def between(:a, :b) = /:a[^:b]+/
def htmlTag(:a) = /:between(/<:a/ />/)/
export def anchorTag() = /:htmlTag(/a/)/
export def paragraphTag() = /:htmlTag(/p/)/
export def imageTag() = /:htmlTag(/img/)/



You could easily build up a regex file that exports useful, readable domain specific regexes. It would also be simple to write a validator for it. The big problem is that languages would have to explicitly support it or third party libraries would have to be imported.
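A rough approximation of these regex functions, using plain Python functions that return pattern strings. Two assumptions differ from the pseudocode: `between` also consumes the closing delimiter and uses `*` instead of `+`, so a bare `<a>` matches; and `end` must be a single character, because it is reused inside a `[^...]` class.

```python
import re


def between(start: str, end: str) -> str:
    """`start`, then any run of characters other than `end`, then `end`.

    Assumes `end` is a single character, since it is reused inside [^...].
    """
    return re.escape(start) + "[^" + re.escape(end) + "]*" + re.escape(end)


def html_tag(name: str) -> str:
    return between("<" + name, ">")


# The "exported" patterns, compiled once so a malformed piece fails immediately.
ANCHOR_TAG = re.compile(html_tag("a"))
PARAGRAPH_TAG = re.compile(html_tag("p"))
IMAGE_TAG = re.compile(html_tag("img"))
```

Because the pieces are ordinary functions, they can be imported, reused and unit-tested like any other code.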

#15 cfoley

Posted 11 April 2013 - 10:14 AM

I wonder if I could be onto something. That last syntax is a bit clunky. Here is an attempt at cleaning it up a bit and adding assertions:

export anchorTag = htmlTag(a)
htmlTag name = between(<name >)
between start end = start[^end]+

assert-match anchorTag <a>
assert-match anchorTag "<a href="http://dreamincode.net">"
assert-no-match anchorTag <>
assert-no-match anchorTag <p>
assert-no-match anchorTag </a>
assert-index [3 10] anchorTag <p><a></a><a></a></p>
assert-split ["<p>" "</a>" "</a></p>"] anchorTag <p><a></a><a></a></p>
assert-find ["<a>" "<a>"] anchorTag <p><a></a><a></a></p>

export paragraphTag = htmlTag(p)

export imageTag = htmlTag(img)




So you get readability, testing (which doubles as good documentation), validation and modularity. Add in comments and imports and it could be really good. I'll have to think about some of the more complex features of regex. This might be a cool and useful side project. Anyone know if something like this already exists?
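Those assertions map fairly directly onto ordinary Python checks. In this sketch the anchor pattern is written out as `<a[^>]*>` (the shape `htmlTag(a)` produces if `between` includes the closing `>`), and each of cfoley's `assert-*` lines becomes one entry in a table of named checks. `CHECKS` and `failed_checks` are hypothetical names.

```python
import re

ANCHOR_TAG = re.compile(r"<a[^>]*>")  # htmlTag(a), assuming between() keeps the '>'
SAMPLE = "<p><a></a><a></a></p>"

# Each assert-* line from the post, as (description, result) pairs.
CHECKS = [
    ("match <a>", ANCHOR_TAG.fullmatch("<a>") is not None),
    ("match anchor with href",
     ANCHOR_TAG.fullmatch('<a href="http://dreamincode.net">') is not None),
    ("no match <>", ANCHOR_TAG.search("<>") is None),
    ("no match <p>", ANCHOR_TAG.search("<p>") is None),
    ("no match </a>", ANCHOR_TAG.search("</a>") is None),
    ("index", [m.start() for m in ANCHOR_TAG.finditer(SAMPLE)] == [3, 10]),
    ("split", ANCHOR_TAG.split(SAMPLE) == ["<p>", "</a>", "</a></p>"]),
    ("find", ANCHOR_TAG.findall(SAMPLE) == ["<a>", "<a>"]),
]


def failed_checks():
    """Names of the checks that did not hold."""
    return [name for name, passed in CHECKS if not passed]
```

None of these checks notice that `<abbr>` also matches `<a[^>]*>`. With the tests kept in a table, adding that case (and then a `\b` after the tag name) is a one-line change.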
