RegEx Url Escaping


15 Replies - 775 Views - Last Post: 02 May 2020 - 05:56 PM

#1 johnywhy

Posted 26 April 2020 - 02:40 PM

Which of these regex patterns is well-escaped?
Which are wrong?
Which are your most and least favorite, and why?

Here's the unescaped url:

Quote

https://www.google.com/url?q=


1: https:\/\/www\.google\.com\/url\?q=
2: https://www\.google\.com/url\?q=
3: https\:\/\/www\.google\.com\/url\?q\=


Please interpret the question as it might apply to any body of text, in preparation for pattern-matching, not just a url.
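One way to compare the three candidates is to compile each one and test it against the URL. This is a minimal sketch, assuming JavaScript (the language the later replies use). The names `candidates` and `compiles` are illustrative, and `String.raw` is used only so the backslashes reach the regex engine unchanged.

```javascript
// The unescaped text we want to match literally.
const url = 'https://www.google.com/url?q=';

// The three candidate patterns from the post, kept verbatim with String.raw.
const candidates = [
  String.raw`https:\/\/www\.google\.com\/url\?q=`,
  String.raw`https://www\.google\.com/url\?q=`,
  String.raw`https\:\/\/www\.google\.com\/url\?q\=`,
];

// Returns true if the pattern compiles with the given flags.
function compiles(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (err) {
    return false;
  }
}

// Without flags, all three compile and match the URL.
const legacyResults = candidates.map((p) => new RegExp(p).test(url));

// With the "u" (Unicode) flag, escaping a character that has no special
// meaning, such as ":" or "=", is a syntax error, so candidate 3 is rejected.
const unicodeResults = candidates.map((p) => compiles(p, 'u'));

console.log(legacyResults);  // [ true, true, true ]
console.log(unicodeResults); // [ true, true, false ]
```

So over-escaping is harmless in the default mode but not in Unicode mode, which is one argument against candidate 3 as a general habit.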

This post has been edited by johnywhy: 26 April 2020 - 04:30 PM



Replies To: RegEx Url Escaping

#2 Dormilich

Posted 26 April 2020 - 03:39 PM

johnywhy, on 26 April 2020 - 11:40 PM, said:

Which are your most and least favorite, and why?

Don't care as long as it works.

#3 ge∅

Posted 26 April 2020 - 04:21 PM

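Escaping = or : is not necessary, I think: they only have a special meaning inside groups/assertions, such as (?:...) and (?=...).
Escaping / is only necessary when you write a regex literal, where / delimits the pattern.
When you use the RegExp constructor, \ itself must be escaped, because the pattern is passed as a string and \ is also the string escape character.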

/https:\/\/www\.google\.com\/url\?q=/
new RegExp("https://www\\.google\\.com/url\\?q=")
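The two forms above can be checked side by side. This is a minimal sketch with illustrative variable names. The `String.raw` variant is not from the post. It is shown as one possible way to avoid doubling the backslashes.

```javascript
const url = 'https://www.google.com/url?q=';

// Regex literal: "/" delimits the pattern, so it has to be escaped.
const fromLiteral = /https:\/\/www\.google\.com\/url\?q=/;

// Constructor with an ordinary string: "\\." in the source code is the
// two-character sequence "\." by the time the regex engine sees it.
const fromString = new RegExp('https://www\\.google\\.com/url\\?q=');

// Constructor with String.raw: backslashes are passed through unchanged.
const fromRaw = new RegExp(String.raw`https://www\.google\.com/url\?q=`);

// A single backslash in a normal string is consumed by the string parser:
// "\." becomes "." (any character) and "\?" becomes "?" (optional "l").
const tooLoose = new RegExp('https://www\.google\.com/url\?q=');

console.log(fromLiteral.test(url), fromString.test(url), fromRaw.test(url)); // true true true
console.log(tooLoose.test(url));                           // false
console.log(tooLoose.test('https://wwwXgoogleYcom/urq=')); // true
```

The last two lines show why the doubling matters: the under-escaped pattern both misses the real URL and accepts text that merely resembles it.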



#4 johnywhy

Posted 26 April 2020 - 04:25 PM

Dormilich, on 26 April 2020 - 03:39 PM, said:

Don't care as long as it works.


I mean in terms of best practices. Some of these might work fine for this particular URL, but they might be inappropriate for other situations, or for things besides URLs.

This post has been edited by johnywhy: 26 April 2020 - 05:38 PM


#5 modi123_1

Posted 26 April 2020 - 04:38 PM

Quote

To me, it indicates an "i don't care" attitude about the quality of your programming.

Ease up on the personal attacks.

FYI, you have zero mention of best practices in your first post. Be careful about editing posts after people respond; it can skew the flow and responses.

#6 ge∅

Posted 26 April 2020 - 05:00 PM

Quite frankly, I've had the opinion for a while that the best practice for me would be to avoid regexps altogether and learn how to write parsers and how to discriminate situations in which they are not the best choice. I can't recall a situation where my mistake was an escape; it has always been a design mistake on my part.

#7 johnywhy

Posted 26 April 2020 - 05:21 PM

ge∅, on 26 April 2020 - 05:00 PM, said:

Quite frankly, I've had the opinion for a while that the best practice for me would be to avoid regexps altogether and learn how to write parsers and how to discriminate situations in which they are not the best choice.


I don't like regexp's because the syntax is cryptic. I always labor for too long to get my patterns right. I'm looking for an alternative, or simplified API front-end for regexp.

But pattern-matching and replacing are still needed.

When you say "write parsers", you mean a unique parser for every application? A general purpose tool? Or...?

Can you mention some situations in which regexp wouldn't be the best choice?

This post has been edited by johnywhy: 26 April 2020 - 05:26 PM


#8 johnywhy

Posted 26 April 2020 - 05:28 PM

ge∅, on 26 April 2020 - 05:00 PM, said:

I can't recall a situation where my mistake was an escape; it has always been a design mistake on my part.

in this case, i'm not checking my own escapes, but escapes written by others found mostly on stackoverflow and MDN. Just trying to pick one to run with for all my coding.

Here they are:

// Declared with const so it is not an implicit global.
const RegexEscape1 = function (str) {
    return str.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
};

function RegexEscape2(str) {
    // same output as MDN on a url
    return str.replace(/[.?+*^$|({[\\]/g, '\\$&');
}

function RegexEscape3(str) {
    // same output as 4 on a url
    // inserts a backslash before every non-word character
    return str.replace(/(?=\W)/g, '\\');
}

function RegexEscape4(str) {
    // same output as 3 on a url
    // escapes everything that is neither a word character nor whitespace
    return str.replace(/([^\w\d\s])/gi, '\\$1');
}

function RegexEscapeMDN(str) {
    // same output as 2 on a url
    return str.replace(/[.*+\-?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}


I prefer the least-verbose option, but that may miss some cases.

Others may err on the side of escaping things that don't need to be escaped (wondering if that will affect regexp performance).

Some complain that using a range is "less readable", because it doesn't explicitly list every character, but for me this cryptic regexp stuff is never going to be all that readable anyway.
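A round-trip check is one way to compare these helpers: escape a piece of text, compile the result, and see whether it still matches exactly that text. The sketch below is only an illustration. The `roundTrips` helper, the `samples` and the `results` object are assumptions, not part of any of the quoted sources. The five functions are copied from the post as arrow functions.

```javascript
// The five escape helpers from the post, collected in one object.
const escapers = {
  RegexEscape1: (s) => s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'),
  RegexEscape2: (s) => s.replace(/[.?+*^$|({[\\]/g, '\\$&'),
  RegexEscape3: (s) => s.replace(/(?=\W)/g, '\\'),
  RegexEscape4: (s) => s.replace(/([^\w\d\s])/gi, '\\$1'),
  RegexEscapeMDN: (s) => s.replace(/[.*+\-?^${}()|[\]\\]/g, '\\$&'),
};

// True if the escaped text compiles and matches exactly the original text.
function roundTrips(escape, text, flags = '') {
  try {
    return new RegExp('^' + escape(text) + '$', flags).test(text);
  } catch (err) {
    return false; // the escaped pattern was not valid regex syntax
  }
}

// A URL, plus a string with brackets, braces, a hyphen and other syntax characters.
const samples = ['https://www.google.com/url?q=', 'f(x) = [a-z]{2} | $1.00'];

const results = {};
for (const [name, escape] of Object.entries(escapers)) {
  results[name] = {
    legacy: samples.map((s) => roundTrips(escape, s)),
    unicode: samples.map((s) => roundTrips(escape, s, 'u')),
  };
}
console.log(results);
```

Two things show up. RegexEscape2 leaves ")" unescaped, so the second sample no longer compiles, even though the URL works. With the "u" flag, the helpers that escape characters such as ":" or a bare "-" produce patterns the engine rejects, so escaping more than necessary is not always harmless.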

This post has been edited by johnywhy: 26 April 2020 - 05:31 PM


#9 johnywhy

Posted 26 April 2020 - 05:34 PM

modi123_1, on 26 April 2020 - 04:38 PM, said:

Ease up on the personal attacks.

Ok, sorry, didn't mean it that way.
My apologies to @Dormilich
I revised my reply.

This post has been edited by johnywhy: 26 April 2020 - 05:38 PM


#10 ge∅

Posted 26 April 2020 - 05:56 PM

Hell yes...

I was using this library, math.js, to interpret matlab, but of course it doesn't work out of the box because of differences in conventions, syntax and missing features. One portion of that was the creation/selection/concatenation/filtering of (sub)matrices. I didn't think there would be so many ways to use the same goddamn syntax.

Off the top of my head (A, B, C are matrices; a, b, c are numbers; f are functions):
A(a,b,c,...) = B;
A(a:b,c:d,...) = B;
A = f(B(a,b,c,...));
A = f(B(a:b,c:d,...));
A(f(B)) = C;

Note that a, b, c, etc. can also be an arithmetic operation including a variable name. There is also this keyword, end, which is the matrix's length at a given dimension (dependent on its position in the expression).

So I wrote regexps for each case, I think. I escaped stuff (by this I mean temporarily replaced portions of the input by a placeholder to protect it from side effects... urhh), I had to mind the order in which each regexp had to be executed (urhhh....) and I wrote the functions I had transpiled the raw matlab to, because of course there was work to be done at runtime. Soon enough I realised there were many runtime checks I should really move to the "transpiler" for performance reasons, so I had to touch this terrible terrible code again (it was so worth it, but still...)

I wouldn't make the same mistake today, but clearly I should have tried to extend the parser of the library I was using. My biggest mistake was to underestimate the complexity and the dynamic aspect of the language.

Oh, and I don't remember them all, but at the end of the day my solution had limitations related to the use of variables, operator precedence and parentheses in some expressions, because it was impossible to discriminate between this and that case...
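Much of the difficulty described here comes from nesting. In an expression like A(f(B)) = C, the parenthesis that closes "A(" has to be paired with the right opener, and JavaScript regular expressions have no recursion construct for that. The sketch below shows the kind of depth tracking a hand-written parser does. `splitCall` is a hypothetical helper, not part of math.js, and it ignores strings, comments and matrix literals.

```javascript
// Splits "name(arg1, arg2, ...)" into its name and top-level arguments.
// Commas inside nested parentheses are left alone by tracking depth.
function splitCall(expr) {
  const open = expr.indexOf('(');
  if (open === -1 || !expr.endsWith(')')) {
    throw new SyntaxError('expected name(...) but got: ' + expr);
  }
  const name = expr.slice(0, open).trim();
  const args = [];
  let depth = 0;
  let start = open + 1; // index where the current argument begins

  for (let i = open; i < expr.length; i++) {
    const ch = expr[i];
    if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      depth--;
      if (depth === 0) {
        if (i !== expr.length - 1) {
          throw new SyntaxError('unexpected text after call in: ' + expr);
        }
        const last = expr.slice(start, i).trim();
        // Do not report an argument for an empty call such as "f()".
        if (last !== '' || args.length > 0) args.push(last);
      }
    } else if (ch === ',' && depth === 1) {
      // A comma at depth 1 separates two top-level arguments.
      args.push(expr.slice(start, i).trim());
      start = i + 1;
    }
  }
  if (depth !== 0) {
    throw new SyntaxError('unbalanced parentheses in: ' + expr);
  }
  return { name, args };
}

console.log(splitCall('A(a:b, f(c, d), end)'));
// { name: 'A', args: [ 'a:b', 'f(c, d)', 'end' ] }
```

Each argument can then be passed through the same function again, which is the recursive step that a flat list of regexps has trouble imitating.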

This post has been edited by ge∅: 26 April 2020 - 06:02 PM


#11 johnywhy

Posted 26 April 2020 - 07:19 PM

ge∅, on 26 April 2020 - 05:56 PM, said:

I was using this library, math.js to interpret matlab, but of course it doesn't work out of the box because of differences in conventions, syntax and missing features.

if i understood matrix math, matlab, and everything else you mentioned, then i'd understand why regexp was the wrong tool for the job.

i think you're saying you used regexp to convert some math operations written in one platform or idiom to a syntax that matlab understands, correct? Sounds like you were developing or customizing a major system or add-on, and using regexp as a core component of your add-on, correct? and you did your conversions at runtime instead of transpiling something something before runtime something. i think.

i understood one part:
"but of course it doesn't work out of the box because of differences in conventions, syntax and missing features"

Surely must've been written by someone with an advanced understanding of matlab, an advanced understanding of matrix and other math operations, an advanced understanding of the programming language and environment... right? and yet things not working as they were supposed to. Do i have that right?

If so, that says to me shoddy coding, and shoddy design-work. Amiright?

People can throw around their advanced knowledge of a topic-area (as some do), but i'm more impressed by quality design and coding practice. Planning for the future. Fundamentals.

In any case, for my current need (simple string replacements in HTML), regexp seems the obvious and appropriate tool, not developing a new parser. :)
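For replacements where the search text is a fixed string, the escaping question can sometimes be sidestepped. The sketch below is an illustration under that assumption. The helper names and the sample attribute are made up, and the regex variant uses one common character class of syntax characters, not any particular library's implementation.

```javascript
// Replaces every occurrence of a literal substring; no regex escaping needed.
// Assumes `search` is a non-empty string.
function replaceLiteral(text, search, replacement) {
  return text.split(search).join(replacement);
}

// Same result via a regex, which requires escaping the search text first.
function replaceViaRegex(text, search, replacement) {
  const escaped = search.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // A function replacement avoids "$1"-style patterns in `replacement`.
  return text.replace(new RegExp(escaped, 'g'), () => replacement);
}

const attr = 'href="https://www.google.com/url?q=https://example.com"';
const prefix = 'https://www.google.com/url?q=';

const cleaned = replaceLiteral(attr, prefix, '');
console.log(cleaned); // href="https://example.com"
console.log(cleaned === replaceViaRegex(attr, prefix, '')); // true
```

The regex route becomes worthwhile once the search needs real pattern features, such as alternatives, optional parts or captured groups.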

This post has been edited by johnywhy: 26 April 2020 - 07:21 PM


#12 ge∅

Posted 26 April 2020 - 08:40 PM

A matrix is an array with arbitrarily many dimensions, and matlab is a language which helps you compute stuff with matrices by abstracting away all the looping over the dimensions, among other things. mathjs is a JavaScript maths library which parses a matlab-like language and supports operations on matrices.

Conventions in maths are just that: if I decide that in a 2D matrix the first dimension is the horizontal one, it's just as valid as deciding it's the vertical one.

mathjs is an OK library and was a good first step for interpreting a subset of matlab in JavaScript, considering how its features intersected the matlab ones required for the project. And the missing pieces didn't seem too impressive at the beginning. Implementing loops and conditions with a tree structure was trivial, and most of the differences between the source and target language were handled independently with no significant overhead.

It's just that one feature: the selection of sub-matrices, portions of these n-dimensional arrays, which just exploded in complexity because it turns out it can be found in many configurations to do many different things (also, there are no variable declarations in matlab so there are many things your transpiler doesn't know about). The regexp part is not even the most complex part of bringing this functionality, but it's the one which ended up looking ugly.

Clearly, in retrospect, regexps were a bad choice and, to provide some context, I'm more a designer than I am a developer in the first place, so there are some things I could not anticipate. But in my past self's defence: I can also clearly see how even today I could be lured into thinking something is doable with regexps, only to realise after a while that it is growing more complex than I thought. It is also very easy to get sucked in: to take on the new complexity step by step, thinking that each one of them is just "a little adjustment".

#13 ArtificialSoldier

Posted 27 April 2020 - 10:33 AM

Frankly, you kind of ask some strange questions for someone claiming 40 years of programming experience. Best practices of escaping is definitely one of those.

#14 johnywhy

Posted 27 April 2020 - 12:32 PM

ArtificialSoldier, on 27 April 2020 - 10:33 AM, said:

Frankly, you kind of ask some strange questions for someone claiming 40 years of programming experience.


but not web programming, and i've not done a huge amount of regex stuff, so i avoid making assumptions.

Quote

Best practices of escaping is definitely one of those.


As i mentioned, there are differences of opinion on the topic. I'm seeking more opinions to help inform my own opinion. Maybe you'll be generous with yours :)

This post has been edited by johnywhy: 27 April 2020 - 12:32 PM


#15 ArtificialSoldier

Posted 27 April 2020 - 04:04 PM

I do what makes the most sense to me. That's my advice to you: do what makes the most sense to you.