
#1 cryptsource

preg_replace returns NULL

Posted 25 January 2013 - 01:59 AM

Hi,

I have a small problem with a preg_replace regex. This is the code I use:

preg_replace("/[^a-zA-Z0-9_\s\/\!\,\;\?\:\.\-\"\'\\@\&\]/", "", $string)


At first it works as it should, but if I use all the characters listed on this site as input, it returns an empty string:

http://www.tamingthe...-characters.htm

Short Snippet
 	Á 	Á
 	Ã 	Ã
 	Å 	Å
 	À 	À
 	Â 	Â
 	Ä 	Ä
 	á 	á



If I remove the underscore from the regex everything is fine. Escaping it doesn't help.

I use it to filter a POST request.

Header is set like this at the top of the page from which the POST is submitted:
header('Content-type: text/html; charset=utf-8')



What is wrong with the code?

Summary:

It produces an error when the input contains certain special characters. If I remove the underscore, everything is OK.
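
When preg_replace returns NULL, it means the call itself failed, and preg_last_error() reports why. The sketch below shows one way that can happen: a subject that is not valid UTF-8 combined with the /u modifier. The input string and the helper name pcre_error_name are made up for illustration, and the thread does not establish that this is the cause of the behaviour described above.

```php
<?php
// Hypothetical helper: map the integer from preg_last_error() to a constant name.
function pcre_error_name(int $code): string
{
    $names = [
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
    ];
    return $names[$code] ?? 'unknown PCRE error';
}

// Assumed input: a lone 0xC1 byte is "Á" in ISO-8859-1 but is not valid UTF-8.
$latin1 = "caf\xC1";

// With /u the subject must be valid UTF-8, otherwise the call fails and returns NULL.
$result = preg_replace('/[^a-z]/u', '', $latin1);

if ($result === null) {
    echo 'preg_replace failed: ', pcre_error_name(preg_last_error()), "\n";
}
```

Checking for NULL before using the result avoids silently treating a failed call as an empty string.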


Replies To: preg_replace returns NULL

#2 Dormilich

Posted 25 January 2013 - 02:19 AM

What if you replace a-zA-Z0-9_ with \w?

#3 Atli

Posted 25 January 2013 - 02:20 AM

Hey.

You may want to look into using proper Unicode sequences. PHP doesn't do Unicode natively, so you can't really post Unicode chars into a PHP string and expect it to work completely. (It sometimes does, but not always. Tends to depend on the charset and tools used to create the actual PHP code file.)

So, try something more like this:
'/[^\p{L}\d_!,;\.-"\'@&§]/ui'
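
To illustrate the difference the u modifier and \p{L} make, here is a minimal sketch. It assumes the input is UTF-8 (the accented letter is written as explicit bytes so the example does not depend on the file's encoding) and uses deliberately small character classes, not the full pattern from this thread.

```php
<?php
// "Ángel <b>" with the Á spelled out as its two UTF-8 bytes.
$input = "\xC3\x81ngel <b>";

// ASCII-only class without /u: both bytes of the accented letter are stripped.
$asciiOnly = preg_replace('/[^a-zA-Z ]/', '', $input);

// \p{L} with /u: any Unicode letter is kept, only "<" and ">" are stripped.
$unicode = preg_replace('/[^\p{L} ]/u', '', $input);

var_dump($asciiOnly); // "ngel b"
var_dump($unicode);   // "Ángel b"
```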



#4 Dormilich

Posted 25 January 2013 - 02:25 AM

@Atli: afaik, . needs no escaping in character classes and - does (unless at the end or beginning)
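
A quick way to see this is to ask PCRE whether a pattern compiles. In the sketch below, pattern_compiles is a hypothetical helper; it relies on preg_match returning false (and raising a warning) for a pattern that fails to compile.

```php
<?php
// Hypothetical helper: true when PCRE can compile the pattern, false otherwise.
function pattern_compiles(string $pattern): bool
{
    // @ hides the "Compilation failed" warning; a bad pattern makes preg_match() return false.
    return @preg_match($pattern, '') !== false;
}

var_dump(pattern_compiles('/[.-"]/'));  // false: hyphen forms the range "." to '"', which is out of order
var_dump(pattern_compiles('/[.\-"]/')); // true: escaped hyphen is a literal
var_dump(pattern_compiles('/[."-]/'));  // true: hyphen at the end is a literal

var_dump(preg_match('/[.]/', 'a'));     // int(0): an unescaped dot in a class only matches a dot
```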

#5 cryptsource

Posted 25 January 2013 - 02:40 AM

Hi,

Fast support, thanks.

This worked:

preg_replace('/[^\p{L}\d_!,;.\-"\'@&]/ui', ' ', $string);
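
Since this filters POST data, it may be worth guarding against the NULL return as well. The following is only a sketch: sanitize_text is a made-up name, the pattern is the one above, and throwing an exception on failure is one possible choice among several.

```php
<?php
// Hypothetical wrapper around the pattern from this post.
function sanitize_text(string $string): string
{
    $clean = preg_replace('/[^\p{L}\d_!,;.\-"\'@&]/ui', ' ', $string);
    if ($clean === null) {
        // Happens, for example, when $string is not valid UTF-8.
        throw new InvalidArgumentException(
            'preg_replace failed, PCRE error code ' . preg_last_error()
        );
    }
    return $clean;
}

// "Jürgen <b>says</b> hi!" with the ü written as UTF-8 bytes.
echo sanitize_text("J\xC3\xBCrgen <b>says</b> hi!"), "\n";
// prints: Jürgen  b says  b  hi!
```

Note that whitespace is not in the character class, so every whitespace character (including tabs and newlines) is replaced by a plain space, just like the other filtered characters.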


Solved.

#6 Atli

Posted 25 January 2013 - 04:40 AM

Dormilich, on 25 January 2013 - 09:25 AM, said:

@Atli: afaik, . needs no escaping in character classes and - does (unless at the end or beginning)

Ahh, Ok thanks. I never could remember the exact details of what has to be escaped and where in a RegEx. I basically just fiddle with it until I get it right :)

#7 Dormilich

Posted 25 January 2013 - 05:10 AM

Actually, that's quite simple. In a character class, - is the range operator (so it must be escaped to be taken literally), and . outside a class could be read as [\w\W] (though that's not quite exact, since . does not match a newline by default), so that meaning makes no sense inside a character class, where . is just a literal dot.

#8 andrewsw

Posted 25 January 2013 - 05:57 AM

It is best to place the characters - ^ \ ] at particular positions within character classes, so that they are treated as regular characters and then don't require escaping.

Quote

Note that the only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-). The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash. To search for a star or plus, use [+*]. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.

To include a backslash as a character without any special meaning inside a character class, you have to escape it with another backslash. [\\x] matches a backslash or an x. The closing bracket (]), the caret (^) and the hyphen (-) can be included by escaping them with a backslash, or by placing them in a position where they do not take on their special meaning. I recommend the latter method, since it improves readability. To include a caret, place it anywhere except right after the opening bracket. [x^] matches an x or a caret. You can put the closing bracket right after the opening bracket, or the negating caret. []x] matches a closing bracket or an x. [^]x] matches any character that is not a closing bracket or an x. The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret. Both [-x] and [x-] match an x or a hyphen.


http://www.regular-e.../charclass.html
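
The placement rules from the quote can be checked directly in PHP. This is a small sketch with made-up test subjects; each pattern puts a special character in a position where it is treated literally, except for the backslash, which still has to be escaped.

```php
<?php
// [pattern, subject] pairs; every pattern is expected to match its subject.
$checks = [
    ['/[x^]/',    '^'],  // caret anywhere except right after "[" is literal
    ['/[]x]/',    ']'],  // "]" right after "[" is literal
    ['/[^]x]/',   'y'],  // negated class: anything that is not "]" or "x"
    ['/[-x]/',    '-'],  // hyphen right after "[" is literal
    ['/[x-]/',    '-'],  // hyphen right before "]" is literal
    ['/[\\\\x]/', '\\'], // backslash must be escaped with another backslash
];

$results = [];
foreach ($checks as [$pattern, $subject]) {
    $results[] = preg_match($pattern, $subject);
    echo $pattern, ' on ', var_export($subject, true), ' => ', end($results), "\n";
}
```

Every line should end in 1. Keeping the special characters in these positions gives the same behaviour as escaping them, with fewer backslashes to read.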