3 Replies - 8488 Views - Last Post: 25 October 2010 - 08:48 AM

#1 monkadelicd

Script for removing scripts from HTML code

Posted 22 October 2010 - 08:08 AM

I'm trying to write a script to remove <script> tags and everything between them. The goal is simply to remove all JavaScript from a page's code automatically. I originally tried to split the input from a textarea using RegEx, but I'm not getting matches properly.
I am just learning about RegEx, so bear with me.

I tried just splitting on /<script>/ and /<\/script>/, but if I paste in the HTML code that contains this very script, the text is also split on the lines that contain the split calls themselves, not just on the opening and closing script tags.

I changed it to /\b<script>\b/ and /\b\/script>\b/ but it doesn't match anything and everything in the textarea is included in the array produced.

Here is the code I have now. How can I change this to get the result I want?
<html>
<head>
<title>Script Removal Script</title>
</head>
<body>
<form name="form">
<textarea name="htmlCode" cols="120" rows="25">Insert your HTML code here then click on the "Process" button to remove all scripts.</textarea><br />
<input type="button" value="Process" onclick="removeScript(form.htmlCode.value);" />
</form>
<script type="text/javascript">
function removeScript(code)
{	var fString = code;
	var first = fString.split(/\b\Wscript\W\b/);
	var second = fString.split(/\b\W\/script\W\b/);
	alert("the first array has " + first.length + " entries. The second array has " + second.length + " entries.");
	return form.htmlCode.value = second[0];
}
</script>
<script>
//something worthless
</script>
</body>
</html>



Replies To: Script for removing scripts from HTML code

#2 psyking

Re: Script for removing scripts from HTML code

Posted 23 October 2010 - 10:20 PM

Try using this function to remove the <script> tag:
function replaceText(code){
	// The regular expression that will match the script element
	var r = /(\<script (.*)\>([\s\S.]*)\<\/script\>)/gi;
	// This replaces the script element, plus all text contained in the script element with whatever you choose to replace it with
	return code.replace(r,"Replacement Text");
}


#3 monkadelicd

Re: Script for removing scripts from HTML code

Posted 25 October 2010 - 08:35 AM

OK. I'm going to try and figure out what exactly that RegEx is saying.

/(\<script (.*)\>([\s\S.]*)\<\/script\>)/gi;

/ - starts the RegEx
( - begins pattern match
\<script (.*)\> - matches "<script [anything besides a newline character zero or more times]>" .
([\s\S.]*) - matches any whitespace, and any non-whitespace, and any character except newline zero or more times.
\<\/script\>) - matches "</script>"
/gi - matches globally, uppercase and lowercase.


Do I have all that right?
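
A quick console check suggests the breakdown is mostly right, with two caveats. Inside [...] a dot is a literal period, so [\s\S.] behaves the same as [\s\S], which already matches every character including newlines. The pattern also requires a space after "script", so a bare <script> tag is not matched at all. The sketch below is only an illustration; the sample strings are made up.

```javascript
// The pattern from post #2, unchanged.
var r = /(\<script (.*)\>([\s\S.]*)\<\/script\>)/gi;

// Hypothetical sample inputs, for illustration only.
var withAttr = '<p>a</p><SCRIPT type="text/javascript">\nalert(1);\n</SCRIPT><p>b</p>';
var bare = '<p>a</p><script>alert(1);</script><p>b</p>';

// The i flag ignores case, and [\s\S] lets the match span the newlines.
var cleaned = withAttr.replace(r, "");   // '<p>a</p><p>b</p>'

// There is no space after "script" here, so nothing matches and nothing changes.
var untouched = bare.replace(r, "");     // same string as bare

// Inside a character class the dot only matches a literal period.
var dotIsLiteral = /[.]/.test("x");             // false
var classMatchesNewline = /[\s\S]/.test("\n");  // true
```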

#4 monkadelicd

Re: Script for removing scripts from HTML code

Posted 25 October 2010 - 08:48 AM

The only problem I have with that function is that if there is more than one script tag, everything between the first <script> and the last </script> is replaced, regardless of whether it is actually inside a pair of <script> </script> tags.

I'm trying to figure out if there is a way to stop after the first instance of </script>, then continue replacing again after the next <script>.
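
One common way to get that behaviour is to make the quantifier lazy by adding ? after the *, so each match stops at the first </script> it can reach, and the g flag then resumes the search from there. The following is a minimal sketch under that assumption, not a tested sanitizer. The function name removeScripts and the sample string are made up for illustration, and the opening tag is loosened to \b[^>]* so that a bare <script> with no attributes is matched as well.

```javascript
// Hypothetical helper: strips each <script>...</script> pair separately.
function removeScripts(code) {
	// <script\b[^>]*>  opening tag, with or without attributes
	// [\s\S]*?         as few characters as possible, newlines included
	// <\/script\s*>    the nearest closing tag
	var r = /<script\b[^>]*>[\s\S]*?<\/script\s*>/gi;
	return code.replace(r, "");
}

// Made-up sample with two script blocks and ordinary markup between them.
var sample = '<script type="text/javascript">one();</script><p>keep me</p><script>\ntwo();\n</script>';
var result = removeScripts(sample);   // '<p>keep me</p>'
```

A pattern like this is reasonable for tidying pasted markup, but it should not be treated as a security filter: [^>]* will stop early if an attribute value contains ">", and JavaScript can also live in attributes such as onclick, which this does not touch.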