Fast Way To Read A Large XML File

I need to traverse an XML file, but it is way too big!

15 Replies - 12069 Views - Last Post: 30 December 2009 - 02:11 PM

#1 Moshambi

Posted 28 December 2009 - 02:10 PM

Hello,

I have some code that I use to read an XML file:

function loadXML()
	{
		try //Internet Explorer
		  {
		  xmlDoc=new ActiveXObject("Microsoft.XMLDOM");
		  }
		catch(e)
		  {
		  try //Firefox, Mozilla, Opera, etc.
			{
			xmlDoc=document.implementation.createDocument("","",null);
			}
		  catch(e)
			{
			alert(e.message);
			return;
			}
		  }
		xmlDoc.async=false;
		xmlDoc.load("a.xml");
	}
	
	function searchXML()
	{
		var found = 0;
		var headNodes = xmlDoc.getElementsByTagName("ac");
		var srchArea = document.getElementById("txtArea").value;
		var srchTerms = document.getElementById("txtFirst").value + document.getElementById("txtSecond").value;
		
		for(var i = 0; i < headNodes.length; i++) //loop through nodes to look for matching area code
		{
			if(headNodes[i].getAttribute("val") == srchArea)  //if its found then do this...
			{
				for(var x = 0; x < headNodes[i].childNodes.length; x++)  //loop through child nodes of this block to see if there is a matching number
				{
					if(headNodes[i].childNodes[x].getAttribute("val") == srchTerms)  //if there is a matching number then...
					{
						if(!document.getElementById("resultImg"))  //check if image exists already and if it doesnt create one, otherwise just swap them
						{
							var result = document.getElementById("result")
							var img = document.createElement("img");
							img.src = "donotcall.bmp";
							img.id = "resultImg"; 
							result.appendChild(img);
						}
						else
						{
							document.getElementById("resultImg").src = "donotcall.bmp";
						}
						found = 1;   //set found to true if the number matches
						break;  //break out of the loop when done
					}
					else
					{
						found = 0;  //set found to false if there is not a match and then outside the loop do same image swap as above
					}
				}
			}
		}
		
		if(found == 0)
		{
			if(!document.getElementById("resultImg"))
			{
				var result = document.getElementById("result")
				var img = document.createElement("img");
				img.src = "oktocall.bmp";
				img.id = "resultImg"; 
				result.appendChild(img);
			}
			else
			{
				document.getElementById("resultImg").src = "oktocall.bmp";
			}
		}
	}



This code works great for small XML documents, but the one I just received that I need to search is about 26 MB, and searching it would take forever. I was wondering if there is a faster way to do this in JS or, if not, maybe in PHP? The only thing I could think of so far is to split the large file into multiple smaller files...

Hope someone can help, and thanks!

Replies To: Fast Way To Read A Large XML File

#2 Smurphy

Posted 28 December 2009 - 02:35 PM

Well that depends, what is the XML file for?

#3 Moshambi

Posted 28 December 2009 - 02:41 PM

It is a file of all the New Mexico phone numbers that are on the national do-not-call list. It has somewhere around 1.3 million lines.

#4 baavgai

Posted 28 December 2009 - 04:15 PM

Moshambi, on 28 Dec, 2009 - 03:41 PM, said:

It is a file of all the New Mexico phone numbers that are on the national do-not-call list. It has somewhere around 1.3 million lines.


You want my browser to chew on that? You are a cruel web dev. :P

A web server should be doing the filtering for you and passing back a small result set, preferably from a database.

If you still want to process this giant file in javascript, then your best bet is to preprocess it into some kind of native javascript collection. Something that will easily support the searches you wish to do. Read that instead; it should be noticeably faster.

For an idea of the kind of object I'm talking about, check out JSON.
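
As a rough illustration of what a "native javascript collection" buys you, here is a minimal sketch. It assumes an engine that provides the ES2015 `Set` type (newer than this thread), and `doNotCallNumbers` and `isOnDoNotCallList` are hypothetical names with made-up sample numbers, not anything from the original file.

```javascript
// Hypothetical sample data standing in for the preprocessed file.
// The numbers are invented for illustration only.
var doNotCallNumbers = ["5051234567", "5052345678", "5753456789"];

// Build the collection once; every later lookup is a hash lookup
// instead of a walk over all entries.
var doNotCallSet = new Set(doNotCallNumbers);

// Joins area code and number into one key and checks membership.
function isOnDoNotCallList(areaCode, number) {
	return doNotCallSet.has(String(areaCode) + String(number));
}

console.log(isOnDoNotCallList("505", "1234567")); // true
console.log(isOnDoNotCallList("505", "7654321")); // false
```

The cost of a lookup no longer grows with the size of the list, which is the property the loop over XML nodes lacks.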

#5 Moshambi

Posted 28 December 2009 - 04:50 PM

baavgai,

Do you think it would be much faster if I were to put it all into a database? I was also thinking of another variant just now.

1: Maybe some sort of comparison on the number, so that the loop breaks out when the number in the file is larger than the number entered, since all the numbers in the file are in numerical order.

I'm not sure if that would make it faster though since it would add another comparison in each time the loop runs. I'll have to check out that link you gave me.

Thanks for the input, much appreciated.
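
If the numbers really are in numerical order, the early exit described above still reads half the list on average. A binary search uses the same ordering but needs at most about 21 comparisons for 1.3 million entries. The sketch below is a swapped-in technique, not the poster's loop, and it assumes the values have already been copied into a plain array of equal-length digit strings; `containsSorted` and `sample` are hypothetical names.

```javascript
// Returns true if `target` is in `sorted`, an ascending array of
// equal-length digit strings. Equal length makes string comparison
// agree with numeric comparison.
function containsSorted(sorted, target) {
	var low = 0;
	var high = sorted.length - 1;
	while (low <= high) {
		var mid = (low + high) >> 1; // middle of the remaining range
		if (sorted[mid] === target) { return true; }
		if (sorted[mid] < target) {
			low = mid + 1;  // target can only be in the upper half
		} else {
			high = mid - 1; // target can only be in the lower half
		}
	}
	return false;
}

var sample = ["1234567", "2345678", "3456789", "3475833", "9876543"];
console.log(containsSorted(sample, "3456789")); // true
console.log(containsSorted(sample, "5555555")); // false
```

Copying the values out of the XML document is still a one-time cost; the gain is in every search after that.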

#6 baavgai

Posted 28 December 2009 - 05:39 PM

Moshambi, on 28 Dec, 2009 - 05:50 PM, said:

Do you think it would be much faster if I were to put it all into a database?


Drastically. Storing large amounts of data and retrieving that data is the sole purpose of a database. Also, this assumes your datastore is now being handled on the server, where it should be.

Moshambi, on 28 Dec, 2009 - 05:50 PM, said:

1: Maybe some sort of comparison on the number, so that the loop breaks out when the number in the file is larger than the number entered, since all the numbers in the file are in numerical order.


Your fundamental problem with any standard JavaScript solution is that you have to load the entire file into memory before you even get started. There's really no way around that and it will always be the slowest option.

The best option for searching a giant XML file is some kind of SAX processor. This is because it doesn't read the entire file into memory, but rather reads XML elements one at a time, stores nothing, and relies on the user code to handle state. However, SAX processors aren't as popular as they might be because they're generally more complex to use than the memory-intensive DOM.
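
To make the SAX idea concrete, here is a hand-rolled sketch of the event-driven style. It is not a real SAX parser and does not use any SAX library: `createTagScanner` and `parseAttributes` are hypothetical helpers that only recognise start tags with quoted attributes, and they would misreport tags that appear inside comments or CDATA. The sample markup assumes elements shaped like `<ph val='1234567' />`.

```javascript
// Collects name='value' or name="value" pairs into an object.
function parseAttributes(text) {
	var attrs = {};
	var attrPattern = /([\w-]+)\s*=\s*(?:'([^']*)'|"([^"]*)")/g;
	var m;
	while ((m = attrPattern.exec(text)) !== null) {
		attrs[m[1]] = m[2] !== undefined ? m[2] : m[3];
	}
	return attrs;
}

// Accepts text in chunks, reports each complete start tag through the
// callback, and keeps only the unfinished tail of the input in memory.
function createTagScanner(onStartTag) {
	var buffer = "";
	return {
		write: function (chunk) {
			buffer += chunk;
			var tagPattern = /<([A-Za-z][\w-]*)((?:\s+[\w-]+\s*=\s*(?:'[^']*'|"[^"]*"))*)\s*\/?>/g;
			var lastEnd = 0;
			var match;
			while ((match = tagPattern.exec(buffer)) !== null) {
				onStartTag(match[1], parseAttributes(match[2]));
				lastEnd = tagPattern.lastIndex;
			}
			// Discard what was handled; keep a possibly incomplete tag.
			var tail = buffer.slice(lastEnd);
			var open = tail.lastIndexOf("<");
			buffer = open === -1 ? "" : tail.slice(open);
		}
	};
}

// The user code holds the state: here, a single "found" flag.
var target = "2345678";
var found = false;
var scanner = createTagScanner(function (name, attrs) {
	if (name === "ph" && attrs.val === target) { found = true; }
});

// The chunks deliberately split a tag to mimic reading piece by piece.
scanner.write("<list type='full' level='state' val='NM'><ac val='505'><ph val='1234567' /><ph va");
scanner.write("l='2345678' /></ac></list>");
console.log(found); // true
```

In practice the chunks would come from whatever incremental reader the platform offers, which a browser page of this era does not really have. That is one reason this approach fits the server better.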

#7 Moshambi

Posted 29 December 2009 - 10:50 AM

baavgai, on 28 Dec, 2009 - 05:39 PM, said:

Drastically. Storing large amounts of data and retrieving that data is the sole purpose of a database. Also, this assumes your datastore is now being handled on the server, where it should be.


I guess that was kind of obvious, huh? Good point; I don't know why I didn't think of it that way. I will have to check with my boss whether we can make a DB happen. If not, I will have to look into SAX. However, I noticed you said that the problem with using JS is that it has to load the whole file. That hasn't been my biggest concern: it does slow the page load down, but it doesn't actually take too long to load. My main concern is that a search for a record that doesn't exist has to go through the entire file. That usually takes 30 seconds to 1 minute (and that's only on the machine I'm on; I'm sure for others it would take longer).

#8 baavgai

Posted 29 December 2009 - 11:05 AM

I'd use xpath or xslt. If your browser supports an implementation of it, it's bound to be faster than pure javascript. If you're searching using a loop, you're probably doing it wrong.

If you want to offer up a slice of the giant file and the code, maybe someone could come up with something quicker for you.
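
For what the XPath route might look like, here is a sketch. It assumes a layout of `list` > `ac` > `ph` elements carrying `val` attributes, as in the structure posted later in the thread, and a browser whose XML documents support the standard DOM `evaluate()` method (older Internet Explorer does not). `buildPhoneXPath` and `hasPhoneNumberXPath` are hypothetical names, and whether this beats a loop in a given browser would need measuring.

```javascript
// Builds an XPath expression for one phone number under one area code.
// Inputs are restricted to digits so they cannot break out of the quotes.
function buildPhoneXPath(areaCode, number) {
	if (!/^\d+$/.test(areaCode) || !/^\d+$/.test(number)) {
		throw new Error("areaCode and number must contain digits only");
	}
	return "/list/ac[@val='" + areaCode + "']/ph[@val='" + number + "']";
}

// Browser-only: asks the XPath engine whether a matching node exists.
// Assumes xmlDoc is a parsed XML Document that supports evaluate().
function hasPhoneNumberXPath(xmlDoc, areaCode, number) {
	var expr = "boolean(" + buildPhoneXPath(areaCode, number) + ")";
	var result = xmlDoc.evaluate(expr, xmlDoc, null, XPathResult.BOOLEAN_TYPE, null);
	return result.booleanValue;
}

console.log(buildPhoneXPath("505", "1234567"));
// /list/ac[@val='505']/ph[@val='1234567']
```

The search itself is handed to the browser's native engine; the script only builds the expression and reads back a boolean.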

#9 Moshambi

Posted 29 December 2009 - 12:07 PM

Well I don't want to give out the numbers so I will give a snippet of the structure of the XML file here:

<list type='full' level='state' val='NM'>
<ac val='505'>
<ph val='1234567' />
//Repeat above line 1.3 million times
</ac>
</list>



I am using a for loop to run through it now. Could I maybe use XSL to run a for-each loop? I'm just not all too familiar with XSL anymore and wouldn't quite know how to go about it (at least effectively). I modified the loop I'm using since my first post, so here's what it looks like now:

function searchXML()
	{
		var found = 0;
		var headNodes = xmlDoc.getElementsByTagName("ph");
		var srchTerms = document.getElementById("txtFirst").value + document.getElementById("txtSecond").value;
		
		for(var x = 0; x < headNodes.length; x++)  //loop through child nodes of this block to see if there is a matching number
		{
			if(headNodes[x].getAttribute("val") == srchTerms)  //if there is a matching number then...
			{
				if(!document.getElementById("resultImg"))  //check if image exists already and if it doesnt create one, otherwise just swap them
				{
					var result = document.getElementById("result")
					var img = document.createElement("img");
					img.src = "donotcall.bmp";
					img.id = "resultImg"; 
					result.appendChild(img);
				}
				else
				{
					document.getElementById("resultImg").src = "donotcall.bmp";
				}
				found = 1;   //set found to true if the number matches
				break;  //break out of the loop when done
			}
		}
		
		if(found == 0)
		{
			if(!document.getElementById("resultImg"))
			{
				var result = document.getElementById("result")
				var img = document.createElement("img");
				img.src = "oktocall.bmp";
				img.id = "resultImg"; 
				result.appendChild(img);
			}
			else
			{
				document.getElementById("resultImg").src = "oktocall.bmp";
			}
		}
	}


#10 baavgai

Posted 29 December 2009 - 01:08 PM

First, we can clean up your code a little.

Like so:
function getResultImage() {
	var resultImgNode = document.getElementById("resultImg");
	if(!resultImgNode) {
		var result = document.getElementById("result")
		resultImgNode = document.createElement("img");
		resultImgNode.id = "resultImg"; 
		result.appendChild(resultImgNode);
	}
	return resultImgNode;
}

function hasValue(value) {
	var headNodes = xmlDoc.getElementsByTagName("ph");
	for(var x = 0; x < headNodes.length; x++) {
		if(headNodes[x].getAttribute("val") == value) { return true; }
	}
	return false;
}

function searchXML() {
	var srchTerms = document.getElementById("txtFirst").value + document.getElementById("txtSecond").value;
	var resultImgNode = getResultImage();
	resultImgNode.src = hasValue(srchTerms) ? "donotcall.bmp" : "oktocall.bmp";
}



So, really all we need to do is speed up hasValue. Here's where having an internal representation will help. Let's make one.
var lookupTable = new LookupTable();

function LookupTable() {
	this.list = new Array();
	var headNodes = xmlDoc.getElementsByTagName("ph");
	for(var x = 0; x < headNodes.length; x++) {
		this.list[headNodes[x].getAttribute("val")] = true;
	}
	this.hasValue = function(val) { return this.list[val]; }
}



Then just change how we search:
function searchXML() {
	var srchTerms = document.getElementById("txtFirst").value + document.getElementById("txtSecond").value;
	var resultImgNode = getResultImage();
	resultImgNode.src = lookupTable.hasValue(srchTerms) ? "donotcall.bmp" : "oktocall.bmp";
}



I'm not 100% sure on my syntax; javascript has oddball scoping. See how this works for you.
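
One detail in the table above is worth a closer look: `list` is an `Array` used as a map with digit-string keys. The sketch below shows what that does to the array and offers a plain-object variant. `NumberTable` is a hypothetical name, and it takes a ready-made array of values rather than XML nodes so that it runs on its own.

```javascript
// An Array treats a digit-only string key as a numeric index, so a
// single entry turns it into a sparse array with a very large length.
var asArray = [];
asArray["1234567"] = true;
console.log(asArray.length); // 1234568

// A plain object has no length to maintain and serves lookups equally well.
function NumberTable(values) {
	var table = {};
	for (var i = 0; i < values.length; i++) {
		table[values[i]] = true;
	}
	// hasOwnProperty returns a real boolean and ignores inherited names.
	this.hasValue = function (val) {
		return Object.prototype.hasOwnProperty.call(table, val);
	};
}

var table = new NumberTable(["1234567", "2345678"]);
console.log(table.hasValue("1234567"));  // true
console.log(table.hasValue("7654321"));  // false
console.log(table.hasValue("toString")); // false
```

The array version still gives correct answers; the object version just avoids the sparse-array behaviour and always returns `true` or `false` rather than `true` or `undefined`.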

#11 Moshambi

Posted 29 December 2009 - 02:41 PM

Ok so I added your code and now here's what's happening. Since the declaration of lookupTable is outside a function it gets initialized right when the page is called on. This makes it so that xmlDoc is not yet defined and it causes an error.

When I change my head section to this:

<script type="text/javascript" src="library.js">
		loadXML();
var lookupTable = new LookupTable();
	</script>



It works fine until I enter a phone number, then it gives me the error "lookupTable is undefined" and points to the line:

resultImgNode.src = lookupTable.hasValue(srchTerms) ? "donotcall.bmp" : "oktocall.bmp";



I am really confused now.

EDIT:

Ok so I put an alert in the function LookupTable() to let me know when it finishes, and nothing gets displayed. How would I call this function?

lookupTable.LookupTable() ?

This post has been edited by Moshambi: 29 December 2009 - 02:45 PM

#12 Moshambi

Posted 29 December 2009 - 03:49 PM

OK baavgai I finally got it working by declaring:

var lookupTable;



as a global variable and then adding this code to the searchXML();

if(typeof(lookupTable) == "undefined")
		{
			lookupTable = new LookupTable();
		}



Now the file takes about 3 minutes to load, but once it's loaded each search is almost instant. I like that it's instant, but I am not sure what to do about the load time; maybe add an animated gif or something (if I can't eventually get this into a database).

Thanks for all your help! And if there is anything more you would like to add I would love to hear it.
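
On the long load: an animated gif will not animate while one long loop is running, because the page cannot repaint until the script yields. One possible approach is to build the table in slices and yield between them. The sketch below assumes the values are available as an indexable list; `buildLookupInChunks` and its parameters are hypothetical names. It does nothing for the time spent parsing the XML itself, only for the loop that fills the table.

```javascript
// Builds a lookup object a slice at a time so the page can repaint
// (for example, update a progress message) between slices.
// `schedule` defaults to setTimeout; it is a parameter so the
// behaviour can also be driven synchronously.
function buildLookupInChunks(values, chunkSize, onProgress, onDone, schedule) {
	var table = {};
	var index = 0;
	schedule = schedule || function (fn) { setTimeout(fn, 0); };

	function step() {
		var end = Math.min(index + chunkSize, values.length);
		for (; index < end; index++) {
			table[values[index]] = true;
		}
		onProgress(index, values.length);
		if (index < values.length) {
			schedule(step); // yield, then continue with the next slice
		} else {
			onDone(table);
		}
	}
	step();
}

buildLookupInChunks(
	["1234567", "2345678", "3456789"], 2,
	function (done, total) { console.log("loaded " + done + " of " + total); },
	function (table) { console.log("ready:", table["2345678"] === true); }
);
```

Searches would have to wait for `onDone`, or fall back to a "still loading" message until the table is complete.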

#13 baavgai

Posted 29 December 2009 - 04:43 PM

Moshambi, on 29 Dec, 2009 - 04:49 PM, said:

OK baavgai I finally got it working by declaring:


Excellent!

I did take another shot at it. This is a complete example, slightly different.
<html>
<head>
<script type="text/javascript">
// don't reference this variable directly
// but through the method to be sure it's loaded
var lookupTableInstance = null;

function getXmlDoc(path) {
	try {
		//Internet Explorer
		var xml = new ActiveXObject("Microsoft.XMLDOM");
	} catch(e) {
		try { //Firefox, Mozilla, Opera, etc.
			xml = document.implementation.createDocument("","",null);
		} catch(e) {
			alert(e.message);
			return;
		}
	}
	xml.async = false;
	xml.load(path);
	return xml;
}

function getLookupTable() {
	if(lookupTableInstance==null) {
		// here we build a Javascript object for lookup
		// note, we ditch the XML Doc when we're done.
		lookupTableInstance = new Array();
		var xmlDoc = getXmlDoc("a.xml");
		var nodes = xmlDoc.getElementsByTagName("ph");
		for(var i=0; i<nodes.length; i++) {
			var key = nodes[i].getAttribute("val");
			lookupTableInstance[key] = true;
		}
	}
	return lookupTableInstance;
}


function hasPhoneNumber(value) {
	var lookupTable = getLookupTable();
	return lookupTable[value];
}

function setResultImage(doNotCall) {
	var img = document.getElementById("resultImg");
	if(!img) {
		var result = document.getElementById("result")
		img = document.createElement("img");
		img.id = "resultImg"; 
		result.appendChild(img);
	}
	img.src = (doNotCall) ? "donotcall.bmp" : "oktocall.bmp";
}


function searchXML() {
	var value = document.getElementById("txtPhone").value;
	var doNotCall = hasPhoneNumber(value);
	
	var txt = document.getElementById("resultText");
	txt.innerHTML = value + ": " + ((doNotCall) ? "Do Not Call" : "Ok To Call");
	setResultImage(doNotCall);
}

</script>
	<title>Lookup Test</title>
</head>
<body>
	<h1>Lookup Test</h1>
	<form id="frm1">
		<input type="text" id="txtPhone" value="" />
		<input type="button" onclick="searchXML();" value="Check Number">
	</form>
	<div id="result">
		<p id="resultText">&nbsp;</p>
	</div>
</body>
</html>


#14 Moshambi

Posted 30 December 2009 - 08:29 AM

Thanks for that, it seems to run a little slower than the other code. But I just had an idea and I would like to see what you think.

As opposed to loading everything into the array at one time, what if it loads numbers into the array only until it finds the one that matches, then stops? If they then enter another number that was already loaded, it won't have to load the array again; but if they enter one that's not found, it does the same process again, appending to the end of that array.

I'm not sure how clear that sounds so let me try an example here:

Search for: 3456789

//First instance of a search so build array...
array[x] = 1234567;
array[x] = 2345678;
array[x] = 3456789; //Found it so stop building array

Search: 1234567

//already in the array so no need to add on

Search: 9876543

//Not in the array so start building the array from its endpoint
array[x] = 3475833;

//Etc etc..

Now as I was just typing this I realized only one problem, I would somehow need to set a save spot in the xmlDoc as it would try to load from the beginning again and that wouldn't solve anything.

Does this make sense? Is this a good idea or would it be more efficient/faster doing it this way?
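
A sketch of that idea, for reference. The "save spot" can simply be the index of the next unread item, since the list returned by `getElementsByTagName` can be indexed. `IncrementalLookup` is a hypothetical name, and it takes a plain array of values in place of XML nodes so that it runs on its own.

```javascript
// Indexes values lazily: each search resumes scanning where the
// previous one stopped, so no element is read twice.
function IncrementalLookup(values) {
	var seen = {};
	var cursor = 0; // the "save spot": index of the next unread value

	this.has = function (target) {
		if (seen[target] === true) { return true; }
		while (cursor < values.length) {
			var value = values[cursor++];
			seen[value] = true;
			if (value === target) { return true; }
		}
		return false;
	};

	// How many values have been read so far.
	this.scanned = function () { return cursor; };
}

var lookup = new IncrementalLookup(["1234567", "2345678", "3456789", "3475833"]);
console.log(lookup.has("3456789"), lookup.scanned()); // true 3
console.log(lookup.has("1234567"), lookup.scanned()); // true 3
console.log(lookup.has("9876543"), lookup.scanned()); // false 4
```

Note the trade-off: the first search for a number that is not on the list still reads everything that remains, so the full cost is deferred rather than avoided.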

#15 baavgai

Posted 30 December 2009 - 12:56 PM

I had thought to do something like that, but it added too much complexity for an example.

I'd encourage you to create a javascript object ahead of time and remove the overhead.

I've mentioned JSON. Here's a little more on it: http://www.json.org/xml.html

Looking at the XML you offered, I thought it would be cool to render the javascript object on a page. You may find json2.js somewhere on the linked site.

<html>
<head>
<script type="text/javascript" src="json2.js"></script>
<script type="text/javascript">
function getXmlDoc(path) {
	try {
		//Internet Explorer
		var xml = new ActiveXObject("Microsoft.XMLDOM");
	} catch(e) {
		try { //Firefox, Mozilla, Opera, etc.
			xml = document.implementation.createDocument("","",null);
		} catch(e) {
			alert(e.message);
			return;
		}
	}
	xml.async = false;
	xml.load(path);
	return xml;
}

function getPh(nodes) {
	var items = new Object();
	for(var i=0; i<nodes.length; i++) {
		var node = nodes[i];
		items[node.getAttribute("val")] = true;
	}
	return items;
}


function getAc(nodes) {
	var items = new Object();
	for(var i=0; i<nodes.length; i++) {
		var node = nodes[i];
		items[node.getAttribute("val")] = getPh(node.getElementsByTagName("ph"));
	}
	return items;
}


function getList(nodes) {
	var items = new Object();
	for(var i=0; i<nodes.length; i++) {
		var node = nodes[i];
		var item =  new Object();
		item.type = node.getAttribute("type");
		item.level = node.getAttribute("level");
		item.ac = getAc(node.getElementsByTagName("ac"));
		items[node.getAttribute("val")] = item;
	}
	return items;
}

function load_json() {
	var xmlDoc = getXmlDoc("a.xml");
	var lookupTable = getList(xmlDoc.getElementsByTagName("list"));
	var txt = document.getElementById("source");
	txt.innerHTML = "var lookupTable = " + JSON.stringify(lookupTable);
}

</script>
</head>
<body onload="load_json()">
	<p>Javascript source:</p>
	<pre id="source"></pre>
</body>
</html>



You load the page and copy and paste the displayed text into a .js file. From that point on, rather than loading the XML and parsing it, you just use JavaScript natively.
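
Using the generated file could then look roughly like this. The object literal mirrors the shape that `getList`, `getAc`, and `getPh` build (state, then `ac`, then phone number), but the numbers in it are made-up samples, and `isDoNotCall` is a hypothetical helper.

```javascript
// Same shape as the generator page's output; sample numbers only.
var lookupTable = {"NM":{"type":"full","level":"state","ac":{"505":{"1234567":true,"2345678":true}}}};

// Walks state -> area code -> number, tolerating missing levels.
function isDoNotCall(state, areaCode, number) {
	var list = lookupTable[state];
	var numbers = list && list.ac[areaCode];
	return Boolean(numbers && numbers[number] === true);
}

console.log(isDoNotCall("NM", "505", "1234567")); // true
console.log(isDoNotCall("NM", "505", "7654321")); // false
console.log(isDoNotCall("NM", "575", "1234567")); // false
```

The browser still has to download and evaluate the generated .js file, so it remains a large page asset; it only skips the XML parsing and the table-building loop.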