using regex to strip firefox bookmark html

and place into db, need advice

25 Replies - 5438 Views - Last Post: 08 January 2007 - 05:20 PM

#1 knownasilya

  • D.I.C Head

Reputation: 0
  • Posts: 148
  • Joined: 11-January 06

Posted 11 January 2006 - 04:06 PM

What I want to do is make a file upload form through which you would upload the Firefox bookmark export HTML. Then I want to take each link, the title for that link, and the favicon, and store each link with its other parts in a MySQL database. From what I hear, this is only possible with regex, but I've never used regex, so I'm not sure where to start or how to go about doing this. As a plus, I'd like to keep the bookmark structure as a file tree.

What I noticed in the code is that the icon is fully stored in the HTML file, at least I think so.

This is the favicon for DIC:
ICON="data:image/x-icon;base64,AAABAAEAEBAAAAEAIABoBAAAFgAAACgAAAAQAAAAIAAAAAEAI
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD///8A////AI2Njf+NjY3/jY2N/42Njf+NjY3/jY2N/42Njf+NjY3/jY2N
/42Njf+NjY3/jY2N/42Njf+NjY3/////////////////////////////////////////////////////////////////////////////////jY2N//////
8BbLD/AJz//wCc//8AnP//AJz//wCc//8AnP//AJz//wCc//8AnP//AJz//wCc//8AnP///////42Njf//////AWyw/wCc//8AnP/
/AJz//wCc//8AnP//AJz//wCc//8AnP//AJz//wCc//8AnP//AJz///////+NjY3//////wFssP8AnP//AJz//wCc//8AnP//AAAA/w
AAAP8AAAD/AAAA/wAAAP8AnP//AJz//wCc////////jY2N//////8BbLD/AJz//wCc//8AnP//AAAA/wAAAP8AAAD/AAAA/wA
AAP8AAAD/AJz//wCc//8AnP///////42Njf//////AWyw/wCc//8AnP//AJz//wAAAP8AAAD/AJz//wCc//8AAAD/AAAA/wCc//
8AnP//AJz///////+NjY3//////wFssP8AnP//AJz//wCc//8AAAD/AAAA/wCc//8AnP//AAAA/wAAAP8AnP//AJz//wCc////////jY
2N//////8BbLD/AJz//wCc//8AnP//AAAA/wAAAP8AAAD/AAAA/wAAAP8AAAD/AJz//wCc//8AnP///////42Njf//////AWyw/wC
c//8AnP//AJz//wCc//8AAAD/AAAA/wAAAP8AAAD/AAAA/wCc//8AnP//AJz///////+NjY3//////wFssP8AnP//AJz//wCc//8AnP//
AJz//wCc//8AnP//AAAA/wAAAP8AnP//AJz//wCc////////jY2N//////8BbLD/AJz//wCc//8AnP//AJz//wCc//8AnP//AJz//wAAAP8A
AAD/AJz//wCc//8AnP///////42Njf//////AWyw/wCc//8AnP//AJz//wCc//8AnP//AJz//wCc//8AAAD/AAAA/wCc//8AnP//AJz//////
/+NjY3//////wFssP8AnP//AJz//wCc//8AnP//AJz//wCc//8AnP//AJz//wCc//8AnP//AJz//wCc////////jY2N//////8BbLD/AWyw/wFss
P8BbLD/AWyw/wFssP8BbLD/AWyw/wFssP8BbLD/AWyw/wFssP8BbLD//////////wD///////////////////////////////////////////////////
////////////////////////////////8AwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAQAAAAEAAA=="
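
The ICON attribute above is a data: URI, so the image bytes can be recovered by splitting off the prefix and base64-decoding the rest. A minimal sketch, assuming the `data:<mime>;base64,<payload>` shape shown above; the helper name `decode_data_uri` and the shortened sample payload are made up for illustration:

```php
<?php
// Hypothetical helper (not from the thread): split a data: URI such as the
// ICON attribute into its MIME type and decoded bytes.
function decode_data_uri(string $uri): ?array
{
    // Expected shape: data:<mime>;base64,<payload>
    if (!preg_match('#^data:([\w/.+-]+);base64,(.*)$#s', $uri, $m)) {
        return null; // not a base64 data URI
    }
    // The payload may be wrapped across lines, so drop any whitespace first.
    $payload = preg_replace('/\s+/', '', $m[2]);
    $bytes = base64_decode($payload, true); // strict mode: false on bad input
    if ($bytes === false) {
        return null;
    }
    return ['mime' => $m[1], 'bytes' => $bytes];
}

// Shortened sample payload, for illustration only.
$icon = decode_data_uri('data:image/x-icon;base64,AAABAAEAEBA=');
echo $icon['mime'], ': ', strlen($icon['bytes']), " bytes\n";
```

Whether to store the decoded bytes (for example in a BLOB column) or the original attribute string is a design choice; the string is simpler to reuse directly in an `<img src="...">`.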



If you don't use Firefox, here is a bookmark export example:

http://notemanager.net/bookmarks.html

This post has been edited by knownasilya: 11 January 2006 - 06:06 PM



Replies To: using regex to strip firefox bookmark html

#2 knownasilya

Posted 11 January 2006 - 11:15 PM

FORM(file.php):
<!-- The data encoding type, enctype, MUST be specified as below -->
<form enctype="multipart/form-data" action="fileAction.php" method="POST">
	<!-- MAX_FILE_SIZE must precede the file input field -->
	<input type="hidden" name="MAX_FILE_SIZE" value="30000" />
	<!-- Name of input element determines name in $_FILES array -->
	Send this file: <input name="userfile" type="file" />
	<input type="submit" value="Send File" />
</form>


REGEX(fileAction.php):
<?php
$file = implode(file("link.html")); 
// get host name from URL
preg_match("/^(http:\/\/)?([^\/]+)/i",
   $file , $matches);


for($i = 0; $i < 22; $i++ )
echo "{$matches[0][$i]}\n";

echo $file;
?> 



FILE(link.html):
<html>
<a href="http://www.php.net/preg_match">http://www.php.net/preg_match</a><br />
<a href="http://www.php.net2/preg_match2">http://www.php.net/preg_match</a><br />
<a href="http://www.php.net3/preg_match">http://www.php.net/preg_match</a><br />
<a href="http://www.php.net4/preg_match">http://www.php.net/preg_match</a><br />
<a href="http://www.php.net5/preg_match">http://www.php.net/preg_match</a><br />
<a href="http://www.php.net6/preg_match">http://www.php.net/preg_match</a><br />
</html>


Now what I'm getting is just

< h t m l > < a h r e f = " h t t p :


If I make it loop any further, it gives
Notice: Uninitialized string offset: 22 in C:\symproj\askeet\web\fileAction.php on line 11
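
The character-by-character output has a likely explanation: preg_match() stores the whole match as a single string in $matches[0], so $matches[0][$i] indexes individual characters of that string. The pattern is anchored at the start of the input, the optional http:// group matches nothing, and [^\/]+ then runs up to the first slash. A small sketch with a shortened stand-in for link.html (the sample input is an assumption, not the original file):

```php
<?php
// A shortened stand-in for link.html, for illustration only.
$file = "<html>\n<a href=\"http://www.php.net/preg_match\">php</a>\n</html>";

preg_match("/^(http:\/\/)?([^\/]+)/i", $file, $matches);

// $matches[0] is one string: everything from the start up to the first "/".
var_dump($matches[0]);

// Indexing that string returns a single character, not a separate match.
var_dump($matches[0][0]);

// The string has a fixed length, which is why looping past it raises
// "Uninitialized string offset".
echo strlen($matches[0]), "\n";
```

With Unix line endings the matched string here is 21 characters long; a Windows line ending would add one more, which would be consistent with the offset 22 in the notice.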

#3 knownasilya

Posted 12 January 2006 - 12:47 AM

Does no one know anything about regex/preg_match?

But here is the latest update.


REGEX(fileAction.php):
<?php
$file = implode(file("link.html")); 
// get host name from URL
preg_match_all("/php/i",
   $file , $matches);


for($i = 0; $i < 10; $i++ )
echo "{$matches[0][$i]}\n";

echo $file;
?> 



Now I need to figure out how to match everything from http:// through the closing </A>, then maybe use preg_split to split the pieces and assign each one to a different variable, and do this for every <a> until the end of the file. Damn, this is hurting my brain. Regex isn't very well documented; there are few examples, at least for my needs. :P
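
One way to get each link's URL and text in a single pass is preg_match_all() with capture groups, which avoids a separate preg_split step. A rough sketch, assuming double-quoted href values as in link.html; the sample input and the `$bookmarks` array are illustrative, and a regex like this will miss markup that deviates from that shape:

```php
<?php
// Sample input modelled on link.html from the earlier post.
$file = <<<HTML
<html>
<a href="http://www.php.net/preg_match">First link</a><br />
<a href="http://www.php.net/preg_match_all">Second link</a><br />
</html>
HTML;

// Group 1 captures the href value, group 2 the link text.
// The "i" flag also matches upper-case <A HREF=...>; "s" lets . span newlines.
$pattern = '/<a\s[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/is';

// PREG_SET_ORDER yields one array per link: [whole match, href, text].
preg_match_all($pattern, $file, $matches, PREG_SET_ORDER);

$bookmarks = [];
foreach ($matches as $m) {
    $bookmarks[] = ['url' => $m[1], 'title' => $m[2]];
}

foreach ($bookmarks as $b) {
    echo $b['title'], ' => ', $b['url'], "\n";
}
```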

#4 knownasilya

Posted 17 January 2006 - 11:32 PM

Does anybody know regex? Or is there something else I should use? I'm very busy since the spring semester just started, so I have almost no time at the moment. Any help is welcome.

#5 snoj

  • Married Life

Reputation: 84
  • Posts: 3,564
  • Joined: 31-March 03

Posted 17 January 2006 - 11:41 PM

Does your server's PHP have DOM enabled? I'm not sure, but you could possibly navigate the document and get your information through that. At least I think it's possible; I haven't tested it.

#6 knownasilya

Posted 20 January 2006 - 08:23 PM

Sounds interesting. XML is good. I'll look into it after I have dinner. And thanks for the reply. ^^

I'm sure this is it. Looks like it's enabled.
domxml
DOM/XML  enabled
DOM/XML API Version  20020815
libxml Version  20622
HTML Support  enabled
XPath Support  enabled
XPointer Support  enabled
DOM/XSLT  enabled
libxslt Version  1.1.11
libxslt compiled against libxml Version  2.6.16
DOM/EXSLT  enabled
libexslt Version  1.1.8



Now I have to figure out what this DOM thing is. :D

This post has been edited by knownasilya: 20 January 2006 - 08:29 PM

#7 snoj

Posted 20 January 2006 - 09:26 PM

Sample of working with PHP 5's DOM extension.

// Parts of this are taken from the PHP DOM documentation.
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents('./knownasilya0002.html'));

$links = $doc->getElementsByTagName('a');

for ($i = 0; $i < $links->length; $i++) {
   $value = $links->item($i)->nodeValue;
   $href = $links->item($i)->getAttribute('href');
   
   //add other attributes you want.
}
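
The same loop can pick up the favicon as well, since the ICON data sits in an attribute of each <A> tag. A sketch under the assumption that the parser lower-cases attribute names, so ICON is read as "icon"; the sample bookmark line, its URL, and the shortened ICON payload are invented for illustration:

```php
<?php
// One bookmark line in the style of a Firefox export; the URL, date, and
// shortened ICON payload are sample values for illustration.
$html = '<DT><A HREF="http://example.com/" ADD_DATE="1136000000" '
      . 'ICON="data:image/x-icon;base64,AAABAAEAEBA=">Example</A>';

$doc = new DOMDocument();
@$doc->loadHTML($html);   // @ hides warnings about the non-standard markup

$bookmarks = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $bookmarks[] = [
        'title' => $link->nodeValue,
        'href'  => $link->getAttribute('href'),
        // getAttribute() returns "" when a bookmark has no icon.
        'icon'  => $link->getAttribute('icon'),
    ];
}

print_r($bookmarks);
```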


Have fun! :D

#8 knownasilya

Posted 21 January 2006 - 03:48 AM

Looks interesting. And I can understand it. :D I'll have a go at it tomorrow morning. Need some rest now.

EDIT: Works perfectly. Now I just need to set it up for the MySQL db. But it doesn't work on my live host; it gives a parse error on line 9. I guess that because it's PHP 4.4.1 it doesn't like the chained -> on lines 9 and 10, while my local install is PHP 5.something. Well, I'm going to need an upgrade, which is coming anyway.

This post has been edited by knownasilya: 21 January 2006 - 07:38 PM

#9 knownasilya

Posted 22 January 2006 - 02:35 AM

Okay, weird MySQL error.

When I run the script, I get this after 44 entries have been added to the db:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 't delete' )' at line 1



Then I checked mysqlcc, and when I ran "return all rows" I got this error. I also can't delete rows; I get the same message again.
[main-mysql] ERROR 1146: Table 'form.1' doesn't exist


MySQL query:
CREATE TABLE bookmarks(
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
description VARCHAR(500),
url VARCHAR(500))



PHP code:
<?php
mysql_connect("localhost", "user", "pass") or die(mysql_error());
mysql_select_db("form") or die(mysql_error());

$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents('bookmarks.html'));

$links = $doc->getElementsByTagName('a');

for ($i = 0; $i < $links->length; $i++) {
  $value = $links->item($i)->nodeValue;
  $href = $links->item($i)->getAttribute('href');
  
        mysql_query("INSERT INTO bookmarks(url, description) VALUES('$href', '$value' ) ")
        or die(mysql_error());

	echo $value . '&nbsp;&nbsp;<b>[ADDED]</b><br />';
     
}

?>


[MOD Edit] We don't want your passwords! ;)

This post has been edited by hotsnoj: 22 January 2006 - 03:07 AM

#10 snoj

Posted 22 January 2006 - 03:09 AM

Don't ever, ever put data into a database without first using some sort of character escape for any and all variables that you don't have control over. Personally I use addslashes(). It may even fix your problem.
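
To illustrate why the unescaped INSERT fails, here is a sketch that only builds the query strings and sends nothing to a database. The sample title is an assumption based on the "'t delete" fragment in the error message. Note that addslashes() is not aware of the connection's character set; mysql_real_escape_string(), or prepared statements in mysqli or PDO, are generally the safer choice.

```php
<?php
// Assumed sample title containing an apostrophe (the error text "'t delete"
// suggests one).
$value = "Make your files immutable which even root can't delete";

// Unescaped: the apostrophe in "can't" ends the SQL string literal early.
$unsafe = "INSERT INTO bookmarks(description) VALUES('$value')";

// Escaped before interpolation: the apostrophe becomes \' and stays inside
// the literal.
$escaped = addslashes($value);
$safer = "INSERT INTO bookmarks(description) VALUES('$escaped')";

echo $unsafe, "\n";
echo $safer, "\n";
```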

#11 knownasilya

Posted 22 January 2006 - 03:48 AM

Well, I'm learning. How does addslashes() work? I'm going to go check it out on php.net.


I did this:


  mysql_query("INSERT INTO bookmarks(url, description) VALUES('addslashes($href)', 'addslashes($value)' ) ")




Now I get this error:

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 't delete)' )' at line 1

This post has been edited by knownasilya: 22 January 2006 - 04:07 AM

#12 Amadeus

  • g+ + -o drink whiskey.cpp

Reputation: 248
  • Posts: 13,507
  • Joined: 12-July 02

Posted 22 January 2006 - 08:14 AM

You'll likely have to break out of the string to call the addslashes() function, or call it before building the statement.

You can also print out the statement, and see what is being passed to the query.
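
A short sketch of the difference, printing the strings instead of running a query. The sample URL is hypothetical; the point is that PHP expands variables inside double quotes but does not call functions there:

```php
<?php
$href = "http://example.com/?q=it's";   // hypothetical URL with an apostrophe

// Inside double quotes only the variable is expanded; "addslashes(" and ")"
// are kept as literal text, so nothing gets escaped.
$wrong = "VALUES('addslashes($href)')";

// Either escape first and interpolate the result...
$escaped = addslashes($href);
$right = "VALUES('$escaped')";

// ...or break out of the string and concatenate the function call.
$alsoRight = "VALUES('" . addslashes($href) . "')";

echo $wrong, "\n", $right, "\n";
```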

#13 knownasilya

Posted 22 January 2006 - 03:48 PM

This is the bookmark it gives me an error on. It prints just fine without the MySQL insert:

All about Linux: Make your files immutable which even root can't delete
http://linuxhelp.blogspot.com/2005/11/make-your-files-immutable-which-even.html


How would I apply addslashes() to $href and $value before I insert them into the query?
Okay, never mind, fixed that:
  $value = addslashes($links->item($i)->nodeValue);
  $href = addslashes($links->item($i)->getAttribute('href'));


Now a new problem.
I have a bookmark with the description:
styling <select>

When it goes through the script, the output stops at that entry and displays "styling" followed by an actual select box. :D
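
The select box appears because the title is echoed into the page as raw HTML. Escaping the value for output with htmlspecialchars() makes the browser show it as text. A minimal sketch (this escaping is for display only and is separate from the escaping needed for the SQL query):

```php
<?php
$value = 'styling <select>';   // the bookmark title from the post

// Echoed raw, the browser parses <select> as a form control.
// Escaped, the angle brackets become entities and are shown as text.
$safe = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

echo $safe, '&nbsp;&nbsp;<b>[ADDED]</b><br />';
```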


But it actually inserts all of the bookmarks into the MySQL db. I probably won't have it output every bookmark that has been inserted, maybe just the number of them, though I'm not sure about that either.

EDIT:

This works perfectly. Now I just need to make a form and see if it works with this.


<?php
mysql_connect("localhost", "root", "pass") or die(mysql_error());
mysql_select_db("form") or die(mysql_error());

$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents('bookmarks.html'));

$links = $doc->getElementsByTagName('a');

for ($i = 0; $i < $links->length; $i++) {
  $value = addslashes($links->item($i)->nodeValue);
  $href = addslashes($links->item($i)->getAttribute('href'));
  
 
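        // $href and $value are already escaped above, so interpolate them directly.
        // Writing 'addslashes($href)' inside the string would store that literal text.
        mysql_query("INSERT INTO bookmarks(url, description) VALUES('$href', '$value')")
        or die(mysql_error());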
     
}

	echo $i . '&nbsp;Bookmarks were imported.';

?>

This post has been edited by knownasilya: 22 January 2006 - 04:11 PM

#14 knownasilya

Posted 22 January 2006 - 04:10 PM

FORM:
<form enctype="multipart/form-data" action="test2.php" method="POST">
    <input type="hidden" name="MAX_FILE_SIZE" value="30000" />
    Send this file: <input name="userfile" type="file" />
    <input type="submit" value="Send File" />
</form>



MODIFIED PHP:
<?php
mysql_connect("localhost", "root", "pass") or die(mysql_error());
mysql_select_db("form") or die(mysql_error());

$file = $_POST['userfile'];
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents($file));

$links = $doc->getElementsByTagName('a');

for ($i = 0; $i < $links->length; $i++) {
  $value = addslashes($links->item($i)->nodeValue);
  $href = addslashes($links->item($i)->getAttribute('href'));
  
 
        // $href and $value are already escaped above, so interpolate them directly.
        mysql_query("INSERT INTO bookmarks(url, description) VALUES('$href', '$value')")
        or die(mysql_error());
     
}

	echo $i . '&nbsp;Bookmarks were imported.';

?>




gives me

Notice: Undefined index: userfile in C:\symproj\askeet\web\test2.php on line 5
0 Bookmarks were imported.

Would I have to use $_FILES? And what would be the most efficient approach, using the least server bandwidth: make a temp copy of the file, or is there a way to just assign the file to a variable?
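
Uploaded files arrive in $_FILES rather than $_POST, and PHP has already saved the upload to a temporary file by the time the script runs, so its contents can be read straight from tmp_name. A sketch under the assumption that the field is named "userfile" as in the form above; the helper `uploaded_tmp_path` is a made-up name for illustration:

```php
<?php
// Hypothetical helper: return the path of an uploaded file's temporary copy,
// or null if the upload failed. $entry is one element of $_FILES.
function uploaded_tmp_path(?array $entry): ?string
{
    if ($entry === null || $entry['error'] !== UPLOAD_ERR_OK) {
        return null; // field missing, no file sent, or file too large
    }
    // Guard against a tmp_name that was not actually produced by an upload.
    if (!is_uploaded_file($entry['tmp_name'])) {
        return null;
    }
    return $entry['tmp_name'];
}

// "userfile" is the name attribute of the file input in the form.
$path = uploaded_tmp_path($_FILES['userfile'] ?? null);

if ($path === null) {
    echo 'Upload failed.';
} else {
    // Read the temporary file in place; no extra copy is needed to parse it.
    $html = file_get_contents($path);
    echo strlen($html), ' bytes received.';
}
```

Moving the file with move_uploaded_file() is only needed if it should be kept after the request ends. Also worth checking: a MAX_FILE_SIZE of 30000 bytes may be too small for a bookmark export that embeds base64 icons.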



EDIT:

Okay, I got it. :D Man, this is so much fun to figure out. I'm enjoying this.

<?php
mysql_connect("localhost", "root", "pass") or die(mysql_error());
mysql_select_db("form") or die(mysql_error());

// The $_FILES key must match the name attribute of the file input ("userfile" in the form above).
$target_path = "uploads/";
$target_path = $target_path . basename( $_FILES['userfile']['name']);

$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents($_FILES['userfile']['tmp_name']));

$links = $doc->getElementsByTagName('a');

for ($i = 0; $i < $links->length; $i++) {
  $value = addslashes($links->item($i)->nodeValue);
  $href = addslashes($links->item($i)->getAttribute('href'));
  
 
        // $href and $value are already escaped above, so interpolate them directly.
        mysql_query("INSERT INTO bookmarks(url, description) VALUES('$href', '$value')")
        or die(mysql_error());
     
}

	echo $i . '&nbsp;Bookmarks were imported.';
	

?>

This post has been edited by knownasilya: 22 January 2006 - 07:25 PM

#15 nofate

  • New D.I.C Head

Reputation: 0
  • Posts: 11
  • Joined: 25-December 06

Posted 25 December 2006 - 05:07 AM

Hello, the code in this thread works well, but I'm trying to put link_name, link_href, and the link's category/folder into the database. As the source I use an HTML file exported from a web browser. It is in the Netscape bookmark file format, which is compatible with almost all web browsers. The HTML file, bookmarks.html, is here.

I'm using the following code, but there is no connection between a link and its folder/category, and this is the problem for me...

<?php

$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents('./bookmarks.html'));

$links = $doc->getElementsByTagName('a');
$folders = $doc->getElementsByTagName('h3'); // a new folder/category in the exported bookmarks.html always starts with <h3>

echo "<table>";

$i = 0;
while ($i < ($links->length))
  {
  $category = $folders->item($i)->nodeValue;
  $value = $links->item($i)->nodeValue;
  $href = $links->item($i)->getAttribute('href');
	echo "<tr><td> $category </td> <td> $href </td> <td> $value </td></tr>";  // here will be mysql queries instead of echo.
  $i++;
  }

echo "</table>";

?>



Click here to see result of this script.

Any ideas?
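
One possible approach: the code above indexes the folder list and the link list with the same counter, so the n-th link is paired with the n-th folder regardless of where it sits in the file. Walking the <h3> and <a> elements together in document order and remembering the most recent folder keeps them connected. A sketch with a cut-down sample in the Netscape layout (the sample data is invented). It has a known limitation: a link that follows a closed subfolder is attributed to that subfolder rather than to its parent, so fully nested trees would need the <DL> nesting tracked as well.

```php
<?php
// A cut-down sample in the Netscape bookmark layout, for illustration.
$html = <<<HTML
<DL><p>
<DT><H3>PHP</H3>
<DL><p>
<DT><A HREF="http://www.php.net/">PHP manual</A>
<DT><A HREF="http://www.php.net/dom">DOM docs</A>
</DL><p>
<DT><H3>Linux</H3>
<DL><p>
<DT><A HREF="http://linuxhelp.blogspot.com/">All about Linux</A>
</DL><p>
</DL><p>
HTML;

$doc = new DOMDocument();
@$doc->loadHTML($html);   // @ hides warnings about the non-standard markup
$xpath = new DOMXPath($doc);

// Select <h3> and <a> in one query so they arrive in document order.
$rows = [];
$category = '';           // links before any folder get an empty category
foreach ($xpath->query('//*[self::h3 or self::a]') as $node) {
    if ($node->nodeName === 'h3') {
        $category = trim($node->nodeValue);   // entering a new folder
        continue;
    }
    $rows[] = [
        'category' => $category,
        'href'     => $node->getAttribute('href'),
        'title'    => trim($node->nodeValue),
    ];
}

foreach ($rows as $r) {
    echo "{$r['category']} | {$r['href']} | {$r['title']}\n";
}
```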