Welcome to Dream.In.Code

using regex to strip firefox bookmark html

using regex to strip firefox bookmark html, and place into db, need advice

knownasilya
post 11 Jan, 2006 - 03:06 PM
Post #1


D.I.C Head

Group Icon
Joined: 11 Jan, 2006
Posts: 144


My Contributions


What I want to do is make a file upload form for uploading the Firefox bookmark export HTML. Then I want to take the links, the title for each link, and the favicon, and store each link with its other parts in a MySQL database. From what I hear, this is only possible with regex, but I've never used regex, so I'm not sure where to start or how to go about it. As a plus, I'd like to keep the bookmark structure as a file tree.

What I noticed in the code is that the icon is fully stored in the HTML file, at least I think so.

This is the favicon for DIC
CODE
ICON="data:image/x-icon;base64,AAABAAEAEBAAAAEAIABoBAAAFgAAACgAAAAQAAAAIAAAAAEAI
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD///8A////AI2Njf+NjY3/jY2N/42Njf+NjY3/jY2N/42Njf+NjY3/jY2N
/42Njf+NjY3/jY2N/42Njf+NjY3/////////////////////////////////////////////////////////////////////////////////jY2N//////
8BbLD/AJz//wCc//8AnP//AJz//wCc//8AnP//AJz//wCc//8AnP//AJz//wCc//8AnP///////42Njf//////AWyw/wCc//8AnP/
/AJz//wCc//8AnP//AJz//wCc//8AnP//AJz//wCc//8AnP//AJz///////+NjY3//////wFssP8AnP//AJz//wCc//8AnP//AAAA/w
AAAP8AAAD/AAAA/wAAAP8AnP//AJz//wCc////////jY2N//////8BbLD/AJz//wCc//8AnP//AAAA/wAAAP8AAAD/AAAA/wA
AAP8AAAD/AJz//wCc//8AnP///////42Njf//////AWyw/wCc//8AnP//AJz//wAAAP8AAAD/AJz//wCc//8AAAD/AAAA/wCc//
8AnP//AJz///////+NjY3//////wFssP8AnP//AJz//wCc//8AAAD/AAAA/wCc//8AnP//AAAA/wAAAP8AnP//AJz//wCc////////jY
2N//////8BbLD/AJz//wCc//8AnP//AAAA/wAAAP8AAAD/AAAA/wAAAP8AAAD/AJz//wCc//8AnP///////42Njf//////AWyw/wC
c//8AnP//AJz//wCc//8AAAD/AAAA/wAAAP8AAAD/AAAA/wCc//8AnP//AJz///////+NjY3//////wFssP8AnP//AJz//wCc//8AnP//
AJz//wCc//8AnP//AAAA/wAAAP8AnP//AJz//wCc////////jY2N//////8BbLD/AJz//wCc//8AnP//AJz//wCc//8AnP//AJz//wAAAP8A
AAD/AJz//wCc//8AnP///////42Njf//////AWyw/wCc//8AnP//AJz//wCc//8AnP//AJz//wCc//8AAAD/AAAA/wCc//8AnP//AJz//////
/+NjY3//////wFssP8AnP//AJz//wCc//8AnP//AJz//wCc//8AnP//AJz//wCc//8AnP//AJz//wCc////////jY2N//////8BbLD/AWyw/wFss
P8BbLD/AWyw/wFssP8BbLD/AWyw/wFssP8BbLD/AWyw/wFssP8BbLD//////////wD///////////////////////////////////////////////////
////////////////////////////////8AwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAQAAAAEAAA=="



If you don't use Firefox, here is a bookmark export example:

http://notemanager.net/bookmarks.html

This post has been edited by knownasilya: 11 Jan, 2006 - 05:06 PM

knownasilya
post 11 Jan, 2006 - 10:15 PM
Post #2



FORM(file.php):
CODE
<!-- The data encoding type, enctype, MUST be specified as below -->
<form enctype="multipart/form-data" action="fileAction.php" method="POST">
    <!-- MAX_FILE_SIZE must precede the file input field -->
    <input type="hidden" name="MAX_FILE_SIZE" value="30000" />
    <!-- Name of input element determines name in $_FILES array -->
    Send this file: <input name="userfile" type="file" />
    <input type="submit" value="Send File" />
</form>


REGEX(fileAction.php):
CODE
<?php
$file = implode(file("link.html"));
// get host name from URL
preg_match("/^(http:\/\/)?([^\/]+)/i",
  $file , $matches);


for($i = 0; $i < 22; $i++ )
echo "{$matches[0][$i]}\n";

echo $file;
?>


FILE(link.html):
CODE
<html>
<a href="http://www.php.net/preg_match">http://www.php.net/preg_match</a><br />
<a href="http://www.php.net2/preg_match2">http://www.php.net/preg_match</a><br />
<a href="http://www.php.net3/preg_match">http://www.php.net/preg_match</a><br />
<a href="http://www.php.net4/preg_match">http://www.php.net/preg_match</a><br />
<a href="http://www.php.net5/preg_match">http://www.php.net/preg_match</a><br />
<a href="http://www.php.net6/preg_match">http://www.php.net/preg_match</a><br />
</html>


Now what I'm getting is just

CODE
< h t m l > < a h r e f = " h t t p :


If I make it loop any further, it gives
CODE

Notice: Uninitialized string offset: 22 in C:\symproj\askeet\web\fileAction.php on line 11
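
A likely explanation for the output above: preg_match() stops at the first match, and $matches[0] is a string, so $matches[0][$i] picks out single characters. The optional (http://) group matches nothing at the start of the file, and [^/]+ then takes everything up to the first slash, which is why the output ends at "http:". A minimal sketch of the difference, using a shortened stand-in for link.html (the second pattern is only one possible way to collect the host names):

```php
<?php
// Shortened stand-in for link.html from the post above.
$file = "<html>\n<a href=\"http://www.php.net/preg_match\">one</a><br />\n"
      . "<a href=\"http://www.php.net2/preg_match2\">two</a><br />\n</html>";

// preg_match() returns only the first match; $matches[0] is a string,
// so $matches[0][$i] walks through it one character at a time.
preg_match("/^(http:\/\/)?([^\/]+)/i", $file, $matches);
$first = $matches[0];
echo strlen($first), " characters matched\n";

// preg_match_all() without the ^ anchor collects every host name;
// $hosts[1] holds the contents of the first capture group.
preg_match_all('#http://([^/"]+)#i', $file, $hosts);
print_r($hosts[1]);
```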

knownasilya
post 11 Jan, 2006 - 11:47 PM
Post #3



Does no one know anything about regex/preg_match?

But here is the latest update.


REGEX(fileAction.php):
CODE
<?php
$file = implode(file("link.html"));
// find every case-insensitive occurrence of "php"
preg_match_all("/php/i",
  $file , $matches);


for($i = 0; $i < 10; $i++ )
echo "{$matches[0][$i]}\n";

echo $file;
?>



Now I need to figure out how to regex everything after http:// and up to the closing </A>, then maybe use preg_split? Split them, assign every piece to a different variable, and do this for every <a> until the end of the file. Damn, this is hurting my brain. Regex isn't very well documented; there are few examples, at least for my needs.
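
One way to do this without preg_split is to let capture groups pull out the pieces directly. A rough sketch, assuming every link has a double-quoted HREF attribute and no nested tags that contain ">" inside attribute values; the sample HTML is made up in the style of a bookmark export, and a regex like this is fragile on real-world HTML:

```php
<?php
// Made-up excerpt in the style of a Firefox bookmark export.
$html = '<DT><A HREF="http://www.php.net/" ADD_DATE="1136990000">PHP: Hypertext Preprocessor</A>' . "\n"
      . '<DT><A HREF="http://www.dreamincode.net/" ICON="data:image/x-icon;base64,AAABAA==">Dream.In.Code</A>';

// Group 1: the HREF value. Group 2: the text between <A ...> and </A>.
// PREG_SET_ORDER yields one array per link instead of one array per group.
$pattern = '#<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)</a>#is';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$bookmarks = [];
foreach ($matches as $m) {
    $bookmarks[] = ['url' => $m[1], 'title' => trim(strip_tags($m[2]))];
}
print_r($bookmarks);
```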

knownasilya
post 17 Jan, 2006 - 10:32 PM
Post #4



Does anybody know regex? Or is it something else? I'm very busy; spring semester just started, so I have almost no time at the moment. Any help is totally welcome.

snoj
post 17 Jan, 2006 - 10:41 PM
Post #5


$Null

Group Icon
Joined: 31 Mar, 2003
Posts: 3,304



Thanked 5 times

Dream Kudos: 700
My Contributions


Does your server's PHP have DOM enabled? I'm not sure, but you could possibly navigate the document and get your information through that. At least I think it's possible; I haven't tested it.
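
Besides reading phpinfo(), the check can be done from a script. A small sketch, assuming the PHP 5 "dom" extension and its DOMDocument class are what is needed; the variable names are only for illustration:

```php
<?php
// extension_loaded() checks by extension name;
// class_exists() checks for the PHP 5 DOM class itself.
$hasDomExtension = extension_loaded('dom');
$hasDomDocument  = class_exists('DOMDocument');

echo 'dom extension: ', $hasDomExtension ? 'loaded' : 'missing', "\n";
echo 'DOMDocument class: ', $hasDomDocument ? 'available' : 'missing', "\n";
echo 'PHP version: ', PHP_VERSION, "\n";
```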

knownasilya
post 20 Jan, 2006 - 07:23 PM
Post #6



Sounds interesting. XML is good. I'll check into it after I have dinner. And thanks for the reply. ^^

I'm sure this is it. Looks like it's enabled.
CODE
domxml
DOM/XML  enabled
DOM/XML API Version  20020815
libxml Version  20622
HTML Support  enabled
XPath Support  enabled
XPointer Support  enabled
DOM/XSLT  enabled
libxslt Version  1.1.11
libxslt compiled against libxml Version  2.6.16
DOM/EXSLT  enabled
libexslt Version  1.1.8



Now I have to figure out what this DOM thing is.

This post has been edited by knownasilya: 20 Jan, 2006 - 07:29 PM

snoj
post 20 Jan, 2006 - 08:26 PM
Post #7



Sample of working with PHP's DOM extension (the DOMDocument class, which needs PHP 5).

CODE
// Parts of this are taken from the PHP DOM documentation.
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents('./knownasilya0002.html'));

$links = $doc->getElementsByTagName('a');

for ($i = 0; $i < $links->length; $i++) {
  $value = $links->item($i)->nodeValue;
  $href = $links->item($i)->getAttribute('href');
 
  //add other attributes you want.
}


Have fun!
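
The same loop can also pick up the favicon mentioned in the first post, since it sits in an ICON attribute on each link. A sketch under a few assumptions: the "dom" extension is loaded, the sample HTML is a made-up excerpt rather than a real export, and libxml_use_internal_errors() is swapped in for the @ operator to keep parser warnings quiet:

```php
<?php
// Made-up excerpt in the style of a Firefox bookmark export.
$html = '<DL>'
      . '<DT><A HREF="http://www.php.net/" ICON="data:image/x-icon;base64,AAABAA==">PHP</A>'
      . '<DT><A HREF="http://www.mysql.com/">MySQL</A>'
      . '</DL>';

// Collect parser warnings quietly instead of silencing them with @.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$bookmarks = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $bookmarks[] = [
        'title' => trim($link->nodeValue),
        'url'   => $link->getAttribute('href'),
        // getAttribute() returns an empty string when the attribute is absent.
        'icon'  => $link->getAttribute('icon'),
    ];
}
print_r($bookmarks);
```

The HTML parser lowercases tag and attribute names, which is why 'a', 'href' and 'icon' match the uppercase markup.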

knownasilya
post 21 Jan, 2006 - 02:48 AM
Post #8



Looks interesting, and I can understand it. I'll have a go at it tomorrow morning. Need some rest now.

EDIT: Works perfectly. Now I just need to set it up for the MySQL DB. But it doesn't work on my live host; it gives a parse error on line 9. I guess that's because the host runs PHP 4.4.1 and doesn't like the -> on lines 9 and 10, whereas my local install is PHP 5.something. Well, I'm going to need an upgrade, which is coming anyway.

This post has been edited by knownasilya: 21 Jan, 2006 - 06:38 PM

knownasilya
post 22 Jan, 2006 - 01:35 AM
Post #9



Okay, weird MySQL error.

When I run the script, I get this after 44 entries have been added to the DB:
CODE
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 't delete' )' at line 1


Then I checked mysqlcc, and when I did "return all rows" I got this error. I also can't delete rows; I get the same message again.
CODE
[main-mysql] ERROR 1146: Table 'form.1' doesn't exist


MySQL query:
CODE
CREATE TABLE bookmarks(
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
description VARCHAR(500),
url VARCHAR(500))



PHP code:
CODE

<?php
mysql_connect("localhost", "user", "pass") or die(mysql_error());
mysql_select_db("form") or die(mysql_error());

$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents('bookmarks.html'));

$links = $doc->getElementsByTagName('a');

for ($i = 0; $i < $links->length; $i++) {
 $value = $links->item($i)->nodeValue;
 $href = $links->item($i)->getAttribute('href');
 
       mysql_query("INSERT INTO bookmarks(url, description) VALUES('$href', '$value' ) ")
       or die(mysql_error());

    echo $value . '&nbsp;&nbsp;<b>[ADDED]</b><br />';
   
}

?>


[MOD Edit] We don't want your passwords!

This post has been edited by hotsnoj: 22 Jan, 2006 - 02:07 AM

snoj
post 22 Jan, 2006 - 02:09 AM
Post #10



Don't ever, ever put data into a database without first using some sort of character escape for any and all variables that you don't have control over. Personally I use addslashes(). It may even fix your problem.
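
As an alternative to escaping by hand, a prepared statement keeps the values out of the SQL text altogether. A minimal sketch, assuming the PDO extension is available; it uses an in-memory SQLite database as a stand-in so it can run without a MySQL server (the thread's actual database is MySQL). insertBookmarks() is a hypothetical helper, and the sample row is made up, with an apostrophe in the title as the error text above suggests:

```php
<?php
// Hypothetical helper (not from the thread): insert rows through a prepared statement.
function insertBookmarks(PDO $pdo, array $rows): int
{
    $stmt = $pdo->prepare('INSERT INTO bookmarks (url, description) VALUES (:url, :description)');
    $count = 0;
    foreach ($rows as $row) {
        // Values are bound separately from the SQL, so quotes in them cannot break the statement.
        $stmt->execute([':url' => $row['url'], ':description' => $row['description']]);
        $count++;
    }
    return $count;
}

// Stand-in database: SQLite in memory instead of the MySQL server used in the thread.
if (class_exists('PDO') && in_array('sqlite', PDO::getAvailableDrivers(), true)) {
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE bookmarks (id INTEGER PRIMARY KEY, description VARCHAR(500), url VARCHAR(500))');

    $inserted = insertBookmarks($pdo, [
        ['url' => 'http://example.com/', 'description' => "Don't delete"],
    ]);
    echo $inserted, " row(s) inserted\n";
}
```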

knownasilya
post 22 Jan, 2006 - 02:48 AM
Post #11



Well, I'm learning. How does addslashes() work? I'm going to go check it out on php.net.


I did this:


CODE
 mysql_query("INSERT INTO bookmarks(url, description) VALUES('addslashes($href)', 'addslashes($value)' ) ")




Now I get this error:

CODE

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 't delete)' )' at line 1


This post has been edited by knownasilya: 22 Jan, 2006 - 03:07 AM

Amadeus
post 22 Jan, 2006 - 07:14 AM
Post #12


g++ -o drink whiskey.cpp

Group Icon
Joined: 12 Jul, 2002
Posts: 12,176



Thanked 33 times

Dream Kudos: 25
My Contributions


You'll likely have to break out of the string to call the addslashes() function, or call it before building the statement.

You can also print out the statement, and see what is being passed to the query.
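
To illustrate both suggestions: inside a double-quoted string PHP expands variables but does not call functions, so "addslashes(" ends up in the query as literal text around the unescaped value. A small sketch with made-up values, printing the statement both ways:

```php
<?php
// Made-up values standing in for one bookmark.
$href  = 'http://example.com/';
$value = "Don't delete";

// Function names inside a double-quoted string are plain text; only the variables are expanded.
$broken = "INSERT INTO bookmarks(url, description) VALUES('addslashes($href)', 'addslashes($value)' ) ";

// Calling addslashes() before building the string gives the escaped values.
$safeHref  = addslashes($href);
$safeValue = addslashes($value);
$fixed = "INSERT INTO bookmarks(url, description) VALUES('$safeHref', '$safeValue' ) ";

echo $broken, "\n", $fixed, "\n";
```

The first printed statement still contains the bare apostrophe, which matches the "near 't delete)'" part of the error message. For MySQL specifically, the PHP manual recommends a database-specific escaping function or prepared statements over addslashes().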