5 Replies - 902 Views - Last Post: 18 December 2014 - 01:03 PM

#1 se7en983

  • New D.I.C Head

Reputation: 0
  • Posts: 14
  • Joined: 20-October 12

What's the best way to copy a single HTML page from a URL with images using PHP?

Posted 15 December 2014 - 01:20 PM

What's the best way to copy a single HTML page from a URL with images using PHP?

This is for legitimate use with permission, i.e. not for stealing other people's source or images, etc.!

I've recently built an email marketing system at work. Affiliates often send a link to an HTML page, and I have to copy the source code and images to our database/server before we can use them.

I need a way to input a URL for an HTML campaign from an affiliate and run a script that copies the HTML source and images to our server, does a little cleaning, etc., before adding it to our system for use.

1) I've considered cURL, but I can't find any functionality that does this.
2) I've also thought about wget, but I'm not sure if this is the best approach.
3) And I've thought about using the SPL directory/filesystem iterators, looking for <img src='blahblahblah'> tags and doing a copy etc. to get the images onto our server.

Any ideas on which would be the best approach?

Any advice would be much appreciated!

Thanks


Replies To: What's the best way to copy a single HTML page from a URL with images using PHP?

#2 ArtificialSoldier

  • D.I.C Lover

Reputation: 2760
  • Posts: 8,062
  • Joined: 15-January 14

Re: What's the best way to copy a single HTML page from a URL with images using PHP?

Posted 15 December 2014 - 02:17 PM

If you're using PHP then cURL is the obvious choice; it's more flexible than using a console command like wget or just sending a regular HTTP request. But any of those methods will only return the actual HTML source of the page. If you want the final rendered version, after any JavaScript code executes, then you're going to need a rendering engine that can execute that code and change the DOM, then export the DOM as HTML.

Once you have that then, yeah, you can just search through the code looking for img tags, if that's all you want to check for. There are also CSS background images, but again you'll need a rendering engine for that, to help figure out which CSS rules apply to which elements. Once you have the various image URLs, you can figure out the absolute path to each one based on whether or not the URL starts with a protocol, and then you'll have a list of images on the page.
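For the simple no-JavaScript case, that boils down to something like the sketch below. It's rough, and the helper names (fetch_html, resolve_image_url) are just for illustration, not library functions:

// Fetch the raw HTML source with cURL (no JavaScript execution).
function fetch_html($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

// Turn a (possibly relative) src into an absolute URL, based on whether
// it already starts with a protocol.
function resolve_image_url($src, $page_url) {
    if (preg_match('#^https?://#i', $src)) {
        return $src;                                  // already absolute
    }
    $parts = parse_url($page_url);
    $base  = $parts['scheme'] . '://' . $parts['host'];
    if ($src[0] === '/') {
        return $base . $src;                          // root-relative
    }
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = rtrim(dirname($path), '/');
    return $base . $dir . '/' . $src;                 // document-relative
}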

#3 se7en983

  • New D.I.C Head

Reputation: 0
  • Posts: 14
  • Joined: 20-October 12

Re: What's the best way to copy a single HTML page from a URL with images using PHP?

Posted 15 December 2014 - 02:45 PM

ArtificialSoldier, on 15 December 2014 - 02:17 PM, said:

If you're using PHP then cURL is the obvious choice; it's more flexible than using a console command like wget or just sending a regular HTTP request. But any of those methods will only return the actual HTML source of the page. If you want the final rendered version, after any JavaScript code executes, then you're going to need a rendering engine that can execute that code and change the DOM, then export the DOM as HTML.

Once you have that then, yeah, you can just search through the code looking for img tags, if that's all you want to check for. There are also CSS background images, but again you'll need a rendering engine for that, to help figure out which CSS rules apply to which elements. Once you have the various image URLs, you can figure out the absolute path to each one based on whether or not the URL starts with a protocol, and then you'll have a list of images on the page.


Remember, I'm only interested in HTML pages that will be used as email marketing templates, so there's no need to worry about JS. And as email templates use inline CSS and tables etc., CSS images shouldn't be a problem (maybe! I didn't consider them, so thanks for the input on that!).

I couldn't find a suitable cURL function for following links and copying images etc. Do you know of any that might be suitable?

This stuff needs to be coded ASAP. I would love to spend some time on the SPL iterators/directory iterators and copy functions for the images etc. that could be re-used, but I'm looking for the fastest solution to this problem, not necessarily the best solution!

Thanks for your time and input, much appreciated. :)

#4 ArtificialSoldier

  • D.I.C Lover

Reputation: 2760
  • Posts: 8,062
  • Joined: 15-January 14

Re: What's the best way to copy a single HTML page from a URL with images using PHP?

Posted 15 December 2014 - 04:05 PM

Quote

I couldn't find a suitable cURL function for following links and copying images etc. Do you know of any that might be suitable?

cURL is for sending HTTP requests in general. I don't think there is a single function that does everything you're looking for; you'll need to parse the document yourself. You can use something like DOMDocument to parse it and find img nodes, assuming the HTML is well-formed.
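For example, something like this (a rough sketch, assuming $html is the string returned by your cURL request):

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely perfectly well-formed
$doc->loadHTML($html);

// Collect the src attribute of every img element into a list of image URLs.
$image_urls = array();
foreach ($doc->getElementsByTagName('img') as $img) {
    $image_urls[] = $img->getAttribute('src');
}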

Quote

This stuff needs to be coded ASAP. I would love to spend some time on the SPL iterators/directory iterators and copy functions

In order to use the SPL iterators you need something to iterate over. That list of image URLs will come from downloading and parsing the HTML document.

#5 JackOfAllTrades

  • Saucy!

Reputation: 6259
  • Posts: 24,028
  • Joined: 23-August 08

Re: What's the best way to copy a single HTML page from a URL with images using PHP?

Posted 15 December 2014 - 04:06 PM

There's also Goutte.
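From memory (so treat the exact calls as a sketch rather than gospel), pulling the img sources out of a page with Goutte looks roughly like this; it uses Goutte's Client with the Symfony DomCrawler underneath:

use Goutte\Client;

$client  = new Client();
$crawler = $client->request('GET', 'http://someURL.co.uk/page-to-download.html');

// Collect the src attribute of every img element on the page.
$image_urls = $crawler->filter('img')->each(function ($node) {
    return $node->attr('src');
});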

#6 se7en983

  • New D.I.C Head

Reputation: 0
  • Posts: 14
  • Joined: 20-October 12

Re: What's the best way to copy a single HTML page from a URL with images using PHP?

Posted 18 December 2014 - 01:03 PM

Just an update on this: I managed to solve the problem using PHP's DOMDocument methods, as suggested in the reply above.

Basically something like this:
$url = 'http://someURL.co.uk/page-to-download.html';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
$doc->loadHTMLFile($url);

$image_nodes_array = $doc->getElementsByTagName('img');

foreach ($image_nodes_array as $image_node) {
    // Get the path to the image as a string.
    $image_resource = $image_node->getAttribute('src');

    // Use the last path segment as the file name.
    $pieces     = explode('/', $image_resource);
    $image_name = array_pop($pieces);

    // Copy the remote image into our images directory.
    copy($image_resource, 'images_dir/' . $image_name);
    // etc.
}

// etc.

It went something like that; the code's at work so I can't really remember the exact syntax. Obviously there's more checking and other stuff, but you get the idea.
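The "more checking" was mostly rewriting each src to point at our local copy and then saving the modified HTML back out, roughly like this (again a sketch, not the exact code from work; it assumes the src values are already absolute URLs, and 'campaign.html' is just a placeholder name):

foreach ($doc->getElementsByTagName('img') as $image_node) {
    $src        = $image_node->getAttribute('src');
    $image_name = basename(parse_url($src, PHP_URL_PATH)); // file name only, no query string

    copy($src, 'images_dir/' . $image_name);                // assumes $src is an absolute URL
    $image_node->setAttribute('src', 'images_dir/' . $image_name);
}

// Write the campaign HTML back out with the rewritten image paths.
file_put_contents('campaign.html', $doc->saveHTML());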

Thanks for the input, it helped lead me to a pretty painless solution. :)
