3 Replies - 5516 Views - Last Post: 31 October 2004 - 01:28 PM

#1 richowe

Programmatically Logging In And Page Scraping

Posted 30 October 2004 - 05:36 PM

Okay, I'm working with a client who needs to access product availability/inventory levels in real time from a third-party company.

------The Challenge----------------

1.) This company does not provide an API or URL to interface with their product database

2.) This company does provide a web page login for members to log in and check inventory manually via a form

------The System-------------------

1.) Using PHP as CGI on Windows 2000 Advanced Server
2.) No PEAR/PECL modules installed

------The Problem------------------

1.) How to programmatically log in to an SSL page with CGI PHP
2.) Navigate to the SSL inventory page after logging in OR
3.) Submit a query based on product number
4.) Scrape the results and import them into a local database

--------------------------------------

If I can just get to step 3 of the problem, I can scrape the page for info.

Any ideas? Solutions?


Replies To: Programmatically Logging In And Page Scraping

#2 cyberscribe

Re: Programmatically Logging In And Page Scraping

Posted 30 October 2004 - 06:22 PM



#3 richowe

Re: Programmatically Logging In And Page Scraping

Posted 30 October 2004 - 09:19 PM

Will HTTP::Request allow me to:

1.) Authenticate
2.) Maintain session state somehow between the server and the web site to make additional SSL requests?

-------------------------

I'm not sure how to install PEAR on Windows 2000 with the CGI version of PHP rather than the server module version.

Okay, so I'm downloading PEAR and, as usual, Windows installations are a bitch!

Here's the error I'm getting:

Starting installation ...
Loading zlib: ok
Downloading package: PEAR-stable......ok
Downloading package: Archive_Tar-stable....ok
Downloading package: Console_Getopt-stable....ok
Downloading package: XML_RPC-stable....ok
Downloading package: Pager............ok
Downloading package: HTML_Template_IT....ok
Downloading package: Net_UserAgent_Detect....ok
Downloading package: PEAR_Frontend_Web....ok
Bootstrapping: PEAR...................(remote) ok
Bootstrapping: Archive_Tar............(remote) ok
Bootstrapping: Console_Getopt.........(remote) ok
Extracting installer..................

Warning: main(PEAR.php): failed to open stream: No such file or directory in C:\Inetpub\wwwroot\go-pear\Archive\Tar.php on line 21

Fatal error: main(): Failed opening required 'PEAR.php' (include_path='C:/WINNT/TEMP/gop8.tmp') in C:\Inetpub\wwwroot\go-pear\Archive\Tar.php on line 21


The statement on line 21 of Tar.php is

require_once 'PEAR.php';

and that file is actually one directory above, but even changing it to ../pear.php doesn't work

grr

This post has been edited by richowe: 30 October 2004 - 09:57 PM


#4 cyberscribe

Re: Programmatically Logging In And Page Scraping

Posted 31 October 2004 - 01:28 PM

Yes -- HTTP::Request will allow you to log in over SSL. I'm not quite sure what you mean by 'state.' HTTP itself is a stateless protocol. Do you mean cookies?
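To make the cookie point concrete, here is a minimal sketch of how "state" is usually carried across stateless requests: the login response sets a session cookie, and every later request sends it back. The two helper functions are hypothetical and written in plain PHP. They are not part of HTTP::Request or any other library, and they deliberately ignore cookie attributes such as path, expiry and secure.

```php
<?php
// Hypothetical helpers (not library code): show how cookie-based "state"
// is carried between otherwise stateless HTTP requests.

/**
 * Collect name => value pairs from raw response header lines.
 * Only the leading name=value part of each Set-Cookie header is kept;
 * attributes (path, expires, secure, ...) are ignored in this sketch.
 */
function extract_cookies(array $headerLines)
{
    $cookies = array();
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, so match without regard to case.
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        $value = trim(substr($line, strlen('Set-Cookie:')));
        $parts = explode(';', $value);
        $pair  = explode('=', trim($parts[0]), 2);
        if (count($pair) === 2 && $pair[0] !== '') {
            $cookies[$pair[0]] = $pair[1];
        }
    }
    return $cookies;
}

/**
 * Build the Cookie request header to send on the next request.
 */
function build_cookie_header(array $cookies)
{
    $pairs = array();
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs);
}
```

The idea would be to run extract_cookies() on the headers of the login response and attach the result of build_cookie_header() to the inventory request, so the remote site sees both requests as the same logged-in session.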

If you are having grief with PEAR, remember that you can always just copy the package and any necessary dependencies into the root directory of your code base so that:
include('http/Request.php');

works, resolving to ./http/Request.php instead of /path/to/pear/http/Request.php
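If the copied files live somewhere other than the script's own directory, a related option is to add that directory to PHP's include_path at runtime, so that bare requires such as require_once 'PEAR.php' inside the package can still be found. The following is only a sketch: the function name and the "lib" directory are made up for illustration, and it assumes the host allows include_path to be changed from a script.

```php
<?php
// Sketch only: prepend a directory to PHP's include_path so that bare
// requires (e.g. require_once 'PEAR.php') can be resolved from it.
// The function name and the "lib" directory are hypothetical.

function prepend_include_path($dir)
{
    $current = get_include_path();
    $entries = explode(PATH_SEPARATOR, $current);

    // Avoid adding the same directory twice.
    if (!in_array($dir, $entries, true)) {
        set_include_path($dir . PATH_SEPARATOR . $current);
    }
    return get_include_path();
}

// Example: make ./lib (next to this script) searchable for includes.
prepend_include_path(dirname(__FILE__) . DIRECTORY_SEPARATOR . 'lib');
```

PATH_SEPARATOR is ";" on Windows and ":" on Unix-like systems, so the same code should work on either.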

hth.