#1 king1212

Error on CURL

Posted 12 September 2011 - 03:05 AM

Hi,

I got this error:

Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set

On my localhost cURL is enabled and this error does not appear.
But when I put the script on my server, the error still shows up, even though the hosting support says cURL is already enabled.

What do I do now? What do I have to say to them? :|

br,
love king!

Replies To: Error on CURL

#2 ArcticFox

Posted 12 September 2011 - 04:04 AM

king1212, on 12 September 2011 - 04:05 AM, said:

cannot be activated when in safe_mode or an open_basedir is set

Maybe one of those is the problem. Did you check?

#3 king1212

Posted 12 September 2011 - 04:05 AM

Yes. My server's customer support said there is no problem with cURL, and they keep saying that it is already enabled and running.

#4 ArcticFox

Posted 12 September 2011 - 04:08 AM

I did some searching and came up with this:

http://www.php.net/m...etopt.php#71313

or

http://www.francesco...basedir-is-set/

#5 king1212

Posted 12 September 2011 - 04:29 AM

Yes, I had already read those before I posted here, sir.

#6 CTphpnwb

Posted 12 September 2011 - 01:18 PM

Show your code!

#7 king1212

Posted 12 September 2011 - 09:38 PM

OK, here is the code.

For index.php:

// load the class once, outside the loop
require_once('spider.class.php');

$url = "http://www.google.com http://www.yahoo.com http://www.msn.com";

// split() is deprecated, so break the list on whitespace with preg_split()
$cropURL = preg_split('/\s+/', trim($url), -1, PREG_SPLIT_NO_EMPTY);

foreach ($cropURL as $url) {
	// create new Spider object for the current URL; the variable must not be
	// wrapped in single quotes, or the literal text '$cropURL[0]' is passed
	$spider = new Spider($url);

	// allow files with extension *.txt being spidered
	$spider->allowType('txt');

	// and disable files with that extension
	$spider->restrictType('txt');

	// set it to true if you want to see what is happening on the screen
	$spider->setVerbose(true);

	// start spidering website
	$spider->startSpider();

	// all found and fetched links are in that variable
	$links = $spider->all_links;

	// $trackDetails has to be defined and escaped before this query runs
	$sqlSpider = mysql_query("UPDATE k_feed_details SET spider_date=NOW() WHERE trackDetails='$trackDetails' ") or die (mysql_error());
}


For spider.class.php:

<?php
class Spider {

 /**
  * cURL connection handler
  * 
  * @var resource
  * @access private
  */
  private $curl_session;

 /**
  * Root url value
  * 
  * @var array
  * @access private
  */
  private $root_url = array(
                      'scheme' => 'http',
                      'host' => 'localhost',
                      'path' => '/');

 /**
  * All found links
  * 
  * @var array
  * @access public
  */
  public $all_links = array();

 /**
  * Allowed file types
  * 
  * @var array
  * @access private
  */
  private $accept_types = array('htm', 'html', 'php', 'php5', 'aspx');

 /**
  * Verbose spidering process
  * 
  * @var boolean
  * @access private
  */
  private $verbose = false;

 /**
  * Fetched urls
  * 
  * @var integer
  * @access private
  */
  private $fetched_urls = 0;

 /**
  * Not fetched urls
  * 
  * @var integer
  * @access private
  */
  private $not_fetched_urls = 0;

 /**
  * User agent string
  * 
  * @var string
  * @access private
  */
  private $user_agent = 'Spider website 0.1';

 /**
  * Constructor
  *
  * @param string $site as root url
  * @access public
  * @return void
  */
  public function __construct ($site = '') {
    $this->setRootURL($site);
    $this->curl_session = curl_init();
  }

 /**
  * Changes root url
  *
  * @param string $site as new root url
  * @access public
  * @return void
  */
  public function setRootURL($site) {
    if (!empty($site)) {
      $this->root_url = parse_url($site);
    }
  }

 /**
  * Changes verbose mode
  *
  * @param boolean $value as new verbose setting
  * @access public
  * @return void
  */
  public function setVerbose($value) {
    if (is_bool($value)) {
      $this->verbose = $value;
    }
  }

 /**
  * Allows file type being spidering
  *
  * @param string $extension
  * @access public
  * @return void
  */
  public function allowType($extension) {
    if (!empty($extension)) {
      if (!in_array($extension, $this->accept_types)) array_push($this->accept_types, $extension);
    }
  }

 /**
  * Restricts file type from being spidered
  *
  * @param string $extension
  * @access public
  * @return void
  */
  public function restrictType($extension) {
    if (!empty($extension) && in_array($extension, $this->accept_types)) {
      foreach ($this->accept_types as $key => $value) {
        if ($extension == $value) {
          $this->accept_types[$key] = null;
        }
      }
      $this->accept_types = array_filter($this->accept_types);
    }
  }

 /**
  * Checks if url allowed to be fetched
  *
  * @param string $url url of page, string $useragent as useragent string
  * @access private
  * @return boolean Returns true if url allowed to fetch and false if otherwise
  */
  private function _robotsAllowed ($url, $useragent=false) { 
    $parsed = parse_url($url);
    $agents = array(preg_quote('*'));
    if($useragent) {
      $agents[] = preg_quote($useragent);
    }
    $agents = implode('|', $agents);
    $robotstxt = @file('http://'.$parsed['host'].'/robots.txt');
    if(!$robotstxt) 
      return true;
    $rules = array();
    $ruleapplies = false;
    foreach($robotstxt as $line) {
      if(!$line = trim($line)) continue;
      if(preg_match('/User-agent: (.*)/i', $line, $match)) { 
        $ruleapplies = preg_match('/('.$agents.')/i', $match[1]);
      } 
      if($ruleapplies && preg_match('/Disallow:(.*)/i', $line, $regs)) { 
        if(!$regs[1]) return true;
        $rules[] = preg_quote(trim($regs[1]), '/');
      }
    }
    foreach($rules as $rule) {
      if(preg_match('/^'.$rule.'/', $parsed['path'])) return false;
    }
    return true; 
  } 

 /**
  * Prints fetching status
  *
  * @param boolean $type
  * @access private
  * @return void
  */
  private function _verboseStatus($type=false) {
    if ($this->verbose) {
      if ($type) {
        echo ' [OK]' . "\n";
        $this->fetched_urls++;
      } else {
        echo ' [Not fetched] (robots.txt rules, meta tags rules or error)' . "\n";
        $this->not_fetched_urls++;
      }
    }
  }

 /**
  * Fetches given url
  *
  * @access private
  * @return void
  */
  private function _fetchUrl($url) {
    if ($this->verbose) {
      echo 'Fetching ' . htmlentities($url);
    }

    if ($this->_robotsAllowed($url)) {
      curl_setopt($this->curl_session, CURLOPT_URL, $url);
      curl_setopt($this->curl_session, CURLOPT_USERAGENT, $this->user_agent);
      curl_setopt($this->curl_session, CURLOPT_HEADER, 0);
      curl_setopt($this->curl_session, CURLOPT_FOLLOWLOCATION, 1);
      curl_setopt($this->curl_session, CURLOPT_RETURNTRANSFER, 1);
      curl_setopt($this->curl_session, CURLOPT_SSL_VERIFYHOST, 2);
      curl_setopt($this->curl_session, CURLOPT_SSL_VERIFYPEER, 0);
      curl_setopt($this->curl_session, CURLOPT_POST, 0);

      $result = curl_exec($this->curl_session);
      $info = curl_getinfo($this->curl_session);
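      $robots = array();
      // default status, so _verboseStatus() never receives an undefined variable
      $fetched = false;

      if ($info['http_code'] == 200) {
        $tags = get_meta_tags($url);
        // not every page has a robots meta tag, so guard the lookup
        if (isset($tags['robots'])) {
          $robots = explode(',', strtolower(str_replace(' ', '', trim($tags['robots']))));
        }
      }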

      if (!in_array('none', $robots)) {

        if (!in_array('noindex', $robots) && $info['http_code'] == 200) {
          if (!in_array($url, $this->all_links)) {
            array_push($this->all_links, $url);
          }
          $fetched=true;
        }
        $this->_verboseStatus($fetched);

        if (!in_array('nofollow', $robots) && $info['http_code'] == 200) {

          preg_match_all('/href=\"(.*)\"/imsU', $result, $matches);
          foreach ($matches[1] as $fetch_url) {
            $tmp = @parse_url($fetch_url);
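            if (!empty($tmp['host']) && $tmp['host'] == $this->root_url['host']) {
              // keep the link as a string; only its path is needed to read the extension
              $path = isset($tmp['path']) ? $tmp['path'] : '/';
              $extension = pathinfo($path, PATHINFO_EXTENSION);
              if (in_array($extension, $this->accept_types)) {
                if (!in_array($fetch_url, $this->all_links)) {
                  $this->_fetchUrl($fetch_url);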
                }
              }
            } else if (empty($tmp['host'])) {
              if (!empty($tmp['query'])) {
                $fetch_url = substr($fetch_url, 0, strpos($fetch_url, '?'));
              }
              $tmp_file = pathinfo($fetch_url);
              if ($tmp_file['dirname'][0] == '.' || empty($tmp_file['dirname'])) {
                $url = $this->root_url['scheme'].'://'.$this->root_url['host'].substr($this->root_url['path'], 0, -1).substr($tmp_file['dirname'], 1).'/'.$tmp_file['basename'];
                if (!empty($tmp['query'])) {
                  $url = $url . '?' . $tmp['query'];
                }
                if (empty($tmp_file['extension']) || in_array($tmp_file['extension'], $this->accept_types)) {
                  if (!in_array($url, $this->all_links)) {
                    $this->_fetchUrl($url);
                  }
                }
              }

              if ($tmp_file['dirname'][0] == '/') {
                $url = $this->root_url['scheme'].'://'.$this->root_url['host'].substr($tmp_file['dirname'], 1).'/'.$tmp_file['basename'];
                if (!empty($tmp['query'])) {
                  $url = $url . '?' . $tmp['query'];
                }
                if (empty($tmp_file['extension']) || in_array($tmp_file['extension'], $this->accept_types)) {
                  if (!in_array($url, $this->all_links)) {
                    $this->_fetchUrl($url);
                  }
                }
              }
            }
          }
        }
      }
    }
  }

 /**
  * Starts spidering
  *
  * @access public
  * @return void
  */
  public function startSpider() {
    $url = $this->root_url['scheme'].'://'.$this->root_url['host'].$this->root_url['path'];
    if (!empty($this->root_url['query'])) {
      $url = $url.'?'.$this->root_url['query'];
    }

    if ($this->verbose) {
      echo '<pre>'.
           'Started spidering on website ' . $url . ' on ' . date('Y-m-d H:i:s', time()) . "\n";
    }

    $this->_fetchUrl($url);

    if ($this->verbose) {
      echo 'Succesfully fetched ' . $this->fetched_urls . ' urls, not fetched ' .$this->not_fetched_urls. '. Finished on '. date('Y-m-d H:i:s', time()).
           '</pre>';
    }
  }
}
?>


I got this code from the internet. On my localhost it works perfectly, but only with one website.

When I try to spider multiple websites, for example three, it no longer works. I think I found the problem: the script can only handle one website at a time.

The reference is this PHP class:
http://www.phpclasse...-all-links.html

I want it to spider multiple websites, or to build a robot that automatically spiders a website every time someone adds its URL to the database.
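
For the multiple-website case, one possible first step is to turn the raw list of URLs into a clean array before any Spider object is created. The sketch below is only an illustration: parseUrlList() is a made-up helper name, not part of the Spider class, and it assumes the URLs arrive as one whitespace-separated string (as in index.php above) or as text pulled from a database column.

```php
<?php
// Hypothetical helper: split a whitespace-separated string into a list of
// unique, syntactically valid URLs.
function parseUrlList($raw) {
    $urls = array();
    // Break on any run of spaces, tabs or line breaks, dropping empty pieces.
    foreach (preg_split('/\s+/', trim($raw), -1, PREG_SPLIT_NO_EMPTY) as $candidate) {
        // Skip anything that is not a well-formed URL, and skip duplicates.
        if (filter_var($candidate, FILTER_VALIDATE_URL) !== false
            && !in_array($candidate, $urls, true)) {
            $urls[] = $candidate;
        }
    }
    return $urls;
}

foreach (parseUrlList("http://www.google.com http://www.yahoo.com") as $site) {
    // In the real script, this is where a new Spider($site) would be created and started.
    echo "Would spider: $site\n";
}
```

Each entry is then an independent string, so every site can get its own Spider instance inside the loop.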

#8 Valek

Posted 12 September 2011 - 11:15 PM

cURL can be enabled and working as much as you want, but that particular option is not going to work if your host is running PHP in safe mode. It's not that there's an issue with cURL. It's just an issue with that option.
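
If the host will not change those settings, a common workaround is to leave CURLOPT_FOLLOWLOCATION off and follow redirects by hand. What follows is a rough sketch under some assumptions, not a drop-in replacement: extractLocation() and fetchWithManualRedirects() are names invented for this example, only absolute URLs in the Location header are handled, and the limit of 5 redirects is arbitrary.

```php
<?php
// Hypothetical helper: return the Location header value from a raw HTTP
// header block, or null when there is none.
function extractLocation($rawHeaders) {
    if (preg_match('/^Location:[ \t]*(\S+)/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: fetch $url and follow up to $maxRedirects redirects
// without using CURLOPT_FOLLOWLOCATION. Returns the body, or false on failure.
function fetchWithManualRedirects($url, $maxRedirects = 5) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true); // headers are needed to read Location

    $body = false;
    for ($i = 0; $i <= $maxRedirects; $i++) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $response = curl_exec($ch);
        if ($response === false) {
            break; // transport error
        }
        $status     = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        $headers    = substr($response, 0, $headerSize);
        $body       = substr($response, $headerSize);

        $location = extractLocation($headers);
        if ($status >= 300 && $status < 400 && $location !== null) {
            $url  = $location; // assumes the server sent an absolute URL
            $body = false;     // a redirect response is not the final page
            continue;
        }
        break; // not a redirect, so this is the final response
    }
    return $body;
}
```

In the Spider class this kind of helper would stand in for the curl_exec() call in _fetchUrl(), with the CURLOPT_FOLLOWLOCATION line removed.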

Also, here's some good information on open_basedir, which will make it obvious why that would cause issues.

If you are unsure which, if either, of these is in effect on your host, you can always run a script that calls the phpinfo() function.
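
A lighter-weight check than a full phpinfo() dump is to read the two directives directly with ini_get(). This is only a sketch: followLocationBlockers() is a name made up for this example, and it assumes the script is run from the same directory and under the same PHP configuration as the spider, since per-directory overrides can differ.

```php
<?php
// Hypothetical helper: list the settings that would make curl_setopt()
// refuse CURLOPT_FOLLOWLOCATION.
function followLocationBlockers() {
    $blockers = array();

    // ini_get() returns false when the directive does not exist
    // (safe_mode was removed in PHP 5.4).
    $safeMode = ini_get('safe_mode');
    if ($safeMode && strtolower($safeMode) !== 'off') {
        $blockers[] = 'safe_mode is on';
    }

    // An empty string means no open_basedir restriction is configured.
    $openBasedir = ini_get('open_basedir');
    if ($openBasedir !== false && $openBasedir !== '') {
        $blockers[] = 'open_basedir is set to: ' . $openBasedir;
    }

    return $blockers;
}

$blockers = followLocationBlockers();
echo $blockers
    ? implode("\n", $blockers) . "\n"
    : "Neither safe_mode nor open_basedir is in effect\n";
```

If either line is printed on the server but not on localhost, that is the difference to raise with the hosting support.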