#1 wch1wch2

Searching appropriate webpages

Posted 02 December 2010 - 09:14 PM

This program is meant to find .swf files and save their links. The depth is how many levels of sub-links are followed from each original link (with maxDepth=0, only the original links themselves are searched), and so on. This is my code, written in Python:

import cgitb

import cgi
import urllib
import sys
from lxml import etree

links = []
swf_links = []
temp_links = []

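## Recursively search the given links for .swf files.
## Several statements (the try: and the bodies of some if blocks) were lost
## from the post; the lines marked "assumed" are reconstructions.

def find_webpage(links, maxDepth, depth):
    global swf_links
    global temp_links

    for element1 in links:
        try:
            webPage = urllib.urlopen(element1)
            html = webPage.read()
            dom = etree.HTML(html)
            for element in dom.iter("a"):
                link = element.get("href")
                if link is not None and not link.startswith("#"):
                    # Turn a relative link into an absolute one
                    if not link.startswith("http://") and not link.startswith("https://"):
                        link = str(element1) + '/' + str(link)
                    if not link.endswith(".swf"):
                        temp_links.append(link)   # assumed: page to visit at the next depth
                    else:
                        swf_links.append(link)    # assumed: record the .swf link
            # Search for the src attribute
            for element in dom.iter():
                src = element.get("src")
                if src is not None and src.startswith("http://") and src.endswith(".swf"):
                    swf_links.append(src)         # assumed: record the .swf link
            # Close the connection
            webPage.close()
        except UnicodeEncodeError:
            pass                                  # assumed: skip links that cannot be encoded

    if depth <= maxDepth:
        # Add the newly found pages that are not already in links
        for element in temp_links:
            if element not in links:
                links.append(element)             # assumed
        find_webpage(links, maxDepth, depth+1)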

Here links is the list of links the user wants to search, maxDepth is the number of levels of webpages the user wants to search, and depth starts at 1.

It works without problems for maxDepth = 0 or 1, but for anything larger it no longer finishes producing the output. Why? Can you help me fix it so that it searches more quickly?
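
A possible explanation, offered as a guess from the code as posted: every recursive call walks the whole links list again, so pages fetched at one depth are fetched again at every later depth, and temp_links is never cleared. The sketch below is one way to avoid that, not a repair of the original function. It is a breadth-first search that remembers each URL in a seen set, so no page is downloaded twice. It assumes Python 3 and uses only the standard library. The names LinkParser, fetch_html, find_swf and the fetch parameter are made up for this illustration, and urljoin stands in for the manual string concatenation of relative links.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Splits a page's links into .swf references and ordinary <a href> pages."""

    def __init__(self):
        super().__init__()
        self.swf = []    # raw href/src values ending in .swf
        self.pages = []  # raw <a href> values that are not .swf

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value or value.startswith("#"):
                continue  # skip empty values and in-page anchors
            if name in ("href", "src") and value.lower().endswith(".swf"):
                self.swf.append(value)
            elif tag == "a" and name == "href":
                self.pages.append(value)


def fetch_html(url):
    """Hypothetical default fetcher: download one page as text."""
    with urlopen(url, timeout=10) as page:
        return page.read().decode("utf-8", errors="replace")


def find_swf(start_links, max_depth, fetch=fetch_html):
    """Breadth-first search; max_depth=0 reads only the starting pages."""
    seen = set(start_links)  # every page URL that was ever queued
    queue = deque((url, 0) for url in start_links)
    swf_links = set()
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # unreachable or malformed URL: skip it
        parser = LinkParser()
        parser.feed(html)
        for raw in parser.swf:
            swf_links.add(urljoin(url, raw))  # resolve relative to the page
        if depth < max_depth:
            for raw in parser.pages:
                link = urljoin(url, raw)
                if link.startswith(("http://", "https://")) and link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return sorted(swf_links)
```

Called as find_swf(["http://example.com/"], 2), this would read the starting page and follow links two levels down. Because each URL enters the queue at most once, the number of downloads grows with the number of distinct pages rather than being repeated at every depth. The fetch parameter exists only so the search can be tried against canned pages without touching the network.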
