Hi all,
I'm new to the Python language.
First, I want to mention that when I tried to search for my problem, an error occurred.
QUOTE
An error occurred!
Error: HTTP Error: Unsupported HTTP response status 502 Bad Gateway (soapclient->response has contents of the response)
My problem is:
I'm working on an app that takes the URL of a site, searches that site
for all kinds of links (http://, news://, ftp://, www.) and prints them.
I tried these functions:
CODE
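import re
import urllib

def getSource(Host, Path):
    # Python 2: fetch the page at http://Host/Path and return its source
    file = urllib.urlopen("http://" + Host + Path)
    text = file.read()
    return text

def seekLinks(source):
    # Pattern exactly as I wrote it; this is the part I am asking about
    ex = "[http://|www.|ftp://|news://].[\.htm|\.com]"
    r = re.compile(ex, re.DOTALL | re.IGNORECASE)
    for item in re.findall(r, source):
        print item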
I succeeded in getting the source code.
However, when I ran the "seekLinks" function I got
a lot of results that contained only 3 characters each.
For example:
CODE
e-M
nSt
nam
nam
wmo
ent
nam
src
htt
na.
s/C
eat
s/O
tec
e-M
nSt
wmo
ent
nam
wSc
tAc
sam
eDo
tio
sho
htt
ww.
.co
/go
eCo
nec
/em
ect
/bo
/ht
Is my regex wrong?
Waiting for help. Thanks in advance to anyone who can help.
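
For anyone who finds this thread later, here is a minimal sketch of what seems to be going on. It is written for Python 3, not the Python 2 used in the post. In a regex, square brackets define a character class (one character from a set), not a list of alternatives, so the pattern above reads as "one character from the first set, any character, one character from the second set", which is always exactly 3 characters. The names `LINK_RE`, `seek_links` and `sample` are hypothetical and only for illustration. The replacement pattern is one possible approach, not a complete URL parser.

```python
import re

# The pattern from the post. Each [...] is a character class, so a match is
# always exactly three characters long.
original = re.compile(r"[http://|www.|ftp://|news://].[\.htm|\.com]",
                      re.DOTALL | re.IGNORECASE)

# Hypothetical replacement: (?:...) groups real alternatives, and
# [^\s"'<>]+ consumes the rest of the link up to whitespace, a quote,
# or an angle bracket.
LINK_RE = re.compile(r"""(?:(?:https?|ftp|news)://|www\.)[^\s"'<>]+""",
                     re.IGNORECASE)

def seek_links(source):
    """Return every link-like substring found in source, in order."""
    return LINK_RE.findall(source)

# Made-up sample input, not taken from any real site.
sample = ('<a href="http://example.com/index.htm">home</a> '
          'see www.example.org or ftp://files.example.net/pub')

# Every match of the original pattern is three characters long.
print({len(m) for m in original.findall(sample)})

for link in seek_links(sample):
    print(link)
```

The sketch returns a list instead of printing inside the function, so the caller can decide what to do with the links. For real HTML, an HTML parser is likely to be more reliable than a regex.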