
#1 bhaktanishant

how to extract page-URL using BeautifulSoup

Posted 31 October 2013 - 07:04 AM

I know how to extract HTML tags using BeautifulSoup, but I want to extract the link of the page itself.
Example:
If I have this code

import urllib2
from bs4 import BeautifulSoup
link = "http://www.dreamincode.net"
page = urllib2.urlopen(link).read()
soup = BeautifulSoup(page)


then I can extract the title of the page with:
title = soup.title
But I want to know how to extract the page URL from soup, which in this case would be http://www.dreamincode.net.

This post has been edited by bhaktanishant: 31 October 2013 - 07:06 AM



Replies To: how to extract page-URL using BeautifulSoup

#2 witeboy724

Re: how to extract page-URL using BeautifulSoup

Posted 31 October 2013 - 12:51 PM

I'm not sure what you're asking. You specify the URL in the third line of your code, and that is what BS loads, so you already have it.

Maybe you want the rest of the links on the page, using something like this:
from bs4 import BeautifulSoup
import requests

url = 'dreamincode.net'

r = requests.get("http://" + url)

data = r.text

soup = BeautifulSoup(data)

for link in soup.find_all('a'):
    print(link.get('href'))



But you definitely need the page's URL before you can send the page to BS, so it's not really something that needs to be read back from BS afterwards. I have experience with BeautifulSoup, so if you clarify what you need, I may be able to help.
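
To make that concrete: a BeautifulSoup object is built from the markup alone and does not keep the address the markup was fetched from, so one option is to carry the URL alongside the soup. This is only a minimal Python 3 sketch, assuming the standard library's urllib.request in place of urllib2. FetchedPage, fetch_page and page_from_html are hypothetical names made up for illustration, not part of bs4.

```python
from dataclasses import dataclass
from urllib.request import urlopen

from bs4 import BeautifulSoup


@dataclass
class FetchedPage:
    """Hypothetical container pairing a soup with the URL it came from."""
    url: str
    soup: BeautifulSoup


def fetch_page(link: str) -> FetchedPage:
    """Download a page and remember the URL that was actually retrieved."""
    with urlopen(link) as response:
        # urlopen follows redirects; geturl() reports the final address.
        final_url = response.geturl()
        html = response.read()
    return FetchedPage(url=final_url, soup=BeautifulSoup(html, "html.parser"))


def page_from_html(url: str, html: str) -> FetchedPage:
    """Offline variant for markup that is already in hand."""
    return FetchedPage(url=url, soup=BeautifulSoup(html, "html.parser"))
```

With this, code that receives a FetchedPage can read page.url and page.soup.title together instead of trying to get the address back out of the soup.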

#3 bhaktanishant

Re: how to extract page-URL using BeautifulSoup

Posted 31 October 2013 - 02:08 PM

Thanks for your response. You didn't understand my question. I was asking: if you don't know the link that was souped (in this case www.dreamincode.net), how do you get the link back from the souped object?
Anyway, I got my problem solved.
Thanks.
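
For readers with the same question, the thread does not say how the problem was solved. One possible approach, when only the soup is available, is to check whether the page declares its own address in its markup, for example through a canonical link or an Open Graph og:url meta tag. The sketch below assumes the page includes one of those tags, which many pages do not, and the declared value may differ from the address that was actually requested. guess_page_url is a hypothetical helper, not a bs4 function.

```python
from typing import Optional

from bs4 import BeautifulSoup


def guess_page_url(soup: BeautifulSoup) -> Optional[str]:
    """Return a URL the page declares for itself, or None if it declares none."""
    # <link rel="canonical" href="..."> is the page's preferred address.
    canonical = soup.find("link", rel="canonical")
    if canonical is not None and canonical.get("href"):
        return canonical["href"]

    # <meta property="og:url" content="..."> is the Open Graph equivalent.
    og_url = soup.find("meta", attrs={"property": "og:url"})
    if og_url is not None and og_url.get("content"):
        return og_url["content"]

    # The markup carries no self-reference, so the URL cannot be recovered.
    return None
```

If this returns None, the address has to come from wherever the page was fetched, since the soup itself holds only the parsed markup.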

This post has been edited by bhaktanishant: 31 October 2013 - 02:09 PM

