3 Replies - 365 Views - Last Post: 29 June 2013 - 12:03 AM

#1 bee80

dynamically create a list of lists

Posted 22 June 2013 - 12:16 AM

Hi,
I've recently begun learning Python, and after getting to grips with the syntax and a few modules I decided to challenge myself by scraping all the film data from my local cinema.
I've completed everything except one final hurdle: when I scrape the time data, my code inserts all the days and times into the same list.
What I would like it to do is create a list of lists, one per day, e.g.:
[['Sunday', '12:00', '13:00'], ['Monday', '12:00', '2:00']]

Below is some sample code I was testing on (apologies for the length of the html variable). Any help would be appreciated, thanks:

import urllib, re
from bs4 import BeautifulSoup

html = ''' <div class="timeslisting clearfix fRegular  f fSaturday fSunday   f2D ">
				<!-- Times for Saturday -->
	<div class="dayline clearfix fSaturday f">
		<!-- Date -->
		<div class="day"><span>Saturday</span></div>
		<!-- Times -->
		<div class="showingtimes">
								<span class="tPeak">
					        
			<a href="/fanatic/booking-interactive/s8/p14BF0000023FXLZZCD/" title="Peak - Book &quot;Despicable Me 2 2D&quot; on Saturday 12:00">12:00</a>
								</span>
								<span class="tPeak">
					        
			<a href="/fanatic/booking-interactive/s8/p44BF0000023FXLZZCD/" title="Peak - Book &quot;Despicable Me 2 2D&quot; on Saturday 14:30">14:30</a>
								</span>
								<span class="tPeak">
					        
			<a href="/fanatic/booking-interactive/s8/pA7001000023FXLZZCD/" title="Peak - Book &quot;Despicable Me 2 2D&quot; on Saturday 17:45">17:45</a>
								</span>
		</div><div class="cl"></div>
	</div><div class="cl"></div>
				<!-- Times for Sunday -->
	<div class="dayline clearfix fSunday f">
		<!-- Date -->
		<div class="day"><span>Sunday</span></div>
		<!-- Times -->
		<div class="showingtimes">
								<span class="tPeak">
					        
			<a href="/fanatic/booking-interactive/s8/p54BF0000023FXLZZCD/" title="Peak - Book &quot;Despicable Me 2 2D&quot; on Sunday 12:00">12:00</a>
								</span>
								<span class="tPeak">
					        
			<a href="/fanatic/booking-interactive/s8/p24BF0000023FXLZZCD/" title="Peak - Book &quot;Despicable Me 2 2D&quot; on Sunday 14:30">14:30</a>
								</span>
								<span class="tPeak">
					        
			<a href="/fanatic/booking-interactive/s8/pE7001000023FXLZZCD/" title="Peak - Book &quot;Despicable Me 2 2D&quot; on Sunday 17:45">17:45</a>
								</span>
		</div><div class="cl"></div>
	</div><div class="cl"></div>
</div>'''

#create soup (naming the parser explicitly avoids a bs4 warning)
soup = BeautifulSoup(html, 'html.parser')

#assign empty list
film_times = []

#for each day get the day and times of movies
day_list = soup.find_all('div', {'class': 'dayline'})
for days in day_list:
    day = days.find('div', {'class': 'day'})
    film_times.append(day.string)

    times_list = days.find_all('span', {'class': re.compile(r"^(tPeak|tOffpeak|tSaverday|tMisc|)$")})
    for times in times_list:
        time = times.find('a')
        film_times.append(time.contents)
print(film_times)




Replies To: dynamically create a list of lists

#2 Martyr2

Posted 22 June 2013 - 12:40 PM

If I am reading this correctly, the problem is that you are appending the days and times straight into the film_times list. Instead, build a new list inside the "for days in day_list" loop and, once that list is complete, append the LIST itself to film_times.

# film_times must start out empty before the loop
film_times = []

day_list = soup.find_all('div', {'class': 'dayline'})

for days in day_list:
    # Create a new list for this day
    newDayTimes = []

    day = days.find('div', {'class': 'day'})
    newDayTimes.append(day.string)

    times_list = days.find_all('span', {'class': re.compile(r"^(tPeak|tOffpeak|tSaverday|tMisc|)$")})
    for times in times_list:
        time = times.find('a')

        # Append each time to this day's list; .string is the link text itself,
        # whereas .contents would wrap it in a one-item list
        newDayTimes.append(time.string)

    # Now append this day's list to film_times
    film_times.append(newDayTimes)

print(film_times)



This is untested, but the idea is to build each day's list inside the loop and then, at the end of each iteration, once that day's times have been collected, append the list to film_times. This should give you the list of lists you are looking for.
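
To see the pattern without any scraping involved, here is a minimal sketch that uses plain data in place of the parsed page. The scraped list and the name new_day_times are made up for illustration (the days and times mirror the sample HTML); only the loop structure matters.

```python
# Hypothetical stand-in for the parsed page: (day, times) pairs.
scraped = [
    ("Saturday", ["12:00", "14:30", "17:45"]),
    ("Sunday", ["12:00", "14:30", "17:45"]),
]

# Flat version: every day and time lands in one list.
flat = []
for day, times in scraped:
    flat.append(day)
    for t in times:
        flat.append(t)

# Nested version: a fresh inner list per day, appended once it is complete.
film_times = []
for day, times in scraped:
    new_day_times = [day]
    for t in times:
        new_day_times.append(t)
    film_times.append(new_day_times)

print(flat)
print(film_times)
```

Because new_day_times is created afresh on every pass through the outer loop, each day gets its own list, so wrapping it in list(...) before appending should not be necessary.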

Hopefully that works for you. :)

This post has been edited by Martyr2: 22 June 2013 - 12:45 PM


#3 bee80

Posted 23 June 2013 - 01:56 AM

Thank you, that worked a treat.
I had been trying things like iterating over len() and building empty lists up front.
Seems simple once you know the answer :)
Thanks again.

#4 bee80

Posted 29 June 2013 - 12:03 AM

Aaargh, help please, I'm pulling my hair out!
The code above worked on a selection of HTML, but when plugged into my program it returns just the first day followed by all the times for all days; it doesn't split on the day.
Here is my full code; the get_times function is the code in question:
import urllib, re
from bs4 import BeautifulSoup


# Open url and find all divs with class of filmdiv	
def html_text(url):
	htmlfile = urllib.urlopen(url).read()
	soup = BeautifulSoup(htmlfile)
	return soup.findAll('div', {'class':'filmdiv'})

# Return film title
def  get_title(x):
	return x.h1.a.contents

# Return movie classification age
def get_classification(x):
	regex = '<div class="imgclassificationuk uk-(.+?)">'
	pattern = re.compile(regex)
	return re.findall(pattern,str(x))

# Return running time of movie
def get_running(x):
	a = x.find('div', {'class': 'details'})
	regex = 'Running time: (.+?) mins\\t'
	pattern = re.compile(regex)
	return re.findall(pattern,str(a))

# Grab image url
def get_image(x):
	a = x.find('div', {'class' : 'leftside'})
	regex = 'class="film-image" src="(.+?)"/>'
	pattern = re.compile(regex)
	return re.findall(pattern,str(x))

# Grab film days & times
def get_times(x):
    film_times = []

    day_list = x.findAll('div',{'class' : re.compile(r"^(fRegular|fPlusCulture|fKids|f2D)$")})

    for days in day_list:
                    
        newDayTimes = []

        day = days.find('div', {'class': 'day'})
        if day :
                newDayTimes.append(day.string)
        
        times_list = days.find_all('span', {'class': re.compile(r"^(tPeak|tOffpeak|tSaverday|tMisc|)$")})
        for times in times_list:
            time = times.find('a')
            newDayTimes.append(time.contents)
        film_times.append(list(newDayTimes))
        return film_times

# Run the program
movie_list = html_text("http://www.odeon.co.uk/fanatic/film_times/s8/")
for movie in movie_list:
	print get_title(movie)
	#print get_classification(movie)
	#print get_running(movie)
	#print get_image(movie)
	print get_times(movie)
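
One possible reading of the symptom, offered as a guess rather than a confirmed fix: in the sample HTML from the first post, fRegular and f2D sit on the outer timeslisting div, while each day has its own dayline div. If get_times matches only the outer div, then find() returns the first day and find_all() returns every time beneath it, which is what is described above. The return statement is also indented inside the for loop, so the function stops after the first matched div. The sketch below (Python 3) groups by dayline instead. It uses the standard library's html.parser as a stand-in for BeautifulSoup so that it runs without third-party packages; SAMPLE and DayTimesParser are hypothetical names, and the markup is a trimmed copy of the sample from the first post.

```python
from html.parser import HTMLParser

# Trimmed copy of the sample markup from the first post (hrefs shortened).
SAMPLE = """
<div class="timeslisting clearfix fRegular f fSaturday fSunday f2D">
  <div class="dayline clearfix fSaturday f">
    <div class="day"><span>Saturday</span></div>
    <div class="showingtimes">
      <span class="tPeak"><a href="#">12:00</a></span>
      <span class="tPeak"><a href="#">14:30</a></span>
    </div>
  </div>
  <div class="dayline clearfix fSunday f">
    <div class="day"><span>Sunday</span></div>
    <div class="showingtimes">
      <span class="tPeak"><a href="#">12:00</a></span>
      <span class="tPeak"><a href="#">17:45</a></span>
    </div>
  </div>
</div>
"""


class DayTimesParser(HTMLParser):
    """Hypothetical helper: collects one [day, time, ...] list per dayline div."""

    def __init__(self):
        super().__init__()
        self.film_times = []
        self.want_text = False  # True when the next non-blank text should be kept

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "dayline" in classes:
            # A new day starts here, so start a new inner list.
            self.film_times.append([])
        elif tag == "div" and "day" in classes:
            self.want_text = True  # the day name follows
        elif tag == "a":
            self.want_text = True  # a showing time follows

    def handle_data(self, data):
        text = data.strip()
        if self.want_text and text and self.film_times:
            self.film_times[-1].append(text)
            self.want_text = False


parser = DayTimesParser()
parser.feed(SAMPLE)
film_times = parser.film_times
print(film_times)
# [['Saturday', '12:00', '14:30'], ['Sunday', '12:00', '17:45']]
```

With BeautifulSoup the equivalent change would presumably be to search for 'dayline' again, as in the first post, and to move return film_times out of the loop body so that it runs only after every day has been processed.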

