Python: Need help cycling through regex results

***thud thud thud*** goes my head on the desk

Page 1 of 1

6 Replies - 3002 Views - Last Post: 06 July 2010 - 12:16 PM

#1 numberwhun

  • D.I.C Head

Reputation: 23
  • Posts: 87
  • Joined: 28-March 08

Python: Need help cycling through regex results

Posted 06 July 2010 - 08:40 AM

Hello everyone! I am working on a Python script that does the following:

  • Connect to an ftp site
  • Change to the appropriate directory
  • Get a listing
  • Use a Regex to pull out just the file names from the listing
  • Cycle through the file names and download each one
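For comparison, here is a minimal sketch of those five steps in Python 3 (the code in this thread is Python 2). It assumes Unix-style LIST output in which the file name is the last field, as the regex below does. The helper name download_matching and its dest_dir parameter are hypothetical, not part of ftplib:

```python
import os
import re

# Assumption: Unix-style LIST lines that end in the file name,
# the same assumption the regex in this thread makes.
NAME_RE = re.compile(r'\s(?P<name>[A-Za-z0-9.\-]+)$')


def download_matching(ftp, directory, pattern='*.gz', dest_dir='.'):
    """Hypothetical helper: cd into `directory`, list files matching
    `pattern`, and download each one into `dest_dir`."""
    ftp.cwd(directory)

    # Get a listing, one line per file
    lines = []
    ftp.retrlines('LIST ' + pattern, lines.append)

    # Pull out ALL the names first, instead of keeping only the last match
    names = []
    for line in lines:
        match = NAME_RE.search(line)
        if match:
            names.append(match.group('name'))

    # Download each file; `with` closes it even if the transfer fails
    for name in names:
        with open(os.path.join(dest_dir, name), 'wb') as fhandle:
            ftp.retrbinary('RETR ' + name, fhandle.write)
    return names

# Usage (needs network access; the server may no longer be available):
#   import ftplib
#   with ftplib.FTP('ftp.fu-berlin.de') as ftp:
#       ftp.login('anonymous', 'anything')
#       download_matching(ftp, '/pub/misc/movies/database/')
```

Collecting the names into a list before downloading is the key design choice; it keeps every match instead of only the last one.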


Here is the code I have so far:


#!/usr/bin/env python

import ftplib
import re

ftp = ftplib.FTP("ftp.fu-berlin.de")
ftp.login('anonymous', 'anything')

# The directory to change to, once logged in
directory = '/pub/misc/movies/database/'

# Now to change directories
ftp.cwd(directory)

# This will handle the data being downloaded
# It will be explained shortly
def handleDownload(block):
    file.write(block)
    print ".",

data = []

# Print the contents of the directory
ftp.retrlines('LIST *.gz', data.append)

# Setup the regular expression used to match the file name from the listing
prog = re.compile(r'\s+(?P<name>[a-zA-Z0-9\-\.]+)$')

result = []

for line in data:
    result = prog.search(line)
    print result.group(0)
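One detail worth noticing in that loop: group(0) is the whole match, including the whitespace matched by \s+, while the named group gives just the file name. A small Python 3 check with a made-up listing line (real LIST output varies by server):

```python
import re

# Same pattern as in the question, written as a raw string.
prog = re.compile(r'\s+(?P<name>[a-zA-Z0-9\-\.]+)$')

# A made-up line in the Unix "ls -l" style that LIST often returns.
line = '-rw-r--r--   1 ftp  ftp  1234 Jul 06 09:00 writers.list.gz'

match = prog.search(line)
whole = match.group(0)      # entire match: leading space + name
name = match.group('name')  # just the captured file name

print(repr(whole))  # ' writers.list.gz'
print(repr(name))   # 'writers.list.gz'
```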



At this point, the script prints the entire list of file names from the site. (The login and password shown work because it is a public FTP site.)

The above prints the list to show that it is there. Now, if you replace the final for loop above with the following:

for filename in result.group():
    print 'Opening local file ' + filename
    file = open(filename, 'wb')
    
    # Download the file a chunk at a time
    # Each chunk is sent to handleDownload
    # We append the chunk to the file and then print a '.' for progress
    # RETR is an FTP command
    print 'Getting ' + filename
    ftp.retrbinary('RETR ' + filename, handleDownload)
    
    # Clean up the file
    print 'Closing file ' + filename
    file.close()



You will now find that it connects and downloads one, and only one, file: the last one in the list, writers.list.gz.
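A likely explanation: in the first loop, result is reassigned on every pass, so once the loop finishes it only refers to the last match. Also, result.group() is a single string, so "for filename in result.group()" walks over its characters, not over file names. The Python 3 sketch below, using hypothetical listing lines, shows both effects and the usual fix of collecting the names in a list:

```python
import re

prog = re.compile(r'\s+(?P<name>[a-zA-Z0-9\-\.]+)$')

# Hypothetical listing lines standing in for the real LIST output.
data = [
    '-rw-r--r--   1 ftp  ftp  100 Jul 06 09:00 actors.list.gz',
    '-rw-r--r--   1 ftp  ftp  200 Jul 06 09:00 movies.list.gz',
    '-rw-r--r--   1 ftp  ftp  300 Jul 06 09:00 writers.list.gz',
]

# What the question's loop does: `result` is rebound on every pass,
# so after the loop it only refers to the LAST match.
for line in data:
    result = prog.search(line)
last_only = result.group('name')

# Iterating over a string yields characters, not file names.
first_items = list(result.group())[:3]

# The fix: keep every name in a list and loop over that list later.
names = []
for line in data:
    match = prog.search(line)
    if match:  # skip lines the pattern does not match
        names.append(match.group('name'))

print(last_only)    # writers.list.gz
print(first_items)  # [' ', 'w', 'r']
print(names)        # ['actors.list.gz', 'movies.list.gz', 'writers.list.gz']
```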

Can someone PLEASE tell me what I am doing wrong? I want to cycle through the entire list and have the above for loop download each file, but I just cannot figure it out.

Thanks in advance as I really appreciate any help on this.

Regards,

Jeff


Replies To: Python: Need help cycling through regex results

#2 Motoma

  • D.I.C Addict

Reputation: 452
  • Posts: 796
  • Joined: 08-June 10

Re: Python: Need help cycling through regex results

Posted 06 July 2010 - 08:52 AM

Hey there! I think the following for loop might help:
for line in data:
    filename = prog.search(line).group(0).strip()
    print "Downloading %s..." % (filename)
    
    fhandle = open(filename, 'wb')
    ftp.retrbinary('RETR ' + filename, fhandle.write)
    fhandle.close()



It also lets you get rid of the handleDownload function!
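This works because retrbinary calls the callback you give it once for each block of data it receives, and fhandle.write is a bound method that can be passed around like any other function. The Python 3 sketch below shows the same pattern without a network; fake_retrbinary is a hypothetical stand-in that only imitates the block-by-block callbacks, not the real ftplib.FTP.retrbinary:

```python
import io


def fake_retrbinary(cmd, callback, payload=b'0123456789', blocksize=4):
    """Hypothetical stand-in for ftplib.FTP.retrbinary: it ignores `cmd`
    and hands `payload` to `callback` one block at a time."""
    for start in range(0, len(payload), blocksize):
        callback(payload[start:start + blocksize])


sink = io.BytesIO()
# sink.write is a bound method, so it can be passed as the callback
# directly, just like fhandle.write in the loop above.
fake_retrbinary('RETR example.gz', sink.write)

chunks = []
fake_retrbinary('RETR example.gz', chunks.append)  # record the individual blocks

print(sink.getvalue())  # b'0123456789'
print(chunks)           # [b'0123', b'4567', b'89']
```

With a real connection in Python 3, the body of the loop is usually written as "with open(filename, 'wb') as fhandle: ftp.retrbinary('RETR ' + filename, fhandle.write)", so the file is closed even if the transfer raises an exception.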

#3 baavgai

  • Dreaming Coder

Reputation: 5777
  • Posts: 12,592
  • Joined: 16-October 07

Re: Python: Need help cycling through regex results

Posted 06 July 2010 - 09:15 AM

How exactly does your handleDownload know about file?

Try something like:
def getbinary(ftp, filename):
    print 'Opening local file ' + filename
    file = open(filename, 'wb')
    def handleDownload(block):
        file.write(block)
        print ".",
    print 'Getting ' + filename
    ftp.retrbinary("RETR " + filename, handleDownload)
    print 'Closing file ' + filename
    file.close()

for line in data:
    result = prog.search(line)
    # group('name') is the bare file name; group(0) would include the leading whitespace
    getbinary(ftp, result.group('name'))
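The reason this version does not need a global is that the inner handleDownload is a closure: it keeps a reference to the file variable of the enclosing getbinary call. A minimal Python 3 sketch of the same idea, with make_block_writer as a hypothetical name and an in-memory buffer in place of a real file:

```python
import io


def make_block_writer(target):
    """Return a callback that appends each block to `target`.
    The inner function is a closure over `target`, so no global is needed."""
    def handle_block(block):
        target.write(block)
        print('.', end='', flush=True)  # progress dot, like print ".", in Python 2
    return handle_block


out = io.BytesIO()
callback = make_block_writer(out)
for chunk in (b'abc', b'def'):
    callback(chunk)  # with ftplib, retrbinary would make these calls
print()
print(out.getvalue())  # b'abcdef'
```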



Also, your regex is probably overkill. You should be able to get the file name just by taking the last value after a split.

e.g.
# List of last values
fileNames = [s.split()[-1] for s in data]
for fn in fileNames:
    getbinary(ftp, fn)
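A quick Python 3 check of the split idea with two made-up Unix-style listing lines. One caveat: split()[-1] cuts off part of a name that contains spaces. Splitting at most 8 times keeps such a name whole, but that assumes the server prints exactly eight fields before the name, which is not guaranteed because LIST output is not standardized. If the server supports it, ftplib's nlst() returns bare names and avoids parsing altogether.

```python
# Hypothetical LIST lines (Unix "ls -l" style; real servers may differ).
data = [
    '-rw-r--r--   1 ftp  ftp  100 Jul 06 09:00 actors.list.gz',
    '-rw-r--r--   1 ftp  ftp  300 Jul 06 09:00 my writers.list.gz',
]

# baavgai's idea: the last whitespace-separated field is the name.
last_fields = [s.split()[-1] for s in data]

# Variant, assuming exactly 8 metadata fields before the name:
# splitting at most 8 times keeps names that contain spaces intact.
full_names = [s.split(None, 8)[-1] for s in data]

print(last_fields)  # ['actors.list.gz', 'writers.list.gz']
print(full_names)   # ['actors.list.gz', 'my writers.list.gz']
```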



#4 numberwhun

  • D.I.C Head

Reputation: 23
  • Posts: 87
  • Joined: 28-March 08

Re: Python: Need help cycling through regex results

Posted 06 July 2010 - 09:46 AM

@baavgai The function came from this site. Yes, it's a bit dated (about six years old), but I figured I would give it a shot.

@Motoma Thanks! I will give it a try and see if I can get it working.

Regards,

Jeff

#5 Motoma

  • D.I.C Addict

Reputation: 452
  • Posts: 796
  • Joined: 08-June 10

Re: Python: Need help cycling through regex results

Posted 06 July 2010 - 10:19 AM

numberwhun, on 06 July 2010 - 10:46 AM, said:

@baavgai The function came from this site. Yes, it's a bit aged (like 6 years), but figured I would give it a shot.

@Motoma Thanks! I will give it a try and see if I can get it working.

Regards,

Jeff


Good luck, I hope it works out for you.

#6 numberwhun

  • D.I.C Head

Reputation: 23
  • Posts: 87
  • Joined: 28-March 08

Re: Python: Need help cycling through regex results

Posted 06 July 2010 - 11:00 AM

@Motoma It worked great! I love it. I searched Google for a solution, and not one of the regex tutorials, or even the advanced documentation, covered anything like what you did. Do you have any links or documentation that go further into it?

Is strip() a built-in function, or something extra that is part of the re module? Just curious.

Regards,

Jeff

@baavgai Interesting way to get the list of last values. I will have to play with that.

Thank you to both of you for your responses. I am (at this moment) reading up on both of your posts.

Regards,

Jeff

#7 Motoma

  • D.I.C Addict

Reputation: 452
  • Posts: 796
  • Joined: 08-June 10

Re: Python: Need help cycling through regex results

Posted 06 July 2010 - 12:16 PM

numberwhun, on 06 July 2010 - 12:00 PM, said:

Is strip() a built in function or something extra as part of re module? Just curious.


strip() is a method of the str class:
>>> dir(str)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']



It removes all whitespace from both ends of a string:
>>> " This is a line with spaces on both ends ".strip()
'This is a line with spaces on both ends'
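A few more details, checked in Python 3: strip() treats tabs and newlines as whitespace too, lstrip() and rstrip() trim only one side, and with an argument strip() removes any of the listed characters from both ends (the argument is a set of characters, not a prefix or suffix):

```python
line = '\t writers.list.gz \r\n'

both = line.strip()    # 'writers.list.gz'
left = line.lstrip()   # 'writers.list.gz \r\n'
right = line.rstrip()  # '\t writers.list.gz'
dots = '..name.gz..'.strip('.')  # 'name.gz': the inner dot is kept

print(repr(both), repr(left), repr(right), repr(dots))
```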



An added note: shy away from using the variable name 'file'. While you generally won't have issues with it, in Python 2 it is a built-in name that refers to the file type, and assigning to it hides that built-in. For instance:

def is_a_file(f):
    return type(f) == file

fhandle = open('/home/motoma/test.txt', 'wb')
if is_a_file(fhandle): print "fhandle is a file object."

file = open('/home/motoma/test2.txt', 'wb')
# 'file' now names a file object instead of the type, so both checks below fail
if not is_a_file(file): print "file is not a file object?"
if not is_a_file(fhandle): print "fhandle is no longer a file object?"



Again, you will rarely run into an issue, but the danger is there.
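In Python 3 there is no built-in file type any more, so the example above is Python 2 only, but the same risk applies to names such as list, str or open. A minimal Python 3 sketch: is_a_file here checks against io.IOBase, the base class of the objects open() returns, and shadowing_demo is a hypothetical function that hides the built-in list:

```python
import io
import os
import tempfile


def is_a_file(f):
    """True for file objects: open() returns subclasses of io.IOBase in Python 3."""
    return isinstance(f, io.IOBase)


def shadowing_demo():
    """Show what happens once a local name hides the built-in `list`."""
    list = [1, 2, 3]  # 'list' now refers to this object, not the type
    try:
        return list('abc')
    except TypeError as exc:
        return 'TypeError: ' + str(exc)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'test.txt'), 'wb') as fhandle:
        print(is_a_file(fhandle))  # True
print(is_a_file('test.txt'))       # False
print(shadowing_demo())            # TypeError: 'list' object is not callable
```

Using isinstance() rather than comparing type() also accepts subclasses, which is why the check works for the different buffered and text file classes open() can return.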

If you like the wonderfully Pythonic piece of code @baavgai posted, you should take a look at Python Tips, Tricks, and Hacks. It covers a great range of interesting Python programming techniques, from list slicing and iteration to lambda functions, mapping, filtering, generators, and more!