#1 duffman18

List Deletion Help

Posted 21 October 2010 - 08:14 AM

I am very new to programming in Python and am teaching myself. I am using Python 3.1, if that makes a difference. The program I am writing removes duplicate or invalid entries from a mailing list, and the part I am having trouble with is removing the invalid entries. I took the entire text file that was exported from Excel, read in each line, and put the values in a dictionary keyed by column title (e.g. 'address1', 'address2', 'zip'). I then added each dictionary to a list. I know I probably could have used more object-oriented techniques, but this sounded easy enough. Here is the code that I am having trouble with:

    #Removes all the entries that are non-US or have an invalid zip and
    #all the entries that have the "DO NOT USE" or "DON'T USE" tag and
    #all the entries that start with an '*' in address1 or address2.
    for x in List[:]:
        #make the addresses uppercase
        add1 = x['address1'].upper()
        add2 = x['address2'].upper()
        if ((add1.find('DO') > -1 and add1.find('NOT') > -1) or add1.find("DON'T") > -1) and add1.find('USE') > -1:
            i = List.index(x)
            del(List[i])
        elif ((add2.find('DO') > -1 and add2.find('NOT') > -1) or add2.find("DON'T") > -1) and add2.find('USE') > -1:
            i = List.index(x)
            del(List[i])
        elif not x['zip'].isnumeric():
            i = List.index(x)
            del(List[i])
        elif add1[:1] == '*' or add2[:1] == '*':
            i = List.index(x)
            del(List[i])
            print(List[i])  #Output the line to check validity of entries removed



What is going wrong is that the printed entries, which I believe should be only the entries that start with an '*', are not the correct entries. I have tried several things while debugging. First I used a counter for the variable i instead of looking up the index of the corresponding entry, then I added print calls to the other branches of the if..elif statement, but I always got the wrong entries. I know it is deleting the right entries, because there are no invalid entries left in the new file I output from the revised list. I just want to be sure it is not deleting extra entries. Maybe I am thinking about Python data structures wrong, maybe I need to get out of the C++ mindset, or maybe upper() isn't doing what I think it is doing. I hope I gave you all the info you need; any help learning here would be great. Thanks.

#2 Nallo

Posted 21 October 2010 - 08:53 AM

Glancing over your program, I think it does what it is supposed to do.
It is not surprising, though, that your print statements don't print the deleted lines:

>>> li = ['entry0', 'entry1', 'entry2']
>>> del li[1]  # let's delete the entry at index 1
>>> li  # the old entry at index 1 is gone; now 'entry2' is at index 1!
['entry0', 'entry2']
>>> li[1]  # so li[1] returns the entry at index 1, but that is 'entry2' now
'entry2'



If you want to test that the right entries are deleted you could use List.pop(i) instead of del List[i] and look at the return value:
>>> li = ['entry0', 'entry1', 'entry2']
>>> deleted_entry = li.pop(1)  # pop removes the item at the given index and returns it
>>> print(deleted_entry)
entry1
>>> li[1]
'entry2'
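
Applied to the loop from the original post, the same idea might look like the sketch below. The sample data and the names `entries` and `removed` are made up for illustration, and only the '*' test is shown.

```python
# Hypothetical sample data standing in for the mailing list.
entries = [
    {'address1': '12 Main St', 'address2': '', 'zip': '12345'},
    {'address1': '*old address', 'address2': '', 'zip': '54321'},
    {'address1': '7 Oak Ave', 'address2': '*moved', 'zip': '11111'},
]

removed = []
# Iterate over a copy so that removing items from `entries` does not skip any.
for x in entries[:]:
    add1 = x['address1'].upper()
    add2 = x['address2'].upper()
    if add1[:1] == '*' or add2[:1] == '*':
        i = entries.index(x)
        # pop() removes the item and hands it back, so it can still be inspected.
        removed.append(entries.pop(i))

# Print what was actually removed, not what moved into the freed index.
for entry in removed:
    print(entry)
```

Collecting the popped items in a separate list keeps the check independent of how the indices shift during deletion.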


This post has been edited by Nallo: 21 October 2010 - 08:55 AM


#3 duffman18

Posted 21 October 2010 - 10:21 AM

Thanks for the help, it is printing out the right entries now! I can't believe I put the print() call after the del(); I should have known that I was deleting the entry and then reading whatever had moved into the deleted index. Oh well, I guess I am a little rusty, since I haven't written any code in over half a year. Also, now that I can see what I am actually deleting, I noticed that I had unintentionally removed valid zip codes in the format 'XXXXX-XXXX'. So I made these changes:
        ZipCode = x['zip']  #Added
        if ((add1.find('DO') > -1 and add1.find('NOT') > -1) or add1.find("DON'T") > -1) and add1.find('USE') > -1:        
        .
        .
        elif not ZipCode[:5].isnumeric():  #Changed



Is there a better way to do this? Do I need to make that variable ZipCode to be able to use the slice operator? Thanks.

I guess I can't bold text in code brackets....

This post has been edited by duffman18: 21 October 2010 - 10:26 AM


#4 Nallo

Posted 21 October 2010 - 11:31 AM

ZipCode = x['zip']
ZipCode[:5].isnumeric()

is the same as
x['zip'][:5].isnumeric()


Python variables are only references to objects. In this case ZipCode and x['zip'] both reference the same string object. Slicing and isnumeric() are operations on that string object, so they work the same through either reference. You don't need to create an extra reference to it with ZipCode = x['zip'], but doing so may improve the readability of the program.
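
A small sketch of that point, using a made-up entry (the names `zip_code`, `same_object` and `first_five` are only for illustration):

```python
x = {'zip': '12345-6789'}  # hypothetical entry

zip_code = x['zip']                 # a second name for the same string object
same_object = zip_code is x['zip']
print(same_object)                  # True: both names refer to one object

# Slicing returns a separate string; the original is left unchanged.
first_five = zip_code[:5]
print(first_five)                   # 12345
print(first_five.isnumeric())       # True
print(x['zip'])                     # 12345-6789
```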

#5 duffman18

Posted 21 October 2010 - 12:37 PM

Awesome, thanks for the help Nallo.

#6 Nallo

Posted 21 October 2010 - 01:36 PM

A hint to make it more pythonesque.
If I were to write your program, I would separate the validity tests from the deletion code like this:
def is_valid_adress(adress):
    """returns True if adress is valid
    False otherwise
    """
    add1 = adress['address1'].upper()
    add2 = adress['address2'].upper()
    if ((add1.find('DO') > -1 and add1.find('NOT') > -1) or add1.find("DON'T") > -1) and add1.find('USE') > -1:
        return False
    # ....
    if add1[:1] == '*' or add2[:1] == '*':  # [:1] is safe for empty strings
        return False
    return True  # no test failed, so we have a valid adress

# using list comprehension to remove unwanted elements
adresslist = [adress for adress in adresslist if is_valid_adress(adress)]


Nice side effect: this gets rid of the error-prone mixing of indexing and deleting elements. Plus it is more readable.
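
For reference, here is one way the whole filter could be filled in. This is only a sketch: the helper names, the sample data and the zip pattern are assumptions, the pattern is stricter than the `[:5].isnumeric()` check used earlier in the thread, and `re.fullmatch` needs Python 3.4 or later.

```python
import re

# Assumption: a valid zip is five digits, optionally followed by '-' and four digits.
ZIP_PATTERN = re.compile(r'\d{5}(-\d{4})?')

def is_do_not_use(text):
    """Return True if text carries a "DO NOT USE" or "DON'T USE" tag."""
    has_negation = ('DO' in text and 'NOT' in text) or "DON'T" in text
    return has_negation and 'USE' in text

def is_valid_entry(entry):
    """Return True if the entry passes all validity tests, False otherwise."""
    add1 = entry['address1'].upper()
    add2 = entry['address2'].upper()
    if is_do_not_use(add1) or is_do_not_use(add2):
        return False
    if not ZIP_PATTERN.fullmatch(entry['zip']):
        return False
    if add1.startswith('*') or add2.startswith('*'):
        return False
    return True

# Hypothetical sample data.
entries = [
    {'address1': '12 Main St', 'address2': '', 'zip': '12345'},
    {'address1': 'Do not use', 'address2': '', 'zip': '12345'},
    {'address1': '3 Elm Rd', 'address2': "don't use", 'zip': '12345-6789'},
    {'address1': '9 Pine Ln', 'address2': '', 'zip': 'K1A 0B1'},
    {'address1': '*5 Hill St', 'address2': '', 'zip': '22222'},
    {'address1': '8 Lake Dr', 'address2': 'Apt 2', 'zip': '33333-4444'},
]

valid = [entry for entry in entries if is_valid_entry(entry)]
for entry in valid:
    print(entry['address1'])
# prints:
# 12 Main St
# 8 Lake Dr
```

The `in` operator does the same substring test as `find(...) > -1`, and `startswith('*')` is also safe on empty strings.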

Two list comprehension examples, in case you haven't seen them yet in Python:
>>> li = [1, 2, 3]
>>> new_li = [i + 1 for i in li]
>>> new_li
[2, 3, 4]

>>> def even(n):
...     return n % 2 == 0
>>> li = [1, 2, 3, 4]
>>> new_li = [i for i in li if even(i)]
>>> new_li
[2, 4]


This post has been edited by Nallo: 21 October 2010 - 01:40 PM


#7 duffman18

Posted 22 October 2010 - 07:15 AM

Thanks for the info, I have not run into list comprehensions yet in my studying of Python. This would work great for this application. However, will this method increase the amount of memory the program uses? I don't think the mailing list is large enough to worry about that, but for future reference, should I look out for it if the list being evaluated is exceptionally large? Or could I do something like this to save memory:
>>> def even(n):
	return n % 2 == 0

>>> li = [1, 2, 3, 4]
>>> li = [i for i in li if even(i)]
>>> li
[2, 4]



It works for this case obviously, but it seems risky and not good "code etiquette", since it is altering the list it is looping through.

#8 Nallo

Posted 22 October 2010 - 08:41 AM

Two comments:

1)
li = [i for i in li if even(i)]

This is not risky. Python evaluates the part to the right of the = first, thereby creating a new object in memory. Only afterwards is that newly created object assigned to li.
2)
memory usage:
It is true that this method creates a new list in memory (a copy of the original with the odd values removed) and only afterwards leaves the old list to be garbage collected by the interpreter (if there are no other references to it than li). But your version did a similar thing:
for x in List[:]:

List[:] creates a copy of List in memory, which stays there until you are finished with the loop. And since you were changing the list inside the loop, you had to make this copy.
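
To make the "other references" point concrete, here is a small sketch (the names `alias`, `li2` and `alias2` are only for illustration). Rebinding li leaves the old list alive for as long as something else still refers to it, while slice assignment changes the existing list object in place. Note that slice assignment still builds the filtered list first, so it does not save memory either.

```python
def even(n):
    return n % 2 == 0

li = [1, 2, 3, 4]
alias = li                            # a second reference to the same list

li = [i for i in li if even(i)]       # rebinds the name li to a new list
print(li)                             # [2, 4]
print(alias)                          # [1, 2, 3, 4], the old list is still alive

li2 = [1, 2, 3, 4]
alias2 = li2
li2[:] = [i for i in li2 if even(i)]  # slice assignment: same list object, new contents
print(alias2)                         # [2, 4]
print(alias2 is li2)                  # True
```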