8 Replies - 1197 Views - Last Post: 28 August 2012 - 06:06 AM

#1 DoMo

  • New D.I.C Head

Reputation: 0
  • Posts: 8
  • Joined: 02-March 11

Problem with combining two lists into a third.

Posted 26 August 2012 - 12:21 AM

Hey guys, hope you are having a good day AND are able to help me out with this little problem :D

So let me clear the air: YES, this is for an assignment, so please do not spoon-feed me. Points in the right direction would be good.

Here is what I'm doing.

Essentially I have two .CSV files. Let's say they contain this data: a combination of names and test scores.

File1: <--- base names file

john,,,,
steve,,,,
michael,,,,
alfred,,,,

File2: <--- A file with marks from certain people

john,2,,,
steve,9,,,
jessica,5,,,
bill,1,,,
alfred,3,,,

The gist is, I am supposed to make a new file combining the two. HOWEVER, since file2 has entries which are not in file1 (i.e. jessica and bill), those entries do not get added to the new .csv file and are instead left out.

The final third .csv file which is what I am tasked to make should look like this

File3: <--- File I make


john,2,,,
steve,9,,,
michael,,,, <-- even though it didn't receive input from the other file, it remains
alfred,3,,,


I am fairly sure I am 99% of the way there, here is my current code. Yes it is going to look nooby but I have been experimenting with stuff all afternoon, even some ludicrously nubbish ideas which I knew from the start wouldn't work but I tried anyways. Note I haven't actually tried to make the third file, I just want to be able to OUTPUT what it should look like first. One problem at a time ^.^

import csv

file1 = csv.reader(open('test1.csv', 'rb'), delimiter=',')
file2 = csv.reader(open('test2.csv', 'rb'), delimiter=',')

#loop through each line and add to a list A
file1list = []
for rowfirst in file1:
    #print row[0] #prints first element
    file1list.append(rowfirst)


#loop through each line and add to a list B
file2list = []
for rowsecond in file2:
   # print row[0] #prints first element
    file2list.append(rowsecond)


# the new file 
"""
basically what I am doing here is checking if the start of a row, i.e. the [0]th element of the first file (as it is the first column) matches the [0]th element of the second file. 

e.g. alfred and alfred match, SO in my new third list, add alfred WITH HIS SCORE (or without) to the new list


OK actually upon typing this for you guys to read I realized my program won't work properly anyways 
"""

# the new file 
finallines = []
for row1 in file2list: #I think this may be troublesome because if there are more rows in the original file (stored in file1list) they will be left out :S correct me if I'm wrong.
    if row1 not in file1list: #if the row name column in the second file does not match a row name column in the first file show error, will be formalized later
        print "Error"
    else:
        finallines.append(row1) #add to the new final list
print finallines




Of the 4 or 5 different ways which I was SURE would work, I chose to present this one because I think it gets across my idea best.



Just some nudges in the right direction would be much appreciated. I think I am really close but a side of me is telling me I have made a GIGANTIC logical error.

Thank you for reading to the end! haha ♥♥ If you need any further explanation of what I am actually asking, fire away.

Replies To: Problem with combining two lists into a third.

#2 baavgai

  • Dreaming Coder

Reputation: 5883
  • Posts: 12,767
  • Joined: 16-October 07

Posted 26 August 2012 - 01:49 AM

if row1 not in file1list:



But you want to compare row1[0], don't you? You don't want to compare to a list of rows, but a list of names.

I'd make another list, namesIn2. Then apply your logic:
if row1[0] not in namesIn2:
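To see why comparing whole rows falls over, here is a minimal sketch. It assumes Python 3 (the thread itself uses Python 2), reads made-up sample data from in-memory strings instead of test1.csv / test2.csv, and the name names_in_1 is just an illustrative choice.

```python
import csv
import io

# In-memory stand-ins for the two CSV files (sample data, not the real assignment files).
file1 = io.StringIO("john,,,,\nsteve,,,,\n")
file2 = io.StringIO("john,2,,,\nsteve,9,,,\njessica,5,,,\n")

file1list = list(csv.reader(file1))
file2list = list(csv.reader(file2))

# A row from file2 carries a score, so it never equals the matching row in file1.
whole_row_match = [row[0] for row in file2list if row in file1list]

# Comparing only the first column (the name) finds the people present in both files.
names_in_1 = [row[0] for row in file1list]
name_match = [row[0] for row in file2list if row[0] in names_in_1]

print(whole_row_match)  # []
print(name_match)       # ['john', 'steve']
```

The row ['john', '2', '', '', ''] is simply a different list from ['john', '', '', '', ''], which is why the membership test has to look at the name column alone.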



#3 den510

  • New D.I.C Head

Reputation: 0
  • Posts: 6
  • Joined: 22-August 12

Posted 27 August 2012 - 02:13 PM

I agree with baavgai, however I do question your naming scheme. For your project, it's a trivial point, but if you pursue a career in C.S., you'll find that accurate naming can be everything, especially if you're working on a shared project.

What caught my attention was

Quote

for row1 in file2list


There's no reason to have the 1 there, as you are scanning multiple rows.

Also, earlier, you had two for loops, using rowfirst and rowsecond as names. In Python, the loop variable is simply rebound on every iteration (it does keep its last value after the loop ends), so reusing the same name, e.g. row, in consecutive loops is harmless. The only reason to have separate names is if you're dealing with nested for loops.
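A quick check in the interpreter helps here. In Python the loop variable is an ordinary name in the enclosing scope: it keeps its last value once the loop ends, and a later loop using the same name simply rebinds it, so sharing one name across consecutive loops is safe. A minimal sketch with made-up rows:

```python
rows = [['john', '2'], ['steve', '9']]

for row in rows:
    pass
# The name is still bound to the last item after the loop finishes.
last_after_first_loop = row

for row in [['alfred', '3']]:
    pass
# A second loop with the same name just rebinds it; nothing clashes.
last_after_second_loop = row

print(last_after_first_loop)   # ['steve', '9']
print(last_after_second_loop)  # ['alfred', '3']
```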

Just some pointers, best of luck on your project.

-Dennis

#4 DoMo

Posted 27 August 2012 - 04:25 PM

Thanks baavgai and Dennis. I take the point about the for-loop variable names, but somewhere else in the program I had variables named row, and since I was not 100% sure whether Python could have clashes between for-loop variables and other local variables, I renamed everything out of paranoia :D

Anywho, I have everything I want 100% working.

EXCEPT for one last bit.

I have got the two lists to join into a third, and sorted this new list. Now, I am supposed to get rid of duplicates.

For example:

File1:

Steve,,,,,

File2:
Steve,12,,,,

In File 3 becomes:
Steve,12,,,,

At the moment I have this working for all instances that require replacement except for one, the very last one.

Here is my new code, yes a bit messy and unorthodox and most likely terribly inefficient.

#loop through each line and add to a list A
namesIn1 = []
file1list = []
for rowfirst in file1:
    #print rowfirst #prints rows with any extra columns if there
    file1list.append(rowfirst)
    namesIn1.append(rowfirst[0])

#loop through each line and add to a list B
file2list = []
#namesIn2 = []
for rowsecond in file2:
   # print rowsecond #prints rows with any extra columns if there
    file2list.append(rowsecond)
   # namesIn2.append(rowsecond[0])



# the new file 
finallines = []
for row1 in file2list:
    if row1[0] not in namesIn1:
        print row1[0]
    else:
        for a in file1list:
            for b in finallines:
                if (a[0] == b[0]):
                    file1list.remove(a)
        finallines.append(row1)

        
finallines = finallines + file1list
finallines.sort()
print finallines



The output I am receiving is

[['s222222'], ['s333333', '10'], ['s444444'], ['s555555', '7'], ['s666666'], ['s666666', '9'], ['s777777']]

(Yes those are the actual values for the assignment).

The program clearly recognized duplicates on s333333 and s555555 but hasn't made the changes on s666666.

Any particular reason? Is the loop cutting short?

Thanks for any assistance.

#5 baavgai

Posted 27 August 2012 - 05:17 PM

Based on your output, I believe I can figure out what your two lists might look like.

Here's a quick and dirty example.
>>> data1 = [ ['s222222'], ['s444444'], ['s666666'], ['s777777'] ]
>>> data2 = [ ['s333333', '10'], ['s555555', '7'], ['s666666', '9'] ]
>>> namesIn2 = [ row[0] for row in data2 ]
>>> namesIn2
['s333333', 's555555', 's666666']
>>> data2Add = [ row for row in data1 if row[0] not in namesIn2 ]
>>> data2Add
[['s222222'], ['s444444'], ['s777777']]
>>> data2.extend(data2Add)
>>> data2.sort()
>>> data2
[['s222222'], ['s333333', '10'], ['s444444'], ['s555555', '7'], ['s666666', '9'], ['s777777']]



Basically, namesIn2 is a list of keys. It's used to get a list of rows from data1 that's not in that key list. That list is then simply added to the other list.

I could be missing something, but I don't think so. If you're unfamiliar with the syntax, it's called a list comprehension, and it is very common in Python. You can do the same with your approach of declaring a list and looping, e.g.
>>> namesIn2 = [ ]
>>> for row in data2:
...     namesIn2.append(row[0])
... 
>>> namesIn2
['s333333', 's555555', 's666666']
>>> 



Hope this helps.

#6 DoMo

Posted 27 August 2012 - 07:31 PM

Hmm yes that is actually quite helpful, shows me another way of approaching the problem that looks much simpler and shorter :D and yes you figured out what the lists should look like ^_^

However, that said, forgive me if I'm misunderstanding, but I feel my issue lies in this section of code:

    else:
        for a in file1list:
            for b in finallines:
                if (a[0] == b[0]):
                    file1list.remove(a)
        finallines.append(row1)  



Specifically the .remove part. It IS removing the duplicate instances of s3... and s5... but not s6..

I know that my logic is correct because it is working for the other instances but why not for the final one?
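One possible explanation, reconstructed from the output posted earlier in the thread: the removal pass runs before row1 is appended to finallines, so each duplicate is only removed while the next matching row is being processed, and the last match never gets its turn. The sketch below replays that order of operations on assumed inputs (the lists are a guess based on the posted output, and merge_original / merge_by_name are hypothetical names), then shows a variant that filters by name instead of calling remove inside the loop. It is written for Python 3.

```python
# Assumed inputs, guessed from the posted output; the real files may differ.
file1list = [['s222222'], ['s333333'], ['s444444'],
             ['s555555'], ['s666666'], ['s777777']]
file2list = [['s555555', '7'], ['s333333', '10'], ['s666666', '9']]
namesIn1 = [row[0] for row in file1list]

def merge_original(file1list, file2list, namesIn1):
    """Same order of operations as the posted loop: remove first, append after."""
    file1list = list(file1list)  # work on a copy so the caller's list is untouched
    finallines = []
    for row1 in file2list:
        if row1[0] in namesIn1:
            for a in file1list:
                for b in finallines:  # row1 is not in finallines yet at this point
                    if a[0] == b[0]:
                        file1list.remove(a)
            finallines.append(row1)
    return sorted(finallines + file1list)

def merge_by_name(file1list, file2list, namesIn1):
    """Keep the file2 rows whose name is in file1, then add the untouched file1 rows."""
    finallines = [row for row in file2list if row[0] in namesIn1]
    updated = set(row[0] for row in finallines)
    leftovers = [row for row in file1list if row[0] not in updated]
    return sorted(finallines + leftovers)

print(merge_original(file1list, file2list, namesIn1))
# [['s222222'], ['s333333', '10'], ['s444444'], ['s555555', '7'],
#  ['s666666'], ['s666666', '9'], ['s777777']]
print(merge_by_name(file1list, file2list, namesIn1))
# [['s222222'], ['s333333', '10'], ['s444444'], ['s555555', '7'],
#  ['s666666', '9'], ['s777777']]
```

Removing items from a list while iterating over that same list can also make the loop skip elements, which is another reason to build a new list rather than edit the one being looped over.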

#7 DoMo

Posted 28 August 2012 - 01:04 AM

I GOT IT WORKING YES

Thank you so much for your help, baavgai ♥♥

#8 DoMo

Posted 28 August 2012 - 04:47 AM

Ok, actually baavgai, I spoke a bit too soon. APPARENTLY Python decided not to save correctly, and I have spent the past 1.5 hours trying to remember exactly what I wrote. Here is where I am at so far.

#other try
import csv

file1 = csv.reader(open('test1.csv', 'rb'), delimiter=',')
file2 = csv.reader(open('test2.csv', 'rb'), delimiter=',')

data1 = []
data2 = []
for row in file1:
    data1.append(row)

for row in file2:
    data2.append(row)


namesIn2 = [row[0] for row in data2]
namesIn1 = [row[0] for row in data1]
print namesIn2

#data2Add = [ row for row in data1 if row[0] not in namesIn2 ]
#print data2Add

data2Add = []
for row in data1:    
    if row[0] not in namesIn2:
        data2Add.append(row)

#this is outputting the numbers in data1 which do not share any names
#with those in file2
#so, they are files that will not be updated, so they stay, i.e. 2 4 and 7

print data2Add
        
data2.extend(data2Add)
data2.sort()
print data2



My output is

['s555555', 's333333', 's666666', 's111111', 's999999']
[['s222222'], ['s444444'], ['s777777']]
[['s111111', '10'], ['s222222'], ['s333333', '10'], ['s444444'], ['s555555', '7'], ['s666666', '9'], ['s777777'], ['s999999', '9']]



So I understand that the second line (s222222, s444444, etc.) holds the IDs which do not appear in namesIn2, AND that you are then combining data2Add with data2. HOWEVER, I want to get RID of s111111 and s999999 because neither of them appears in the original FIRST file, data1. Everything else is correct.

This may sound like some kind of cheater's excuse, but I DID get the final result. I celebrated and put it in its own folder, but I must have copy-pasted the wrong file across and not saved, OR, because of the various hanging pythonw.exe processes in the background, it did not save properly and got overwritten.

Any further nudges would be great. I am fairly sure in my original version I actually reversed most of what you did, I just don't remember it all. Most frustrating night ever.

Thanks.

#9 baavgai

Posted 28 August 2012 - 06:06 AM

Then adding data to data2 is probably the wrong approach. Start with an empty list, data3.

Simply, loop through data1. If row[0] is in namesIn2, add the matching row from data2 to data3. Otherwise, add the row from data1 to data3.

You only need one namesIn list, depending on your approach.
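The loop described above could be sketched roughly like this. The sample rows are assumed (modelled on the output posted earlier in the thread), find_row is a hypothetical helper, and the code is written for Python 3.

```python
# Sample rows are assumed, modelled on the output posted earlier in the thread.
data1 = [['s222222'], ['s333333'], ['s444444'],
         ['s555555'], ['s666666'], ['s777777']]
data2 = [['s555555', '7'], ['s333333', '10'], ['s666666', '9'],
         ['s111111', '10'], ['s999999', '9']]

def find_row(rows, name):
    """Return the first row whose first column equals name, or None."""
    for row in rows:
        if row[0] == name:
            return row
    return None

namesIn2 = [row[0] for row in data2]

data3 = []
for row in data1:
    if row[0] in namesIn2:
        # The name has a mark in data2, so take the data2 version of the row.
        data3.append(find_row(data2, row[0]))
    else:
        # No mark recorded, so keep the original data1 row as it is.
        data3.append(row)

print(data3)
# [['s222222'], ['s333333', '10'], ['s444444'], ['s555555', '7'],
#  ['s666666', '9'], ['s777777']]
```

Because the loop is driven by data1, names that only exist in data2 (s111111 and s999999 here) never reach data3, and the result keeps the order of data1 without a separate sort.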

Use functions, don't repeat yourself.
e.g.
import csv

def getData(filename):
    # 'rb' is what the Python 2 csv module expects; the with block closes the file
    with open(filename, 'rb') as f:
        return [ row for row in csv.reader(f, delimiter=',') ]

data1 = getData('test1.csv')
data2 = getData('test2.csv')



You might consider using a dict. It would buy you most of what you want for free.
def getData(filename):
    with open(filename, 'rb') as f:
        # key each row by its first column (the name)
        return dict([ (row[0], row) for row in csv.reader(f, delimiter=',') ])
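As a rough illustration of what the dict buys you, here is a sketch that assumes Python 3 and reads the sample names from the first post out of in-memory strings rather than real files. get_data, base, marks and merged are illustrative names, not part of the code above.

```python
import csv
import io

def get_data(lines):
    """Map each row's first column (the name) to the whole row."""
    return {row[0]: row for row in csv.reader(lines) if row}

# In-memory stand-ins for the two files, using the sample data from the first post.
file1 = io.StringIO("john,,,,\nsteve,,,,\nmichael,,,,\nalfred,,,,\n")
file2 = io.StringIO("john,2,,,\nsteve,9,,,\njessica,5,,,\nbill,1,,,\nalfred,3,,,\n")

base = get_data(file1)
marks = get_data(file2)

# For every name in the base file, prefer the row with marks if there is one.
# Names that only appear in the marks file are never looked up, so they drop out.
merged = [marks.get(name, base[name]) for name in sorted(base)]

# Write the result as CSV text (a real program would write to a file instead).
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged)
print(out.getvalue())
```

The lookup marks.get(name, base[name]) replaces both the names list and the search for the matching row. The sketch sorts by name, as the earlier attempts in the thread did; to keep the order of the base file instead, loop over the rows as they were read.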


