6 Replies - 537 Views - Last Post: 07 February 2016 - 03:43 AM

#1 mercury2016   User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 4
  • Joined: 04-February 16

group by statements in python to display unique records

Posted 04 February 2016 - 06:56 PM

Here is the code I wrote to summarise customer information by grouping on id.
df.groupby('Customerid') ['NTransactions', 'Amount'].sum() [('score')]
I need to sum the number of transactions and the amount, and also display the unique score on the same row. The score does not need to be summed, so I wrote it outside the brackets, but I am getting an error. How do I correct it?

Replies To: group by statements in python to display unique records

#2 ndc85430   User is offline

  • I think you'll find it's "Dr"
  • member icon

Reputation: 984
  • View blog
  • Posts: 3,879
  • Joined: 13-June 14

Re: group by statements in python to display unique records

Posted 04 February 2016 - 10:18 PM

You haven't given us enough information to be able to help. Which libraries are you using, for example that let you use those functions (i.e. groupby() and sum() on an object)? What type is df? What is the error you're getting?

#3 mercury2016   User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 4
  • Joined: 04-February 16

Re: group by statements in python to display unique records

Posted 05 February 2016 - 06:58 PM

Here is the information I can give:
import pandas as pd
df=pd.read_csv('customer1.csv')

df[:5]
Out[3]:
id  pd1  pd2  score  Amount  NTrans
 1    0    0    598    1111       1
 1    0    0    598    2154       3
 1    0    0    598     666       1
 2    0    0    400     444       2
 2    0    0    400     364       2

df.groupby('id') ['NTrans', 'Amount'].sum(),['pd1', 'pd2' 'score']

I am trying to display one row per customer. The id, pd1, pd2 and score columns should each show a single value and should not be summed, but NTrans and Amount should be added up across all of that customer's transactions.

I can see that this works:
df.groupby('id') ['NTrans', 'Amount'].sum()

but that is not enough. I need all the fields.
Thanks for your help
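
One possible way to keep all the fields, as a minimal sketch: group on every column that should pass through unchanged and sum only the other two. The DataFrame below is built from the five sample rows shown above instead of reading customer1.csv, and the names key_columns and summary are made up for the illustration. It also assumes pd1, pd2 and score never differ within one id. Note that newer pandas releases expect a list inside the selection brackets, i.e. [['Amount', 'NTrans']] rather than ['NTrans', 'Amount'].

```python
import pandas as pd

# Sample rows copied from the post; the real data would come from
# pd.read_csv('customer1.csv').
df = pd.DataFrame(
    [
        [1, 0, 0, 598, 1111, 1],
        [1, 0, 0, 598, 2154, 3],
        [1, 0, 0, 598, 666, 1],
        [2, 0, 0, 400, 444, 2],
        [2, 0, 0, 400, 364, 2],
    ],
    columns=["id", "pd1", "pd2", "score", "Amount", "NTrans"],
)

# Columns that should appear once per customer (assumed constant per id).
key_columns = ["id", "pd1", "pd2", "score"]

# Group on all of the pass-through columns and sum only the two totals.
# as_index=False keeps the key columns as ordinary columns in the result.
summary = df.groupby(key_columns, as_index=False)[["Amount", "NTrans"]].sum()
print(summary)
```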

#4 ndc85430   User is offline

  • I think you'll find it's "Dr"
  • member icon

Reputation: 984
  • View blog
  • Posts: 3,879
  • Joined: 13-June 14

Re: group by statements in python to display unique records

Posted 06 February 2016 - 10:38 AM

Hmm, I can't really help with this I'm afraid. I know nothing about Pandas, so all I can really suggest is to consult the docs :(.

Also, when posting code here, please make sure to do so within code tags and make sure it's indented sensibly.

#5 mercury2016   User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 4
  • Joined: 04-February 16

Re: group by statements in python to display unique records

Posted 07 February 2016 - 02:34 AM

It doesn't need to be in pandas, as long as I can get output like this:
1 0 0 598 3931 5
2 0 0 400 808 4

That is, display the first four fields once per customer and show the sums of Amount and NTrans.

#6 baavgai   User is offline

  • Dreaming Coder
  • member icon


Reputation: 7505
  • View blog
  • Posts: 15,553
  • Joined: 16-October 07

Re: group by statements in python to display unique records

Posted 07 February 2016 - 03:15 AM

Well, that's a horse of a different color, isn't it?

If this is for an assignment, then presumably using some giant library you found on the interwebs isn't going to get you a pass.

Let's play around with this a little:
>>> # first put your data in the most basic python structure
... 
>>> data = [
...     [1,0,0,598,1111,1],
...     [1,0,0,598,2154,3],
...     [1,0,0,598,666,1],
...     [2,0,0,400,444,2],
...     [2,0,0,400,364,2]
... ]
>>> # now, let's just look at the first element
... # is this the key?
... data[0][:4]
[1, 0, 0, 598]
>>>
>>> # so, this is data?
... data[0][4:]
[1111, 1]
>>>
>>> # first thing you might want to do is group those as such
... xs = [ [ x[:4], x[4:] ] for x in data ]
>>> xs
[[[1, 0, 0, 598], [1111, 1]], [[1, 0, 0, 598], [2154, 3]], [[1, 0, 0, 598], [666, 1]], [[2, 0, 0, 400], [444, 2]], [[2, 0, 0, 400], [364, 2]]]
>>> # nice, now we want a unique list of the first bit
... # the built-in set will collect unique values for us
... keys = set(k for (k,_) in xs)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>>
>>> # oh, dear, that didn't work.
... # maybe we can make something from that list
... # that is hashable
... keys = set(tuple(k) for (k,_) in xs)
>>> keys
set([(1, 0, 0, 598), (2, 0, 0, 400)])
>>> # we're on our way
... # next we'd loop through our keys
... # and total up the data bit that matches
... # each key
...
>>> 
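
The last step described in those comments could be finished along these lines. This is only a sketch, not the one right way to do it: instead of building the keys set first and then looping over it, it uses a dict keyed by the same tuple so the totals are collected in a single pass. The name totals is made up for the illustration.

```python
# Same sample rows as in the session above.
data = [
    [1, 0, 0, 598, 1111, 1],
    [1, 0, 0, 598, 2154, 3],
    [1, 0, 0, 598, 666, 1],
    [2, 0, 0, 400, 444, 2],
    [2, 0, 0, 400, 364, 2],
]

# Map each hashable key tuple to a running [amount, ntrans] pair.
totals = {}
for row in data:
    key = tuple(row[:4])           # first four fields identify the customer
    amount, ntrans = row[4:]       # last two fields are the values to add up
    running = totals.setdefault(key, [0, 0])
    running[0] += amount
    running[1] += ntrans

# One output row per key: the key fields followed by the two sums.
for key in sorted(totals):
    print(list(key) + totals[key])
```

For the sample rows this prints [1, 0, 0, 598, 3931, 5] and [2, 0, 0, 400, 808, 4].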



Hope this helps.

#7 mercury2016   User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 4
  • Joined: 04-February 16

Re: group by statements in python to display unique records

Posted 07 February 2016 - 03:43 AM

Thank you very much, it works in some way. I thought I should give you some more detail about what I am trying to do. This is a CSV file with 100,000 records; what I gave you was only the structure of the dataset. I am trying to do k-means clustering to identify patterns, which is the reason I used those libraries. I don't think I can use the data as it is, because each customer repeats many times. My idea was to make one row per customer and run k-means clustering on that dataset, but I was not able to make one row per customer. What would be the best way to do this, please? I appreciate your help.
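
For a file of that size, one option that might fit is to stay in pandas and collapse the rows with groupby plus agg, giving each column its own rule. The sketch below is hedged in a few ways: it uses the five sample rows rather than the real file, the name per_customer is invented here, and taking 'first' for pd1, pd2 and score assumes those values are the same on every row of a given customer.

```python
import pandas as pd

# Stand-in for pd.read_csv('customer1.csv'), using the sample rows from the thread.
df = pd.DataFrame(
    [
        [1, 0, 0, 598, 1111, 1],
        [1, 0, 0, 598, 2154, 3],
        [1, 0, 0, 598, 666, 1],
        [2, 0, 0, 400, 444, 2],
        [2, 0, 0, 400, 364, 2],
    ],
    columns=["id", "pd1", "pd2", "score", "Amount", "NTrans"],
)

# One rule per column: keep the first value of the descriptive columns
# (assumed constant per customer) and add up the transaction columns.
per_customer = df.groupby("id", as_index=False).agg(
    {
        "pd1": "first",
        "pd2": "first",
        "score": "first",
        "Amount": "sum",
        "NTrans": "sum",
    }
)
print(per_customer)
```

The result has exactly one row per id, which is the shape described above as the input for the clustering step.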
