8 Replies - 2039 Views - Last Post: 21 May 2014 - 02:02 AM

#1 zukeru

How do I Sum Dictionary in Python and Sort by Key Value

Posted 20 May 2014 - 04:56 PM

I have a log file of NetFlow data that I am trying to group by IP address and timestamp while adding up the bytes. The output needs to list the entries for the same IP address together, in descending order by byte count.

A line of the file reads:


./R2snd/2014/02/02/02/'min'25.flows:'sourceip'100.000.000.000|101.101.101.101|0|4|3|2|'bytes'96|1391336665|1391336668|3361|445|2|6|0|0|0|0|0

The 'min', 'sourceip', and 'bytes' labels don't actually exist in the file; they only mark where those fields sit in the string.
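
A minimal sketch of pulling those fields out of one line, including the full date and time rather than only the minute. It assumes the path segments are year/month/day/hour and that the file name is the minute; that layout is a guess from the sample line, not something the log format is confirmed to guarantee. The function name parse_flow_line and the dict keys are made up for illustration.

```python
from datetime import datetime

def parse_flow_line(line):
    """Split one log line into timestamp, source IP, bytes and protocol.

    Assumes the layout of the sample line: a path ending in
    YYYY/MM/DD/HH/MM.flows, then ':' and '|'-separated fields.
    """
    # Everything before ".flows:" is the path, everything after is the record
    path, _, record = line.strip().partition('.flows:')
    parts = path.split('/')
    # The last five path segments are taken as year, month, day, hour, minute
    year, month, day, hour, minute = (int(p) for p in parts[-5:])
    fields = record.split('|')
    return {
        'time': datetime(year, month, day, hour, minute),
        'ip': fields[0],
        'bytes': int(fields[6]),
        'protocol': fields[12],
    }

# The sample line from above, without the 'min'/'sourceip'/'bytes' labels
sample = ('./R2snd/2014/02/02/02/25.flows:100.000.000.000|101.101.101.101'
          '|0|4|3|2|96|1391336665|1391336668|3361|445|2|6|0|0|0|0|0')
entry = parse_flow_line(sample)
print('%s %s %d' % (entry['time'].strftime('%Y-%m-%d %H:%M'),
                    entry['ip'], entry['bytes']))
```

Keeping the timestamp as a datetime object means strftime can format it in whatever shape the report needs.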

For some reason I can only get it to display the minute, but I need the whole date and time, formatted. The minute is the last number in the path, the one I marked 'min' above. Then I need to take every IP address in the file and sort by IP so that repeated IPs appear together, and add up the bytes sent by each IP. I have tried to do this below with a dictionary, but I can't get it to work. Finally, I need to sort the dictionary in descending order by bytes: each entry for an IP should add to that IP's running total, so the top entry for each IP is the total bytes sent by that IP.

    import operator
    with open('/home/username/Documents/log') as f:
        for line in f:
            #save the data into an array
            firstsplitforminute = line.split('/')
            secondsplitforminute = firstsplitforminute[6].split('.')
            firstsplitforsourceip = line.split('|')
            secondsplitforsourceip = firstsplitforsourceip[0].split(':')
            minute = secondsplitforminute[0]
            sourceip = secondsplitforsourceip[1]
            bytes = line.split('|')[6]
            protocol = line.split('|')[12]
            
            if protocol == '6':
                entries = {'IP':sourceip, 'BYTES':bytes, 'MIN':minute}
                sum(item['BYTES'] for item in entries)
                def sortbykey():
                    sortedbykeydict = sorted(entries.items(), key = lambda t: t[1])
                    print sortedbykeydict
                sortbykey()
            else:
                pass


However, I get the following error when I run this code:

    File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1844, in <module>
        debugger.run(setup['file'], None, None)
      File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1372, in run
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "/home/grant/workspace/Learning/LogfileExtractor.py", line 16, in <module>
        sum(item['BYTES'] for item in entries)
      File "/home/grant/workspace/Learning/LogfileExtractor.py", line 16, in <genexpr>
        sum(item['BYTES'] for item in entries)
    TypeError: string indices must be integers, not str




Replies To: How do I Sum Dictionary in Python and Sort by Key Value

#2 Shadowys

Posted 20 May 2014 - 07:30 PM

zukeru, on 20 May 2014 - 04:56 PM, said:

...
entries = {'IP':sourceip, 'BYTES':bytes, 'MIN':minute}
sum(item['BYTES'] for item in entries)
def sortbykey():
    sortedbykeydict = sorted(entries.items(), key = lambda t: t[1])
    print sortedbykeydict
sortbykey() 



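The problem is the second line. Iterating over a dict yields its keys, so each item is a string such as 'IP', and item['BYTES'] ends up indexing a string with a string. 'BYTES' is a key of entries, so look it up directly. If you want to sum all the bytes in entries,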
sum(int(byte) for byte in entries['BYTES'])
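
To see where the TypeError comes from, here is a small sketch with made-up values. It also shows a pitfall worth checking: entries['BYTES'] holds a single string, so iterating over it walks its characters, and converting the whole string with int() is probably what is wanted.

```python
# Made-up values in the same shape as the dict from the question
entries = {'IP': '100.000.000.000', 'BYTES': '96', 'MIN': '25'}

# Iterating over a dict yields its keys, which are plain strings here
keys = sorted(item for item in entries)
print(keys)  # ['BYTES', 'IP', 'MIN']

# Indexing one of those strings with another string raises the TypeError
item = 'IP'
try:
    item['BYTES']
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# Iterating over the string '96' yields '9' and '6', so this adds digits
digit_sum = sum(int(byte) for byte in entries['BYTES'])
print(digit_sum)  # 15

# Converting the whole string gives the byte count itself
byte_count = int(entries['BYTES'])
print(byte_count)  # 96
```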



You could separate the jobs into different functions. This sorts by bytes:
def GetDict(path):
    # Collect one dict per protocol-6 line in the log
    entries = []
    with open(path) as f:
        for line in f:
            fields = line.split('|')
            if fields[12] == '6':
                entry = {}
                entry['min'] = line.split('/')[6].split('.')[0]
                entry['ips'] = fields[0].split(':')[1]
                # Store the byte count as a number so it sorts numerically
                entry['byt'] = int(fields[6])
                # Only append entries that were actually filled in
                entries.append(entry)
    return entries

def SortEntry(entries, Key):
    # Sort the entries by the given key, largest value first
    return sorted(entries, key=lambda x: x[Key], reverse=True)

path = '/home/username/Documents/log'
print(SortEntry(GetDict(path), 'byt'))




Does this help? :P

#3 Shadowys

Posted 20 May 2014 - 07:39 PM

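Oops, I missed the part about adding up the bytes for the same IP. You can use a dict (here called lister, starting out empty) to accumulate the bytes per IP, then sort the totals.
lister[entry['ips']] = lister.get(entry['ips'], 0) + int(entry['byt'])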
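
A sketch of that idea end to end, using collections.defaultdict so missing IPs start at zero. The (ip, bytes) pairs are invented stand-ins for parsed log lines, and the names flows, totals and ranked are only for illustration.

```python
from collections import defaultdict

# Hypothetical (ip, bytes) pairs standing in for parsed log lines
flows = [
    ('10.0.0.1', 96),
    ('10.0.0.2', 400),
    ('10.0.0.1', 150),
    ('10.0.0.3', 20),
    ('10.0.0.2', 30),
]

totals = defaultdict(int)  # an IP not seen before starts at 0
for ip, nbytes in flows:
    totals[ip] += nbytes

# Sort the (ip, total) pairs by total, largest first
ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
for ip, total in ranked:
    print('%s %d' % (ip, total))
```

The dict does the adding and sorted() does the ordering, so the two steps stay independent of each other.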



#4 zukeru

Posted 20 May 2014 - 08:39 PM

So I tried your code, and even fixed the function call, but it didn't work, so I modified mine to match what I thought was going on. Now I'm getting a strange KeyError. Any help?

Your code just locks up my computer and I have to restart.

import operator
with open('/home/grant/Documents/log') as f:
        for line in f:
            #save the data into an array
            firstsplitforminute = line.split('/')
            secondsplitforminute = firstsplitforminute[6].split('.')
            firstsplitforsourceip = line.split('|')
            secondsplitforsourceip = firstsplitforsourceip[0].split(':')
            minute = secondsplitforminute[0]
            sourceip = secondsplitforsourceip[1]
            bytes = line.split('|')[6]
            protocol = line.split('|')[12]    
            if protocol == '6':
                entries = {'IP':sourceip, 'BYTES':bytes, 'MIN':minute}
                sum(int(byte) for byte in entries['BYTES'])
                Key = entries['IP']
                addByt = entries
                print sum(int(byt) for byt in entries[Key])
                print sorted(entries, key= lambda x : addByt(x), reverse=True)
     

This is the error I get:
pydev debugger: starting (pid: 2946)
Traceback (most recent call last):
  File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1844, in <module>
    debugger.run(setup['file'], None, None)
  File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1372, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/grant/workspace/Learning/LogfileExtractor.py", line 18, in <module>
    print sum(int(byt) for byt in entries[Key])
KeyError: '187.162.118.218'




This post has been edited by zukeru: 20 May 2014 - 08:41 PM


#5 zukeru

Posted 20 May 2014 - 09:32 PM

from operator import itemgetter
with open('/home/grant/Documents/log') as f:
        for line in f:
            #save the data into an array
            firstsplitforminute = line.split('/')
            secondsplitforminute = firstsplitforminute[6].split('.')
            firstsplitforsourceip = line.split('|')
            secondsplitforsourceip = firstsplitforsourceip[0].split(':')
            minute = secondsplitforminute[0]
            sourceip = secondsplitforsourceip[1]
            bytes = line.split('|')[6]
            protocol = line.split('|')[12]    
            if protocol == '6':
                entries = {'IP':sourceip, 'BYTES':bytes, 'MIN':minute}
                sum(int(byte) for byte in entries['BYTES'])
                KeySum = entries['BYTES']
                Key = entries['IP']
                addByt = entries
                newlist=sorted(entries, key=itemgetter('IP'))
                print newlist

I get this error:
pydev debugger: starting (pid: 4147)
Traceback (most recent call last):
  File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1844, in <module>
    debugger.run(setup['file'], None, None)
  File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1372, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/grant/workspace/Learning/LogfileExtractor.py", line 19, in <module>
    newlist=sorted(entries, key=itemgetter('IP'))
TypeError: string indices must be integers, not str





#6 Shadowys

Posted 20 May 2014 - 09:33 PM

Um, so you want to add up the bytes for each IP, then sort?
The KeyError occurs because you're looking up the IP address itself as a key in entries, instead of looking up 'BYTES'.
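
A small sketch of that KeyError, with values made up to match the traceback. The dict only has the keys 'IP', 'BYTES' and 'MIN', so using the IP value as a key cannot succeed.

```python
# Made-up values; the IP is the one shown in the KeyError message
entries = {'IP': '187.162.118.218', 'BYTES': '96', 'MIN': '25'}

Key = entries['IP']  # this is a value, '187.162.118.218', not a key

try:
    entries[Key]
    found = True
except KeyError:
    found = False
print(found)  # False: the dict has no key named '187.162.118.218'

# Look the bytes up by the key name instead
print(int(entries['BYTES']))  # 96
```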

lol, you've just copied and pasted it.
It works okay here with a test case. By the way, can you give me the file so I can test it?

# This line does nothing by itself: its result is thrown away
sum(int(byte) for byte in entries['BYTES'])
# Change it to int(...), since the field holds only a single value such as 96.



That line should become int(entries['BYTES']), with the result saved back to the same key in the dict. My bad here.

By the way, you have to initialize a list before the loop if you're going to save things in it.

You should also separate the logic into functions to make it clearer to yourself what the code is doing.
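
One way this could look, as a sketch rather than a drop-in fix: one function parses a line, one collects entries into a list created before the loop, and one sorts so that equal IPs sit together with the largest byte count first. The function names and the test lines are invented; the field positions follow the code earlier in the thread.

```python
def parse_line(line):
    """Return a dict for one log line, or None if it is not protocol 6."""
    fields = line.split('|')
    if fields[12] != '6':
        return None
    return {
        'MIN': line.split('/')[6].split('.')[0],
        'IP': fields[0].split(':')[1],
        'BYTES': int(fields[6]),
    }

def read_entries(lines):
    """Collect the parsed protocol-6 entries from an iterable of lines."""
    entries = []  # created once, before the loop
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            entries.append(entry)
    return entries

def group_by_ip(entries):
    """Sort so equal IPs sit together, largest byte count first.

    IPs are compared as strings, which is enough to group them.
    """
    return sorted(entries, key=lambda e: (e['IP'], -e['BYTES']))

# Made-up lines in the same shape as the sample from the first post
template = ('./R2snd/2014/02/02/02/%s.flows:%s|101.101.101.101|0|4|3|2|%s'
            '|1391336665|1391336668|3361|445|2|%s|0|0|0|0|0')
lines = [
    template % ('25', '10.0.0.2', '50', '6'),
    template % ('26', '10.0.0.1', '96', '6'),
    template % ('27', '10.0.0.2', '700', '6'),
    template % ('28', '10.0.0.1', '10', '17'),  # not protocol 6, skipped
]
result = group_by_ip(read_entries(lines))
for e in result:
    print('%s %d %s' % (e['IP'], e['BYTES'], e['MIN']))
```

With a real file, passing the open file object to read_entries should work the same way, since it only needs something that yields lines.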

#7 zukeru

Posted 20 May 2014 - 11:20 PM

What is wrong here?

from operator import itemgetter
with open('/home/grant/Documents/log') as f:
        for line in f:
            minute=line.split('/')[6].split('.')[0]
            sourceip=line.split('|')[0].split(':')[1]
            bytes=line.split('|')[6]
            protocol = line.split('|')[12]    
            if protocol == '6':
                entries = {'IP':sourceip, 'BYTES':bytes, 'MIN':minute}
                sum(int(byte) for byte in entries['BYTES'])
                newlist = sorted(entries.items, key=itemgetter('IP'), reverse=True)                #print 'IPs: ' + entries['IP'] + 'Bytes: ' + entries['BYTES'] + 'Time: ' + entries['MIN']
                print entries



pydev debugger: starting (pid: 7571)
Traceback (most recent call last):
  File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1844, in <module>
    debugger.run(setup['file'], None, None)
  File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1372, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/grant/workspace/Learning/LogfileExtractor.py", line 16, in <module>
    newlist = sorted(entries.items, key=itemgetter('IP'), reverse=True) #print 'IPs: ' + entries['IP'] + 'Bytes: ' + entries['BYTES'] + 'Time: ' + entries['MIN']
TypeError: 'builtin_function_or_method' object is not iterable


Then I replaced entries.items with just entries:

from operator import itemgetter
with open('/home/grant/Documents/log') as f:
        for line in f:
            minute=line.split('/')[6].split('.')[0]
            sourceip=line.split('|')[0].split(':')[1]
            bytes=line.split('|')[6]
            protocol = line.split('|')[12]    
            if protocol == '6':
                entries = {'IP':sourceip, 'BYTES':bytes, 'MIN':minute}
                sum(int(byte) for byte in entries['BYTES'])
                newlist = sorted(entries, key=itemgetter('IP'), reverse=True)                #print 'IPs: ' + entries['IP'] + 'Bytes: ' + entries['BYTES'] + 'Time: ' + entries['MIN']
                print entries      



and I get:

pydev debugger: starting (pid: 7602)
Traceback (most recent call last):
  File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1844, in <module>
    debugger.run(setup['file'], None, None)
  File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1372, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/grant/workspace/Learning/LogfileExtractor.py", line 16, in <module>
    newlist = sorted(entries, key=itemgetter('IP'), reverse=True) #print 'IPs: ' + entries['IP'] + 'Bytes: ' + entries['BYTES'] + 'Time: ' + entries['MIN']
TypeError: string indices must be integers, not str

This post has been edited by zukeru: 20 May 2014 - 11:24 PM


#8 Shadowys

Posted 21 May 2014 - 01:55 AM

You didn't point entries['MIN'] to minutes

#9 Shadowys

Posted 21 May 2014 - 02:02 AM

I misread it. LOL

entries.items is a method, so you have to call it with parentheses: entries.items()
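
A short sketch of the difference, with made-up values. One caveat: even with the parentheses, itemgetter('IP') would still fail on a single dict's items, because each item is a (key, value) tuple and tuples take integer positions. itemgetter('IP') fits a list of dicts instead.

```python
from operator import itemgetter

# Made-up values in the same shape as the dict from the thread
entries = {'IP': '100.000.000.000', 'BYTES': '96', 'MIN': '25'}

# entries.items without parentheses is the method object, not its result
print(callable(entries.items))  # True

# Calling it yields (key, value) pairs, so itemgetter needs a position
pairs = sorted(entries.items(), key=itemgetter(0))
print(pairs)

# itemgetter('IP') fits a list of dicts, where each element has an 'IP' key
rows = [
    {'IP': '10.0.0.2', 'BYTES': 50},
    {'IP': '10.0.0.1', 'BYTES': 96},
]
by_ip = sorted(rows, key=itemgetter('IP'))
print([row['IP'] for row in by_ip])  # ['10.0.0.1', '10.0.0.2']
```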