large data sorting in python for text file


3 Replies - 6383 Views - Last Post: 30 July 2009 - 01:06 PM

#1 oceanic71

Posted 30 July 2009 - 06:09 AM

Hi everyone,

I am a grad student, and programming is not my major, at least for now.
As a starting point, I am trying to sort my correlation results: I want to order the lines by the number at the end of each line.

My source is arranged like this:

kasdasfq asgsgdfg 1.00239502396
jskjdfgks dfsgksdj 1.92349209420

The data is in a text file of about 18,000,000 lines.

I use the built-in sort method with some modifications, but my data file is very big (more than 550 MB) and long, and the sort stops after a while because of insufficient memory,
even though I am using a pretty powerful university computer.
I can do file handling and other stuff.

So I would be glad if you could share your experience to help me overcome this problem.

Thanks.

PS: At least recommend me sorting methods with low memory requirements.

Replies To: large data sorting in python for text file

#2 code_m

Posted 30 July 2009 - 08:46 AM

I would try the io library. I don't have my book with me, but io has a file handler that can be treated like a string.

Let me look up some more info, as I'm on a Windows machine without Python (*cries*).

#3 Nallo

Posted 30 July 2009 - 12:02 PM

Hello Oceanic,

oceanic71, on 30 Jul, 2009 - 05:09 AM, said:

PS: At least recommend me sorting methods with low memory requirements.


Heapsort needs only the memory for the data plus a constant additional amount, unlike quicksort, which needs (somewhat uncontrollable) additional memory for each recursion step. Heapsort is also guaranteed to be O(n log n), and since it is a well-known algorithm you will surely find a Python implementation on the web. Google for it.
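
For illustration, a minimal in-place heapsort might look like the sketch below. It is not taken from any particular library; the names `heapsort`, `sift_down`, and `last_number` are made up for this example, and the key function assumes the line layout shown in the question (a number after the last whitespace). The list of lines itself still has to fit in memory; only the extra working space is constant.

```python
def heapsort(items, key=lambda x: x):
    """Sort `items` in place, ascending by key(item), using heapsort."""
    n = len(items)

    def sift_down(root, end):
        # Push items[root] down until items[:end] is a max-heap again.
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            # Pick the larger of the two children.
            if child + 1 < end and key(items[child]) < key(items[child + 1]):
                child += 1
            if key(items[root]) >= key(items[child]):
                return
            items[root], items[child] = items[child], items[root]
            root = child

    # Build a max-heap from the bottom up.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)

    # Repeatedly move the current maximum to the end of the shrinking heap.
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]
        sift_down(0, end)


def last_number(line):
    # Hypothetical key: the float after the last whitespace in a line.
    return float(line.rsplit(None, 1)[1])


lines = [
    "jskjdfgks dfsgksdj 1.92349209420",
    "kasdasfq asgsgdfg 1.00239502396",
]
heapsort(lines, key=last_number)
```

After the call, `lines` is ordered by the trailing number, smallest first. The loop-based `sift_down` avoids recursion, so the only extra storage is a few local variables.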

This post has been edited by Nallo: 30 July 2009 - 12:03 PM

#4 baavgai

Posted 30 July 2009 - 01:06 PM

550MB isn't all that big... still.

Read in a buffer's worth of lines, say 100,000, sort them, and write them to a file. Keep doing this until you have a whole lot of files and nothing left to read. Now read in two files, a line at a time, and output their lines in order to a third file. Delete the two you read when you're done, leaving the file that was the result. Keep doing this until you have one file left, and you're done.

Here you're using your filesystem as your memory. If you can store it, you can sort it. No, it's not fast.
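
The steps above could be sketched roughly as follows. This is only one way to write it, under a few assumptions: every non-blank line ends in a number after the last whitespace (as in the sample data), and the function names (`last_number`, `write_sorted_chunks`, `merge_two`, `external_sort`) are invented for this example.

```python
import itertools
import os
import shutil
import tempfile


def last_number(line):
    # Assumed layout: the sort key is the float after the last whitespace.
    return float(line.rsplit(None, 1)[1])


def write_sorted_chunks(src_path, tmp_dir, chunk_lines=100000):
    """Split src_path into sorted chunk files and return their paths."""
    paths = []
    with open(src_path) as src:
        while True:
            chunk = list(itertools.islice(src, chunk_lines))
            if not chunk:
                break
            # Skip blank lines and make sure every line ends with a newline.
            chunk = [ln.rstrip("\n") + "\n" for ln in chunk if ln.strip()]
            chunk.sort(key=last_number)
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".chunk")
            with os.fdopen(fd, "w") as out:
                out.writelines(chunk)
            paths.append(path)
    return paths


def merge_two(path_a, path_b, out_path):
    """Merge two sorted files into out_path, reading a line at a time."""
    with open(path_a) as a, open(path_b) as b, open(out_path, "w") as out:
        line_a, line_b = a.readline(), b.readline()
        while line_a and line_b:
            if last_number(line_a) <= last_number(line_b):
                out.write(line_a)
                line_a = a.readline()
            else:
                out.write(line_b)
                line_b = b.readline()
        # One input is exhausted; copy whatever is left of the other.
        out.write(line_a)
        out.writelines(a)
        out.write(line_b)
        out.writelines(b)


def external_sort(src_path, dst_path, chunk_lines=100000):
    """Sort src_path by trailing number into dst_path using temp files."""
    tmp_dir = tempfile.mkdtemp()
    paths = write_sorted_chunks(src_path, tmp_dir, chunk_lines)
    # Merge the two oldest files, delete them, and queue the result.
    while len(paths) > 1:
        first, second = paths.pop(0), paths.pop(0)
        fd, merged = tempfile.mkstemp(dir=tmp_dir, suffix=".merged")
        os.close(fd)
        merge_two(first, second, merged)
        os.remove(first)
        os.remove(second)
        paths.append(merged)
    if paths:
        shutil.move(paths[0], dst_path)
    else:
        open(dst_path, "w").close()  # empty input gives empty output
    os.rmdir(tmp_dir)
```

A call might look like `external_sort("correlations.txt", "correlations_sorted.txt")`, where both file names are placeholders. Only `chunk_lines` lines are held in memory at once during the first phase, and only two lines during each merge.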