
A Simple Extension to the Python CSV Module

#1 Motoma

Posted 08 June 2010 - 09:04 AM

One of the side effects of working with database-driven software is that you eventually find yourself needing to pull in large amounts of information from old and terrible systems. When talking to your counterparts on the other side of the line (the inter-company line, that is), you will invariably be told that you will only receive your data in one of a few straightforward formats, such as CSV. What follows is a small extension to Python's csv module that streamlines the process of coding these data transformations.
Simply put, Python's csv module is more than enough to handle any such conversion; however, to make my life even simpler, I wrote the following wrapper:

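#! /usr/bin/env python
import csv

class CSVFileHandler:
   def __init__(self, filename):
      self.file = None
      self.open(filename)

   def __del__(self):
      self.close()

   def open(self, filename):
      self.file = open(filename, 'r')
      self.reader = csv.reader(self.file)

   def close(self):
      if self.file:
         self.file.close()
         self.file = None

   def process(self, function, args):
      # Call function(row, args) once for each parsed row of the file
      for row in self.reader:
         function(row, args)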

Here, the process method accepts a function object and an argument (most likely a tuple). The CSVFileHandler object calls that function on each row of the file, passing it the parsed row and the argument, which makes the main loop of your processing code a mere three lines long. Take, for example, the following code:

#! /usr/bin/env python
import sys
from CSVFileHandler import CSVFileHandler

def main(ifile, ofile):
   writer = open(ofile, 'w')

   def processFile(line, args):
      # Build one INSERT ... SELECT statement per CSV row.
      # Values are interpolated as-is; quotes in the data are not escaped.
      query = """INSERT INTO `trans` (`trans_code_id`, `generic_descriptor`, `amount`, `personal_information`, `related_id`, `created`, `last_modified`)
         (SELECT %s, '%s', %s, '%s', c.client_id, '%s', '%s' FROM client c WHERE c.username = '%s');""" % (line[1], line[0], line[14], line[14], line[3], line[32], line[4])
      writer.write(query + "\n")

   handler = CSVFileHandler(ifile)
   handler.process(processFile, ())
   handler.close()
   writer.close()

def usage():
   print("Usage: %s <csv filename> <sql output filename>" % sys.argv[0])

if __name__ == "__main__":
   if len(sys.argv) == 3:
      main(sys.argv[1], sys.argv[2])
   else:
      usage()

The code listing above quickly parses that old CSV file and leaves you with a shiny new SQL data file to load into your database.
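The example above passes an empty tuple as the argument, so it may help to see the args parameter doing some work. What follows is only a minimal sketch, assuming Python 3: the class is a condensed copy of the wrapper so the snippet runs on its own, and sum_amounts, demo, and the sample rows are hypothetical names and data made up for illustration.

```python
import csv
import os
import tempfile


class CSVFileHandler:
    # Condensed copy of the wrapper from the first listing,
    # repeated here only so this sketch is self-contained.
    def __init__(self, filename):
        self.file = open(filename, "r", newline="")
        self.reader = csv.reader(self.file)

    def close(self):
        self.file.close()

    def process(self, function, args):
        for row in self.reader:
            function(row, args)


def sum_amounts(row, args):
    # Hypothetical callback: args is (column_index, totals_dict).
    column, totals = args
    totals["rows"] += 1
    totals["amount"] += float(row[column])


def demo():
    # Write a tiny sample file; the data is invented for this example.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, newline=""
    ) as tmp:
        csv.writer(tmp).writerows([["alice", "10.50"], ["bob", "4.25"]])
        path = tmp.name

    totals = {"rows": 0, "amount": 0.0}
    handler = CSVFileHandler(path)
    try:
        # The tuple carries both the column to read and the place to store results.
        handler.process(sum_amounts, (1, totals))
    finally:
        handler.close()
        os.remove(path)
    return totals


if __name__ == "__main__":
    print(demo())
```

Passing state through args like this keeps the callback a plain module-level function, whereas the SQL example relies on a nested function that closes over writer. Either approach works; which one reads better is largely a matter of taste.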

This tutorial--and others--was originally posted at my professional site,
