
#1 erburrell


Newbie Code Review

Posted 10 October 2017 - 09:12 AM


I am new to Python and no expert at coding, so I was wondering if I could get a code review and some suggestions. The code below is the beginning of a dataframe class, similar to what you might find in R or in pandas. I realize that using pandas would be the most likely route and a better use of a seasoned programmer's time, but I am just learning the language and decided to build something at least a little useful while doing it.

The code runs fine with small data sets, apart from the errors noted in the comments.

Any suggestions or corrections are welcome.



import myUtils as mu
from math import sqrt

# Dataframe class to implement dataframe-like attributes and methods.
# Subclasses the built-in dict class.

class dataframe(dict):
    """ Init method """
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        return val
    def __setitem__(self, key, val):
        dict.__setitem__(self, key, val)
    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)

    def update(self, *args, **kwargs):
        for k, v in dict(*args, **kwargs).items():
            self[k] = v

#    Creates the dataframe from a given CSV file.
#    inputs: filename as string
    def create_from_csv(self, filename):
        # TODO:// Error Handling - File not present...

        # expecting fields in first row.  Set up list for keys
        keys = []

        # Open File for reading.
        with open(filename) as ef:
            i = 0
            for line in ef:
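                if (i == 0):      # First line has field names use as dict keys
                    for item in line.split(','):
                        self[item.rstrip()] = []
                        keys.append(item.rstrip())   # remember column order
                else:             # Every other line is a row of data
                    j = 0
                    for key in keys:
                        strVal = line.split(',')[j]

                        # Check for int or float to convert information.
                        if (mu.isInt(strVal.rstrip())):
                            self[key].append(int(strVal))
                        elif (mu.isFloat(strVal)):
                            self[key].append(float(strVal))
                        else:
                            # Neither int nor float: keep the stripped string.
                            self[key].append(strVal.rstrip())
                        j += 1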
                i += 1

#   Creates a data frame from a dictionary.
#   Assumes all dimensions are correct at the moment.
    def create_from_dict(self, d):
        for key in d.keys():
            if key not in self.keys(): self[key] = []
            for item in d[key]:
                self[key].append(item)

#   Provides a method to access the entire data frame
    def getDataFrame(self):
        return self

#   Returns a given row from the dataframe as a dictionary object.
    def row(self, r):
        # TODO:// Error Handling: What happens when r is > length.

        row = {}
        for key in self.keys():
            row[key] = self[key][r]
        return row

#   Returns the headers as a list.
    def headers(self):
        headerList = []
        for key in self.keys():
            headerList.append(key)
        return headerList

#    Returns a tuple providing the numbers of rows and columns (r, c)
    def shape(self):
        rows = 0    # an empty frame has no rows
        for key in self.keys():
            rows = len(self[key])

        return (rows, len(self))

#   Returns a dictionary object that represents the first 5 elements of
#   the data frame
    def head(self):
        # TODO:// What if there are fewer than 5 items in the frame?
        tempDf = {}
        for key in self.keys():
            tempDf[key] = self[key][:5]

        return tempDf

#   Converts a field from one type to another.
    def convert(self, field, type):
        if type == 'int':
            count = 0
            for x in self[field]:
                try:
                    self[field][count] = int(self[field][count])
                except ValueError:
                    self[field][count] = 'NaN'
                count += 1
        # TODO:// Add other conversions.

#   drops a row from the data frame
    def drop(self, index):
        for key in self.keys():
            del self[key][index]

#   Sorts the entire data frame by a specific field
    def sort_by_index(self, field):

#   Takes a given field and strips all non numerics out, replacing them with
#   None
    def clean_num_field(self, field):
        i = 0
        for item in self[field]:
            if (not mu.isNum(item)):
                self[field][i] = None
            i += 1

#   Sums a column of data
    def col_sum(self, field):
        i = 0
        total = 0.0
        while i < len(self[field]):
            if (mu.isNum(self[field][i])):
                total += self[field][i]
            i += 1
        return total

#   Calculate the average of a dataframe field
    def col_avg(self, field):
        i = 0
        total = 0.0
        count = 0
        while i < len(self[field]):
            if (mu.isNum(self[field][i])):
                total += self[field][i]
                count += 1
            i += 1
        return total/count

#   Calculation of Standard deviation on a data column
#   Assumes a cleaned column
    def std_dev_sample(self, field):
        avg = self.col_avg(field)
        total = 0
        count = 0
        i = 0
        while i < len(self[field]):
            if (mu.isNum(self[field][i])):
                total += (self[field][i] - avg) ** 2
                count += 1
            i += 1

        # Sample standard deviation: sqrt of (sum of squares / (n - 1)).
        return sqrt(total / (count - 1))

#   Count based on criteria by field. Send a list of tuples ('field', 'cond', 'value')
    def count_by_field(self, args):
        # TODO:// This counts to many.  Fix!!
        conditions = args
        count = 0
        for item in self[conditions[0][0]]:
            if item != None:
                if conditions[0][1] == '==':
                    if item == conditions[0][2]:
                        count += self.count_by_field(conditions[1:])
                elif conditions[0][1] == '!=':
                    if item != conditions[0][2]:
                        count +=1
                elif conditions[0][1] == '<=':
                    if item <= conditions[0][2]:
                        count +=1
                elif conditions[0][1] == '>=':
                    if item >= conditions[0][2]:
                        count +=1
                elif conditions[0][1] == '<':
                    if item < conditions[0][2]:
                        count +=1
                elif conditions[0][1] == '>':
                    if item > conditions[0][2]:
                        count +=1
            count = count + 1
        return count

#   Set a list of items for a categorical data column.
    def set_categorical(self, field, cats = None):
        # Check to see if categorical dictionary exists.
        if not hasattr(self, 'cats'):
            self.cats = {}
        # Assume that if the field is a key in cats, the field has already
        # been set as categorical in the past.
        if field in self.cats.keys(): return
        self.cats[field] = []
        # if no args are provided, find categories from data:
        if cats == None:
            # Initialize and copy all values into the list only once.
            for item in self[field]:
                if item not in self.cats[field]:
                    self.cats[field].append(item)
        else:
            for item in cats:
                self.cats[field].append(item)

#   Provides counts from a column with categorical data
    def counts(self, field):
        # TODO:// build this...
        if field not in self.cats.keys():
            # TODO:// should throw exception...
            return
        counts = {}
        for item in self[field]:
            if item in counts.keys():
                counts[item] += 1
            else:
                counts[item] = 1
        for item in self.cats[field]:
            if item not in counts.keys():
                counts[item] = 0
        msg = ''
        for key in counts.keys():
            msg += str(key) + ' : ' + str(counts[key]) + '\n'
        return msg
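
The sort_by_index method is listed without a body. As a rough sketch only, here is one way to reorder every column by a single field while keeping the rows aligned. It is written as a standalone function on a plain dict of lists rather than as a method, and the name sort_columns_by is hypothetical. It assumes all columns have the same length and that the values in the chosen field can be compared with each other (no None mixed in with numbers).

```python
def sort_columns_by(frame, field):
    """Return a new dict of columns, with every column reordered by `field`.

    Assumes `frame` maps column name -> list, all lists the same length,
    and that the values in `field` are mutually comparable.
    """
    # Row indices in the order that sorts the chosen column.
    order = sorted(range(len(frame[field])), key=lambda i: frame[field][i])
    # Apply the same ordering to every column so rows stay aligned.
    return {key: [column[i] for i in order] for key, column in frame.items()}


data = {"name": ["c", "a", "b"], "score": [3, 1, 2]}
print(sort_columns_by(data, "score"))
# {'name': ['a', 'b', 'c'], 'score': [1, 2, 3]}
```

Returning a new dict leaves the original columns untouched; an in-place version could assign the reordered lists back to each key instead.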


Replies To: Newbie Code Review

#2 andrewsw


Re: Newbie Code Review

Posted 10 October 2017 - 12:21 PM

Does the code run fine or not? If there are errors then post the error details. If the code works "sometimes" then you need to describe for which input it works, and for which it doesn't work.

The code either works, or doesn't work. If it doesn't work then we/you need to discover why it isn't working.

"# TODO:// This counts to many. Fix!!"

That isn't descriptive enough to receive help (aside from the grammatical error).
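
Reading count_by_field as posted, two things look like candidates for the over-count: the last statement in the loop adds 1 for every item regardless of the comparison, and the '==' branch recurses over the remaining conditions for the whole column instead of checking them against the same row. If the intent is "count the rows where every condition holds" (an assumption, since the post does not say), a minimal sketch could look like the following. The function name count_rows and the OPS table are hypothetical, and it works on a plain dict of equal-length lists.

```python
import operator

# Map the comparison strings used in the post to functions.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}


def count_rows(frame, conditions):
    """Count rows where every (field, op, value) condition holds."""
    if not frame:
        return 0
    n_rows = len(next(iter(frame.values())))
    count = 0
    for i in range(n_rows):
        matched = True
        for field, op, value in conditions:
            item = frame[field][i]
            # None marks a missing value and never matches.
            if item is None or not OPS[op](item, value):
                matched = False
                break
        if matched:
            count += 1
    return count


data = {"age": [20, 35, None, 50], "city": ["A", "B", "B", "B"]}
print(count_rows(data, [("age", ">=", 30), ("city", "==", "B")]))  # prints 2
```

Checking all conditions row by row avoids the recursion entirely, and a small input like the one above makes it easy to state exactly which count is expected and which count the code returns.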