This code currently does what it should just fine. What I've realized I need to add, however, seems complicated, and I am unsure how to implement it.
The functionality I'd like to add is this: I need to make sure that the tuples that end up in the "testing.csv" file include AT LEAST one example of each possible value of the last field on a line. In other words, testing.csv must contain at least one tuple whose last-of-line value is "A", at least one whose last-of-line value is "S", and at least one whose last-of-line value is "I". All three must be represented in testing.csv.
Does anyone know how I could do that?
import random

# randomSplitter.py
#
# Description:
# This program creates two text files called "testing.csv" and "training.csv"
# based on the file "Post-Op_Patient_Data_Set.csv" in which each line
# contains exactly one instance or example of the data.
# The program does so by randomly picking 80% of the original lines of the
# provided text file and writing those out to "training.csv". The remaining
# 20% of lines of the original file are written to the file called "testing.csv".
#
# Author: moerl

print "### RandomSplitter ###\n"

# Get input file name and define output files
inFileName = "Post-Op_Patient_Data_Set.csv"
trainingFileName = "training.csv"
testingFileName = "testing.csv"

# Open files for reading and writing
inFile = open(inFileName, 'r')
trainingFile = open(trainingFileName, 'w')
testingFile = open(testingFileName, 'w')

numInstances = 0

# Process all lines in the file. For each line, randomly determine whether it should
# be moved into the N "bucket" or the 100 - N bucket (or the training file and test file,
# respectively). I chose N = 80.
for line in inFile:
    # Generate a random floating point number between 0.0 and 1.0
    randNum = random.random()

    # Randomly assign (100 - N)% of the instances of the original file to testingFile...
    if randNum >= 0.8 and randNum <= 1.0:
        testingFile.write(line)
    # ... and the rest (N%) to trainingFile.
    else:
        trainingFile.write(line)

# Print friendly confirmation message of a job completed;)
print "Successfully split \"" + inFileName + "\" into \n\"" + trainingFileName + "\" and \n\"" + testingFileName + "\"!"

# Close all open files to conclude file processing
inFile.close()
trainingFile.close()
testingFile.close()
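For reference, here is a rough sketch of one way the requirement could be met; it is not necessarily the best approach. The idea is to do the same random 80/20 split in memory first, then, for every last-of-line value that is missing from the testing set, move one randomly chosen line with that value over from the training set. The helper names last_value, split_with_all_labels and split_file are made up for this illustration, and the sketch assumes the fields are comma-separated with the class value as the final field (surrounding whitespace is stripped).

```python
import random


def last_value(line):
    # Assumption: fields are comma-separated and the class is the last field.
    return line.strip().split(",")[-1].strip()


def split_with_all_labels(lines, test_fraction=0.2, rng=random):
    # Step 1: the same random split as before, but collected into lists.
    training, testing = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if rng.random() >= 1.0 - test_fraction:
            testing.append(line)
        else:
            training.append(line)

    # Step 2: find the last-of-line values that testing is missing.
    all_labels = set(last_value(l) for l in training + testing)
    present = set(last_value(l) for l in testing)

    # Step 3: for each missing value, move one matching line out of training.
    for label in sorted(all_labels - present):
        candidates = [l for l in training if last_value(l) == label]
        chosen = rng.choice(candidates)
        training.remove(chosen)
        testing.append(chosen)
    return training, testing


def split_file(in_name, training_name, testing_name):
    # Read everything, split, then write both output files.
    with open(in_name, "r") as in_file:
        lines = in_file.readlines()
    training, testing = split_with_all_labels(lines)
    with open(training_name, "w") as training_file:
        training_file.writelines(training)
    with open(testing_name, "w") as testing_file:
        testing_file.writelines(testing)
```

Calling split_file("Post-Op_Patient_Data_Set.csv", "training.csv", "testing.csv") would then replace the main loop. Two caveats: the testing file can end up slightly larger than 20%, and if a value occurs only once in the whole data set, that single line goes to testing.csv and training.csv will not contain it at all.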
This post has been edited by moerl: 03 May 2008 - 06:48 AM