How to make sure specific lines in a text file are selected?

 

The problem is based on very specific code (not long but provided)

moerl
post 3 May, 2008 - 04:54 AM

This code serves a very specific purpose and is designed, as you can see, to act on a specific file. Obviously it could be modified so that it could take any text file, but that is not required. This link shows the contents of the input file referenced in the code: http://archive.ics.uci.edu/ml/machine-lear...-operative.data

This code currently does what it should just fine. What I've realized I need to add, however, seems complicated, and I am unsure how to implement it.

The functionality I'd like to add is this: the last value on each line is one of "A", "S" or "I", and I need to make sure that the tuples that end up in the "testing.csv" file include AT LEAST one example of each of those values. In other words, testing.csv must contain at least one tuple ending in "A", at least one ending in "S" and at least one ending in "I". All three must be represented in testing.csv.

Does anyone know how I could do that?

CODE
import random

# randomSplitter.py
#
# Description:
# This program creates two text files called "testing.csv" and "training.csv"
# based on the file "Post-Op_Patient_Data_Set.csv" in which each line
# contains exactly one instance or example of the data.
# The program does so by randomly assigning each line of the provided text
# file to "training.csv" with probability 0.8 and to "testing.csv" otherwise,
# so roughly 80% of the lines end up in training and roughly 20% in testing.
#
# Author: moerl

print("### RandomSplitter ###\n")

# Get input file name and define output files
inFileName = "Post-Op_Patient_Data_Set.csv"
trainingFileName = "training.csv"
testingFileName = "testing.csv"

# Open files for reading and writing
inFile = open(inFileName, 'r')
trainingFile = open(trainingFileName, 'w')
testingFile = open(testingFileName, 'w')

# Process all lines in the file. For each line, randomly decide whether it goes
# into the training file (probability N%) or the testing file (probability
# (100 - N)%). I chose N = 80.
for line in inFile:
    # Generate a random floating point number in the range [0.0, 1.0)
    randNum = random.random()

    # Randomly assign about (100 - N)% of the instances of the original file to testingFile...
    if randNum >= 0.8:
        testingFile.write(line)
    # ... and the rest (about N%) to trainingFile.
    else:
        trainingFile.write(line)

# Print friendly confirmation message of a job completed;)
print("Successfully split \"" + inFileName + "\" into \n\"" + trainingFileName + "\" and \n\"" + testingFileName + "\"!")

# Close all open files to conclude file processing
inFile.close()
trainingFile.close()
testingFile.close()
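
One possible way to meet the requirement described above, as a rough sketch rather than a definitive answer: read all lines first, reserve one randomly chosen line per distinct last-of-line value for the testing set, and then apply the same 80/20 random rule to the remaining lines. The helper names last_value and split_with_coverage, and the parameters test_fraction and rng, are made up for this illustration. It assumes the values on each line are comma-separated and that the class is the last field on the line.

```python
import random
from collections import defaultdict


def last_value(line):
    """Return the last comma-separated value on a line, without whitespace."""
    return line.strip().split(",")[-1].strip()


def split_with_coverage(lines, test_fraction=0.2, rng=random):
    """Split lines into (training, testing) so that testing holds at least
    one line for every distinct last-of-line value found in the input.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Ignore blank lines so they do not count as a class of their own
    lines = [line for line in lines if line.strip()]

    # Group the line positions by their last-of-line value ("A", "S", "I", ...)
    by_label = defaultdict(list)
    for index, line in enumerate(lines):
        by_label[last_value(line)].append(index)

    # Step 1: reserve one randomly chosen line per class for the testing set
    test_indices = set()
    for label in sorted(by_label):
        test_indices.add(rng.choice(by_label[label]))

    # Step 2: apply the original random rule to every line not yet reserved
    for index in range(len(lines)):
        if index not in test_indices and rng.random() >= 1.0 - test_fraction:
            test_indices.add(index)

    training = [line for i, line in enumerate(lines) if i not in test_indices]
    testing = [line for i, line in enumerate(lines) if i in test_indices]
    return training, testing
```

With the file names from the script above, this could be used by calling split_with_coverage(inFile.readlines()) and writing the two returned lists out with trainingFile.writelines(...) and testingFile.writelines(...). Two caveats: the reserved lines push the testing share slightly above 20%, and a class that occurs on only one line ends up in testing.csv only. A simpler alternative would be to keep the original loop and repeat the whole split until all three values show up in the testing set.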


This post has been edited by moerl: 3 May, 2008 - 06:48 AM