
#1 jsewell94

Parsing Data From An Input File: Correct Approach?

Posted 18 November 2013 - 11:52 AM

Hey all,

A few days ago we were given an assignment that involves parsing. Basically, we are given a document containing course information for different classes, and we have to find a way to extract the course ID, course description, credit hour amount, and professor, and store them in a linked list. At this point, I'm very comfortable with the linked list portion, as we've been doing ADTs for a while now. But our professor/TAs didn't give us a lot of guidance on how to approach parsing. Could you give me input on whether or not I'm even remotely close to doing this correctly? I feel like I'm not :/

Here is the input text document that we are supposed to use:
SEE INDIVIDUAL COURSE FOR DEPARTMENTAL
  WEB SITE, FACULTY PROFILES, COURSE
  DESCRIPTIONS OR SEE AVAILABLE COURSE SYLLABI.
 ************************************************
                                                 
 COSC-010  INTRO TO COMPUTER SCIENCE           3 
   available seating
   01  LEC T 1:15-2:30 REI 112             Miles A
       LAB M 4:15-5:30 STM 343
      CROSS LIST: COSC-501-01                            
 COSC-011  INTRO TO INFORMATION PRIVACY        3 
   available seating
   01  LEC M 4:10-6:30 WGR 203             Moran E
      PREFERENCE GIVEN TO BSFS STUDENTS          
 COSC-012  INTRO TO MEDIA COMPUTING            3 
   available seating
   01  LEC MW 11:40-12:55 STM 343  Kalyanasundaram
 COSC-014  FUNDAMENTALS OF TECHNOLOGY          3 
   available seating
   01  *** CANCELLED ***
 COSC-071  COMPUTER SCIENCE I                  3 
   available seating
   01  LEC TR 1:15-2:30 WAL 492            Singh L
      CROSS LIST: COSC-502-01                    
 COSC-072  COMPUTER SCIENCE II                 3 
   available seating
   01  LEC TR 11:40-12:55 ICC 107         Maloof M
      Pre-requisite: COSC-071                    
      CROSS LIST: COSC-503-                      
 COSC-127  MATH METHODS FOR COMP SCI           3 
   available seating
   01  LEC MW 1:15-2:30 STM 343           Bansal A
      CROSS LIST: COSC-506-01                    
 COSC-288  INTRO TO MACHINE LEARNING           3 
   available seating
   01  LEC TR 4:15-5:30 REI 264           Maloof M
      Pre-requisite: COSC-173                    
 ACCT-001  PRINCIPLES OF ACCOUNTING            3 
   available seating
   01  LEC MW 1:15-2:30 WAL 398           Baisey J
      NON-MSB STUDENTS ONLY                                     
 ACCT-102  ACCOUNTING II                       3 
   available seating
   01  LEC MW 10:15-11:30 WAL 395        Galasso M
      MSB STUDENTS ONLY                          
      SPRING ONLY COURSE                         
      PREREQ: ACCT-101                           
       STUDENT DAY      LAB TIMES         ROOM   
        A-K     T        6:40 P- 7:30 P  WAL 498 
        L-Z     T        7:40 P- 8:30 P  WAL 498 
 ACCT-181  BUSINESS LAW I                      3 
   available seating
   01  LEC WF 8:50-10:05 WGR 211           Cooke T
      MSB SOPHOMORES ONLY                        
BIOL-008  ECOLOGY & THE ENVIRONMENT           3 
   available seating
   01  LEC TR 10:15-11:30 REI 262            Sze P
 BIOL-009  BIOLOGY OF DRUGS & PEOPLE           3 
   available seating
   01  LEC TR 2:40-3:55 REI 262          Russell J
           BIOLOGY OF DRUGS & PEOPLE           4 
   02  LEC TR 2:40-3:55 REI 262          Russell J
      ONE HOUR COMMUNITY OUTREACH                
 BIOL-104  INTRODUCTORY BIOLOGY II             4 
   available seating
   01                        Hamilton M, Johnson E
       LEC TRF 8:50-10:05 REI 103
       LAB M 1:15-4:05 REI 459
       REC W 6:15-7:30PM REI 103
      Undergrads need instructor permission      
      Fees $145                                      
 BIOL-151  BIOLOGICAL CHEMISTRY                4 
   available seating
   01                         Rosenwald A, Tilli M
       LEC MWF 11:15-12:05 REI 103
       LAB T 1:15-4:05 REI 459
      Fees $145                                             
 BIOL-194  GATEWAY:BIOL OF GLOBAL HEALTH       4 
   available seating
   01                     Elmendorf H, Rosenwald A
       LEC TR 10:15-11:30 REI 261A
       REC M 1:15-4:05 REI 439
 BIOL-195  GATEWAY: NEUROBIOLOGY               4 
   available seating
   01  LEC MWF 9:15-10:05 REI 262       Donoghue M
       REC M 2:15-3:05 REI 284
      STUDENTS WHO HAVE TAKEN BIOL 370 ARE NOT   
      ELIGIBLE TO TAKE THIS COURSE.              
      RESTRICTED TO NEUROBIOLOGY MAJORS          




According to our sample output, the user first enters a course ID that he would like to add and I have to verify whether or not the string he entered was a valid course ID. This is what I did for that:

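#include <stdio.h>

/* Returns 1 if the 8-character course ID appears anywhere in the file, 0 otherwise. */
int FindID(FILE* fp, char* id)
{
	int current; /* int, not char, so the value can be compared against EOF reliably */
	int counter = 0;
	int found = 0;
	/* Test found first so no extra character is consumed after a match. */
	while(found == 0 && (current = fgetc(fp)) != EOF) //Scanning through the file
	{
		if(current == id[counter])
		{
			counter++;
		}
		
		else
		{
			/* Restart the match; the mismatched character may itself begin the ID. */
			counter = (current == id[0]) ? 1 : 0;
		}
		
		if(counter == 8 && id[counter] == '\0')
		{
			found = 1;
		}
	}
	
	return found;
}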



I scan through the file, character by character, and see if I can find a string of 8 characters that matches the string entered by the user (id). This portion works correctly, but it feels inefficient to me. Also, when I looked at our prelab for tomorrow (which is also about parsing), it specifies that we need to run through the file line by line. How would I approach this by going through the file line by line?
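
For the line-by-line question, one possible sketch (not a definitive solution) is to read each line with fgets and check whether the line begins with the requested ID. The names line_starts_with_id, find_course_line, and MAX_LINE are hypothetical, and the 256-character line limit is an assumption based on the sample file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 256 /* assumed upper bound on line length */

/* Returns 1 if `line` is a course header for `id`: optional leading
   blanks, then the ID, then whitespace or end of line. */
int line_starts_with_id(const char *line, const char *id)
{
    assert(line != NULL && id != NULL);
    size_t n = strlen(id);
    while (*line == ' ' || *line == '\t')
        line++;
    if (n == 0 || strncmp(line, id, n) != 0)
        return 0;
    /* Require a separator so "COSC-01" does not match "COSC-010". */
    return line[n] == ' ' || line[n] == '\t' ||
           line[n] == '\n' || line[n] == '\0';
}

/* Reads fp one line at a time. On a match, copies the header line
   into `out` and returns 1; returns 0 if the ID is never found. */
int find_course_line(FILE *fp, const char *id, char *out, size_t out_size)
{
    char line[MAX_LINE];
    assert(fp != NULL && out != NULL && out_size > 0);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (line_starts_with_id(line, id)) {
            strncpy(out, line, out_size - 1);
            out[out_size - 1] = '\0';
            return 1;
        }
    }
    return 0;
}
```

Matching only at the start of a line also means an ID that appears solely inside a note, such as COSC-501 in a CROSS LIST line, is not reported as a valid course, whereas a character-by-character search over the whole file would find it.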

The next portion of my code attempts to extract the appropriate course data from the text file. I asked my TA, and he said that the credit hour amount would always come after the course information, which in turn would always come after the course ID. So I figured that I could just scan through the rest of the line of the file and store this as the course information, until I hit a number. Once I hit a number, I can store this value as the credit hour amount. Here is the code for that portion:

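#include <ctype.h>
#include <stdio.h>

void getData(FILE* fp, char* description, int* credits, char* teacher)
{
	int current; /* int so that EOF can be detected reliably */
	int i = 0;
	
	while((current = fgetc(fp)) != EOF)
	{
		/* Only uppercase letters are kept; spaces between words are skipped. */
		if(isalpha(current) && isupper(current))
		{
			description[i] = current;
			i++;
		}
		
		/* isdigit (from <ctype.h>) tests for a decimal digit. */
		if(isdigit(current))
		{
			description[i] = '\0'; /* terminate the description string */
			*credits = current - '0';
			break;
		}
		
	}
.
.
.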
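
As an alternative to stopping at the first digit, here is a minimal sketch that works on a whole header line already read into memory: it takes the first token as the ID, the last token as the credit hours, and everything in between as the description. The function name parse_header and its parameters are hypothetical, and the sketch assumes the caller passes only a line already known to be a course header (other lines in the file can also end in a number).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Splits a header line such as
   " COSC-010  INTRO TO COMPUTER SCIENCE           3 "
   into ID, description, and credits. Returns 1 on success, 0 otherwise. */
int parse_header(const char *line, char *id, size_t id_size,
                 char *desc, size_t desc_size, int *credits)
{
    assert(line && id && desc && credits && id_size > 0 && desc_size > 0);

    /* Skip leading blanks, then copy the first token as the ID. */
    while (isspace((unsigned char)*line)) line++;
    size_t n = strcspn(line, " \t\r\n");
    if (n == 0 || n >= id_size) return 0;
    memcpy(id, line, n);
    id[n] = '\0';
    const char *start = line + n;

    /* Walk back from the end of the line over trailing whitespace. */
    const char *end = line + strlen(line);
    while (end > start && isspace((unsigned char)end[-1])) end--;

    /* The last token must consist only of digits: the credit hours. */
    const char *num = end;
    while (num > start && isdigit((unsigned char)num[-1])) num--;
    if (num == end || num == start || !isspace((unsigned char)num[-1]))
        return 0;
    *credits = 0;
    for (const char *p = num; p < end; p++)
        *credits = *credits * 10 + (*p - '0');

    /* The description is what lies in between, trimmed at both ends. */
    while (start < num && isspace((unsigned char)*start)) start++;
    const char *dend = num;
    while (dend > start && isspace((unsigned char)dend[-1])) dend--;
    n = (size_t)(dend - start);
    if (n == 0 || n >= desc_size) return 0;
    memcpy(desc, start, n);
    desc[n] = '\0';
    return 1;
}
```

Because only the final token is treated as the credit amount, a course title that happens to contain a digit would not be cut short, and the spaces between words in the description are preserved.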



Last, I need to extract the professor's name. I don't think there is a consistent way to determine the professor based on its location relative to the course ID. However, I noticed that all of the professors' names have the first letter capitalized and the rest lowercase, followed by a last initial that is also capitalized. Also, if there is more than one professor, the last initial is followed by a comma. I started to write this using a similar method as before, but I started to get a little confused :( And I keep having this feeling that there is a much better way to do all of this.
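
One heuristic that fits the sample file (an assumption drawn from the sample, not something the assignment states) is that the instructor field sits at the end of the section line and is separated from whatever precedes it by at least two spaces. A minimal sketch under that assumption, with the hypothetical name extract_instructor:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copies the instructor field of a section line (one that begins with
   a section number such as "01") into `out`.
   Returns 1 if a name was found, 0 otherwise (e.g. a cancelled section). */
int extract_instructor(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    /* Only lines that start with a section number are considered. */
    const char *p = line;
    while (isspace((unsigned char)*p)) p++;
    if (!isdigit((unsigned char)*p)) return 0;

    /* Trim trailing whitespace. */
    size_t end = strlen(line);
    while (end > 0 && isspace((unsigned char)line[end - 1])) end--;

    /* Walk back to the last gap of two or more spaces. */
    size_t start = end;
    while (start > 0 &&
           !(line[start - 1] == ' ' && start >= 2 && line[start - 2] == ' '))
        start--;

    /* No gap found, or the field does not begin with a letter. */
    if (start == 0 || start == end || !isalpha((unsigned char)line[start]))
        return 0;

    size_t n = end - start;
    if (n >= out_size) return 0;
    memcpy(out, line + start, n);
    out[n] = '\0';
    return 1;
}
```

This returns the whole field, so "Hamilton M, Johnson E" comes back as one string that could then be split on commas. It would misfire on a section line that lists a meeting time but no instructor, so it is a starting point rather than a complete parser.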

So I guess what I'm asking in short is..how do people typically approach parsing? I'm not asking for a homework solution..just maybe some example code or resources.

Thanks!


Replies To: Parsing Data From An Input File: Correct Approach?

#2 jsewell94

Re: Parsing Data From An Input File: Correct Approach?

Posted 18 November 2013 - 12:12 PM

Just kidding, I figured this out. I just drank some coffee and solved the issue using fscanf :P This can be deleted.

Thanks anyway!