A few days ago we were given an assignment that involves parsing. Basically, we are given a document containing course information for different classes, and we have to find a way to extract the course ID, course description, credit hours, and professor, and store them in a linked list. At this point, I'm very comfortable with the linked list portion, as we've been doing ADTs for a while now. But our professor and TAs didn't give us much guidance on how to approach parsing. Could you give me some input on whether I'm even remotely close to doing this correctly? I feel like I'm not.
Here is the input text document that we are supposed to use:
SEE INDIVIDUAL COURSE FOR DEPARTMENTAL
WEB SITE, FACULTY PROFILES, COURSE
DESCRIPTIONS OR SEE AVAILABLE COURSE SYLLABI.
************************************************
COSC-010 INTRO TO COMPUTER SCIENCE 3
available seating
01 LEC T 1:15-2:30 REI 112 Miles A
LAB M 4:15-5:30 STM 343
CROSS LIST: COSC-501-01
COSC-011 INTRO TO INFORMATION PRIVACY 3
available seating
01 LEC M 4:10-6:30 WGR 203 Moran E
PREFERENCE GIVEN TO BSFS STUDENTS
COSC-012 INTRO TO MEDIA COMPUTING 3
available seating
01 LEC MW 11:40-12:55 STM 343 Kalyanasundaram
COSC-014 FUNDAMENTALS OF TECHNOLOGY 3
available seating
01 *** CANCELLED ***
COSC-071 COMPUTER SCIENCE I 3
available seating
01 LEC TR 1:15-2:30 WAL 492 Singh L
CROSS LIST: COSC-502-01
COSC-072 COMPUTER SCIENCE II 3
available seating
01 LEC TR 11:40-12:55 ICC 107 Maloof M
Pre-requisite: COSC-071
CROSS LIST: COSC-503-
COSC-127 MATH METHODS FOR COMP SCI 3
available seating
01 LEC MW 1:15-2:30 STM 343 Bansal A
CROSS LIST: COSC-506-01
COSC-288 INTRO TO MACHINE LEARNING 3
available seating
01 LEC TR 4:15-5:30 REI 264 Maloof M
Pre-requisite: COSC-173
ACCT-001 PRINCIPLES OF ACCOUNTING 3
available seating
01 LEC MW 1:15-2:30 WAL 398 Baisey J
NON-MSB STUDENTS ONLY
ACCT-102 ACCOUNTING II 3
available seating
01 LEC MW 10:15-11:30 WAL 395 Galasso M
MSB STUDENTS ONLY
SPRING ONLY COURSE
PREREQ: ACCT-101
STUDENT DAY LAB TIMES ROOM
A-K T 6:40 P- 7:30 P WAL 498
L-Z T 7:40 P- 8:30 P WAL 498
ACCT-181 BUSINESS LAW I 3
available seating
01 LEC WF 8:50-10:05 WGR 211 Cooke T
MSB SOPHOMORES ONLY
BIOL-008 ECOLOGY & THE ENVIRONMENT 3
available seating
01 LEC TR 10:15-11:30 REI 262 Sze P
BIOL-009 BIOLOGY OF DRUGS & PEOPLE 3
available seating
01 LEC TR 2:40-3:55 REI 262 Russell J
BIOLOGY OF DRUGS & PEOPLE 4
02 LEC TR 2:40-3:55 REI 262 Russell J
ONE HOUR COMMUNITY OUTREACH
BIOL-104 INTRODUCTORY BIOLOGY II 4
available seating
01 Hamilton M, Johnson E
LEC TRF 8:50-10:05 REI 103
LAB M 1:15-4:05 REI 459
REC W 6:15-7:30PM REI 103
Undergrads need instructor permission
Fees $145
BIOL-151 BIOLOGICAL CHEMISTRY 4
available seating
01 Rosenwald A, Tilli M
LEC MWF 11:15-12:05 REI 103
LAB T 1:15-4:05 REI 459
Fees $145
BIOL-194 GATEWAY:BIOL OF GLOBAL HEALTH 4
available seating
01 Elmendorf H, Rosenwald A
LEC TR 10:15-11:30 REI 261A
REC M 1:15-4:05 REI 439
BIOL-195 GATEWAY: NEUROBIOLOGY 4
available seating
01 LEC MWF 9:15-10:05 REI 262 Donoghue M
REC M 2:15-3:05 REI 284
STUDENTS WHO HAVE TAKEN BIOL 370 ARE NOT
ELIGIBLE TO TAKE THIS COURSE.
RESTRICTED TO NEUROBIOLOGY MAJORS
According to our sample output, the user first enters a course ID that he would like to add and I have to verify whether or not the string he entered was a valid course ID. This is what I did for that:
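#include <stdio.h>

/* Returns 1 if the ID appears anywhere in the rest of the file, 0 otherwise. */
int FindID(FILE* fp, char* id)
{
    int current;      /* int so that EOF can be told apart from valid characters */
    int counter = 0;  /* number of characters of id matched so far */
    int found = 0;
    while((current = fgetc(fp)) != EOF && found == 0) //Scanning through the file
    {
        if(current == id[counter])
        {
            counter++;
        }
        else
        {
            /* Restart the match; this character may itself begin the ID */
            counter = (current == id[0]) ? 1 : 0;
        }
        if(counter == 8 && id[counter] == '\0') /* course IDs are 8 characters long */
        {
            found = 1;
        }
    }
    return found;
}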
I scan through the file character by character and check whether I can find a string of 8 characters that matches the string entered by the user (id). This portion works correctly, but it feels inefficient to me. Also, when I looked at our prelab for tomorrow (which is also on parsing), it specifies that we need to run through the file line by line. How would I approach this by going through the file line by line?
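For comparison, here is a minimal sketch of what a line-by-line version might look like. It is not the assignment's required approach, just one common pattern: read each line with fgets and compare its beginning against the ID with strncmp. The function name find_course_line, the MAX_LINE limit, and the assumption that every course header starts in the first column with the ID followed by a space are all mine, based only on the sample file above.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 256  /* assumed upper bound on the length of one line */

/* Hypothetical helper: reads the file line by line and returns 1 if a line
   begins with the given course ID followed by a space. On success, the
   matching line is left in `line` so the caller can parse it further.
   Returns 0 if no such line is found before end of file. */
int find_course_line(FILE *fp, const char *id, char *line, size_t size)
{
    size_t len = strlen(id);

    if (len == 0 || size == 0) {
        return 0;
    }
    while (fgets(line, (int)size, fp) != NULL) {
        /* A header looks like "COSC-010 INTRO TO COMPUTER SCIENCE 3". */
        if (strncmp(line, id, len) == 0 && line[len] == ' ') {
            return 1;
        }
    }
    return 0;
}
```

Because the comparison is anchored at the start of the line, an ID that only appears inside a note such as "CROSS LIST: COSC-501-01" is not reported as a course, whereas a search that accepts a match anywhere in the file would accept it.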
The next portion of my code attempts to extract the appropriate course data from the text file. I asked my TA, and he said that the credit hour amount would always come after the course information, which in turn would always come after the course ID. So I figured that I could just scan through the rest of the line of the file and store this as the course information, until I hit a number. Once I hit a number, I can store this value as the credit hour amount. Here is the code for that portion:
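#include <ctype.h>
#include <stdio.h>

void getData(FILE* fp, char* description, int* credits, char* teacher)
{
    int current; /* int so that EOF can be told apart from valid characters */
    int i = 0;
    while((current = fgetc(fp)) != EOF)
    {
        /* Only uppercase letters are stored, so spaces and punctuation are dropped */
        if(isalpha(current) && isupper(current))
        {
            description[i] = current;
            i++;
        }
        /* The first digit encountered is taken as the credit hours */
        if(isdigit(current))
        {
            *credits = current - '0';
            break;
        }
    }
    description[i] = '\0'; /* terminate the description string */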
.
.
.
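The rule described above (ID first, then the description, then the credit hours at the end of the line) can also be applied to a whole header line at once. The following is only a sketch under my own assumptions: the line has already been read into a buffer, the credit hours are a single digit and the last non-blank character on the line, and parse_header is a made-up name. Taking the last character as the credits, rather than the first digit found, keeps spaces and punctuation in titles such as "GATEWAY:BIOL OF GLOBAL HEALTH".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits "ID DESCRIPTION CREDITS" into its parts.
   Copies the description (without the ID or the credits) into `description`
   and stores the single-digit credit value in `*credits`.
   Returns 1 on success, 0 if the line does not have that shape. */
int parse_header(const char *line, char *description, size_t size, int *credits)
{
    const char *start = strchr(line, ' ');   /* end of the course ID */
    const char *end = line + strlen(line);   /* one past the last character */
    size_t n;

    if (start == NULL || size == 0) {
        return 0;
    }
    while (*start == ' ') {
        start++;                             /* skip blanks after the ID */
    }
    while (end > start && isspace((unsigned char)end[-1])) {
        end--;                               /* drop trailing newline/blanks */
    }
    if (end <= start || !isdigit((unsigned char)end[-1])) {
        return 0;                            /* no credit digit at the end */
    }
    *credits = end[-1] - '0';
    end--;                                   /* step back over the digit */
    while (end > start && isspace((unsigned char)end[-1])) {
        end--;                               /* drop blanks before the digit */
    }
    n = (size_t)(end - start);
    if (n >= size) {
        n = size - 1;                        /* truncate to fit the buffer */
    }
    memcpy(description, start, n);
    description[n] = '\0';
    return 1;
}
```

With the sample data, the line "COSC-071 COMPUTER SCIENCE I 3" would give the description "COMPUTER SCIENCE I" and 3 credits, while "available seating" would be rejected because it does not end in a digit.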
Last, I need to extract the professor's name. I don't think there is a consistent way to determine the professor based on its location relative to the course ID. However, I noticed that all of the professors' names have the first letter capitalized and the rest lowercase, followed by a capitalized initial. Also, if there is more than one professor, the initial is followed by a comma. I started to write this using a similar method as before, but I started to get a little confused.
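To make the capitalization idea concrete, here is a rough sketch of how it might be coded. It is a heuristic built only from the sample file, not a general solution: it assumes the professor appears on the section line (the one starting with a two-digit section number such as "01"), and that the first mixed-case word on that line starts the name. The names starts_mixed_case_word and extract_professor are hypothetical, and names with apostrophes, hyphens, or internal capitals would not be recognized.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if p points at a word made of one uppercase letter followed by
   one or more lowercase letters, ending at a blank, a comma, or the end. */
static int starts_mixed_case_word(const char *p)
{
    if (!isupper((unsigned char)p[0]) || !islower((unsigned char)p[1])) {
        return 0;
    }
    p += 2;
    while (islower((unsigned char)*p)) {
        p++;
    }
    return *p == '\0' || *p == ',' || isspace((unsigned char)*p);
}

/* Hypothetical helper: on a section line, copies everything from the first
   mixed-case word to the end of the line into `teacher`.
   Returns 1 if a name was found, 0 otherwise. */
int extract_professor(const char *line, char *teacher, size_t size)
{
    const char *p = line;
    const char *end;
    size_t n;

    /* Only lines that begin with a two-digit section number are considered. */
    if (size == 0 || !isdigit((unsigned char)p[0]) || !isdigit((unsigned char)p[1])) {
        return 0;
    }
    while (*p != '\0') {
        int at_word_start = (p == line) || isspace((unsigned char)p[-1]);
        if (at_word_start && starts_mixed_case_word(p)) {
            break;
        }
        p++;
    }
    if (*p == '\0') {
        return 0;                            /* e.g. "01 *** CANCELLED ***" */
    }
    end = p + strlen(p);
    while (end > p && isspace((unsigned char)end[-1])) {
        end--;                               /* drop trailing newline/blanks */
    }
    n = (size_t)(end - p);
    if (n >= size) {
        n = size - 1;                        /* truncate to fit the buffer */
    }
    memcpy(teacher, p, n);
    teacher[n] = '\0';
    return 1;
}
```

On the sample lines this would yield "Miles A", "Kalyanasundaram", and "Hamilton M, Johnson E"; splitting the last one into two separate names at the comma is left out of the sketch.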
So I guess what I'm asking, in short, is: how do people typically approach parsing? I'm not asking for a homework solution, just maybe some example code or resources.
Thanks!
