I've parsed a file as a list of strings line-by-line. Below is what one of the lines looks like:
00036762 04 n 03 feat 0 effort 2 exploit 0 006 @ 00035189 n 0000 ~ 00043116 n 0000 ~ 00043902 n 0000 ~ 00045646 n 0000 ~ 00046344 n 0000 ~ 00047018 n 0000 | a notable achievement; "he performed a great feat"; "the book was her finest effort"
I built a class that will store the values. Each of the storage properties is below:
- ID - a single string, which is the first number in the line above (it's always 8 digits and is always the first 8 characters of every line)
- Words - a list of the words contained in the line (there are 3 words here: "feat", "effort" and "exploit"; some words use "_" in place of a space, and others have hyphens, such as "give-and-take")
- Pointers - a list of pointers contained in the line (the pointers are all of the other 8-digit numbers)
The rest of the string is essentially garbage to me at this point, as I don't need it for my program.
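To make the three properties concrete, here is a minimal Python sketch of that kind of storage class. It is for illustration only: the class name `Synset` and the field names are placeholders, not my actual code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Synset:
    """Hypothetical container for the three values kept from each line."""
    id: str                                             # the leading 8-digit string
    words: List[str] = field(default_factory=list)      # e.g. "feat", "effort"
    pointers: List[str] = field(default_factory=list)   # the other 8-digit strings


# Filling it by hand with values from the sample line above.
entry = Synset(id="00036762")
entry.words.extend(["feat", "effort", "exploit"])
entry.pointers.append("00035189")
```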
I'm having the hardest time breaking this string up how I need to. I essentially have the rest of the program (the functions that will use this information) all finished, but I cannot get this. The furthest I've gotten was obtaining the first 8 digits.
I'd like it to work like this: grab the first 8 digits as the ID, since the ID is in the same place on every line; skip everything else until it reaches a word of two or more characters and add that to my list of words; keep skipping the junk while looking for either another word or an 8-digit string; then add every 8-digit string that follows to my list of pointers. Everything after the last 8-digit number can be discarded.
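The steps above can be sketched in Python roughly as follows. This is a sketch under assumptions, not a definitive solution. The function name `parse_line` is hypothetical. It assumes the fourth field (`03` in the first sample) is the number of words, written in hexadecimal, and that each word is followed by exactly one extra field; both assumptions match the sample lines shown here. Note that it swaps the "word of two characters or more" rule for a positional one, because tokens such as `04` and `03` are also two characters long.

```python
import re


def parse_line(line):
    """Return (synset_id, words, pointers) for one data line."""
    # Everything after the first '|' is the description text; drop it so
    # that any digits inside it cannot be mistaken for pointers.
    head = line.split("|", 1)[0]
    fields = head.split()

    synset_id = fields[0]

    # Assumption: fields[3] is the word count in hexadecimal.
    word_count = int(fields[3], 16)

    # Words sit at every other position starting from fields[4]; the field
    # after each word is skipped. Replacing "_" with a space is optional.
    words = [fields[4 + 2 * i].replace("_", " ") for i in range(word_count)]

    # In whatever follows the words, keep only the 8-digit tokens.
    rest = fields[4 + 2 * word_count:]
    pointers = [tok for tok in rest if re.fullmatch(r"\d{8}", tok)]

    return synset_id, words, pointers


sample = ('00039740 04 n 01 eye_contact 0 001 @ 00039297 n 0000 | contact that '
          'occurs when two people look directly at each other')
synset_id, words, pointers = parse_line(sample)
```

Only the final pointer filter uses a regular expression; `\d{8}` with `re.fullmatch` accepts a token only if it is exactly eight digits, so shorter fields such as `0000` or `006` are ignored.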
It seems like it shouldn't be that difficult. I just really suck at understanding regular expressions.
Here are a couple more lines for reference:
00039740 04 n 01 eye_contact 0 001 @ 00039297 n 0000 | contact that occurs when two people look directly at each other; "a teacher should make eye contact with the students"
00039916 04 n 01 fetch 0 001 @ 00037396 n 0000 | the action of fetching
00039990 04 n 01 placement 1 001 @ 00039297 n 0000 | contact established between applicants and prospective employees; "the agency provided placement services"
00040152 04 n 03 interchange 1 reciprocation 2 give-and-take 0 006 @ 00039021 n 0000 + 02372326 v 0201 + 02257370 v 0103 ~ 00040420 n 0000 ~ 00040545 n 0000 ~ 00040804 n 0000 | mutual interaction; the activity of reciprocating or exchanging (especially information)
This post has been edited by Cynosure: 05 March 2013 - 10:02 PM
