2 Replies - 713 Views - Last Post: 05 March 2013 - 10:57 PM

#1 Cynosure

Parsing - Regex & Splitting Help;

Posted 05 March 2013 - 10:01 PM

Hey there,

I've parsed a file as a list of strings line-by-line. Below is what one of the lines looks like:

00036762 04 n 03 feat 0 effort 2 exploit 0 006 @ 00035189 n 0000 ~ 00043116 n 0000 ~ 00043902 n 0000 ~ 00045646 n 0000 ~ 00046344 n 0000 ~ 00047018 n 0000 | a notable achievement; "he performed a great feat"; "the book was her finest effort" 


I built a class that will store the values. Each of the storage properties is below:
  • ID - a single string holding the first number in the line above (it's always 8 digits and is the first 8 characters of every line)
  • Words - a list of the words contained in the line (there are 3 words here: "feat", "effort" and "exploit"; some words use "_" in place of a space, and others contain hyphens, such as "give-and-take")
  • Pointers - a list of the pointers contained in the line (the pointers are all of the other 8-digit numbers)

..the rest of the stuff in the string is essentially considered garbage to me at this point (as I don't need it for the function of my program).

I'm having the hardest time breaking this string up how I need to. I essentially have the rest of the program (the functions that will use this information) all finished, but I cannot get this. The furthest I've gotten was obtaining the first 8 digits.

I'd like it to grab the first 8 digits as the ID (the ID is in the same place on every line), skip all of the other crap until it reaches a word of two or more characters (which goes into my list of words), keep skipping until it finds either another word or an 8-digit string, and then take every 8-digit string that follows (those go into my list of pointers). Everything after the last 8-digit number can be trashed.

It seems like it shouldn't be that difficult.. I just really suck at understanding all of the actual Regex expressions.

Here are a couple more lines for reference:

00039740 04 n 01 eye_contact 0 001 @ 00039297 n 0000 | contact that occurs when two people look directly at each other; "a teacher should make eye contact with the students"  
00039916 04 n 01 fetch 0 001 @ 00037396 n 0000 | the action of fetching  
00039990 04 n 01 placement 1 001 @ 00039297 n 0000 | contact established between applicants and prospective employees; "the agency provided placement services"  
00040152 04 n 03 interchange 1 reciprocation 2 give-and-take 0 006 @ 00039021 n 0000 + 02372326 v 0201 + 02257370 v 0103 ~ 00040420 n 0000 ~ 00040545 n 0000 ~ 00040804 n 0000 | mutual interaction; the activity of reciprocating or exchanging (especially information)  


This post has been edited by Cynosure: 05 March 2013 - 10:02 PM



Replies To: Parsing - Regex & Splitting Help;

#2 Momerath

Re: Parsing - Regex & Splitting Help;

Posted 05 March 2013 - 10:47 PM

You'll want to split it into two regular expressions; it will save your sanity later.

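// Requires: using System.Text.RegularExpressions;
String input = "00036762 04 n 03 feat 0 effort 2 exploit 0 006 @ 00035189 n 0000 ~ 00043116 n 0000 ~ 00043902 n 0000 ~ 00045646 n 0000 ~ 00046344 n 0000 ~ 00047018 n 0000 | a notable achievement; \"he performed a great feat\"; \"the book was her finest effort\"";

String frontPart = input.Split('|')[0];
// Verbatim strings (@"...") keep \d and \- from being read as C# escape sequences.
MatchCollection numbers = Regex.Matches(frontPart, @"(?<id>\d{8})");
// {2,} means "two or more"; a bare {2} would match exactly two characters at a time.
MatchCollection words = Regex.Matches(frontPart, @"(?<word>[A-Za-z_\-]{2,})");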



The first MatchCollection holds all of your 8-digit numbers: the first match is the ID, and the rest are the pointers. The second holds all of the words that are 2 characters or more.

The reason I do the split first is that it's easier to not deal with all the text at the end.
Edit: Fixed bug, probably added new one.
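
To show how the two match collections could be turned into the ID, word list and pointer list from the question, here is a minimal sketch. The names `SynsetLine`, `SynsetLineParser` and `Parse` are made up for illustration (the original class is not shown in the thread), and the patterns are slightly stricter variants of the ones above: they add `\b` word boundaries so that letters inside a hex field are not picked up as words. Replacing "_" with a space is also an assumption based on the description in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical container; the original poster's class is not shown in the thread.
public sealed class SynsetLine
{
    public string Id { get; set; }
    public List<string> Words { get; set; }
    public List<string> Pointers { get; set; }
}

public static class SynsetLineParser
{
    // Exactly eight digits that stand alone as a token.
    private static readonly Regex EightDigits = new Regex(@"\b\d{8}\b");

    // A letter followed by one or more letters, underscores or hyphens,
    // so single-letter tokens such as "n" and "v" are skipped.
    private static readonly Regex Word = new Regex(@"\b[A-Za-z][A-Za-z_\-]+\b");

    public static SynsetLine Parse(string line)
    {
        // Drop the gloss: everything after the first '|' is ignored.
        string frontPart = line.Split('|')[0];

        List<string> numbers = EightDigits.Matches(frontPart)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

        if (numbers.Count == 0)
            throw new FormatException("No 8-digit ID found in line.");

        return new SynsetLine
        {
            // The first 8-digit number is the ID; the rest are pointers.
            Id = numbers[0],
            Pointers = numbers.Skip(1).ToList(),
            // Assumption: "_" stands for a space, so it is converted here.
            Words = Word.Matches(frontPart)
                .Cast<Match>()
                .Select(m => m.Value.Replace('_', ' '))
                .ToList()
        };
    }
}
```

Calling `SynsetLineParser.Parse(line)` on the `fetch` line from the question should give the ID 00039916, the single word "fetch" and the single pointer 00037396.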

This post has been edited by Momerath: 05 March 2013 - 10:53 PM


#3 Cynosure

Re: Parsing - Regex & Splitting Help;

Posted 05 March 2013 - 10:57 PM

Thanks so much! <3
