I need to describe the following process in detail for a specification, but I'm having trouble deciding how to categorize it: is it deterministic or fuzzy matching?
I have two datasets that I am trying to match to one another. One dataset is a subset of the other, but it contains typographical errors of varying types and magnitudes.
I am applying the following logic to the data linkage process:
Apply regular expressions iteratively, with increasing flexibility at each iteration, until a match is found (for example, one iteration treats vowels as optional). If two matches are found for one record within a single iteration, categorize the result as a tie and leave the record unmatched. Use Python's fuzzy regex for cases that rule-based regular expressions can't handle, namely character insertions, deletions, and substitutions within an edit distance of one.
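A minimal sketch of the linkage logic described above, in Python. Assumptions: the records with typos are the queries, patterns are built from the clean reference records, matching is case-insensitive, and only two rule-based tiers (exact, then vowels optional) are shown. The function names are hypothetical. The third-party "regex" package supports fuzzy constraints such as (?:pattern){e<=1}, which is presumably the "fuzzy regex" meant here; to keep the sketch standard-library only, a hand-written edit-distance-one check stands in for it.

```python
import re
from typing import List, Optional, Tuple

# Assumption: vowels are a, e, i, o, u (y is not treated as a vowel).
VOWELS = set("aeiou")


def exact_pattern(name: str) -> str:
    """Tier 1: the literal string, escaped so punctuation is not regex syntax."""
    return re.escape(name)


def vowels_optional_pattern(name: str) -> str:
    """Tier 2: each vowel may be present or missing, e.g. 'dan' -> 'd(?:a)?n'."""
    parts = []
    for ch in name:
        if ch in VOWELS:
            parts.append("(?:" + re.escape(ch) + ")?")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def within_one_edit(a: str, b: str) -> bool:
    """Stand-in for the fuzzy tier: True if the edit distance is at most one."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Equal lengths: at most one substituted character.
        return sum(x != y for x, y in zip(a, b)) <= 1
    if len(a) > len(b):
        a, b = b, a  # make a the shorter string
    # Skip the common prefix; the longer string must then have exactly one extra char.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    return a[i:] == b[i + 1:]


def link(query: str, references: List[str]) -> Tuple[str, Optional[str]]:
    """Return ('match', ref), ('tie', None) or ('unmatched', None)."""
    q = query.lower()
    # Rule-based tiers, from strictest to most flexible.
    for build in (exact_pattern, vowels_optional_pattern):
        hits = [r for r in references if re.fullmatch(build(r.lower()), q)]
        if len(hits) == 1:
            return ("match", hits[0])
        if len(hits) > 1:
            return ("tie", None)  # ambiguous within this tier: leave unmatched
    # Fuzzy tier: one insertion, deletion, or substitution.
    hits = [r for r in references if within_one_edit(q, r.lower())]
    if len(hits) == 1:
        return ("match", hits[0])
    if len(hits) > 1:
        return ("tie", None)
    return ("unmatched", None)
```

For example, with the references ["Johnson", "Smith", "Smyth", "Brown"], the query "Jhnson" matches Johnson in the vowels-optional tier, "Browm" matches Brown only in the fuzzy tier, and "Smoth" is a tie because it is one substitution away from both Smith and Smyth. Each tier is a fixed rule, so the sketch always returns the same result for the same inputs.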
Would this be considered a purely deterministic process?
Your help would be appreciated. I do not have a CS background, so I apologize if my question is fairly rudimentary.
0 Replies - 1357 Views - Last Post: 05 March 2013 - 03:32 PM