7 Replies - 2033 Views - Last Post: 12 July 2009 - 06:36 PM

#1 dz0004455

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 9
  • Joined: 03-April 09

[python] Similar Item Pattern Matching

Posted 09 July 2009 - 08:30 PM

I am trying to write a Python script that will let me find similar items in a file.

Say the file looks like this:
FAH 98-1
FAH 98.1

It would match those two items. The real file will have many different combinations, and may need to match more than two occurrences of similar items.

I think I would like to use regular expressions to solve this problem.

Thanks!

Replies To: [python] Similar Item Pattern Matching

#2 dz0004455

Re: [python] Similar Item Pattern Matching

Posted 09 July 2009 - 09:00 PM

This is what we are thinking:

Build a regular expression for each line, like this.

If the line is "ASD",
a regex would be built to find "A - anything - S - anything - D - anything",
then be run against all the other lines in the file.

How do I build this regex?
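
One way to build such a pattern, sketched here under assumptions: escape each character of the line with re.escape and join the pieces with `.*`, which stands for "anything". The function name build_loose_regex is hypothetical and does not come from the thread.

```python
import re

def build_loose_regex(line):
    """Build 'A.*S.*D.*' from 'ASD' (hypothetical helper for illustration)."""
    # re.escape keeps characters such as '.' or '*' from acting as regex operators.
    parts = [re.escape(char) for char in line]
    # '.*' stands for "anything" between (and after) the characters.
    return re.compile('.*'.join(parts) + '.*')

regex = build_loose_regex("ASD")
print(bool(regex.match("A-S.D")))   # True
print(bool(regex.match("ADS")))     # False
```

Note that `.*` is very permissive: the pattern for "ASD" would also accept any longer string that happens to contain A, S, and D in that order after a leading A, so it probably needs tightening for real data.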

#3 dz0004455

Re: [python] Similar Item Pattern Matching

Posted 09 July 2009 - 09:06 PM

That's not completely right, because if I have AS.D and AS-D that would not work. Do you think I should replace characters like '-' and '.', or do something else?
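
One possible approach, sketched under assumptions: rather than teaching the regex about every separator, strip the separators first and compare what is left. The separator set ('-', '.', and whitespace) and the names normalize and group_similar are assumptions made for illustration.

```python
import re
from collections import defaultdict

def normalize(item):
    """Drop '-', '.', and whitespace so 'AS.D' and 'AS-D' both become 'ASD'."""
    return re.sub(r'[-.\s]', '', item)

def group_similar(items):
    """Map each normalized key to the original items that share it."""
    groups = defaultdict(list)
    for item in items:
        groups[normalize(item)].append(item)
    # Keep only keys shared by at least two items.
    return {key: members for key, members in groups.items() if len(members) > 1}

print(group_similar(["AS.D", "AS-D", "FAH 98-1", "FAH 98.1", "XYZ"]))
```

With this input the result has two groups, keyed 'ASD' and 'FAH981'; 'XYZ' is dropped because nothing else shares its key. Each item is looked at once, so this avoids running one regex per line against every other line.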

#4 dz0004455

Re: [python] Similar Item Pattern Matching

Posted 10 July 2009 - 05:45 PM

	
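for item in lines:
	regex = ''
	for char in item:
		if isPuncuation(char):
			# Skip separators; the optional class below stands in for them
			pass
		else:
			regex += char
			# Allow one optional punctuation or whitespace character
			regex += r'[!-/\s]?'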



#5 dz0004455

Re: [python] Similar Item Pattern Matching

Posted 10 July 2009 - 05:56 PM

import re

def main():
	# Read the whole file; open() replaces the old file() builtin
	inFile = open("input", "r")
	textPre = inFile.read()
	inFile.close()

	#Split String Up
	lines = []
	for item in textPre.split('\n'):
		lines.append(list(item))

	resultSet = []
	#Compare For Sims
	for item in lines:
		regexExpr = ''
		for char in item:
			if isPuncuation(char):
				pass
			else:
				# Escape the character so regex metacharacters match literally
				regexExpr += re.escape(char)
				# Allow one optional punctuation or whitespace character
				regexExpr += r'[!-/\s]?'
		if not regexExpr:
			# A blank line builds an empty regex that would match everywhere
			continue
		regex = re.compile(regexExpr)
		resultSet.append(regex.findall(textPre))
	print(resultSet)


def isPuncuation(char):
	# '.' and '-' are the separators to ignore
	return char in ('.', '-')

if __name__ == "__main__":
	main()



This code almost works; it only finds something on the second iteration of the loop.

#6 dz0004455

Re: [python] Similar Item Pattern Matching

Posted 10 July 2009 - 07:19 PM

import re

def main():
	# Read the whole file; open() replaces the old file() builtin
	inFile = open("input", "r")
	textPre = inFile.read()
	inFile.close()

	#Split String Up
	lines = textPre.split('\n')

	#Compare For Sims
	for lCount in range(len(lines)):
		regexExpr = ''
		for char in lines[lCount]:
			if isPuncuation(char):
				pass
			else:
				# Escape the character so regex metacharacters match literally
				regexExpr += re.escape(char)
				# Allow one optional punctuation or whitespace character
				regexExpr += r'[!-/\s]?'
		#print regexExpr

		if not regexExpr:
			# A blank line builds an empty regex that would match every line
			continue
		regex = re.compile(regexExpr)

		for count in range(len(lines)):
			# Report matches against every other line, not the line itself
			if lCount != count and regex.match(lines[count]):
				print('match for line ' + str(lCount) + ' with line ' + str(count))


def isPuncuation(char):
	# '.', '-', and ' ' are the separators to ignore
	return char in ('.', '-', ' ')

if __name__ == "__main__":
	main()




It works perfectly now! Any further suggestions for the code?
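
One possible suggestion, offered as a sketch rather than a fix to the posted script: a match call only anchors at the start of the string, so a pattern built from "FAH 98-1" would also accept "FAH 98-10". Appending `\Z` requires the whole line to match. The names SEPARATORS, OPTIONAL_SEP, and build_line_regex are hypothetical.

```python
import re

SEPARATORS = '.- '            # the same characters isPuncuation() skips
OPTIONAL_SEP = r'[!-/\s]?'    # the same optional class used in the thread

def build_line_regex(line, whole_line=True):
    """Build the per-line pattern, optionally anchored at the end of the string."""
    parts = []
    for char in line:
        if char in SEPARATORS:
            continue
        parts.append(re.escape(char) + OPTIONAL_SEP)
    pattern = ''.join(parts)
    if whole_line:
        # \Z only matches at the very end of the string.
        pattern += r'\Z'
    return re.compile(pattern)

prefix_only = build_line_regex("FAH 98-1", whole_line=False)
anchored = build_line_regex("FAH 98-1")
print(bool(prefix_only.match("FAH 98-10")))  # True
print(bool(anchored.match("FAH 98-10")))     # False
print(bool(anchored.match("FAH 98.1")))      # True
```

Whether a prefix match should count as "similar" depends on the data, so the anchoring is left as a switch here.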

#7 code_m

  • D.I.C Head

Reputation: 21
  • View blog
  • Posts: 194
  • Joined: 21-April 09

Re: [python] Similar Item Pattern Matching

Posted 12 July 2009 - 06:30 PM

Not sure here, but you might want to try using a set.

Sets and lists are very much alike, but a set has no definite order (in theory) and can only contain one occurrence of an item. So say your file was:

HJK 92
HSF 92

and you split it on whitespace:
>>> text = open("input").read()
>>> items = set()
>>> for item in text.split():
...    items.add(item)

The resulting set would be {'HJK', '92', 'HSF'}, in no particular order.

#8 dz0004455

Re: [python] Similar Item Pattern Matching

Posted 12 July 2009 - 06:36 PM

I can do a really simple version of it, as I showed you, but now I am also working on finding abbreviations, and that is hard. I am trying to build a regex that will find both abbreviations and similar items. I am also trying to fix a few bugs that I am getting.
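
For fuzzier cases, a fixed regex may be awkward. One alternative, which is not what this thread uses, is the standard library's difflib module, which scores how similar two strings are. This is a minimal sketch: the 0.8 threshold and the function names are arbitrary assumptions that would need tuning on real data.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a ratio between 0.0 and 1.0; higher means more alike."""
    return SequenceMatcher(None, a, b).ratio()

def similar_pairs(items, threshold=0.8):
    """Yield (first, second, score) for every pair scoring at or above threshold."""
    for i, first in enumerate(items):
        for second in items[i + 1:]:
            score = similarity(first, second)
            if score >= threshold:
                yield first, second, score

for first, second, score in similar_pairs(["FAH 98-1", "FAH 98.1", "HJK 92"]):
    print(first, second, round(score, 3))
```

Abbreviations that drop many characters may score low with this measure, so it is only a starting point, not a full answer to the abbreviation problem.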