3 Replies - 381 Views - Last Post: 23 October 2013 - 04:36 AM

#1 starriol

Searching for a list of strings in a file with Python

Posted 13 October 2013 - 10:34 PM

Hi guys,

I'm trying to search one file for several strings, which I have stored one per line in a .txt file.
So the idea is to take each line of input.txt and search for it in another file, let's call it rules.txt.

So far, I've been able to do this, to search for individual strings:

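import re
import sys

# Open the file to be searched; "with" closes it when the block ends.
with open("output.csv", "r") as shakes:
    for line in shakes:
        # re.match only matches at the start of the line.
        if re.match("STRING", line):
            # line already ends with a newline, so write it as-is
            # (works on both Python 2 and 3).
            sys.stdout.write(line)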


How can I change this to input the strings to be searched from another file?

So far I haven't been able to.

Thanks for the ideas.

Replies To: Searching for a list of strings in a file with Python

#2 andrewsw

Posted 14 October 2013 - 03:25 AM

Open both files and use two nested loops. Between the outer and inner loop, call file.seek(0) on the file read by the inner loop so that each pass starts from the beginning of that file.

I won't provide code though, as it's simple enough (or you could search).

If the files are huge then you might have to reconsider the approach: it could take a while.
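
For reference, here is a minimal sketch of the approach described above. It assumes the search strings are fixed strings that must appear at the start of a line (which is what re.match in the question implies), so it uses str.startswith rather than a regex. The function names are made up for illustration. The second function is not the nested-loop approach: it is one possible alternative for larger files that loads the search strings once and reads the searched file a single time.

```python
def search_nested(patterns_path, target_path):
    """Return lines of target_path that start with any line of patterns_path."""
    matches = []
    with open(patterns_path) as patterns, open(target_path) as target:
        for pattern in patterns:            # outer loop: one search string per line
            pattern = pattern.rstrip("\r\n")
            if not pattern:                 # skip blank lines in the pattern file
                continue
            target.seek(0)                  # rewind before every inner pass
            for line in target:             # inner loop: scan the whole target file
                if line.startswith(pattern):
                    matches.append(line)
    return matches


def search_single_pass(patterns_path, target_path):
    """Alternative for big files: read the target only once."""
    with open(patterns_path) as patterns:
        stripped = [p.rstrip("\r\n") for p in patterns]
    # str.startswith accepts a tuple of prefixes and tests them all.
    wanted = tuple(p for p in stripped if p)
    with open(target_path) as target:
        return [line for line in target if line.startswith(wanted)]
```

The two functions can order their results differently: the nested version groups matches by search string, while the single-pass version returns them in file order and reports a line only once even if several search strings match it.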

#3 xdr

Posted 23 October 2013 - 01:26 AM

Hi,

Assuming you are searching for fixed strings (not regex patterns) at the beginning
of a line, you can benefit from using memory-mapped files instead of plain ones.
This can substantially speed up the process. A full working program is below.
It was written for Python 3; if you are on Python 2, use the commented-out
lines marked "for python2" instead of the line just above each of them.

#!/usr/bin/env python3

import mmap
import sys


def main():
    if len(sys.argv) != 3:
        print("Usage: {} <input_file> <rules_file>".format(__file__))
        sys.exit(1)

    input_file_name, rules_file_name = sys.argv[1:]


    with open(input_file_name, "rb") as input_file:
        with open(rules_file_name, "rb") as rules_file:
            mapf = mmap.mmap(rules_file.fileno(), 0, access=mmap.ACCESS_READ)
            for pattern in input_file:
                pattern = pattern.rstrip(b"\n")

                # skip over empty search patterns
                if not pattern:
                    continue

                start_pos = 0
                mapf.seek(0)

                while True:
                    start_pos = mapf.find(pattern, start_pos)

                    if start_pos != -1:
                        #if pattern found at the beginning of file or at the beginning of a line
                        if start_pos == 0 or mapf[start_pos - 1] == ord(b"\n"):
                        #if start_pos == 0 or mapf[start_pos - 1] == b"\n":     # !!! for python2
                            mapf.seek(start_pos)
                            line = mapf.readline()
                            start_pos += len(line)
                            print(str(line, encoding="utf8"), end='')
                            #print(line.rstrip(b"\n")) # !!! for python2
                        else:
                            # if pattern found in the middle of the string, move ahead
                            start_pos += len(pattern)
                    else:
                        break
            mapf.close()


if __name__ == "__main__":
    main()


So, if input.txt is
foo
bar
baz

and rules.txt is
test string 1
foo 123
123 bar
test string 2
bazzzz

test string 3


then ./search.py input.txt rules.txt
will give you
foo 123
bazzzz
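
To try this example without going through the command line, the same search can be wrapped in a function that returns the matching lines. This is only a sketch under a few assumptions: find_lines is a hypothetical name, the lines come back as bytes, and slicing is used for the line-start check so the comparison behaves the same on Python 2 and 3. Like the program above, it would fail on an empty rules file, because an empty file cannot be memory-mapped.

```python
import mmap


def find_lines(input_path, rules_path):
    """Return lines of rules_path (as bytes) that start with a line of input_path."""
    found = []
    with open(input_path, "rb") as input_file, open(rules_path, "rb") as rules_file:
        mapf = mmap.mmap(rules_file.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            for pattern in input_file:
                pattern = pattern.rstrip(b"\r\n")
                if not pattern:             # skip empty search patterns
                    continue
                # Pass the start offset explicitly so every pattern is
                # searched from the beginning of the mapped file.
                pos = mapf.find(pattern, 0)
                while pos != -1:
                    if pos == 0 or mapf[pos - 1:pos] == b"\n":
                        # Match at the start of a line: collect the whole line.
                        mapf.seek(pos)
                        line = mapf.readline()
                        found.append(line)
                        pos += len(line)
                    else:
                        # Match in the middle of a line: move past it.
                        pos += len(pattern)
                    pos = mapf.find(pattern, pos)
        finally:
            mapf.close()
    return found
```

With the two sample files above saved as input.txt and rules.txt, find_lines("input.txt", "rules.txt") should return the "foo 123" and "bazzzz" lines, each as a bytes object including its line ending.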


Hope this is what is expected!

This post has been edited by andrewsw: 23 October 2013 - 03:17 AM
Reason for edit:: Removed unnecessary quote


#4 xdr

Posted 23 October 2013 - 04:36 AM

UPDATE: Windows text files have "\r\n" at the end of each line, so some corrections are needed:

pattern = pattern.rstrip(b"\n\r")

if start_pos == 0 or mapf[start_pos - 1] in [ord(b"\n"), ord(b"\r")]:
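
As a sketch, the two corrections could also be packaged as small helpers. The names strip_eol and at_line_start are hypothetical and not part of the program above. Slicing is used instead of indexing so the comparison gives the same result on Python 2 and 3.

```python
def strip_eol(pattern):
    """Remove a trailing LF or CRLF line ending from a bytes pattern."""
    return pattern.rstrip(b"\r\n")


def at_line_start(buf, pos):
    """Return True if pos is offset 0 or directly follows a line-ending byte."""
    # buf[pos - 1:pos] is a one-byte bytes object on both Python 2 and 3,
    # whereas buf[pos - 1] is an int on Python 3.
    return pos == 0 or buf[pos - 1:pos] in (b"\n", b"\r")
```

Stripping the "\r" from the pattern is the part that matters most for Windows files, since a pattern that still ends in "\r" would not match anything except at the very end of a line. A CRLF line still ends in "\n", so the extra "\r" test only changes the result when a bare "\r" comes right before the match.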