2 Replies - 1222 Views - Last Post: 26 February 2011 - 04:41 PM

#1 Indigo_chilled

regular expressions with global variable

Posted 14 February 2011 - 05:30 PM

Hi,

I'm trying to run a regular expression on a string, print some index values, and also print the pattern and the number of times it is encountered. I have an idea of the code I'd like to use but I'm running into some syntax issues. Would appreciate any input.

- 'orfPattern' is a regular expression searching for codons in a string of gene text
- 'text' is a variable that holds text from a text file

Global variable
orfPattern = 'ATG(...)*?(TAG|TGA|TTA)'
orf = re.compile( orfPattern )
def displaymatch():
#    orf_pattern = search('ATG(...)*?[(TAG)(TGA)(TTA)]', text)
    for orf in text
        do
            print string.find(s, sub[, start[, end]])
            print string.rfind(s, sub[, start[, end]])
            print len(text)
            print orf_pattern
            countOrf=countOrf+1
        done            
        print countOrf
            



Thanks!
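
For reference, a minimal runnable sketch of what the post above seems to be aiming for: compile the pattern once at module level, loop over the matches with finditer, print each match's start and end index, then print the pattern and the number of matches. The function name `display_matches`, the sample string, and the use of TAA (rather than the TTA written above) as the third stop codon are assumptions, not part of the original post.

```python
import re

# Module-level ("global") compiled pattern: start codon, a lazy run of
# codons, then a stop codon. TAA is assumed here in place of the TTA above.
ORF_PATTERN = r'ATG(?:...)*?(?:TAG|TGA|TAA)'
ORF_RE = re.compile(ORF_PATTERN)


def display_matches(text):
    """Print the span of every non-overlapping match and return the count."""
    count = 0
    for match in ORF_RE.finditer(text):
        # start() is the 0-based index of the 'A' in ATG; end() is exclusive.
        print('Start:', match.start(), 'End:', match.end())
        print('Match:', match.group(0))
        count += 1
    print('Pattern:', ORF_PATTERN)
    print('Count:', count)
    return count


# Hypothetical sample input, standing in for the text read from the file.
sample = 'CCATGAAATAGGGATGCCCTGACC'
display_matches(sample)
```

Because the compiled pattern is only read inside the function, no `global` statement is needed; that would only matter if the function reassigned the name.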


Replies To: regular expressions with global variable

#2 atraub

Re: regular expressions with global variable

Posted 15 February 2011 - 07:36 AM

Could you tell us specifically what errors you are getting?

#3 Indigo_chilled

Re: regular expressions with global variable

Posted 26 February 2011 - 04:41 PM

It turns out it was a syntax error.

The code I ended up writing looks like this:

import re

#start codon, a lazy run of codons, then a stop codon (TAG, TGA or TAA)
orfPattern = r'ATG(...)*?(TAG|TGA|TAA)'
#compile the pattern
orfCompile = re.compile(orfPattern)
#match pattern to the text string
for match in orfCompile.finditer(text):
    #compute the length of the match (to check whether it is <300)
    start = match.start()
    end = match.end()
    size = end - start


However, it turns out that this code skips anything nested inside an earlier match, because finditer only returns non-overlapping matches. For example, it skips the ATG starts that fall inside the first match in 'ATGACAAGTGAATGGGTTGTATGATTTGGGAAATTA'. So I've rewritten it to use find instead of the compiled regex.

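# 'text' and 'limit' are defined elsewhere
pattern = 'ATG'
stops = ('TAG', 'TGA', 'TAA')
for pos in range(len(text)):
    if text.startswith(pattern, pos):
        # Report positions starting with 1
        openOrf = pos + 1
        # Index of the nearest stop codon of any kind after the start;
        # find returns -1 when a codon is absent, so drop those results
        hits = [text.find(stop, pos + len(pattern)) for stop in stops]
        hits = [h for h in hits if h != -1]
        if not hits:
            continue
        closeOrf = min(hits)
        # Slice with the 0-based index so the leading 'A' is not lost
        xorf = text[pos:closeOrf]
        size = len(xorf)
        if size > limit:
            #print result
            print('Start:', openOrf)
            print('End:', closeOrf)
            print('Length:', size)
            print(xorf[0:45])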
            

I can't figure out how to ensure that there are no more ATG starts within the xorf slice I've made.

Any thoughts?
Thanks!
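
One possible way to approach both problems, sketched under assumptions: wrapping the pattern in a zero-width lookahead lets finditer report a match at every ATG, including ones that start inside an earlier match, and each ORF can then be checked for further ATG codons. The names `find_orfs` and `has_inner_start` and the sample string are hypothetical, and treating only in-frame ATGs as "inner starts" is an assumption about what is wanted.

```python
import re

# Zero-width lookahead: the match itself consumes nothing, so the scan
# advances one character at a time and overlapping ORFs are all reported.
# Group 1 captures the ORF: ATG, a lazy run of codons, first in-frame stop.
OVERLAPPING_ORF = re.compile(r'(?=(ATG(?:...)*?(?:TAG|TGA|TAA)))')


def find_orfs(text):
    """Return (start, end, orf) for every ATG that has an in-frame stop."""
    return [(m.start(1), m.end(1), m.group(1))
            for m in OVERLAPPING_ORF.finditer(text)]


def has_inner_start(orf):
    """True if another ATG codon occurs in frame after the first codon."""
    # Step through whole codons, skipping the leading ATG and the stop codon.
    return any(orf[i:i + 3] == 'ATG' for i in range(3, len(orf) - 3, 3))


# Hypothetical sample: the second ORF starts inside the first one.
text = 'ATGAAAATGCCCTAGATGTTTTGA'
for start, end, orf in find_orfs(text):
    # Positions are 0-based and end is exclusive, as with slicing.
    label = 'inner ATG' if has_inner_start(orf) else 'no inner ATG'
    print(start, end, orf, label)
```

On the sample string this reports three ORFs, at 0-15, 6-15 and 15-24, and flags only the first as containing an inner ATG. Filtering with `not has_inner_start(orf)` would keep just the ORFs with no further in-frame start.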