4 Replies - 771 Views - Last Post: 04 April 2011 - 10:50 AM

#1 wisam abbasi

splitting text file

Posted 30 March 2011 - 05:18 PM

Hi
I am having a problem splitting an HTML document stored in a text file
according to the <div> tag.

I can split the document into paragraphs using:
p = re.compile('[\n]')
s2 = p.sub('',s)



where s is the text.
But how do I split on <div (not <div> or </div>)?

I've tried to use:
p = re.compile('<div')
s2 = p.sub('',s)


but it did not work.


Replies To: splitting text file

#2 atraub

Re: splitting text file

Posted 31 March 2011 - 09:59 AM

What sort of results do you get from
p = re.compile('<div')
s2 = p.sub('',s)
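
For reference, here is a rough sketch of what that snippet returns. The sample string is made up for illustration, since the thread never shows the actual file contents. It suggests that sub() only deletes the matched text and hands back a single string, while split() on the same pattern is what produces a list of pieces.

```python
import re

# Hypothetical sample input; the original post does not show the real file.
s = '<html><div id="a"><p>One</p></div><div id="b"><p>Two</p></div></html>'

p = re.compile('<div')

# sub() replaces every match with '' and returns one string: nothing is split.
removed = p.sub('', s)
print(removed)

# split() cuts the string at every match and returns a list of pieces.
# The matched '<div' itself is dropped from the pieces.
pieces = p.split(s)
print(pieces)
```

Note that the closing tags are left alone in both cases, because '</div>' does not contain the text '<div'.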


#3 wisam abbasi

Re: splitting text file

Posted 04 April 2011 - 07:26 AM

atraub, on 31 March 2011 - 10:59 AM, said:

What sort of results do you get from
p = re.compile('<div')
s2 = p.sub('',s)


Sorry, I've forgotten what the results were.
Anyway, it wasn't the right solution for my problem.
I had to split the document using word indexes.
Thanks
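
The post does not say how the index-based split was done, so the following is only one possible reading: find the position of every opening <div, then slice the text at those positions. The function name split_at_div_starts and the sample string are hypothetical.

```python
import re

def split_at_div_starts(text):
    """Cut text just before each '<div' that is followed by whitespace or '>'."""
    # Character offsets where an opening div tag begins.
    starts = [m.start() for m in re.finditer(r'<div(?=[\s>])', text)]
    if not starts:
        return [text]
    # Slice between consecutive offsets; skip the empty slice that would
    # appear if the text itself starts with '<div'.
    bounds = [0] + starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a != b]

# Hypothetical sample input, not taken from the thread.
sample = '<html><div id="a"><p>One</p></div><div id="b"><p>Two</p></div></html>'
chunks = split_at_div_starts(sample)
for chunk in chunks:
    print(chunk)
```

Unlike re.split on '<div', each chunk here keeps its opening tag, and joining the chunks gives back the original text.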

#4 baavgai

Re: splitting text file

Posted 04 April 2011 - 09:44 AM

It does kind of depend on what you're after. It really doesn't make sense to blast <div without the closing tag...

Perhaps:
>>> import re
>>> s = '<html><div><div id="foo"><p>Hi</p></div></div>'
>>> s
'<html><div><div id="foo"><p>Hi</p></div></div>'
>>> re.compile('<div [^>]*>').sub('',s)
'<html><div><p>Hi</p></div></div>'
>>> 
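
The pattern above needs a space after <div, so a bare <div> is left in place. As a sketch of one variation (not something proposed in the thread), swapping the space for a word boundary matches opening div tags with or without attributes, and the same pattern can then be passed to split() rather than sub(). The name open_div is made up for the example.

```python
import re

s = '<html><div><div id="foo"><p>Hi</p></div></div>'

# \b lets the pattern match both '<div>' and '<div id="foo">'.
# Closing tags are untouched because '</div>' does not start with '<div'.
open_div = re.compile(r'<div\b[^>]*>')

stripped = open_div.sub('', s)
print(stripped)

# Splitting on the same pattern gives the text between opening div tags.
# Two adjacent tags produce an empty string between them.
parts = open_div.split(s)
print(parts)
```

Regular expressions only go so far with nested HTML; for anything beyond simple cases a real HTML parser is usually the safer route.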



#5 wisam abbasi

Re: splitting text file

Posted 04 April 2011 - 10:50 AM

baavgai, on 04 April 2011 - 10:44 AM, said:

It does kind of depend on what you're after. It really doesn't make sense to blast <div without the closing tag...

Perhaps:
>>> s = '<html><div><div id="foo"><p>Hi</p></div></div>'
>>> s
'<html><div><div id="foo"><p>Hi</p></div></div>'
>>> re.compile('<div [^>]*>').sub('',s)
'<html><div><p>Hi</p></div></div>'
>>> 



Thanks a lot.
As I said before, it wasn't the right solution for my problem.
Thanks for the help.