5 Replies - 6385 Views - Last Post: 01 March 2011 - 09:56 AM Rate Topic: -----

#1 bern1812   User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 11
  • Joined: 18-March 10

Recursive Parsing of Newick Tree

Posted 28 February 2011 - 01:00 PM

I am trying to parse a tree in Newick format, where the tree can have any number of children. A tree has a label and a list of its children and each node is a tree object and its edge weight.

For example if I had the Newick String
(A:5, B:5, (C:6, D:5)E:2);

It would have an unnamed root node with three children A, B and E with their given edge lengths. A and B would have an empty set of children and E would have two childen C and D, both of which have no children.

What I am looking for is a way to parse this recursively.

What I was thinking is that if you strip off the outer parenthatheis you are left with a list of tree separated by commas. Each of these needs to be a tree object and then the objects need to be hooked together in the list of children. What I dont know is how to make this happen.

Any help would be appreciated.

This post has been edited by bern1812: 28 February 2011 - 01:01 PM


Is This A Good Question/Topic? 0
  • +

Replies To: Recursive Parsing of Newick Tree

#2 atraub   User is offline

  • Pythoneer
  • member icon

Reputation: 837
  • View blog
  • Posts: 2,271
  • Joined: 23-December 08

Re: Recursive Parsing of Newick Tree

Posted 28 February 2011 - 01:59 PM

Essentially, this is a complex case of string parsing. I'll see what I can do to help you out with this one, but I would like to first ask you: If you were looking at this string, how would you parse it by hand? Explain to me your approach. If you can tell me how to parse such a string, you may find it easier to tell a computer how to.

This post has been edited by atraub: 28 February 2011 - 02:21 PM

Was This Post Helpful? 0
  • +
  • -

#3 bern1812   User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 11
  • Joined: 18-March 10

Re: Recursive Parsing of Newick Tree

Posted 28 February 2011 - 02:59 PM

View Postatraub, on 28 February 2011 - 01:59 PM, said:

Essentially, this is a complex case of string parsing. I'll see what I can do to help you out with this one, but I would like to first ask you: If you were looking at this string, how would you parse it by hand? Explain to me your approach. If you can tell me how to parse such a string, you may find it easier to tell a computer how to.


By hand I would do it from the bottom up. I would go to the lowest level, create each leaf, then create its parent with the children in a list and then move up from there. From what I have been able to understand the general strategy to string parsing of this type is that of a top down approach which I suppose would be different than my strategy
Was This Post Helpful? 0
  • +
  • -

#4 bern1812   User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 11
  • Joined: 18-March 10

Re: Recursive Parsing of Newick Tree

Posted 28 February 2011 - 06:22 PM

If I were to remove the top level parens then I would be left with 3 Newick trees that will be the children of the root.
A:5
B:5
(C:6, D:6)E

Each one needs to be created as a tree object. A and B are easy as they are leaf nodes so I dont have to do any more digging. E on the other hand has 2 children. So I would have to dive into those parens and I suppose strip them and be left with 2 more Newick trees with no children. I would then make these the children of E.


How to do this recursively I have not the slightest idea though.
Was This Post Helpful? 0
  • +
  • -

#5 atraub   User is offline

  • Pythoneer
  • member icon

Reputation: 837
  • View blog
  • Posts: 2,271
  • Joined: 23-December 08

Re: Recursive Parsing of Newick Tree

Posted 28 February 2011 - 07:51 PM

You're over complicating it. In my mind, step 1 would be to isolate each item in that string.I think I'd want to turn that string from:
"(A:5, B:5, (C:6, D:5)E:2)"
to:
[["A",5,[]],["B",5,[]],["E",2,[["C",6,[]],["D",5,[]]]]]

So my question is, if you were going to explain to a non-programmer how to take that thing in quotes and put it into a series of boxes, what would you tell him?

EDIT:
Added some color for highlighting.

This post has been edited by atraub: 01 March 2011 - 09:54 AM

Was This Post Helpful? 0
  • +
  • -

#6 atraub   User is offline

  • Pythoneer
  • member icon

Reputation: 837
  • View blog
  • Posts: 2,271
  • Joined: 23-December 08

Re: Recursive Parsing of Newick Tree

Posted 01 March 2011 - 09:56 AM

There are other various ways of approaching this sort of thing. instead of nesting the crap out of it (as you saw in my earlier post) you could also give each branch an ID and then make it look more like this:
[[1,"A",5,[]],[2,"B",5,[]],[3,"C",6,[]],[4,"D",5,[]],[5,"E",2,[3,4]]]

Organizing the data in such a way may seem like a useless intermediate step, but think about how much easier it'll be turning that list into a tree rather than turning the string into one. It also would give you the opportunity to more easily use the input data with other tree structures later down the line if necessary. You could also consider using a data structure that would allow you to pre-impose the size limit. This would eliminate the need for each branch to have a separate size associated with it.

This post has been edited by atraub: 01 March 2011 - 09:56 AM

Was This Post Helpful? 0
  • +
  • -

Page 1 of 1