
#1 bigmike7801

How can I output non-escaped element tag in XML?

Posted 15 April 2015 - 08:13 AM

I have inherited a Python script, and my issue is that I have a chunk of text in a `paragraph` variable that contains anchor tags. For example:

This is text with a <a href="http://somewebsite.com">Link</a> in it.

What I'm required to do, however, is convert the anchor tags to the `apxh` namespace, so the above line should look something like this:

This is text with a <apxh:a href="http://somewebsite.com">Link</apxh:a> in it.

The problem is that the way I have it now, the script is outputting:

This is text with a &lt;apxh:a href=\"http://somewebsite.com;\"&gt;Link Text;&lt;/apxh:a&gt; in it.

My guess is that when I'm running the for loop over `paragraph`, I need to somehow find all the anchor tags and their text and do something like `etree.Element("{%s}a" % nm["apxh"], nsmap=nm)`, but I'm not really sure.
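A minimal sketch of that idea, under some loudly stated assumptions: it uses the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without third-party packages, it assumes each paragraph is a well-formed XML fragment (quoted attributes, closed tags, `&` written as `&amp;`), and `paragraph_to_element` is a hypothetical helper name, not something from the script. Instead of assigning the markup to `.text`, the paragraph is parsed into real elements and every tag is moved into the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Ask the stdlib serializer to use the "apxh" prefix for the XHTML namespace.
ET.register_namespace("apxh", XHTML_NS)


def paragraph_to_element(paragraph):
    """Parse one paragraph containing anchor tags into a namespaced <p> element.

    Hypothetical helper for illustration only.
    """
    # Wrap the fragment so it has a single root, then parse it as XML.
    # This raises ParseError if the paragraph is not well-formed.
    par = ET.fromstring("<p>%s</p>" % paragraph)
    # Move the <p> wrapper and every nested element (e.g. <a>) into the
    # XHTML namespace using Clark notation: "{namespace-uri}localname".
    for elem in par.iter():
        elem.tag = "{%s}%s" % (XHTML_NS, elem.tag)
    return par


par = paragraph_to_element(
    'This is text with a <a href="http://somewebsite.com">Link</a> in it.'
)
print(ET.tostring(par, encoding="unicode"))
```

With lxml the same approach should be possible, since `etree.fromstring` and Clark-notation tags exist there too, but how the prefix is chosen after appending to a parent with an `nsmap` is something to check against the lxml documentation rather than take from this sketch.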

This is the current script:
    def get_news_feed(request):
        articles = models.Article.objects.all().filter(distributable = True)
        
        nm = {
                None: "http://www.w3.org/2005/Atom",
                "ap": "http://ap.org/schemas/03/2005/aptypes",
                "apcm": "http://ap.org/schemas/03/2005/apcm",
                "apnm": "http://ap.org/schemas/03/2005/apnm",
                "apxh": "http://www.w3.org/1999/xhtml",
                }
    
        doc = etree.Element("{%s}feed" % nm[None], nsmap=nm)
    
        for article in articles:
            entry = etree.Element("{%s}entry" % nm[None], nsmap=nm)
            content = etree.Element("{%s}content" % nm[None], nsmap=nm)
            content.set("type", "xhtml")
    
            div = etree.Element("{%s}div" % nm["apxh"], nsmap=nm)
            for paragraph in article.body.replace("&amp;", "&").split("\n"):
                par = etree.Element("{%s}p" % nm["apxh"], nsmap=nm)
                par.text = paragraph            
                par.text = paragraph.replace("<a", "<apxh:a")            
                par.text = par.text.replace("</a", "</apxh:a")  
                par.text = cleanup_entities(par.text)
                div.append(par)
            content.append(div)
            entry.append(content)
            
            doc.append(entry)
    
        output = etree.tostring(doc, encoding="UTF-8", xml_declaration=True, pretty_print=True)
        return HttpResponse(output, mimetype="application/xhtml+xml")



This is how the output should look:
        
    <?xml version='1.0' encoding='UTF-8'?>
    <feed xmlns:ap="http://ap.org/schemas/03/2005/aptypes" xmlns:apxh="http://www.w3.org/1999/xhtml" xmlns:apnm="http://ap.org/schemas/03/2005/apnm" xmlns:apcm="http://ap.org/schemas/03/2005/apcm" xmlns="http://www.w3.org/2005/Atom">
      <entry>
        <content type="xhtml">
          <apxh:div>
            <apxh:p>This is some text</apxh:p>
            <apxh:p>This is text with a <apxh:a href="http://somewebsite.com">Link</apxh:a> in it.</apxh:p>
            <apxh:p>Theater</apxh:p>
          </apxh:div>
        </content>
      </entry>
    </feed>



This is how the output currently looks:
    <?xml version='1.0' encoding='UTF-8'?>
    <feed xmlns:ap="http://ap.org/schemas/03/2005/aptypes" xmlns:apxh="http://www.w3.org/1999/xhtml" xmlns:apnm="http://ap.org/schemas/03/2005/apnm" xmlns:apcm="http://ap.org/schemas/03/2005/apcm" xmlns="http://www.w3.org/2005/Atom">
      <entry>
        <content type="xhtml">
          <apxh:div>
            <apxh:p>This is some text</apxh:p>
            <apxh:p>This is text with a &lt;apxh:a href=\"http://somewebsite.com;\"&gt;Link Text;&lt;/apxh:a&gt; in it.</apxh:p>
            <apxh:p>Theater</apxh:p>
          </apxh:div>
        </content>
      </entry>
    </feed>
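The escaped output above is what an element tree produces whenever markup is assigned to `.text`: the string is treated as character data, so `<` and `>` are escaped on serialization. A small sketch of the difference, using the standard library's `xml.etree.ElementTree` as a stand-in for lxml (an assumption made so the example runs without extra packages):

```python
import xml.etree.ElementTree as ET

# Case 1: markup assigned to .text is plain character data and gets escaped.
par = ET.Element("p")
par.text = 'a <a href="http://somewebsite.com">Link</a> here'
escaped = ET.tostring(par, encoding="unicode")

# Case 2: a real child element is serialized as markup.
par2 = ET.Element("p")
par2.text = "a "                       # text before the child
link = ET.SubElement(par2, "a", href="http://somewebsite.com")
link.text = "Link"                     # text inside the child
link.tail = " here"                    # text after the child's closing tag
unescaped = ET.tostring(par2, encoding="unicode")

print(escaped)
print(unescaped)
```

So string replacement on `par.text` cannot produce an unescaped tag; the anchor has to exist as an element in the tree.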




Replies To: How can I output non-escaped element tag in XML?

#2 atraub

Re: How can I output non-escaped element tag in XML?

Posted 15 April 2015 - 04:32 PM

I wrote a short snippet very similar to yours so that we could see some input and output. Here's what I came up with:
def test(filePath):
    with open(filePath, "r") as inFile:
        # Keep the space after the tag name so the attributes stay separated from it.
        return inFile.read().replace("<a ", "<apxh:a ").replace("</a>", "</apxh:a>")



And I tested it against this file:
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns:ap="http://ap.org/schemas/03/2005/aptypes" xmlns:apxh="http://www.w3.org/1999/xhtml" xmlns:apnm="http://ap.org/schemas/03/2005/apnm" xmlns:apcm="http://ap.org/schemas/03/2005/apcm" xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content type="xhtml">
      <apxh:div>
        <apxh:p>This is some text</apxh:p>
        <apxh:p>This is text with a <a href="http://somewebsite.com">Link</a> in it.</apxh:p>
        <apxh:p>Theater</apxh:p>
      </apxh:div>
    </content>
  </entry>
</feed>



and here's the result:
>>> print(test("./test.xml"))
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns:ap="http://ap.org/schemas/03/2005/aptypes" xmlns:apxh="http://www.w3.org/1999/xhtml" xmlns:apnm="http://ap.org/schemas/03/2005/apnm" xmlns:apcm="http://ap.org/schemas/03/2005/apcm" xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content type="xhtml">
      <apxh:div>
        <apxh:p>This is some text</apxh:p>
        <apxh:p>This is text with a <apxh:a href="http://somewebsite.com">Link</apxh:a> in it.</apxh:p>
        <apxh:p>Theater</apxh:p>
      </apxh:div>
    </content>
  </entry>
</feed>
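The same text-level rename can be sketched with a regular expression, which keeps whatever follows the tag name intact and leaves other tags that merely start with "a" (such as `<abbr>`) alone. `rename_anchor_tags` is a hypothetical name, and this is still plain string processing on serialized XML, not a parser, so it assumes the anchors are written in lowercase and do not appear inside comments or CDATA.

```python
import re

# Match "<a" or "</a" only when the tag name ends there, i.e. it is followed
# by whitespace, ">" or "/". The lookahead does not consume that character.
ANCHOR_TAG = re.compile(r"<(/?)a(?=[\s>/])")


def rename_anchor_tags(text):
    """Rewrite <a ...> and </a> as <apxh:a ...> and </apxh:a> (hypothetical helper)."""
    return ANCHOR_TAG.sub(r"<\1apxh:a", text)


sample = 'A <a href="http://somewebsite.com">Link</a> and an <abbr>abbr</abbr>.'
print(rename_anchor_tags(sample))
```

Because `<apxh:a` no longer matches the pattern, running the function twice gives the same result as running it once.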


What do you think? Does this help?

EDIT:
Also, about your string formatting: "{%s}" looks like it mixes the old and new styles, but the braces are literal here. lxml takes namespaced tag names in Clark notation, "{namespace-uri}localname", so your % version already produces the right string. If you would rather use str.format (auto-numbered "{}" fields need Python 2.7 or later), the literal braces have to be doubled:
    for article in articles:
        entry = etree.Element("{{{}}}entry".format(nm[None]), nsmap=nm)
        content = etree.Element("{{{}}}content".format(nm[None]), nsmap=nm)
        content.set("type", "xhtml")

        div = etree.Element("{{{}}}div".format(nm["apxh"]), nsmap=nm)
        for paragraph in article.body.replace("&amp;", "&").split("\n"):
            par = etree.Element("{{{}}}p".format(nm["apxh"]), nsmap=nm)
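As a quick check on how the tag strings come out, here is a small sketch comparing the formatting styles on the XHTML namespace URI from the question. The variable names are only for illustration. Namespaced tag names passed to `etree.Element` need the `{namespace-uri}localname` form (Clark notation), so the braces must survive formatting.

```python
ns = "http://www.w3.org/1999/xhtml"

# Under % formatting, braces are ordinary characters.
percent_style = "{%s}a" % ns

# Under str.format, literal braces must be doubled: "{{" -> "{", "}}" -> "}".
format_style = "{{{}}}a".format(ns)

# A bare "{}" is consumed as a replacement field, so the braces disappear
# and the result is no longer a Clark-notation tag name.
no_braces = "{}a".format(ns)

print(percent_style)
print(format_style)
print(no_braces)
```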


This post has been edited by atraub: 15 April 2015 - 04:41 PM
