
#1 Sergio Tapia

Using Python and Google Translate.

Posted 05 November 2010 - 03:35 PM

I’ll show you how to use Python to create a simple script that translates your text for use in other applications. We’ll see how to scrape HTML using BeautifulSoup, and how to use the official Google AJAX service for translation.

HTML Scraping

We need to import some modules first:

import urllib2
import urllib
import json

from BeautifulSoup import BeautifulSoup



Now let’s define a method that will return translated text:

def fromHtml(self, text, languageFrom, languageTo):
        """
        Returns translated text that is scraped from Google Translate's HTML
        source code.
        """
        #We create a dictionary of name:code pairs so we know which language code to use.
        langCode={
            "arabic":"ar", "bulgarian":"bg", "chinese":"zh-CN",
            "croatian":"hr", "czech":"cs", "danish":"da", "dutch":"nl",
            "english":"en", "finnish":"fi", "french":"fr", "german":"de",
            "greek":"el", "hindi":"hi", "italian":"it", "japanese":"ja",
            "korean":"ko", "norwegian":"no", "polish":"pl", "portuguese":"pt",
            "romanian":"ro", "russian":"ru", "spanish":"es", "swedish":"sv" }
        #Set the user agent.
        urllib.FancyURLopener.version = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008070400 SUSE/3.0.1-0.1 Firefox/3.0.1"

        #Encode the parameters we're going to send to the Google servers.
        #An unknown language name raises KeyError, which we report before bailing out.
        try:
            langpair = "%s|%s" %(langCode[languageFrom.lower()], langCode[languageTo.lower()])
            postParameters = urllib.urlencode({"langpair":langpair, "text":text, "ie":"UTF8", "oe":"UTF8"})
        except KeyError as error:
            print "Currently we do not support %s" %(error.args[0])
            return

        #Send the request with the above parameters and save to 'page' variable.
        page = urllib.urlopen("http://translate.google.com/translate_t", postParameters)

        #content now contains the HTML source code of the website.
        content = page.read()
        #Don't forget to close the connection!
        page.close()



Let's break this down.

First we create a dictionary that maps language names to the codes Google uses. Then we set up a user agent for our scraper, encode our translation parameters for the request, and read the HTML of the response into a local variable.
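
To make the lookup and encoding steps concrete, here is a minimal sketch of just that part, with no network access. It is written for Python 3 (where urlencode lives in urllib.parse) rather than the Python 2 used in this post, and build_post_parameters and the trimmed LANG_CODES table are names made up for this illustration.

```python
from urllib.parse import urlencode, parse_qs

# Trimmed copy of the language table from the tutorial (illustration only).
LANG_CODES = {"english": "en", "french": "fr", "german": "de", "spanish": "es"}


def build_post_parameters(text, language_from, language_to):
    """Return the URL-encoded form body, or None for an unknown language.

    Hypothetical helper: it mirrors the try/except KeyError step in fromHtml.
    """
    try:
        langpair = "%s|%s" % (LANG_CODES[language_from.lower()],
                              LANG_CODES[language_to.lower()])
    except KeyError as error:
        print("Currently we do not support %s" % error.args[0])
        return None
    return urlencode({"langpair": langpair, "text": text,
                      "ie": "UTF8", "oe": "UTF8"})


if __name__ == "__main__":
    encoded = build_post_parameters("Hello world", "English", "Spanish")
    print(encoded)
    # parse_qs reverses the encoding, which is a handy way to check it.
    print(parse_qs(encoded)["langpair"])
```

Decoding the result with parse_qs is a quick way to confirm that the pipe in the language pair and the spaces in the text survive the round trip.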

So far so good.

Now let's use BeautifulSoup to scrape what we need.

        #Still inside fromHtml: 'content' holds the HTML source we read above.
        htmlSource = BeautifulSoup(content)
        #Google creates a span with title the same as the text you wanted to translate.
        #So let's find a 'span' that has as a Title the 'text' we passed to this method.
        translation = htmlSource.find('span', title=text )

        #find() returns None when no span matches, e.g. if Google changes its markup.
        if translation is None:
            return

        #the renderContents() method returns the body that is inside of the span we found.
        return translation.renderContents()




We use the .find() method to find the span whose title matches the text we asked to translate. This pattern is specific to Google's markup; scraping is just a matter of finding a pattern like it.

The .renderContents() method returns the inner contents of the tag.
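
If you want to try the "find the span by its title" idea without installing BeautifulSoup, here is a rough sketch that swaps in Python 3's standard html.parser module instead. SAMPLE_HTML is invented to resemble the pattern described above (it is not copied from Google), and SpanByTitle and find_translation are hypothetical names. Unlike .renderContents(), this version returns only the text inside the span, not any nested markup.

```python
from html.parser import HTMLParser

# Invented markup that follows the pattern described in the post.
SAMPLE_HTML = '<div><span title="Hello world">Hola mundo</span></div>'


class SpanByTitle(HTMLParser):
    """Collect the text inside the first <span> whose title equals `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.depth = 0       # > 0 while we are inside the matching span
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # a nested span inside the match
        elif not self.found and dict(attrs).get("title") == self.wanted:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def find_translation(html, original_text):
    """Return the text of the matching span, or None if there is no match."""
    parser = SpanByTitle(original_text)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.found else None


if __name__ == "__main__":
    print(find_translation(SAMPLE_HTML, "Hello world"))
```

Returning None when nothing matches keeps the caller from crashing if the markup changes.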

We're done! The method will return translated text! :D


Official AJAX Response

Using the AJAX response is better in my opinion because you save bandwidth by not having to download the complete HTML source of the page.

Here’s how you do it:

def fromAjax(self, text, languageFrom, languageTo):
        """
        Returns a simple string translating the text from "languageFrom" to
        "languageTo" using Google Translate AJAX Service.
        """
        LANG={
            "arabic":"ar", "bulgarian":"bg", "chinese":"zh-CN",
            "croatian":"hr", "czech":"cs", "danish":"da", "dutch":"nl",
            "english":"en", "finnish":"fi", "french":"fr", "german":"de",
            "greek":"el", "hindi":"hi", "italian":"it", "japanese":"ja",
            "korean":"ko", "norwegian":"no", "polish":"pl", "portuguese":"pt",
            "romanian":"ro", "russian":"ru", "spanish":"es", "swedish":"sv" }

        base_url='http://ajax.googleapis.com/ajax/services/language/translate?'
        langpair='%s|%s'%(LANG.get(languageFrom.lower(),languageFrom),
                          LANG.get(languageTo.lower(),languageTo))
        params=urllib.urlencode( (('v',1.0),
                           ('q',text.encode('utf-8')),
                           ('langpair',langpair),) )
        url=base_url+params
        content=urllib2.urlopen(url).read()
        #The standard json module imported above always provides loads().
        trans_dict=json.loads(content)
        #responseData is null when Google rejects the request (e.g. a bad language pair).
        if not trans_dict.get('responseData'):
            return
        return trans_dict['responseData']['translatedText']



It's even easier! This AJAX request returns the translated text as JSON, so there's no need to scrape and parse HTML using an external library. It Just Works™.
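
To see what the JSON handling boils down to, here is a small offline sketch in Python 3. The two sample payloads are made up for illustration: the responseData and translatedText keys come from the code above, while the responseDetails and responseStatus fields and their values are my assumption about the shape of the reply. extract_translation is a hypothetical helper name.

```python
import json

# Made-up payloads shaped like the reply fromAjax expects (assumed fields).
SAMPLE_OK = ('{"responseData": {"translatedText": "Hola mundo"}, '
             '"responseDetails": null, "responseStatus": 200}')
SAMPLE_ERROR = ('{"responseData": null, '
                '"responseDetails": "invalid translation language pair", '
                '"responseStatus": 400}')


def extract_translation(raw_json):
    """Return the translated text, or None when the reply carries no data."""
    reply = json.loads(raw_json)
    data = reply.get("responseData")
    if not data:
        # A null responseData would make reply['responseData'][...] fail.
        return None
    return data.get("translatedText")


if __name__ == "__main__":
    print(extract_translation(SAMPLE_OK))
    print(extract_translation(SAMPLE_ERROR))
```

Checking for a missing responseData before indexing into it avoids a TypeError when the service reports an error instead of a translation.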

I hope this helps you learn a bit more about Python.

Thanks for reading and leave some feedback!


Replies To: Using Python and Google Translate.

#2 kader-dz

Re: Using Python and Google Translate.

Posted 06 November 2010 - 01:11 PM

Thank you for the lesson
I want to learn this language

#3 Seta00

Re: Using Python and Google Translate.

Posted 09 November 2010 - 12:44 PM

I wanted to do some Google Translate hackery a few weeks ago but got too lazy to actually go and try it :P
This is very useful info :D