1 Replies - 419 Views - Last Post: 15 May 2013 - 11:46 AM

#1 donfanzu

Non ascii characters conversion

Posted 14 May 2013 - 11:09 AM

Hi guys,

I'm trying to import an ASCII file into my webapp, but the file contains non-ASCII characters (,,π,) and it's borking the whole process.
What I've done is write a small script that checks for non-ASCII characters and prints out each offending line along with its line number.
What I want to do is convert those characters to ASCII, or convert the whole file to UTF-8, if that's possible.

Here's my code so far:
def is_ascii(s):
    # True only if every character falls in the 7-bit ASCII range (0-127)
    return all(ord(c) < 128 for c in s)

with open('MYFILE.TXT', 'r') as the_file:
    data_list = the_file.readlines()

# Report each offending line together with its 1-based line number
for line_number, line in enumerate(data_list, 1):
    if not is_ascii(line):
        print("%s: %s" % (line_number, line))

This is a sample output for a line with an invalid character:
2:      00000737663700000000000 Hllo/   
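Printing the whole line makes it hard to tell which bytes are actually the problem. A minimal sketch of a more precise check, assuming the file is read in binary mode so that nothing gets decoded along the way. The names find_non_ascii and report are hypothetical helpers, not part of the script above:

```python
def find_non_ascii(raw_line):
    """Return (column, byte_value) pairs for every byte >= 128 in a raw line."""
    # bytearray yields integers on both Python 2 and Python 3
    return [(col, b) for col, b in enumerate(bytearray(raw_line), 1) if b >= 128]


def report(path):
    """Collect (line_number, hits) for each line that holds non-ASCII bytes."""
    results = []
    # Binary mode: no decoding happens, so an unknown encoding cannot raise here
    with open(path, 'rb') as the_file:
        for line_number, raw_line in enumerate(the_file, 1):
            hits = find_non_ascii(raw_line)
            if hits:
                results.append((line_number, hits))
    return results
```

Calling report('MYFILE.TXT') would give a list of line numbers with the column and numeric value of each offending byte. Those byte values are usually the best clue for guessing which encoding the file was really saved in.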


Basically I want to reconcile the data before doing any import. Thanks.


Replies To: Non ascii characters conversion

#2 alexr1090

Re: Non ascii characters conversion

Posted 15 May 2013 - 11:46 AM

Did you try converting them to Unicode yet to see if that works? Simply:
unicode(the_line)
Let me know if that works or not. Thanks.
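One caveat: in the script from the first post, the_line holds the line number rather than the text, and in Python 2 calling unicode() on a byte string without naming an encoding falls back to the default codec (normally ASCII), so it raises UnicodeDecodeError on exactly the bytes that are causing trouble. A rough sketch of an explicit conversion follows. It assumes the file is really Latin-1, which the thread never confirms, and to_utf8 and to_ascii are hypothetical helper names:

```python
import unicodedata


def to_utf8(raw, source_encoding='latin-1'):
    """Decode raw bytes with the assumed source encoding, re-encode as UTF-8."""
    # 'latin-1' is only a guess; swap in the file's real encoding if known
    return raw.decode(source_encoding).encode('utf-8')


def to_ascii(raw, source_encoding='latin-1'):
    """Decode, split accented letters into base + accent, keep only ASCII."""
    text = raw.decode(source_encoding)
    # NFKD separates e.g. an accented 'e' into 'e' plus a combining accent
    decomposed = unicodedata.normalize('NFKD', text)
    # 'ignore' silently drops whatever still has no ASCII equivalent
    return decomposed.encode('ascii', 'ignore')
```

Converting to UTF-8 keeps every character, so it is the safer choice if the webapp can accept it. The ASCII route is lossy: a character with no decomposition (π, for instance) just disappears, and passing 'replace' instead of 'ignore' would leave a '?' in its place so the loss is at least visible.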