I'm trying to import an ASCII file into my web app, but the file contains non-ASCII characters (é, µ, π, ú) and it's breaking the whole process.
What I've done is write a small script that checks for non-ASCII characters and prints out each offending line along with its line number.
What I want to be able to do is convert those characters to ASCII, or convert the whole file to UTF-8, if that's possible.
Here's my code so far:
    def is_ascii(s):
        return all(ord(c) < 128 for c in s)

    with open('MYFILE.TXT', 'r') as the_file:
        data_list = the_file.readlines()

    for index, line in enumerate(data_list):
        if is_ascii(line) is False:
            the_line = index
            the_line += 1
            print "%s: " % the_line, data_list[index]
This is sample output for a line with an invalid character:
2: 00000737663700000000000 Héllo/
Basically I want to reconcile the data before doing any import. Thanks.
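For reference, here is a minimal sketch of the two conversions I have in mind, written for Python 3 (unlike the Python 2 script above). The function names `to_utf8` and `to_ascii` are hypothetical, and `cp437` is only a guess at the file's real encoding; it happens to contain all four characters listed, but the actual encoding would need to be confirmed before trusting the result.

```python
import unicodedata


def to_utf8(raw_bytes, source_encoding="cp437"):
    """Decode bytes from an assumed legacy encoding and re-encode them as UTF-8.

    source_encoding is an assumption; pass the file's real encoding if known.
    """
    return raw_bytes.decode(source_encoding).encode("utf-8")


def to_ascii(text):
    """Best-effort ASCII: drop accents, replace anything else with '?'."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Remove the combining marks so only the base letters remain.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Characters with no ASCII base letter (such as the Greek ones) become '?'.
    return stripped.encode("ascii", "replace").decode("ascii")


if __name__ == "__main__":
    # The sample line from above, as it would look in cp437 bytes (assumed).
    sample = b"00000737663700000000000 H\x82llo/"
    print(to_utf8(sample))
    print(to_ascii(sample.decode("cp437")))
```

Converting to UTF-8 keeps every character intact, while the ASCII route is lossy: accented letters lose their accents and symbols with no ASCII counterpart are replaced, so it only suits data where that loss is acceptable.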