2/08/2007

UTF-8 issues in Python: imaplib and MySQLdb

I ran into a lot of problems when I tried to save UTF-8 text to a MySQL database. I finally figured out what was needed.

The table in MySQL must use the utf8 character set. This is configured on the MySQL side.

When connecting to MySQL, I have to set the connection character set to utf8: db.set_character_set('utf8').
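Here is a minimal sketch of those two settings, assuming a hypothetical table name (error_log) and placeholder credentials. The helper names are made up for illustration. charset and use_unicode are keyword arguments that MySQLdb's connect() accepts; as far as I can tell, passing charset='utf8' at connect time does the same job as calling db.set_character_set('utf8') afterwards.

```python
# Hypothetical table name; substitute your own.
TABLE_NAME = "error_log"


def convert_table_sql(table_name):
    """Build the MySQL statement that converts an existing table to utf8."""
    # Identifiers cannot be bound as query parameters, so quote the name
    # with backticks and reject names that contain a backtick.
    if "`" in table_name:
        raise ValueError("invalid table name: %r" % table_name)
    return "ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8" % table_name


def utf8_connect_kwargs(host, user, passwd, db):
    """Keyword arguments for MySQLdb.connect() that request a utf8 connection."""
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": "utf8",    # connection character set
        "use_unicode": True,  # hand text columns back as unicode objects
    }


def open_utf8_connection(host, user, passwd, db):
    """Open the connection. MySQLdb is imported here so that the two
    helpers above can be used without the driver installed."""
    import MySQLdb
    return MySQLdb.connect(**utf8_connect_kwargs(host, user, passwd, db))
```

The statement from convert_table_sql(TABLE_NAME) would be run once against an existing table; a new table can declare the character set when it is created instead.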

I assumed all emails used the same charset, but I found that they do not. So in the Python code, I read each message's charset and decode the text with the matching decoder.

import email

r, data = M.fetch(num, "(BODY[HEADER.FIELDS (CONTENT-TYPE)])")
# data[0][1] is a string that contains the Content-Type header
m = email.message_from_string(data[0][1])
# get_charsets() returns a list; the first item ([0]) is the charset of the message body
Message_encoding = m.get_charsets()[0]

if Message_encoding == "utf-8":
    Error_Description = Error_Description.decode("utf-8")
elif Message_encoding == "iso-8859-1":
    Error_Description = Error_Description.decode("ISO8859-1")
elif Message_encoding == "us-ascii":
    # if the encoding is ascii, I don't need to decode it
    pass
else:
    # any other (or missing) charset: try utf-8 and replace bytes that don't fit
    Error_Description = Error_Description.decode("utf-8", "replace")
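The branches above could probably be collapsed into one helper that decodes with whatever charset the header declares, and falls back to UTF-8 with replacement when the charset is missing or unknown. This is only a sketch and not the code I actually ran: it is written for Python 3 (where the raw value is bytes), and the names charset_from_header and decode_with_fallback are made up for illustration.

```python
import email


def charset_from_header(header_bytes):
    """Return the charset declared in a raw Content-Type header, or None."""
    msg = email.message_from_bytes(header_bytes)
    # get_charsets() returns one entry per message part; the first entry
    # belongs to the message itself.
    return msg.get_charsets()[0]


def decode_with_fallback(raw, charset):
    """Decode raw bytes with the declared charset, without raising."""
    if charset:
        try:
            return raw.decode(charset)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name, or bytes that do not match the declared charset.
            pass
    # Last resort: UTF-8, replacing any bytes that are not valid UTF-8.
    return raw.decode("utf-8", "replace")


# A header in the shape returned by the fetch above (made-up sample data).
header = b'Content-Type: text/plain; charset="iso-8859-1"\r\n\r\n'
charset = charset_from_header(header)
text = decode_with_fallback(b"caf\xe9", charset)
```

Looking the codec up by name means a new charset such as koi8-r needs no extra elif branch, and a mislabeled message degrades to replacement characters instead of an exception.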
