mysql - Python MySQLdb change string encoding
I suspect Python is having trouble with the character encoding of one column in my MySQL table:
| Column | varchar(255) | latin1_swedish_ci | YES | Faucet | select,insert,update,references |
The row above is the output for this column: it is varchar(255) and its collation is latin1_swedish_ci (character set latin1).
When I try to process this data in Python, I get the following error:
    word = gs.corpora.Dictionary(tweets)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.9.1-py2.7.egg/gensim/corpora/dictionary.py", line 50, in __init__
    self.add_documents(documents)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.9.1-py2.7.egg/gensim/corpora/dictionary.py", line 97, in add_documents
    _ = self.doc2bow(document, allow_update=True) # ignore the result, here we only care about updating token ids
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.9.1-py2.7.egg/gensim/corpora/dictionary.py", line 121, in doc2bow
    document = sorted(utils.to_utf8(token) for token in document)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.9.1-py2.7.egg/gensim/corpora/dictionary.py", line 121, in <genexpr>
    document = sorted(utils.to_utf8(token) for token in document)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.9.1-py2.7.egg/gensim/utils.py", line 164, in any2utf8
    return unicode(text, encoding, errors=errors).encode('utf8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 0: invalid start byte
Here gs is the gensim topic-modelling library. I believe the problem is that gensim needs its input as Unicode (or valid UTF-8).
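The diagnosis above can be checked without a database. The sketch below (written for Python 3) assumes the column really holds MySQL "latin1" data, which MySQL treats as Windows-1252 (cp1252); in that encoding byte 0x96 is an EN DASH, while in UTF-8 it is an invalid start byte. The sample value and the helper names `try_utf8` and `latin1_column_to_text` are hypothetical. One client-side alternative is to decode each value to text before tokenizing, since gensim's `any2utf8` only re-encodes tokens that are already Unicode.

```python
# Minimal sketch (Python 3): reproduce the UTF-8 decode failure on latin1 data
# and show one client-side workaround.
# Assumption: MySQL "latin1" behaves like cp1252, where 0x96 is U+2013 EN DASH.

raw = b"\x96 breaking news"  # hypothetical value as read from the latin1 column


def try_utf8(data):
    """Return the decoded text, or None if data is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def latin1_column_to_text(value):
    """Decode a raw latin1-column value to str; pass str values through unchanged."""
    if isinstance(value, bytes):
        return value.decode("cp1252")
    return value


print(try_utf8(raw))                       # None: 0x96 is not a valid UTF-8 start byte
text = latin1_column_to_text(raw)
print(text == "\u2013 breaking news")      # True: decoded as an EN DASH
print(text.encode("utf-8"))                # b'\xe2\x80\x93 breaking news'
```

Tokens produced from `text` can then be passed to gensim as Unicode strings instead of raw latin1 bytes.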
- How can I change the character encoding for this column in my database?
- Is there any alternative solution?
Thanks for any help!
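For the first question (changing the column's encoding), one option in MySQL is to redefine the column with a new character set using `ALTER TABLE ... MODIFY`; MySQL converts the stored values when the column's character set changes. The sketch below only builds that statement; the table name `tweets` and column name `text` are hypothetical, and `column_to_utf8_sql` is an illustrative helper, not part of MySQLdb. Note that `MODIFY` replaces the whole column definition, so any other attributes (such as `NOT NULL` or a `DEFAULT`) would have to be repeated, and a backup before altering is prudent.

```python
# Minimal sketch: build an ALTER TABLE statement that re-declares a latin1
# VARCHAR column as utf8. Table/column names used below are hypothetical.


def quote_ident(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def column_to_utf8_sql(table, column, length=255):
    """Return SQL that changes one VARCHAR column to CHARACTER SET utf8."""
    # Identifiers cannot be sent as query parameters, so they are quoted here.
    return (
        "ALTER TABLE {t} MODIFY {c} VARCHAR({n}) "
        "CHARACTER SET utf8 COLLATE utf8_general_ci"
    ).format(t=quote_ident(table), c=quote_ident(column), n=int(length))


print(column_to_utf8_sql("tweets", "text"))
```

With a real connection, the statement would be run through a DB-API cursor, e.g. `cursor.execute(column_to_utf8_sql("tweets", "text"))`.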
I think your MySQLdb driver does not know that it should use UTF-8, and by default the driver uses the system-defined character set.
When you connect to your database, pass the charset='utf8'
parameter to it, or run SET NAMES utf8 manually.
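A minimal sketch of the suggestion above. In MySQLdb, passing `charset` to `connect()` changes the connection character set and implies `use_unicode=True`, so text columns come back as Unicode strings (the server converts the latin1 column to the connection character set). The helper names `connect_utf8` and `set_names_utf8` are hypothetical; `connect` is passed in (normally `MySQLdb.connect`) so the helper can be exercised without a server.

```python
# Minimal sketch: open a MySQLdb connection that uses UTF-8 and returns
# Unicode strings. `connect` is expected to be MySQLdb.connect (an assumption
# made so the helper can be tested with a stand-in callable).


def connect_utf8(connect, **params):
    """Call connect() with charset='utf8' unless the caller overrides it."""
    params.setdefault("charset", "utf8")
    params.setdefault("use_unicode", True)  # already implied by charset in MySQLdb
    return connect(**params)


def set_names_utf8(conn):
    """The manual alternative: issue SET NAMES on an already-open connection."""
    cur = conn.cursor()
    cur.execute("SET NAMES utf8")
    cur.close()
```

Typical use, with hypothetical credentials: `import MySQLdb` then `conn = connect_utf8(MySQLdb.connect, host="localhost", user="me", passwd="secret", db="mydb")`. Setting the character set in `connect()` is usually preferable to a later `SET NAMES`, because the driver then also knows which encoding to decode with.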