python 2.7 - How can I use word_tokenize in nltk and keep the spaces?


The word_tokenize function in nltk, as far as I understand, takes a string sentence and returns a list of its words:

    >>> from nltk import word_tokenize, wordpunct_tokenize
    >>> s = "Good muffins cost $3.88\nin New York.  Please buy me two of them.\n\nThanks.\n"
    >>> word_tokenize(s)
    ['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.', 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']

It is important for further calculations to keep the blank spaces, so I would like word_tokenize to return them too, like this:

    ['Good', ' ', 'muffins', ' ', 'cost', ' ', '$', '3.88', ' ', 'in', ' ', 'New', ' ', 'York', '.', ' ', 'Please', ' ', 'buy', ' ', 'me', ' ', 'two', ' ', 'of', ' ', 'them', '.', ' ', 'Thanks', '.']

How can I change / modify / tweak word_tokenize to do this?

Step 1: Split the string on whitespace

Step 2: Run word_tokenize on each word (from the split in Step 1) and insert a space token after it, then flatten:

    >>> import itertools
    >>> s = "Good muffins cost $3.88\nin New York.  Please buy me"
    >>> ll = [[word_tokenize(w), ' '] for w in s.split()]
    >>> list(itertools.chain(*list(itertools.chain(*ll))))
    ['Good', ' ', 'muffins', ' ', 'cost', ' ', '$', '3.88', ' ', 'in', ' ', 'New', ' ', 'York', '.', ' ', 'Please', ' ', 'buy', ' ', 'me', ' ']
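The two steps above can also be written as a plain function. This is a minimal sketch: `simple_tokenize` is a stand-in tokenizer (not part of NLTK) used so the snippet runs without NLTK installed; in practice you would pass `nltk.word_tokenize` as the `tokenize` argument instead.

```python
import re

def simple_tokenize(w):
    # Stand-in for nltk.word_tokenize: splits off currency symbols
    # and punctuation such as '$' and '.' from a single chunk.
    return re.findall(r"\$|[A-Za-z]+|\d+(?:\.\d+)?|[^\w\s]", w)

def tokenize_keep_spaces(s, tokenize=simple_tokenize):
    # Step 1: split on whitespace; Step 2: tokenize each chunk,
    # re-inserting a single space token between chunks.
    out = []
    for i, chunk in enumerate(s.split()):
        if i:
            out.append(' ')
        out.extend(tokenize(chunk))
    return out
```

Note that, like the itertools version, this collapses runs of whitespace (tabs, newlines, double spaces) into a single space token, since str.split() discards the original separators.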

