python 2.7 - How can I use word_tokenize in nltk and keep the spaces?


The word_tokenize function in nltk, as far as I understand, takes a string sentence and returns a list of its words:

    >>> from nltk import word_tokenize, wordpunct_tokenize
    >>> s = "Good muffins cost $3.88\nin New York.  Please buy me two of them.\n\nThanks.\n"
    >>> word_tokenize(s)
    ['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.', 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']

It is important for further calculations to keep the blank spaces, so I would like word_tokenize to return them too, like this:

    ['Good', ' ', 'muffins', ' ', 'cost', ' ', '$', '3.88', ' ', 'in', ' ', 'New', ' ', 'York', '.', ' ', 'Please', ' ', 'buy', ' ', 'me', ' ', 'two', ' ', 'of', ' ', 'them', '.', ' ', 'Thanks', '.']

How can I change / modify / tweak word_tokenize to do this?

Step 1: Split the string on whitespace

Step 2: Run word_tokenize on each word (from the split in Step 1) and insert a space token after it, then flatten:

    >>> import itertools
    >>> s = "Good muffins cost $3.88\nin New York.  Please buy me"
    >>> ll = [[word_tokenize(w), ' '] for w in s.split()]
    >>> list(itertools.chain(*list(itertools.chain(*ll))))
    ['Good', ' ', 'muffins', ' ', 'cost', ' ', '$', '3.88', ' ', 'in', ' ', 'New', ' ', 'York', '.', ' ', 'Please', ' ', 'buy', ' ', 'me', ' ']
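The two steps above can also be written as a plain function. This is a minimal sketch: `simple_tokenize` is a stand-in tokenizer (not part of NLTK) used so the snippet runs without NLTK installed; in practice you would pass `nltk.word_tokenize` as the `tokenize` argument instead.

```python
import re

def simple_tokenize(w):
    # Stand-in for nltk.word_tokenize: splits off currency symbols
    # and punctuation such as '$' and '.' from a single chunk.
    return re.findall(r"\$|[A-Za-z]+|\d+(?:\.\d+)?|[^\w\s]", w)

def tokenize_keep_spaces(s, tokenize=simple_tokenize):
    # Step 1: split on whitespace; Step 2: tokenize each chunk,
    # re-inserting a single space token between chunks.
    out = []
    for i, chunk in enumerate(s.split()):
        if i:
            out.append(' ')
        out.extend(tokenize(chunk))
    return out
```

Note that, like the itertools version, this collapses runs of whitespace (tabs, newlines, double spaces) into a single space token, since str.split() discards the original separators.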

