I'm trying to lemmatize the tokenized column comments_tokenized. I do:
import nltk
from nltk.stem import WordNetLemmatizer

# Init the WordNet lemmatizer (the wordnet corpus must be downloaded once,
# e.g. with nltk.download('wordnet'))
lemmatizer = WordNetLemmatizer()

def lemmatize_text(text):
    return [lemmatizer.lemmatize(w) for w in df1["comments_tokenized"]]

df1['comments_lemmatized'] = df1["comments_tokenized"].apply(lemmatize_text)
but I get:

TypeError: unhashable type: 'list'
What can I do to lemmatize a column whose cells are bags of words (lists of tokens)?
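My guess is that lemmatize_text should iterate over its argument instead of looping over the whole column again for every row; here is a minimal sketch of what I mean (assuming each cell of comments_tokenized is a list of token strings):

def lemmatize_text(tokens):
    # Lemmatize the tokens of a single row, not the whole column
    return [lemmatizer.lemmatize(w) for w in tokens]

df1['comments_lemmatized'] = df1['comments_tokenized'].apply(lemmatize_text)

Is that the right way to apply it row by row?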
Also, how can I avoid the tokenization problem where ["don't"] is divided into ["do", "n't"]?
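For reference, one option I am considering (not sure it is the best fit here) is NLTK's TweetTokenizer, which keeps contractions such as "don't" together as a single token:

from nltk.tokenize import TweetTokenizer

tokenizer = TweetTokenizer()
# prints ["don't", 'do', 'that'] -- the contraction stays intact
print(tokenizer.tokenize("don't do that"))

Would that interact badly with the lemmatization step above?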
