
I'm trying to lemmatize the tokenized column comments_tokenized.


I do:

import nltk
from nltk.stem import WordNetLemmatizer 

# Init the Wordnet Lemmatizer
lemmatizer = WordNetLemmatizer()

def lemmatize_text(text):
    return [lemmatizer.lemmatize(w) for w in df1["comments_tokenized"]]

df1['comments_lemmatized'] = df1["comments_tokenized"].apply(lemmatize_text)

but I get:

TypeError: unhashable type: 'list'

What can I do to lemmatize a column whose cells are lists of tokens (a bag of words)?

And also, how can I avoid the tokenization problem that splits [don't] into [do, n't]?

1 Answer


You were close with your function! Since you are using apply on the Series, you don't need to reference the column inside the function; you are also not using the input text at all. So change

def lemmatize_text(text):
    return [lemmatizer.lemmatize(w) for w in df1["comments_tokenized"]]

to

def lemmatize_text(text):
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(w) for w in text]  # note the use of the input text here

An example:

import pandas as pd

df = pd.DataFrame({'A': [["cats", "cacti", "geese", "rocks"]]})
                             A
0  [cats, cacti, geese, rocks]

def lemmatize_text(text):
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(w) for w in text]

df['A'].apply(lemmatize_text)

0    [cat, cactus, goose, rock]
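
Applied back to the dataframe from the question, that becomes (a sketch, assuming df1 and its comments_tokenized column of token lists exist as shown above):

# comments_tokenized already holds lists of tokens, so apply works row by row
df1['comments_lemmatized'] = df1['comments_tokenized'].apply(lemmatize_text)

As for the second question: if the tokens came from nltk.word_tokenize, the [do, n't] split is the Treebank tokenizer's intended behaviour. To keep contractions as single tokens, TweetTokenizer is one alternative (a sketch, not the only option):

from nltk.tokenize import TweetTokenizer, word_tokenize

word_tokenize("I don't know")              # ['I', 'do', "n't", 'know']
TweetTokenizer().tokenize("I don't know")  # ['I', "don't", 'know']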

1 Comment

How do I do that for df = pd.DataFrame({'A':[["cats are playing","He is running","They were fishing"]]})?
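
A minimal sketch for that case, assuming the cells hold raw sentence strings rather than token lists, is to tokenize each sentence first and then lemmatize (the column name A_lemmatized below is just for illustration):

import pandas as pd
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

lemmatizer = WordNetLemmatizer()

def lemmatize_sentence(sentence):
    # Tokenize the raw string first, then lemmatize each token.
    # Note: lemmatize() defaults to the noun POS, so verbs such as
    # "playing" stay unchanged unless you pass pos='v'.
    return [lemmatizer.lemmatize(w) for w in word_tokenize(sentence)]

df = pd.DataFrame({'A': [["cats are playing", "He is running", "They were fishing"]]})

# Each cell of 'A' is a list of sentence strings, so process them one by one
df['A_lemmatized'] = df['A'].apply(lambda sentences: [lemmatize_sentence(s) for s in sentences])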
