
I'm trying to lemmatize the tokenized column comments_tokenized.


I do:

import nltk
from nltk.stem import WordNetLemmatizer 

# Init the Wordnet Lemmatizer
lemmatizer = WordNetLemmatizer()

def lemmatize_text(text):
    return [lemmatizer.lemmatize(w) for w in df1["comments_tokenized"]]

df1['comments_lemmatized'] = df1["comments_tokenized"].apply(lemmatize_text)

but I get:

TypeError: unhashable type: 'list'

What can I do to lemmatize a column whose cells are lists of tokens (a bag of words)?

And also, how can I avoid the tokenization problem that splits [don't] into [do, n't]?

1 Answer


You were close with your function! Since you are using apply on the Series, you don't need to reference the column inside the function; you are also not using the input text at all. So change

def lemmatize_text(text):
    return [lemmatizer.lemmatize(w) for w in df1["comments_tokenized"]]

to

def lemmatize_text(text):
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(w) for w in text]  # note the use of the input text here

An example:

import pandas as pd

df = pd.DataFrame({'A': [["cats", "cacti", "geese", "rocks"]]})
                             A
0  [cats, cacti, geese, rocks]

def lemmatize_text(text):
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(w) for w in text]

df['A'].apply(lemmatize_text)

0    [cat, cactus, goose, rock]
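
Applied back to the dataframe from the question, that becomes (a sketch, assuming df1 and its comments_tokenized column of token lists exist as shown above):

# comments_tokenized already holds lists of tokens, so apply works row by row
df1['comments_lemmatized'] = df1['comments_tokenized'].apply(lemmatize_text)

As for the second question: if the tokens came from nltk.word_tokenize, the [do, n't] split is the Treebank tokenizer's intended behaviour. To keep contractions as single tokens, TweetTokenizer is one alternative (a sketch, not the only option):

from nltk.tokenize import TweetTokenizer, word_tokenize

word_tokenize("I don't know")              # ['I', 'do', "n't", 'know']
TweetTokenizer().tokenize("I don't know")  # ['I', "don't", 'know']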

1 Comment

How do I do that for df = pd.DataFrame({'A':[["cats are playing","He is running","They were fishing"]]})?
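
A minimal sketch for that case, assuming the cells hold raw sentence strings rather than token lists, is to tokenize each sentence first and then lemmatize (the column name A_lemmatized below is just for illustration):

import pandas as pd
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

lemmatizer = WordNetLemmatizer()

def lemmatize_sentence(sentence):
    # Tokenize the raw string first, then lemmatize each token.
    # Note: lemmatize() defaults to the noun POS, so verbs such as
    # "playing" stay unchanged unless you pass pos='v'.
    return [lemmatizer.lemmatize(w) for w in word_tokenize(sentence)]

df = pd.DataFrame({'A': [["cats are playing", "He is running", "They were fishing"]]})

# Each cell of 'A' is a list of sentence strings, so process them one by one
df['A_lemmatized'] = df['A'].apply(lambda sentences: [lemmatize_sentence(s) for s in sentences])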
