Newest 'lemmatization' Questions

0 votes

0 answers

38 views

Simpler Gmail filter syntax for "word family" [verif +(y/ied/ification)] + similar loanwords [term +(s/es/a)]?

Is there a simpler filter that I can use for the cases below? Google has a very smart AI, Gemini, so I hope there is a shortcut for this, as I am receiving bilingual emails and loanwords in Malay/Indonesian are ...

Quarky

13

asked Mar 31 at 12:10

2 votes

1 answer

54 views

Lemma of punctuation in spacy

I'm using spacy for some downstream tasks, mainly noun phrase extraction. My texts contain a lot of parentheses, and while applying the lemma, I noticed all the punctuation that doesn't end sentences ...

MERose

4,481

asked Jan 5 at 15:02

-1 votes

2 answers

143 views

With spaCy, how can I get all lemmas from a string?

I have a pandas data frame with a column of text values (documents). I want to apply lemmatization on these values with the spaCy library using the pandas apply function. I've defined my to_lemma ...

Patrick

2,346

asked Oct 12, 2024 at 21:03

0 votes

1 answer

83 views

How to lemmatize text column in pandas dataframes using stanza?

I read a CSV file into a pandas dataframe. My text column is df['story']. How do I lemmatize this column? Should I tokenize first?

rafine

469

asked May 16, 2024 at 12:34

1 vote

1 answer

65 views

How to speed up the lemmatization of a Series in a Python DataFrame

I have this line that lemmatizes a Series of a pandas DataFrame in Python: res = serie.parallel_apply(lambda x: ' '.join([d.lemma_ for d in self.nlp_spacy(x)])) I have 200,000 rows of data in this ...

MrGran1

11

asked Apr 5, 2024 at 9:15

-1 votes

2 answers

153 views

Comparison between stemming and lemmatization

Based on several research sources, I found the following important comparative analysis: if we look at texts, lemmatization should most probably return more correct output, right? Not only correct, but also ...

user23666587

asked Mar 24, 2024 at 19:41

0 votes

1 answer

284 views

LookupError in NLTK for WordNet Lemmatizer Despite Successful Download of Resources

I am working on a text processing task in a Kaggle notebook and facing a LookupError when using NLTK's WordNetLemmatizer. Despite my efforts to download the required NLTK resources, the error continues ...

Amirreza Jalili

1

asked Dec 19, 2023 at 14:29

0 votes

1 answer

562 views

How to make stanza lemmatizer to return just the lemma instead of a dictionary?

I'm implementing stanza's lemmatizer because it works well with Spanish texts, but the lemmatizer returns a whole dictionary with ID and other characteristics I don't care about for the time being. I ...

trashparticle

13

asked Dec 5, 2023 at 23:30
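
One way to reduce such dictionary output to bare lemmas, as a minimal sketch: the nested list below is hand-written sample data imitating the shape one would expect from stanza's `Document.to_dict()` (a list of sentences, each a list of per-word dicts). That shape, the sample values, and the helper name `extract_lemmas` are assumptions for illustration, not taken from the question.

```python
def extract_lemmas(doc_dict):
    """Collect the 'lemma' field from a nested list of per-word dicts."""
    return [
        word["lemma"]
        for sentence in doc_dict
        for word in sentence
        if word.get("lemma") is not None  # skip entries that carry no lemma
    ]


# Hypothetical sample imitating dictionary-style lemmatizer output.
sample = [
    [
        {"id": 1, "text": "Los", "lemma": "el", "upos": "DET"},
        {"id": 2, "text": "niños", "lemma": "niño", "upos": "NOUN"},
        {"id": 3, "text": "corrían", "lemma": "correr", "upos": "VERB"},
    ]
]

lemmas = extract_lemmas(sample)
print(lemmas)
```

If the pipeline object itself is available, iterating over `doc.sentences` and each sentence's `words` and reading `word.lemma` should give the same list without going through dictionaries.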

0 votes

1 answer

221 views

Switch spacy lemmatizer's mode for french language

With Spacy, I want to change the lemmatizer of the French model ('rule-based' by default) to 'lookup'. I'm using spacy 3.6.1, the fr_core_news_lg-3.6.0 model and spacy-lookups-data 1.0.5. This seemed to be ...

JulienBr

70

asked Oct 26, 2023 at 14:02

1 vote

2 answers

71 views

How to avoid lemmatizing already lemmatized sentences of a row in pandas dataframe for speedup

Given: A simple and small pandas dataframe as follows: df = pd.DataFrame( { "user_ip": ["u7", "u3", "u1", "u9", "u4","...

farid

1,631

asked Aug 1, 2023 at 18:00
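
A common way to avoid repeating work on duplicate sentences is memoization. The sketch below uses `functools.lru_cache`; `slow_lemmatize` is a hypothetical stand-in for an expensive lemmatizer call (it only lowercases and strips trailing "s"), and the sample sentences are invented for illustration.

```python
from functools import lru_cache


def slow_lemmatize(sentence: str) -> str:
    """Hypothetical stand-in for an expensive lemmatizer call."""
    slow_lemmatize.calls += 1  # count how often the real work runs
    return " ".join(word.lower().rstrip("s") for word in sentence.split())


slow_lemmatize.calls = 0


@lru_cache(maxsize=None)
def cached_lemmatize(sentence: str) -> str:
    """Return the cached result when the same sentence was seen before."""
    return slow_lemmatize(sentence)


sentences = ["Dogs bark", "Cats meow", "Dogs bark", "Dogs bark"]
lemmatized = [cached_lemmatize(s) for s in sentences]

print(lemmatized)
print("expensive calls:", slow_lemmatize.calls)
```

With a dataframe, the same idea can be applied by lemmatizing only the unique values of the column and mapping the results back, which keeps the cache explicit.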

0 votes

1 answer

85 views

How to solve an Attribute error when lemmatizing a list.lower()

word_patterns = [lemmatizer.lemmatize(word.lower()) for word in word_patterns] AttributeError: 'list' object has no attribute 'lower' I can't figure out how to fix the error. It's saying that my list ...

L bozo

3

asked Jul 23, 2023 at 16:59
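
This error usually means that each element of `word_patterns` is itself a list rather than a string. A minimal sketch of one fix, flattening the nested lists before calling `.lower()`; `toy_lemmatize` is a hypothetical stand-in for `lemmatizer.lemmatize`, and the sample data is invented.

```python
def toy_lemmatize(word: str) -> str:
    """Hypothetical stand-in for lemmatizer.lemmatize: drops a trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


# Assumed shape of the data: a list of token lists, not a flat list of strings.
word_patterns = [["Hello", "Friends"], ["Good", "Mornings"]]

# Iterate over the inner lists too, so .lower() is called on strings only.
flat = [
    toy_lemmatize(word.lower())
    for pattern in word_patterns
    for word in pattern
]

print(flat)
```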

0 votes

0 answers

282 views

How to speed up Stanza lemmatizer by excluding redundant words

Given: I have a small sample document with a limited number of words as follows: d =''' I go to school by the school bus everyday with all of my best friends. There are several students who also take ...

farid

1,631

asked Jul 14, 2023 at 11:31

1 vote

0 answers

136 views

Library to lemmatize German compound verbs

I'm trying to lemmatize a German text using HanoverTagger (the same results were achieved with spaCy). from HanTa import HanoverTagger as ht tagger = ht.HanoverTagger('morphmodel_ger.pgz') print(tagger....

Stanislaw

41

asked Jun 29, 2023 at 20:58

2 votes

0 answers

106 views

Is there a method in spacy to "normalize" feminine nouns to masculine?

The title pretty much says it all... We are using spacy in Spanish to preprocess a series of texts and we need consistent nouns across our documents. For instance, we would need to normalize "clienta...

Marina Boyero

21

asked May 12, 2023 at 11:58

0 votes

0 answers

70 views

spaCy lemmatizer different results on repeated words

The lemmatization of the following sentence "Not finished! Not finished! A gem for our adopted daughter, Kiri, - - born of Grace's avatar, - - and whose conception was a complete mystery." ...

Denys Murakhovskyi

21

asked Apr 26, 2023 at 13:31

1 vote

0 answers

44 views

re.findall does not find some dots

I have a file with preprocessed German text; all 39 lines have a dot at the end. In order to get rid of some nulls in the text, I use this code: text_with_nulls = open('lemmatizeAFD', 'r') text_without_nulls =...

Mikhail Rotar

19

asked Apr 24, 2023 at 21:29

0 votes

1 answer

55 views

Failed lemmatization

I'm trying to lemmatize German texts which are in a dataframe. I use a German library to successfully handle the specific grammatical structure: https://github.com/jfilter/german-preprocessing My code: ...

Mikhail Rotar

19

asked Apr 24, 2023 at 12:03

1 vote

1 answer

63 views

lemmatization or normalization using a dictionary and list of variations

I have a pandas data frame with a string column containing transaction strings. I am trying to do some manual lemmatization. I have manually created a dictionary which has the main word as the key ...

Zenvega

2,064

asked Apr 19, 2023 at 1:47
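
A dictionary that maps each main word to its list of variations can be inverted into a variation-to-main-word lookup and applied with a single regular expression. This is a minimal sketch; the `variations` dictionary, the sample text, and the function name `normalize` are invented for illustration.

```python
import re

# Hypothetical mapping: main word -> list of variations seen in the data.
variations = {
    "payment": ["pymt", "pmt", "paymnt"],
    "transfer": ["xfer", "trf"],
}

# Invert it so that every variation points at its main word.
lookup = {var: main for main, vars_ in variations.items() for var in vars_}

# Longest alternatives first, so longer variations win over shorter ones.
alternatives = sorted(lookup, key=len, reverse=True)
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b",
    re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Replace every known variation in text with its main word."""
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


print(normalize("PYMT via xfer to acct"))
```

On a pandas column, `df["col"].apply(normalize)` would apply the same replacement row by row (the column name here is a placeholder).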

0 votes

1 answer

129 views

How to get pos-tag lemmatiser to iterate through df

I want to use POS-labelling and lemmatisation on my text data. I've found this example code from kaggle. This applies it to a sentence, but I want to modify this code in order to apply it to a column ...

wick

59

asked Mar 13, 2023 at 11:07

1 vote

1 answer

195 views

Getting the adjectives and plurals of lemmas in various languages

Could anyone point me to a solution/lib that does inflection(?) instead of lemmatisation, for multiple languages (English, Dutch, German and French)? Or give an example? I have the lemma 'science' ...

dderom

11

asked Feb 26, 2023 at 9:22

0 votes

2 answers

399 views

Stemming texts separates words into letters

I am trying to process my text using tokenization, stemming, normalization and stop-word/punctuation removal, etc. When I use snowball stemming technique, my text gets separated into letters with ...

No_Name

165

asked Feb 24, 2023 at 22:05
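
Output split into single letters often comes from iterating over a string (which yields characters) instead of over a list of tokens. The sketch below contrasts the two; `toy_stem` is a hypothetical stand-in for a real stemmer such as Snowball, and the sample text is invented.

```python
def toy_stem(token: str) -> str:
    """Hypothetical stand-in for a stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


text = "walking dogs barked"

# Iterating over the string itself visits one character at a time.
wrong = [toy_stem(ch) for ch in text]

# Splitting into tokens first gives the stemmer whole words.
right = [toy_stem(tok) for tok in text.split()]

print(" ".join(wrong))
print(" ".join(right))
```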

1 vote

0 answers

167 views

Base word for same word is different using spacy

I am trying to extract the base word for an entire text; however, it behaves differently for the same word at different locations. Below is the code for reference: from nltk.stem import PorterStemmer ...

Amar Pal Singh

31

asked Feb 14, 2023 at 15:57

0 votes

0 answers

61 views

Getting blank lemmatization using xx_use_lg model with Spacy Python

I got a list of words using for element in text: print(element). But when I try to use lemma_ like this (for element in text: print(element.lemma_)) it returns a blank list. I can't understand why ...

Adrien Villemin

23

asked Jan 31, 2023 at 9:37

1 vote

1 answer

1k views

Lemmatization taking forever with Spacy

I'm trying to lemmatize chat registers in a dataframe using spacy. My code is: nlp = spacy.load("es_core_news_sm") df["text_lemma"] = df["text"].apply(lambda row: " ...

Laura R

13

asked Jan 23, 2023 at 19:26

1 vote

0 answers

118 views

How to lemmatize pos tagged column in dataframe

I have a Dataframe of some tweets about the Russia-Ukraine conflict. I have pos_tagged the tweets after cleaning and want to lemmatize the pos_tagged column. My code returns only the first pos_tagged ...

susne

23

asked Jan 17, 2023 at 20:57

1 vote

0 answers

106 views

NLP Lemmatization with Wordnet: Error [E1041] Expected a string, Doc, or bytes as input, but got: <class 'NoneType'>

I have defined the 'pre_processing' function as follows: import nltk import unicodedata from nltk.tokenize import TweetTokenizer from nltk.corpus import wordnet from nltk.stem import WordNetLemmatizer ...

salkyna

13

asked Jan 16, 2023 at 12:24

0 votes

1 answer

328 views

How to start a python lemmatizer for Malay language with a dataset?

I have a dataset and I want to build a Malay language lemmatizer. Can anyone guide me on how to do this project? What is the code to create a lemmatizer for the Malay language? It can also allow the user to enter words ...

angel

1

asked Dec 23, 2022 at 14:21

0 votes

1 answer

348 views

Lemmatize a list of tokens in dataframe

I am trying to lemmatize a column that contains a list of tokens in each cell. I am using the below code for this. Can anyone suggest what changes should be made to get the expected output? from nltk....

Rohan

47

asked Dec 19, 2022 at 17:13

0 votes

2 answers

129 views

Python: Can I create a dummy based on search conditions in one column with text series?

I was wondering how I could create a dummy variable for the following condition: column 'lemmatised' contains at least two words from 'innovation_words'. Innovation_words is a list I defined myself: ...

Hans.nl

65

asked Dec 17, 2022 at 16:11
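
The condition "at least two words from the list" can be checked with a set intersection. A minimal sketch, assuming the 'lemmatised' values are space-separated strings and that "two words" means two distinct words; the contents of `innovation_words` and the sample rows are invented.

```python
# Hypothetical word list; the original list is not shown in the excerpt.
innovation_words = {"innovation", "patent", "prototype", "invent"}


def innovation_dummy(lemmatised: str, vocabulary=innovation_words, threshold=2) -> int:
    """Return 1 if the text has at least `threshold` distinct vocabulary words."""
    matches = set(lemmatised.split()) & vocabulary
    return int(len(matches) >= threshold)


rows = [
    "company file patent for new prototype",
    "innovation innovation innovation",
    "quarterly revenue report",
]
dummies = [innovation_dummy(row) for row in rows]

print(dummies)
```

With pandas, `df["lemmatised"].apply(innovation_dummy)` would produce the dummy column directly.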

2 votes

1 answer

254 views

NLP - Worse result when adding stemming or lemmatization for Sentiment Analysis

I'm trying to create a full pipeline of results for sentiment analysis for a smaller subset of the IMDB reviews (only 2k pos, 2k neg) so I'm tryna show results at each stage i.e. without any pre-...

AdamG

21

asked Dec 13, 2022 at 0:15

2 votes

1 answer

125 views

lemmatizing verbs in SVOs

I've looked at the suggested similar questions, and I think that this question has enough specificity that it warrants being asked, but I am completely okay if someone can point to an already answered ...

John Laudun

407

asked Dec 9, 2022 at 3:30

0 votes

1 answer

576 views

Why does the nltk lemmatizer not work for every word in Python?

import nltk lemmatizer = nltk.WordNetLemmatizer() print(lemmatizer.lemmatize("goes")) print(lemmatizer.lemmatize("transforming")) The first example will with "goes" do ...

TheRi

67

asked Dec 6, 2022 at 16:30

0 votes

1 answer

287 views

lemmatize from the nltk package does not run

Hello, I have code for lemmatizing a string in Python. The code is below: from nltk.stem.wordnet import WordNetLemmatizer lemmatizer = WordNetLemmatizer() print("better :", lemmatizer....

Mostafa Heydar

1

asked Nov 29, 2022 at 11:55

0 votes

0 answers

116 views

Lemmatization using Spacy gives wrong output

I have a csv file with 70k German tweets and want to lemmatize the text, but it reduces every tweet to only one word and the output looks completely wrong. I did several steps of data cleaning before,...

socialscientist90

19

asked Nov 25, 2022 at 16:44

0 votes

1 answer

140 views

What does count() method of nltk.corpus.reader.wordnet.Lemma return?

Here is my code: from nltk.corpus import wordnet as wn eat = wn.lemma('eat.v.03.eat') print(eat.count()) print(help(eat.count)) The output should be like this: 4 Help on method count in module nltk....

jasonzhou

1

asked Oct 30, 2022 at 3:52

1 vote

0 answers

28 views

Python spacy lemmatization issue

While performing lemmatization using spacy, I want all proper nouns to be considered as nouns. How do I customise this? Tried: [term.lemma_ for term in nlp("awards mvp awards")] Expected ...

Abi

11

asked Oct 28, 2022 at 14:17

2 votes

1 answer

1k views

Lemmatizer/PoS-tagger for italian in Python

I'm searching for a Lemmatizer/PoS-tagger for the Italian language that works in Python. I tried Spacy; it works but it's not very precise, especially for verbs, where it often returns the wrong lemma. ...

sunhearth

93

asked Oct 18, 2022 at 18:42

2 votes

1 answer

459 views

nltk.lemmatizer doesn't work for even a simple input text

Sorry guys I'm new to NLP and I'm trying to apply NLTK Lemmatizer to the whole input text, however it seems not to work for even a simple sentence. from nltk.corpus import stopwords from nltk.tokenize ...

jsacharz

21

asked Sep 30, 2022 at 1:55

2 votes

1 answer

229 views

Special Case Lemmatization ValueError while Using spacy for NLP

Special Case Lemmatization ValueError while Using spacy for NLP Problem (What I think is happening) While exploring special case lemmatization, I ran into a ValueError (provided below). I actually ...

jack

21

asked Sep 27, 2022 at 3:44

0 votes

1 answer

1k views

Trouble with applying a UDF on a column in Pyspark Dataframe

My goal is to clean the data in a column in a Pyspark DF. I have written a function for cleaning. def preprocess(text): text = text.lower() text=text.strip() text=re.compile('<.*?...

user3234112

113

asked Aug 3, 2022 at 22:02

1 vote

1 answer

132 views

Is there an easier way to do custom lemmatization?

I have a dataframe of two columns. The first one, ["lemm"], lists the words to change if they occur. The second one, ["word"], is what to change them to. I'm new to this so I spent a lot of time ...

Masood Khan

55

asked Jul 28, 2022 at 18:23

0 votes

1 answer

92 views

While lemmatizing the corpus and splitting and joining it, it shows a WordListCorpusReader not callable error

My code where I get the error goes as follows: import re corpus = [] for i in range(len(sentences)): review = re.sub('[^a-zA-Z]', ' ', sentences[i]) review = review.lower() review = review.split()...

Prince Thakkar

27

asked Jul 28, 2022 at 5:01

0 votes

0 answers

512 views

Is it always necessary to either stem/lemmatize words when working with TF-IDF?

I'm using TF-IDF along with cosine similarity in order to compute document similarity. I was wondering if it's always necessary to stem/lemmatize the words in the document. Are there times where based ...

dfish

3

asked Jul 16, 2022 at 11:43

0 votes

1 answer

319 views

ModuleNotFoundError in spacy version 3.3.1 tried previous mentioned solution not working

import spacy from spacy.lemmatizer import Lemmatizer from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES) ...

Aman Choudhary

23

asked Jul 11, 2022 at 3:41

-1 votes

1 answer

111 views

How to create a dictionary whose key:value pairs are the values of two different lists of dictionaries?

I have 2 lists of dictionaries that result from a pymongo extraction. A list of dicts containing ids (strings) and lemmas (strings): lemmas = [{'id': 'id1', 'lemma': 'lemma1'}, {'id': 'id2', 'lemma': ...

Perrupi

79

asked Jul 7, 2022 at 10:42
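
Two lists of dictionaries that share an id can be paired by first indexing one list by id. This is only a sketch: the second list is cut off in the excerpt, so `others`, its `'value'` key, and all sample values are hypothetical placeholders.

```python
# First list follows the shape shown in the question; values are placeholders.
lemmas = [
    {"id": "id1", "lemma": "lemma1"},
    {"id": "id2", "lemma": "lemma2"},
]

# Hypothetical second list sharing the same ids (not shown in the excerpt).
others = [
    {"id": "id2", "value": "value2"},
    {"id": "id1", "value": "value1"},
]

# Index the second list by id so each lookup is a single dict access.
value_by_id = {d["id"]: d["value"] for d in others}

# Pair each lemma with the value that has the same id; skip unmatched ids.
paired = {
    d["lemma"]: value_by_id[d["id"]]
    for d in lemmas
    if d["id"] in value_by_id
}

print(paired)
```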

1 vote

1 answer

891 views

Faster Python Lemmatization

I have been testing different lemmatization methods since it will be used on a very large corpus. Below are my methods and results. Does anyone have any tips to speed any of these methods up? Spacy ...

Mike Zoucha

83

asked Jun 21, 2022 at 12:55

0 votes

0 answers

99 views

Lemmatize a tm corpus

I am trying to lemmatize a VCorpus using a custom dictionary, in particular this one I found on github. I have not been able to find a function that correctly performs this. I have tried the following:...

aooo

53

asked Jun 6, 2022 at 17:02

0 votes

1 answer

193 views

lemmatizing a verb list in a data frame in Python

I want to ask a seemingly simple question to Python wizards (I am a total newbie so have no idea how simple/complex this question is)! I have a verb list in a dataframe looking as below: id verb 15 ...

Sangeun

45

asked May 26, 2022 at 16:09

0 votes

2 answers

411 views

Does WordNet have Levels?

I'm reading a paper that says to use WordNet level 3 because using level 5 would lose a lot, but I can't see how to use these supposed levels. I don't have the author's code, so I can't share it, ...

JohnV

23

asked May 23, 2022 at 15:00

3 votes

1 answer

541 views

Neither stemmer nor lemmatizer seem to work very well, what should I do?

I am new to text analysis and am trying to create a bag of words model (using sklearn's CountVectorizer method). I have a data frame with a column of text with words like 'acid', 'acidic', 'acidity', '...

Rebecca James

395

asked May 16, 2022 at 19:59
