437 questions
0 votes | 0 answers | 38 views
Simpler Gmail filter syntax for a "word family" [verif +(y/ied/ification)] + similar loanwords [term +(s/es/a)]?
Is there a simpler filter that I can use for the cases below?
Google has a very smart AI, Gemini, so I hope there is a shortcut for this, as I am receiving bilingual emails and loanwords in Malay/Indonesian are ...
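As far as I know, Gmail search has no wildcard or suffix matching, so one workaround is to spell out every form of each word family and put them inside curly braces, which Gmail treats as "any of these words". A minimal sketch, assuming you list the suffixes by hand; the stems and suffixes below are illustrative only (note that "verification" needs the suffix "ication" after the stem "verif").

```python
from __future__ import annotations


def expand_family(stem: str, suffixes: list[str]) -> list[str]:
    """Attach each suffix to the stem, e.g. ("verif", ["y", "ied"]) -> ["verify", "verified"]."""
    return [stem + suffix for suffix in suffixes]


def gmail_any_of(families: list[tuple[str, list[str]]]) -> str:
    """Build one Gmail query matching any expanded form.

    Gmail treats words inside {curly braces} as OR alternatives.
    """
    words: list[str] = []
    for stem, suffixes in families:
        for word in expand_family(stem, suffixes):
            if word not in words:  # keep first-seen order, drop duplicates
                words.append(word)
    return "{" + " ".join(words) + "}"


if __name__ == "__main__":
    # Hypothetical word families; adjust the stems and suffixes to your own mail.
    query = gmail_any_of([
        ("verif", ["y", "ied", "ication"]),
        ("term", ["s", "es", "a"]),
    ])
    print(query)  # {verify verified verification terms termes terma}
```

The printed string can then be pasted into the filter's "Has the words" field; generating it from a stem list keeps the filter maintainable when new loanword variants show up.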
2 votes | 1 answer | 54 views
Lemma of punctuation in spaCy
I'm using spaCy for some downstream tasks, mainly noun phrase extraction. My texts contain a lot of parentheses, and while applying the lemma, I noticed all the punctuation that doesn't end sentences ...
-1 votes | 2 answers | 143 views
With spaCy, how can I get all lemmas from a string?
I have a pandas data frame with a column of text values (documents). I want to apply lemmatization on these values with the spaCy library using the pandas apply function. I've defined my to_lemma ...
0 votes | 1 answer | 83 views
How to lemmatize text column in pandas dataframes using stanza?
I read a CSV file into a pandas DataFrame.
My text column is df['story'].
How do I lemmatize this column?
Should I tokenize it first?
1 vote | 1 answer | 65 views
How to speed up the lemmatization of a Series in a Python DataFrame
I have this line that lemmatizes a Series of a pandas DataFrame in Python:
res = serie.parallel_apply(lambda x :' '.join([d.lemma_ for d in self.nlp_spacy(x)]))
I have 200 000 rows of data in this ...
-1 votes | 2 answers | 153 views
Comparison between stemming and lemmatization
Based on several studies, I found the following important comparative analysis:
If we look at texts, lemmatization should most probably return more correct output, right? Not only correct, but also ...
0 votes | 1 answer | 284 views
LookupError in NLTK for WordNet Lemmatizer Despite Successful Download of Resources
I am working on a text processing task in a Kaggle notebook and facing a LookupError when using NLTK's WordNetLemmatizer. Despite my efforts to download the required NLTK resources, the error continues ...
0 votes | 1 answer | 562 views
How to make stanza lemmatizer to return just the lemma instead of a dictionary?
I'm implementing Stanza's lemmatizer because it works well with Spanish texts, but the lemmatizer returns a whole dictionary with the ID and other characteristics I don't care about for the time being. I ...
0 votes | 1 answer | 221 views
Switch spaCy lemmatizer's mode for the French language
With spaCy, I want to change the lemmatizer of the French model ('rule-based' by default) to 'lookup'.
I'm using spacy 3.6.1, fr_core_news_lg-3.6.0 model and spacy-lookups-data 1.0.5
This seemed to be ...
1 vote | 2 answers | 71 views
How to avoid lemmatizing already lemmatized sentences of a row in pandas dataframe for speedup
Given:
A simple and small pandas dataframe as follows:
df = pd.DataFrame(
{
"user_ip": ["u7", "u3", "u1", "u9", "u4","...
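When many rows repeat the same sentence, one way to skip redundant work is to lemmatize each distinct value once and map the results back onto every row. A minimal stdlib sketch of that idea; `toy_lemmatize` is a hypothetical stand-in for the real spaCy or Stanza call, and with pandas the same pattern is building a dict over `Series.unique()` and passing it to `Series.map`.

```python
def toy_lemmatize(sentence: str) -> str:
    """Hypothetical stand-in for a real lemmatizer: strips a trailing 's' from longer words."""
    return " ".join(
        w[:-1] if w.endswith("s") and len(w) > 3 else w for w in sentence.split()
    )


def lemmatize_unique(rows, lemmatize=toy_lemmatize):
    """Lemmatize each distinct row once, then reuse the cached result for duplicates."""
    cache = {}
    for row in rows:
        if row not in cache:
            cache[row] = lemmatize(row)
    return [cache[row] for row in rows]


if __name__ == "__main__":
    rows = ["cats sit on mats", "dogs bark", "cats sit on mats"]
    print(lemmatize_unique(rows))  # ['cat sit on mat', 'dog bark', 'cat sit on mat']
```

The saving scales with the duplication rate: the expensive call runs once per distinct sentence rather than once per row.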
0 votes | 1 answer | 85 views
How to solve an AttributeError when calling .lower() on a list while lemmatizing
word_patterns = [lemmatizer.lemmatize(word.lower()) for word in word_patterns]
AttributeError: 'list' object has no attribute 'lower'
I can't figure out how to fix the error. It's saying that my list ...
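The error means the items of `word_patterns` are themselves lists (for example, one token list per pattern), so `.lower()` ends up being called on a list instead of a string. A minimal sketch, assuming that nested shape: it iterates one level deeper before lowercasing. The `lemmatize` parameter stands in for `lemmatizer.lemmatize` and defaults to the identity so the sketch runs without NLTK data.

```python
def lemmatize_nested(word_patterns, lemmatize=lambda w: w):
    """Flatten one level of nesting, lowercase each word, then lemmatize it.

    word_patterns is assumed to look like [["Hi", "there"], ["How", "are", "you"]].
    """
    return [lemmatize(word.lower()) for pattern in word_patterns for word in pattern]


if __name__ == "__main__":
    patterns = [["Hi", "there"], ["How", "are", "you"]]
    print(lemmatize_nested(patterns))  # ['hi', 'there', 'how', 'are', 'you']
```

With NLTK set up, passing `lemmatizer.lemmatize` as the second argument gives the lemmatized flat list the original comprehension was aiming for.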
0 votes | 0 answers | 282 views
How to speed up the Stanza lemmatizer by excluding redundant words
Given:
I have a small sample document with a limited number of words, as follows:
d ='''
I go to school by the school bus everyday with all of my best friends.
There are several students who also take ...
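Because a document reuses the same words many times, one option is to memoize a per-word lemmatizer so each distinct word form is processed only once. A minimal sketch using `functools.lru_cache`; `slow_lemmatize` is a hypothetical placeholder for the expensive per-word call, not a Stanza API.

```python
from __future__ import annotations

from functools import lru_cache

# Counts how often the expensive function actually runs (for demonstration only).
CALLS = {"count": 0}


def slow_lemmatize(word: str) -> str:
    """Hypothetical placeholder for an expensive per-word lemmatizer call."""
    CALLS["count"] += 1
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


@lru_cache(maxsize=None)
def cached_lemmatize(word: str) -> str:
    """Memoized wrapper: a repeated word is served from the cache, not recomputed."""
    return slow_lemmatize(word)


def lemmatize_text(text: str) -> list[str]:
    """Lowercase, split on whitespace, and lemmatize each token through the cache."""
    return [cached_lemmatize(word) for word in text.lower().split()]
```

Keep in mind that Stanza lemmatizes in context (it uses POS information), so caching by word form alone can assign a single lemma to a form that is ambiguous within the document.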
1 vote | 0 answers | 136 views
Library to lemmatize German compound verbs
I'm trying to lemmatize a German text using HanoverTagger (the same results were achieved with spaCy).
from HanTa import HanoverTagger as ht
tagger = ht.HanoverTagger('morphmodel_ger.pgz')
print(tagger....
2 votes | 0 answers | 106 views
Is there a method in spaCy to "normalize" feminine nouns to masculine?
The title pretty much says it all.
We are using spaCy for Spanish to preprocess a series of texts, and we need consistent nouns across our documents.
For instance, we would need to normalize "clienta...
0 votes | 0 answers | 70 views
spaCy lemmatizer gives different results on repeated words
The lemmatization of the following sentence
"Not finished! Not finished! A gem for our adopted daughter, Kiri, - - born of Grace's avatar, - - and whose conception was a complete mystery."
...
1 vote | 0 answers | 44 views
re.findall does not find some dots
I have a file with preprocessed German text; all 39 lines have a dot at the end.
In order to get rid of some nulls in the text, I use this code:
text_with_nulls = open('lemmatizeAFD', 'r')
text_without_nulls =...
0 votes | 1 answer | 55 views
Failed lemmatization
I'm trying to lemmatize German texts which are in a DataFrame.
I use a German library to successfully handle specific grammatical structures: https://github.com/jfilter/german-preprocessing
My code:
...
1 vote | 1 answer | 63 views
lemmatization or normalization using a dictionary and list of variations
I have a pandas DataFrame with a string column containing transaction strings. I am trying to do some manual lemmatization. I have manually created a dictionary which has the main word as the key ...
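One way to do this kind of manual normalization is to invert the dictionary once (variant to main word), so each token becomes a single lookup instead of a scan over every list of variations. A minimal sketch; the dictionary contents are hypothetical transaction-style abbreviations, and with pandas the normalizer can be applied via `df[col].map(lambda s: normalize(s, lookup))`.

```python
from __future__ import annotations


def invert_variants(main_to_variants: dict[str, list[str]]) -> dict[str, str]:
    """Turn {main: [variants]} into {variant: main}, including main -> main (lowercased keys)."""
    lookup = {}
    for main, variants in main_to_variants.items():
        lookup[main.lower()] = main
        for variant in variants:
            lookup[variant.lower()] = main
    return lookup


def normalize(text: str, lookup: dict[str, str]) -> str:
    """Replace every whitespace token found in the lookup; leave other tokens unchanged."""
    return " ".join(lookup.get(tok.lower(), tok) for tok in text.split())


if __name__ == "__main__":
    # Hypothetical dictionary: main word -> observed variants
    variants = {"payment": ["pymt", "pmt", "paymnt"], "transfer": ["trf", "xfer"]}
    lookup = invert_variants(variants)
    print(normalize("PMT via XFER to acct", lookup))  # payment via transfer to acct
```

Inverting once costs one pass over the dictionary; after that every token lookup is constant time.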
0 votes | 1 answer | 129 views
How to get a POS-tag lemmatiser to iterate through a df
I want to use POS labelling and lemmatisation on my text data. I've found this example code on Kaggle. It applies it to a sentence, but I want to modify this code in order to apply it to a column ...
1 vote | 1 answer | 195 views
Getting the adjectives and plurals of lemmas in various languages
Could anyone point me to a solution/library that does inflection(?) instead of lemmatisation? And for multiple languages (English, Dutch, German and French).
Or to give an example. I have the lemma 'science' ...
0 votes | 2 answers | 399 views
Stemming texts separates words into letters
I am trying to process my text using tokenization, stemming, normalization and stop-word/punctuation removal, etc.
When I use the Snowball stemming technique, my text gets separated into letters with ...
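A common cause of this symptom is looping over a string instead of a list of tokens: iterating a Python `str` yields single characters, so every "word" handed to the stemmer is one letter. A minimal sketch of the pitfall and the fix; `stem` is a hypothetical placeholder for something like NLTK's `SnowballStemmer("english").stem`, so the example runs without NLTK.

```python
def stem(word: str) -> str:
    """Hypothetical placeholder for a real stemmer such as NLTK's SnowballStemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


text = "jumping dogs walked"

# Pitfall: iterating a str yields characters, so the result is a list of letters.
wrong = [stem(ch) for ch in text]

# Fix: tokenize first (str.split here; a real tokenizer in a full pipeline).
right = [stem(tok) for tok in text.split()]

if __name__ == "__main__":
    print(wrong[:5])  # ['j', 'u', 'm', 'p', 'i']
    print(right)  # ['jump', 'dog', 'walk']
```

If the earlier preprocessing steps join tokens back into a string, the stemming step needs to split it again before the loop.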
1 vote | 0 answers | 167 views
Base word for the same word is different using spaCy
I am trying to extract the base word for an entire text; however, it behaves differently for the same word appearing at different locations. Below is the code for reference:
from nltk.stem import PorterStemmer
...
0 votes | 0 answers | 61 views
Getting blank lemmatization using the xx_use_lg model with spaCy in Python
I got a list of words using for element in text: print(element).
But when I try to use lemma_ like this (for element in text: print(element.lemma_)), it returns a blank list.
I can't understand why ...
1 vote | 1 answer | 1k views
Lemmatization taking forever with spaCy
I'm trying to lemmatize chat logs in a DataFrame using spaCy. My code is:
nlp = spacy.load("es_core_news_sm")
df["text_lemma"] = df["text"].apply(lambda row: " ...
1 vote | 0 answers | 118 views
How to lemmatize a POS-tagged column in a DataFrame
I have a DataFrame of tweets about the Russia-Ukraine conflict. I have POS-tagged the tweets after cleaning and want to lemmatize the POS-tagged column. My code returns only the first pos_tagged ...
1 vote | 0 answers | 106 views
NLP Lemmatization with Wordnet: Error [E1041] Expected a string, Doc, or bytes as input, but got: <class 'NoneType'>
I have defined the 'pre_processing' function as follows:
import nltk
import unicodedata
from nltk.tokenize import TweetTokenizer
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
...
0 votes | 1 answer | 328 views
How to start a python lemmatizer for Malay language with a dataset?
I have a dataset and I want to build a Malay language lemmatizer. Can someone guide me on how to do this project?
What is the code to create a lemmatizer for the Malay language? It should also allow the user to enter words ...
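A minimal starting point, assuming you have (or can extract from the dataset) a list of Malay root words: strip common affixes and accept a candidate only if it is a known root. The affix lists and the tiny root set below are illustrative assumptions; real Malay morphology, such as the sound changes in meN- forms (menulis from tulis), needs more rules than this sketch handles.

```python
# Illustrative affix lists (assumption). Within SUFFIXES, "kan" is listed before "an",
# so for a word like "bacakan" the longer suffix is tried first.
PREFIXES = ("ber", "ter", "di", "ke", "pe", "me")
SUFFIXES = ("nya", "kan", "an", "i")


def candidates(word):
    """Yield possible roots: the word itself, then suffix-stripped, prefix-stripped, and both."""
    yield word
    for suf in SUFFIXES:
        if word.endswith(suf):
            yield word[: -len(suf)]
    for pre in PREFIXES:
        if word.startswith(pre):
            stem = word[len(pre):]
            yield stem
            for suf in SUFFIXES:
                if stem.endswith(suf):
                    yield stem[: -len(suf)]


def lemmatize_malay(word, roots):
    """Return the first candidate found in the root dictionary, else the lowercased word."""
    lowered = word.lower()
    for cand in candidates(lowered):
        if cand in roots:
            return cand
    return lowered


if __name__ == "__main__":
    roots = {"makan", "main", "baca"}  # hypothetical root list; load yours from the dataset
    for w in ["makanan", "dimakan", "bermain", "bacakan", "rumah"]:
        print(w, "->", lemmatize_malay(w, roots))
```

Checking candidates against a root list keeps the stripper from over-cutting words that merely happen to start or end with an affix-like string.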
0 votes | 1 answer | 348 views
Lemmatize a list of tokens in dataframe
I am trying to lemmatize a column that contains a list of tokens in each cell. I am using the below code for this. Can anyone suggest what changes should be made to get the expected output?
from nltk....
0 votes | 2 answers | 129 views
Python: Can I create a dummy based on search conditions in one column with text series?
I was wondering how I could create a dummy variable for the following condition: column 'lemmatised' contains at least two words from 'innovation_words'. Innovation_words is a list I defined myself:
...
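A minimal sketch, assuming "at least two words" means at least two distinct entries of `innovation_words` appearing as whole tokens in the lemmatised text; the word list below is hypothetical. With pandas, the function can be applied row by row, e.g. `df['lemmatised'].apply(lambda t: innovation_dummy(t, innovation_words))`, to produce the 0/1 column.

```python
def innovation_dummy(text, vocabulary, min_matches=2):
    """Return 1 if at least `min_matches` distinct vocabulary words occur as tokens in text, else 0."""
    tokens = set(text.lower().split())
    matches = tokens & {w.lower() for w in vocabulary}
    return int(len(matches) >= min_matches)


if __name__ == "__main__":
    innovation_words = ["innovation", "patent", "research"]  # hypothetical list
    print(innovation_dummy("new patent from research lab", innovation_words))  # 1
    print(innovation_dummy("patent patent patent", innovation_words))  # 0
```

If repeated occurrences of the same word should count toward the threshold, replace the set intersection with a token count.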
2 votes | 1 answer | 254 views
NLP: worse results when adding stemming or lemmatization for sentiment analysis
I'm trying to create a full pipeline of results for sentiment analysis on a smaller subset of the IMDB reviews (only 2k positive, 2k negative), so I'm trying to show results at each stage
i.e. without any pre-...
2 votes | 1 answer | 125 views
lemmatizing verbs in SVOs
I've looked at the suggested similar questions, and I think that this question has enough specificity that it warrants being asked, but I am completely okay if someone can point to an already answered ...
0 votes | 1 answer | 576 views
Why does the nltk lemmatizer not work for every word in Python?
import nltk
lemmatizer = nltk.WordNetLemmatizer()
print(lemmatizer.lemmatize("goes"))
print(lemmatizer.lemmatize("transforming"))
The first example, with "goes", will do ...
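NLTK's `WordNetLemmatizer.lemmatize(word, pos='n')` treats every word as a noun unless told otherwise, so a verb form such as "transforming" typically comes back unchanged. A common remedy is to POS-tag first and map the Penn Treebank tags to WordNet's POS letters. A minimal sketch of that mapping; it uses the literal letters 'n', 'v', 'a', 'r' (the values of `wordnet.NOUN`, `wordnet.VERB`, `wordnet.ADJ`, `wordnet.ADV`) so it runs without NLTK installed, and treating every non-J/V/R tag as a noun is a simplifying assumption.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag (as produced by nltk.pos_tag) to a WordNet POS letter.

    'a' = adjective, 'v' = verb, 'r' = adverb, 'n' = noun (also the fallback).
    """
    if tag.startswith("J"):
        return "a"
    if tag.startswith("V"):
        return "v"
    if tag.startswith("R"):
        return "r"
    return "n"


if __name__ == "__main__":
    # Tags as nltk.pos_tag might return them (hard-coded so the sketch runs standalone).
    tagged = [("transforming", "VBG"), ("goes", "VBZ"), ("better", "JJR"), ("quickly", "RB")]
    print([(w, penn_to_wordnet(t)) for w, t in tagged])
    # [('transforming', 'v'), ('goes', 'v'), ('better', 'a'), ('quickly', 'r')]
```

With NLTK available, each pair would then go through `lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))`.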
0 votes | 1 answer | 287 views
NLTK's lemmatize does not run
Hello, I have code for lemmatizing a string in Python. The code is below:
from nltk.stem.wordnet import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
print("better :", lemmatizer....
0 votes | 0 answers | 116 views
Lemmatization using spaCy gives wrong output
I have a CSV file with 70k German tweets and want to lemmatize the text, but it reduces every tweet to only one word and the output looks completely wrong. I did several steps of data cleaning before,...
0 votes | 1 answer | 140 views
What does count() method of nltk.corpus.reader.wordnet.Lemma return?
Here is my code:
from nltk.corpus import wordnet as wn
eat = wn.lemma('eat.v.03.eat')
print(eat.count())
print(help(eat.count))
The output should be like this:
4
Help on method count in module nltk....
1 vote | 0 answers | 28 views
Python spaCy lemmatization issue
While performing lemmatization using spaCy, I want all proper nouns to be considered as nouns. How do I customise this?
Tried: [term.lemma_ for term in nlp("awards mvp awards")]
Expected ...
2 votes | 1 answer | 1k views
Lemmatizer/PoS-tagger for Italian in Python
I'm searching for a lemmatizer/PoS-tagger for the Italian language that works in Python. I tried spaCy; it works, but it's not very precise: especially for verbs, it often returns the wrong lemma. ...
2 votes | 1 answer | 459 views
nltk.lemmatizer doesn't work for even a simple input text
Sorry guys, I'm new to NLP, and I'm trying to apply the NLTK lemmatizer to the whole input text; however, it seems not to work even for a simple sentence.
from nltk.corpus import stopwords
from nltk.tokenize ...
2 votes | 1 answer | 229 views
Special Case Lemmatization ValueError while Using spacy for NLP
Problem (What I think is happening)
While exploring special case lemmatization, I ran into a ValueError (provided below). I actually ...
0 votes | 1 answer | 1k views
Trouble with applying a UDF on a column in a PySpark DataFrame
My goal is to clean the data in a column of a PySpark DataFrame. I have written a function for cleaning.
def preprocess(text):
    text = text.lower()
    text = text.strip()
    text = re.compile('<.*?...
1 vote | 1 answer | 132 views
Is there an easier way to do custom lemmatization?
I have a DataFrame with two columns. The first one, ["lemm"], holds the words to change if they occur; the second one, ["word"], holds what to change them to. I'm new to this, so I spent a lot of time ...
0 votes | 1 answer | 92 views
While lemmatizing the corpus and splitting and joining it, it shows a WordListCorpusReader not callable error
The code where I get the error is as follows:
import re
corpus = []
for i in range(len(sentences)):
    # keep only ASCII letters (A-Z, not A-z, which also admits [ \ ] ^ _ `)
    review = re.sub('[^a-zA-Z]', ' ', sentences[i])
    review = review.lower()
    review = review.split()...
0 votes | 0 answers | 512 views
Is it always necessary to either stem/lemmatize words when working with TF-IDF?
I'm using TF-IDF along with cosine similarity in order to compute document similarity. I was wondering if it's always necessary to stem/lemmatize the words in the document. Are there times where based ...
0 votes | 1 answer | 319 views
ModuleNotFoundError in spaCy version 3.3.1; previously suggested solutions are not working
import spacy
from spacy.lemmatizer import Lemmatizer
from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES
lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)
...
-1 votes | 1 answer | 111 views
How to create a dictionary whose key:value pairs are the values of two different lists of dictionaries?
I have 2 lists of dictionaries that result from a pymongo extraction.
A list of dicts containing IDs (strings) and lemmas (strings):
lemmas = [{'id': 'id1', 'lemma': 'lemma1'}, {'id': 'id2', 'lemma': ...
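A minimal sketch, assuming the second list also carries an `id` key plus a value field (called `word` here purely as a hypothetical name, since the question is truncated) and that the lemma should be the key: index one list by `id`, then pair values that share an id.

```python
def pair_by_id(lemmas, others, key_field="lemma", value_field="word"):
    """Build {lemma: other_value} for every id present in both lists.

    lemmas: [{'id': ..., 'lemma': ...}, ...]
    others: [{'id': ..., value_field: ...}, ...]  (value_field is an assumed name)
    """
    by_id = {doc["id"]: doc[value_field] for doc in others}
    return {doc[key_field]: by_id[doc["id"]] for doc in lemmas if doc["id"] in by_id}


if __name__ == "__main__":
    lemmas = [{"id": "id1", "lemma": "lemma1"}, {"id": "id2", "lemma": "lemma2"}]
    words = [{"id": "id2", "word": "word2"}, {"id": "id1", "word": "word1"}]  # hypothetical
    print(pair_by_id(lemmas, words))  # {'lemma1': 'word1', 'lemma2': 'word2'}
```

Building the `by_id` index makes the join linear in the size of both lists; note that if two ids share the same lemma, the later one overwrites the earlier entry.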
1 vote | 1 answer | 891 views
Faster Python Lemmatization
I have been testing different lemmatization methods since it will be used on a very large corpus. Below are my methods and results. Does anyone have any tips to speed any of these methods up? Spacy ...
0 votes | 0 answers | 99 views
Lemmatize a tm corpus
I am trying to lemmatize a VCorpus using a custom dictionary, in particular this one I found on github. I have not been able to find a function that correctly performs this. I have tried the following:...
0 votes | 1 answer | 193 views
lemmatizing a verb list in a data frame in Python
I want to ask a seemingly simple question to Python wizards (I am a total newbie, so I have no idea how simple or complex this question is)!
I have a verb list in a dataframe looking as below:
id verb
15 ...
0 votes | 2 answers | 411 views
Does WordNet have Levels?
I'm reading a paper that says to use WordNet level 3 because using level 5 would lose a lot, but I can't see how to use these supposed levels. I don't have the author's code, so I can't share it, ...
3 votes | 1 answer | 541 views
Neither stemmer nor lemmatizer seem to work very well, what should I do?
I am new to text analysis and am trying to create a bag-of-words model (using sklearn's CountVectorizer method). I have a data frame with a column of text with words like 'acid', 'acidic', 'acidity', '...