0 votes
0 answers
38 views

Is there a simpler filter that I can use for the cases below? Google has a very smart AI, Gemini; I hope there is a shortcut for this, as I am receiving bilingual emails and loan words in Malay/Indonesian are ...
asked by Quarky
2 votes
1 answer
54 views

I'm using spacy for some downstream tasks, mainly noun phrase extraction. My texts contain a lot of parentheses, and while applying the lemma, I noticed all the punctuation that doesn't end sentences ...
asked by MERose
-1 votes
2 answers
143 views

I have a pandas data frame with a column of text values (documents). I want to apply lemmatization on these values with the spaCy library using the pandas apply function. I've defined my to_lemma ...
asked by Patrick
0 votes
1 answer
83 views

I read a CSV file into a pandas DataFrame; my text column is df['story']. How do I lemmatize this column? Should I tokenize it first?
asked by rafine
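A minimal sketch of the column-level workflow this question asks about, using pandas' `apply`. The lookup table is a hypothetical stand-in for a real lemmatizer (spaCy or NLTK would supply the lemmas, and spaCy tokenizes internally, so no separate tokenization pass is needed):

```python
import pandas as pd

# Toy lookup table standing in for a real lemmatizer; the word pairs
# here are illustrative, not from any real lemma resource.
LEMMA_LOOKUP = {"stories": "story", "told": "tell", "was": "be"}

def lemmatize_text(text):
    # Simple whitespace tokenization; a real pipeline (e.g. spaCy's nlp())
    # tokenizes for you, so no separate tokenization step is required.
    tokens = text.lower().split()
    return " ".join(LEMMA_LOOKUP.get(tok, tok) for tok in tokens)

df = pd.DataFrame({"story": ["stories told", "was told"]})
df["story_lemma"] = df["story"].apply(lemmatize_text)
```

With a real model, the lambda would call `nlp(text)` and join `token.lemma_` instead of consulting the toy dict.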
1 vote
1 answer
65 views

I have this line that lemmatizes a Series of a pandas DataFrame in Python: res = serie.parallel_apply(lambda x: ' '.join([d.lemma_ for d in self.nlp_spacy(x)])). I have 200,000 rows of data in this ...
asked by MrGran1
-1 votes
2 answers
153 views

Based on several pieces of research, I found the following important comparative analysis: if we look at texts, lemmatization should most probably return more correct output, right? Not only correct, but also ...
0 votes
1 answer
284 views

I am working on a text processing task in a Kaggle notebook and facing a LookupError when using NLTK's WordNetLemmatizer. Despite my efforts to download the required NLTK resources, the error continues ...
asked by Amirreza Jalili
0 votes
1 answer
562 views

I'm implementing Stanza's lemmatizer because it works well with Spanish texts, but the lemmatizer returns a whole dictionary with the ID and other characteristics I don't care about for the time being. I ...
asked by trashparticle
0 votes
1 answer
221 views

With Spacy, I want to change the lemmatizer of the French model ('rule-based' by default) to 'lookup'. I'm using spacy 3.6.1, fr_core_news_lg-3.6.0 model and spacy-lookups-data 1.0.5 This seemed to be ...
asked by JulienBr
1 vote
2 answers
71 views

Given: A simple and small pandas dataframe as follows: df = pd.DataFrame( { "user_ip": ["u7", "u3", "u1", "u9", "u4","...
asked by farid
0 votes
1 answer
85 views

word_patterns = [lemmatizer.lemmatize(word.lower()) for word in word_patterns] raises AttributeError: 'list' object has no attribute 'lower'. I can't figure out how to fix the error. It's saying that my list ...
asked by L bozo
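That AttributeError usually means the comprehension is iterating over lists of words rather than individual strings. A minimal sketch of the fix, with an identity function standing in for NLTK's `lemmatizer.lemmatize`:

```python
def lemmatize(word):
    # Identity stand-in for nltk's WordNetLemmatizer().lemmatize.
    return word

# word_patterns holds LISTS of words, so word.lower() fails:
# .lower() is a string method and the loop variable here is a list.
word_patterns = [["Hello", "World"], ["Foo"]]

# Flatten one level first so each item really is a string.
flat = [word for pattern in word_patterns for word in pattern]
word_patterns = [lemmatize(word.lower()) for word in flat]
```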
0 votes
0 answers
282 views

Given: I have a small sample document with limited number of words as follows: d =''' I go to school by the school bus everyday with all of my best friends. There are several students who also take ...
asked by farid
1 vote
0 answers
136 views

I'm trying to lemmatize a German text using HanTa's HanoverTagger (the same results were achieved with spaCy). from HanTa import HanoverTagger as ht tagger = ht.HanoverTagger('morphmodel_ger.pgz') print(tagger....
asked by Stanislaw
2 votes
0 answers
106 views

The title pretty much says it all... We are using spaCy in Spanish to preprocess a series of texts and we need consistent nouns across our documents. For instance, we would need to normalize "clienta" ...
asked by Marina Boyero
0 votes
0 answers
70 views

The lemmatization of the following sentence: "Not finished! Not finished! A gem for our adopted daughter, Kiri, - - born of Grace's avatar, - - and whose conception was a complete mystery." ...
asked by Denys Murakhovskyi
1 vote
0 answers
44 views

I have a file with preprocessed German text; all 39 lines have a dot at the end. In order to get rid of some nulls in the text I use this code: text_with_nulls = open('lemmatizeAFD', 'r') text_without_nulls =...
asked by Mikhail Rotar
0 votes
1 answer
55 views

I'm trying to lemmatize German texts which are in a dataframe. I use a German library to successfully handle their specific grammatical structure: https://github.com/jfilter/german-preprocessing My code: ...
asked by Mikhail Rotar
1 vote
1 answer
63 views

I have a pandas data frame with a string column containing transaction strings. I am trying to do some manual lemmatization. I have manually created a dictionary which has the main word as the key ...
asked by Zenvega
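A sketch of that manual approach under stated assumptions: `LEMMA_DICT` below is a made-up example of a dictionary keyed by the main word, inverted once so each token lookup is O(1):

```python
# Hypothetical mapping: main word -> variants seen in transaction strings.
LEMMA_DICT = {
    "payment": ["pymt", "paymnt", "payments"],
    "transfer": ["trf", "xfer"],
}

# Invert to variant -> main word for constant-time lookups per token.
VARIANT_TO_LEMMA = {
    variant: lemma
    for lemma, variants in LEMMA_DICT.items()
    for variant in variants
}

def normalize(transaction):
    # Unknown tokens pass through unchanged.
    return " ".join(VARIANT_TO_LEMMA.get(tok, tok)
                    for tok in transaction.lower().split())
```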
0 votes
1 answer
129 views

I want to use POS-labelling and lemmatisation on my text data. I've found this example code from Kaggle. It applies them to a sentence, but I want to modify this code in order to apply it to a column ...
asked by wick
1 vote
1 answer
195 views

Could anyone point me to a solution/library that, instead of lemmatising, does inflection? And for multiple languages (English, Dutch, German and French). Or give an example: I have the lemma 'science' ...
asked by dderom
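For English, libraries such as `inflect` or `pattern` (which ships en/de/fr/nl modules) are the usual starting points for generating surface forms from a lemma. As a rough illustration of the lemma-to-inflected-form direction, here is a toy English-only pluralizer; the rules are deliberately simplistic and real multilingual inflection needs a dedicated tool per language:

```python
def toy_pluralize(noun):
    # Deliberately simplistic English-only rules, for illustration only.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"          # box -> boxes
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"    # study -> studies
    return noun + "s"               # science -> sciences
```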
0 votes
2 answers
399 views

I am trying to process my text using tokenization, stemming, normalization and stop-word/punctuation removal, etc. When I use snowball stemming technique, my text gets separated into letters with ...
asked by No_Name
1 vote
0 answers
167 views

I am trying to extract the base word for an entire text; however, it reacts differently for the same word appearing at different locations. Below is the code for reference: from nltk.stem import PorterStemmer ...
asked by Amar Pal Singh
0 votes
0 answers
61 views

I got a list of words using for element in text: print(element). But when I try to use lemma_ like this (for element in text: print(element.lemma_)) it returns a blank list. I can't understand why ...
asked by Adrien Villemin
1 vote
1 answer
1k views

I'm trying to lemmatize chat registers in a dataframe using spaCy. My code is: nlp = spacy.load("es_core_news_sm") df["text_lemma"] = df["text"].apply(lambda row: " " ...
asked by Laura R
1 vote
0 answers
118 views

I have a dataframe of some tweets about the Russia-Ukraine conflict; I have POS-tagged the tweets after cleaning and want to lemmatize the pos_tagged column. My code returns only the first pos_tagged ...
asked by susne
1 vote
0 answers
106 views

I have defined the 'pre_processing' function as follows: import nltk import unicodedata from nltk.tokenize import TweetTokenizer from nltk.corpus import wordnet from nltk.stem import WordNetLemmatizer ...
asked by salkyna
0 votes
1 answer
328 views

I have a dataset and I want to build a Malay-language lemmatizer. Can you guide me on how to do this project? What is the code to create a lemmatizer for the Malay language? It should also allow the user to enter words ...
asked by angel
0 votes
1 answer
348 views

I am trying to lemmatize a column that contains a list of tokens in each cell. I am using the below code for this. Can anyone suggest what changes should be made to get the expected output? from nltk....
asked by Rohan
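Since each cell already holds a list of tokens, the lemmatizer has to be applied per token inside each list, not to the cell as a whole. A sketch with a toy lemma map standing in for NLTK's WordNetLemmatizer (the word pairs are invented):

```python
# Toy lemma map; nltk's WordNetLemmatizer would normally supply these.
LEMMAS = {"cats": "cat", "running": "run", "mice": "mouse"}

def lemmatize_cell(tokens):
    # The cell is a LIST of tokens, so map the lemmatizer over each
    # token and return a list of the same shape.
    return [LEMMAS.get(tok, tok) for tok in tokens]

column = [["cats", "running"], ["mice"]]
lemmatized = [lemmatize_cell(cell) for cell in column]
```

In pandas, the same per-cell function would go through `df["tokens"].apply(lemmatize_cell)`.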
0 votes
2 answers
129 views

I was wondering how I could create a dummy variable for the following condition: column 'lemmatised' contains at least two words from 'innovation_words'. Innovation_words is a list I defined myself: ...
asked by Hans.nl
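One way to sketch that dummy variable. The word list below is invented, and the question leaves open whether "at least two words" means two distinct matches or two occurrences; this sketch counts occurrences:

```python
# Hypothetical stand-in for the user-defined innovation_words list.
INNOVATION_WORDS = {"innovation", "patent", "prototype", "novel"}

def innovation_dummy(lemmatised_tokens, min_hits=2):
    # Returns 1 if at least `min_hits` tokens occur in the word list,
    # else 0. Use set(lemmatised_tokens) here to count distinct words
    # instead of occurrences.
    hits = sum(1 for tok in lemmatised_tokens if tok in INNOVATION_WORDS)
    return int(hits >= min_hits)
```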
2 votes
1 answer
254 views

I'm trying to create a full pipeline of results for sentiment analysis for a smaller subset of the IMDB reviews (only 2k pos, 2k neg), so I'm trying to show results at each stage, i.e. without any pre-...
asked by AdamG
2 votes
1 answer
125 views

I've looked at the suggested similar questions, and I think that this question has enough specificity that it warrants being asked, but I am completely okay if someone can point to an already answered ...
asked by John Laudun
0 votes
1 answer
576 views

import nltk lemmatizer = nltk.WordNetLemmatizer() print(lemmatizer.lemmatize("goes")) print(lemmatizer.lemmatize("transforming")) The first example, with "goes", will do ...
asked by TheRi
0 votes
1 answer
287 views

Hello, I have code for lemmatizing a string in Python; the code is below: from nltk.stem.wordnet import WordNetLemmatizer lemmatizer = WordNetLemmatizer() print("better :", lemmatizer....
asked by Mostafa Heydar
0 votes
0 answers
116 views

I have a CSV file with 70k German tweets and want to lemmatize the text, but it reduces every tweet to only one word and the output looks completely wrong. I did several steps of data cleaning before,...
asked by socialscientist90
0 votes
1 answer
140 views

Here is my code: from nltk.corpus import wordnet as wn eat = wn.lemma('eat.v.03.eat') print(eat.count()) print(help(eat.count)) The output should be like this: 4 Help on method count in module nltk....
asked by jasonzhou
1 vote
0 answers
28 views

While performing lemmatization using spaCy, I want all proper nouns to be considered as nouns. How do I customise this? Tried: [term.lemma_ for term in nlp("awards mvp awards")] Expected ...
asked by Abi
2 votes
1 answer
1k views

I'm searching for a lemmatizer/PoS-tagger for the Italian language that works in Python. I tried spaCy; it works, but it's not very precise, especially for verbs: it often returns the wrong lemma. ...
asked by sunhearth
2 votes
1 answer
459 views

Sorry guys, I'm new to NLP and I'm trying to apply the NLTK lemmatizer to the whole input text; however, it seems not to work even for a simple sentence. from nltk.corpus import stopwords from nltk.tokenize ...
asked by jsacharz
2 votes
1 answer
229 views

Special-case lemmatization ValueError while using spaCy for NLP. Problem (what I think is happening): while exploring special-case lemmatization, I ran into a ValueError (provided below). I actually ...
asked by jack
0 votes
1 answer
1k views

My goal is to clean the data in a column of a PySpark DF. I have written a function for cleaning: def preprocess(text): text = text.lower() text = text.strip() text = re.compile('<.*?&...
asked by user3234112
1 vote
1 answer
132 views

I have a dataframe of two columns: the first one, ["lemm"], holds the words to change if they occur; the second one, ["word"], what to change them to. I'm new to this so I spent a lot of time ...
asked by Masood Khan
0 votes
1 answer
92 views

My code where I get the error is as follows: import re corpus = [] for i in range(len(sentences)): review = re.sub('[^a-zA-z]', ' ', sentences[i]) review = review.lower() review = review.split()...
asked by Prince Thakkar
0 votes
0 answers
512 views

I'm using TF-IDF along with cosine similarity in order to compute document similarity. I was wondering if it's always necessary to stem/lemmatize the words in the document. Are there times where based ...
asked by dfish
0 votes
1 answer
319 views

import spacy from spacy.lemmatizer import Lemmatizer from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES) ...
asked by Aman Choudhary
-1 votes
1 answer
111 views

I have two lists of dictionaries that result from a pymongo extraction. A list of dicts containing ids (strings) and lemmas (strings): lemmas = [{'id': 'id1', 'lemma': 'lemma1'}, {'id': 'id2', 'lemma': ...
asked by Perrupi
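A common pattern for joining two such lists on 'id' is to index one of them in a dict first. The second list below is hypothetical, since the excerpt cuts off before showing it:

```python
lemmas = [{"id": "id1", "lemma": "lemma1"},
          {"id": "id2", "lemma": "lemma2"}]
# Hypothetical second list from the pymongo extraction.
others = [{"id": "id1", "count": 3},
          {"id": "id2", "count": 5}]

# Index the first list by id, then join the second against it;
# ids missing from the index are skipped.
by_id = {d["id"]: d for d in lemmas}
merged = [{**by_id[d["id"]], **d} for d in others if d["id"] in by_id]
```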
1 vote
1 answer
891 views

I have been testing different lemmatization methods since it will be used on a very large corpus. Below are my methods and results. Does anyone have any tips to speed any of these methods up? Spacy ...
asked by Mike Zoucha
0 votes
0 answers
99 views

I am trying to lemmatize a VCorpus using a custom dictionary, in particular this one I found on github. I have not been able to find a function that correctly performs this. I have tried the following:...
asked by aooo
0 votes
1 answer
193 views

I want to ask a seemingly simple question to Python wizards (I am a total newbie, so I have no idea how simple or complex this question is)! I have a verb list in a dataframe looking as below: id verb 15 ...
asked by Sangeun
0 votes
2 answers
411 views

I'm reading a paper that says to use WordNet level 3 because, if the author used level 5, a lot would be lost, but I can't see how to use these supposed levels. I don't have his code, so I can't share it, ...
asked by JohnV
3 votes
1 answer
541 views

I am new to text analysis and am trying to create a bag-of-words model (using sklearn's CountVectorizer). I have a data frame with a column of text with words like 'acid', 'acidic', 'acidity', '...
asked by Rebecca James
