3,794 questions
Advice
0
votes
1
replies
38
views
Organisation/Person tagging using Spacy
We’re working on a problem where our master dataset contains names of organizations and individuals, but some entries are untagged. We only have the names (no additional details such as email or ...
1
vote
1
answer
65
views
Output of for loop filling down in dataframe instead of returning corresponding values for each row
I'm using SpaCy to process a series of sentences and return the five most common words in each sentence. My goal is to store the output of that frequency analysis (using Counter) in a column beside ...
0
votes
0
answers
51
views
Training data format for SpanCategorizer when using custom suggester function
I'm taking a stab at building my own claim extraction pipeline (first time spaCy user).
Upstream in my pipeline, I feed n amount of docs to NER in the en_core_web_sm pretrained model in order to ...
0
votes
0
answers
55
views
Training with spaCy from command line, don't know why gpu-id not recognized
I am having the hardest of times getting my training session to use my GPU 0, which by every measure is present and correctly set up with CUDA 12.2.
When I try to do python -m spacy train base_config....
1
vote
1
answer
139
views
How to make Microsoft Presidio detect and mask Indian names and unusual text patterns in banking data?
I’m working on anonymizing PII in banking text using Microsoft Presidio.
The built-in PERSON recognizer (which uses spaCy under the hood) works for some Western names and when the sentence is clear ...
2
votes
1
answer
77
views
How can I extract symptoms/diseases from a running transcription?
I'm working on a project where I'm attempting to extract medical symptoms from a running transcription. I'm using SocketIO to get mic audio and then using Whisper to transcribe the audio into text ...
1
vote
2
answers
90
views
How to efficiently use spaCy for POS tagging and NER
I have 200 documents and I want to do NER and POS tagging. However, I find spaCy to be too slow (I am running this code in Google Colab):
for doc in nlp.pipe(dataset["text"], batch_size=...
0
votes
0
answers
62
views
spaCy spancat won’t learn (zero F-score) while NER on same data scores 0.40 — Prodigy-generated KPI/target corpus
I am trying to train a spaCy v3.8.7 spancat model on ~100 sustainability reports (annotated with Prodigy) to extract KPIs and targets.
An NER pipeline trained on the same data reaches F≈0.40, but ...
0
votes
1
answer
175
views
Unable to install spacy on MacOS 15.5 (M2) with Python 3.13.3 [duplicate]
Having created a new venv I am attempting to install spacy strictly in accordance with the documentation
Specifically:
pip install -U pip setuptools wheel
pip install -U 'spacy[apple]'
This fails (...
-2
votes
2
answers
512
views
pip install spacy errors with Python 3.13
I'm new to Python and I was given this code by my professor which includes "import spacy" and when I run the code I get the line: ModuleNotFoundError: No module named 'spacy'
That's where I ...
0
votes
0
answers
23
views
spaCy DependencyMatcher: One head for multiple children
How can I extract a single noun that is the head of multiple children?
I'm facing an issue with dependency matching in spaCy. I want to extract the nouns describing the named entities (identified by ...
4
votes
0
answers
62
views
Can older spaCy models be ported to future spaCy versions?
The latest spaCy versions have better performance and compatibility for GPU acceleration on Apple devices, but I have an existing project that depends on spaCy 3.1.4 and some of the specific behavior ...
0
votes
0
answers
32
views
Retrieving spaCy transformer tokenization ids
While using the spaCy transformer pipeline en_core_web_trf, how can I retrieve the transformer tokenization (often roberta-base)? It can be the tokenizer ids, the tokenizer strings, or preferably both.
Actual ...
0
votes
1
answer
313
views
Accessing Docling features from within spaCy Layout in Python
Currently I'm using spacy-layout as part of a pipeline to OCR and analyse documents. However, I also need to access other features of Docling such as counting the number of images in each ...
0
votes
1
answer
163
views
Problems with installing spacy on windows laptop
Hi, I'm trying to install spaCy on my Windows 11 laptop. I have Python (3+) and pip (latest) already installed. However, when I run the install command as indicated on the website:
pip install -U spacy
the ...
1
vote
1
answer
207
views
pyproject.toml related error while installing spacy library
I get the following error while installing the spacy library in Python 3.13.0. The pip version is 25.0.1. Can someone help? Thank you.
(I made sure to install numpy, scipy, preshed, Pyrebase4 based on ...
0
votes
1
answer
476
views
Why does Presidio with spacy nlp engine not recognize organizations and PESEL while spaCy does?
I'm using spaCy with the pl_core_news_lg model to extract named entities from Polish text. It correctly detects both organizations (ORG) and people's names (PER):
import spacy
nlp = spacy.load("...
0
votes
1
answer
70
views
Converting data into spacy format "convert_to_spacy_format" in Name entity recognition Model
Dataset structure. Can somebody help me convert the data into spaCy format for the NER model?
The dataset format is shown in the screenshot here (https://www.kaggle.com/datasets/naseralqaydeh/...
2
votes
0
answers
76
views
How to normalize ingredient names in a recipe dataset and handle NOUN + NOUN cases using spaCy in python?
I'm working on normalizing ingredient names from a recipe dataset using Python and spaCy. My goal is to extract only the relevant ingredients and ignore measurement units, fractions, and other ...
0
votes
2
answers
282
views
Pip python Cannot install module spaCy
I wanted to use spacy to work on a project, but it cannot be installed using pip and is showing the following error message in command prompt
pip install spacy
Collecting spacy
Using cached spacy-3....
0
votes
1
answer
69
views
Spacy rules matching entities before text
I'm trying to write a spacy parser to extract the names and terms of a contract.
To do that, I've written a rule to extract the sellers and buyers, except it's extracting multiple times over a simple ...
4
votes
2
answers
377
views
Presidio with Langchain Experimental does not detect Polish names
I am using presidio/langchain_experimental to anonymize text in Polish, but it does not detect names (e.g., "Jan Kowalski"). Here is my code:
from presidio_anonymizer import ...
0
votes
1
answer
115
views
Download data models while installing my python library
Sometimes, a Python library depends on additional data, such as ML models. This could be a model from transformers, spacy, nltk and so on. Typically there is a command to download such a model:
python -...
0
votes
0
answers
50
views
Relation Extraction Model returns only one entity instead of entity pairs
I'm working on a relation extraction task using a transformer-based model. The pipeline is expected to extract entity pairs along with their relation labels. When I run the evaluation ...
0
votes
1
answer
42
views
What version of Rasa is compatible with what version of spaCy?
I would like to know which version of Rasa is compatible with which version of spaCy.
I tried to create a bot with Rasa==3.5.10, Spacy==3.2.4 but couldn't.
I tried to use another version of spacy and ...
2
votes
1
answer
54
views
Lemma of punctuation in spaCy
I'm using spacy for some downstream tasks, mainly noun phrase extraction. My texts contain a lot of parentheses, and while applying the lemma, I noticed all the punctuation that doesn't end sentences ...
3
votes
1
answer
187
views
Attaching custom KB to Spacy "entity_linker" pipe makes NER calls very poor
I want to run an entity linking job using a custom Knowledgebase alone, and not use the second step ML re-ranker that requires a training dataset / Spacy corpus. I want the NEL pipeline to only assign ...
1
vote
1
answer
76
views
finding similarity of keywords between many product reviews to detect duplicates
I have a series of product reviews from multiple websites and am trying to identify reviews that are potentially duplicates (ie very similar in the words used). I know there's a lot of room for ...
2
votes
1
answer
74
views
How to correctly identify entity types for tokens using spaCy using python?
I'm using spaCy to extract and identify entity types (like ORG, GPE, DATE, etc.) from a text description. However, I am noticing some incorrect results, and I'm unsure how to fix this.
Here is the ...
2
votes
4
answers
2k
views
How to install spacy?
I am trying to install the spaCy library using 'pip install -U spacy' in the command prompt (run as admin) on Windows 11, but it shows an error I don't understand. I am using Python 3.13.0, ...
0
votes
0
answers
255
views
Dependency issue in virtual environment
I tried installing spacy but got the following error message:
blis 1.0.1 has requirement numpy<3.0.0,>=2.0.0, but you have numpy 1.23.5.
thinc 8.3.2 has requirement numpy<2.1.0,>=2.0.0; ...
1
vote
0
answers
86
views
Installing SpaCy in Fedora 41
I am running into trouble trying to install spaCy on a Fedora 41, AMD Ryzen machine. I got this:
Preparing metadata (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-...
1
vote
1
answer
203
views
How to extract specific entities from unstructured text
Given a generic text sentence (in a specific context) how can I extract word/entities of interest belonging to a specific "category" using python and any NLP library?
For example given a ...
1
vote
1
answer
82
views
Trying to create chatbot on M1, vscode. Using chatterbot but getting errors with spacy
Hi I am trying to create a chatbot using chatterbot, any ideas on what I should do regarding the error?
my code:
///this is to just train the bot///
from chatterbot import ChatBot
from chatterbot....
2
votes
1
answer
62
views
Handling Multiple Entity Candidates in Short Texts for Entity Linking with SciSpacy
I am working on linking short texts to entities in a biomedical knowledge graph (UMLS CUIs) using SciSpacy for a research project. The goal is to analyze the relationship between the linked entity and ...
3
votes
2
answers
141
views
How can I share a complex spaCy NLP model across multiple Python processes to minimize memory usage?
I'm working on a multiprocessing python application where multiple processes need access to a large, pre-loaded spaCy NLP model (e.g., en_core_web_lg). Since the model is memory-intensive, I want to ...
4
votes
1
answer
259
views
How to save and load spacy encodings in a Polars DataFrame
I want to use Spacy to generate embeddings of text stored in a polars DataFrame and store the results in the same DataFrame. Next, I want to save this DataFrame to the disk and be able to load again ...
0
votes
1
answer
649
views
Failed to satisfy constraint: Member must satisfy regular expression pattern
I'm trying to follow a simple example from spacy universe layers page, but this is failing for me:
Code Implementation:
# template.yaml file
AWSTemplateFormatVersion: "2010-09-09"
Transform:...
-1
votes
2
answers
143
views
With spaCy, how can I get all lemmas from a string?
I have a pandas data frame with a column of text values (documents). I want to apply lemmatization on these values with the spaCy library using the pandas apply function. I've defined my to_lemma ...
5
votes
4
answers
8k
views
Spacy installation fails on python 3.13
trying to install Spacy using pip install -U spacy, but getting the following error message:
C:\Windows\System32>pip install spacy
Collecting spacy
Using cached spacy-3.8.2.tar.gz (1.3 MB)
...
1
vote
0
answers
228
views
Not able to run my org's Azure OpenAI API with SpaCy-LLM
I am trying to run spacy-llm with my organization's azure openai api that they have provided me.
But since I am pretty much a beginner in Python, I guess I am not writing the config file correctly, and hence ...
0
votes
1
answer
215
views
Medspacy can not detect any entity although it exists
I am working with a text that contains biomedical entities. However, the medspacy package fails to detect them:
import medspacy
nlp = medspacy.load()
text = "The patient was treated with warfarin ...
1
vote
1
answer
133
views
How to Remove Stopwords from a Polars DataFrame Column Using SpaCy-GPU?
I'm working with a Polars DataFrame and I want to remove stopwords from a specific column using SpaCy with GPU support. I have the following setup:
import polars as pl
import spacy
# Load SpaCy with ...
2
votes
1
answer
392
views
Numpy Error : Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly
I am trying to find similar vectors with spaCy and NumPy. I found the code at the following URL:
Mapping word vector to the most similar/closest word using spaCy
But I'm getting type error
import numpy as ...
1
vote
1
answer
47
views
How can I use multiple spacy.train files in one training run?
I've downloaded the UD Treebank dataset, set up a shell script to discover all folders for a given language and converted the .conllu files to .spacy.
Now I have a collection of files like this: ...
1
vote
0
answers
35
views
SpaCy not recognizing NORP (nationality)
nlp= spacy.load(en_core_web_lg-3.7.1)
name = 'Las Palmas Mexican Restaurant & Bar'
doc = nlp(name)
for token in doc:
print(f"{token.text} \t{token.ent_type_} \t{token.ent_iob_}&...
0
votes
0
answers
95
views
Is it possible to install spaCy models using a different C++ compiler
I am trying to install a spaCy model:
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
and getting the following error:
error: ...
2
votes
0
answers
139
views
Using NER to label big parts of text
I'm trying to process a CV-like text, more exactly to split it into parts by their meaning (Description, Contacts, Experience, Education, Certifications etc).
Would NER be suitable for this purpose (...
0
votes
1
answer
52
views
Break after first PER sequence found with Spacy
I am trying to extract only the first speaker's name from a list of texts using spaCy. Currently, my function returns all "PER" tags, but I want to reduce the overhead and get only the first ...
-1
votes
1
answer
51
views
Error: raise IOError and OSError while compiling the program in vs code
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to data directory.
This is the error I am getting.
My code:
from chatterbot import ChatBot
from ...