477 questions
25 votes, 2 answers, 15k views
Difference between Fasttext .vec and .bin file
I recently downloaded fasttext pretrained model for english. I got two files:
wiki.en.vec
wiki.en.bin
I am not sure what the difference between the two files is.
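The two files hold different things. `wiki.en.vec` is plain text: the first line gives the vocabulary size and the vector dimension, and every following line is a word followed by its vector components, separated by spaces. `wiki.en.bin` is the full binary model. It also stores the subword (character n-gram) vectors and the training hyperparameters, so it can produce vectors for out-of-vocabulary words and can be loaded as a complete model. As a minimal sketch that assumes only the text layout just described, the .vec format can be parsed with the standard library. The function name `load_vec` is hypothetical:

```python
import io


def load_vec(fh, limit=None):
    """Parse fastText's text .vec format from an open text handle.

    The first line holds "<vocab_size> <dimension>"; each following line
    holds a word and `dimension` space-separated floats. `limit` optionally
    stops after the first N words (the official pretrained files list words
    by descending frequency).
    """
    header = fh.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in fh:
        # strip the newline and any trailing space before splitting
        parts = line.rstrip("\n").rstrip(" ").split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        vectors[word] = [float(v) for v in values]
        if limit is not None and len(vectors) >= limit:
            break
    return vocab_size, dim, vectors


sample = io.StringIO("3 4\nthe 0.1 0.2 0.3 0.4\nof 0.5 0.6 0.7 0.8\nand 0.9 1.0 1.1 1.2\n")
n, d, vecs = load_vec(sample)
print(n, d, vecs["of"])  # 3 4 [0.5, 0.6, 0.7, 0.8]
```

Reading only the first N lines is one way to keep memory use down with very large .vec files; gensim's `KeyedVectors.load_word2vec_format` has a `limit` argument for the same purpose.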
21 votes, 3 answers, 21k views
fastText embeddings sentence vectors?
I wanted to understand the way fastText vectors for sentences are created. According to this issue 309, the vectors for sentences are obtained by averaging the vectors for words.
In order to confirm ...
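Here is a minimal sketch of that averaging, using plain Python lists as vectors. As far as I can tell from the fastText C++ source, `get_sentence_vector` on an unsupervised (cbow/skipgram) model first scales each word vector to unit L2 norm, skips zero vectors, and then averages; supervised models use a plain average of the input vectors. The dict `word_vectors` and the function name are illustrative assumptions. Unlike a real .bin model, the sketch simply skips unknown words instead of building them from character n-grams:

```python
import math


def sentence_vector(words, word_vectors, normalize=True):
    """Average the vectors of `words` (word_vectors maps word -> list of floats).

    With normalize=True each vector is scaled to unit L2 norm first and
    zero vectors are skipped. Unknown words are skipped in this sketch.
    Returns None if no word contributed.
    """
    total = None
    count = 0
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:
            continue  # a .bin model would build this from character n-grams
        if normalize:
            norm = math.sqrt(sum(x * x for x in vec))
            if norm == 0:
                continue
            vec = [x / norm for x in vec]
        total = list(vec) if total is None else [a + b for a, b in zip(total, vec)]
        count += 1
    if count == 0:
        return None
    return [x / count for x in total]


wv = {"good": [3.0, 4.0], "day": [0.0, 2.0]}
print(sentence_vector(["good", "day"], wv, normalize=False))  # [1.5, 3.0]
```

The normalisation step keeps long, frequent-word vectors from dominating the average; with `normalize=False` the function is the plain mean described in the issue.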
15 votes, 6 answers, 28k views
How to find similar words with FastText?
I am playing around with FastText, https://pypi.python.org/pypi/fasttext, which is quite similar to Word2Vec. Since it seems to be a pretty new library with not too many built-in functions yet, I was ...
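Finding similar words usually means ranking the vocabulary by cosine similarity to the query word's vector. Below is a brute-force sketch over a plain dict of vectors. The names `cosine` and `most_similar` and the toy vectors are illustrative, not part of the fasttext package. For real vocabularies, gensim's `KeyedVectors.most_similar` and, in recent releases of the official Python bindings, `get_nearest_neighbors` do this ranking for you:

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(query, word_vectors, topn=5):
    """Return the `topn` (word, similarity) pairs closest to `query`, excluding it."""
    qvec = word_vectors[query]
    scored = ((cosine(qvec, vec), word)
              for word, vec in word_vectors.items() if word != query)
    return [(word, score) for score, word in heapq.nlargest(topn, scored)]


toy = {"dog": [1.0, 0.0], "dogs": [0.9, 0.1], "cat": [0.5, 0.5], "car": [0.0, 1.0]}
print([w for w, _ in most_similar("dog", toy, topn=2)])  # ['dogs', 'cat']
```

The same ranking also answers the reverse question of turning an arbitrary vector back into a word: compare the vector against every entry and keep the best match.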
14 votes, 1 answer, 12k views
How does the Gensim Fasttext pre-trained model get vectors for out-of-vocabulary words?
I am using gensim to load a pre-trained fasttext model. I downloaded the English Wikipedia trained model from the fasttext website.
Here is the code I wrote to load the pre-trained model:
from gensim....
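Roughly, fastText builds word vectors from character n-grams. The word is wrapped in the boundary markers `<` and `>`, and every substring of length `minn` to `maxn` (3 to 6 by default for unsupervised models) is hashed into one of a fixed number of buckets (2,000,000 by default). An out-of-vocabulary word's vector is the average of its n-gram bucket vectors; for in-vocabulary words the word's own vector is averaged in as well. The sketch below mirrors that idea with a tiny, hypothetical bucket table. The hash is 32-bit FNV-1a with the byte sign extension that fastText's C++ `Dictionary::hash` applies, a detail that only matters for non-ASCII text. Treat it as illustrative rather than bit-compatible with gensim or fastText:

```python
def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of `word` after wrapping it in '<' and '>'."""
    extended = f"<{word}>"
    grams = []
    for n in range(minn, min(len(extended), maxn) + 1):
        for i in range(len(extended) - n + 1):
            grams.append(extended[i:i + n])
    return grams


def ngram_hash(text):
    """32-bit FNV-1a over UTF-8 bytes, sign-extending bytes >= 128 first
    (mimicking fastText's uint32_t(int8_t(byte)) cast)."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        if byte >= 128:
            byte |= 0xFFFFFF00
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def oov_vector(word, bucket_vectors, minn=3, maxn=6):
    """Average the (hypothetical) bucket vectors of the word's n-grams."""
    n_buckets = len(bucket_vectors)
    rows = [bucket_vectors[ngram_hash(g) % n_buckets]
            for g in char_ngrams(word, minn, maxn)]
    if not rows:
        return None
    return [sum(column) / len(rows) for column in zip(*rows)]


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
```

In the real model the bucket table is a trained matrix, and the row index is offset past the rows that hold whole-word vectors; the sketch leaves that out.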
13 votes, 2 answers, 14k views
FastText using pre-trained word vector for text classification
I am working on a text classification problem: given some text, I need to assign certain predefined labels to it.
I have tried using the fastText library by Facebook, which has two utilities of ...
12 votes, 3 answers, 9k views
How to save fasttext model in vec format?
I trained my unsupervised model using the fasttext.train_unsupervised() function in Python. I want to save it as a .vec file since I will use this file for the pretrainedVectors parameter in fasttext....
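A .vec file is just text: a header line with the number of words and the dimension, then one line per word. Below is a minimal sketch of a writer; `write_vec` is a hypothetical helper. With the official Python bindings, the pairs could come from `model.words` and `model.get_word_vector(word)`. If the file is then passed as `pretrainedVectors`, the `dim` used for training should match the vector size in the file:

```python
import io


def write_vec(pairs, fh):
    """Write (word, vector) pairs to the open text handle `fh` in .vec format."""
    pairs = list(pairs)
    dim = len(pairs[0][1]) if pairs else 0
    fh.write(f"{len(pairs)} {dim}\n")
    for word, vec in pairs:
        if len(vec) != dim:
            raise ValueError(f"inconsistent dimension for {word!r}")
        # float() also accepts numpy scalars; ".6g" keeps six significant digits
        fh.write(word + " " + " ".join(f"{float(x):.6g}" for x in vec) + "\n")


buf = io.StringIO()
write_vec([("a", [0.5, 1.0]), ("b", [2.0, 3.0])], buf)
print(buf.getvalue(), end="")

# With a trained model (assumes the official `fasttext` package):
# with open("model.vec", "w", encoding="utf-8") as f:
#     write_vec(((w, model.get_word_vector(w)) for w in model.words), f)
```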
12 votes, 2 answers, 17k views
ModuleNotFoundError: No module named 'fasttext'
I have tried installing fasttext through conda using two channels:
conda install -c conda-forge fasttext
and
conda install -c conda-forge/label/cf201901 fasttext
as per (https://anaconda.org/conda-...
10 votes, 3 answers, 8k views
Continue training a FastText model
I have downloaded a .bin FastText model, and I use it with gensim as follows:
model = FastText.load_fasttext_format("cc.fr.300.bin")
I would like to continue the training of the model to adapt it to ...
10 votes, 1 answer, 5k views
Use Tensorflow and pre-trained FastText to get embeddings of unseen words
I am using a pre-trained fasttext model (https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md).
I use Gensim to load the fasttext model. It can output a vector for any words,...
9 votes, 3 answers, 13k views
How to vectorize whole text using fasttext?
To get vector of a word, I can use:
model["word"]
But if I want to get the vector of a sentence, I need to either sum the vectors of all words or take the average of all vectors.
Does FastText provide a ...
9 votes, 5 answers, 6k views
FastText - Cannot load model.bin due to C++ extension failed to allocate the memory
I'm trying to use the FastText Python API https://pypi.python.org/pypi/fasttext Although, from what I've read, this API can't load the newer .bin model files at https://github.com/facebookresearch/...
9 votes, 3 answers, 3k views
FastText quantize documentation incorrect?
I'm unable to run FastText quantization as shown in the documentation. Specifically, as shown at the bottom of the cheat sheet page:
https://fasttext.cc/docs/en/cheatsheet.html
When I attempt to ...
9 votes, 0 answers, 1k views
Gensim FastText compute Training Loss
I am training a fastText model using gensim.models.fasttext. However, I can't seem to find a method to compute the loss of the iteration for logging purposes. If I look at gensim.models.word2vec, it ...
8 votes, 4 answers, 21k views
Unable to install fastText for python on windows.
I am unable to install fasttext for Python on Windows. I followed the methods mentioned in this issue.
When I enter python setup.py install, I get the following error:
error: command 'C:\\Program ...
8 votes, 1 answer, 7k views
Can't suppress fasttext warning: 'load_model' does not return [...]
I'm struggling to suppress a specific warning related to fasttext.
The warning is Warning : 'load_model' does not return WordVectorModel or SupervisedModel any more, but a 'FastText' object which is ...
8 votes, 2 answers, 4k views
Does the fastText algorithm use only words and subwords, or sentences too?
I read the paper and also searched for a good example of the learning method (or, more precisely, the learning procedure).
For word2vec, suppose there is a corpus sentence
I go to school with lunch ...
8 votes, 1 answer, 5k views
gensim - fasttext - Why `load_facebook_vectors` doesn't work?
I've tried to load pre-trained FastText vectors from the fastText wiki word vectors page.
My code is below, and it works well.
from gensim.models import FastText
model = FastText.load_fasttext_format('./...
8 votes, 3 answers, 4k views
Use tf-idf with FastText vectors
I'm interested in using tf-idf with the FastText library, but haven't found a logical way to handle the n-grams. I have already used tf-idf with spaCy vectors, for which I have found several examples like these ...
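One common way to combine the two is a tf-idf weighted average of word vectors: weight each distinct word's vector by its term frequency times its inverse document frequency, then divide by the total weight. The sketch below assumes pre-tokenised documents and a plain dict of word vectors. The idf uses the smoothed formula ln((1 + N) / (1 + df)) + 1, which is scikit-learn's default, and the function names are hypothetical. With a .bin model, the character n-grams are handled inside the model when it builds each word's vector, so the tf-idf weights can stay at the word level:

```python
import math
from collections import Counter


def idf_weights(documents):
    """Smoothed idf for every token in `documents` (a list of token lists)."""
    n_docs = len(documents)
    df = Counter(tok for doc in documents for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + count)) + 1 for tok, count in df.items()}


def tfidf_doc_vector(doc, word_vectors, idf):
    """tf-idf weighted average of the vectors of the tokens in `doc`."""
    total, weight_sum = None, 0.0
    for tok, tf in Counter(doc).items():
        vec = word_vectors.get(tok)
        weight = tf * idf.get(tok, 0.0)
        if vec is None or weight == 0.0:
            continue
        scaled = [weight * x for x in vec]
        total = scaled if total is None else [a + b for a, b in zip(total, scaled)]
        weight_sum += weight
    if total is None:
        return None
    return [x / weight_sum for x in total]


docs = [["good", "day"], ["good", "night"]]
idf = idf_weights(docs)
wv = {"good": [1.0, 0.0], "day": [0.0, 1.0]}
print(tfidf_doc_vector(["good", "day"], wv, idf))
```

Here "day" occurs in fewer documents than "good", so it gets the larger weight and pulls the document vector towards itself.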
7 votes, 3 answers, 20k views
ERROR: Could not build wheels for fasttext, which is required to install pyproject.toml-based projects
I'm trying to install fasttext using pip install fasttext in python 3.11.4 but I'm running into trouble when building wheels. The error reads as follows:
error: command 'C:\\Program Files (x86)\\...
7 votes, 1 answer, 8k views
SPACY - Confusion about word vectors and tok2vec
It would be really helpful if you could help me understand some underlying concepts about spaCy.
I understand some spaCy models have some predefined static vectors, for example, for the Spanish ...
7 votes, 2 answers, 3k views
How to handle unbalanced label data using FastText?
In FastText, I have unbalanced labels. What is the best way to handle it?
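fastText's supervised training has no per-class weighting option that I know of, so one common workaround is to rebalance the training file itself. The sketch below randomly oversamples minority labels until every label is as frequent as the most common one. It assumes one example per line with the label as the first `__label__` token, and `oversample` is a hypothetical helper. Oversample only the training split, never the evaluation data, or the reported precision and recall will be distorted:

```python
import random
from collections import Counter, defaultdict


def oversample(lines, seed=0):
    """Duplicate lines of minority labels (first token of each line) at random
    until every label has as many lines as the most frequent one."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for line in lines:
        if line.strip():
            by_label[line.split(maxsplit=1)[0]].append(line)
    if not by_label:
        return []
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


train = ["__label__pos great movie"] * 4 + ["__label__neg boring plot"]
print(Counter(line.split()[0] for line in oversample(train)))
```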
6 votes, 1 answer, 3k views
Unable to recreate Gensim docs for training FastText. TypeError: Either one of corpus_file or corpus_iterable value must be provided
I am trying to make my own FastText embeddings, so I went to the official Gensim documentation and implemented the exact code below with version 4.0.
from gensim.models import FastText
from gensim....
6 votes, 1 answer, 4k views
Proper way to add new vectors for OOV words
I'm using some domain-specific language which has a lot of OOV words as well as some typos. I have noticed spaCy will just assign an all-zero vector to these OOV words, so I'm wondering what's the ...
6 votes, 1 answer, 39k views
Process finished with exit code -1073740791 (0xC0000409) pycharm error
I am trying to use fastText with PyCharm. Whenever I run the code below:
import fastText
model=fastText.train_unsupervised("data_parsed.txt")
model.save_model("model")
The process exits with this error:...
6 votes, 2 answers, 3k views
Error when loading FastText's French pre-trained model with gensim
I am trying to use FastText's French pre-trained binary model (downloaded from the official FastText GitHub page). I need the .bin model and not the .vec word vectors so as to approximate ...
6 votes, 2 answers, 8k views
Latest Pre-trained Multilingual Word Embedding
Are there any recent pre-trained multilingual word embeddings (multiple languages jointly mapped to the same vector space)?
I have looked at the following but they don't fit my needs:
FastText / ...
6 votes, 2 answers, 6k views
fasttext cannot load training txt file
I am trying to train a fasttext classifier on Windows using the fasttext Python package. I have a UTF-8 file with lines like
__label__type1 sample sentence 1
__label__type2 sample sentence 2
...
6 votes, 2 answers, 2k views
How to save fasttext model in binary and text formats?
The documentation is a bit unclear on how to save the fasttext model to disk. How do you specify a path in the argument? I tried doing so and it failed with an error.
Example in documentation
>>&...
5 votes, 2 answers, 10k views
How to use pre-trained word vectors in FastText?
I've just started to use FastText. I'm doing cross-validation on a small dataset, using the .csv file of my dataset as input. To process the dataset I'm using these parameters:
model = fasttext....
5 votes, 1 answer, 2k views
What is the difference between syntactic analogy and semantic analogy?
At 15:10 of this video about fastText, the speaker mentions syntactic analogy and semantic analogy, but I am not sure what the difference between them is.
Could anybody help explain the difference with ...
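In the word2vec-style analogy benchmarks, semantic analogies test relations of meaning (for example capital to country, or man to woman), while syntactic analogies test grammatical relations such as plural forms, verb tense or comparatives (walk : walked :: swim : swam). Both kinds are usually answered with the same vector-offset arithmetic, b - a + c, followed by a nearest-neighbour search. Here is a sketch with toy three-dimensional vectors that are purely hypothetical; the function names are illustrative:

```python
import math


def cosine(u, v):
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(a, b, c, word_vectors):
    """Solve a : b :: c : ? by the vector offset b - a + c (inputs excluded)."""
    target = [vb - va + vc for va, vb, vc in
              zip(word_vectors[a], word_vectors[b], word_vectors[c])]
    candidates = (w for w in word_vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, word_vectors[w]))


toy = {
    "man": [1.0, 0.0, 0.0], "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0], "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.0, 1.0],
}
print(analogy("man", "woman", "king", toy))  # queen
```

Because fastText vectors are built from character n-grams, they tend to do particularly well on the syntactic kind, where related forms share substrings.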
5 votes, 1 answer, 4k views
fasttext error: predict processes one line at a time (remove '\n')
I have a DataFrame column that contains text. I want to use a fasttext model to make predictions from it.
I can achieve this by passing an array of text to fasttext model.
import fasttext
d = {'id':[1, 2, ...
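The error comes from fastText's `predict`, which expects every input string to be a single line, so any newline inside a text has to be removed first. Below is a small sketch of that clean-up; the helper name is hypothetical. The commented lines show how it might be applied to a pandas column, assuming a loaded `model` and a DataFrame `df` with a `text` column:

```python
def to_single_lines(texts):
    """Collapse all whitespace, including newlines, into single spaces."""
    return [" ".join(str(text).split()) for text in texts]


print(to_single_lines(["first line\nsecond line", "  extra   spaces \n"]))
# ['first line second line', 'extra spaces']

# Hypothetical usage with a loaded model and a pandas DataFrame:
# labels, probs = model.predict(to_single_lines(df["text"].tolist()))
```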
5 votes, 1 answer, 2k views
FastText 0.9.2 - why is recall 'nan'?
I trained a supervised model in FastText using the Python interface and I'm getting weird results for precision and recall.
First, I trained a model:
model = fasttext.train_supervised("train.txt&...
5 votes, 2 answers, 6k views
How to export a fasttext model created by gensim, to a binary file?
I'm trying to export the fasttext model created by gensim to a binary file. But the docs are unclear about how to achieve this.
What I've done so far:
model.wv.save_word2vec_format('model.bin')
But ...
5 votes, 2 answers, 4k views
precision and recall in fastText?
I implemented fastText for text classification, following https://github.com/facebookresearch/fastText/blob/master/tutorials/supervised-learning.md
I was wondering what precision@1, or P@5, means. I ...
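P@k (precision at k) looks at the k highest-ranked labels predicted for each example and asks what fraction of them are correct. R@k asks what fraction of the true labels were recovered among those k. As far as I can tell, fastText's `test` pools the counts over the whole test set: total correct predictions divided by total predictions gives precision, and divided by total true labels gives recall. Below is a minimal sketch of those two ratios, assuming ranked prediction lists and sets of gold labels; the function name is hypothetical:

```python
def precision_recall_at_k(predictions, gold, k):
    """predictions: ranked label lists; gold: sets of true labels (same order)."""
    hits = n_pred = n_gold = 0
    for ranked, truth in zip(predictions, gold):
        top = ranked[:k]
        hits += sum(1 for label in top if label in truth)
        n_pred += len(top)
        n_gold += len(truth)
    precision = hits / n_pred if n_pred else float("nan")
    recall = hits / n_gold if n_gold else float("nan")
    return precision, recall


preds = [["sports", "politics"], ["music", "sports"]]
truth = [{"sports"}, {"politics"}]
print(precision_recall_at_k(preds, truth, 1))  # (0.5, 0.5)
print(precision_recall_at_k(preds, truth, 2))  # (0.25, 0.5)
```

Raising k can only keep or increase recall, but it usually lowers precision, because each extra predicted label is more likely to be wrong.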
5 votes, 1 answer, 723 views
Gensim fasttext cannot get latest training loss
Problem description
It seems that the get_latest_training_loss function in fasttext returns only 0. Both gensim 4.1.0 and 4.0.0 do not work.
from gensim.models.callbacks import CallbackAny2Vec
from ...
5 votes, 1 answer, 7k views
How to get nearest neighbours in fasttext for unsupervised learning models (cbow, skipgram)?
The examples (related to word representations) on fasttext official web site (fasttext.cc) suggest that it is possible to calculate the nearest neighbors on vectors derived with cbow (or skip-gram ...
5 votes, 1 answer, 806 views
How can I maintain a temporary dictionary in a PySpark application?
I want to use a pretrained embedding model (fasttext) in a PySpark application.
So if I broadcast the file (.bin), the following exception is thrown:
Traceback (most recent call last):
cPickle....
5 votes, 0 answers, 706 views
Incorporate fasttext vectors in tf.keras embedding layer?
Fasttext could handle OOV easily, i.e., it could be assumed that emb = fasttext_model(raw_input) always holds. However, I am not sure how I could build this layer into tf.keras embedding. I couldn't ...
4 votes, 1 answer, 4k views
Handling C++ arrays in Cython (with numpy and pytorch)
I am trying to use Cython to wrap a C++ library (fastText, if it's relevant). The C++ library classes load a very large array from disk. My wrapper instantiates a class from the C++ library to load the ...
4 votes, 1 answer, 1k views
What is the difference between args wordNgrams, minn and maxn in fastText supervised learning?
I'm a little confused after reading Bag of tricks for efficient text classification.
What is the difference between args wordNgrams, minn and maxn?
For example, a text classification task and Glove ...
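The two kinds of n-grams work at different levels. `wordNgrams` is the maximum length of word n-grams added as features (1 means single words only; 2 adds bigrams such as "new york"). `minn` and `maxn` bound the length of character n-grams taken from inside each word: with `minn=3, maxn=3`, "where" yields `<wh`, `whe`, `her`, `ere`, `re>`. For supervised training the defaults are `wordNgrams=1` and `minn=maxn=0`, i.e. no character n-grams. The sketch below lists word n-grams only. fastText itself hashes them into buckets rather than storing the strings, and the function name is hypothetical:

```python
def word_ngrams(tokens, max_n):
    """All contiguous word n-grams of length 1..max_n, joined with spaces."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


print(word_ngrams(["i", "love", "new", "york"], 2))
# ['i', 'love', 'new', 'york', 'i love', 'love new', 'new york']
```

Word n-grams let a bag-of-words classifier capture some local word order. Character n-grams instead help with rare, misspelled or morphologically related words.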
4 votes, 1 answer, 2k views
How to Find Top N Similar Words in a Dictionary of Words / Things?
I have a list of str that I want to map against. The words could be "metal" or "st. patrick". The goal is to map a new string against this list and find Top N Similar items. For ...
4 votes, 1 answer, 3k views
loading of fasttext pre trained german word embedding's .vec file throwing out of memory error
I am using gensim to load FastText's pre-trained word embeddings:
de_model = KeyedVectors.load_word2vec_format('wiki.de\wiki.de.vec')
But this gives me a memory error.
Is there any way I can load ...
4 votes, 2 answers, 2k views
Difference between Gensim's FastText and Facebook's FastText
I came upon the realization that there exists the original implementation of FastText here by which you can use fasttext.train_unsupervised in order to generate word vectors (see this link as an ...
4 votes, 2 answers, 1k views
Why doesn't FastText handle multi-word phrases when finding similar words?
FastText pre-trained model works great for finding similar words:
from pyfasttext import FastText
model = FastText('cc.en.300.bin')
model.nearest_neighbors('dog', k=2000)
[('dogs', 0.8463464975357056)...
4 votes, 1 answer, 6k views
Reducing size of Facebook's fastText
I am building a machine learning model which will process documents and extract some key information from it. For this, I need to use word embedding for OCRed output. I have several different options ...
4 votes, 1 answer, 5k views
Value of alpha in gensim word-embedding (Word2Vec and FastText) models?
I just want to know the effect of the value of alpha in gensim's Word2Vec and FastText word-embedding models. I know that alpha is the initial learning rate and its default value is 0.075 from Radim ...
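In gensim, `alpha` is only the starting learning rate: during training it is decreased linearly towards `min_alpha`. In gensim 4.x the documented defaults for both Word2Vec and FastText are `alpha=0.025` and `min_alpha=0.0001`. A larger starting value makes early updates bigger, which can speed up learning but makes it noisier; a smaller one makes training more conservative. The schedule itself is simple to sketch. This is a plain illustration with a hypothetical function name, not gensim's internal code:

```python
def linear_alpha(start_alpha, end_alpha, progress):
    """Learning rate after `progress` (0.0 to 1.0) of all training,
    interpolated linearly from start_alpha down to end_alpha."""
    progress = min(max(progress, 0.0), 1.0)
    return start_alpha - (start_alpha - end_alpha) * progress


for p in (0.0, 0.5, 1.0):
    print(p, round(linear_alpha(0.025, 0.0001, p), 6))
```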
4 votes, 1 answer, 3k views
Converting Fasttext vector to word
I am having trouble converting a FastText vector back to a word.
Here is my python code:
from gensim.models import KeyedVectors
en_model = KeyedVectors.load_word2vec_format('wiki.en/wiki.en.vec'...
4 votes, 1 answer, 1k views
Language names of Languages supported by Fasttext
I am trying to find out the names of languages supported by Fasttext's LID tool, given these language codes listed here:
af als am an ar arz as ast av az azb ba bar bcl be bg bh bn bo bpy br bs bxr ca ...
4 votes, 1 answer, 8k views
Is it possible to fine-tune FastText models?
I'm working on a text-similarity project using FastText; the basic example I have found for training a model is:
from gensim.models import FastText
model = FastText(tokens, size=100, window=3, ...
4 votes, 1 answer, 5k views
Install fasttext on Windows 10 with anaconda
I am trying to install fasttext in anaconda with Windows 10 using the command: pip install fasttext as explained here: https://pypi.org/project/fasttext/
The error messages are:
ValueError: Unknown ...