3

I have a question about fasttext (https://fasttext.cc/). I want to download a pre-trained model and use it to retrieve the word vectors from text.

After downloading the pre-trained model (https://fasttext.cc/docs/en/english-vectors.html) I unzipped it and got a .vec file. How do I import this into fasttext?

I've tried to use the function mentioned there as follows:

import fasttext
import io

def load_vectors(fname):
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    n, d = map(int, fin.readline().split())
    data = {}
    for line in fin:
        tokens = line.rstrip().split(' ')
        data[tokens[0]] = map(float, tokens[1:])
    return data

vectors = load_vectors('/Users/username/Downloads/wiki-news-300d-1M.vec')
model = fasttext.load_model(vectors)

However, I can't run this code to completion because Python crashes. How can I successfully load these pre-trained word vectors?

Thank you for your help.

2
  • Please edit your question to specify whether there is an error message. Commented Apr 14, 2021 at 12:23
  • How big is the vector file? How much RAM does your machine have? Commented Apr 14, 2021 at 12:48

2 Answers

8

FastText's advantage over word2vec or GloVe, for example, is that it uses subword information to return vectors for OOV (out-of-vocabulary) words.

So it offers two types of pretrained models: .vec and .bin.

.vec is a plain-text file that maps each word to its vector, like a Dict[word, vector]; the vectors are pre-computed for the words in the training vocabulary only.

.bin is a binary fastText model that can be loaded using fasttext.load_model('file.bin') and that can provide word vectors for unseen (OOV) words, be trained further, etc.

In your case you are loading a .vec file, so vectors is already the "final form" of the data. fasttext.load_model expects the path to a .bin file, not a dictionary.
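
As a rough illustration of the .bin route, the sketch below assumes the fasttext Python package is installed and that a .bin model has been downloaded. The helper name oov_vector_from_bin and the file name in the usage comment are placeholders chosen for this example, not part of fastText.

```python
def oov_vector_from_bin(bin_path: str, word: str):
    """Load a fastText .bin model and return the vector for `word`."""
    # Imported inside the function so the sketch can be defined
    # even on a machine where fasttext is not installed.
    import fasttext

    # load_model takes a path string to a .bin file, not a dict of vectors.
    model = fasttext.load_model(bin_path)
    # For a word outside the vocabulary, the vector is composed from its
    # character n-grams, which is what the .vec file cannot provide.
    return model.get_word_vector(word)


# Usage (hypothetical path; point it at the .bin file you downloaded):
# vec = oov_vector_from_bin("cc.en.300.bin", "unseenword123")
# print(vec.shape)
```

Loading a full .bin model needs several gigabytes of RAM, so this may still fail on a small machine.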

If you need more than a Python dictionary, you can use gensim.models.KeyedVectors (which handles word vectors from various sources, such as word2vec, GloVe, etc.).
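
A minimal sketch of the gensim route, assuming gensim is installed and that the .vec file follows the word2vec text format (a "count dimension" header followed by one word per line). The helper name load_vec_with_gensim is made up for this example; limit is an optional cap on how many words are read, which can help if memory is tight.

```python
def load_vec_with_gensim(vec_path: str, limit=None):
    """Load a fastText .vec text file as gensim KeyedVectors."""
    # Imported inside the function so the sketch can be defined
    # even on a machine where gensim is not installed.
    from gensim.models import KeyedVectors

    # binary=False because .vec is plain text; limit=None reads every word.
    return KeyedVectors.load_word2vec_format(vec_path, binary=False, limit=limit)


# Usage (the path is the one from the question; adjust as needed):
# kv = load_vec_with_gensim("/Users/username/Downloads/wiki-news-300d-1M.vec", limit=200000)
# print(kv["king"][:5])
# print(kv.most_similar("king", topn=5))
```

Note that vectors loaded this way still cover only the words present in the file; OOV lookups raise a KeyError.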


2 Comments

Any idea how to load a .vec file using the fasttext module?
@SwapnilMasurekar Have you checked this function? radimrehurek.com/gensim/models/…
0

I use the following code to load the .vec file in Python 3, where PATH_TO_FASTTEXT is the path to the .vec file.

Most notably, the result of map needs to be explicitly converted to a list, because map returns a lazy iterator in Python 3.


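import io
from typing import Dict, List

def LoadFastText() -> Dict[str, List[float]]:
    # PATH_TO_FASTTEXT must be defined beforehand as the path to the .vec file.
    with io.open(PATH_TO_FASTTEXT, 'r', encoding='utf-8', newline='\n', errors='ignore') as input_file:
        # The first line holds the vocabulary size and the vector dimension.
        no_of_words, vector_size = map(int, input_file.readline().split())
        word_to_vector: Dict[str, List[float]] = dict()
        for line in input_file:
            tokens = line.rstrip().split(' ')
            word = tokens[0]
            # Materialize the lazy map object as a list of floats.
            vector = list(map(float, tokens[1:]))
            assert len(vector) == vector_size
            word_to_vector[word] = vector
    return word_to_vector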
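
To check a loader like the one above without reading the full file, it can help to run it on a tiny hand-written .vec file first. The following is only a sketch under that assumption: load_vec, its limit parameter, cosine, and the sample values are all made up for illustration, and limit simply stops reading after that many words to keep memory use down.

```python
import io
import math
import os
import tempfile
from typing import Dict, List, Optional


def load_vec(path: str, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Read a .vec text file into a dict, keeping at most `limit` words."""
    word_to_vector: Dict[str, List[float]] = {}
    with io.open(path, "r", encoding="utf-8", newline="\n", errors="ignore") as fin:
        # Header line: "<number of words> <vector dimension>".
        _, vector_size = map(int, fin.readline().split())
        for line in fin:
            if limit is not None and len(word_to_vector) >= limit:
                break
            tokens = line.rstrip().split(" ")
            vector = list(map(float, tokens[1:]))
            if len(vector) != vector_size:
                continue  # skip malformed lines instead of failing
            word_to_vector[tokens[0]] = vector
    return word_to_vector


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Tiny stand-in for the real .vec file; the values are invented.
sample = "3 2\nking 1.0 0.0\nqueen 0.9 0.1\napple 0.0 1.0\n"
with tempfile.NamedTemporaryFile("w", suffix=".vec", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
    sample_path = tmp.name

vectors = load_vec(sample_path)
top_two = load_vec(sample_path, limit=2)  # only the first two words
os.remove(sample_path)

print(sorted(vectors))
print(cosine(vectors["king"], vectors["queen"]) > cosine(vectors["king"], vectors["apple"]))
```

The pre-trained .vec files are typically ordered by word frequency, so a limit keeps the most common words; that ordering is an assumption worth verifying for the file you downloaded.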

1 Comment

How do you build a model out of those vectors then? I tried to use load_model for that and pass the vectors in as a parameter, but I get the following error: TypeError: loadModel(): incompatible function arguments. The following argument types are supported: 1. (self: fasttext_pybind.fasttext, arg0: str) -> None
