Loading a pre-trained fastText model

Question

I have a question about fasttext (https://fasttext.cc/). I want to download a pre-trained model and use it to retrieve the word vectors from text.

After downloading the pre-trained model (https://fasttext.cc/docs/en/english-vectors.html) I unzipped it and got a .vec file. How do I import this into fasttext?

I've tried to use the mentioned function as follows:

import fasttext
import io

def load_vectors(fname):
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    n, d = map(int, fin.readline().split())
    data = {}
    for line in fin:
        tokens = line.rstrip().split(' ')
        data[tokens[0]] = map(float, tokens[1:])
    return data

vectors = load_vectors('/Users/username/Downloads/wiki-news-300d-1M.vec')
model = fasttext.load_model(vectors)

However, I can't run this code to completion because Python crashes. How can I successfully load these pre-trained word vectors?

Thank you for your help.

Please edit your question to specify whether there is an error message.
– ygorg, Commented Apr 14, 2021 at 12:23
How big is the vector file? How much RAM does your machine have?
– dennlinger, Commented Apr 14, 2021 at 12:48

ygorg · Accepted Answer · 2021-09-21 12:56:36Z

8

FastText's advantage over word2vec or GloVe, for example, is that it uses subword information to return vectors for OOV (out-of-vocabulary) words.

So two types of pre-trained models are offered: .vec and .bin.

.vec is a dictionary Dict[word, vector]; the word vectors are pre-computed for the words in the training vocabulary.

.bin is a binary fasttext model that can be loaded using fasttext.load_model('file.bin') and that can provide word vectors for unseen (OOV) words, be trained further, etc.

In your case you are loading a .vec file, so vectors is the "final form" of the data. fasttext.load_model expects the path to a .bin file, not a dictionary of vectors.

If you need more than a Python dictionary, you can use gensim.models.KeyedVectors (which handles any word vectors, such as word2vec, GloVe, etc.).

edited Sep 21, 2021 at 12:56

answered Apr 14, 2021 at 13:06


2 Comments

Swapnil Masurekar Over a year ago

Any idea how to load .vec file using fasttext module?

ygorg Over a year ago

@SwapnilMasurekar have you checked this function ? radimrehurek.com/gensim/models/…

Eric McLachlan · Accepted Answer · 2022-05-28 21:23:06Z

0

I use the following code to load the .vec file in Python 3, where PATH_TO_FASTTEXT is the path to the .vec file.

Most notably, in Python 3 map returns a lazy iterator, so it needs to be explicitly converted to a list (the code in the question stores the iterator itself).


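import io
from typing import Dict, List

# PATH_TO_FASTTEXT must be defined elsewhere as the path to the .vec file.
def LoadFastText() -> Dict[str, List[float]]:
    word_to_vector: Dict[str, List[float]] = dict()
    with io.open(PATH_TO_FASTTEXT, 'r', encoding='utf-8', newline='\n', errors='ignore') as input_file:
        # The first line is a header: "<number of words> <vector size>".
        no_of_words, vector_size = map(int, input_file.readline().split())
        for line in input_file:
            tokens = line.rstrip().split(' ')
            word = tokens[0]
            # map() is lazy in Python 3, so materialize the floats as a list.
            vector = list(map(float, tokens[1:]))
            assert len(vector) == vector_size
            word_to_vector[word] = vector
    return word_to_vector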
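The dictionary returned by a loader like this is not a fastText model, but it can be queried directly. Below is a minimal standard-library sketch of a nearest-neighbour lookup by cosine similarity. The function names and the toy vectors are invented for illustration, and a linear scan like this would be slow on a million words.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_words(word: str,
                  word_to_vector: Dict[str, List[float]],
                  topn: int = 3) -> List[Tuple[str, float]]:
    """Return the topn most similar words to `word` (hypothetical helper)."""
    # A plain dict has no subword fallback: an unseen word raises KeyError.
    query = word_to_vector[word]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in word_to_vector.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy 2-dimensional vectors standing in for the real 300-dimensional ones.
toy_vectors = {"king": [1.0, 0.0], "queen": [0.9, 0.1], "apple": [0.0, 1.0]}
neighbours = nearest_words("king", toy_vectors, topn=2)
print(neighbours)
```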

answered May 28, 2022 at 21:23


1 Comment

Deil Over a year ago

How do you build a model out of those vectors then? I tried to use load_model for that and pass in vectors as a parameter, but I get the following error:

TypeError: loadModel(): incompatible function arguments. The following argument types are supported:     1. (self: fasttext_pybind.fasttext, arg0: str) -> None
