
I trained a machine learning sentence classification model that uses, among other features, vectors obtained from a pretrained fastText model (like these), which is 7 GB. I use the pretrained fastText Italian model, and I use this word embedding only to extract some semantic features to feed into the final ML model.

I built a simple API based on fastText that, at prediction time, computes the vectors needed by the final ML model. Under the hood, this API receives a string as input and calls get_sentence_vector. When the API starts, it loads the fastText model into memory.
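For illustration, a minimal sketch of such an API using the official fasttext Python bindings; the Flask framework, route name, and model path are my assumptions, not part of the original setup:

```python
import fasttext
from flask import Flask, jsonify, request

app = Flask(__name__)

# Loaded once at startup: this is the ~7 GB object whose RAM footprint we want to shrink.
# "cc.it.300.bin" stands in for the pretrained Italian model path (hypothetical here).
model = fasttext.load_model("cc.it.300.bin")

@app.route("/vectorize", methods=["POST"])
def vectorize():
    text = request.get_json()["text"]
    # get_sentence_vector expects a single line of text (no newlines)
    vector = model.get_sentence_vector(text.replace("\n", " "))
    return jsonify(vector.tolist())
```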

How can I reduce the memory footprint of fastText, which is loaded into RAM?

Constraints:

  • My model works fine, and training was time-consuming and expensive, so I don't want to retrain it using smaller vectors
  • I need fastText's ability to handle out-of-vocabulary words, so I can't use just the vectors; I need the full model
  • I need to reduce RAM usage, even at the expense of a reduction in speed.

At the moment, I'm starting to experiment with compress-fasttext...

Please share your suggestions and thoughts even if they do not represent full-fledged solutions.

  • What parameters did you use when training FastText, & which FastText implementation? How crucial to you is the ability to generate vectors for OOV words? Also, why is the RAM size important to minimize - because a system with more RAM isn’t possible or too expensive, or other speed/performance considerations? Commented Jun 29, 2022 at 16:25
  • Thank you @gojomo! I added this information to the updated question. A small addition: I need to reduce RAM usage because of constraints imposed by system administrators. Commented Jun 30, 2022 at 8:43
  • Thanks! Because you need the subword info, one quick possibility - going to just full-word vectors, & possibly even slimming those to a most-frequent-word subset – isn't available. (It might still be possible to save some space by discarding some less-frequent words, which might not have much effect on whole-system performance, especially since they'd still get OOV-synthesized vectors. But it'd likely require some custom model-trimming-and-resaving code, & you'd want to check effects in some repeatable evaluation.) Commented Jun 30, 2022 at 17:20
  • Sometimes people's concern about RAM is really about load-time, especially in some systems that might reload the model regularly (in each request, or across many service processes) - but if you're really hitting a hard cap based on some fixed/shared deployment system, you'll have to shrink the usage – or upgrade the system. (Given that +8GB RAM isn't too expensive, in either hardware or cloud rentals, at some point you may want to lobby for that. The crossover point, where lost time searching for workarounds has cost more than more-hardware would've, may be closer than 1st assumed.) Commented Jun 30, 2022 at 17:30
  • With that said, not sure I could outdo whatever that compress-fasttext project has achieved – which I've not used but looks effective & thorough in its evaluations. (Other ad hoc things that might work – discarding some arbitrary dimensions of the existing model, other matrix refactorizations to fewer dimensions – are probably done much better by that project.) Commented Jun 30, 2022 at 17:32

1 Answer


There is no easy solution for my specific problem: if you are using a fastText embedding as a feature extractor and then want to switch to a compressed version of that embedding, you have to retrain the final classifier, since the produced vectors are somewhat different.

Anyway, I want to give a general answer for

fastText model reduction


Unsupervised models (=embeddings)

You are using pretrained embeddings provided by Facebook, or you trained your own embeddings in an unsupervised fashion. The model is in .bin format. Now you want to reduce model size/memory consumption.

Straightforward solutions:

  • compress-fasttext library: it compresses fastText word embedding models by orders of magnitude without significantly affecting their quality; several pretrained compressed models are also available (other interesting compressed models here). See the first sketch after this list.

  • fastText native reduce_model: in this case, you are reducing the vector dimension (e.g. from 300 to 100), so you are explicitly losing expressiveness; under the hood, this method uses PCA. See the second sketch after this list.
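Sketch 1, the compress-fasttext route. This follows the project's README as I understand it; the pruning parameters and file paths are illustrative:

```python
import gensim
import compress_fasttext

# Load the original Facebook-format .bin model via gensim, then compress it.
big_model = gensim.models.fasttext.load_facebook_model("cc.it.300.bin").wv
small_model = compress_fasttext.prune_ft_freq(big_model, pq=True)  # prune + product-quantize
small_model.save("cc.it.300.compressed.bin")

# At serving time, only the compressed model is loaded:
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
    "cc.it.300.compressed.bin"
)
print(small_model["ciao"])  # OOV words still get subword-based vectors
```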
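Sketch 2, the native reduce_model route, as shown in the official fastText docs; the target dimension and paths are illustrative:

```python
import fasttext
import fasttext.util

model = fasttext.load_model("cc.it.300.bin")
# PCA-based reduction from 300 to 100 dimensions, applied in place.
fasttext.util.reduce_model(model, 100)
model.save_model("cc.it.100.bin")
```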

If you have training data and can retrain, you can use floret, a fastText fork by Explosion (the company behind spaCy) that uses a more compact representation for vectors.
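A hedged sketch of floret training, based on my reading of the floret README (the Python bindings mirror the fasttext API; all parameter names and values here are assumptions worth double-checking against the project docs):

```python
import floret  # pip install floret

# mode="floret" and hashCount are floret-specific: each vocabulary and subword
# entry is stored as the sum of hashCount rows in a small shared table, so
# `bucket` can be far smaller than fastText's default subword table.
model = floret.train_unsupervised(
    "train.txt",
    model="cbow",
    mode="floret",
    hashCount=2,
    bucket=50000,
    minn=4,
    maxn=5,
    dim=300,
)
model.save_model("vectors.floret.bin")
```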

If you are not interested in fastText's ability to represent out-of-vocabulary words (words not seen during training), you can use the .vec file (containing only vectors, not model weights) and select only a portion of the most common vectors (e.g. the first 200k words/vectors). If you need a way to convert .bin to .vec, read this answer. Note: the gensim package fully supports fastText embeddings (unsupervised mode), so these operations can be done through that library (more details in this answer).
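A minimal sketch of the vectors-only route with gensim (paths illustrative); since .vec files list words in descending frequency order, `limit` keeps the most common ones:

```python
from gensim.models import KeyedVectors

# Load only the 200k most frequent vectors from the plain-text .vec file.
vectors = KeyedVectors.load_word2vec_format("cc.it.300.vec", limit=200_000)
print(vectors["ciao"].shape)  # (300,)

# Caveat: a true OOV word now raises KeyError; there is no subword fallback.
```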

Supervised models

You used fastText to train a classifier, producing a .bin model. Now you want to reduce classifier size/memory consumption.

  • The best solution is fastText's native quantize: the model is retrained, applying weight quantization and feature selection. With the retrain parameter, you can decide whether to fine-tune the embeddings or not. See the sketch after this list.
  • You can still use fastText's reduce_model, but it leads to less expressive models, and the model size is not heavily reduced.
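A sketch of native quantization, following the fastText Python README (the cutoff value and paths are illustrative):

```python
import fasttext

model = fasttext.load_model("classifier.bin")
# Retrains with product quantization of weights plus feature selection;
# cutoff keeps the most important features, retrain=True fine-tunes embeddings.
model.quantize(input="train.txt", qnorm=True, retrain=True, cutoff=100_000)
model.save_model("classifier.ftz")  # .ftz is the conventional quantized extension
```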

2 Comments

Looks good! I think that the native-to-Facebook-fasttext reduce_model approach may work on --supervised-mode models, too – but I've not tried it & the docs aren't clear. It could be worthwhile to note that its approach to reducing the number-of-dimensions is PCA behind the scenes.
Thank you. I tested reduce_model for supervised models: it works, but it's not optimal.
