How to handle unbalanced label data using FastText?

Question

In FastText, I have unbalanced labels. What is the best way to handle it?

This blog machinelearningmastery.com/… gives some general answers. Can you add some details as to the domain specifics?
– Veltzer Doron, Commented Jun 26, 2018 at 15:55
I don't see any satisfactory answer. Is there a better resolution?
– Anuj Gupta, Commented Sep 21, 2018 at 4:46

Remzouz · Accepted Answer · 2018-07-12 10:39:39Z


FastText seems to handle unbalanced data pretty well. According to the FAQ entry on the hierarchical softmax loss:

Note also that this loss is thought for classes that are unbalanced, that is some classes are more frequent than others.
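The loss the FAQ refers to here is the hierarchical softmax, which fastText only uses when asked to (`-loss hs` on the command line, `loss="hs"` in the Python bindings). A minimal sketch of selecting it from Python, assuming the `fasttext` package is installed and the training file uses the `__label__` prefix; the helper names and the file path are hypothetical:

```python
def to_fasttext_line(label: str, text: str) -> str:
    """Format one example the way fastText's supervised mode expects."""
    # Each example must sit on a single line, so collapse any whitespace
    # (including newlines) inside the text into single spaces.
    return f"__label__{label} {' '.join(text.split())}"


def train_with_hierarchical_softmax(train_path: str):
    """Train a classifier with loss='hs' (hierarchical softmax)."""
    # Third-party dependency; imported lazily so the formatting helper
    # above can be used without fastText installed.
    import fasttext

    return fasttext.train_supervised(input=train_path, loss="hs")
```

Whether `hs` beats the default softmax on a given skewed dataset is something to verify on a held-out set rather than assume.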


Flavio · Accepted Answer · 2019-03-21 12:20:49Z

In our case, we have a very skewed dataset: 200+ classes, with 20% of the classes containing 80% of all the data.
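To check how skewed a training file is, a small sketch like the following can count the labels. It assumes the usual `__label__` prefix; the function names are illustrative and not part of fastText:

```python
from collections import Counter


def label_counts(lines, prefix="__label__"):
    """Count label occurrences in fastText-format training lines."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            if token.startswith(prefix):
                counts[token[len(prefix):]] += 1
    return counts


def share_of_top_classes(counts, fraction=0.2):
    """Share of all label occurrences held by the most frequent `fraction` of classes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Always look at one class at least, even for very small label sets.
    top_n = max(1, round(len(counts) * fraction))
    top = sum(n for _, n in counts.most_common(top_n))
    return top / total
```

Calling `share_of_top_classes(label_counts(open("train.txt")))` (with a hypothetical `train.txt`) returns a value near 0.8 for a dataset skewed like the one described here.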

Even with this highly skewed data, the texts inside each of our categories are clearly distinguishable from one another.

Example: Text of the Majority Class: "Hey, I need a computer and a mouse to open the internet and post a programming answer in Stack Overflow"

Text of the Minority Class: "Hey, could please give me the following items: Eggs, lettuce, onions, tomatoes, milk and wheat?"

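Because FastText works with word n-grams (`wordNgrams`) and a hierarchical split of the labels (the hierarchical softmax described in the paper referenced below), the imbalance is not a problem when your categories are very well defined, as in my case above. That follows from the nature of the algorithm.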
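One rough way to see what "well defined" means is to compare the word n-grams of the two example texts. The sketch below is a plain set comparison for illustration only: fastText itself hashes n-grams into buckets and averages their embeddings, and the helper names here are made up.

```python
import re


def word_ngrams(text, max_n=2):
    """Return the set of lowercase word n-grams of length 1..max_n."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            grams.add(" ".join(words[i:i + n]))
    return grams


majority = ("Hey, I need a computer and a mouse to open the internet "
            "and post a programming answer in Stack Overflow")
minority = ("Hey, could please give me the following items: "
            "Eggs, lettuce, onions, tomatoes, milk and wheat?")

# N-grams that appear in both example texts.
shared = word_ngrams(majority) & word_ngrams(minority)
```

For these two texts, `shared` holds only generic words ("hey", "and", "the") and no bigrams at all, so the features that identify each class barely overlap.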

Reference: Bag of Tricks for Efficient Text Classification - Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov
