The 'input matrix' stores the vectors for the full words and subwords the model knows. To compose the input-layer activations of the underlying neural network, the model looks up the full-word vector (when the word is in the vocabulary) and combines it with the relevant subword vectors from that input matrix.
I believe, for the values you've reported, the first 2 million rows of the input matrix will be the full-word vectors of known in-vocabulary words, and the next 2 million will be the subword vectors (stored in a collision-oblivious hash table, so multiple character n-grams may wind up sharing a vector with arbitrary other n-grams).
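As a rough sketch of that layout and lookup (with toy sizes standing in for the 2-million-word vocabulary and 2 million hash buckets, and a fastText-style n-gram extraction; the exact hash, n-gram rules, and averaging vary by implementation), the composition might look like this:

```python
import numpy as np

VOCAB_SIZE = 1000   # in the model described above this would be 2,000,000 full-word rows
BUCKETS = 2000      # and this would be another 2,000,000 hashed subword rows
DIM = 100

# illustrative input matrix (in a real model these values come from training)
input_matrix = np.zeros((VOCAB_SIZE + BUCKETS, DIM), dtype=np.float32)
vocab = {"apple": 0, "apples": 1}   # hypothetical word -> row-index mapping

def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(wrapped) - n + 1)]

def ngram_row(ngram):
    """Map an n-gram to a row in the subword half of the matrix.
    A simple illustrative hash; collisions are simply tolerated."""
    h = 2166136261
    for b in ngram.encode("utf-8"):
        h = ((h ^ b) * 16777619) & 0xFFFFFFFF
    return VOCAB_SIZE + (h % BUCKETS)

def word_vector(word):
    """Combine the full-word row (if in vocabulary) with its n-gram rows."""
    rows = [ngram_row(ng) for ng in char_ngrams(word)]
    if word in vocab:
        rows.append(vocab[word])
    return input_matrix[rows].mean(axis=0)

print(word_vector("apple").shape)       # (100,) - in-vocabulary word
print(word_vector("applesauce").shape)  # (100,) - OOV, built purely from n-gram rows
```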
The 'output matrix' holds the weights that control the calculation of the underlying neural network's output-layer activations. The interpretation of each row in the output matrix depends on the training mode in use.
In the typical default negative-sampling mode, each row of the 'output matrix' corresponds to one output node, for one specific predictable in-vocabulary word. Training involves checking the network's activations at both the desired target word's node and N other ('negative') sampled nodes, then nudging the weights so the network predicts (is highly activated at) the desired node and does not predict (is less activated at) the randomly chosen others.
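A minimal sketch of one such negative-sampling update (plain NumPy, with made-up row indices, toy sizes, and learning rate; real implementations add frequency-weighted negative sampling, checks that a negative isn't the target, etc.):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, VOCAB = 100, 1000
input_matrix = rng.normal(scale=0.1, size=(VOCAB, DIM)).astype(np.float32)
output_matrix = np.zeros((VOCAB, DIM), dtype=np.float32)  # one row per predictable word

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_step(context_row, target_row, n_negatives=5, lr=0.025):
    """Nudge the network toward predicting `target_row` and away from random negatives."""
    h = input_matrix[context_row]                      # input-layer activation
    negatives = rng.integers(0, VOCAB, size=n_negatives)
    grad_h = np.zeros(DIM, dtype=np.float32)
    for row, label in [(target_row, 1.0)] + [(n, 0.0) for n in negatives]:
        score = sigmoid(h @ output_matrix[row])        # activation at this output node
        g = lr * (label - score)                       # push toward label: 1=target, 0=negative
        grad_h += g * output_matrix[row]
        output_matrix[row] += g * h                    # update the output-matrix row
    input_matrix[context_row] += grad_h                # update the input vector

negative_sampling_step(context_row=3, target_row=42)
```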
In hierarchical-softmax mode, each row of the 'output matrix' corresponds to one of the Huffman-tree encoding nodes that (in certain combinations) predict specific output words. Training nudges the weights of all the encoding nodes along a target word's path so that, together, they better predict that center target word.
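A comparable sketch for hierarchical softmax, reusing the toy `input_matrix`, `output_matrix`, `sigmoid`, and `DIM` from the snippet above (here each word is assumed to have a precomputed path of inner-node row indices and binary codes, and the 0/1 sign convention is just an illustrative choice):

```python
# In hierarchical-softmax mode the output matrix has one row per
# Huffman-tree inner node rather than one row per word.
def hierarchical_softmax_step(context_row, path_rows, path_codes, lr=0.025):
    """Nudge every inner node on the target word's Huffman path toward its binary code."""
    h = input_matrix[context_row]
    grad_h = np.zeros(DIM, dtype=np.float32)
    for row, code in zip(path_rows, path_codes):
        score = sigmoid(h @ output_matrix[row])   # probability of taking the '1' branch here
        g = lr * (code - score)                   # move activation toward the 0/1 code
        grad_h += g * output_matrix[row]
        output_matrix[row] += g * h
    input_matrix[context_row] += grad_h

# a hypothetical word whose Huffman code is 1,0,1, stored at inner-node rows 7, 19, 4
hierarchical_softmax_step(context_row=3, path_rows=[7, 19, 4], path_codes=[1, 0, 1])
```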
If you mean that every entry in the output matrix is 0.0, that suggests a model that's untrained or incomplete: a fully trained, complete model would have varied values in the output matrix. But after training is done, the output matrix is no longer needed, because any in-vocabulary or out-of-vocabulary word's vector can be calculated from the input matrix alone.
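For example, with gensim (assuming the 4.x API; parameter names like `vector_size` and `epochs` differed in older versions), both in-vocabulary and out-of-vocabulary lookups are served entirely by the retained input-side vectors:

```python
from gensim.models import FastText

# a tiny illustrative corpus; real training data would be far larger
sentences = [["the", "quick", "brown", "fox"],
             ["jumps", "over", "the", "lazy", "dog"]]
model = FastText(sentences, vector_size=32, min_count=1, epochs=5)

# in-vocabulary word: full-word row combined with its n-gram rows
vec_known = model.wv["fox"]

# out-of-vocabulary word: built purely from hashed n-gram rows
vec_oov = model.wv["foxes"]

print(vec_known.shape, vec_oov.shape)   # (32,) (32,)
```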