
I was playing with the fastText API and found two methods, get_input_matrix and get_output_matrix. What are these for? Is there any chance they are the weight matrices of the original model?

The help function doesn't say much about them. When I look inside the matrices, get_input_matrix returns a full 4M-by-300 matrix, whereas get_output_matrix returns a 2M-by-300 matrix whose entries are all 0. If it were a weight matrix, it would be a full matrix, right? So what is it for?

Could someone shed some light on this?

B.R.

1 Answer


The 'input matrix' stores the vectors for the words and subwords the model knows. Composing the input-layer activations of the underlying neural network involves looking up full-word vectors (when available) and combining subword vectors from that input matrix.

I believe that, for the values you've reported, the first 2 million rows of the input matrix will be full-word vectors for known in-vocabulary words, and the next 2 million will be the subword vectors (in a collision-oblivious hashtable, so multiple character n-grams may wind up sharing a vector with arbitrary other n-grams).
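Here's a toy sketch of that layout. The hash follows fastText's 32-bit FNV-1a n-gram bucketing, but the sizes (VOCAB_SIZE, BUCKETS) and helper names are illustrative stand-ins, not the library's API:

```python
import numpy as np

VOCAB_SIZE = 4   # toy stand-in for the ~2M known full words
BUCKETS = 8      # toy stand-in for the ~2M subword hash buckets
DIM = 300

rng = np.random.default_rng(0)
# one matrix: full-word rows first, then the subword bucket rows
input_matrix = rng.normal(size=(VOCAB_SIZE + BUCKETS, DIM))

def fnv1a(s: str) -> int:
    # 32-bit FNV-1a hash, the scheme fastText uses to bucket n-grams
    h = 2166136261
    for b in s.encode("utf-8"):
        h = ((h ^ b) * 16777619) & 0xFFFFFFFF
    return h

def ngram_rows(word, minn=3, maxn=6):
    # character n-grams of the boundary-marked word, mapped to bucket rows;
    # collisions are simply ignored (collision-oblivious), as in fastText
    w = f"<{word}>"
    rows = []
    for n in range(minn, maxn + 1):
        for i in range(len(w) - n + 1):
            rows.append(VOCAB_SIZE + fnv1a(w[i:i + n]) % BUCKETS)
    return rows

def oov_vector(word):
    # an out-of-vocabulary word's vector is composed purely from its
    # subword rows in the input matrix
    return input_matrix[ngram_rows(word)].mean(axis=0)
```

Because every n-gram hashes to *some* bucket row, any string at all gets a vector, which is why fastText can serve vectors for words it never saw in training.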

The 'output matrix' holds the weights controlling the calculation of the underlying neural network's output-layer activations. The interpretation of each row in the output matrix depends on the training mode in use.

In the typical default negative-sampling mode, each row of the 'output matrix' corresponds to one output node, and each output node corresponds to a specific predictable known-vocabulary word. Training involves checking the NN's activations at both the desired target word's node and N other ('negative') sampled nodes, then nudging the network to predict (be highly activated) at the desired node and not predict (be less activated) at the random other samples.
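A minimal numpy sketch of one such nudge (the sizes, learning rate, and function name are illustrative; the real update lives inside fastText's C++ training loop):

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB, DIM = 10, 300
# each row is the output-node weight vector for one vocabulary word
output_matrix = rng.normal(scale=0.1, size=(VOCAB, DIM))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_step(hidden, target, negatives, lr=0.05):
    """One SGD step: raise the target node's activation, lower the
    sampled negatives'. Returns the gradient for the input vectors."""
    grad_hidden = np.zeros_like(hidden)
    for node, label in [(target, 1.0)] + [(n, 0.0) for n in negatives]:
        score = sigmoid(output_matrix[node] @ hidden)
        g = lr * (label - score)          # positive for target, negative for samples
        grad_hidden += g * output_matrix[node]
        output_matrix[node] += g * hidden  # nudge this output row
    return grad_hidden
```

Only the target row and the N sampled rows are touched per example, which is what makes negative sampling cheap compared with a full softmax over the vocabulary.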

In hierarchical-softmax mode, each row of the 'output matrix' corresponds to one of the Huffman-tree encoding nodes that (in certain combinations) predict specific output words. Training nudges all involved encoding nodes to better predict a certain center target word.
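To see how a handful of encoding-node rows combine into per-word probabilities, here is a toy sketch with a hand-built 4-word tree (a real model derives the tree from word frequencies via Huffman coding; the PATHS table and names here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 8
# one output-matrix row per *internal* tree node, not per word
output_matrix = rng.normal(size=(3, DIM))

# each word is encoded as a path of (internal node, went-left?) decisions
PATHS = {
    "a": [(0, 1), (1, 1)],
    "b": [(0, 1), (1, 0)],
    "c": [(0, 0), (2, 1)],
    "d": [(0, 0), (2, 0)],
}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def word_prob(word, hidden):
    # P(word) is the product of the branch probabilities along its path
    p = 1.0
    for node, left in PATHS[word]:
        s = sigmoid(output_matrix[node] @ hidden)
        p *= s if left else (1.0 - s)
    return p

hidden = rng.normal(size=DIM)
total = sum(word_prob(w, hidden) for w in PATHS)
```

The branch probabilities at each node sum to 1, so the word probabilities over all leaves sum to 1 automatically, and each training example only touches the few rows on one word's path.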

If you mean that every entry in the output-matrix is 0.0, that suggests a model that's untrained or incomplete: a fully-trained & complete model would have varied values in the 'output-matrix'. But, after training is done, the output-matrix is no longer needed – any in-vocabulary or out-of-vocabulary word's vector can be calculated from the input-matrix.
