The 'input matrix' stores the vectors for the full words and subwords the model knows. To compose the input-layer activations of the underlying neural network, the model looks up the full-word vector (when the word is in the vocabulary) and combines it with the relevant subword vectors from that input matrix.
I believe, for the values you've reported, the first 2 million rows of the input matrix will be the full-word vectors of known in-vocabulary words, and the next 2 million will be the subword vectors (stored in a collision-oblivious hash table, so multiple character n-grams may wind up sharing a vector with arbitrary other n-grams).
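As a rough sketch of that layout and lookup (with toy sizes standing in for the 2-million-word vocabulary and 2 million hash buckets, and a fastText-style n-gram extraction; the exact hash, n-gram rules, and averaging vary by implementation), the composition might look like this:

```python
import numpy as np

VOCAB_SIZE = 1000   # in the model described above this would be 2,000,000 full-word rows
BUCKETS = 2000      # and this would be another 2,000,000 hashed subword rows
DIM = 100

# illustrative input matrix (in a real model these values come from training)
input_matrix = np.zeros((VOCAB_SIZE + BUCKETS, DIM), dtype=np.float32)
vocab = {"apple": 0, "apples": 1}   # hypothetical word -> row-index mapping

def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(wrapped) - n + 1)]

def ngram_row(ngram):
    """Map an n-gram to a row in the subword half of the matrix.
    A simple illustrative hash; collisions are simply tolerated."""
    h = 2166136261
    for b in ngram.encode("utf-8"):
        h = ((h ^ b) * 16777619) & 0xFFFFFFFF
    return VOCAB_SIZE + (h % BUCKETS)

def word_vector(word):
    """Combine the full-word row (if in vocabulary) with its n-gram rows."""
    rows = [ngram_row(ng) for ng in char_ngrams(word)]
    if word in vocab:
        rows.append(vocab[word])
    return input_matrix[rows].mean(axis=0)

print(word_vector("apple").shape)       # (100,) - in-vocabulary word
print(word_vector("applesauce").shape)  # (100,) - OOV, built purely from n-gram rows
```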
The 'output matrix' holds the weights that control the calculation of the underlying neural network's output-layer activations. The interpretation of each row in the output matrix depends on the training mode in use.
In the typical default negative-sampling mode, each row of the 'output matrix' corresponds to one output node, for one specific predictable in-vocabulary word. Training involves checking the network's activations at both the desired target word's node and N other ('negative') sampled nodes, then nudging the weights so the network predicts (is highly activated at) the desired node and does not predict (is less activated at) the randomly chosen others.
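A minimal sketch of one such negative-sampling update (plain NumPy, with made-up row indices, toy sizes, and learning rate; real implementations add frequency-weighted negative sampling, checks that a negative isn't the target, etc.):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, VOCAB = 100, 1000
input_matrix = rng.normal(scale=0.1, size=(VOCAB, DIM)).astype(np.float32)
output_matrix = np.zeros((VOCAB, DIM), dtype=np.float32)  # one row per predictable word

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_step(context_row, target_row, n_negatives=5, lr=0.025):
    """Nudge the network toward predicting `target_row` and away from random negatives."""
    h = input_matrix[context_row]                      # input-layer activation
    negatives = rng.integers(0, VOCAB, size=n_negatives)
    grad_h = np.zeros(DIM, dtype=np.float32)
    for row, label in [(target_row, 1.0)] + [(n, 0.0) for n in negatives]:
        score = sigmoid(h @ output_matrix[row])        # activation at this output node
        g = lr * (label - score)                       # push toward label: 1=target, 0=negative
        grad_h += g * output_matrix[row]
        output_matrix[row] += g * h                    # update the output-matrix row
    input_matrix[context_row] += grad_h                # update the input vector

negative_sampling_step(context_row=3, target_row=42)
```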
In hierarchical-softmax mode, each row of the 'output matrix' corresponds to one of the Huffman-tree encoding nodes that (in certain combinations) predict specific output words. Training nudges the weights of all the encoding nodes along a target word's path so that, together, they better predict that center target word.
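A comparable sketch for hierarchical softmax, reusing the toy `input_matrix`, `output_matrix`, `sigmoid`, and `DIM` from the snippet above (here each word is assumed to have a precomputed path of inner-node row indices and binary codes, and the 0/1 sign convention is just an illustrative choice):

```python
# In hierarchical-softmax mode the output matrix has one row per
# Huffman-tree inner node rather than one row per word.
def hierarchical_softmax_step(context_row, path_rows, path_codes, lr=0.025):
    """Nudge every inner node on the target word's Huffman path toward its binary code."""
    h = input_matrix[context_row]
    grad_h = np.zeros(DIM, dtype=np.float32)
    for row, code in zip(path_rows, path_codes):
        score = sigmoid(h @ output_matrix[row])   # probability of taking the '1' branch here
        g = lr * (code - score)                   # move activation toward the 0/1 code
        grad_h += g * output_matrix[row]
        output_matrix[row] += g * h
    input_matrix[context_row] += grad_h

# a hypothetical word whose Huffman code is 1,0,1, stored at inner-node rows 7, 19, 4
hierarchical_softmax_step(context_row=3, path_rows=[7, 19, 4], path_codes=[1, 0, 1])
```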
If you mean that every entry in the output matrix is 0.0, that suggests a model that's untrained or incomplete: a fully trained, complete model would have varied values in the output matrix. But after training is done, the output matrix is no longer needed, because any in-vocabulary or out-of-vocabulary word's vector can be calculated from the input matrix alone.
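For example, with gensim (assuming the 4.x API; parameter names like `vector_size` and `epochs` differed in older versions), both in-vocabulary and out-of-vocabulary lookups are served entirely by the retained input-side vectors:

```python
from gensim.models import FastText

# a tiny illustrative corpus; real training data would be far larger
sentences = [["the", "quick", "brown", "fox"],
             ["jumps", "over", "the", "lazy", "dog"]]
model = FastText(sentences, vector_size=32, min_count=1, epochs=5)

# in-vocabulary word: full-word row combined with its n-gram rows
vec_known = model.wv["fox"]

# out-of-vocabulary word: built purely from hashed n-gram rows
vec_oov = model.wv["foxes"]

print(vec_known.shape, vec_oov.shape)   # (32,) (32,)
```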