I'm currently training a seq2seq encoder-decoder network built on LSTMs in TensorFlow 2.x. The main problem is that the loss goes to NaN and the predictions returned are all NaN as well. I understand this can be caused by exploding/vanishing gradients and have tried several ways to combat it (e.g. adding a min-max normalization layer, adding L2 regularization, using clipnorm/clipvalue, changing the learning rate, etc.). I have tried nearly every method I could find for this issue, but it persists.
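One way to narrow down where the NaN first appears (not from the original post; a minimal sketch assuming TensorFlow 2.x) is to turn on TensorFlow's numeric checking before the model is built, so training fails loudly at the first op that emits an Inf or NaN rather than silently propagating it into the loss:

import tensorflow as tf

# Raises an error naming the first op whose output contains Inf or NaN,
# instead of letting it propagate to the loss. Enable before building the model.
tf.debugging.enable_check_numerics()

Note that this adds per-op checks and slows training, so it is best used only while debugging.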

The architecture is as follows:

import numpy as np
from tensorflow import keras

embedding_size = 16
INPUT_LENGTH = X.shape[1]
# MAX_OUTPUT_LENGTH = y.shape[1]
MAX_OUTPUT_LENGTH = 10

# for min-max normalization - create a min-max layer
global_min = np.min(X_train)
global_max = np.max(X_train)
min_max_layer = keras.layers.Lambda(lambda x: (x - global_min) / (global_max - global_min))

# define encoder model
encoder = keras.models.Sequential()
encoder.add(keras.layers.Embedding(input_dim=len(aa_tokenizer.word_index) + 1,
                                   output_dim=embedding_size,
                                   input_shape=[None]))
                                #    input_length=MAX_OUTPUT_LENGTH))
encoder.add(min_max_layer)
encoder.add(keras.layers.LSTM(16, activity_regularizer=keras.regularizers.L2(0.1)))

# define decoder model
decoder = keras.models.Sequential()
# decoder.add(min_max_layer)
decoder.add(keras.layers.LSTM(16, return_sequences=True, activity_regularizer=keras.regularizers.L2(0.1)))
# decoder.add(min_max_layer)
decoder.add(keras.layers.Dense(len(codon_tokenizer.word_index) + 1, activation='softmax'))

# define inference model
model = keras.models.Sequential([encoder, keras.layers.RepeatVector(MAX_OUTPUT_LENGTH), decoder])

optimizer = keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.0)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer,
              metrics=["accuracy"])
history = model.fit(X_train[:, :10, :], y_train[:, :10], epochs=2, validation_split=0.15)

With the output: loss: nan - accuracy: 0.0527 - val_loss: nan - val_accuracy: 0.0000e+00

(Note: this network currently trains on sequences of length 10 for the sake of speed and testing, but the original goal is to train on sequences of length 2,400.)
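A cheap sanity check worth running first (a hedged sketch reusing the variable names above, not code from the post): make sure every token index fits inside the Embedding layer's input_dim and every label fits inside the softmax output size, since out-of-range indices can silently yield garbage or NaN on some backends:

# Hypothetical checks against the tokenizers defined earlier.
vocab_in = len(aa_tokenizer.word_index) + 1      # Embedding input_dim
vocab_out = len(codon_tokenizer.word_index) + 1  # Dense softmax size

assert X_train.min() >= 0 and X_train.max() < vocab_in, "token id out of range"
assert y_train.min() >= 0 and y_train.max() < vocab_out, "label id out of range"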

  • Can you please add your loss function? Maybe it's badly coded? If possible, I'd recommend adding print() steps to trace its path and see where the NaN appears. Commented Dec 6, 2021 at 23:15
  • @noober I used sparse_categorical_crossentropy for the loss function. Let me look into how I can use print() to trace the path of the network (one way to do this is sketched below). Also, the loss is a real number for the first few batches but then suddenly jumps to NaN. Commented Dec 7, 2021 at 0:13
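Following up on the comment thread (an illustrative sketch, not code from the post): instead of scattering print() calls, a per-batch Keras callback can log the loss and stop at the first non-finite value, which pinpoints the offending batch:

import numpy as np
from tensorflow import keras

class NaNWatcher(keras.callbacks.Callback):
    # Hypothetical helper: print the loss after every training batch
    # and stop at the first NaN/Inf.
    def on_train_batch_end(self, batch, logs=None):
        loss = (logs or {}).get("loss")
        print(f"batch {batch}: loss={loss}")
        if loss is not None and not np.isfinite(loss):
            print(f"Non-finite loss at batch {batch}; stopping.")
            self.model.stop_training = True

# e.g. model.fit(..., callbacks=[NaNWatcher(), keras.callbacks.TerminateOnNaN()])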
