I'm currently training a seq2seq encoder-decoder network built with LSTMs in TensorFlow 2.x. The main problem right now is that the loss goes to NaN and the predictions returned are all NaN as well. I understand the possibility of exploding/vanishing gradients and have tried several ways to combat it (e.g. adding a min-max normalization layer, adding L2 regularization, using clipnorm/clipvalue, changing the learning rate, etc.). Almost every method I could find for this issue has been tried, but it persists.
The architecture is as follows:
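To be concrete about what "clipnorm/clipvalue and changing the learning rate" means, these are roughly the optimizer variants I tried (a minimal sketch; the exact values I experimented with varied, and clipnorm/clipvalue/learning_rate are standard Adam arguments, not anything specific to this model):

from tensorflow import keras

# variant 1: clip the global gradient norm, keep the default learning rate
optimizer = keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
# variant 2: clip each gradient element instead, with a lower learning rate
# optimizer = keras.optimizers.Adam(learning_rate=1e-4, clipvalue=0.5)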
import numpy as np
from tensorflow import keras

embedding_size = 16
INPUT_LENGTH = X.shape[1]
# MAX_OUTPUT_LENGTH = y.shape[1]
MAX_OUTPUT_LENGTH = 10
# for min-max normalization - create a min-max layer
global_min = np.min(X_train)
global_max = np.max(X_train)
min_max_layer = keras.layers.Lambda(lambda x: (x - global_min) / (global_max - global_min))
# define encoder model
encoder = keras.models.Sequential()
encoder.add(keras.layers.Embedding(input_dim=len(aa_tokenizer.word_index) + 1,
                                   output_dim=embedding_size,
                                   input_shape=[None]))
                                   # input_length=MAX_OUTPUT_LENGTH))
encoder.add(min_max_layer)
encoder.add(keras.layers.LSTM(16, activity_regularizer=keras.regularizers.L2(0.1)))
# define decoder model
decoder = keras.models.Sequential()
# decoder.add(min_max_layer)
decoder.add(keras.layers.LSTM(16, return_sequences=True, activity_regularizer=keras.regularizers.L2(0.1)))
# decoder.add(min_max_layer)
decoder.add(keras.layers.Dense(len(codon_tokenizer.word_index) + 1, activation='softmax'))
# define inference model
model = keras.models.Sequential([encoder, keras.layers.RepeatVector(MAX_OUTPUT_LENGTH), decoder])
optimizer = keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.0)
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer,
              metrics=["accuracy"])
history = model.fit(X_train[:, :10, :], y_train[:, :10], epochs=2, validation_split=0.15)
With the output: loss: nan - accuracy: 0.0527 - val_loss: nan - val_accuracy: 0.0000e+00
(Note: this network currently trains on sequences of length 10 for speed while testing, but the original goal is to train on sequences of length 2,400.)
@noober I'm using sparse_categorical_crossentropy for the loss function. Let me look into how I can use print() statements to trace the network's path and see where the nan appears. In addition, the loss appears to be a real number for the first few batches but then suddenly jumps to nan.
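As a starting point for that tracing, something like the following should show the batch at which the loss first becomes non-finite (a minimal sketch, assuming the same model/X_train/y_train as above; keras.callbacks.Callback and keras.callbacks.TerminateOnNaN are standard Keras features, and BatchLossLogger is just an illustrative name):

import numpy as np
from tensorflow import keras

class BatchLossLogger(keras.callbacks.Callback):
    # print the loss after every batch so the first nan/inf batch is visible
    def on_train_batch_end(self, batch, logs=None):
        loss = logs.get("loss")
        print(f"batch {batch}: loss = {loss}")
        if loss is not None and not np.isfinite(loss):
            print(f"loss became non-finite at batch {batch}")

history = model.fit(X_train[:, :10, :], y_train[:, :10], epochs=2,
                    validation_split=0.15,
                    callbacks=[BatchLossLogger(),
                               keras.callbacks.TerminateOnNaN()])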