The idea is to set up a LearningRateScheduler that sweeps through several learning rates, increasing them over n epochs, and then plot the loss history to find a suitable learning-rate range: look for the point where the loss starts to jiggle and eventually blows up.
Here is a code snippet that trains for 100 epochs, with the learning rate growing exponentially from 1e-6 so that it is multiplied by 10 over every 30 epochs: it starts at 1e-6, passes 1e-5 at epoch 30, 1e-4 at epoch 60, and 1e-3 at epoch 90.
Based on the training loss history plotted below, the loss starts to become unstable from around 1e-4 and increases sharply as the rate approaches 1e-3. Therefore, we can choose 1e-5 as an appropriate value for our learning rate.
# TensorFlow 2.9.1
import tensorflow as tf
import matplotlib.pyplot as plt

# model and train_set are assumed to be defined earlier.

# Grow the learning rate exponentially: multiplied by 10 over every 30 epochs
# (1e-6 at epoch 0, 1e-5 at epoch 30, 1e-4 at epoch 60, 1e-3 at epoch 90)
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-6 * 10**(epoch / 30))
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-6, momentum=0.9)  # "lr" is deprecated
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])
history = model.fit(train_set, epochs=100, callbacks=[lr_schedule])

# The scheduler logs the learning rate under "lr" in the history,
# so we can plot loss against learning rate on a log scale
plt.semilogx(history.history["lr"], history.history["loss"])
plt.axis([1e-6, 1e-3, 0, 20])
plt.show()
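Once the range test has pointed to 1e-5, the natural next step is to retrain from scratch with that value fixed. A minimal sketch, assuming the same model and train_set as above (if you kept the model instance from the sweep, rebuild or re-initialize it first so the diverged weights don't carry over):

# Retrain with the learning rate chosen from the range test
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-5, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])
history = model.fit(train_set, epochs=100)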