Effective Techniques to Decrease Learning Rate for Adam Optimizer in PyTorch

2024-07-27

  • The learning rate controls how much the model's weights are adjusted during training.
  • A high learning rate can lead to the model oscillating or diverging, while a low learning rate can make training slow.
  • Decreasing the learning rate (learning rate decay) is often beneficial as training progresses, allowing the model to fine-tune near the optimal solution; the sketch after this list shows where the learning rate is stored on the optimizer.
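
The learning rate is stored per parameter group on the optimizer object itself, which is where every decay method below ultimately writes to. A minimal sketch of reading it (assumes model is an already-defined torch.nn.Module):

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
print(optimizer.param_groups[0]['lr'])  # -> 0.01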

Methods for Learning Rate Decay in PyTorch with Adam:

  1. Manual Decay:

    • Directly modify the 'lr' value stored in each of the optimizer's param_groups after each epoch or a certain number of iterations (PyTorch optimizers do not expose a learning_rate attribute).
    • This method gives you fine-grained control but requires manual intervention.
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(num_epochs):
        # Train loop...
        for param_group in optimizer.param_groups:
            param_group['lr'] *= 0.9  # Reduce learning rate by 10% after each epoch
    
  2. Learning Rate Schedulers:

    • PyTorch provides built-in schedulers that automatically adjust the learning rate based on a predefined strategy.
    • This approach is more automated and avoids the need for manual updates.

    a) ReduceLROnPlateau:

    • Reduces the learning rate when a monitored metric (e.g., validation loss) stops improving for a specified number of epochs (patience).
    • Useful when validation performance plateaus and the model needs a smaller learning rate to keep improving.
    from torch.optim.lr_scheduler import ReduceLROnPlateau
    
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    scheduler = ReduceLROnPlateau(optimizer, factor=0.1, patience=3)  # Multiply the LR by 0.1 after 3 epochs with no improvement
    
    for epoch in range(num_epochs):
        # Train loop...
        scheduler.step(val_loss)  # Update scheduler with validation loss
    

    b) ExponentialLR:

    • Multiplies the learning rate by a constant factor (gamma) every time scheduler.step() is called, typically once per epoch.
    • Simple approach for general learning rate decay.
    from torch.optim.lr_scheduler import ExponentialLR
    
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    scheduler = ExponentialLR(optimizer, gamma=0.9)  # Reduce by 10% every epoch
    
    for epoch in range(num_epochs):
        # Train loop...
        scheduler.step()  # Update scheduler at the end of each epoch
    

    c) CosineAnnealingLR:

    • Gradually reduces the learning rate following a cosine curve, reaching the minimum learning rate (eta_min, 0 by default) after T_max epochs.
    • Can be helpful for fine-tuning in the later stages.
    from torch.optim.lr_scheduler import CosineAnnealingLR
    
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs)  # Reduce to minimum LR over num_epochs epochs
    
    for epoch in range(num_epochs):
        # Train loop...
        scheduler.step()  # Update scheduler at the end of each epoch
    

Choosing the Right Method:

  • Manual decay offers the most control but requires constant monitoring.
  • Learning rate schedulers are more automated but may require experimentation to find optimal hyperparameters (e.g., factor, patience, gamma).
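
Whichever method you use, it is worth confirming that the decay is actually being applied. A minimal check (optimizer.param_groups works with every method; get_last_lr() exists on most schedulers but may be missing on ReduceLROnPlateau in older PyTorch versions):

print(optimizer.param_groups[0]['lr'])  # The learning rate the next optimizer.step() will use
print(scheduler.get_last_lr())          # List of learning rates, one per parameter group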

Additional Considerations:

  • Experiment with different decay methods and hyperparameters to find what works best for your specific model and dataset.
  • Consider techniques like warmup (gradually increasing the learning rate initially) for better convergence in some cases.



import torch.optim as optim

# Define your model...

optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(num_epochs):
    # Train loop...

    # Manual learning rate decay (adjust factor and frequency as needed)
    for param_group in optimizer.param_groups:
        param_group['lr'] *= 0.9  # Reduce learning rate by 10% after each epoch

    # Rest of your training code...

from torch.optim.lr_scheduler import ReduceLROnPlateau

optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = ReduceLROnPlateau(optimizer, factor=0.1, patience=3, verbose=True)  # Multiply the LR by 0.1 after 3 epochs with no improvement; print a message when it happens

for epoch in range(num_epochs):
    # Train loop...

    # Track validation loss for ReduceLROnPlateau
    val_loss = ...  # Calculate validation loss

    # Update scheduler with validation loss
    scheduler.step(val_loss)

    # Rest of your training code...

from torch.optim.lr_scheduler import ExponentialLR

optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = ExponentialLR(optimizer, gamma=0.9, verbose=True)  # Reduce by 10% every epoch, with verbosity

for epoch in range(num_epochs):
    # Train loop...

    # Update scheduler at the end of each epoch
    scheduler.step()

    # Rest of your training code...

from torch.optim.lr_scheduler import CosineAnnealingLR

optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs, verbose=True)  # Reduce to minimum LR over num_epochs epochs, with verbosity

for epoch in range(num_epochs):
    # Train loop...

    # Update scheduler at the end of each epoch
    scheduler.step()

    # Rest of your training code...

Explanation of Improvements:

  • Added comments for clarity.
  • Included verbose=True in some schedulers for informative output (note that the verbose argument is deprecated in recent PyTorch releases).
  • Emphasized the importance of calculating validation loss for ReduceLROnPlateau.
  • Highlighted the need for adjusting hyperparameters (factor, patience, gamma, T_max) based on your specific task.



This scheduler lets you define a custom function of the current epoch whose return value is used as a multiplicative factor on the initial learning rate. It provides more flexibility than built-in schedulers with fixed decay rates; a step-wise variant is sketched after the example below.

from torch.optim.lr_scheduler import LambdaLR

def lr_lambda(epoch):
    return 0.95 ** epoch  # Reduce by 5% every epoch

optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = LambdaLR(optimizer, lr_lambda=lr_lambda)

for epoch in range(num_epochs):
    # Train loop...
    scheduler.step()  # Update scheduler at the end of each epoch
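
Because lr_lambda is an ordinary Python function whose return value multiplies the initial learning rate, it can also express step-wise schedules. A small sketch with arbitrary breakpoints:

def step_lr_lambda(epoch):
    if epoch < 10:
        return 1.0   # Keep the initial LR for the first 10 epochs
    elif epoch < 20:
        return 0.1   # Then drop to 10% of the initial LR
    return 0.01      # Finally 1% of the initial LR

scheduler = LambdaLR(optimizer, lr_lambda=step_lr_lambda)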

Cyclic Learning Rate (CLR):

CLR involves periodically increasing and decreasing the learning rate during training. This can help the model escape local minima and improve generalization.

PyTorch provides this as a built-in scheduler, torch.optim.lr_scheduler.CyclicLR, so no external library is needed. Here's a basic example (cycle_momentum must be set to False because Adam has no momentum parameter):

from torch.optim.lr_scheduler import CyclicLR

optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = CyclicLR(optimizer, base_lr=0.001, max_lr=0.1,
                     step_size_up=2000, step_size_down=2000,
                     cycle_momentum=False)  # Adjust hyperparameters; step sizes are measured in batches

for epoch in range(num_epochs):
    for batch in train_loader:  # train_loader is assumed to be defined elsewhere
        # Forward pass, backward pass, optimizer.step()...
        scheduler.step()  # CyclicLR is stepped after every batch, not every epoch

Gradual Warmup:

This technique gradually increases the learning rate from a very low initial value to the actual learning rate over a few epochs. It can improve convergence and stability, especially for complex models or datasets.

optimizer = optim.Adam(model.parameters(), lr=0.01)
target_lr = 0.01   # The learning rate to warm up to
warmup_epochs = 5  # Adjust as needed

for epoch in range(num_epochs):
    if epoch < warmup_epochs:
        # Linear warmup: ramp from target_lr / warmup_epochs up to target_lr
        new_lr = (epoch + 1) * (target_lr / warmup_epochs)
        for param_group in optimizer.param_groups:
            param_group['lr'] = new_lr
    # Train loop...
    # After warmup, keep target_lr or hand control over to one of the decay schedulers above

The best method depends on your specific problem and dataset. Here are some general guidelines:

  • Manual Decay: Simple and offers control, but requires monitoring.
  • ReduceLROnPlateau: Effective when progress stalls, but sensitive to hyperparameter tuning (factor, patience).
  • ExponentialLR/CosineAnnealingLR: Simple decay strategies for general training.
  • LambdaLR: Provides more control with custom learning rate functions.
  • CLR: May help escape local minima and improve generalization.
  • Gradual Warmup: Can improve convergence in complex scenarios (a sketch chaining warmup into cosine annealing follows this list).
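
As a concrete example of chaining warmup into a decay schedule, PyTorch (1.10+) provides LinearLR and SequentialLR. The sketch below warms up linearly for a few epochs and then hands over to cosine annealing; the epoch counts and factors are placeholder values:

from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

optimizer = optim.Adam(model.parameters(), lr=0.01)
warmup_epochs = 5  # Placeholder; assumes num_epochs > warmup_epochs
warmup = LinearLR(optimizer, start_factor=0.1, total_iters=warmup_epochs)  # 0.001 -> 0.01 over the first 5 epochs
decay = CosineAnnealingLR(optimizer, T_max=num_epochs - warmup_epochs)     # Then anneal toward 0
scheduler = SequentialLR(optimizer, schedulers=[warmup, decay], milestones=[warmup_epochs])

for epoch in range(num_epochs):
    # Train loop...
    scheduler.step()  # One step per epoch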
