A simple implementation of Adaptive Learning Rate Clipping in PyTorch.
Please see the paper (linked above) and its accompanying TensorFlow code for a succinct, well-written description and reference implementation. Below is a (very) simple example of how to use this PyTorch version:
import torch.nn as nn
import torch.optim as optim

model = Net()                  # any nn.Module
loss_fn = nn.MSELoss()
clipper = ALRC()
optimizer = optim.SGD(model.parameters(), ...)

for input, target in data:
    optimizer.zero_grad()
    output = model(input)
    loss = loss_fn(output, target)
    loss = clipper.clip(loss)  # clip the loss before backpropagating
    loss.backward()
    optimizer.step()
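
For intuition, here is a minimal sketch of what an ALRC-style clipper might look like, following the scheme described in the paper: track running estimates of the first two raw moments of the loss and rescale any loss that spikes more than n standard deviations above the running mean, so its gradient is capped without changing its direction. The decay, n, and starting-moment values below are illustrative assumptions, not necessarily the defaults of this repo's ALRC class.

import torch

class ALRCSketch:
    """Hedged sketch of an ALRC-style loss clipper."""

    def __init__(self, decay=0.999, n=3.0, mu1_start=25.0, mu2_start=30.0 ** 2):
        self.decay = decay
        self.n = n
        self.mu1 = mu1_start  # running mean of the loss (overestimate at start)
        self.mu2 = mu2_start  # running mean of the squared loss

    def clip(self, loss):
        # Standard deviation from the running moments.
        sigma = max(self.mu2 - self.mu1 ** 2, 0.0) ** 0.5
        threshold = self.mu1 + self.n * sigma

        raw = loss.detach().item()
        if raw > threshold:
            # Scale the loss so its forward value equals the threshold;
            # the scale factor is a constant, so the gradient is capped
            # by the same factor without altering its direction.
            loss = loss * (threshold / raw)

        # Update the moments with the raw loss value (a simplification;
        # see the paper/TensorFlow code for the exact update rule).
        self.mu1 = self.decay * self.mu1 + (1 - self.decay) * raw
        self.mu2 = self.decay * self.mu2 + (1 - self.decay) * raw ** 2
        return loss

clipper = ALRCSketch()
spiky_loss = torch.tensor(1000.0, requires_grad=True)
clipped = clipper.clip(spiky_loss)  # forward value capped near mu1 + n * sigma

Overestimating the starting moments, as the paper recommends, means early losses are left untouched while the running statistics settle toward their true values.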