
For my application, I am considering a learning problem where I first simulate a number of episodes, say $n$, and then carry out the recursive least squares (RLS) update, similar to $TD(1)$.

I know that RLS can be used to update the learned parameters as data points arrive. This can be done efficiently for a single data point; the derivations are readily available online and easy to understand.
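For concreteness, these are the standard single-sample RLS equations I mean (notation mine), with forgetting factor $\lambda$, feature vector $x_t$, target $y_t$, parameters $\theta$, and inverse correlation matrix $P$:

$$k_t = \frac{P_{t-1} x_t}{\lambda + x_t^\top P_{t-1} x_t}, \qquad \theta_t = \theta_{t-1} + k_t \left(y_t - x_t^\top \theta_{t-1}\right), \qquad P_t = \frac{1}{\lambda}\left(P_{t-1} - k_t x_t^\top P_{t-1}\right).$$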

However, in my case the data arrive as a mini-batch rather than a single data point at a time, and I am looking for the corresponding update equations. I could not find any material on RLS for mini-batches.

My understanding is that the same equations can be used by adjusting the matrix dimensions appropriately, but I do not know whether this is valid.

What alternatives could I use?


1 Answer


I had just successfully implemented RLS when I found this question. In fact, I wanted to extend it to the mini-batch setting, just like you (which I did!).

RLS is derived using the Sherman-Morrison formula, which is specific to rank-1 updates. For mini-batches we need its generalization to rank-$k$ updates, the Woodbury matrix identity. Looking at the RLS covariance update, the Sherman-Morrison structure becomes evident, and the conversion can be made quite easily.
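For reference (these are the standard identities, not anything specific to my implementation), Sherman-Morrison and Woodbury read

$$(A + uv^\top)^{-1} = A^{-1} - \frac{A^{-1} u v^\top A^{-1}}{1 + v^\top A^{-1} u}, \qquad (A + UCV)^{-1} = A^{-1} - A^{-1} U \left(C^{-1} + V A^{-1} U\right)^{-1} V A^{-1}.$$

My solution in Python with PyTorch is this: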

R = (R - (R @ x.T @ torch.linalg.inv(l * torch.eye(k) + x @ R @ x.T) @ x @ R) ) / l

where $k$ is the mini-batch size and $l$ is the forgetting factor (which can be set to 1 to give the same result as OLS). The transposes are swapped relative to the usual column-vector formulation because I use a batch_size x input_dim layout for $x$. This gives me the same result as the rank-1 updates, except for a $-0.05\%$ difference in accuracy on MNIST in my experiments, which I attribute to rounding errors and the like. If anyone finds an error in my formula (or an improvement to it), please leave a comment :)
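For completeness, here is a minimal sketch of how this covariance update can be combined with the weight update in a full mini-batch RLS step. The function name rls_minibatch_update, the variable names W, R, X, Y, and the example dimensions are illustrative, not part of the original answer; it assumes the same batch_size x input_dim layout as above.

```python
import torch

def rls_minibatch_update(W, R, X, Y, l=1.0):
    # W: (d, out) current weights; R: (d, d) inverse correlation matrix
    # X: (k, d) mini-batch of inputs; Y: (k, out) mini-batch of targets
    # l: forgetting factor (l = 1 recovers ordinary least squares)
    k = X.shape[0]
    # Rank-k gain via the Woodbury identity, shape (d, k)
    S = l * torch.eye(k, dtype=X.dtype, device=X.device) + X @ R @ X.T
    K = R @ X.T @ torch.linalg.inv(S)
    # Correct the weights with the batch prediction error
    W = W + K @ (Y - X @ W)
    # Covariance update; algebraically the same as the one-liner above
    R = (R - K @ X @ R) / l
    return W, R

# Hypothetical usage on random data
d, out, k = 8, 1, 32
W, R = torch.zeros(d, out), 1e3 * torch.eye(d)
X, Y = torch.randn(k, d), torch.randn(k, out)
W, R = rls_minibatch_update(W, R, X, Y, l=1.0)
```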
