How do checkpoints work in RLlib?

I was going through the basic TorchPolicy code for RLlib, and it isn't clear to me whether the optimizer state and other training-related state are saved as part of checkpoints. It seems PPO with Adam isn't affected much by losing the optimizer state, but I'd like to know out of curiosity from people who are more familiar with RLlib. I do think this may be relevant for custom policies, though, because runs sometimes stop midway and get resumed from the last checkpoint.

Hello @dipam_chakraborty

I think the optimizer states are saved when checkpointing. This seems to be the base class that has the checkpointing logic defined.
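To illustrate the idea, here's a simplified, stdlib-only sketch (not RLlib's actual code; `ToyTorchPolicy` and its attributes are hypothetical stand-ins) of the split between `get_weights`, which returns model parameters only, and `get_state`, which also carries optimizer variables in the checkpoint:

```python
import copy

class ToyTorchPolicy:
    """Hypothetical stand-in for a TorchPolicy-like class, showing the
    difference between get_weights (params only) and get_state (full state)."""

    def __init__(self):
        self.model_weights = {"layer1": [0.1, 0.2]}
        # Stand-in for e.g. Adam's step count and moment estimates.
        self.optimizer_state = {"step": 0, "exp_avg": [0.0, 0.0]}

    def get_weights(self):
        # Model parameters only -- what gets synced to rollout workers.
        return copy.deepcopy(self.model_weights)

    def get_state(self):
        # Full checkpoint state: weights plus optimizer variables.
        state = {"weights": self.get_weights()}
        state["_optimizer_variables"] = copy.deepcopy(self.optimizer_state)
        return state

    def set_state(self, state):
        self.model_weights = copy.deepcopy(state["weights"])
        self.optimizer_state = copy.deepcopy(state["_optimizer_variables"])

policy = ToyTorchPolicy()
policy.optimizer_state["step"] = 100
ckpt = policy.get_state()

restored = ToyTorchPolicy()
restored.set_state(ckpt)
print(restored.optimizer_state["step"])  # → 100
```

The point is that a checkpoint built from `get_state` can restore optimizer progress, while one built from `get_weights` alone could not.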

Ok, I was confusing the get_weights function in TorchPolicy with get_state. Thanks for the clarification. I guess that for a custom policy, any extra training-related state needs to go in get_state then.
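A minimal sketch of that pattern, assuming a TorchPolicy-like base with `get_state`/`set_state` (class and attribute names here are hypothetical): the custom policy overrides both methods, calls `super()`, and adds its own training state to the checkpoint dict.

```python
import copy

class BasePolicy:
    """Hypothetical stand-in for a TorchPolicy-like base class."""
    def __init__(self):
        self.weights = {"w": [1.0]}

    def get_state(self):
        return {"weights": copy.deepcopy(self.weights)}

    def set_state(self, state):
        self.weights = copy.deepcopy(state["weights"])

class MyCustomPolicy(BasePolicy):
    """Custom policy carrying extra training state (a curriculum level,
    as an example) that should survive checkpoint/restore."""
    def __init__(self):
        super().__init__()
        self.curriculum_level = 0

    def get_state(self):
        state = super().get_state()
        # Add custom training state on top of the base checkpoint.
        state["curriculum_level"] = self.curriculum_level
        return state

    def set_state(self, state):
        super().set_state(state)
        self.curriculum_level = state.get("curriculum_level", 0)

p = MyCustomPolicy()
p.curriculum_level = 3
ckpt = p.get_state()

q = MyCustomPolicy()
q.set_state(ckpt)
print(q.curriculum_level)  # → 3
```

Using `state.get(..., 0)` on restore also keeps older checkpoints (saved before the extra field existed) loadable.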