I looked through the code of “Baseline PFRL SQIL”, but I can’t find anything about Soft Q-Learning or Soft Actor-Critic. Does anyone know what happened? Did I miss the code, or does it really not exist?
The baseline implementation of SQIL is for discrete actions (derived from k-means clustering) and uses a DQN variant instead of SAC. The author put the soft Q-learning part in the computation of the target Q values, found in the
_compute_target_values function (line 459 in
mod/agents/sqil.py). I would expect an
alpha parameter there, but it’s missing, so the author may have just fixed it to 1 and removed it.
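For reference, here is a rough sketch of what a soft Q-learning target with an explicit temperature looks like; the function and argument names are mine, not from the repo. The hard max over next-state Q-values is replaced by a temperature-scaled log-sum-exp, and with alpha fixed to 1 it collapses to a plain logsumexp, which would match an implementation that simply drops the parameter.

```python
import numpy as np

def compute_soft_target_values(rewards, next_qs, terminals, gamma=0.99, alpha=1.0):
    """Illustrative soft Q-learning target (not the repo's actual code).

    rewards:   shape (batch,)
    next_qs:   shape (batch, n_actions), Q(s', .) from the target network
    terminals: shape (batch,), 1.0 if s' is terminal else 0.0
    """
    # Soft state value: V(s') = alpha * logsumexp(Q(s', .) / alpha),
    # computed stably by shifting by the row max before exponentiating.
    scaled = next_qs / alpha
    m = scaled.max(axis=1, keepdims=True)
    soft_v = alpha * (m.squeeze(1) + np.log(np.exp(scaled - m).sum(axis=1)))
    # With alpha = 1 this is just logsumexp over the Q row; as alpha -> 0
    # it approaches the hard max used by vanilla DQN.
    return rewards + gamma * (1.0 - terminals) * soft_v
```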
Got it, thanks a lot!
One more question: do you know why, for continuous action spaces, Soft Q-Learning uses SVGD for policy iteration, whereas SAC drops SVGD and instead minimizes the KL divergence directly by computing the log-probability under a Gaussian and then applying the correction for the tanh squashing?
Thanks a lot for your valuable time!
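To illustrate the squashing correction the question refers to: SAC samples a pre-squash value u from a diagonal Gaussian and takes a = tanh(u), so the log-density picks up a change-of-variables term log(1 - tanh(u)^2) per dimension. A minimal sketch (names are mine, not from any particular codebase):

```python
import numpy as np

def tanh_gaussian_logprob(mean, log_std, u):
    """Illustrative log-probability of a tanh-squashed diagonal Gaussian.

    mean, log_std, u: arrays of shape (..., action_dim), where u is the
    pre-tanh sample and the returned action would be a = tanh(u).
    """
    std = np.exp(log_std)
    # Diagonal Gaussian log-density of the pre-squash sample u.
    gauss = -0.5 * (((u - mean) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi))
    # Tanh correction: log|da/du| = log(1 - tanh(u)^2) per dimension,
    # written as 2*(log 2 - u - softplus(-2u)) for numerical stability.
    correction = 2.0 * (np.log(2.0) - u - np.logaddexp(0.0, -2.0 * u))
    # log pi(a) = log N(u) - sum_i log(1 - tanh(u_i)^2)
    return (gauss - correction).sum(axis=-1)
```

Because the squashed density has this closed form, SAC can evaluate and differentiate log-probabilities directly, which is why it no longer needs a particle-based approximation like SVGD.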