Hi, I have a question about GPU utilization.
When I run my program locally, it runs at about 1000 timesteps/sec.
However, when I submit it, its speed decreases to about 40 timesteps/sec.
I suspect that it isn’t using the GPU correctly.
https://gitlab.aicrowd.com/shogoakiyama/neurips-2020-procgen-starter-kit/issues/22
Resources requested: 11/16 CPUs, 1/1 GPUs, 0.0/55.96 GiB heap, 0.0/19.24 GiB objects
How can I make it use the GPU? Which parameters do I need to set?
I would appreciate it if you could answer my question.
Hello @shogoakiyama
Can you try setting the following?
num_workers: 6 # Number of rollout workers to run
num_envs_per_worker: 20 # Number of envs to run per rollout worker
num_gpus: 0.6 # Fraction of GPU used by trainer
num_gpus_per_worker: 0.05 # Fraction of GPU used by rollout worker
Please make sure that num_gpus + num_gpus_per_worker*num_workers <= 1. Setting num_gpus to 0.5 doesn’t mean that half of the GPU memory is available to the trainer process. rllib doesn’t allocate GPUs but schedules the workers based on these values. They exist to make it easier to scale the training process to multiple GPUs. Since we use a single GPU during the evaluation, setting these values to some non-zero value should suffice.
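As a quick sanity check on the constraint above, here is a minimal Python sketch that adds up the GPU fractions requested by the trainer and the rollout workers. The helper names total_gpu_request and fits_on_single_gpu are hypothetical and not part of rllib; the config keys are the ones from this thread, and the single-GPU limit of 1.0 is taken from the evaluation setup described above.

```python
# Hypothetical helpers (not an rllib API): check that the requested GPU
# fractions fit within the GPUs available for scheduling.

def total_gpu_request(config):
    """Return num_gpus + num_gpus_per_worker * num_workers."""
    return (
        config["num_gpus"]
        + config["num_gpus_per_worker"] * config["num_workers"]
    )


def fits_on_single_gpu(config, available_gpus=1.0, tol=1e-9):
    """True if the total requested fraction is at most available_gpus.

    A small tolerance absorbs floating-point rounding when summing fractions.
    """
    return total_gpu_request(config) <= available_gpus + tol


# Values suggested in the reply above.
suggested = {
    "num_workers": 6,
    "num_envs_per_worker": 20,
    "num_gpus": 0.6,
    "num_gpus_per_worker": 0.05,
}

if __name__ == "__main__":
    print("total GPU fraction requested:", total_gpu_request(suggested))
    print("fits on one GPU:", fits_on_single_gpu(suggested))
```

With the suggested values the total is about 0.9, so the request fits on one GPU. Raising num_gpus_per_worker to 0.1 with the same six workers would bring the total to about 1.2 and break the constraint.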
Thank you for your reply.
I will try that.