How can I get the 97D body state in the evaluation environment?

gupengju · July 15, 2019, 3:23am

To train a model locally with visualization, we use the following code to get a dict, observation_2, that contains 4 keys and 339 numbers. These numbers consist of a 2×11×11 target velocity field (2D target velocities on an 11×11 grid) and the 97D body state (242 + 97 = 339).

[observation_2, reward, done, info] = env.step(_action,obs_as_dict=True)

Here it says S is a 97D vector representing the body state.
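
One way to check the 242 + 97 = 339 count is to flatten the nested dict and look at the resulting length. The following is a minimal sketch, assuming the observation nests only dicts, lists, and numbers. `flatten_obs` is a hypothetical helper (not part of osim-rl), and the key names and the split of the 97 body values in the toy dict are placeholders, not the real layout.

```python
import numpy as np


def flatten_obs(obs):
    """Recursively flatten a nested dict/list observation into a 1-D float array."""
    if isinstance(obs, dict):
        # Sort keys so the output order is the same on every call.
        parts = [flatten_obs(obs[key]) for key in sorted(obs)]
        return np.concatenate(parts) if parts else np.zeros(0)
    # Lists, nested lists, and scalars all end up as a flat float array.
    return np.asarray(obs, dtype=float).ravel()


# Toy 4-key observation with the sizes quoted above (placeholder key names).
toy_obs = {
    "target_field": np.zeros((2, 11, 11)).tolist(),  # 2 x 11 x 11 = 242 values
    "part_a": [0.0] * 9,                             # illustrative split of
    "part_b": {"x": [0.0] * 40, "y": [0.0] * 4},     # the 97 body-state values
    "part_c": [0.0] * 44,
}

flat = flatten_obs(toy_obs)
print(flat.shape)  # (339,)
```

Calling the same helper on the dict returned by the evaluation client would show directly whether it holds 339 or 688 numbers.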

But in the evaluation environment, which cannot be visualized, the following code returns a dict observation containing 14 keys and 688 numbers:

[observation, reward, done, info] = client.env_step(_action)

When I use the first snippet to get observation_2, it reports the following error:

Attempt to call step function after max_steps=1000 in a single simulation. Please reset your environment before calling the step function after max_step s

So my question is: how can I get the 97D body state in the evaluation environment?

Thank you very much.

smsong · July 15, 2019, 7:10pm

Sorry about the confusion. We recently noticed this issue and are working on it so that the evaluation environment will give the same observation dictionary as the current local environment. We will let you know once this is solved.

schatty · July 25, 2019, 2:10pm

Hello! Could you please clarify: will it be possible to obtain the 14-key observation dict from the evaluation client in the future (without errors), or should we work with the 4-key dict in both training and evaluation? Thanks.

smsong · July 27, 2019, 8:25pm

The evaluation environment will return the dictionary with 4 keys.

gupengju · August 1, 2019, 9:29am

In the following code,

observation = client.env_create()
while True:
    print(observation.keys())
    observation_np=obs2np(observation)
    observation_T=torch.from_numpy(observation_np).float().to(torch.device('cuda'))
    _action = model.target_policy(observation_T)
    [observation, reward, done, info] = client.env_step(_action.cpu())
    
    if done:
        observation = client.env_reset()
        if not observation:
            break
client.submit()

client.env_create() and client.env_reset() return a dict with 4 keys,
but client.env_step() still returns a dict with 14 keys.
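
A mismatch like this can be pinned down by comparing the top-level key sets of the two dicts. The following is a minimal sketch with a hypothetical helper, `diff_keys`, and made-up toy keys; it only reports which keys differ and does not convert one layout into the other.

```python
def diff_keys(expected, actual):
    """Return (missing, extra) top-level keys of `actual` relative to `expected`."""
    expected_keys, actual_keys = set(expected), set(actual)
    missing = sorted(expected_keys - actual_keys)  # in expected, not in actual
    extra = sorted(actual_keys - expected_keys)    # in actual, not in expected
    return missing, extra


# Toy stand-ins for the dicts returned by a reset call and a step call.
reset_obs = {"a": 0, "b": 0}
step_obs = {"a": 0, "c": 0, "d": 0}

missing, extra = diff_keys(reset_obs, step_obs)
print(missing, extra)  # ['b'] ['c', 'd']
```

In the loop above, the same comparison could be run once between the dict from client.env_create() and the first dict from client.env_step().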

How can I fix this code?

Thanks a lot.

gupengju · August 1, 2019, 6:20pm

Or how can I convert the 14-key dict to the 4-key one?

smsong · August 6, 2019, 1:34am

@gupengju The problem should’ve been solved (let us know if it is not). Also, note that you now have another way to submit your solution: https://www.aicrowd.com/organizers/stanford-neuromuscular-biomechanics-laboratory/challenges/neurips-2019-learn-to-move-walk-around#get-started

gupengju · August 6, 2019, 10:14pm

Thanks a lot. The docker image aicrowd/neurips-learning-to-move-subcontractor:latest has been updated and the local grader runs successfully now. And the new submit option is more convenient.