Requesting logs

anass_elidrissi · October 9, 2020, 7:59pm

Hello.

My submission fails shortly after startup, although it works fine locally. Would it be possible to get the logs needed for debugging?

Thank you

ec_ai · October 12, 2020, 9:51am

Dear @anass_elidrissi,
here is the final part of the agent log:

2020-10-11T20:01:57.289113523Z 2020-10-11 20:01:57.288914: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-PCIE-32GB, Compute Capability 7.0
2020-10-11T20:01:57.667584462Z [REALRobot] Copying over data into pybullet_data_path.This is a one time operation.
2020-10-11T20:01:57.667603822Z 1 Physical GPUs
2020-10-11T20:01:57.667607494Z 1 Logical GPUs
2020-10-11T20:01:57.667610593Z INFO:matplotlib.font_manager:generated new fontManager
2020-10-11T20:01:57.818557511Z pybullet build time: Oct  8 2020 00:10:04
2020-10-11T20:01:57.81892566Z Traceback (most recent call last):
2020-10-11T20:01:57.818944719Z   File "1050662482_evaluation.py", line 4, in <module>
2020-10-11T20:01:57.818948903Z     from my_controller import SubmittedPolicy
2020-10-11T20:01:57.818951424Z   File "/home/aicrowd/my_controller.py", line 2, in <module>
2020-10-11T20:01:57.81895421Z     from baseline.policy import Baseline
2020-10-11T20:01:57.818956597Z   File "/home/aicrowd/baseline/policy.py", line 6, in <module>
2020-10-11T20:01:57.818959236Z     import baseline.explorer as exp
2020-10-11T20:01:57.818961451Z   File "/home/aicrowd/baseline/explorer.py", line 1, in <module>
2020-10-11T20:01:57.818963879Z     from baseline.curiosity import Curiosity
2020-10-11T20:01:57.818966122Z   File "/home/aicrowd/baseline/curiosity.py", line 1, in <module>
2020-10-11T20:01:57.818968519Z     import torch
2020-10-11T20:01:57.818970835Z ModuleNotFoundError: No module named 'torch'

It seems torch is not installed when the agent is evaluated.
You have to modify environment.yml so that all requirements of your code are included.

We suggest working in a Conda environment and then exporting it to environment.yml so that the submission always has all the modules it needs.
See the Setup and How do I specify my software runtime? sections in https://github.com/AIcrowd/REAL2020_starter_kit
Let me know if you need further assistance.
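
As a quick local check before submitting, one could verify that every top-level package the code imports can actually be found in the freshly created environment. The snippet below is only a sketch: the function name `missing_modules` and the list of required packages are hypothetical and should be adapted to your own code.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: replace it with the top-level packages your code imports.
    required = ["torch", "numpy", "gym"]
    missing = missing_modules(required)
    if missing:
        print("Missing from this environment:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Running this inside the environment built from environment.yml should report torch as missing if it was left out, which is the situation the traceback above points to.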

boussif_oussama · October 13, 2020, 8:59pm

Hello,

Anass’s teammate here. We no longer get the agent error that apparently came from the environment.yml, but the submission now fails with ‘error’ instead. It’s weird, since it works perfectly on my Ubuntu environment, even after cleaning out Anaconda and recreating the environment from the environment.yml alone.

Is it possible to have a look at the logs again?

Thanks in advance and have a good day

Oussama

ec_ai · October 14, 2020, 12:15am

Dear @boussif_oussama,
here is the agent log:

2020-10-13T20:52:13.794930498Z 2020-10-13 20:52:13.794721: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-PCIE-32GB, Compute Capability 7.0
2020-10-13T20:52:14.180614548Z [REALRobot] Copying over data into pybullet_data_path.This is a one time operation.
2020-10-13T20:52:14.180641021Z 1 Physical GPUs
2020-10-13T20:52:14.180644362Z 1 Logical GPUs
2020-10-13T20:52:14.1806469Z INFO:matplotlib.font_manager:generated new fontManager
2020-10-13T20:52:14.582229999Z /srv/conda/envs/notebook/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
2020-10-13T20:52:14.582253826Z   warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
2020-10-13T20:52:14.582698571Z [WARNING] Skipping Intrinsic Phase as intrinsic_timesteps = 0 or False
2020-10-13T20:52:14.582704243Z ######################################################
2020-10-13T20:52:14.582706577Z # Extrinsic Phase Initiated
2020-10-13T20:52:14.582709014Z ######################################################
2020-10-13T20:52:14.582711317Z pybullet build time: Oct  8 2020 00:10:04
2020-10-13T20:52:14.584499436Z Extrinsic Phase:   0%|          | 0/5 [00:00<?, ?trials /s]
Traceback (most recent call last):
2020-10-13T20:52:14.584520268Z   File "1050662482_evaluation.py", line 26, in <module>
2020-10-13T20:52:14.584524556Z     goals_dataset_path=DATASET_PATH
2020-10-13T20:52:14.584526997Z   File "/srv/conda/envs/notebook/lib/python3.6/site-packages/real_robots/evaluate.py", line 424, in evaluate
2020-10-13T20:52:14.584529811Z     evaluation_service.run_extrinsic_phase()
2020-10-13T20:52:14.584532269Z   File "/srv/conda/envs/notebook/lib/python3.6/site-packages/real_robots/evaluate.py", line 320, in run_extrinsic_phase
2020-10-13T20:52:14.584534824Z     raise e
2020-10-13T20:52:14.584553011Z   File "/srv/conda/envs/notebook/lib/python3.6/site-packages/real_robots/evaluate.py", line 314, in run_extrinsic_phase
2020-10-13T20:52:14.584555699Z     self._run_extrinsic_phase()
2020-10-13T20:52:14.584557957Z   File "/srv/conda/envs/notebook/lib/python3.6/site-packages/real_robots/evaluate.py", line 340, in _run_extrinsic_phase
2020-10-13T20:52:14.584560498Z     self.controller.start_extrinsic_phase()
2020-10-13T20:52:14.584562701Z   File "/home/aicrowd/baseline/policy.py", line 530, in start_extrinsic_phase
2020-10-13T20:52:14.584565042Z     self.planner = Planner(allAbstractedActions)
2020-10-13T20:52:14.584567343Z   File "/home/aicrowd/baseline/planner.py", line 45, in __init__
2020-10-13T20:52:14.584569781Z     self.abstractor = abstr.DynamicAbstractor(actions)
2020-10-13T20:52:14.584571977Z   File "/home/aicrowd/baseline/abstractor.py", line 256, in __init__
2020-10-13T20:52:14.584574454Z     if len(actions[0]) != 3:
2020-10-13T20:52:14.584576742Z IndexError: list index out of range
2020-10-13T20:52:14.809836903Z Extrinsic Phase:   0%|          | 0/5 [00:00<?, ?trials /s]

ec_ai · October 14, 2020, 8:48am

Looking at the error, I would guess that the actions variable passed to DynamicAbstractor is empty.
This may happen because the intrinsic phase was not run and no actions were loaded from a file, so actions stayed empty.
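
To illustrate the failure mode: indexing an empty list with `[0]` raises exactly this IndexError. The following is a minimal sketch of a guard that fails with a clearer message instead. The function `first_action_length` is hypothetical and not part of the baseline code.

```python
def first_action_length(actions):
    """Return len(actions[0]), failing with a descriptive error if actions is empty."""
    if not actions:
        # An empty list here usually means no intrinsic-phase data was loaded.
        raise ValueError(
            "No actions available: run the intrinsic phase locally or load "
            "saved transitions before starting the extrinsic phase."
        )
    return len(actions[0])


if __name__ == "__main__":
    try:
        first_action_length([])
    except ValueError as err:
        print("Guard triggered:", err)
```

A check like this near the start of the extrinsic phase would make an empty action list easier to spot in the logs.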

During Round1 and Round2, only the extrinsic phase is evaluated for submissions. The intrinsic phase must be run beforehand on your own computer and its results saved, so that your algorithm can reload and use them when the extrinsic phase runs.

As an example, the Baseline algorithm saves a transitions file at the end of the intrinsic phase and then uses it for the extrinsic phase if baseline/config.yaml is configured to do so (use_experience_data = True and the name of the transitions file set in experience_data).
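
A small pre-submission check along these lines can catch a missing transitions file early. This is only a sketch under assumptions: it takes the two settings as a flat dict that has already been loaded, which may not match how baseline/config.yaml is actually structured, and `check_experience_config` is a hypothetical helper.

```python
import os


def check_experience_config(config, base_dir="."):
    """Return a list of problems found in the experience-data settings."""
    problems = []
    if not config.get("use_experience_data"):
        problems.append("use_experience_data is not enabled")
    filename = config.get("experience_data")
    if not filename:
        problems.append("experience_data is not set")
    elif not os.path.isfile(os.path.join(base_dir, filename)):
        # The file must be part of the submission, not only on your machine.
        problems.append(f"transitions file not found: {filename}")
    return problems
```

An empty list means both settings look consistent and the named file exists under `base_dir`.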

boussif_oussama · October 15, 2020, 8:03pm

Dear ec_ai team,

The submission now runs without errors. However, the score on the leaderboard and the one displayed in the “submission” tab are both 0 for all configurations, which doesn’t correspond to the score we get for the last submission (v0.9 tag). Is it something we missed in the config too?

We also only sent the transitions file for 15e5 iterations, because the one for 15e6 iterations (a 3.5GB file) simply could not be pushed. The push fails with:

fatal: Out of memory, malloc failed (tried to allocate x bytes)

Is this problem coming from our submission or is the host not handling it well?

Also, I might have missed it somewhere, but the submission runs only 5 extrinsic trials. Isn’t it supposed to run 50 trials?

Thank you in advance for answering our questions.

AICrows team

ec_ai · October 15, 2020, 8:27pm

Dear @boussif_oussama,
if it says 5 trials and 0 score, then it must have been a debug submission.
In aicrowd.json, set debug to false and it will run the full 50 extrinsic trials (and you will get a score > 0 on the leaderboard).
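
To confirm the flag before pushing, a short script can read it back. The sketch below assumes debug is a top-level key in aicrowd.json, and `is_debug_submission` is a hypothetical helper name. Note that JSON booleans are written in lowercase (true / false).

```python
import json


def is_debug_submission(path="aicrowd.json"):
    """Return True if the debug flag in the given JSON file is set."""
    with open(path, encoding="utf-8") as f:
        settings = json.load(f)
    # A missing key is treated as "not debug" here; that default is an assumption.
    return bool(settings.get("debug", False))
```

Calling `is_debug_submission()` from the repository root before tagging a submission shows whether it would be evaluated as a debug run.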

I will look into the Out of Memory error.

ec_ai · October 15, 2020, 11:29pm

For the 3.5GB file, have a look at “How to upload large files (size) to your submission”.
Large files can be uploaded to the AIcrowd repository using git-lfs.
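
To find out which files in a repository are candidates for git-lfs, one could list everything above a chosen size. This is a rough sketch: `find_large_files` is a hypothetical helper, and the 100 MB threshold in the usage example is an arbitrary assumption, not a documented limit.

```python
import os


def find_large_files(root, min_bytes):
    """Return (path, size) pairs for files of at least min_bytes, largest first."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own object store so only working-tree files are reported.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size >= min_bytes:
                large.append((path, size))
    return sorted(large, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Arbitrary example threshold of 100 MB.
    for path, size in find_large_files(".", 100 * 1024 * 1024):
        print(f"{size / 1e6:.1f} MB  {path}")
```

Files reported by this script, such as a multi-gigabyte transitions file, are the ones to track with git-lfs before pushing.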