Hello.
My submission fails shortly after startup, although it works fine locally. Would it be possible to get the logs needed for debugging?
Thank you
Dear @anass_elidrissi,
here is the final part of the agent log:
2020-10-11T20:01:57.289113523Z 2020-10-11 20:01:57.288914: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-PCIE-32GB, Compute Capability 7.0
2020-10-11T20:01:57.667584462Z [REALRobot] Copying over data into pybullet_data_path.This is a one time operation.
2020-10-11T20:01:57.667603822Z 1 Physical GPUs
2020-10-11T20:01:57.667607494Z 1 Logical GPUs
2020-10-11T20:01:57.667610593Z INFO:matplotlib.font_manager:generated new fontManager
2020-10-11T20:01:57.818557511Z pybullet build time: Oct 8 2020 00:10:04
2020-10-11T20:01:57.81892566Z Traceback (most recent call last):
2020-10-11T20:01:57.818944719Z File "1050662482_evaluation.py", line 4, in <module>
2020-10-11T20:01:57.818948903Z from my_controller import SubmittedPolicy
2020-10-11T20:01:57.818951424Z File "/home/aicrowd/my_controller.py", line 2, in <module>
2020-10-11T20:01:57.81895421Z from baseline.policy import Baseline
2020-10-11T20:01:57.818956597Z File "/home/aicrowd/baseline/policy.py", line 6, in <module>
2020-10-11T20:01:57.818959236Z import baseline.explorer as exp
2020-10-11T20:01:57.818961451Z File "/home/aicrowd/baseline/explorer.py", line 1, in <module>
2020-10-11T20:01:57.818963879Z from baseline.curiosity import Curiosity
2020-10-11T20:01:57.818966122Z File "/home/aicrowd/baseline/curiosity.py", line 1, in <module>
2020-10-11T20:01:57.818968519Z import torch
2020-10-11T20:01:57.818970835Z ModuleNotFoundError: No module named 'torch'
It seems torch is not installed when the agent is evaluated.
You have to modify environment.yml so that all requirements of your code are included.
We suggest working in a Conda environment and then exporting it to environment.yml, so that the submission always has all the modules it needs.
See the "Setup" and "How do I specify my software runtime?" sections in https://github.com/AIcrowd/REAL2020_starter_kit
Let me know if you need further assistance.
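As a quick local check before submitting, something like the following could be run inside a fresh environment created from environment.yml. It is only a sketch: the helper name missing_modules and the example module list are hypothetical, and it only covers top-level module names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        # It locates the module without importing it.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list: replace with the top-level modules your code imports.
    required = ["torch", "numpy", "real_robots"]
    absent = missing_modules(required)
    if absent:
        print("Missing from this environment:", ", ".join(absent))
    else:
        print("All listed modules are importable.")
```

If torch shows up as missing here, the evaluation will most likely fail with the same ModuleNotFoundError as in the log above.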
Hello,
Anass's teammate here. We no longer get the agent error that apparently came from the environment.yml, but now it shows "error" instead. It's weird, since it works perfectly in my Ubuntu environment, even after clearing Anaconda and recreating the environment from environment.yml alone.
Is it possible to have a look at the logs again?
Thanks in advance and have a good day
Oussama
Dear @boussif_oussama,
here is the agent log:
2020-10-13T20:52:13.794930498Z 2020-10-13 20:52:13.794721: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-PCIE-32GB, Compute Capability 7.0
2020-10-13T20:52:14.180614548Z [REALRobot] Copying over data into pybullet_data_path.This is a one time operation.
2020-10-13T20:52:14.180641021Z 1 Physical GPUs
2020-10-13T20:52:14.180644362Z 1 Logical GPUs
2020-10-13T20:52:14.1806469Z INFO:matplotlib.font_manager:generated new fontManager
2020-10-13T20:52:14.582229999Z /srv/conda/envs/notebook/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
2020-10-13T20:52:14.582253826Z warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
2020-10-13T20:52:14.582698571Z [WARNING] Skipping Intrinsic Phase as intrinsic_timesteps = 0 or False
2020-10-13T20:52:14.582704243Z ######################################################
2020-10-13T20:52:14.582706577Z # Extrinsic Phase Initiated
2020-10-13T20:52:14.582709014Z ######################################################
2020-10-13T20:52:14.582711317Z pybullet build time: Oct 8 2020 00:10:04
2020-10-13T20:52:14.584499436Z
Extrinsic Phase: 0%| | 0/5 [00:00<?, ?trials /s]
Traceback (most recent call last):
2020-10-13T20:52:14.584520268Z File "1050662482_evaluation.py", line 26, in <module>
2020-10-13T20:52:14.584524556Z goals_dataset_path=DATASET_PATH
2020-10-13T20:52:14.584526997Z File "/srv/conda/envs/notebook/lib/python3.6/site-packages/real_robots/evaluate.py", line 424, in evaluate
2020-10-13T20:52:14.584529811Z evaluation_service.run_extrinsic_phase()
2020-10-13T20:52:14.584532269Z File "/srv/conda/envs/notebook/lib/python3.6/site-packages/real_robots/evaluate.py", line 320, in run_extrinsic_phase
2020-10-13T20:52:14.584534824Z raise e
2020-10-13T20:52:14.584553011Z File "/srv/conda/envs/notebook/lib/python3.6/site-packages/real_robots/evaluate.py", line 314, in run_extrinsic_phase
2020-10-13T20:52:14.584555699Z self._run_extrinsic_phase()
2020-10-13T20:52:14.584557957Z File "/srv/conda/envs/notebook/lib/python3.6/site-packages/real_robots/evaluate.py", line 340, in _run_extrinsic_phase
2020-10-13T20:52:14.584560498Z self.controller.start_extrinsic_phase()
2020-10-13T20:52:14.584562701Z File "/home/aicrowd/baseline/policy.py", line 530, in start_extrinsic_phase
2020-10-13T20:52:14.584565042Z self.planner = Planner(allAbstractedActions)
2020-10-13T20:52:14.584567343Z File "/home/aicrowd/baseline/planner.py", line 45, in __init__
2020-10-13T20:52:14.584569781Z self.abstractor = abstr.DynamicAbstractor(actions)
2020-10-13T20:52:14.584571977Z File "/home/aicrowd/baseline/abstractor.py", line 256, in __init__
2020-10-13T20:52:14.584574454Z if len(actions[0]) != 3:
2020-10-13T20:52:14.584576742Z IndexError: list index out of range
2020-10-13T20:52:14.809836903Z
Extrinsic Phase: 0%| | 0/5 [00:00<?, ?trials /s]
Looking at the error, I would guess that the variable actions passed to DynamicAbstractor is empty.
This may happen because the intrinsic phase was not run and no actions were loaded from a file, so actions stayed empty.
During Round 1 and Round 2, only the extrinsic phase is evaluated for submissions. The intrinsic phase must be run beforehand on your own computer and its results saved, so that your algorithm can reload and use them when the extrinsic phase runs.
As an example, the Baseline algorithm saves a transitions file at the end of the intrinsic phase and then uses it for the extrinsic phase if baseline/config.yaml is configured to do so (use_experience_data = True and the name of the transitions file set in experience_data).
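To make this kind of failure easier to spot, a guard along the following lines could be called before the planner is built. This is a minimal sketch and not part of the Baseline code: require_actions is a hypothetical helper, and the error message simply repeats the config hints above.

```python
def require_actions(actions):
    """Return actions unchanged, or raise a descriptive error if none were loaded."""
    if len(actions) == 0:
        raise ValueError(
            "No abstracted actions available: the intrinsic phase was skipped "
            "and no saved transitions were loaded. Check use_experience_data "
            "and experience_data in baseline/config.yaml."
        )
    return actions


# Example: an empty list, as the planner apparently received in the failing run.
loaded_actions = []
try:
    require_actions(loaded_actions)
except ValueError as err:
    print(err)
```

With such a check, the run would stop with an explicit message instead of the IndexError raised by actions[0] in abstractor.py.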
Dear ec_ai team,
The submission now runs without errors. However, the score on the leaderboard and the one displayed in the "submission" tab is 0 for all configurations, which doesn't correspond to the score we get for the last submission (tag v0.9). Is it something we missed in the config too?
We also only sent the transitions file for 15e5 iterations, because the one for 15e6 iterations (a 3.5GB file) simply could not be pushed. The push fails with:
fatal: Out of memory, malloc failed (tried to allocate x bytes)
Is this problem coming from our submission, or is the host not handling it well?
Also, I might have missed it somewhere, but the submission runs only 5 extrinsic trials. Isn't it supposed to run 50 trials?
Thank you in advance for answering our questions.
AICrows team
Dear @boussif_oussama,
if it says 5 trials and a score of 0, then it must have been a debug submission.
In aicrowd.json, set debug to false and it will run all 50 extrinsic trials (and you will get a score > 0 on the leaderboard).
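The debug setting can be checked before pushing with a few lines of Python. This is a sketch under assumptions: is_debug_submission is a hypothetical helper, it expects debug to be a JSON boolean at the top level of aicrowd.json, and treating a missing key as non-debug is a guess rather than documented evaluator behaviour.

```python
import json


def is_debug_submission(config_text):
    """Return True if the given aicrowd.json content has debug set to true."""
    config = json.loads(config_text)
    # Only a JSON boolean true counts here. How the evaluator treats a
    # missing key or a non-boolean value is not known, so both are
    # reported as non-debug (an assumption).
    return config.get("debug", False) is True


print(is_debug_submission('{"debug": true}'))
print(is_debug_submission('{"debug": false}'))
```

To check a real submission, read the file first, for example with open("aicrowd.json") as f: is_debug_submission(f.read()).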
I will look into the Out of Memory error.
For the 3.5GB file, have a look at "How to upload large files (size) to your submission".
Large files can be uploaded to the AIcrowd repository using git-lfs.
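To find out which files in the repository are candidates for git-lfs, a small script like this one could help. It is only a sketch: files_over is a hypothetical helper, and the size limit you pass is your own choice, since no threshold is stated in this thread.

```python
import os


def files_over(root, limit_bytes):
    """Yield (path, size) for regular files under root larger than limit_bytes."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symbolic links
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file
            if size > limit_bytes:
                yield path, size
```

For example, list(files_over(".", 50 * 1024 * 1024)) lists everything above 50 MB in the current directory, where 50 MB is an arbitrary example value. The 3.5GB transitions file would show up there.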