In the “MinerlObtainDiamond-v0” environment:
When the agent gets one stick, it receives a reward of 4.
But if the agent gets two sticks, does it receive a reward of 8?
It seems that the current evaluation environment gives the reward repeatedly.
Not sure if the organizers have fixed this bug …
so many bugs …
Was he talking about this evaluation_locally.sh?
I can’t find anything about ObtainDiamond or ObtainDiamondDense in it.
By the way, in the “MinerlObtainDiamond-v0” environment, the agent is rewarded repeatedly on my computer. Is it a bug? How can I fix it?
This is evaluation_locally.sh.
#!/bin/bash
set -e
AICROWD_DATA_ENABLED="YES"
if [[ " $@ " =~ " --no-data " ]]; then
    AICROWD_DATA_ENABLED="NO"
else
    python3 ./utility/verify_or_download_data.py
fi
EXTRAOUTPUT=" > /dev/null 2>&1 "
if [[ " $@ " =~ " --verbose " ]]; then
    EXTRAOUTPUT=""
fi
# Run local name server
eval "pyro4-ns $EXTRAOUTPUT &"
trap "kill -11 $! > /dev/null 2>&1;" EXIT
# Run instance manager to generate performance report
export EVALUATION_STAGE='manager'
eval "python3 run.py --seeds 1 $EXTRAOUTPUT &"
trap "kill -11 $! > /dev/null 2>&1;" EXIT
# Run the evaluation
sleep 2
export MINERL_INSTANCE_MANAGER_REMOTE="1"
export EVALUATION_STAGE='testing'
export EVALUATION_RUNNING_ON='local'
export EXITED_SIGNAL_PATH='shared/exited'
rm -f $EXITED_SIGNAL_PATH
export ENABLE_AICROWD_JSON_OUTPUT='False'
eval "python3 run.py $EXTRAOUTPUT && touch $EXITED_SIGNAL_PATH || touch $EXITED_SIGNAL_PATH &"
trap "kill -11 $! > /dev/null 2>&1;" EXIT
# View the evaluation state
export ENABLE_AICROWD_JSON_OUTPUT='True'
python3 utility/parser.py || true
kill $(jobs -p)
@shadowyzy you can find the information about environments here:
During an episode the agent is rewarded only once per item the first time it obtains that item in the requisite item hierarchy for obtaining an iron pickaxe.
During an episode the agent is rewarded every time it obtains an item in the requisite item hierarchy to obtaining a diamond.
If this is true, it might be that ObtainDiamond and ObtainDiamondDense are reversed? Not sure what’s going on exactly.
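To make the difference between the two quoted descriptions concrete, here is a minimal sketch in plain Python. It does not use the MineRL API. The `ITEM_REWARDS` table and both function names are hypothetical; only the stick reward of 4 comes from this thread, and the other values are placeholders.

```python
from typing import Dict, Iterable, List

# Hypothetical reward table: only the stick value (4) is taken from the thread,
# the other entries are placeholders for illustration.
ITEM_REWARDS: Dict[str, float] = {"log": 1.0, "planks": 2.0, "stick": 4.0}


def once_per_item_rewards(obtained: Iterable[str]) -> List[float]:
    """Reward an item only the first time it is obtained in an episode."""
    seen = set()
    rewards = []
    for item in obtained:
        if item in ITEM_REWARDS and item not in seen:
            seen.add(item)
            rewards.append(ITEM_REWARDS[item])
        else:
            rewards.append(0.0)
    return rewards


def every_time_rewards(obtained: Iterable[str]) -> List[float]:
    """Reward an item every time it is obtained."""
    return [ITEM_REWARDS.get(item, 0.0) for item in obtained]


# One episode in which the agent obtains a log and then two sticks.
episode = ["log", "stick", "stick"]
print(sum(once_per_item_rewards(episode)))  # 5.0
print(sum(every_time_rewards(episode)))     # 9.0
```

Under the first description, the second stick adds nothing, so two sticks still total 4. Under the second, two sticks total 8, which matches what is being reported here for ObtainDiamond.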
My agent runs in “ObtainDiamond”, and it is rewarded repeatedly.
It seems that the environment has a bug here.
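One way to check whether rewards really repeat is to log each step and count how often a rewarded step coincides with gaining the same item. The sketch below is only an illustration: it assumes you have recorded, for every step, a dict of inventory counts and the step reward. The `Step` layout and the `rewarded_gains` helper are hypothetical names, not part of MineRL.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed log format: (inventory counts after the step, reward for the step).
Step = Tuple[Dict[str, int], float]


def rewarded_gains(steps: List[Step]) -> Counter:
    """Count, per item, the rewarded steps on which that item's count went up."""
    hits: Counter = Counter()
    previous: Dict[str, int] = {}
    for inventory, reward in steps:
        if reward > 0:
            for item, count in inventory.items():
                if count > previous.get(item, 0):
                    hits[item] += 1
        previous = inventory
    return hits


# Hand-written example log: the agent gains a stick twice and is rewarded 4 both times.
log = [
    ({"stick": 0}, 0.0),
    ({"stick": 1}, 4.0),
    ({"stick": 2}, 4.0),
]
repeated = {item: n for item, n in rewarded_gains(log).items() if n > 1}
print(repeated)  # {'stick': 2}
```

Any item that shows up in `repeated` was rewarded more than once in the episode, which should not happen if the once-per-item description applies. If several items are gained on the same step, each one is counted, so treat the result as a hint rather than proof.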