Can the agent get reward repeatedly?

#1

In the “MineRLObtainDiamond-v0” environment:

When the agent gets one stick, it receives a reward of 4.
But if the agent gets two sticks, does it receive a reward of 8?
It seems that the current evaluation environment gives the reward repeatedly.

#2

Not sure if the organizers have fixed this bug …
so many bugs …

#3

Was he talking about this evaluation_locally.sh?
I can’t find anything about ObtainDiamond or ObtainDiamondDense in it.
By the way, in the “MineRLObtainDiamond-v0” environment the agent is rewarded repeatedly on my computer. Is it a bug? How can I fix it …

This is evaluation_locally.sh.

#!/bin/bash
set -e


AICROWD_DATA_ENABLED="YES"
if [[ " $@ " =~ " --no-data " ]]; then
   AICROWD_DATA_ENABLED="NO"
else
    python3 ./utility/verify_or_download_data.py
fi


EXTRAOUTPUT=" > /dev/null 2>&1 "
if [[ " $@ " =~ " --verbose " ]]; then
   EXTRAOUTPUT=""
fi



# Run local name server
eval "pyro4-ns $EXTRAOUTPUT &"
trap "kill -11 $! > /dev/null 2>&1;" EXIT

# Run instance manager to generate performance report
export EVALUATION_STAGE='manager'
eval "python3 run.py --seeds 1 $EXTRAOUTPUT &"
trap "kill -11 $! > /dev/null 2>&1;" EXIT

# Run the evaluation
sleep 2
export MINERL_INSTANCE_MANAGER_REMOTE="1"
export EVALUATION_STAGE='testing'
export EVALUATION_RUNNING_ON='local'
export EXITED_SIGNAL_PATH='shared/exited'
rm -f $EXITED_SIGNAL_PATH
export ENABLE_AICROWD_JSON_OUTPUT='False'
eval "python3 run.py $EXTRAOUTPUT && touch $EXITED_SIGNAL_PATH || touch $EXITED_SIGNAL_PATH &"
trap "kill -11 $! > /dev/null 2>&1;" EXIT

# View the evaluation state
export ENABLE_AICROWD_JSON_OUTPUT='True'
python3 utility/parser.py || true
kill $(jobs -p)
#4

@shadowyzy you can find the information about environments here:

MineRLObtainDiamond-v0

During an episode the agent is rewarded only once per item the first time it obtains that item in the requisite item hierarchy for obtaining an iron pickaxe.

MineRLObtainDiamondDense-v0

During an episode the agent is rewarded every time it obtains an item in the requisite item hierarchy to obtaining a diamond.


If this is true, it might be that ObtainDiamond and ObtainDiamondDense are reversed? Not sure what’s going on exactly :thinking:
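
To make the difference between the two descriptions concrete, here is a minimal sketch in plain Python (no MineRL import). The function names and the `ITEM_REWARD` table are my own illustration, not the environment's code; only the stick reward of 4 comes from this thread, and the log value is an assumption.

```python
# Illustrative reward table. Only "stick": 4 is taken from this thread;
# the "log" value is an assumption added for the example.
ITEM_REWARD = {"log": 1, "stick": 4}


def sparse_rewards(events):
    """Once-per-item scheme: reward an item only the first time it is obtained."""
    seen = set()
    rewards = []
    for item in events:
        if item in seen:
            rewards.append(0)  # already rewarded earlier in this episode
        else:
            seen.add(item)
            rewards.append(ITEM_REWARD.get(item, 0))
    return rewards


def dense_rewards(events):
    """Every-time scheme: reward an item each time it is obtained."""
    return [ITEM_REWARD.get(item, 0) for item in events]


# One episode in which the agent obtains a log and then two sticks.
events = ["log", "stick", "stick"]
sparse_total = sum(sparse_rewards(events))
dense_total = sum(dense_rewards(events))
print(sparse_total, dense_total)
```

If the second stick still yields 4 in “MineRLObtainDiamond-v0”, the environment is behaving like `dense_rewards`, which matches the Dense description rather than the non-Dense one.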

#5

My agent runs in “ObtainDiamond”, and it is rewarded repeatedly.
It seems that the environment has a bug here.
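
Until this is clarified, one possible local workaround is to recompute the reward yourself so that each item counts only once per episode. The sketch below is hypothetical: `OncePerItemReward` is a name I made up, it assumes the observation has an `"inventory"` entry mapping item names to counts, and it assumes the env follows the usual `reset()` / `step()` convention returning `(obs, reward, done, info)`. You have to supply the reward table yourself. It only affects the reward your own training code sees, not the official evaluation.

```python
class OncePerItemReward:
    """Wrap an env so that each item is rewarded only the first time it is gained.

    Assumptions (not confirmed by the thread): obs["inventory"] maps item
    names to counts, and env.step() returns (obs, reward, done, info).
    """

    def __init__(self, env, item_reward):
        self.env = env
        self.item_reward = item_reward  # e.g. {"stick": 4}
        self._rewarded = set()
        self._last_inventory = {}

    def reset(self):
        obs = self.env.reset()
        self._rewarded = set()
        self._last_inventory = dict(obs["inventory"])
        return obs

    def step(self, action):
        # The environment's own reward is ignored and recomputed below.
        obs, _env_reward, done, info = self.env.step(action)
        inventory = dict(obs["inventory"])
        reward = 0
        for item, count in inventory.items():
            gained = count > self._last_inventory.get(item, 0)
            if gained and item in self.item_reward and item not in self._rewarded:
                self._rewarded.add(item)
                reward += self.item_reward[item]
        self._last_inventory = inventory
        return obs, reward, done, info
```

Usage would be something like `env = OncePerItemReward(env, {"stick": 4})` right after creating the environment. Note that this detects items through inventory increases only, so an item that is gained and consumed within a single step would be missed.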