Relaxing tolerance for value grid, please resubmit your code

Hi all,

There has been a lot of confusion about Value Iteration outputs not matching the expected results. This is due to small mismatches in implementation details. We’ve decided to significantly relax the tolerance used when checking the value_grid. From the next assignment onward, we’ll try to provide better debugging tools and anticipate issues like these.

Please resubmit your code.

Note that the policy still needs to match exactly.

As your TAs must have communicated, please follow this algorithm to get the correct results.

[image: the algorithm to follow]

Hi! I tried using the provided stopping criterion. My policies match exactly for the 3 test cases, and my J matches the expected values with a mean absolute difference of at most 0.005. However, I am still unable to get a non-zero score. Can you please look into this?
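For anyone comparing outputs locally, here is a minimal sketch of the kind of check described above. The grader's actual metric and tolerance are not stated in this thread, so the function names and the 2x2 numbers below are made up for illustration. It reports both the mean and the maximum absolute difference, since a small mean can hide a single cell that is further off.

```python
def mean_abs_diff(grid_a, grid_b):
    """Mean absolute difference between two equally shaped 2D grids."""
    diffs = [abs(a - b)
             for row_a, row_b in zip(grid_a, grid_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def max_abs_diff(grid_a, grid_b):
    """Largest per-cell absolute difference between two 2D grids."""
    return max(abs(a - b)
               for row_a, row_b in zip(grid_a, grid_b)
               for a, b in zip(row_a, row_b))


def policies_match(policy_a, policy_b):
    """Exact, cell-by-cell equality of two policy grids."""
    return policy_a == policy_b


# Made-up example data, NOT taken from the assignment's test cases.
expected_J = [[1.000, 2.000], [3.000, 4.000]]
my_J = [[1.004, 1.996], [3.005, 3.995]]
expected_policy = [["R", "D"], ["R", "R"]]
my_policy = [["R", "D"], ["R", "R"]]

mad = mean_abs_diff(my_J, expected_J)
worst = max_abs_diff(my_J, expected_J)
same_policy = policies_match(my_policy, expected_policy)

print("mean abs diff:", mad)
print("max abs diff:", worst)
print("policy matches exactly:", same_policy)
```

If the grader compares cell by cell rather than on the mean, the maximum difference is the number to look at.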


I'm having the same issue. Please let us know what the scoring criteria are.

The algorithm you have provided seems to be Gauss-Seidel Value Iteration. I implemented it with the standard state order, and my results do not match. Gauss-Seidel VI depends on the order in which the states are iterated through. Please clarify this.


Hi nischith_shadagopan

Sorry for the confusion in notation. Yes, it's supposed to be standard VI, not Gauss-Seidel.
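To make the distinction concrete, here is a minimal sketch contrasting a standard (synchronous) sweep with a Gauss-Seidel (in-place) sweep. The 3-state chain MDP, the discount `gamma`, the threshold `theta`, and the max-norm stopping rule are all assumptions chosen for illustration. They are not the assignment's environment or its official stopping criterion.

```python
def backup(P, V, s, gamma):
    """Bellman optimality backup for state s.
    P[s] is a list of actions; each action is a list of (prob, next_state, reward)."""
    return max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
               for outcomes in P[s])


def sync_sweep(P, V, gamma):
    """Standard VI: every state is updated from the OLD value vector."""
    return [backup(P, V, s, gamma) for s in range(len(P))]


def gauss_seidel_sweep(P, V, gamma, order):
    """Gauss-Seidel VI: updates are written in place, so later states in
    `order` already see the new values of earlier ones."""
    V = list(V)
    for s in order:
        V[s] = backup(P, V, s, gamma)
    return V


def iterate(sweep, P, gamma, theta):
    """Repeat `sweep` until the largest change falls below theta (assumed rule)."""
    V = [0.0] * len(P)
    while True:
        V_new = sweep(P, V, gamma)
        delta = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if delta < theta:
            return V


# Hypothetical chain: 0 -> 1 -> 2, state 2 loops on itself with reward 1.
P = [
    [[(1.0, 1, 0.0)]],
    [[(1.0, 2, 0.0)]],
    [[(1.0, 2, 1.0)]],
]
gamma, theta = 0.9, 1e-3

# One sweep from all zeros: only the Gauss-Seidel sweep in reverse order
# carries the reward back through states 1 and 0 immediately.
V_sync_1 = sync_sweep(P, [0.0, 0.0, 0.0], gamma)
V_gs_1 = gauss_seidel_sweep(P, [0.0, 0.0, 0.0], gamma, order=[2, 1, 0])

# Run both to the same stopping threshold and compare the results.
V_sync = iterate(sync_sweep, P, gamma, theta)
V_gs = iterate(lambda P_, V_, g: gauss_seidel_sweep(P_, V_, g, [2, 1, 0]),
               P, gamma, theta)

print("after one sweep, sync:", V_sync_1, "gauss-seidel:", V_gs_1)
print("at stopping, sync:", V_sync)
print("at stopping, gauss-seidel:", V_gs)
```

Both variants converge to the same fixed point, but when stopped at a finite threshold they return slightly different value grids. In this sketch, that is the kind of small mismatch a tight tolerance would flag.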

Sir, I’m still getting a small difference between my results and the targets given in the test cases, using both standard VI and Gauss-Seidel, and hence I am not getting a perfect score.

Hi suhas_pai_cs17b116,

If you implement standard VI with the correct stopping condition, you should get the correct score, as everyone else has. The tolerance has been relaxed significantly.

In any case, please share the submission ID so I can review it manually.