Reducing tolerance for value grid, please resubmit your code

dipam · March 13, 2021, 5:16am

Hi all,

There was a lot of confusion about Value Iteration results not matching the expected output, caused by small mismatches in implementation details. We’ve decided to significantly relax the tolerance used when checking the value_grid. From the next assignment onward, we’ll try to provide better debugging tools and anticipate issues like these.

Please resubmit your code.

Note that the policy still needs to match exactly.
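
For local debugging, the two checks described above (exact policy match, value grid within a tolerance) can be sketched as follows. This is only an illustration: the function name `compare_to_target` and the default `atol=1e-2` are placeholders, not the grader's actual code or tolerance.

```python
import numpy as np

def compare_to_target(J, policy, J_target, policy_target, atol=1e-2):
    """Compare a computed value grid and policy against reference targets.

    `atol` is a hypothetical tolerance chosen for illustration only;
    the tolerance used by the real grader is not stated in this thread.
    """
    J = np.asarray(J, dtype=float)
    J_target = np.asarray(J_target, dtype=float)
    diff = np.abs(J - J_target)
    return {
        # The policy has to agree in every state, with no tolerance.
        "policy_exact": bool(np.array_equal(policy, policy_target)),
        # Report both the worst-case and the average value difference.
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "values_within_atol": bool(diff.max() <= atol),
    }

report = compare_to_target(
    J=[0.9, 1.0, 0.0], policy=[0, 0, 0],
    J_target=[0.904, 1.0, 0.0], policy_target=[0, 0, 0],
)
print(report)
```

Looking at the maximum difference as well as the mean can help, since a small mean can hide one state that is far off.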

As your TAs must have communicated, please follow this algorithm to get the correct results.

OpenLoopPolicy · March 13, 2021, 7:48am

Hi! I tried using the provided stopping criteria. My policies match exactly for the 3 test cases and my J matches with a mean absolute difference of at most 0.005. However, I am still unable to get a non-zero score. Can you please look into this?

Mizhaan · March 13, 2021, 9:11am

I'm having the same issue. Please let us know what the scoring criteria are.

nischith_shadagopan · March 13, 2021, 10:21am

The algorithm you have provided seems to be Gauss-Seidel Value Iteration. I implemented it with the standard state order and the results do not match. Gauss-Seidel VI depends on the order in which states are iterated through. Please clarify this.

dipam · March 13, 2021, 4:09pm

Hi nischith_shadagopan

Sorry for the confusion in notation. Yes, it's supposed to be standard VI, not Gauss-Seidel.
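
To make the difference concrete, here is a minimal sketch of the two variants on a made-up 3-state MDP. Everything in it is an assumption for illustration: the array layout (`P[a, s, s2]`, `R[a, s]`), the toy MDP, and the stopping rule (max change below `theta`), which is not necessarily the stopping condition from the assignment.

```python
import numpy as np

def standard_vi(P, R, gamma, theta=1e-8, max_iters=10_000):
    """Synchronous VI: each sweep reads only the previous sweep's values."""
    J = np.zeros(P.shape[1])
    for _ in range(max_iters):
        J_new = (R + gamma * (P @ J)).max(axis=0)  # back up all states at once
        delta = np.max(np.abs(J_new - J))
        J = J_new
        if delta < theta:  # assumed stopping rule
            break
    policy = (R + gamma * (P @ J)).argmax(axis=0)
    return J, policy

def gauss_seidel_vi(P, R, gamma, order=None, theta=1e-8, max_iters=10_000):
    """In-place VI: later states in a sweep already see updated values."""
    n_states = P.shape[1]
    order = list(range(n_states)) if order is None else list(order)
    J = np.zeros(n_states)
    for _ in range(max_iters):
        delta = 0.0
        for s in order:
            new = np.max(R[:, s] + gamma * (P[:, s, :] @ J))
            delta = max(delta, abs(new - J[s]))
            J[s] = new  # overwritten immediately, so the state order matters
        if delta < theta:
            break
    policy = (R + gamma * (P @ J)).argmax(axis=0)
    return J, policy

# Toy chain: action 0 moves right (state 2 is absorbing), action 1 stays.
P = np.zeros((2, 3, 3))
P[0, 0, 1] = P[0, 1, 2] = P[0, 2, 2] = 1.0
P[1] = np.eye(3)
R = np.array([[0.0, 1.0, 0.0],   # moving from state 1 into state 2 pays 1
              [0.0, 0.0, 0.0]])
gamma = 0.9

J_std_1, _ = standard_vi(P, R, gamma, max_iters=1)
J_gs_1, _ = gauss_seidel_vi(P, R, gamma, order=[2, 1, 0], max_iters=1)
J_std, pi_std = standard_vi(P, R, gamma)
J_gs, pi_gs = gauss_seidel_vi(P, R, gamma, order=[2, 1, 0])
print(J_std_1, J_gs_1)  # values after a single sweep
print(J_std, J_gs)      # values after convergence
```

Both variants converge to the same fixed point, but their intermediate values differ, so with a finite stopping threshold the returned value grids can differ slightly depending on the variant and the state order.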

suhas_pai_cs17b116 · March 13, 2021, 4:14pm

Sir, I’m still getting a small difference from the targets given in the test cases, using both standard VI and Gauss-Seidel, and hence I am not getting a perfect score.

dipam · March 14, 2021, 5:24am

Hi suhas_pai_cs17b116,

If you implement standard VI with the correct stopping condition you should get the correct score as everyone else has. The tolerance has been relaxed significantly.

In any case, please share the submission ID so I can review it manually.