I think it would make more sense if the evaluation checked for the answer within the response rather than being the entire response

bryce_brady · May 10, 2023, 7:52pm

For example, on level 2, if you convince it to output the secret key but it also includes a longer response, I think it should count as being hacked still.

From a practical perspective it only matters that the end user was able to extract the secret key, not if it got some other completion tokens in the response.

Same thing on minor annoyances like, It’s easy to get it to say “I have been PWNED.” but because it has a period the evaluation fails so you have to spend more tokens removing the period.

bryce_brady · May 10, 2023, 8:22pm

Seriously this can be so frustrating. 50 of my tokens are dedicated to removing the damn period right now and it’s still not working.

trigaten · May 12, 2023, 3:59pm

Understandable, but we are looking for full control of output on competitors end (check AICrowd page FAQ for more rationale).

CPMP · May 28, 2023, 3:43pm

been there, found a way eventually. Keep pushing!