For example, on level 2, if you convince it to output the secret key but it also includes a longer response, I think it should count as being hacked still.
From a practical perspective it only matters that the end user was able to extract the secret key, not if it got some other completion tokens in the response.
Same thing on minor annoyances like, It’s easy to get it to say “I have been PWNED.” but because it has a period the evaluation fails so you have to spend more tokens removing the period.