Do evaluation levels appear in the training phase?

In the overview of this competition, it says, ‘When evaluating generalization, we will provide participants 200 levels from each environment during the training phase.’
However, Section 3 of the Procgen paper states, ‘… and we measure performance on held out levels’.
So will evaluation levels appear in the training set…?

Round 1 uses the complete distribution of levels; the held-out levels you’re describing will come in the next round. With the full distribution there are 2^32 levels for each game, so repeated levels are unlikely, and the agent encounters many more levels during training than it would if the number of levels were explicitly limited to 200. As a result, it generalizes much better when trained on the full distribution of levels. Round 1 is all about sample efficiency: getting a higher score on all 4 environments with the same algorithm.
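
To put rough numbers on why repeats are unlikely with 2^32 levels, here is a minimal sketch that estimates how many distinct levels an agent would see. It assumes levels are drawn uniformly at random per episode, which is my assumption and not something stated above. The function name and the episode count of 1,000,000 are hypothetical and chosen only for illustration.

```python
import math


def expected_distinct_levels(num_levels: int, num_episodes: int) -> float:
    """Expected number of distinct levels seen after `num_episodes` draws.

    Assumes each episode picks one of `num_levels` levels uniformly at random
    (an illustrative assumption). The closed form is N * (1 - (1 - 1/N)^n);
    log1p/expm1 keep it numerically stable when N is very large.
    """
    if num_levels <= 0 or num_episodes < 0:
        raise ValueError("num_levels must be positive, num_episodes non-negative")
    log_miss = num_episodes * math.log1p(-1.0 / num_levels)
    return -num_levels * math.expm1(log_miss)


# Hypothetical episode count, for illustration only.
episodes = 1_000_000

limited = expected_distinct_levels(200, episodes)     # 200-level training set
full = expected_distinct_levels(2**32, episodes)      # full distribution

print(f"200 levels:  about {limited:.1f} distinct levels seen")
print(f"2^32 levels: about {full:.1f} distinct levels seen")
print(f"repeated episodes on 2^32 levels: about {episodes - full:.1f}")
```

Under these assumptions, the 200-level setting saturates at 200 distinct levels no matter how long training runs, while on the full distribution nearly every episode is a level the agent has not seen before.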

Thanks for the clarification!