My issue is being caused by PyTorch 2.0.0. My code requires PyTorch 1.13.1, and I have it listed in my requirements.txt as torch==1.13.1, but it seems that Music Demixing Validation and Music Demixing are using different environments?
The environments for the validation and main runs are the same. We build one Docker image from your repository and reuse it for both runs, so both use the same environment.
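One way to confirm which version actually ended up in the image is to log it at startup and compare it with the pin. The following is a minimal sketch, not part of the challenge tooling: the helper names (base_version, matches_pin, report) are hypothetical, and it assumes the package is installed under the distribution name torch, as in the requirements.txt line quoted above.

```python
from importlib import metadata


def base_version(version: str) -> str:
    # Drop a local build suffix such as "+cu117" so "1.13.1+cu117" matches "1.13.1".
    return version.split("+", 1)[0]


def matches_pin(installed: str, pinned: str) -> bool:
    # Compare the installed version with the pinned one, ignoring build suffixes.
    return base_version(installed) == base_version(pinned)


def report(package: str = "torch", pinned: str = "1.13.1") -> str:
    # Look up the installed distribution without importing the package itself.
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed"
    status = "OK" if matches_pin(installed, pinned) else "MISMATCH"
    return f"{package} installed={installed} pinned={pinned} -> {status}"


if __name__ == "__main__":
    print(report())
```

Printing this line at the start of both runs would show in the logs whether the two stages really see the same version.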
Could you please check whether there are any issues with when the inference timer begins on the main run? My inference speed on validation is 2.103x and the allowed time is 1.68x. There is nothing in my code that would cause that much difference; I have run lots of local evaluations and the only difference between them is about 3 seconds.
Is it normal when going between validation and the main run to have “Preparing the cluster for you PodInitializing” and “PodInitializing” happen again?
Also, is there a better way to talk about my issue, like on Discord? Getting only one reply every 1-2 days is not good for me when the contest ends in 20 days.
The nodes we provision from AWS can vary in CPU type between runs; this is beyond our control. The only guess I can make about the runtime difference is high CPU usage.
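Since the CPU type can differ between runs, it may help to log the hardware alongside the wall-clock time of each inference call, so that a slow run can be matched to the node it ran on. This is only a sketch under assumptions: describe_cpu and timed are hypothetical helper names, and the sum call is a stand-in for the real inference function.

```python
import os
import platform
import time


def describe_cpu() -> dict:
    # Collect basic hardware details that may vary from node to node.
    return {
        "machine": platform.machine(),
        "processor": platform.processor(),
        "logical_cpus": os.cpu_count(),
    }


def timed(fn, *args, **kwargs):
    # Run fn and return its result together with the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    print("cpu:", describe_cpu())
    # Stand-in workload; replace with the actual inference call.
    _, seconds = timed(sum, range(1_000_000))
    print(f"elapsed: {seconds:.3f}s")
```

Comparing these log lines between the validation and main runs would show whether the slowdown coincides with a different CPU or core count.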
As for PodInitializing, yes, that happens for each stage of the run; it's normal.
Discourse or email is the best channel to contact us. We understand the challenge is in the last stages and will try to reply as soon as possible.