[Announcement][git error] Large Model Weights in your Submission Repository

Dear All,

We noticed a misconfiguration in our GitLab deployment that let users push files larger than the allowed limit into their git repositories. As a result, many users were checking files as large as 4-5 GB into the git history, which causes a number of problems downstream.

Ideally, our GitLab server should have thrown an error when you tried to push. Now that we have reconfigured everything on our end, it will.

The recommended way to check in large model weights and similar files to your submission repository is to use git-lfs. More on that here: https://about.gitlab.com/2017/01/30/getting-started-with-git-lfs-tutorial/
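Before setting up git-lfs, it can help to know which files in the working tree are actually large. The following is a minimal sketch, not an official tool: the function name `find_large_files` is hypothetical, and the 10 MB threshold is only an illustrative assumption, since the actual server limit is not stated here.

```python
import os


def find_large_files(root, limit_bytes):
    """Return (relative_path, size) pairs for files under root above limit_bytes."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own object store; it is not part of the working tree.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.path.getsize(path)
            if size > limit_bytes:
                large.append((os.path.relpath(path, root), size))
    # Largest files first, so the worst offenders are listed at the top.
    return sorted(large, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # ASSUMPTION: 10 MB is an illustrative threshold, not the server's real limit.
    for path, size in find_large_files(".", 10 * 1024 * 1024):
        print(f"{size / 1e6:8.1f} MB  {path}")
```

Files reported by such a scan are candidates for git lfs track before they are added and committed.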

If you want to try to remove the large offending files from your git history, you are welcome to also try this :
https://help.github.com/en/articles/moving-a-file-in-your-repository-to-git-large-file-storage

You can also create a fresh fork of the starter kit and make a new submission from there, checking in your model weights via git-lfs.

remote: fatal: pack exceeds maximum allowed size
error: pack-objects died of signal 13
@mohanty

@ChenKuanSun: Are you on gitter ? It might be better to go over it there.

2019-03-08T16:41:29.918810415Z INFO:mlagents_envs:Start training by pressing the Play button in the Unity Editor.
2019-03-08T17:31:30.971471728Z ckpt:  135  MB

Is something wrong?
https://gitlab.aicrowd.com/ChenKuanSun/obg/issues/4

@mohanty the git lfs files are being added to GitLab, but I don't think the build server is pulling the LFS files.

I'm getting this error:

2019-03-13T14:16:44.691582001Z     raise OSError('Not a gzipped file (%r)' % magic)
2019-03-13T14:16:44.691735028Z OSError: Not a gzipped file (b've')

https://gitlab.aicrowd.com/banjtheman/obstacle-tower-challenge/issues/2
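One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: a Git LFS pointer file is a small text file whose first line begins with `version https://git-lfs.github.com/spec/v1`, so its first two bytes are `ve`, which matches the magic bytes in the message above. A real gzip file starts with the bytes `0x1f 0x8b`. The sketch below (the function name `classify_checkpoint` is hypothetical) checks which of the two a given file is.

```python
GZIP_MAGIC = b"\x1f\x8b"
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def classify_checkpoint(path):
    """Return 'gzip', 'lfs-pointer', or 'unknown' based on the file's leading bytes."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(LFS_POINTER_PREFIX):
        # The checkout holds only the pointer; the real content was never fetched.
        return "lfs-pointer"
    return "unknown"
```

If a checkpoint is reported as `lfs-pointer`, the LFS content was not downloaded in that checkout, which would be consistent with the build server not pulling LFS files.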

When I clone the repo with git clone, I also don't get the LFS files.

I have the same problem, but you can try git lfs pull.

@banjtheman: I also get this when I try to clone the repository :

Encountered 1 file(s) that may not have been copied correctly on Windows:
	checkpoints/$store$_observation_ckpt.147.gz

I will check with @shivam to debug this issue on the GitLab side.
But in the meantime, the $store$_observation_ckpt* files are not needed for inference. They just store the replay buffer, and you can ignore them by removing them from the repository (or gitignoring them). More info here: Starter kit stuck "pending" state for a day

@mohanty I have a 200 MB ckpt file that I also can't upload.

@mohanty OK, getting close: I created a fresh repo without the $store$ files and confirmed that the image runs the evaluation locally after some minor updates, but I'm getting a weird error on the build server:

DataLossError (see above for traceback): Checksum does not match: stored 3961381255 vs. calculated on the restored bytes 87951365

https://gitlab.aicrowd.com/banjtheman/fresh-obs/issues/2

@ChenKuanSun I found it easier to just delete the repo and start fresh and then use git lfs track before you add the file to the repo.

git init
git remote add origin YOUR_URL
git lfs install
git lfs track file
git add .
git commit -am "initial commit"
git tag -am "test" tag
git push -u origin master tag

@mohanty @shivam OK, after making my build fail I noticed that the server is not running the correct commands.

https://gitlab.aicrowd.com/banjtheman/obstacle-tower-challenge/issues/9

Currently

git clone git@gitlab.aicrowd.com:USER/obstacle-tower-challenge.git
cd obstacle-tower-challenge
git checkout lfs

This just checks out a detached HEAD and doesn't pull down the LFS files.
The server should use git lfs pull instead to pull down the LFS files.

git clone git@gitlab.aicrowd.com:USER/obstacle-tower-challenge.git
cd obstacle-tower-challenge
git lfs pull

This is a better and easier solution than the one I had before.

@banjtheman: In git checkout lfs, lfs is the tag name that you specified in your submission.

That said, git-lfs is installed on the build nodes, and the current GitLab has a transparent git-lfs configuration. So ideally it should work out of the box (and it has for many other participants in this competition).

I do acknowledge the build issues with your submissions, and will try to get back to you as soon as I can after debugging it closely.
Sorry for all the inconvenience.

Update: We made some changes to our build server configuration. Hopefully some of the LFS-related issues are resolved now. @banjtheman: Please do retry submitting.

Thanks in advance for an update on this thread if this works for you !!
I will buy myself a beer right-away if this works :’))

@mohanty huzzah, it's going through now. Go and enjoy that beer!


Yayy !

:beer: :beers: :beer: :beers: :beer: :beers: :beer: :beers:
