🧞 Requesting feedback and suggestions

Hi all,

We are constantly trying to make this competition better for everyone and would really appreciate your feedback. :raised_hands:

Feel free to reply to this thread with your suggestions and feedback on making the competition better for you!

  • What have been your major pain points so far?
  • What would you like to see improved?

Cheers!


:point_right: In case you missed it, here is the thread containing FAQs and common mistakes.

Hey Shivam,

For folks who are using AWS EC2 instances for training, it would be great to be able to mount the training data bucket using s3fs.

Although partial dataset download is a useful feature for playing around initially, for full-fledged training one has to iteratively delete and download another partial chunk (while making sure the same sequences aren’t downloaded twice). It is quite vexing, since the dataset is huge.

If the organizers could provide read-only access to the data buckets, the training process would become much simpler.

Please let me know what you think.

Regards,
Suraj

Hi @suraj_bonagiri,

The dataset is already available in a public S3 bucket.

Here is an example notebook showing how to use s3fs to access the dataset (mounting it, etc.).

I hope it helps!

:point_right: https://colab.research.google.com/drive/1sOms7aoTJSudL5XHRhrQbKBBICvkKZ5D?usp=sharing
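
To give a rough idea, here is a minimal sketch of the kind of mount the notebook walks through (<dataset-bucket> below is a placeholder; take the real bucket name from the notebook):

# install s3fs-fuse, then mount the public dataset bucket read-only
# <dataset-bucket> is a placeholder for the actual public bucket name
sudo apt-get install -y s3fs
mkdir -p data
s3fs <dataset-bucket> data -o public_bucket=1,ro
ls data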

Hey @shivam,

We tried that earlier, and we get the following:

cannot open directory 'data/part1': Operation not permitted
cannot open directory 'data/part2': Operation not permitted
cannot open directory 'data/part3': Operation not permitted

I think a read-only access key pair needs to be generated and provided to participants so they can access the mounted data.

Please let me know what you think.

Regards,
Suraj

@suraj_bonagiri Please recheck the example Colab notebook. Did you mount using -o public_bucket=1,ro?

In case you are accessing it as a non-sudo user on your machine, you can use -o public_bucket=1,ro,allow_other to get proper access.

Public buckets don’t need a read-only access key pair for s3fs to work.
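
Putting those options together, the mount command would look roughly like this (<dataset-bucket> is again a placeholder for the actual bucket name):

# mount the public bucket read-only; allow_other makes the mount
# readable by users other than the one who mounted it
# (allow_other requires user_allow_other to be enabled in /etc/fuse.conf)
s3fs <dataset-bucket> data -o public_bucket=1,ro,allow_other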

Hey @shivam,

Yes, I followed the steps in the Colab notebook.

sudo was the problem. Even with allow_other, I had to use sudo.

It works now. Thank you!

Awesome. :raised_hands:

Optionally, in case you want to skip using sudo, you can check out the umask option (together with uid/gid) and set them for your non-sudo user.
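
For example, a possible sudo-free invocation (the bucket name is a placeholder, and the exact umask bits may need tuning for your setup):

# mount without sudo; umask/uid/gid make the mounted files readable
# by your own non-root user
# <dataset-bucket> is a placeholder for the actual public bucket name
s3fs <dataset-bucket> data -o public_bucket=1,ro,umask=0022,uid=$(id -u),gid=$(id -g)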

Oh okay. Will try that. Thanks!

Hi Shivam,
Can I still access the dataset in the public S3 bucket after the competition?

Hi @xydy666,

Yes, the dataset will be available after the challenge as well.

With best regards, and heartfelt thanks to your team for providing us with such valuable data!