I also used query expansion with PageRank-inspired thresholding, which gave a 0.03 boost to the score.
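The query-expansion idea can be sketched roughly like this (pure NumPy, alpha-QE style; the k, threshold, and weighting values are illustrative, not the exact ones I used):

```python
import numpy as np

def expand_queries(queries, gallery, k=5, threshold=0.5):
    """Expand each query with its top-k gallery neighbours whose cosine
    similarity clears a threshold (illustrative sketch of thresholded
    query expansion, not the exact competition implementation)."""
    # L2-normalise so dot products are cosine similarities
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q @ g.T                              # (num_queries, num_gallery)
    expanded = []
    for i, row in enumerate(sims):
        top = np.argsort(row)[::-1][:k]         # indices of top-k neighbours
        keep = top[row[top] > threshold]        # drop weakly connected edges
        # similarity-weighted neighbours, with the query itself at weight 1
        vecs = np.concatenate([q[i:i + 1], g[keep] * row[keep, None]])
        e = vecs.sum(axis=0)
        expanded.append(e / np.linalg.norm(e))
    return np.stack(expanded)
```

The expanded queries are then used in place of the originals for retrieval.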
I want to ask how you fit the ViT-H model into the 10-minute inference limit. I tried using it through timm, with JIT compilation, but I could never fit it in time.
Unfortunately, I couldn’t do any pre- or post-processing during inference because ViT-H was too slow. I tried to speed up my models by pruning them and exporting to ONNX, and also by using JIT compilation, but neither gave a noticeable speed-up. It’s possible that I did something wrong, or that the gains only show up on CPU rather than GPU.
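For context, the JIT path I tried looks roughly like this sketch (a tiny stand-in module rather than the actual timm ViT-H, since the tracing and export calls are the relevant part; the real backbone would come from `timm.create_model`):

```python
import torch
import torch.nn as nn

# Tiny stand-in for the real backbone -- the speed-up question is the
# same either way, only the module being traced differs.
class Backbone(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(x)

model = Backbone().eval()
example = torch.randn(1, 16)

# TorchScript via tracing; in my runs this gave no noticeable GPU speed-up
traced = torch.jit.trace(model, example)

# The ONNX route follows the same pattern:
# torch.onnx.export(model, example, "backbone.onnx", opset_version=17)

with torch.no_grad():
    same = torch.allclose(model(example), traced(example), atol=1e-6)
```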
I was planning to train a CLIP model on the Amazon dataset, since it has product descriptions, but I didn’t have any time to experiment with it.
We attempted to incorporate multiple external datasets into our experiments, spending considerable time training our ViT-H model jointly on Product10k and other datasets, as well as training on other datasets first and then fine-tuning on Product10k. Surprisingly, despite these efforts, our current leaderboard score was achieved using the Product10k dataset alone; every other dataset decreased our score.
To improve our results, we used re-ranking as post-processing, which gave us a marginal improvement of approximately 0.01%. We also experimented with ConvNeXt and ViT-G models, which boosted our local score by about 0.03%. However, even with TensorRT, these models could not complete inference within 10 minutes.
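The re-ranking step can be sketched along these lines (pure NumPy; a simplified neighbour-consensus variant for illustration, not our exact implementation -- full k-reciprocal re-ranking is more involved):

```python
import numpy as np

def rerank(qg, gg, k=3, alpha=0.7):
    """Blend each query's direct gallery similarities with the average
    similarities assigned by its top-k gallery neighbours.
    qg: (num_queries, num_gallery) query-to-gallery similarities
    gg: (num_gallery, num_gallery) gallery-to-gallery similarities
    k, alpha: illustrative values, not tuned competition settings."""
    out = np.empty_like(qg, dtype=float)
    for i, row in enumerate(qg):
        top = np.argsort(row)[::-1][:k]       # query's top-k gallery hits
        consensus = gg[top].mean(axis=0)      # how those hits score everyone
        out[i] = alpha * row + (1 - alpha) * consensus
    return out
```

With alpha = 1 this reduces to the original ranking, so alpha controls how much the neighbour consensus is trusted.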
Did you convert the model to TensorRT before or during inference? I ask because our final solution uses ConvNeXt-XXLarge with an image size of 256 and a 2048-dimensional embedding.
To do this, we used NVIDIA’s Docker image with half precision and ran re-ranking on the GPU.
We converted to TensorRT before inference on our server. We also run re-ranking on the GPU, but maybe ours took longer…
Even with ViT-H + re-ranking, our solution took almost 10 minutes; in some cases it failed and in others it ran successfully, depending on the hardware.