I’ve tested this several times inside the generate_answer function and it only takes about 8 seconds to return the output, but the server determines that it has timed out. I suspect that the timing starts when the data is decompressed rather than when the function is called.
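A measurement like the one described above could be taken roughly as follows. This is only a minimal sketch under assumptions: `log_runtime` is a hypothetical helper name, and the body of `generate_answer` here is a placeholder, not the actual submission code. It times only the span from function call to return, so decompression or loading that happens before the call would not be counted.

```python
import time
from functools import wraps


def log_runtime(func):
    """Print the wall-clock time of each call, measured from call to return."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            # flush=True so the line shows up promptly in captured logs
            print(f"{func.__name__} took {elapsed:.2f} s", flush=True)
    return wrapper


@log_runtime
def generate_answer(prompt):
    # Placeholder body (assumption): stands in for the real inference code.
    time.sleep(0.05)
    return f"answer to: {prompt}"
```

If the time printed this way stays well below the limit while the server still reports a timeout, that would be consistent with the server's clock starting before the function is called.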
@jiazunchen: Can you share a bit more about the hardware you are benchmarking on locally? Also, please point us to the submission IDs where you believe there is a discrepancy in throughput.
The runtime is printed in the log returned by the server and has nothing to do with local testing. The submission IDs are #253185, #253183, #253142, #253102, and #253092.