Help with video transcriptions

Hi everyone,

I wonder if some of you would be willing to share tips or references that could help a beginner facing the video transcription problem :smile: ♜♘

What I have done so far is capture images from each video (my notebook is here).
Do you think I could build a model on this new dataset? Maybe some pre-trained models already exist? Probably I will have to build a new one … I also have a solution that gives the FEN notation from a chessboard picture; maybe it could be used to compare successive pictures from the same video? Like building a fairly complex function that derives the chess move leading from the first notation to the second …
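To make the FEN comparison idea a bit more concrete, here is a minimal sketch of what I have in mind, in plain Python. It assumes both FEN strings were recognized correctly and that exactly one ordinary move (including captures) separates the two frames. The names `expand_placement` and `guess_move` are just placeholders I made up, and castling, en passant, skipped frames and recognition errors are not handled (the function returns `None` in those cases).

```python
def expand_placement(fen):
    """Map square names (e.g. 'e4') to piece letters, using the first FEN field."""
    placement = fen.split()[0]
    squares = {}
    for rank_index, row in enumerate(placement.split("/")):
        rank = 8 - rank_index  # FEN lists rank 8 first
        file_index = 0
        for ch in row:
            if ch.isdigit():
                file_index += int(ch)  # a digit means that many empty squares
            else:
                squares["abcdefgh"[file_index] + str(rank)] = ch
                file_index += 1
    return squares


def guess_move(fen_before, fen_after):
    """Return a move like 'e2e4' if exactly one square was vacated and one changed."""
    before = expand_placement(fen_before)
    after = expand_placement(fen_after)
    # Squares that held a piece before and are empty now.
    vacated = sorted(sq for sq in before if sq not in after)
    # Squares whose occupant is new or different in the second position.
    arrived = sorted(sq for sq in after if before.get(sq) != after[sq])
    if len(vacated) == 1 and len(arrived) == 1:
        return vacated[0] + arrived[0]
    return None  # ambiguous: castling, en passant, skipped frames, bad recognition


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
after_e4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
print(guess_move(start, after_e4))  # prints e2e4
```

Only the piece placement field is compared, so this would also work if the picture-to-FEN step gives just the board part without side to move or castling rights.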

I’m not sure I will have time to explore all these ideas, which is why I’m asking for your comments to help me prioritize some directions.

Thank you in advance!