I wonder whether some of you would be willing to share some tips or references that could help a beginner tackle the video transcription problem ♜♘
So far, I have captured images from each video (my notebook is here).
Do you think I could build a model on this new dataset? Maybe some pre-trained models already exist? Probably I will have to build a new one… I also have a solution that gives the FEN notation from a chessboard picture. Maybe it could be used to compare successive pictures from the same video, for example by building a fairly complex function that derives the chess move(s) leading from the first notation to the second…
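To make the "compare successive FEN notations" idea concrete, here is a minimal sketch, not a definitive implementation. It assumes that the FEN strings come from the picture-to-FEN solution mentioned above and that exactly one move was played between the two frames. `parse_placement` and `infer_move` are hypothetical helper names. The sketch only diffs the piece-placement field and handles ordinary moves, captures, promotions and castling; en passant, skipped moves and misdetected pieces simply return `None`.

```python
from typing import Dict, Optional

FILES = "abcdefgh"


def parse_placement(fen: str) -> Dict[str, str]:
    """Map squares such as 'e4' to piece letters, using only the first FEN field."""
    ranks = fen.split()[0].split("/")
    if len(ranks) != 8:
        raise ValueError(f"expected 8 ranks, got {len(ranks)}")
    board = {}
    for rank_index, rank_str in enumerate(ranks):
        rank = 8 - rank_index  # FEN lists rank 8 first
        file_index = 0
        for ch in rank_str:
            if ch.isdigit():
                file_index += int(ch)  # a digit means that many empty squares
            else:
                board[f"{FILES[file_index]}{rank}"] = ch
                file_index += 1
    return board


def infer_move(fen_before: str, fen_after: str) -> Optional[str]:
    """Guess the single move (UCI-style, e.g. 'e2e4') between two positions.

    Hypothetical helper: returns None when the difference does not look like
    one ordinary move, capture, promotion or castling.
    """
    before = parse_placement(fen_before)
    after = parse_placement(fen_after)
    changed = [sq for sq in sorted(before.keys() | after.keys())
               if before.get(sq) != after.get(sq)]
    vacated = [sq for sq in changed if sq not in after]  # emptied squares
    filled = [sq for sq in changed if sq in after]       # newly (re)occupied squares

    # Ordinary move or capture: one square emptied, one square (re)occupied.
    if len(vacated) == 1 and len(filled) == 1:
        src, dst = vacated[0], filled[0]
        move = src + dst
        # A pawn that turns into another piece is a promotion.
        if before[src] in "Pp" and after[dst] not in "Pp":
            move += after[dst].lower()
        return move

    # Castling: king and rook both move, so report the king's move.
    if len(vacated) == 2 and len(filled) == 2:
        king_src = [sq for sq in vacated if before[sq] in "Kk"]
        king_dst = [sq for sq in filled if after[sq] in "Kk"]
        if len(king_src) == 1 and len(king_dst) == 1:
            return king_src[0] + king_dst[0]

    return None
```

For example, `infer_move` applied to the starting position and the position after 1. e4 should give `"e2e4"`. In practice, consecutive video frames are often identical or noisy, so one would probably only call it when the placement actually changes. A library such as python-chess could then check each inferred move against the legal moves of the current position, which would also catch recognition errors.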
I’m not sure I will have time to explore all these ideas, which is why I’m asking for your comments to help me prioritise some directions.
Thank you in advance!