If I were going to productize this, I'd do AF passes on a huge training dataset like The Stack and generate some kind of fingerprint for each program. (Estimated cost: billions!)
https://huggingface.co/datasets/bigcode/the-stack
Then, I'd have a tool to let you fingerprint your own code and compare it against the big database -- maybe give you a list of high-similarity codebases.
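To make the fingerprint-and-lookup idea concrete, here is a minimal sketch. It assumes the fingerprint is a set of hashed token k-grams ("shingles") and that similarity is Jaccard overlap. Both are stand-ins I picked for illustration, not the fingerprinting method described above. The names `fingerprint`, `jaccard`, and `most_similar` are hypothetical, and the in-memory dict stands in for the big database.

```python
import hashlib
import re


def fingerprint(source: str, k: int = 5) -> frozenset[int]:
    """Hash every run of k consecutive tokens into a set of 64-bit integers.

    Assumption: a crude regex tokenizer is good enough for a sketch.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    # Sources shorter than k tokens still produce one shingle.
    count = max(len(tokens) - k + 1, 1)
    shingles = (" ".join(tokens[i:i + k]) for i in range(count))
    return frozenset(
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles
    )


def jaccard(a: frozenset[int], b: frozenset[int]) -> float:
    """Share of shingles the two fingerprints have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(query: frozenset[int],
                 database: dict[str, frozenset[int]],
                 top_n: int = 3) -> list[tuple[str, float]]:
    """Rank database entries by similarity to the query, highest first."""
    scores = [(name, jaccard(query, fp)) for name, fp in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]
```

A linear scan like this would not scale to a dataset the size of The Stack; a real tool would need some kind of index, which this sketch leaves out.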
And you could re-run the comparison each time you push to Git -- maybe only comparing what changed.
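One way to limit the work to what changed is to remember a content digest per file from the previous push and only re-check files whose digest differs. This is a rough sketch under that assumption; `content_digest` and `files_to_recheck` are hypothetical names, and the file contents are passed in as plain dicts rather than read from a repository. In a real hook, `git diff --name-only` between the old and new commits could supply the list of changed paths instead.

```python
import hashlib


def content_digest(source: str) -> str:
    """SHA-256 of the file contents, used only to detect changes."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def files_to_recheck(previous: dict[str, str],
                     files: dict[str, str]) -> list[str]:
    """Return paths that are new or whose contents changed.

    previous maps path -> digest recorded at the last push.
    files maps path -> current source text.
    """
    return sorted(
        path for path, source in files.items()
        if previous.get(path) != content_digest(source)
    )
```

Only the paths returned here would need to be fingerprinted and compared again; everything else can reuse the earlier result.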