ether+nick

If I were going to productize this, I'd do AF passes on a huge training dataset like The Stack and generate some kind of fingerprint for each program. (Estimated cost: billions!)

huggingface.co/datasets/bigcod

Then, I'd have a tool to let you fingerprint your own code and compare it against the big database -- maybe give you a list of high-similarity codebases.
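
A rough sketch of what the fingerprint-and-lookup step might look like. This is only an illustration: it swaps in a generic technique (hashed token shingles scored with Jaccard similarity) rather than whatever fingerprint a real product would use, and the names `fingerprint`, `similarity`, and `rank_matches` are made up for the example.

```python
import hashlib
import re


def fingerprint(source, k=5):
    """Hash every run of k consecutive tokens (a "shingle") into a set of ints.

    Assumption: a crude regex tokenizer stands in for a real parser.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    shingles = (" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1))
    return {
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles
    }


def similarity(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_matches(mine, database, top=3):
    """Return the `top` (score, name) pairs from a name -> fingerprint mapping."""
    scored = [(similarity(mine, fp), name) for name, fp in database.items()]
    return sorted(scored, reverse=True)[:top]
```

A real database the size of The Stack would need an index (for example MinHash with locality-sensitive hashing) instead of this linear scan.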

And you could re-run the comparison each time you push to Git -- maybe only comparing what changed.
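
One way the "only what changed" part could work, sketched under assumptions: keep a content hash per file from the previous push and re-check only the files whose hash differs. The helpers `content_hashes` and `changed_paths` are hypothetical, and a real tool would more likely ask Git for the changed files directly.

```python
import hashlib


def content_hashes(files):
    """Map each path to the SHA-256 hex digest of its text content."""
    return {
        path: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for path, text in files.items()
    }


def changed_paths(previous, current):
    """Paths that are new or whose content hash differs from the last snapshot."""
    return sorted(p for p, h in current.items() if previous.get(p) != h)
```

Only the returned paths would then need to be fingerprinted and looked up again.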

@bkuhn @richardfontana @cwebber @ossguy