ether+nick

I gave it a try. It's quite wordy! Claude thought that a lot of Pilgrim's work would be filtered since it was a direct port from the Mozilla C++ codebase. I pushed back that they shared the same license, and it loosened up that constraint.

claude.ai/share/e4aae73c-14d1-

Warning: if you read this document, it will get AI in you, and it will make you AI and you will become an AI-booster like me and Sam Altman. It will also burn down the rainforest.

@bkuhn @richardfontana @cwebber @ossguy

@evan I can't help thinking you are kind of positively biased towards France and French people? DeeAnn is so fascinated by this law, she is investigating the backgrounds...

"it doesn't matter what the code is all that matters is that it runs and produces the desired result"

liar. what you're saying is bullshit. you know it's bullshit and you're saying it anyway

If I were going to productize this, I'd do abstraction-filtration (AF) passes on a huge training dataset like The Stack and generate some kind of fingerprint for each program. (Estimated cost: billions!)

huggingface.co/datasets/bigcod

Then, I'd have a tool to let you fingerprint your own code and compare it against the big database -- maybe give you a list of high-similarity codebases.
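
A rough sketch of what that fingerprint-and-compare step could look like. It assumes a much cruder fingerprint than a real abstraction-filtration pass would give you: hashed k-token shingles, compared with Jaccard similarity. The names `tokenize`, `fingerprint`, `similarity`, and `rank_matches`, and the in-memory `database` dict, are all made up for illustration; none of this comes from an existing tool.

```python
import hashlib
import re


def tokenize(source: str) -> list[str]:
    """Very crude lexer: identifiers, integers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def fingerprint(source: str, k: int = 5) -> set[int]:
    """Hash every run of k consecutive tokens (a 'shingle') into a set of ints.

    Assumption: a token-shingle set stands in for whatever fingerprint an
    abstraction-filtration pass would actually produce.
    """
    tokens = tokenize(source)
    if not tokens:
        return set()
    if len(tokens) < k:
        shingles = [tokens]
    else:
        shingles = [tokens[i:i + k] for i in range(len(tokens) - k + 1)]
    return {
        int.from_bytes(hashlib.sha1(" ".join(s).encode("utf-8")).digest()[:8], "big")
        for s in shingles
    }


def similarity(a: set[int], b: set[int]) -> float:
    """Jaccard similarity: shared shingles divided by all shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_matches(query: set[int], database: dict[str, set[int]], top: int = 5):
    """Return the `top` most similar database entries, best match first."""
    scores = [(name, similarity(query, fp)) for name, fp in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]


# Hypothetical usage: `database` maps a codebase name to its fingerprint.
database = {
    "project-a": fingerprint("def add(x, y):\n    return x + y\n"),
    "project-b": fingerprint("while True: print('hello world'); break"),
}
mine = fingerprint("def add(x, y):\n    return x + y\n")
matches = rank_matches(mine, database)
```

A set of shingle hashes only catches near-literal overlap, so it says nothing about the harder question of what is protectable; it is just the cheapest thing that fits the "fingerprint, then look up" shape.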

And you could re-run the comparison each time you push to Git -- maybe only comparing what changed.
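
One way the "only what changed" part might work, sketched without touching Git at all: keep a content digest per file from the last run and re-check only the paths whose digest is new or different. `content_hashes` and `changed_paths` are hypothetical helper names, and a real hook would probably ask Git for the changed paths instead of hashing everything itself.

```python
import hashlib


def content_hashes(files: dict[str, str]) -> dict[str, str]:
    """Map each path to a SHA-256 digest of its contents."""
    return {
        path: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for path, text in files.items()
    }


def changed_paths(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Paths that are new, or whose digest differs from the previous run."""
    return sorted(
        path for path, digest in current.items() if previous.get(path) != digest
    )


# Hypothetical usage: only the paths returned here need a fresh comparison.
last_run = content_hashes({"a.py": "x = 1", "b.py": "y = 2"})
this_run = content_hashes({"a.py": "x = 1", "b.py": "y = 3", "c.py": "z = 0"})
to_check = changed_paths(last_run, this_run)
```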

@bkuhn @richardfontana @cwebber @ossguy

@bkuhn I just did an abstraction and filtration pass on a medium-sized application framework (~30K LOC), and as an expert on the code I think it did a good job:

claude.ai/share/071ccb69-5d22-

It missed a few things (e.g. relay specs). Then again, I have no idea how this kind of review is supposed to work. I didn't go down to the function or statement level -- that'd probably be much noisier.

Maybe chardet 2 and 7 would be a better test of the technique?

@richardfontana @cwebber @ossguy

@evan

I actually think that these copyright concepts aren't particularly automatable.
Even if we try, it's a pure arms race.

And the merger doctrine isn't the big problem here. The hard part is the more complex analysis needed where merger doctrine clearly doesn't apply, and I suspect that analysis is difficult to automate, even partially.

But I'm looking into it.

Cf: chardet situation github.com/chardet/chardet/iss

@richardfontana @cwebber @ossguy