This is a really interesting question! TIL about CA vs. Altai and the abstraction-filtration-comparison test.
I'm not sure how automatable it is. Interesting to try though!
@evan That’s not enough code for copyright enforcement. People have been finding identical code in the output - you just need something “rare”. It’s similar for subjects with little text in the corpus - I’ve been seeing listings that *can only have one source* (retro datasheets by AMD, in my case).
@evan @cwebber @bkuhn @ossguy @richardfontana Another major concern is that works generated by AI are not copyrightable in the US: the Supreme Court declined to hear a challenge to that position, leaving it in place. So code generated by an LLM cannot be licensed at all, open or closed. https://www.reuters.com/legal/government/us-supreme-court-declines-hear-dispute-over-copyrights-ai-generated-material-2026-03-02/
@cwebber it's sometimes a distinction that people blur!
But maybe that's wrong; I don't know. Maybe if I wrote a Person.setName() method that was in the training set, and the LLM generated an identical Person.setName() code snippet for someone else, I could claim that the code is a copyright violation, even if there were thousands of other identical and independent Person.setName() methods in the training set.
@evan @richardfontana @bkuhn @ossguy Sorry, I missed a word when I edited the sentence, I meant "genAI output"
@cwebber the weights themselves?
I think the worst case scenario is that the inserted code matches exactly one snippet in the training data.
So you could try to go for zero matches, by using such idiosyncratic and unrecommended coding conventions that nobody else has code like yours.
Or you could try to go for lots of matches, by using bog standard coding conventions and software patterns.
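To make the zero / one / many distinction concrete, here's a rough sketch of how you might count matches for a snippet. Everything in it is an assumption for illustration: the `corpus` list stands in for training data nobody outside the vendor can actually inspect, the function names are made up, and whitespace-insensitive substring matching is a much cruder test than anything a court would apply.

```python
import re


def normalize(code: str) -> str:
    """Collapse all whitespace so formatting differences don't hide a match."""
    return re.sub(r"\s+", " ", code).strip()


def count_matches(snippet: str, corpus: list[str]) -> int:
    """Count how many corpus documents contain the snippet after normalization."""
    needle = normalize(snippet)
    return sum(1 for doc in corpus if needle in normalize(doc))


def risk_bucket(match_count: int) -> str:
    """Map a match count onto the three cases discussed in the thread."""
    if match_count == 0:
        return "no match"
    if match_count == 1:
        return "single source (worst case)"
    return "many sources"


# Hypothetical stand-in for a training corpus.
corpus = [
    "class Person:\n    def set_name(self, name):\n        self.name = name\n",
    "class User:\n  def set_name(self, name):\n      self.name = name\n",
    "def amd_init():\n    out(0x3F8, 0x9B)\n",
]

common = count_matches("def set_name(self, name):\n    self.name = name", corpus)
rare = count_matches("out(0x3F8, 0x9B)", corpus)
absent = count_matches("x = 1", corpus)
```

With this toy corpus the setter lands in "many sources", the one-off register write lands in "single source (worst case)", and `x = 1` has no match at all.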
@evan
@cwebber @richardfontana @bkuhn @ossguy or just... not at all
@evan @richardfontana @bkuhn @ossguy Yeah! I actually already said elsewhere in the thread I don't think we need to worry about using these tools for such scenarios from a *licensing* perspective, only when the genAI is explicitly checked into the codebase