Training data can be used "regardless of whether it is for non-profit or commercial purposes, whether it is an act other than reproduction, or whether it is content obtained from illegal sites or otherwise."
I was skeptical of this, but it checks out: I easily got ChatGPT to print out the full text of The Tell-Tale Heart, with no errors at all in the various spots I checked for accuracy.
Granted, I chose it because it’s a very short public domain work - I was more skeptical of its technical ability to recall the exact text without errors than of the ability to trick it into violating copyright law.
I still suspect it’s much easier to (accidentally) trick it into writing a fanfiction of a copyrighted work that it claims is the original than it is to get it to produce the true original, though.
Your argument that it is useful as a copyright-infringing machine is that it can reproduce a public domain work? That’s… not the argument you think it is.
My message was pretty clear about which part of their claim I was skeptical about and what I was testing for. It’s not what you described here.