I was trying to find out whether any legal product can bridge that gap, other than buying print books in bulk, scanning them, and destroying them. From the responses here, it sounds like the answer is a vehement "no".
Wasn't asking for advice on copyright, but since we're here, your statement is slightly too strict, at least with respect to US copyright law. The copyright holder has sole distribution authority over the first sale of the work in the United States, but after that the first-sale doctrine allows that copy to be distributed by anyone. It is limited to the US, though, as far as I know. This is what allowed Anthropic to train on printed books, which they then destroyed: they were able to purchase them in bulk because of the first-sale doctrine. Publishers and authors would likely try to destroy the first-sale doctrine if they could, as evidenced by what's happened in the world of digital books.
Do you have the same legal rights to something that you've borrowed as you do with something you've purchased, though?
Would it be legal for me to borrow a book from the library, then scan and OCR every page and create an EPUB file of the result? Even if I didn't distribute it, that sounds questionable to me. Whereas if I had purchased the book and done the same, I believe that might be ok (format shifting for personal use).
Back when VHS and video rental was a thing, my parents would routinely copy rented VHS tapes if we liked the movie (camcorder connected to VCR with composite video and audio cables, worked great if there wasn't Macrovision copy protection on the source). I don't think they were under any illusions that what they were doing was ok.
Well, if I copied it word for word, maybe, but if I read it and "trained it" into my brain, then it's clearly not illegal.
So the grey area here is: if I "trained" an LLM in a similar way and did not copy it word for word, is that legal? Because fundamentally speaking, it's literally the same action.
>Businesses routinely break the law if they believe the benefits in doing so will outweigh the consequences.
I'm saying there's collective incentive among businesses to restrict the LLM from producing illegal output. That is aligned and ultra clear. THAT was my point.
But if LLMs produce illegal output as a side effect and it can't be controlled, then your point comes into play, because now they have to weigh the costs and benefits, as they don't have a choice in the matter. But that wasn't what I was getting at. That's your new point, which you introduced here.
In short, it is clear that corporations do not want LLMs to produce illegal content and are actively trying to restrict it.
Right, but in the weed analogy, the scale is used as a proxy to assume intent. When someone is caught with those 400 joints, the prosecution doesn't have to prove intent, because the law has that baked in already.
You could say the same of LLM training: that doing so at scale implies the intent to commit copyright infringement, whereas reading a single book does not. (I don't believe our current law would see it this way, but it wouldn't be inconsistent if it did, or if new law were written to make it so.)
Unfortunately a settlement doesn't really show you anything definitive about the legality or illegality of something.
It only shows you that the defendant thought it would be better for them to pay up rather than continue to be dragged through court, and that the plaintiff preferred some amount of certain money now over some other amount of uncertain money later, or never.
We cannot say with any amount of confidence how the court would have ruled on the legality, had things been allowed to play out without a settlement.
I've been using vim for over 20 years as my primary editor. I'm faster and more comfortable in it than I am in any other editor, but I still feel like a vim noob.
I still have to look up how to do things I rarely do (like insert the contents of another file at the cursor position). And I don't really use many (if any) of vim's intermediate features, let alone advanced ones.
I've tried various ways to get more fluent, but nothing really stuck or kept my interest. This has always annoyed me a bit...
I've been using vim for 20 years as well, for everything other than Java code. I type my .vimrc by hand on each new machine to set a half dozen options.
Of the intermediate features, I use tabs and, more recently, split windows.
My favorite 'advanced' feature is visual block selection and replacement over multiple lines - super convenient.
I don't come here to read lengthy quotes from random authors. I come here for discussion and interesting viewpoints from our community. Posting a lengthy quote is low-effort; more valuable would be to write up your take on it in your own words, and, if you want, link to the original source material that you would have otherwise quoted.
If it isn't distributed in a manner to your liking, the only legal thing you can do is not have a copy of it at all.