That's really surprising! It kind of implies that LLMs are an incredibly effective lossy compression of the training inputs. I would not have thought that 3B weights would be enough to memorize texts ...
In recent years, numerous plaintiffs—including publishers of books, newspapers, computer code, and photographs—have sued AI companies for training models using copyrighted material. A key question in ...