ChatGPT Is a Blurry JPEG of the Web

There is so much to unpack in this piece, but above all it offers a really helpful metaphor for understanding large language models.

This excerpt, in particular, really hit me… hard. Using AI to generate content and publish it on the web would eventually lead to a huge drop in quality, much like making a photocopy of a photocopy:

Even if it is possible to restrict large-language models from engaging in fabrication, should we use them to generate Web content? This would make sense only if our goal is to repackage information that’s already available on the Web. Some companies exist to do just that—we usually call them content mills. Perhaps the blurriness of large-language models will be useful to them, as a way of avoiding copyright infringement. Generally speaking, though, I’d say that anything that’s good for content mills is not good for people searching for information. The rise of this type of repackaging is what makes it harder for us to find what we’re looking for online right now; the more that text generated by large-language models gets published on the Web, the more the Web becomes a blurrier version of itself.