The analysis shows that indiscriminately training generative AI models on a mix of real and model-generated content, as tends to happen when training data is scraped from the Internet, can collapse the models' ability to generate diverse, high-quality output.
It’s obvious enough that this will happen if you cycle one model’s output through itself, but the authors tested several quite different model families (LLMs, variational autoencoders, and Gaussian mixture models) and found the same collapse in all of them. I think that’s a big finding.
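The simplest way to see the effect is with a toy model rather than an LLM. The sketch below (my own illustration, not the paper's experimental setup) repeatedly fits a single Gaussian to samples drawn from the previous generation's fit. Because each refit introduces estimation error, and the biased maximum-likelihood variance estimate systematically underestimates the true variance, the distribution's spread drifts toward zero over generations: the model "forgets" the tails first, which is the collapse pattern described above.

```python
import math
import random

def fit_gaussian(samples):
    """MLE fit of a 1D Gaussian: mean, and std from the biased
    (divide-by-N) variance estimate."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, math.sqrt(var)

def collapse_demo(generations=500, n_samples=50, seed=0):
    """Repeatedly sample from the current fit and refit to those
    samples alone, tracking the estimated std each generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0          # start from a standard normal
    stds = [sigma]
    for _ in range(generations):
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu, sigma = fit_gaussian(samples)
        stds.append(sigma)
    return stds

if __name__ == "__main__":
    stds = collapse_demo()
    print(f"std at generation 0:   {stds[0]:.4f}")
    print(f"std at generation 500: {stds[-1]:.6f}")
```

With only 50 samples per generation the shrinkage compounds quickly; the estimated std typically falls by several orders of magnitude after a few hundred rounds. Training on a retained slice of the original data alongside the generated samples slows this drift, which matches the paper's point that access to genuine human-produced data matters.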