With foundation models running out of fresh data to train on, companies are betting on synthetic data to train their models. When Meta released the latest iteration of its open-source model, Llama 3.1 405B, it also updated the model's licence to permit using its outputs to generate synthetic data, which can then be used to train smaller proprietary models. On paper, this sounds plausible.
But recent studies cast doubt on this approach. According to a research paper published in the journal Nature, "indiscriminate use of model-generated content in training causes irreversible defects in the resulting models".