With foundation models running short of fresh data for training, companies are placing their bets on synthetic data. When Meta released Llama 3.1 405B, the latest iteration of its open-source model family, it also updated the model's licence to permit using its outputs as synthetic data for training other, smaller proprietary models. On paper, this sounds plausible.
But recent studies cast doubt on this approach. According to a research paper published in the science journal Nature, "indiscriminate use of model-generated content in training causes irreversible defects in the resulting models" — a degradation the paper's authors call model collapse.
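The intuition behind that finding can be seen even without a language model. The following is a minimal sketch (not from the paper) using a toy stand-in: a simple Gaussian "model" is repeatedly refit on samples drawn from the previous generation's fit. Small estimation errors compound across generations, and the distribution's spread drifts away from the real data — the function names and parameters here are illustrative assumptions, not anything from the Nature study.

```python
# Toy illustration of "model collapse": each generation fits a model
# (here, just a Gaussian's mean and standard deviation) to data sampled
# from the PREVIOUS generation's model, instead of from real data.
import random
import statistics


def fit_and_resample(data, n, rng):
    """Fit a Gaussian to `data`, then draw n fresh samples from that fit."""
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def simulate_collapse(generations=500, n=20, seed=0):
    rng = random.Random(seed)
    # Generation 0: "real" data from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    initial_std = statistics.stdev(data)
    # Each later generation trains only on the previous model's output.
    for _ in range(generations):
        data = fit_and_resample(data, n, rng)
    return initial_std, statistics.stdev(data)


init_std, final_std = simulate_collapse()
print(f"std of real data: {init_std:.3f}, after resampling: {final_std:.3f}")
```

Running this, the standard deviation of the final generation is far below that of the original data: the model has "forgotten" the tails of the real distribution, which is the same qualitative defect the paper reports for models trained indiscriminately on model-generated text.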
ET takes a look.