Data is intensely valuable. That's for sure. However, this article does not do a sufficient job of distinguishing between high-quality data and data annotation. How much of the Llama 3 data is annotated? (Not a public statistic.) The trend we observe is that the technology is moving significantly toward self-supervision with minimal intervention, suggesting that the annotation market is declining even though the overall data market is growing. This is the real message.
These thoughts are captured in my "Annotation is Dead" blog post from earlier this year: https://medium.com/@jasoncorso/annotation-is-dead-1e37259f1714