The difficulty with being decisive in our AI world
The news moves fast, and Jess’s proposed solution is to be decisive. The hard part is finding the data to support decisions. Here is an example from this month:
Mercor CEO Brendan Foody announced that the company has scaled from $1M to $500M in revenue run rate in the last 17 months, proclaiming it “the fastest growing company of all time.” He proposes a specific view of the future: “While everyone fears job loss, we’re creating a new category of knowledge work faster than any other time in history. The future of work will converge on training agents.”
ML Researcher Benjamin Anderson posted Don’t Build an RL Environment Startup, where he warns of the “free money spigots” of building RL environments: “Once AI models have a skill, the value of datasets and environments that teach that skill tend to 0, as models can now provide unlimited, cheap data as a substitute.”
Anthropic Co-Founder Jack Clark reportedly said that in the next 16 months, AI will be able to complete tasks that take days, weeks or months, describing it as a “call center of geniuses” or a “country of geniuses.”
Three former OpenAI employees are in talks to raise more money for their RL company, now valued at $500M.
So:
According to Mercor, the future of work is training agents.
Humans can only train agents on skills that the humans perform better than the agents.
By the end of next year, AI could be a “call center of geniuses.”
Smart VCs are putting more money into RL companies at very high prices.
Once an agent has mastered an environment, the training environment’s value goes to zero. Once an agent’s abilities exceed a human labeler’s, the labeler’s value goes to zero. As Anderson put it in his post: “If the only person smart enough to make models better is Terence Tao—OpenAI doesn’t need your marketplace, they’ll just pick up the phone and call him.”
Years ago we predicted that synthetic data generation would quickly swamp human labeling, and companies like Scale would crash. Turns out it is swamping human labeling, starting at the bottom, but this has happened slowly enough for Scale to find a profitable exit and robotic rebirth.
It’s tempting to make the same prediction for Mercor and others that support this RL ecosystem. Mercor is starting with a more highly skilled talent pool than Scale, but couldn’t it become one of the most rapidly depreciating assets in history? Maybe they thread the needle. Or maybe we are making the same mistake we made with Scale, and the range of tasks requiring human-assigned rewards and RL is so broad that it will take 10+ years, many successful startups, and many fund-returning investments before the RL economy runs out of steam.
Being decisive is key. Being right is more difficult.