“In a world where things are moving so fast . . . you could actually spend a lot of money, doing it the hard way, and then the rest of the field is right on your heels,” IBM’s Cox said. “So it is an interesting and tricky business landscape.”
"large language models will still be required for “high intelligence and high stakes tasks” where “businesses are willing to pay more for a high level of accuracy and reliability.”"
Ahahaha hahaha.
"For years we have said that we can do the work of people faster and cheaper, never mind some quality issues... now that there is someone doing it even faster and even cheaper, we think SURELY they'll pay for our slower and expensiver quality product."
"still waiting for the day the courts catch up and say ALL the AIs are illegal because they were made with countless copyrighted material, and their existence is one gigantic piece of theft. And then courts are pressured to ignore the law because China doesn't give a shit about any laws, since they seem to make theft a virtue. And then it becomes a public secret that copyrighted material isn't protected anymore, because AI.
edit: I'm referring to the CCP - the Chinese government, NOT the average Chinese person, who very much doesn't think theft is ok."
All evidence points towards all AI companies lifting their data from everywhere they can and without permission. Why pretend they are not?
"All evidence points towards all AI companies lifting their data from everywhere they can and without permission. Why pretend they are not?"
Let me google that for you. Look at the NYTimes lawsuit, for example. Or how literally EVERY content provider online has had problems for years with countless AI scrapers stealing everything from the site over and over and over and over.
"Wouldn't this magnify errors and hallucinations from the base model over time? Like a xerox of a xerox?"
Yes, though you could argue this is doomed to happen anyway, as models are trained against the open web and the open web gets more content generated by LLMs.
“We’re going to use [distillation] and put it in our products right away,” said Yann LeCun, Meta’s chief AI scientist. “That’s the whole idea of open source. You profit from everyone and everyone else’s progress as long as those processes are open.”
Wouldn’t this magnify errors and hallucinations from the base model over time? Like a xerox of a xerox?
"The business answer to this is quite simple and also limiting. All frontier models will be inaccessible and only the distilled models will be released. This raises the question: can you effectively distill a model from a distilled model? If so, then there is truly no moat."
Most likely, this depends on how much error there is at the fringes of the distilled model's expertise. The more error, the more likely a redistiller is to target too much breadth, incorporate more error, and lower performance even on core expertise. But the best possible result would be that a redistilled model can acquire, at most, the expertise given to the distilled model.
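To make the "error at the fringes" point concrete, here is a toy numpy sketch; everything in it is made up for illustration (a "model" is just a vector of labels, and each student generation reproduces its teacher except on a random fraction it fails to fit):

```python
# Toy simulation of chained distillation. Hypothetical setup, not any real
# training pipeline: each "model" is a label vector over N items, and each
# student copies its teacher's labels except on a random fraction ERR.
import numpy as np

rng = np.random.default_rng(0)
N, CLASSES, ERR, GENERATIONS = 100_000, 10, 0.03, 6

truth = rng.integers(0, CLASSES, size=N)  # ground-truth labels
labels = truth.copy()                     # assume a perfect frontier teacher

for gen in range(1, GENERATIONS + 1):
    student = labels.copy()
    flip = rng.random(N) < ERR            # items the student fails to fit
    student[flip] = rng.integers(0, CLASSES, size=int(flip.sum()))
    labels = student                      # this student teaches the next one
    print(f"generation {gen}: accuracy vs ground truth = {(labels == truth).mean():.3f}")
```

Each generation treats its teacher's mistakes as ground truth, so accuracy against the real labels decays with every re-distillation - the "xerox of a xerox" effect raised above.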
This makes it sound like there should be, in future, some index of distilled models and a way to select the most appropriate one for a given inquiry. Like, "this is a legal question, employ the Westlaw LLM."
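A minimal sketch of what that index could look like; the model names and the keyword matcher below are hypothetical placeholders (a real router would use a classifier or embedding lookup rather than keywords):

```python
# Hypothetical registry of specialist distilled models plus a naive router.
REGISTRY = {
    "legal":   "westlaw-distilled-7b",   # all model names are made up
    "medical": "medline-distilled-7b",
    "code":    "code-distilled-7b",
}
KEYWORDS = {
    "legal":   ("contract", "statute", "liability", "lawsuit"),
    "medical": ("diagnosis", "dosage", "symptom"),
    "code":    ("function", "compile", "stack trace"),
}

def route(query: str) -> str:
    """Return the name of the most appropriate distilled model for a query."""
    q = query.lower()
    for topic, words in KEYWORDS.items():
        if any(w in q for w in words):
            return REGISTRY[topic]
    return "generalist-distilled-7b"  # fallback model

print(route("Is this contract clause enforceable?"))  # -> westlaw-distilled-7b
```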
"The business answer to this is quite simple and also limiting. All frontier models will be inaccessible and only the distilled models will be released. This raises the question: can you effectively distill a model from a distilled model? If so, then there is truly no moat."
I see no reason you couldn't, but for it to work best it'd need to be a subset of a subset of what each model is good at – you could train a model to summarise e-mails, then maybe from that train one to identify spam.
"That presents a challenge to many of the business models of leading AI firms. Even if developers use distilled models from companies like OpenAI, they cost far less to run, are less expensive to create, and, therefore, generate less revenue. Model-makers like OpenAI often charge less for the use of distilled models as they require less computational load."
This isn't quite right: all else equal, if you can provide your product (a chatbot) for less money but the output quality is the same, you should be able to charge about the same for it. If the AI industry actually has a product worth paying for, being able to provide it with less computing resources is only a problem for Nvidia, Azure, and electricity providers, not OpenAI. Every LLM provider should be tripping over themselves to implement DeepSeek's techniques to save money immediately.
"So there's no moat AND these massively loss-making endeavors have even less medium-term ability to make any future profits on these loss-makers due to there being no moat and no first-mover advantage? So why are we pouring a $trillion in new capital expenditures over the next decade and boiling the oceans with all the new fossil-fuel electricity generation we (/China) are going to need for this? Collective hysteria."
They're all hoping that it'll be "The Next Big Thing" and they can make a lot of money.
"This isn't quite right: all else equal, if you can provide your product (a chatbot) for less money but the output quality is the same, you should be able to charge about the same for it. If the AI industry actually has a product worth paying for, being able to provide it with less computing resources is only a problem for Nvidia, Azure, and electricity providers, not OpenAI. Every LLM provider should be tripping over themselves to implement DeepSeek's techniques to save money immediately."
I think the initial goal was to be able to replace a $4000/mo spreadsheet-pusher employee with a $500/mo GPU timeshare.
The challenge for the AI industry is that it's becoming ever more clear how insanely commoditized it is: everyone and their dog seems to be able to make a GPT4-level model, and so with tons of competition, no lock-in, and no moat, the price you can command is rapidly going to fall to barely more than the cost to provide it. As a market should operate.
ETA: I guess if we model "The AI Industry" as the interests of a bunch of wannabe monopolists like A16Z or Sam Altman, then the quote makes more sense: presiding over a boring, low-margin, high-competition, commoditized SaaS is not the thrilling high-margin play they're aiming for. Nor does it justify their companies' valuations.
"I think the initial goal was to be able to replace a $4000/mo spreadsheet-pusher employee with a $500/mo GPU timeshare."
But even if they could do that, the industry experience since ChatGPT came out would look like:
But then it turns out the AI can't really replace the spreadsheet pusher without supervision, and they can't charge nearly as much as the employee costs either.
Oops.
"Traditionally, knowledge distillation is a process where a large, complex model (the 'teacher') transfers its learned knowledge to a smaller, more efficient model (the 'student')."
Focusing on this statement: assuming the "teacher" has been trained on reliable sources, I'm of a mind to think this is the best use of LLMs. Develop a deep, broad set of associations, home in on one specific topic and dive deep, augmenting and refining from more specialized--and probably in many cases proprietary--knowledge sources.
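For the mechanics of that teacher-to-student transfer, here is a minimal PyTorch sketch of the classic recipe (Hinton et al.'s soft-target distillation) - toy dimensions and hyperparameters, not any particular lab's setup:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend matching the teacher's softened outputs with ordinary cross-entropy."""
    # Soft targets: the student matches the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard T^2 scaling keeps gradient magnitudes comparable
    # Hard targets: standard cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a batch of 4 examples with 10 classes and random logits.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()  # gradients flow to the student only; the teacher is frozen
print(loss.item())
```

The softened distribution is what carries the teacher's relative confidence across wrong answers, which hard labels alone throw away.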
"the price you can command is rapidly going to fall to barely more than the cost to provide it."
Except the price is already far below the cost to provide the services. Every LLM provider is losing money hand over fist, and that's even with the hyperscalers giving them huge discounts. Even ChatGPT Pro loses money.
"This article appears very poorly researched, and seems to completely misunderstand what made DeepSeek's V3 and R1 special and cheap in the first place.
It was not distillation; that has been used for years and was already commonplace. What made DeepSeek's models special is that they designed a very efficient architecture, trained it in FP8 precision (half the bits of the industry-standard BF16), and wrote a lot of custom software to push as much performance out of their limited hardware as they possibly could.
On that note, DeepSeek actually had a special event last week where they open sourced a core part of their training and inference infrastructure every day. I'm kind of surprised it received no coverage on Ars, as it has been a pretty amazing thing. Many of the projects they open sourced have already been used to speed up other open source inference engines. They even open sourced a custom distributed file system designed specifically for loading datasets during training."
And that's actual open source, as opposed to the open-weight models from Meta and others being routinely called "open source".
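On the FP8 point above: a rough illustration of the precision trade-off, assuming a recent PyTorch (2.1 or later, which ships a float8 dtype). This only shows a cast round-trip; a real FP8 training recipe like DeepSeek's additionally needs scaling factors, higher-precision accumulation, and custom kernels:

```python
# Compare worst-case round-trip error of BF16 vs FP8 (e4m3) casts.
# Assumes torch >= 2.1 for torch.float8_e4m3fn; casts work on CPU even
# though arithmetic on float8 tensors is restricted.
import torch

x = torch.randn(1000, dtype=torch.float32)

for dtype in (torch.bfloat16, torch.float8_e4m3fn):
    roundtrip = x.to(dtype).to(torch.float32)   # cast down, then back up
    err = (x - roundtrip).abs().max().item()    # worst-case element error
    print(f"{str(dtype):25s} max round-trip error: {err:.5f}")
```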
"Cool, so after years and billions of dollars, AI firms have invented school."
Teacher here... the moment I read that an AI performed better if you ask it to think step by step, I was sure my future was going to be bright.
"still waiting for the day the courts catch up and say ALL the AIs are illegal because they were made with countless copyrighted material,"
Yeah… you're gonna want to stop waiting for this and get on with your life.
"Instead of the student model simply imitating the teacher's final answers, it learns to emulate the intermediate processes—the way the teacher analyzes, interprets, and derives conclusions from the data."
Interesting, it's like college vs. high school.
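A minimal sketch of how that kind of reasoning distillation is commonly set up: ask the teacher to show its work, then fine-tune the student on the entire trace rather than just the final answer. The query_teacher stub and prompt format below are hypothetical placeholders, not any vendor's API:

```python
# Build supervised fine-tuning examples from a teacher's step-by-step traces.
def query_teacher(prompt: str) -> str:
    """Stub: a real implementation would call a large teacher model here."""
    return ("Step 1: 17 * 3 = 51. Step 2: 51 + 4 = 55. Answer: 55"
            if "17" in prompt else "Step 1: ... Answer: ...")

def make_training_example(question: str) -> dict:
    # Keep the whole rationale as the target so the student learns the
    # intermediate steps, not just the final answer.
    rationale = query_teacher(f"{question}\nThink step by step.")
    return {"prompt": question, "completion": rationale}

dataset = [make_training_example(q) for q in ["What is 17 * 3 + 4?"]]
print(dataset[0])
# A student model would then be fine-tuned on `dataset` with ordinary
# supervised learning over the full completion text.
```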