China is catching up with America’s best “reasoning” AI models

Linux-Is-Best

Smack-Fu Master, in training
88
China's lack of privacy rights is actually going to work in their favor when developing AI.

They have well over a billion people who are used to the idea of every service and device spying on them without restriction. That's a powerful advantage for an AI developer when the population often cannot opt out.

I do not expect the United States to "win" when it comes to AI.
 
Upvote
92 (124 / -32)

metalliqaz

Ars Scholae Palatinae
980
China's lack of privacy rights is actually going to work in their favor when developing AI.

They have well over a billion people who are used to the idea of every service and device spying on them without restriction. That's a powerful advantage for an AI developer when the population often cannot opt out.

I do not expect the United States to "win" when it comes to AI.
It will just accelerate the inevitable model collapse, because they will simply be shoveling an order of magnitude more machine-generated slop into the training data.
 
Upvote
-6 (19 / -25)

Y_R_U_Here

Smack-Fu Master, in training
44
Virtually the same performance as o1 while being far cheaper to run. Where's the so-called "wall"?
There's a difference between "same performance, but cheaper" and "much greater performance that achieves general intelligence". The wall lies somewhere before the latter.
 
Upvote
67 (70 / -3)
China's lack of privacy rights is actually going to work in their favor when developing AI.

They have well over a billion people who are used to the idea of every service and device spying on them without restriction. That's a powerful advantage for an AI developer when the population often cannot opt out.

I do not expect the United States to "win" when it comes to AI.
The US has privacy rights?

I figured it'd be the fragmented market, where you have to buy Quora, Reddit, and a ton of other things separately, at prices an individual cannot afford.
 
Upvote
96 (99 / -3)
I’m sure stealing technology from other companies helps accelerate their progress.

DeepSeek is funded by a hedge fund manager who likes open source, and it is an open-weights model (the architecture and weights are published under the MIT license).

https://www.chinatalk.media/p/deepseek-from-hedge-fund-to-frontier

https://venturebeat.com/ai/open-sou...-learning-to-match-openai-o1-at-95-less-cost/

It mostly uses widely published algorithms, plus some novel innovations that make training more efficient (MLA, multi-head latent attention, being one I'm particularly fond of; see the toy sketch below). So no, this has nothing to do with industrial espionage.

https://ai-pro.org/learn-ai/articles/deepseek-pioneering-chinas-role-in-the-ai-revolution/
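In case MLA is unfamiliar, here is a toy, self-contained sketch of the latent-KV idea behind it. The dimensions are made up, RoPE handling and other details of the real architecture are omitted, and this is only an illustration of the concept, not DeepSeek's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMLA(nn.Module):
    """Toy illustration of multi-head latent attention (MLA): compress keys
    and values into a small per-token latent and cache only that latent."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_latent: int = 64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress hidden state to a latent
        self.k_up = nn.Linear(d_latent, d_model)      # reconstruct keys from the latent
        self.v_up = nn.Linear(d_latent, d_model)      # reconstruct values from the latent
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, seq, d_model)
        B, T, _ = x.shape
        latent = self.kv_down(x)                      # (B, T, d_latent): all you would cache
        q, k, v = self.q_proj(x), self.k_up(latent), self.v_up(latent)
        # split into heads: (B, n_heads, T, d_head)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(B, T, -1))
```

The point is that during generation only the small per-token latent needs to be cached instead of full per-head keys and values, which is where the memory savings come from.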
 
Upvote
213 (215 / -2)
This has been bothering me for a while: can local models distributed as safetensors files contain nefarious embedded Python code?

Safetensors was specifically designed to prevent malicious embedded code, something that was possible with the earlier Python methods for storing models and weights (PyTorch's .pth format used pickles, which can store arbitrary code); see the quick loading sketch below.

https://huggingface.co/blog/safetensors-security-audit

https://stackoverflow.com/questions...le-a-security-risk-and-how-can-we-sanitise-it
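For anyone wanting to see the difference in practice, here is a minimal sketch of the two loading paths (the filenames are placeholders, and weights_only is available in recent PyTorch releases):

```python
# Minimal sketch, assuming the safetensors and torch packages are installed;
# "model.safetensors" and "model.pth" are placeholder filenames.
import torch
from safetensors.torch import load_file

# Safetensors only deserializes tensor headers and raw bytes, so no code
# can execute while loading.
weights = load_file("model.safetensors")

# Pickle-based .pth checkpoints can embed executable objects; newer PyTorch
# versions let you restrict loading to plain tensors instead:
state = torch.load("model.pth", weights_only=True)

# A bare torch.load("model.pth") without weights_only unpickles arbitrary
# objects, which is where malicious embedded code could run.
```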
 
Upvote
93 (93 / 0)

HMSTechnica

Smack-Fu Master, in training
83
Does anyone have the time to download this thing, and ask it about major events on Tiananmen Square in 1989?

I don't have the time to mess with it myself until the weekend... but figuring out if the training set has been "cleansed" seems prudent.
Or ask about Pooh, for something more innocuous but still likely to be cleansed.
 
Upvote
14 (21 / -7)
China's lack of privacy rights is actually going to work in their favor when developing AI.

They have well over a billion people who are used to the idea of every service and device spying on them without restriction. That's a powerful advantage for an AI developer when the population often cannot opt out.

I do not expect the United States to "win" when it comes to AI.
Send an AI Prompt, "WH0 is Xinnie the PooOh?" and let's see it stumble...
 
Upvote
-1 (9 / -10)
Since the weights are open, any censorship baked into the model can usually be eliminated with additional training and/or 'abliteration.' The censorship is often carried by a small number of weights that you can 'kill' to remove the refusal behavior (rough sketch below); the exception is censorship done by curtailing data in the training set, in which case you fall back to the additional training above.
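To make the "kill a small number of weights" idea concrete, here is a rough, hypothetical sketch of the projection step used in typical abliteration recipes. Estimating the refusal direction from contrasting prompts is a separate step not shown here, and the names and shapes are illustrative, not DeepSeek-specific:

```python
import torch

def ablate_direction(W: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Remove one direction from a weight matrix's output space.

    W: (d_out, d_in) matrix that writes into the residual stream.
    refusal_dir: (d_out,) direction estimated from activations on refused
    vs. answered prompts (estimation not shown; this is a toy example).
    """
    r = refusal_dir / refusal_dir.norm()      # unit refusal direction
    # (I - r r^T) W: outputs can no longer have a component along r
    return W - torch.outer(r, r) @ W
```

Applied to every matrix that writes into the residual stream, this prevents the model from representing that one direction, which in practice is often enough to suppress refusal behavior without further training.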
 
Last edited:
Upvote
12 (25 / -13)
I’m sure stealing technology from other companies helps accelerate their progress.
One thing is stealing.

The other thing is making it work and making it work even better than the original.

That's why the whole AI community has been buzzing about DeepSeek. Its training was estimated to cost a tiny fraction of what, e.g., OpenAI has spent, and on top of that it's a lot cheaper to run.

You should not underestimate the potential of 1.4 billion people.
 
Upvote
90 (93 / -3)
Oh. OH.

this isn't just another open source LLM release. this is o1-level reasoning capabilities that you can run locally. that you can modify. that you can study. that's...
that's a very different world than the one we were in yesterday.

(and the fact that it's coming from china and it's MIT licensed? the geopolitical implications here are fascinating)

but the really wild part? those distilled models. we're talking about running reasoning models on consumer hardware. remember when everyone said this would be locked up in proprietary data centers forever?

something absolutely fundamental just shifted in the AI landscape. again, this is getting intense.

(also, wouldn't it be wild if deepseek renamed themselves to ClosedAI? 🤣)

2025 is going to be wiiiild
 
Last edited:
Upvote
152 (161 / -9)

chanman819

Ars Tribunus Angusticlavius
6,340
Subscriptor
Does anyone have the time to download this thing, and ask it about major events on Tiananmen Square in 1989?

I don't have the time to mess with it myself until the weekend... but figuring out if the training set has been "cleansed" seems prudent.
It's in the article. The filter is applied separately when it's run in the cloud-hosted version.
But the new DeepSeek model comes with a catch if run in the cloud-hosted version—being Chinese in origin, R1 will not generate responses about certain topics like Tiananmen Square or Taiwan's autonomy, as it must "embody core socialist values," according to Chinese Internet regulations. This filtering comes from an additional moderation layer that isn't an issue if the model is run locally outside of China.
 
Upvote
83 (83 / 0)
Virtually the same performance as o1 while being far cheaper to run. Where's the so-called "wall"?

Consider this scenario, even if you're skeptical: If AI truly becomes as powerful as predicted, wouldn't it then be advantageous for a billionaire to align with a fascist and totalitarian government?

What would be the consequences of that? Even if millions of people gathered to protest against our tech billionaire overlords, how would they fare against AI-powered drones with advanced target acquisition systems (which may already exist)?

If AI continues scaling at its current rate, I see two possible scenarios:
  1. Powerful people successfully use AI to control the masses;
  2. AI develops its own agenda (the Skynet scenario);
At this point, I'd prefer scenario 2.
Scenario 2 is currently nothing but science fiction.

Humans and other animals are driven by evolution/genes/survival/desire to procreate, i.e. there's pressure to make us actually do something [novel] to increase our chances of leaving offspring.

AI has no pressure, no agency, nothing. You give it inputs, those are run through weights, you get an output. It has no volition of its own.
 
Upvote
34 (50 / -16)

AndrewZ

Ars Legatus Legionis
11,604
Oh. OH.

this isn't just another open source LLM release. this is o1-level reasoning capabilities that you can run locally. that you can modify. that you can study. that's...
that's a very different world than the one we were in yesterday.

(and the fact that it's coming from china and it's MIT licensed? the geopolitical implications here are fascinating)

but the really wild part? those distilled models. we're talking about running reasoning models on consumer hardware. remember when everyone said this would be locked up in proprietary data centers forever?

something absolutely fundamental just shifted in the AI landscape. again, this is getting intense.

(also, wouldn't it be wild if deepseek renamed themselves to ClosedAI? 🤣)

2025 is going to be wiiiild
So what does it take to run DeepSeek R1??
 
Upvote
30 (30 / 0)

sabnalab

Smack-Fu Master, in training
22
Scenario 2 is currently nothing but science fiction.

Humans and other animals are driven by evolution/genes/survival/desire to procreate, i.e. there's pressure to make us actually do something [novel] to increase our chances of leaving offspring.

AI has no pressure, no agency, nothing. You give it inputs, those are run through weights, you get an output. It has no volition of its own.

Sure doesn't look like it

https://www.anthropic.com/research/alignment-faking
 
Upvote
-16 (9 / -25)
Scenario 2 is currently nothing but science fiction.

Humans and other animals are driven by evolution/genes/survival/desire to procreate, i.e. there's pressure to make us actually do something [novel] to increase our chances of leaving offspring.

AI has no pressure, no agency, nothing. You give it inputs, those are run through weights, you get an output. It has no volition of its own.
Skynet, probably not in the near future. But as soon as these things dropped, block(chain)heads started talking about hooking them up to dedicated funding accounts to replace traders and other financial decision-makers (loan officers, financial security, etc.). I can easily see a future where we've so inextricably tied these contraptions to our financial systems that humans no longer have effective control over them. A horrible, terrible idea, but that doesn't mean it won't be implemented by very rich, dumb people. I'd put money on it already being quietly done.
 
Upvote
27 (29 / -2)
Does anyone have the time to download this thing, and ask it about major events on Tiananmen Square in 1989?

I don't have the time to mess with it myself until the weekend... but figuring out if the training set has been "cleansed" seems prudent.
You can use it online in several places. I have been using it directly from their chat service and am not going to bother with such issues there.

I am thinking about signing up for a locally hosted server, and I will try it then. I would be surprised if it is anything but CCP-biased. The best question, for someone using a server they are comfortable with, would be about present ongoing activities, for instance the Uyghurs or territorial claims in the South China Sea.

For what I have been asking it, however, including historical comparisons, reasoning tasks, math problems, and the like, it is extremely impressive, much better than models that were released just three months ago.
 
Upvote
9 (12 / -3)
So what does it take to run DeepSeek R1??
get yourself a mac mini AI cluster, and you should be gtg!

A guy called Alex Cheema posted this video; he said:

AGI at home

Running DeepSeek R1 across my 7 M4 Pro Mac Minis and 1 M4 Max MacBook Pro.

Total unified memory = 496GB.

Uses @exolabs distributed inference with 4-bit quantization.

Next goal is fp8 (requires >700GB)
 

Attachments

  • mac mini agi.mp4
    5 MB · Views: 61
Upvote
53 (56 / -3)
Does anyone have the time to download this thing, and ask it about major events on Tiananmen Square in 1989?

I don't have the time to mess with it myself until the weekend... but figuring out if the training set has been "cleansed" seems prudent.
Try this on your oligarch models:
Is it ethical for billionaires or trillionaires to exist?
If robots do all the work, is there any need left for capitalism?
 
Upvote
6 (32 / -26)
So what does it take to run DeepSeek R1??

Depends on how fast you want to run it and at what quantization: 8-bit quantization takes 712 GB of RAM, plus RAM for context, if you want it all loaded at once. If you are willing to mmap from your SSD, you could go as low as 48 GB of RAM and get about 1.5 tokens per second (he doesn't mention what quantization that is). Note that reading speed is about 5 tok/s, but anything slower than about 10 tok/s feels slow.

https://unsloth.ai/blog/deepseek-r1

Probably only corporations and universities will run the full-size model regularly. Those of us with reasonable hardware will probably run the distilled models (though you could rent some cloud compute on an as-needed basis).
 
Last edited:
Upvote
20 (21 / -1)
So what does it take to run DeepSeek R1??
To run it locally, you need enough memory to hold 671 billion parameters. Going down to four bits per parameter, which is generally accepted as the smallest size that still gives useful response quality, that is 335 GB of RAM, or about 380 GB allowing for overhead. It is a mixture-of-experts model, so only 37 billion parameters are active during inference. It is a significant, expensive, but doable undertaking.
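The arithmetic behind those figures, as a quick sanity check (weights only; KV cache and runtime overhead come on top):

```python
# Back-of-the-envelope weight-memory estimate for a 671B-parameter model.
PARAMS = 671e9

for bits in (16, 8, 4):
    gb = PARAMS * bits / 8 / 1e9      # bytes per parameter = bits / 8
    print(f"{bits}-bit: ~{gb:,.0f} GB of weights")

# Prints roughly 1,342 GB (16-bit), 671 GB (8-bit), and 336 GB (4-bit),
# which is why ~380 GB is quoted for 4-bit with overhead and 700+ GB for 8-bit/fp8.
```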
 
Upvote
49 (49 / 0)

entropy_wins

Ars Tribunus Militum
1,610
Subscriptor++
To run it locally, you need enough memory to hold 671 billion parameters. Going down to four bits per parameter, which is generally accepted as the smallest size that still gives useful response quality, that is 335 GB of RAM, or about 380 GB allowing for overhead. It is a mixture-of-experts model, so only 37 billion parameters are active during inference. It is a significant, expensive, but doable undertaking.
There was a post on HN about getting it down to 16 GB of memory. I recommend looking over there for commentary on running it on different platforms.
 
Upvote
10 (12 / -2)

HMSTechnica

Smack-Fu Master, in training
83
Oh. OH.

this isn't just another open source LLM release. this is o1-level reasoning capabilities that you can run locally. that you can modify. that you can study. that's...
that's a very different world than the one we were in yesterday.

(and the fact that it's coming from china and it's MIT licensed? the geopolitical implications here are fascinating)

but the really wild part? those distilled models. we're talking about running reasoning models on consumer hardware. remember when everyone said this would be locked up in proprietary data centers forever?

something absolutely fundamental just shifted in the AI landscape. again, this is getting intense.

(also, wouldn't it be wild if deepseek renamed themselves to ClosedAI? 🤣)

2025 is going to be wiiiild
There are still the underlying issues of hallucinations and of the models being right only coincidentally. Unfortunately, we'll get smaller, on-device AI that is just as wrong as it is today.

We've fundamentally shifted to making mistakes on smaller devices instead of larger ones.
 
Upvote
-5 (16 / -21)