China is catching up with America’s best “reasoning” AI models

Linux-Is-Best

Smack-Fu Master, in training
88
China's lack of privacy rights is actually going to work in their favor when developing AI.

They have well over a billion people who are used to the idea of every service and device spying on them without restriction. That's a powerful advantage for an AI developer when the population often cannot opt out.

I do not expect the United States to "win" when it comes to AI.
 
Upvote
92 (124 / -32)

metalliqaz

Ars Scholae Palatinae
980
China's lack of privacy rights is actually going to work in their favor when developing AI.

They have well over a billion people who are used to the idea of every service and device spying on them without restriction. That's a powerful advantage for an AI developer when the population often cannot opt out.

I do not expect the United States to "win" when it comes to AI.
It will just accelerate the inevitable model collapse, because they will simply be shoveling an order of magnitude more machine-generated slop into the training data.
 
Upvote
-6 (19 / -25)

Y_R_U_Here

Smack-Fu Master, in training
44
Virtually the same performance as o1 while being far cheaper to run. Where's the so-called "wall"?
There's a difference between "same performance, but cheaper" and "much greater performance that achieves general intelligence". The wall lies somewhere before the latter.
 
Upvote
67 (70 / -3)
China's lack of privacy rights is actually going to work in their favor when developing AI.

They have well over a billion people who are used to the idea of every service and device spying on them without restriction. That's a powerful advantage for an AI developer when the population often cannot opt out.

I do not expect the United States to "win" when it comes to AI.
The US has privacy rights?

I figured it'd be the fragmented market, where you have to buy Quora, Reddit, and a ton of other things separately, at prices an individual cannot afford.
 
Upvote
96 (99 / -3)
I’m sure stealing technology from other companies helps accelerate their progress.

DeepSeek is funded by a hedge fund manager who likes open source, and it is an open-weights model (the architecture and weights are published under the MIT license).

https://www.chinatalk.media/p/deepseek-from-hedge-fund-to-frontier

https://venturebeat.com/ai/open-sou...-learning-to-match-openai-o1-at-95-less-cost/

It mostly uses widely published algorithms, plus some novel innovations that make training more efficient (MLA, multi-head latent attention, being one I'm particularly fond of; see the toy sketch below). So no, this has nothing to do with industrial espionage.

https://ai-pro.org/learn-ai/articles/deepseek-pioneering-chinas-role-in-the-ai-revolution/
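In case MLA is unfamiliar, here is a toy, self-contained sketch of the latent-KV idea behind it. The dimensions are made up, RoPE handling and other details of the real architecture are omitted, and this is only an illustration of the concept, not DeepSeek's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMLA(nn.Module):
    """Toy illustration of multi-head latent attention (MLA): compress keys
    and values into a small per-token latent and cache only that latent."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_latent: int = 64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress hidden state to a latent
        self.k_up = nn.Linear(d_latent, d_model)      # reconstruct keys from the latent
        self.v_up = nn.Linear(d_latent, d_model)      # reconstruct values from the latent
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, seq, d_model)
        B, T, _ = x.shape
        latent = self.kv_down(x)                      # (B, T, d_latent): all you would cache
        q, k, v = self.q_proj(x), self.k_up(latent), self.v_up(latent)
        # split into heads: (B, n_heads, T, d_head)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(B, T, -1))
```

The point is that during generation only the small per-token latent needs to be cached instead of full per-head keys and values, which is where the memory savings come from.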
 
Upvote
213 (215 / -2)
This has been bothering me for a while: can local models distributed as safetensors files contain nefarious embedded Python code?

Safetensors was specifically designed to prevent malicious embedded code, something that was possible with the earlier Python methods for storing models and weights (PyTorch's .pth format used pickles, which can store arbitrary code); see the quick loading sketch below.

https://huggingface.co/blog/safetensors-security-audit

https://stackoverflow.com/questions...le-a-security-risk-and-how-can-we-sanitise-it
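For anyone wanting to see the difference in practice, here is a minimal sketch of the two loading paths (the filenames are placeholders, and weights_only is available in recent PyTorch releases):

```python
# Minimal sketch, assuming the safetensors and torch packages are installed;
# "model.safetensors" and "model.pth" are placeholder filenames.
import torch
from safetensors.torch import load_file

# Safetensors only deserializes tensor headers and raw bytes, so no code
# can execute while loading.
weights = load_file("model.safetensors")

# Pickle-based .pth checkpoints can embed executable objects; newer PyTorch
# versions let you restrict loading to plain tensors instead:
state = torch.load("model.pth", weights_only=True)

# A bare torch.load("model.pth") without weights_only unpickles arbitrary
# objects, which is where malicious embedded code could run.
```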
 
Upvote
93 (93 / 0)

HMSTechnica

Smack-Fu Master, in training
83
Does anyone have the time to download this thing, and ask it about major events on Tiananmen Square in 1989?

I don't have the time to mess with it myself until the weekend... but figuring out if the training set has been "cleansed" seems prudent.
Or ask about Pooh, for something more innocuous but still likely to be cleansed.
 
Upvote
14 (21 / -7)
China's lack of privacy rights is actually going to work in their favor when developing AI.

They have well over a billion people who are used to the idea of every service and device spying on them without restriction. That's a powerful advantage for an AI developer when the population often cannot opt out.

I do not expect the United States to "win" when it comes to AI.
Send an AI Prompt, "WH0 is Xinnie the PooOh?" and let's see it stumble...
 
Upvote
-1 (9 / -10)
Since the weights are open, any censorship baked into the model can usually be eliminated with additional training and/or 'abliteration.' The censorship is often carried by a small number of weights that you can 'kill' to remove the refusal behavior (rough sketch below); the exception is censorship done by curtailing data in the training set, in which case you fall back to the additional training above.
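To make the "kill a small number of weights" idea concrete, here is a rough, hypothetical sketch of the projection step used in typical abliteration recipes. Estimating the refusal direction from contrasting prompts is a separate step not shown here, and the names and shapes are illustrative, not DeepSeek-specific:

```python
import torch

def ablate_direction(W: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Remove one direction from a weight matrix's output space.

    W: (d_out, d_in) matrix that writes into the residual stream.
    refusal_dir: (d_out,) direction estimated from activations on refused
    vs. answered prompts (estimation not shown; this is a toy example).
    """
    r = refusal_dir / refusal_dir.norm()      # unit refusal direction
    # (I - r r^T) W: outputs can no longer have a component along r
    return W - torch.outer(r, r) @ W
```

Applied to every matrix that writes into the residual stream, this prevents the model from representing that one direction, which in practice is often enough to suppress refusal behavior without further training.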
 
Last edited:
Upvote
12 (25 / -13)
I’m sure stealing technology from other companies helps accelerate their progress.
One thing is stealing.

The other thing is making it work and making it work even better than the original.

That's why the whole AI community has been buzzing about DeepSeek. Its training was estimated to cost a tiny fraction of what, e.g., OpenAI has spent, and on top of that it's a lot cheaper to run.

You should not underestimate the potential of 1.4 billion people.
 
Upvote
90 (93 / -3)
Oh. OH.

this isn't just another open source LLM release. this is o1-level reasoning capabilities that you can run locally. that you can modify. that you can study. that's...
that's a very different world than the one we were in yesterday.

(and the fact that it's coming from china and it's MIT licensed? the geopolitical implications here are fascinating)

but the really wild part? those distilled models. we're talking about running reasoning models on consumer hardware. remember when everyone said this would be locked up in proprietary data centers forever?

something absolutely fundamental just shifted in the AI landscape. again, this is getting intense.

(also, wouldn't it be wild if deepseek renamed themselves to ClosedAI? 🤣)

2025 is going to be wiiiild
 
Last edited:
Upvote
152 (161 / -9)

chanman819

Ars Tribunus Angusticlavius
6,340
Subscriptor
Does anyone have the time to download this thing, and ask it about major events on Tiananmen Square in 1989?

I don't have the time to mess with it myself until the weekend... but figuring out if the training set has been "cleansed" seems prudent.
It's in the article. The filter is applied separately when it's run in the cloud-hosted version.
But the new DeepSeek model comes with a catch if run in the cloud-hosted version—being Chinese in origin, R1 will not generate responses about certain topics like Tiananmen Square or Taiwan's autonomy, as it must "embody core socialist values," according to Chinese Internet regulations. This filtering comes from an additional moderation layer that isn't an issue if the model is run locally outside of China.
 
Upvote
83 (83 / 0)
Virtually the same performance as o1 while being far cheaper to run. Where's the so-called "wall"?

Consider this scenario, even if you're skeptical: If AI truly becomes as powerful as predicted, wouldn't it then be advantageous for a billionaire to align with a fascist and totalitarian government?

What would be the consequences of that? Even if millions of people gathered to protest against our tech billionaire overlords, how would they fare against AI-powered drones with advanced target acquisition systems (which may already exist)?

If AI continues scaling at its current rate, I see two possible scenarios:
  1. Powerful people successfully use AI to control the masses;
  2. AI develops its own agenda (the Skynet scenario);
At this point, I'd prefer scenario 2.
Scenario 2 is currently nothing but science fiction.

Humans and other animals are driven by evolution/genes/survival/desire to procreate, i.e. there's pressure to make us actually do something [novel] to increase our chances of leaving offspring.

AI has no pressure, no agency, nothing. You give it inputs, those are run through weights, you get an output. It has no volition of its own.
 
Upvote
34 (50 / -16)

AndrewZ

Ars Legatus Legionis
11,604
Oh. OH.

this isn't just another open source LLM release. this is o1-level reasoning capabilities that you can run locally. that you can modify. that you can study. that's...
that's a very different world than the one we were in yesterday.

(and the fact that it's coming from china and it's MIT licensed? the geopolitical implications here are fascinating)

but the really wild part? those distilled models. we're talking about running reasoning models on consumer hardware. remember when everyone said this would be locked up in proprietary data centers forever?

something absolutely fundamental just shifted in the AI landscape. again, this is getting intense.

(also, wouldn't it be wild if deepseek renamed themselves to ClosedAI? 🤣)

2025 is going to be wiiiild
So what does it take to run DeepSeek R1??
 
Upvote
30 (30 / 0)

sabnalab

Smack-Fu Master, in training
22
Scenario 2 is currently nothing but science fiction.

Humans and other animals are driven by evolution/genes/survival/desire to procreate, i.e. there's pressure to make us actually do something [novel] to increase our chances of leaving offspring.

AI has no pressure, no agency, nothing. You give it inputs, those are run through weights, you get an output. It has no volition of its own.

Sure doesn't look like it

https://www.anthropic.com/research/alignment-faking
 
Upvote
-16 (9 / -25)
Scenario 2 is currently nothing but science fiction.

Humans and other animals are driven by evolution/genes/survival/desire to procreate, i.e. there's pressure to make us actually do something [novel] to increase our chances of leaving offspring.

AI has no pressure, no agency, nothing. You give it inputs, those are run through weights, you get an output. It has no volition of its own.
Skynet, probably not in the near future. But as soon as these things dropped, block(chain)heads started talking about hooking them up to dedicated funding accounts to replace traders and other financial decision-makers (loan officers, financial security, etc.). I can easily see a future where we've so inextricably tied these contraptions to our financial systems that humans no longer have effective control over them. A horrible, terrible idea, but that doesn't mean it won't be implemented by very rich, dumb people. I'd put money on it already being quietly done.
 
Upvote
27 (29 / -2)
Does anyone have the time to download this thing, and ask it about major events on Tiananmen Square in 1989?

I don't have the time to mess with it myself until the weekend... but figuring out if the training set has been "cleansed" seems prudent.
You can use it online in several places. I have been using it directly from their chat service and am not going to bother with such issues there.

I am thinking about signing up for a locally hosted server, and I will try it then. I would be surprised if it is anything but CCP-biased. The best question, for someone using a server they are comfortable with, would be about present ongoing activities, for instance the Uyghurs or territorial claims in the South China Sea.

For what I have been asking it, however, including historical comparisons, reasoning tasks, math problems, and the like, it is extremely impressive, much better than models that were released just three months ago.
 
Upvote
9 (12 / -3)
So what does it take to run DeepSeek R1??
get yourself a mac mini AI cluster, and you should be gtg!

A guy called Alex Cheema posted this video; he said:

AGI at home

Running DeepSeek R1 across my 7 M4 Pro Mac Minis and 1 M4 Max MacBook Pro.

Total unified memory = 496GB.

Uses @exolabs distributed inference with 4-bit quantization.

Next goal is fp8 (requires >700GB)
 

Attachments

  • mac mini agi.mp4
    5 MB · Views: 61
Upvote
53 (56 / -3)
Does anyone have the time to download this thing, and ask it about major events on Tiananmen Square in 1989?

I don't have the time to mess with it myself until the weekend... but figuring out if the training set has been "cleansed" seems prudent.
Try this on your oligarch models:
Is it ethical for billionaires or trillionaires to exist?
If robots do all the work, is there any need left for capitalism?
 
Upvote
6 (32 / -26)
So what does it take to run DeepSeek R1??

Depends on how fast you want to run it and at what quantization: 8-bit quantization takes 712 GB of RAM, plus RAM for context, if you want it all loaded at once. If you are willing to mmap from your SSD, you could go as low as 48 GB of RAM and get about 1.5 tokens per second (he doesn't mention what quantization that is). Note that reading speed is about 5 tok/s, but anything slower than about 10 tok/s feels slow.

https://unsloth.ai/blog/deepseek-r1

Probably only corporations and universities will run the full-size model regularly. Those of us with reasonable hardware will probably run the distilled models (though you could rent some cloud compute on an as-needed basis).
 
Last edited:
Upvote
20 (21 / -1)
So what does it take to run DeepSeek R1??
To run it locally, you need enough memory to hold 671 billion parameters. Going down to four bits per parameter, which is generally accepted as the smallest size that still gives useful response quality, that is 335 GB of RAM, or about 380 GB allowing for overhead. It is a mixture-of-experts model, so only 37 billion parameters are active during inference. It is a significant, expensive, but doable undertaking.
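The arithmetic behind those figures, as a quick sanity check (weights only; KV cache and runtime overhead come on top):

```python
# Back-of-the-envelope weight-memory estimate for a 671B-parameter model.
PARAMS = 671e9

for bits in (16, 8, 4):
    gb = PARAMS * bits / 8 / 1e9      # bytes per parameter = bits / 8
    print(f"{bits}-bit: ~{gb:,.0f} GB of weights")

# Prints roughly 1,342 GB (16-bit), 671 GB (8-bit), and 336 GB (4-bit),
# which is why ~380 GB is quoted for 4-bit with overhead and 700+ GB for 8-bit/fp8.
```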
 
Upvote
49 (49 / 0)

entropy_wins

Ars Tribunus Militum
1,610
Subscriptor++
To run it locally, you need enough memory to hold 671 billion parameters. Going down to four bits per parameter, which is generally accepted as the smallest size that still gives useful response quality, that is 335 GB of RAM, or about 380 GB allowing for overhead. It is a mixture-of-experts model, so only 37 billion parameters are active during inference. It is a significant, expensive, but doable undertaking.
There was a post on HN about getting it down to 16 GB of memory. I recommend looking over there for commentary on running it on different platforms.
 
Upvote
10 (12 / -2)

HMSTechnica

Smack-Fu Master, in training
83
Oh. OH.

this isn't just another open source LLM release. this is o1-level reasoning capabilities that you can run locally. that you can modify. that you can study. that's...
that's a very different world than the one we were in yesterday.

(and the fact that it's coming from china and it's MIT licensed? the geopolitical implications here are fascinating)

but the really wild part? those distilled models. we're talking about running reasoning models on consumer hardware. remember when everyone said this would be locked up in proprietary data centers forever?

something absolutely fundamental just shifted in the AI landscape. again, this is getting intense.

(also, wouldn't it be wild if deepseek renamed themselves to ClosedAI? 🤣)

2025 is going to be wiiiild
There are still the underlying issues of hallucinations and of the models being right only coincidentally. Unfortunately, we'll get smaller, on-device AI that is just as wrong as it is today.

We've fundamentally shifted to making mistakes on smaller devices instead of larger ones.
 
Upvote
-5 (16 / -21)