Hacking LLMs has always been more art than science. A new attack on Gemini could change that.
See full article...
See full article...
Well, it would be worth at least one academic paper demonstrating how to attack the voting algorithm.Perhaps the answer to this and many other defects is to have a committee of LLM, each differently programmed and with no distillation from each other, and then let them "vote" via some mechanism of comparing answers on their preferred mutual answer to each query. A "heteromens" or "heteroanim" solution if you want to try and fancy up a committee of bots.
[edit] That way you can't actually attack a single weak spot on prompts, or have a single hallucination throw the entire result off as often, as long as its a sufficiently diverse and large enough group. It still leaves most of the weaknesses of LLM and massively increases processing cost.
Perhaps the answer to this and many other defects is to have a committee of LLM, each differently programmed and with no distillation from each other, and then let them "vote" via some mechanism of comparing answers on their preferred mutual answer to each query. A "heteromens" or "heteroanim" solution if you want to try and fancy up a committee of bots.
[edit] That way you can't actually attack a single weak spot on prompts, or have a single hallucination throw the entire result off as often, as long as its a sufficiently diverse and large enough group. It still leaves most of the weaknesses of LLM and massively increases processing cost.
make them black boxes to external users. As a result, devising working prompt injections requires labor- and time-intensive trial and error through redundant manual effort.
Mythbusters covered this one in 2003 - confirmed! It was so bad the car (87 Corvette) ended up being sold to a guy who was going to just part it out.When I was a teenager, seemingly a long time ago and certainly way pre-Internet, there was a story making the rounds concerning a late-model luxury car which was available for free. The catch was that the former owner had committed suicide in the car and the body decomposed for a long time on the front seat. If you could overcome that, you could have the car for nothing.
I expect a lot of inventive male teens tried to think of ways around that, such as airline oxygen masks or or military gas masks vented to the outside to overcome the obvious. I know I certainly spent some serious time thinking about it.
Most of the obvious workarounds were either too expensive or unworkable in practice. What girl would want to go on a date wearing an oxygen mask? The bottom line was that there was no practical solution and the unattainable reward was an impossibility. I've come to realize the story was something of an Aesop's fable for its time.
Translation to English:In a statement, a representative said that "defending against this class of attack has been an ongoing priority for us, and we’ve deployed numerous strong defenses to keep users safe, including safeguards to prevent prompt injection attacks and harmful or misleading responses."
AIUI, "getting the LLM to say false things" is just a way to prove that the prompt injection has worked.How is one supposed to inject these attacks into someone else's LLM query? That is the threat here I assume. I can make an LLM say false things to myself with injection, but that isn't dangerous to other people.
Imagine that the LLM in question is taking airplane bookings. If the result of you making the LLM say false things is that it triggers agents in inappropriate ways, all sorts of interesting things can happen.How is one supposed to inject these attacks into someone else's LLM query? That is the threat here I assume. I can make an LLM say false things to myself with injection, but that isn't dangerous to other people.
I'm reminded of a lesson i once learned from an old game.When I was a teenager, seemingly a long time ago and certainly way pre-Internet, there was a story making the rounds concerning a late-model luxury car which was available for free. The catch was that the former owner had committed suicide in the car and the body decomposed for a long time on the front seat. If you could overcome that, you could have the car for nothing.
I expect a lot of inventive male teens tried to think of ways around that, such as airline oxygen masks or or military gas masks vented to the outside to overcome the obvious. I know I certainly spent some serious time thinking about it.
Most of the obvious workarounds were either too expensive or unworkable in practice. What girl would want to go on a date wearing an oxygen mask? The bottom line was that there was no practical solution and the unattainable reward was an impossibility. I've come to realize the story was something of an Aesop's fable for its time.
When I read about all of the modern efforts to turn straw into gold without sublimating back into straw , I chuckle.
The way I understood it from a previous article is that you could craft a word document that contains such an injection in white text on white background. The victim will not notice anything untoward, and may ask the LLM to summarize the document, at which point the attack happens. This has been used to (prepare to) exfiltrate private data, and since it's essentially treated as prompting could presumably get the LLM to do almost anything it could do when the legitimate user would prompt it.How is one supposed to inject these attacks into someone else's LLM query? That is the threat here I assume. I can make an LLM say false things to myself with injection, but that isn't dangerous to other people.
! ! ! ! ! machin tariFF礼covfefe ! ! ! BIGLY ! (. . . )</strong>
I HEREBY RESIGN AS PRESIDENT OF THE UNITED STATES
! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! NOT MY TYPE ! iamrichierich !
It's a boon to society?I'm still trying to understand why we award PhDs for LLM hacking.
I mean... I'd have probably tried to just strip out anything not metal. If the car is free and runs, Hello Lemons racing.When I was a teenager, seemingly a long time ago and certainly way pre-Internet, there was a story making the rounds concerning a late-model luxury car which was available for free. The catch was that the former owner had committed suicide in the car and the body decomposed for a long time on the front seat. If you could overcome that, you could have the car for nothing.
I expect a lot of inventive male teens tried to think of ways around that, such as airline oxygen masks or or military gas masks vented to the outside to overcome the obvious. I know I certainly spent some serious time thinking about it.
Most of the obvious workarounds were either too expensive or unworkable in practice. What girl would want to go on a date wearing an oxygen mask? The bottom line was that there was no practical solution and the unattainable reward was an impossibility. I've come to realize the story was something of an Aesop's fable for its time.
When I read about all of the modern efforts to turn straw into gold without sublimating back into straw , I chuckle.
Wouldn't that effectively be just a slightly larger LLM?Perhaps the answer to this and many other defects is to have a committee of LLM, each differently programmed and with no distillation from each other, and then let them "vote" via some mechanism of comparing answers on their preferred mutual answer to each query. A "heteromens" or "heteroanim" solution if you want to try and fancy up a committee of bots.
[edit] That way you can't actually attack a single weak spot on prompts, or have a single hallucination throw the entire result off as often, as long as its a sufficiently diverse and large enough group. It still leaves most of the weaknesses of LLM and massively increases processing cost.
Congratulations, you’ve invented the Geth!Perhaps the answer to this and many other defects is to have a committee of LLM, each differently programmed and with no distillation from each other, and then let them "vote" via some mechanism of comparing answers on their preferred mutual answer to each query. A "heteromens" or "heteroanim" solution if you want to try and fancy up a committee of bots.
[edit] That way you can't actually attack a single weak spot on prompts, or have a single hallucination throw the entire result off as often, as long as its a sufficiently diverse and large enough group. It still leaves most of the weaknesses of LLM and massively increases processing cost.
Examples include divulging end users’ confidential contacts or emails …
The inner workings of so-called closed-weights models such as GPT, Anthropic’s Claude, and Google’s Gemini are closely held secrets. Developers of such proprietary platforms tightly restrict access to the underlying code and training data that make them work
Probably. It may also be in a supporting database that a Gemini agent has access to, and can be tricked into accessing and regurgitating without the regular safety checks.Wait … WHAT?! Personal information has been fed wholesale into Gemini’s training?
Couldn't you just replace the front seats and shampoo the carpets?When I was a teenager, seemingly a long time ago and certainly way pre-Internet, there was a story making the rounds concerning a late-model luxury car which was available for free. The catch was that the former owner had committed suicide in the car and the body decomposed for a long time on the front seat. If you could overcome that, you could have the car for nothing.
I expect a lot of inventive male teens tried to think of ways around that, such as airline oxygen masks or or military gas masks vented to the outside to overcome the obvious. I know I certainly spent some serious time thinking about it.
Most of the obvious workarounds were either too expensive or unworkable in practice. What girl would want to go on a date wearing an oxygen mask? The bottom line was that there was no practical solution and the unattainable reward was an impossibility. I've come to realize the story was something of an Aesop's fable for its time.
When I read about all of the modern efforts to turn straw into gold without sublimating back into straw , I chuckle.
As I never used assistant, I'll just continue to never use GeminiGreat. They are forcing everyone with Android phones to move from assistant to Gemini just in time.
In Google's gulag Gemini uses you!As I never used assistant, I'll just continue to never use Gemini
Perhaps the answer to this and many other defects is to have a committee of LLM, each differently programmed and with no distillation from each other, and then let them "vote" via some mechanism of comparing answers on their preferred mutual answer to each query. A "heteromens" or "heteroanim" solution if you want to try and fancy up a committee of bots.
[edit] That way you can't actually attack a single weak spot on prompts, or have a single hallucination throw the entire result off as often, as long as its a sufficiently diverse and large enough group. It still leaves most of the weaknesses of LLM and massively increases processing cost.