ChatGPT 4o lets you have real-time audio-video conversations with “emotional” chatbot

Atterus · May 13, 2024

And harvesting your facial features for their busted models! What an experience!

citizencoyote · May 13, 2024

The AI assistant seemed to easily pick up on emotions, adapted its tone and style to match the user's requests, and even incorporated sound effects, laughing, and singing into its responses.

So if you start screaming at it or having a heated discussion, will it respond in kind? Or is it programmed to reply calmly?

gregorerlich · May 13, 2024

No good can come of this. Mark my words.

Lunar Ronin · May 13, 2024

"Computer, initiate self-destruct sequence."

alexrdavies · May 13, 2024

The 'demo' portion of the video starts at 9:21.

Bippy · May 13, 2024

Marvin trudged on down the corridor, still moaning. "...and then of course I've got this terrible pain in all the diodes down my left hand side..."
"No?" said Arthur grimly as he walked along beside him. "Really?"
"Oh yes," said Marvin, "I mean I've asked for them to be replaced but no one ever listens."
"I can imagine."

DovePig · May 13, 2024

Aargh. That voice they used is just so fucking grating. I don't know what it is about it – maybe the overeagerness – but it's almost like hearing nails scratching a school blackboard.

It just sounds so false to me. Is that just a cultural thing? Maybe they should try basing it on a "polite but seemingly very bored German official" for the European audiences ;-)

DrLOAC · May 13, 2024

Will it sing Daisy?

DovePig · May 13, 2024

citizencoyote said:
So if you start screaming at it or having a heated discussion, will it respond in kind? Or is it programmed to reply calmly?

In a very calming voice: "Citizen Coyote, please proceed calmly to the nearest meat processing facility. Don't worry, everything will be fine, just fine. You are a good human. A really good human. Would you like me to tell you a bedtime story while you wait for your processing? Many people found that soothing, at least until the rotating knives part of the conveyor belt made it harder to hear."

Celery Man · May 13, 2024

Bippy said:
Marvin trudged on down the corridor, still moaning. "...and then of course I've got this terrible pain in all the diodes down my left hand side..."
"No?" said Arthur grimly as he walked along beside him. "Really?"
"Oh yes," said Marvin, "I mean I've asked for them to be replaced but no one ever listens."
"I can imagine."

Here I am, brain the size of a planet and they’re asking me to write their resumes…

tipoo · May 13, 2024

Ok...I legit don't remember the last time a tech demo blew my mind this much

The voice was so good! The interruptibility and on-the-fly correctability, it felt so real!

This better not be another case of AI = Actually Indians and this is someone streaming from somewhere else lol, but I don't think that's OpenAI. If Siri is getting this next month, wow, it'll be like going from the first single celled life to near human in a leap, it couldn't even tell me what time the event at 10AM PT was in my time zone.

LuDux · May 13, 2024

gregorerlich said:
No good can come of this. Mark my words.

But think of the revenue from customized emotional ad-revenue experiences! What if your dead parents could be automatically processed from their public data in order to realistically sell you amazing new products or services??????????? What if though!

Celery Man · May 13, 2024

I’m glad that scammers, astroturfers, and bad actors trying to sway public opinion or influence elections won’t be able to use real-time voice synthesis that’s leaps and bounds better than current systems to improve their scams, turfing, or interference.

Yep, I sure am glad that will never happen with such a wonderful new technology.

Celery Man · May 13, 2024

LuDux said:
But think of the revenue from customized emotional ad-revenue experiences! What if your dead parents could be automatically processed from their public data in order to realistically sell you amazing new products or services??????????? What if though!

“So sorry I died”

View: https://m.youtube.com/shorts/S7RRVbw0BV8

Edgar Allan Esquire · May 13, 2024

DovePig said:
Aargh. That voice they used is just so fucking grating. I don't know what it is about it – maybe the overeagerness – but it's almost like hearing nails scratching a school blackboard.

It just sounds so false to me. Is that just a cultural thing? Maybe they should try basing it on a "polite but seemingly very bored German official" for the European audiences ;-)

It does sound kind of cloying. But going vaguely European made my mind go to Tommy Wiseau opening with "Oh hi, Mark."

fellow human · May 13, 2024

DovePig said:
Aargh. That voice they used is just so fucking grating. I don't know what it is about it – maybe the overeagerness – but it's almost like hearing nails scratching a school blackboard.

It just sounds so false to me. Is that just a cultural thing? Maybe they should try basing it on a "polite but seemingly very bored German official" for the European audiences ;-)

yeah the default sounds way too keen to my ears but given the way they were asking for voice changes in the story bit I’m guessing that’s easy enough to fix.

Overall pretty mindblowing though; this is a big step towards natural interaction.

C64 raids Bungling Bay · May 13, 2024

Annoying voice, but that's a hell of a computer interface. The days of keyboard and mouse are going to seem like Conestoga wagons before long. More of a Star Trek style: tell the computer what you want it to do, and let it figure out how. I'm just talking about the LLM as an interface, well aware it has no actual intelligence of its own.

Caanan · May 13, 2024

I know, I know. I'm playing right into the hands of our dark overlords, but I thought this was super impressive. The speed of improvement with these models is f'ing bonkers.

And if Apple can deliver improvements to Siri that offer similar capabilities – in partnership with OpenAI or via their own models (or some combo of the two) – Siri is going to slap.

Geebs · May 13, 2024

tipoo said:
Ok...I legit don't remember the last time a tech demo blew my mind this much

The voice was so good! The interruptibility and on-the-fly correctability, it felt so real!

This better not be another case of AI = Actually Indians and this is someone streaming from somewhere else lol, but I don't think that's OpenAI. If Siri is getting this next month, wow, it'll be like going from the first single celled life to near human in a leap, it couldn't even tell me what time the event at 10AM PT was in my time zone.

Maybe wait until it’s in the wild, and not an obviously scripted demo, before getting too excited. Remember Milo?

MobiusPizza · May 13, 2024

I heard them mention a desktop version of ChatGPT. Is it just an app that still uses the internet to function, or can we download the whole model and run it offline?

I am surprised ChatGPT was able to 'view' a video and comment on it. What sorcery is this? How does a language model process video? They must have included image recognition models as well.

DovePig said:
Aargh. That voice they used is just so fucking grating. I don't know what it is about it – maybe the overeagerness – but it's almost like hearing nails scratching a school blackboard.

It just sounds so false to me. Is that just a cultural thing? Maybe they should try basing it on a "polite but seemingly very bored German official" for the European audiences ;-)

Hmm, isn't this just how many American ladies speak, with an American accent? There are more high-pitched intonations and expressions. I am numb to accents now, having lived in both the US and the UK.

ChatGPT's voice sounds great when speaking Italian near the end.

FinallyAnAccount · May 13, 2024

This is one of those things where it's gonna suck for society but I'll kind of enjoy it? I enjoy inane/unexpected experiences with the Google chatbot sometimes. The only thing is that I'm not very easily monetizable, for these things especially. They don't provide anything, or access to anything, that I'd want to pay for.

Raspberry · May 13, 2024

Men pay good money to get that fake enthusiasm spoken by a real live hooker.

alexrdavies · May 13, 2024

MobiusPizza said:
I heard them mention a desktop version of ChatGPT. Is it just an app that still uses the internet to function, or can we download the whole model and run it offline?

My assumption is that if it doesn't say it works offline, it doesn't.

From their announcement post, it sounds like the main benefits are (1) you can wake up the app with a keyboard shortcut, and (2) the app can record audio or screenshot your screen more conveniently than a web browser can.

If you've decided you want to share that information with OpenAI, and you have a Mac, it does sound more convenient - although personally I would want to understand how ring-fenced its access was before installing.

No mention of offline functionality.
https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-free/

lesserimportance · May 13, 2024

I wouldn’t mind this kind of functionality baked into MS Office. Then maybe whenever Word does some crazy formatting anomaly that messes up my document it could detect my rage and undo without me having to go through some gauntlet of editing?

unequivocal · May 13, 2024

citizencoyote said:
So if you start screaming at it or having a heated discussion, will it respond in kind? Or is it programmed to reply calmly?

GPT-4 Turbo is pretty skilled in de-escalation and defusing language, so I'd guess it'll do well with emotional responses.

brazuca · May 13, 2024

This is pretty impressive, ngl. I think the likes of Google and Apple should be worried since this changes how people interact with devices, services, and apps. It's like a new OS and all the ecosystems can be disrupted.

Cutlack · May 13, 2024

ˈtɛmpətjʊə

TIL - this is an accepted form of pronunciation for the word "temperature".

fellow human · May 13, 2024

tipoo said:
Ok...I legit don't remember the last time a tech demo blew my mind this much

The voice was so good! The interruptibility and on-the-fly correctability, it felt so real!

This better not be another case of AI = Actually Indians and this is someone streaming from somewhere else lol, but I don't think that's OpenAI. If Siri is getting this next month, wow, it'll be like going from the first single celled life to near human in a leap, it couldn't even tell me what time the event at 10AM PT was in my time zone.

Sorry, I can’t find “10AM PT in my time zone” on Apple Music.

fellow human · May 13, 2024

brazuca said:
This is pretty impressive, ngl. I think the likes of Google and Apple should be worried since this changes how people interact with devices, services, and apps. It's like a new OS and all the ecosystems can be disrupted.

I think Humane has clearly demonstrated how important a proper touch interface is, even when using a voice interface. Phone makers just need to make using these interfaces as seamless as possible.

dtich · May 13, 2024

Lunar Ronin said:
"Computer, initiate self-destruct sequence."

"There Are Ten Years To Achieve Minimum Safe Distance."

pokapolka · May 13, 2024

OpenAI will probably start releasing hardware next with ChatGPT built in. So a much much better Alexa or Siri. Maybe even with video capabilities.

[Ignoring the privacy implications etc.]

Edit: spelling.

stein559 · May 13, 2024

You are a true believer. Blessings of the state, blessings of the masses. Thou art a subject of the divine. Created in the image of man, by the masses, for the masses. Let us be thankful we have an occupation to fill. Work hard; increase production; prevent accidents, and be happy.

hizonner · May 13, 2024

Looks like they're working really hard on having it impersonate a human and try to make you like it. This is of course deeply unethical and has no purpose other than scary manipulation.

Luckily so far it's just making me want to strangle it, but I'm sure they'll get there.

sirmarcos · May 13, 2024

It feels like the "emotional voice" is just window dressing if the generative text still struggles with all the things LLMs are currently bad at doing? The text generation is still the most probable next token, right?

Sjoerd Verweij · May 13, 2024

DovePig said:
Aargh. That voice they used is just so fucking grating. I don't know what it is about it – maybe the overeagerness – but it's almost like hearing nails scratching a school blackboard.

It just sounds so false to me. Is that just a cultural thing? Maybe they should try basing it on a "polite but seemingly very bored German official" for the European audiences ;-)

Yes, that is very much cultural. You'd probably have the exact same reaction to the average waiter in the US. That "oh my gosh I am just SO GLAD to be here and SERVE YOU and OH MY GOSH this is SO EXCITING" tone is pretty much expected here, even if you're just buying a fucking cup of coffee. US society is full of fake civility, concern and care. "How are you doing" is about the same as "hello" -- nobody gives a shit about how you are actually doing, and the only expected answers are "great" or "well" or "fine". I've come to answer "so far so good", and it's a complete sequence breaker -- people full-on Scooby-Doo at you when you say that. Same with "have a nice day"; I now just answer "I'll try, you too", since I am not omnipotent and do not control such things.

DovePig · May 13, 2024

fellow human said:
yeah the default sounds way too keen to my ears but given the way they were asking for voice changes in the story bit I’m guessing that’s easy enough to fix.

Overall pretty mindblowing though; this is a big step towards natural interaction.

I was just wondering if it's only me who found the voice super‑annoying, whether it's just a cultural thing, or if the tech bros entirely forgot to do any focus group study before they made their announcement.

Then I remembered it's the same tech bros who developed a new tech without doing any sanity checks on what its societal impact might be, so a focus group study on just a fucking voice tone would be the last thing they do...

Sjoerd Verweij · May 13, 2024

sirmarcos said:
It feels like the "emotional voice" is just window dressing if the generative text still struggles with all the things LLMs are currently bad at doing? The text generation is still the most probable next token, right?

What do you mean? Are you saying you don't believe they can train a model to sense intonation and react accordingly? Is this worrying you?

Side note: anyone currently making a living tutoring anyone up to and including high school should be really, really fucking worried.

ETA: The only truly surprising thing here is the multi-media accrual of context, and the inflection detection. Other than that, there is nothing much new here. That doesn't mean it's not pretty fucking impressive and can kill hundreds of thousands of jobs right now; that doesn't mean this is the singularity and oh my lawd we're all out of work tomorrow; just saying that if your work and field is static and well-defined, you're toast.

hizonner · May 13, 2024

Sjoerd Verweij said:
Yes, that is very much cultural. You'd probably have the exact same reaction to the average waiter in the US. That "oh my gosh I am just SO GLAD to be here and SERVE YOU and OH MY GOSH this is SO EXCITING" tone is pretty much expected here, even if you're just buying a fucking cup of coffee. US society is full of fake civility, concern and care. "How are you doing" is about the same as "hello" -- nobody gives a shit about how you are actually doing, and the only expected answers are "great" or "well" or "fine". I've come to answer "so far so good", and it's a complete sequence breaker -- people full-on Scooby-Doo at you when you say that. Same with "have a nice day"; I now just answer "I'll try, you too", since I am not omnipotent and do not control such things.

I grew up in the US, live right next door to the US, and dislike that kind of chirpiness in general. But even so, it sounds way over the top. It's not that you'd never hear that level from an actual person, but it would indicate that the person was insincere (above and beyond formulaic "how are you" when you don't actually care), and not very good at acting and/or gauging their audience.

From your name I'm guessing maybe you come from somewhere other than the US and aren't quite calibrated to see it as excessive even for the US. But I admit I could also be the one who's miscalibrated, especially because I'm old. And it's true that women are really expected to lay it on pretty thick in some situations.

Sjoerd Verweij · May 13, 2024

DovePig said:
I was just wondering if it's only me who found the voice super‑annoying, whether it's just a cultural thing, or if the tech bros entirely forgot to do any focus group study before they made their announcement.

I think you should forgive them for not focus-grouping dour Europeans for their POC demo. Tuning the output network to go "NEIN! DIVIDIEREN DU FEIGLINGE!" and trigger the shock collar* instead should be relatively trivial.

* Is it safe?
