ChatGPT 4o lets you have real-time audio-video conversations with “emotional” chatbot

citizencoyote

Ars Scholae Palatinae
1,333
Subscriptor++
The AI assistant seemed to easily pick up on emotions, adapted its tone and style to match the user's requests, and even incorporated sound effects, laughing, and singing into its responses.
So if you start screaming at it or having a heated discussion, will it respond in kind? Or is it programmed to reply calmly?
 
Upvote
90 (90 / 0)

Bippy

Ars Centurion
249
Subscriptor
Marvin trudged on down the corridor, still moaning. "...and then of course I've got this terrible pain in all the diodes down my left hand side..."
"No?" said Arthur grimly as he walked along beside him. "Really?"
"Oh yes," said Marvin, "I mean I've asked for them to be replaced but no one ever listens."
"I can imagine."
 
Upvote
79 (80 / -1)

DovePig

Ars Scholae Palatinae
11,930
Aargh. That voice they used is just so fucking grating. I don't know what it is about it – maybe the overeagerness – but it's almost like hearing nails scratching a school blackboard.

It just sounds so false to me. Is that just a cultural thing? Maybe they should try basing it on a "polite but seemingly very bored German official" for the European audiences ;-)
 
Upvote
44 (64 / -20)

DovePig

Ars Scholae Palatinae
11,930
So if you start screaming at it or having a heated discussion, will it respond in kind? Or is it programmed to reply calmly?
In a very calming voice: "Citizen Coyote, please proceed calmly to the nearest meat processing facility. Don't worry, everything will be fine, just fine. You are a good human. A really good human. Would you like me to tell you a bedtime story while you wait for your processing? Many people found that soothing, at least until the rotating-knives part of the conveyor belt made it harder to hear."
 
Upvote
54 (57 / -3)
Marvin trudged on down the corridor, still moaning. "...and then of course I've got this terrible pain in all the diodes down my left hand side..."
"No?" said Arthur grimly as he walked along beside him. "Really?"
"Oh yes," said Marvin, "I mean I've asked for them to be replaced but no one ever listens."
"I can imagine."
Here I am, brain the size of a planet and they’re asking me to write their resumes…
 
Upvote
106 (109 / -3)
Ok...I legit don't remember the last time a tech demo blew my mind this much

The voice was so good! The interruptibility and on-the-fly correctability, it felt so real!

This better not be another case of AI = Actually Indians and this is someone streaming from somewhere else lol, but I don't think that's OpenAI. If Siri is getting this next month, wow, it'll be like going from the first single-celled life to near human in a leap; it couldn't even tell me what time the event at 10AM PT was in my time zone.
 
Upvote
24 (43 / -19)
No good can come of this. Mark my words.
But think of the revenue from customized emotional ad-revenue experiences! What if your dead parents could be automatically processed from their public data in order to realistically sell you amazing new products or services??????????? What if though!
 
Upvote
49 (56 / -7)
I’m glad that scammers, astroturfers, and bad actors trying to sway public opinion or influence elections won’t be able to use real-time voice synthesis that’s leaps and bounds better than current systems to improve their scams, turfing, or interference.

Yep, I sure am glad that will never happen with such a wonderful new technology.
 
Upvote
85 (91 / -6)

Edgar Allan Esquire

Ars Praefectus
3,006
Subscriptor
Aargh. That voice they used is just so fucking grating. I don't know what it is about it – maybe the overeagerness – but it's almost like hearing nails scratching a school blackboard.

It just sounds so false to me. Is that just a cultural thing? Maybe they should try basing it on a "polite but seemingly very bored German official" for the European audiences ;-)
It does sound kind of cloying. But going vaguely European made my mind go to Tommy Wiseau opening with "Oh hi, Mark."
 
Upvote
24 (24 / 0)
Aargh. That voice they used is just so fucking grating. I don't know what it is about it – maybe the overeagerness – but it's almost like hearing nails scratching a school blackboard.

It just sounds so false to me. Is that just a cultural thing? Maybe they should try basing it on a "polite but seemingly very bored German official" for the European audiences ;-)
Yeah, the default sounds way too keen to my ears, but given the way they were asking for voice changes in the story bit, I’m guessing that’s easy enough to fix.

Overall pretty mindblowing though; this is a big step towards natural interaction.
 
Upvote
24 (25 / -1)

C64 raids Bungling Bay

Ars Tribunus Militum
1,726
Subscriptor
Annoying voice, but that's a hell of a computer interface. The days of keyboard and mouse are going to seem like Conestoga wagons before long. It's more of a Star Trek-style approach: tell the computer what you want it to do, and let it figure out how. I'm just talking about the LLM as an interface, well aware it has no actual intelligence of its own.
 
Upvote
19 (31 / -12)

Caanan

Smack-Fu Master, in training
91
I know, I know. I'm playing right into the hands of our dark overlords, but I thought this was super impressive. The speed of improvement with these models is f'ing bonkers.

And if Apple can deliver improvements to Siri that offer similar capabilities – in partnership with OpenAI or via their own models (or some combo of the two) – Siri is going to slap.
 
Upvote
42 (44 / -2)
Ok...I legit don't remember the last time a tech demo blew my mind this much

The voice was so good! The interruptibility and on-the-fly correctability, it felt so real!

This better not be another case of AI = Actually Indians and this is someone streaming from somewhere else lol, but I don't think that's OpenAI. If Siri is getting this next month, wow, it'll be like going from the first single-celled life to near human in a leap; it couldn't even tell me what time the event at 10AM PT was in my time zone.
Maybe wait until it’s in the wild, and not an obviously scripted demo, before getting too excited. Remember Milo?
 
Upvote
47 (54 / -7)

MobiusPizza

Ars Scholae Palatinae
1,324
I heard them mention a desktop version of ChatGPT. Is it just an app that still uses the internet to function, or can we download the whole model and run it offline?

I am surprised ChatGPT was able to 'view' a video and comment on it. What sorcery is this? How does a language model process video? They must have included image recognition models as well.

Aargh. That voice they used is just so fucking grating. I don't know what it is about it – maybe the overeagerness – but it's almost like hearing nails scratching a school blackboard.

It just sounds so false to me. Is that just a cultural thing? Maybe they should try basing it on a "polite but seemingly very bored German official" for the European audiences ;-)
Hmm, isn't this just how many American ladies speak, with an American accent? There are more high-pitched intonations and expressions. I am numb to accents now, having lived in both the US and the UK.

ChatGPT's voice sounds great when speaking Italian near the end.
 
Upvote
-6 (5 / -11)

FinallyAnAccount

Ars Scholae Palatinae
961
Subscriptor
This is one of those things where it's gonna suck for society but I'll kind of enjoy it? I enjoy inane/unexpected experiences with the Google chatbot sometimes. The only thing is that I'm not very easily monetizable, for these things especially. They don't provide anything, or access to anything, that I'd want to pay for.
 
Upvote
-5 (4 / -9)
I heard them mention a desktop version of ChatGPT. Is it just an app that still uses the internet to function, or can we download the whole model and run it offline?
My assumption is that if it doesn't say it works offline, it doesn't.

From their announcement post, it sounds like the main benefits are (1) you can wake up the app with a keyboard shortcut, and (2) the app can record audio or screenshot your screen more conveniently than a web browser can.

If you've decided you want to share that information with OpenAI, and you have a Mac, it does sound more convenient - although personally I would want to understand how ring-fenced its access was before installing.

No mention of offline functionality.
https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-free/
 
Upvote
16 (16 / 0)
Ok...I legit don't remember the last time a tech demo blew my mind this much

The voice was so good! The interruptibility and on-the-fly correctability, it felt so real!

This better not be another case of AI = Actually Indians and this is someone streaming from somewhere else lol, but I don't think that's OpenAI. If Siri is getting this next month, wow, it'll be like going from the first single-celled life to near human in a leap; it couldn't even tell me what time the event at 10AM PT was in my time zone.
Sorry, I can’t find “10AM PT in my time zone” on Apple Music.
 
Upvote
24 (26 / -2)
This is pretty impressive, ngl. I think the likes of Google and Apple should be worried since this changes how people interact with devices, services, and apps. It's like a new OS and all the ecosystems can be disrupted.
I think Humane has clearly demonstrated how important a proper touch interface is, even when using a voice interface. Phone makers just need to make using these interfaces as seamless as possible.
 
Upvote
2 (3 / -1)
Aargh. That voice they used is just so fucking grating. I don't know what it is about it – maybe the overeagerness – but it's almost like hearing nails scratching a school blackboard.

It just sounds so false to me. Is that just a cultural thing? Maybe they should try basing it on a "polite but seemingly very bored German official" for the European audiences ;-)
Yes, that is very much cultural. You'd probably have the exact same reaction to the average waiter in the US. That "oh my gosh I am just SO GLAD to be here and SERVE YOU and OH MY GOSH this is SO EXCITING" tone is pretty much expected here, even if you're just buying a fucking cup of coffee. US society is full of fake civility, concern and care. "How are you doing" is about the same as "hello" -- nobody gives a shit about how you are actually doing, and the only expected answers are "great" or "well" or "fine". I've come to answer "so far so good", and it's a complete sequence breaker -- people full-on Scooby-Doo at you when you say that. Same with "have a nice day"; I now just answer "I'll try, you too", since I am not omnipotent and do not control such things.
 
Upvote
36 (45 / -9)

DovePig

Ars Scholae Palatinae
11,930
Yeah, the default sounds way too keen to my ears, but given the way they were asking for voice changes in the story bit, I’m guessing that’s easy enough to fix.

Overall pretty mindblowing though; this is a big step towards natural interaction.
I was just wondering if it's only me who found the voice super‑annoying, whether it's just a cultural thing, or if the tech bros entirely forgot to do any focus group study before they made their announcement.

Then I remembered it's the same tech bros who developed a new tech without doing any sanity checks on what its societal impact might be, so a focus group study on just a fucking voice tone would be the last thing they'd do...
 
Upvote
0 (9 / -9)
It feels like the "emotional voice" is just window dressing if the generative text still struggles with all the things LLMs are currently bad at doing? The text generation is still the most probable next token, right?
What do you mean? Are you saying you don't believe they can train a model to sense intonation and react accordingly? Is this worrying you?

Side note: anyone currently making a living tutoring anyone up to and including high school should be really, really fucking worried.

ETA: The only truly surprising thing here is the multimedia accrual of context, and the inflection detection. Other than that, there is nothing much new here. That doesn't mean it's not pretty fucking impressive and can't kill hundreds of thousands of jobs right now; nor does it mean this is the singularity and oh my lawd we're all out of work tomorrow. I'm just saying that if your work and field are static and well-defined, you're toast.
 
Upvote
-5 (5 / -10)

hizonner

Ars Scholae Palatinae
965
Subscriptor
Yes, that is very much cultural. You'd probably have the exact same reaction to the average waiter in the US. That "oh my gosh I am just SO GLAD to be here and SERVE YOU and OH MY GOSH this is SO EXCITING" tone is pretty much expected here, even if you're just buying a fucking cup of coffee. US society is full of fake civility, concern and care. "How are you doing" is about the same as "hello" -- nobody gives a shit about how you are actually doing, and the only expected answers are "great" or "well" or "fine". I've come to answer "so far so good", and it's a complete sequence breaker -- people full-on Scooby-Doo at you when you say that. Same with "have a nice day"; I now just answer "I'll try, you too", since I am not omnipotent and do not control such things.
I grew up in the US, live right next door to the US, and dislike that kind of chirpiness in general. But even so, it sounds way over the top. It's not that you'd never hear that level from an actual person, but it would indicate that the person was insincere (above and beyond formulaic "how are you" when you don't actually care), and not very good at acting and/or gauging their audience.

From your name I'm guessing maybe you come from somewhere other than the US and aren't quite calibrated to see it as excessive even for the US. But I admit I could also be the one who's miscalibrated, especially because I'm old. And it's true that women are really expected to lay it on pretty thick in some situations.
 
Upvote
33 (35 / -2)
I was just wondering if it's only me who found the voice super‑annoying, whether it's just a cultural thing, or if the tech bros entirely forgot to do any focus group study before they made their announcement.
I think you should forgive them for not focus-grouping dour Europeans for their POC demo. Tuning the output network to go "NEIN! DIVIDIEREN DU FEIGLINGE!" and trigger the shock collar* instead should be relatively trivial.

* Is it safe? :cool:
 
Upvote
-4 (7 / -11)