ChatGPT 4o lets you have real-time audio-video conversations with “emotional” chatbot

I grew up in the US, live right next door to the US, and dislike that kind of chirpiness in general. But even so, it sounds way over the top. It's not that you'd never hear that level from an actual person, but it would indicate that the person was insincere (above and beyond the formulaic "how are you" when you don't actually care) and not very good at acting and/or gauging their audience.

From your name I'm guessing you may come from somewhere other than the US and aren't quite calibrated to see it as excessive even for the US. But I admit I could also be the one who's miscalibrated, especially because I'm old. And it's true that women are really expected to lay it on pretty thick in some situations.
Yeah, the AI in the demo sounded like an advertisement voiceover or someone using their "customer service voice" in a retail setting. Normal people don't bop around chirping at each other in tones like that, and it's honestly somewhere in the uncanny valley for me. Hopefully this was meant to make the demo seem more engaging, and normal interactions will use a more natural speaking voice.
 

Caanan

If Siri is getting this next month, wow, it'll be like going from the first single-celled life to near-human in one leap. It couldn't even tell me what time the 10 AM PT event was in my time zone.
Ha! I am an unapologetic Apple fanboi, but I just can't with Siri. I asked it when the next Friday the 13th is (it's a Taylor Swift thing), and it replied, "Monday, May 13th."
 
Is no one going to mention how, in the live demo, ChatGPT-4o kept interrupting Mark and being interrupted by seemingly non-conversational audio (around the 10:30 mark)? Sure, it can pick up emotional states, but it can't tell when you're done speaking, or when you're not speaking at all? The demo made it seem like the slightest background audio, e.g., the live audience laughing, would "interrupt" ChatGPT-4o, even though Mark wasn't speaking. And it seemed like Mark was trying his best not to let ChatGPT interrupt him... so awkward.
 
Looks like they're working really hard on having it impersonate a human and try to make you like it. This is of course deeply unethical and has no purpose other than scary manipulation.

Luckily so far it's just making me want to strangle it, but I'm sure they'll get there.
I keep repeating this, but I truly hate how digital assistants hide their true nature behind a fake human voice. They should have their own voice, capable of human interaction without lying about what they are. Hello? Is there a sound designer in the room?

As you said, this is manipulative. Purposefully designed to trigger an emotional response. I hope this is just a temporary trend like skeuomorphism.

I don't want fake smiles. I don't want fake emotions. I want T.A.R.S.
 

kingliam

Aargh. That voice they used is just so fucking grating. I don't know what it is about it, maybe the overeagerness, but it's almost like hearing nails scratching a school blackboard.

It just sounds so false to me. Is that just a cultural thing? Maybe they should try basing it on a "polite but seemingly very bored German official" for the European audiences ;-)
Not a cultural thing: it's nails on a chalkboard to me too. It's like an overly peppy cheerleader. Hopefully there will be a way to tone it down.
 
Still, I think the tone of the AI voice is way too over the top even for the US crowd; it reminds me of clichéd anime-girl voices, or something like that. Given that it's developed by tech bros, I don't find that particularly surprising...

The voice is similar to the stereotypical robot female assistant voice, but with extra performative femininity. For incels and the like, the voice sounds like a "real female" because by "real female", they mean matching ideal femininity from men's fiction, not behaving like actual humans who identify as female.
 

Edgar Allan Esquire

The voice is similar to the stereotypical robot female assistant voice, but with extra performative femininity. For incels and the like, the voice sounds like a "real female" because by "real female", they mean matching ideal femininity from men's fiction, not behaving like actual humans who identify as female.
Very Stepford Wives vibe: '“My gosh,” the short man said, “we don’t want robots for wives. We want real women.”'

I'd always taken that use of "real women" to have an implied irony.
 

MechR

The voice is similar to the stereotypical robot female assistant voice, but with extra performative femininity. For incels and the like, the voice sounds like a "real female" because by "real female", they mean matching ideal femininity from men's fiction, not behaving like actual humans who identify as female.
I assure you, the voice really isn't that appealing, for all the reasons other commenters have tried to articulate. The forced exuberance makes your skin crawl.
 
I assure you, the voice really isn't that appealing, for all the reasons other commenters have tried to articulate. The forced exuberance makes your skin crawl.
Bold of you to assume the other commenters are also incels.

Edit: Why the downvotes when I was talking about the appeal to incels, and MechR tried to correct me on how unappealing it is to him? I read the top comments here and never assumed the other commenters were incels unless they self-identified as such.
 

JoHBE

Annoying voice, but that's a hell of a computer interface. The days of keyboard and mouse are going to seem like Conestoga wagons before long. It'll be more of a Star Trek-style interface: tell the computer what you want it to do, and let it figure out how.
This has absolutely NO downsides, right?

Imagine thinking going totally out of fashion... what a wonderful world it will be...
 

JoHBE

It feels like the "emotional voice" is just window dressing if the generative text still struggles with all the things LLMs are currently bad at doing? The text generation is still the most probable next token, right?
That's the crux, I think: LLMs bamboozle us with command/manipulation of language, which we can't help but automatically associate with general intelligence. With this, they slapped another mimicry layer on top of it that will supercharge that bamboozling.

I get that this is impressive, intriguing and fascinating... and inevitable that this would be chased. But it's getting damn scary. Essentially, it's not that much different from how the best animators figured out the magic ingredients to get viewers completely engrossed in, and identifying with, say, Tom & Jerry. There's NOTHING of substance down there, but the right sequence of carefully constructed images makes you forget all that and gets you to emotionally invest in what is being shown. An entertaining cartoon is one thing, but here the stakes are getting higher and higher. These technologies have the potential to be like a psychopath: you're tricked into dealing with them like you deal with everyday normal people, because that's how they (are tuned to) appear most of the time. But important fundamental principles are entirely missing or completely different. God knows when, how, and how severely they'll stab you in the back... There is no "essential" safety net, like the ingrained empathy humans have.

Edit: the "stabbing in the back" was meant metaphorically: the end result of a failed interaction might have serious consequences that were simply beyond the horizon of superficial, simulated conversational intelligence.
 
I don't know which fate is worse: being replaced by an AI, or having to work in an open-plan office with twenty people who're all having an AI with a valley-girl voice read back to them what their own code does.
lol hopefully they'll use headphones, but even then, listening to a dozen coworkers all chatting away to an unheard voice would be maddening.
 
I don't think you have any idea what you are talking about. I worked at a couple of restaurants as a cook during college. When you hire wait staff, you specifically seek out the candidates with a cheery attitude. Later I had a job that was about 80% travel, so I have probably eaten at least 1,000 meals at sit-down restaurants. Most of them are genuinely nice.
Nice is not cheerful. If you truly believe a majority of waiters are that stoked about making crap money for dealing with assholes all day, I have a bridge to sell you.
 

Psyborgue

The text generation is still the most probable next token, right?
It hasn’t been for a long time. Always picking the single most probable token (greedy decoding) results in terrible generation. Instead, the software rolls some dice: it either samples from the k most probable tokens (top-k sampling), or it takes the smallest set of most-probable tokens whose probabilities add up to a chosen threshold p between 0 and 1 and samples from that set (top-p, or nucleus, sampling).

There isn’t a single right way to do it, but picking the most likely token every time results in repetition and regurgitation.

https://arxiv.org/abs/1904.09751
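To make the difference concrete, here is a minimal Python sketch of the three strategies on a toy next-token distribution. The function names and the five-word vocabulary are illustrative assumptions; real inference stacks work on logit tensors, usually apply temperature first, and differ in details. Note that in top-p the threshold p is a fixed setting; the random draw happens only after the candidate set is chosen.

```python
import random


def greedy(probs):
    """Greedy decoding: always return the index of the single most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])


def sample_from(candidates, probs, rng):
    """Draw one index from `candidates`, weighted by their renormalized probabilities.

    Picks u in [0, total) and walks the cumulative sum until it passes u.
    """
    total = sum(probs[i] for i in candidates)
    u = rng.random() * total
    cumulative = 0.0
    for i in candidates:
        cumulative += probs[i]
        if u < cumulative:
            return i
    return candidates[-1]  # guard against floating-point round-off


def ranked(probs):
    """Token indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k(probs, k, rng):
    """Top-k sampling: sample only among the k most probable tokens."""
    return sample_from(ranked(probs)[:k], probs, rng)


def nucleus(probs, p):
    """Smallest set of most-probable tokens whose cumulative probability reaches p."""
    chosen, cumulative = [], 0.0
    for i in ranked(probs):
        chosen.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return chosen


def top_p(probs, p, rng):
    """Top-p (nucleus) sampling: sample only within the nucleus for threshold p."""
    return sample_from(nucleus(probs, p), probs, rng)


if __name__ == "__main__":
    vocab = ["the", "a", "cat", "dog", "banana"]  # hypothetical toy vocabulary
    probs = [0.4, 0.3, 0.15, 0.1, 0.05]
    rng = random.Random(42)
    print(vocab[greedy(probs)])                      # always "the"
    print([vocab[i] for i in nucleus(probs, 0.8)])   # ['the', 'a', 'cat']
    print([vocab[top_k(probs, 2, rng)] for _ in range(5)])
    print([vocab[top_p(probs, 0.8, rng)] for _ in range(5)])
```

Greedy decoding is deterministic, which is why it tends to loop. Top-k always keeps the same number of candidates whether the model is confident or not, while top-p grows or shrinks the candidate set with the shape of the distribution, which is the main argument for nucleus sampling in the linked paper.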
 

MechR

Bold of you to assume the other commenters are also incels.

Edit: Why the downvotes when I was talking about the appeal to incels, and MechR tried to correct me on how unappealing it is to him? I read the top comments here and never assumed the other commenters were incels unless they self-identified as such.
I'm saying the voice sounds unctuous to virgins and sex-havers alike. If anything, incels might find the exaggerated fake niceness more off-putting. Luckily it can probably be toned down, just as it was told to ham it up even harder in the demo.
 

HalbLeiter

I'm split here... On the one hand, I'm in awe at the progress being made in this field and how responsive and natural it all feels. But on the other hand, this means I can't trust any phone call now, especially ones claiming to be from my bank, and I don't see any way for them to mitigate the abuses this will inevitably be used for.

Also, I've got to agree with the other posters: the fake, overly friendly US waitress voice is annoying, but I'm sure this will soon be combined with voice-sampling AI so you can have anybody talking to you...
 

HalbLeiter

I don't think you have any idea what you are talking about. I worked at a couple of restaurants as a cook during college. When you hire wait staff, you specifically seek out the candidates with a cheery attitude. Later I had a job that was about 80% travel, so I have probably eaten at least 1,000 meals at sit-down restaurants. Most of them are genuinely nice.
I assume you're American, so you're probably used to this overly bubbly stuff, but as a European, I have to agree with Sjoerd: it feels really weird at first. There's friendly, and then there's American-waitress friendly.
 

elemenopea

I assume you're American, so you're probably used to this overly bubbly stuff, but as a European, I have to agree with Sjoerd: it feels really weird at first. There's friendly, and then there's American-waitress friendly.
There's really nothing quite like the American chirpiness. I've worked with and am friends with a lot of Americans over the years, and have visited there a few times, and it's remarkable how culturally different Americans are. I hated restaurants in the US.
 