With its new Gen-4 model, Runway claims to have finally achieved consistency in AI videos

What is the value proposition with creating these videos?
Tech companies propose that value be shifted from real people to them.

Impressive, but there's a distinct lack of any actual acting going on in that video.
It reminds me of a quote from American Psycho:
There is an idea of a Patrick Bateman; some kind of abstraction. But there is no real me: only an entity, something illusory. And though I can hide my cold gaze, and you can shake my hand and feel flesh gripping yours and maybe you can even sense our lifestyles are probably comparable... I simply am not there.
 
Upvote
15 (15 / 0)

jhodge

Ars Tribunus Angusticlavius
8,416
Subscriptor++
It doesn't hold up to critical viewing, but it's already better than some of the corporate training-video dreck I've been exposed to. I can easily see output from this being used to create "Tom the Trainer" and walk him through demonstrations of what a painfully obvious quid-pro-quo looks like.
 
Upvote
11 (12 / -1)

Aurich

Director of Many Things
37,871
Ars Staff
Thank you for the proper source! Not sure why I found some link farm site instead of that, I should have gone deeper.

Agreed that it does sound like a good use case for AI, and a useful tool (“fancy magic wand” is exactly the right use case for a probabilistic tool like this, the magic wand is already probabilistic), but it’s also far from what their main product seems to be marketed as (wholesale video generation). It also sounds like it’s something Runway purpose built rather than their generic tool, but the Variety article could be misleading on that count.
Nah, you did fine, I got the Variety link from your link lol. Just drilled one level deeper.

I was going to bring it up with Axon so wanted to see what the deal is. Seems like they did make a tool for the movie, so it's not inaccurate, just wasn't actually for generating images.
 
Upvote
6 (6 / 0)
I keep waiting for the day where I can feed a low-res tv series like Star Trek Voyager into an AI model and have it synthesize a 4K version from scratch. No upscaling, no sharpening, no weird artifacts, just take the video/scene as an input and spit out a high quality recreation that looks like it was natively produced in high-def. I've seen what people are doing with tools like Topaz, and that seems like rubbing two sticks together compared to the potential of a product like Runway.
Add a 3D/VR version and we've got our own holodeck.

Holosuite.
Quark would already be rubbing his hands together with prurient capitalistic glee.
 
Upvote
4 (4 / 0)

bugsbony

Ars Scholae Palatinae
910
Just imagine what the GPT-4o image generator has achieved, and apply it to video; it will come. And no, it won't be controlled only with prompts. As in the GPT-4o image generator, you can give it a picture of the actor, a picture of the scenery, and some rough sketches of what you want. For video, you'll be able to give it rough sketches of the action with arrows and comments, or a storyboard plus script, or whatever you can give it. It will be wild.
 
Upvote
-7 (6 / -13)

GFKBill

Ars Tribunus Militum
2,456
Subscriptor
Had a co-worker who was an extra on a Michael Bay film. They had no continuity person at all in the scenes he worked.
Based on some of the later Transformer movies, he doesn't even care about continuity of entire environments, so I'm not surprised.

Or even aspect ratio 🧐
 
Upvote
13 (13 / 0)

ktmglen

Ars Scholae Palatinae
1,194
It doesn't hold up to critical viewing, but it's already better than some of the corporate training-video dreck I've been exposed to. I can easily see output from this being used to create "Tom the Trainer" and walk him through demonstrations of what a painfully obvious quid-pro-quo looks like.
Think of all the 1980s forklift safety training videos that could get remade!
 
Upvote
9 (9 / 0)
I keep waiting for the day where I can feed a low-res tv series like Star Trek Voyager into an AI model and have it synthesize a 4K version from scratch. No upscaling, no sharpening, no weird artifacts, just take the video/scene as an input and spit out a high quality recreation that looks like it was natively produced in high-def. I've seen what people are doing with tools like Topaz, and that seems like rubbing two sticks together compared to the potential of a product like Runway.
I feel like a really good use of AI would be feeding scripts, audio, and stills from the missing Doctor Who episodes from the 1960s into a model and having it generate replacement episodes that match the extant episodes in terms of how they look. Maybe they could even generate full-color HD versions of The Celestial Toymaker and Marco Polo from production stills.

 
Upvote
3 (9 / -6)

BigOlBlimp

Ars Scholae Palatinae
707
Subscriptor
This is driving me crazy...that example is only barely the same person. Her blemishes and scars constantly change scene-to-scene. [Edit: the side profile at the beginning looks like a totally different person, the younger sister of the woman we see later.]

"Come back in one year and see-" I am quite confident that in 10 years we will still have AI that is much dumber than a rat and not actually capable of understanding object permanence. They will be better at faking it, but I strongly doubt this tech will ever be capable of creating 30 minutes of coherent film without a human constantly correcting all the common-sense errors.
Hot take but I think the definition of common sense will change. People will pay less attention to those small details as AI gets better and both will eventually meet somewhere in the middle. Starting with stuff like kids shows anyway. I genuinely don’t think a kid would notice the details you mentioned.
 
Upvote
-7 (3 / -10)
I keep waiting for the day where I can feed a low-res tv series like Star Trek Voyager into an AI model and have it synthesize a 4K version from scratch. No upscaling, no sharpening, no weird artifacts, just take the video/scene as an input and spit out a high quality recreation that looks like it was natively produced in high-def. I've seen what people are doing with tools like Topaz, and that seems like rubbing two sticks together compared to the potential of a product like Runway.
Voyager was shot on film; aside from the dodgy CGI and stagey lighting, it should look really good with a quality scan, no worse than any feature film of the era. I haven't looked for one, but I'm surprised it hasn't been rescanned in at least HD (and really, that's all you need; the advantage of 4K over HD is marginal at best in all but the most extreme viewing scenarios. In fact, most films are still finished at 2K, although that's changing).
Had a co-worker who was an extra on a Michael Bay film. They had no continuity person at all in the scenes he worked.

Also, Bay found an extra that he really liked, and he rewrote the script to give them lines, and more screen time. Which was at least kinda cool.
I can absolutely guarantee you that they did in fact have a continuity person. Your friend just didn't notice them - extras really don't get to interact with anyone except other extras, third ADs, and PAs, for the most part. They would never interact with the continuity person. (Or they did and were confused because the actual name of the position is "script supervisor").

Major feature films always have a script supervisor on set. It's a critical role.
 
Upvote
9 (9 / 0)

poochyena

Ars Scholae Palatinae
3,338
Subscriptor++
This is driving me crazy...that example is only barely the same person. Her blemishes and scars constantly change scene-to-scene. [Edit: the side profile at the beginning looks like a totally different person, the younger sister of the woman we see later.]
 
Upvote
-2 (5 / -7)
Hot take but I think the definition of common sense will change. People will pay less attention to those small details as AI gets better and both will eventually meet somewhere in the middle. Starting with stuff like kids shows anyway. I genuinely don’t think a kid would notice the details you mentioned.
You might be right, but I really hope not. That sounds like a really, really sad future.
 
Upvote
13 (13 / 0)

BigOlBlimp

Ars Scholae Palatinae
707
Subscriptor
You might be right, but I really hope not. That sounds like a really, really sad future.
I’m kind of outing myself as being a victim of sludge brain from AI-generated Instagram reels, but I’ve already started thinking of entities in those videos as concepts rather than... you know, immutable physical objects.

Like if I know I’m watching just a dumb reel, I’ll connect the bearded guy in the red sports car from the last scene to the current one even if his brand of glasses and the model of car have changed. That’s an extreme example and at least I’m aware of it, but it’s becoming a little more second nature than I’d like to admit.

I am also worried about kids growing up with that actually being second nature.
 
Upvote
-5 (6 / -11)
I personally hate that idea.
The OP is dunking on Squaresoft’s “virtual actors” concept, which they tried to make a thing with the lead character of the Final Fantasy movie.

The problem is, everyone forgot that movie so hard (usually within seconds of leaving the movie theatre) that nobody’s getting the joke. The OP should have put in a sarcasm tag.
 
Upvote
7 (7 / 0)

dragonzord

Ars Scholae Palatinae
676
I keep waiting for the day where I can feed a low-res tv series like Star Trek Voyager into an AI model and have it synthesize a 4K version from scratch. No upscaling, no sharpening, no weird artifacts, just take the video/scene as an input and spit out a high quality recreation that looks like it was natively produced in high-def. I've seen what people are doing with tools like Topaz, and that seems like rubbing two sticks together compared to the potential of a product like Runway.
Holding on to my DVD collection for this. At some point soon, it'll be something you can run as an overnight job, but the ultimate goal would be something that can do it on the fly, so it keeps the old feel of physically handling a movie.
 
Upvote
0 (0 / 0)
Listening to the "music": the piano chords in a leading-tone tonality, then a little semi-staccato arpeggio, and it ends up on a flat 7, tonally completely out of character relative to the chords we just heard, which approximate badly appropriated Chopin. Ugh.
AI-generated music is still absolute trash because the level of consistency and math involved in making music sound good dwarfs what you need for pictures to look OK. If you’re off by a few Hz or a fraction of a second, people are going to hear it.
 
Upvote
4 (4 / 0)

Aurich

Director of Many Things
37,871
Ars Staff
I’m kind of outing myself as being a victim of sludge brain from AI-generated Instagram reels, but I’ve already started thinking of entities in those videos as concepts rather than... you know, immutable physical objects.

Like if I know I’m watching just a dumb reel, I’ll connect the bearded guy in the red sports car from the last scene to the current one even if his brand of glasses and the model of car have changed. That’s an extreme example and at least I’m aware of it, but it’s becoming a little more second nature than I’d like to admit.

I am also worried about kids growing up with that actually being second nature.
Just a wild thought, but you could make a conscious decision to stop watching that stuff.

Not even being judgey, just ... it's like bottom barrel garbage that you're just wiling away your life consuming.

I look at Instagram. I try to not do it too much, but a lot of artists and hobbies I'm interested in are there, and I'll definitely find myself scrolling. So again, not judging.

But the moment they start feeding me AI garbage I close the app. It's actually helpful as a reminder that I'm using it too much.
 
Upvote
30 (31 / -1)

oluseyi

Ars Scholae Palatinae
1,316
What is the value proposition with creating these videos?
"Creative auteurs without the skills, funds, or access to talent and equipment will be able to generate the visual representations of the stories they imagine and… hope that audiences will materialize to watch rather than use the same AI tools to make their own indulgent narrative representations?"

It's a tragedy of the commons. Nobody has thought about what this inevitably does to a marketplace. In a sense, though, this is the logical conclusion of creative copyright: a legal and financial system set up to spur new creation/invention, its intellectual property eventually consumed by a technological system that can generate combinatorial variations of all the creations that have come before it. I'm undecided on whether that's a good or bad thing.
 
Upvote
-3 (1 / -4)

oluseyi

Ars Scholae Palatinae
1,316
I look at Instagram. I try to not do it too much, but a lot of artists and hobbies I'm interested in are there, and I'll definitely find myself scrolling. So again, not judging.

But the moment they start feeding me AI garbage I close the app. It's actually helpful as a reminder that I'm using it too much.
Have you checked out Cara?
 
Upvote
0 (1 / -1)
In all seriousness, given the hurdles involved in planning and producing VR content, this could end up being a viable killer application.
The issue is that it’s assuming a whole heap of functionality that isn’t there yet. The current systems are generating 2d images of videos. They’re not generating 3d anything. That’s a whole different problem that may require a very different approach.
 
Upvote
10 (10 / 0)
The issue is that it’s assuming a whole heap of functionality that isn’t there yet. The current systems are generating 2d images of videos. They’re not generating 3d anything. That’s a whole different problem that may require a very different approach.
Exactly, you’d get those wildly gyrating/limb-sprouting gymnasts, but in VR and also all of the background scenery is doing the same thing.
 
Upvote
3 (3 / 0)
I’m kind of outing myself as being a victim of sludge brain from AI-generated Instagram reels, but I’ve already started thinking of entities in those videos as concepts rather than... you know, immutable physical objects.

Like if I know I’m watching just a dumb reel, I’ll connect the bearded guy in the red sports car from the last scene to the current one even if his brand of glasses and the model of car have changed. That’s an extreme example and at least I’m aware of it, but it’s becoming a little more second nature than I’d like to admit.

I am also worried about kids growing up with that actually being second nature.
Yeah that is uh, incredibly disturbing. What the actual fuck.

Get off that shit.
 
Upvote
13 (13 / 0)

Dadlyedly

Ars Tribunus Militum
2,456
Subscriptor
I keep waiting for the day where I can feed a low-res tv series like Star Trek Voyager into an AI model and have it synthesize a 4K version from scratch. No upscaling, no sharpening, no weird artifacts, just take the video/scene as an input and spit out a high quality recreation that looks like it was natively produced in high-def. I've seen what people are doing with tools like Topaz, and that seems like rubbing two sticks together compared to the potential of a product like Runway.
Honestly, I'm just waiting for the day when I can feed movies into an AI model and have it synthesize them in the style of a different director and with different actors. Just for the heck of it, I'd redo Star Wars eps 1-6 as directed by Stanley Kubrick, David Lynch, John Carpenter, Quentin Tarantino, David Cronenberg, and John Ford, just for variety. Then substitute in The Muppets for everyone but Samuel L. Jackson.
 
Upvote
4 (6 / -2)

Kjella

Ars Tribunus Militum
1,992
The issue is that it’s assuming a whole heap of functionality that isn’t there yet. The current systems are generating 2d images of videos. They’re not generating 3d anything. That’s a whole different problem that may require a very different approach.
Things are just getting started, but there are people working on a text -> image -> video -> 3D -> multi-view video -> 4D generation (3D over time) pipeline:

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

Another commenter mentioned the lack of character acting in the clip; well, there are people working on that too:

MoCha: Towards Movie-Grade Talking Character Synthesis

This stuff isn't perfect, but I don't think it will be that long until you have a multi-angle shot of Will Smith eating spaghetti while holding a conversation. Some, though not all, of those clips pass as cheesy movie lines for me.
 
Upvote
-5 (2 / -7)

Deja-Q-

Seniorius Lurkius
44
That era Star Trek is so formulaic that you should be able to feed the library to AI and have it generate fresh episodes for you on request.

Computer, generate me a double episode with Troi's mom seducing Ensign Wesley..
Would love to have that quality of show today. The modern versions are the 'formulaic' ones of pointless mediocrity. We can throw in Star Wars too; in fact, humans have run out of ideas. Let the AI take over is what the modern day has shown me.
 
Upvote
4 (4 / 0)

Purpleivan

Ars Centurion
291
Subscriptor++
When considering the real usefulness of this, let's consider what isn't in this video and is rarely (if ever) in videos like this one.

First, length. As usual, we're looking at 15 seconds, not the 30 minutes to 3 hours required for TV series and movies. It's fine to say that they have consistency (even when this video is only vaguely consistent), but how would that consistency fare if the video were the length of a typical 90-minute movie, which is 360 times as long as this little clip?

Secondly, there are the unseen things, such as feet and their connection to the ground, which would have been especially difficult in the scene with the forest floor, with the complex interaction between the person's footsteps and the leaves, grass, dirt, etc. that they'd be walking on. Get that wrong and characters will appear to float, rather than tread.

Next, interaction involving the character. The only interaction in this video is some gentle adjustments to the angle of a steering wheel. I wonder how this video would look if the person in the driver's seat were doing something as basic as turning out at a junction (using turn signals, looking each way, etc.), or better yet, turning to wrestle with an attacker in the back seat.

For another, speech. In a convincing video, the movement of the whole body is part of that: whether or not to maintain eye contact, including turning to keep the person being spoken to in view, is character and situation dependent. Get that wrong and conversations will look as bad, in terms of character interaction, as early CGI efforts. And that's ignoring all the small complexities of what's going on with the face, which again are character and context dependent.

Finally, good acting. It's one thing to have all characters behave in much the same way, an amalgam of the source data, but good acting is precise, detailed, consistent, and very much based on context. It can't just be "phoned in" quality, where an actor speaks the lines but there's nothing in the details of the character to make them believable. Show me Emmy-winning quality: a five-minute talk between two characters whose emotions subtly change as the conversation evolves. Then I might be impressed.

Currently, at best what they've got is something that can make bland, dreamy TV commercials.
 
Upvote
20 (20 / 0)

MachinistMark

Smack-Fu Master, in training
91
Currently, at best what they've got is something that can make bland, dreamy TV commercials.

I was literally just about to comment that this is on par with awful perfume/aftershave ads in a lot of ways. At the end of either, you're kind of just thinking, "but what was that supposed to be?" I could actually see this tech being used for TV ads, given that they are less likely to be scrutinised much, and it could sneak under the radar since most people who care to look for it aren't exactly rushing to watch TV ads.

At least her legs didn't sprout out of her ears in this video, so in that way it's a step up from the old shite, but I'm not sure "consistent" is particularly accurate. If "consistent" now means "nobody melted," then my standards for consistency are far too high.
 
Upvote
6 (7 / -1)

JoHBE

Ars Tribunus Militum
2,563
Subscriptor++
Was… that video supposed to be consistent?

The strange “indent” on the spare tire keeps changing, the woman is very clearly different in each scene (she doesn’t even appear to be of the same ethnicity), her earrings are all weird, the cabin spontaneously sprouts two front windows…

Plus everything still has that odd floaty movement that AI video has, because it can’t figure out how things are supposed to move with weight and purpose.

Blech. No thanks.
People still confuse genuinely impressive tech demos with a functional tool, which should become "invisible" (i.e., not aggressively impose a set of constraints and homogenisation) once matured.

As long as you wear your "tech" glasses, this looks impressive and great, but once you're in "consuming engaging content" mode, it quickly falls apart.
 
Upvote
8 (11 / -3)

Beddict

Ars Centurion
270
Subscriptor
What is the value proposition with creating these videos?
I was standing in the bus the other day behind a tired looking mom and her ~5 year old son. The kid sat with his head buried in the phone watching 10 sec long very colorful AI generated videos of different animals / humans / objects merging into combinations, one 10 sec video after the other until I got off.

So: giving parents peace in exchange for destroying the brains of children and making money from ads?
 
Upvote
13 (13 / 0)

drewmu

Smack-Fu Master, in training
73
With all the AI-generated content breakthroughs that are occurring, despite how impressive they are, I find myself eventually asking the fundamental existential question, which is: "Is my life, and human life in general, so poor from the lack of text to read, music to listen to, photos to look at, and videos to watch, that we need machines to generate an infinite supply of them?"

The answer is a resounding no.

And when I think of cases in which a person wants an infinite supply of any of those, most would consider that an unhealthy appetite to feed.

Yet on we go, building these machines. Sigh.

I guess we can’t have ML protein folding algorithms without also having infinite TikTok feed ML algorithms.
 
Upvote
14 (15 / -1)

AndrewZ

Ars Legatus Legionis
11,606
Am I the only person who thinks AI-generated videos are pointless? Is this worth spending billions of dollars on? Is this actually supposed to earn back the tens of billions of dollars invested? It just seems flashy and stupid. This AI hype is all so much fluffery.

I should add that without a semantics framework where the AI actually "understands" motivations for actions, the characters will only be shadows and ghosts.
 
Upvote
9 (12 / -3)