That last line is the most telling thing about the issues with all the AI bullshit they're throwing out there.
As other posters have also stated, I was pretty nonplussed at the 'synthetic podcast' functionality, so I took an incredibly dry 116-page procedural manual from my company and chucked it at NotebookLM to see what would happen (public-facing content, nothing sensitive).
The synthetic podcast is... surprisingly good. The only adaptation I made was to prompt it with "explain the core elements of the procedures for someone doing the job for the first time". It spat out a 19-minute back-and-forth simulating a female host looking at the procedures for the first time and speaking with a male host who was more experienced and could field questions from her. I'm only about 7 minutes in, but it's been pretty much entirely on target so far (which isn't too surprising, as it's providing an overview rather than painstaking technical detail).
I'm still not sure who the target audience would be (I don't think I'd subject new employees at my company to a 19-minute synthetic podcast), but the results are better than I expected.
¯\_(ツ)_/¯
Which brings us back to "glorified spell checker."
As other posters have also stated, I was pretty nonplussed at the 'synthetic podcast' functionality, so I took an incredibly dry 116-page procedural manual from my company and chucked it at NotebookLM to see what would happen (public-facing content, nothing sensitive). [...]
Is that a fact or a hallucination?
All content in NotebookLM is captured and trained on.
Agentic coding assistants (Cline for example) have been a killer app for me. They increase my productivity significantly because a lot of coding is frankly trivial, and I can always make the important architectural decisions and jump in to code the difficult bits when the AI code fails to pass my code review, so to speak. It's not cheap, but I get paid enough that doubling my productivity for the cost of LLM inference is easily worth it.
That last line is the most telling thing about the issues with all the AI bullshit they're throwing out there.
They keep heaving that bovine fertilizer at walls, hoping SOMETHING sticks. But so far, for the most part, it's been, "Gosh, that's COOL!" for a few minutes, and then the novelty wears off and the flaws (even if few) are found. Yeah, no people involved in the making of that media, but when what comes out is pretty useless overall, it's not even worth the kWh it took to generate the output.
NKA is the issue. No Killer App. Lots of niche applications, granted. Some of them useful. But for the trillions (by now, I'd think) blown on this tech, they're getting very, very little in practical returns. For that money relentlessly poured into it, they'd have done better with flesh and blood types, even if it took a bit longer.
The flaw in the reasoning is they only calculate the cost of the comparatively few successes, without considering the time, effort, and costs for all the shit that had to be thrown out because it didn't do the job well enough to pass.
I feel this, I just have no interest. All it feels like to me is a way for companies to intrude even further into your life... if that's possible.
It is fascinating watching all this grow and get glued together.
I do feel like this might be my tech breaking point though. The bit where all the younguns use it like water and I'm left shaking my cane at them.
Coding is definitely the killer app. Claude 3.7 is scary good at writing and refactoring complex SQL queries. When refactoring SQL, it’s easy to verify if the output is correct (one way to run that check is sketched below).
Agentic coding assistants (Cline for example) have been a killer app for me. [...]
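On the "easy to verify" point: a minimal sketch of the kind of check that makes SQL refactors low-risk, assuming a SQLite copy of the data. The table, queries, and file name here are illustrative, not from the post:

```python
# Run the original and the refactored query against the same data and
# compare result multisets. Counter is order-insensitive but preserves
# duplicate rows, unlike a plain set.
import sqlite3
from collections import Counter

ORIGINAL = "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
REWRITE = """
    SELECT o.customer_id, SUM(o.amount)
    FROM orders AS o
    GROUP BY o.customer_id
"""

def rows(conn: sqlite3.Connection, sql: str) -> Counter:
    return Counter(conn.execute(sql).fetchall())

conn = sqlite3.connect("sales.db")  # illustrative local copy of the data
assert rows(conn, ORIGINAL) == rows(conn, REWRITE), "refactor changed the results"
print("queries agree on this dataset")
```

Agreement on one dataset isn't a proof of equivalence, but it catches most refactoring mistakes cheaply.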
Problem is, it's hard to trust the answers. I've just spent twenty minutes trying to use the free version of Gemini to help me with using Point Cloud Library (PCL), which is an open source C++ library for processing geometrical data. I'm not very experienced with C++ (I'm mostly a C# guy) so I find it tricky to parse the type and function signatures of PCL methods to figure out what the actual differences between them are. That's rarely directly called out in the method documentation - all overloads of a given method typically have the same description, so I thought "an LLM should be able to elaborate on this a bit for me".
NotebookLM has been absolutely fantastic for quickly checking my D&D notes and rules. I don't use the podcast feature, but asking it about item abilities or names I forgot has been extremely useful.
Yeah, hallucinations are annoying, but the insistence on hallucinating is weird. We do a lot of receipt processing, and no matter how I worded the query, GPT-4o would just make things up on a receipt that wasn't legible. It didn't matter how many times or how I told it that it was okay if it couldn't find something, or couldn't find anything. I would push back and it would be like "you're right, I can't read this"; then I would immediately feed it the same receipt and it would make everything up again. It's my least favorite trait of people in the real world, in a program. When I go and ask somebody who works at the store where something's located and they don't know, instead of just saying "I don't know" they wander around looking randomly, as if I didn't already do that or couldn't do that. Like, it's okay not to know something; just say that and let me find somebody who does. LLMs are like the worst version of that so far.
Problem is, it's hard to trust the answers. [...]
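One mitigation that sometimes helps with the "just say you can't read it" problem is to force structured output with an explicit escape hatch, so a refusal is a valid answer rather than a failure. A sketch using the OpenAI Python SDK; the schema and file name are illustrative, and as the comment above shows, instructions alone don't reliably stop the guessing:

```python
# Ask for JSON with nullable fields plus a "legible" flag, so the model
# has a sanctioned way to answer "I can't read this".
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("receipt.jpg", "rb") as f:  # illustrative file name
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # forces valid JSON back
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text":
                'Return JSON: {"vendor": string|null, "total": string|null,'
                ' "legible": boolean}. If a field is not clearly readable,'
                ' set it to null and set legible to false. Never guess.'},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```

Downstream code can then treat legible=false as "route to a human" instead of trusting invented line items.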
Gemini kept making up method signatures which were completely believable, but not, in fact, actually part of PCL. E.g., saying that PCL would give me an orientation in a different mathematical form from what it actually does. Or telling me that there are three overloads, or one overload, of a specific method which actually had two overloads. And it kept telling me that it had checked the source code and given me the "exact results directly from the source code", while being wrong. Even though, since PCL is a mature open-source project, I have no doubt that its documentation and source code were both included in Gemini's training materials, and I suspect they should also have been included directly in RAG data for my questions. (A crude way to run that source check yourself is sketched at the end of this comment.)
I'd expect that in a TTRPG context, the same behaviour will likely manifest as quoting rules from other editions, or even similarly-named mechanics from completely different games (e.g. quoting Pathfinder or Warhammer Fantasy rules when asked questions about a similarly-named D&D monster ability), or quoting rules that are actually a sort of averaged-out hybrid from multiple different games.
That said, you're more likely to be successful with D&D questions than with questions about more niche games: it's fundamentally a statistical model, so the most-discussed game will have the highest influence on the model training. And it might give answers that are good enough for an immediate ruling at the table, but I don't have high confidence that they'll actually be correct.
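Since the model's claim to have "checked the source code" can't be trusted, that check is easy to run yourself. A crude sketch, assuming a local clone of the PCL repository; the method name below is a placeholder, not the one from the comment:

```python
# Grep the PCL headers for occurrences of a method name to see how many
# overloads actually exist. Crude: a plain regex also matches call
# sites and comments, but it's enough to fact-check an LLM's claim.
import re
from pathlib import Path

PCL_SRC = Path("~/src/pcl").expanduser()  # local clone of the PCL repo
METHOD = "transformPointCloud"            # placeholder method name

pattern = re.compile(rf"\b{METHOD}\s*\(")
hits = []
for ext in ("*.h", "*.hpp"):
    for header in PCL_SRC.rglob(ext):
        text = header.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            if pattern.search(line):
                hits.append(f"{header.relative_to(PCL_SRC)}:{lineno}: {line.strip()}")

print("\n".join(hits) if hits else "no matches found")
```

Reading the handful of matching declarations takes a minute or two and settles the overload question definitively, which, per the comment above, is more than twenty minutes with Gemini managed.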
It would have to be better than Joe Rogan, right?
Does anyone actually want to listen to an AI podcast (I mean, for anything beyond the novelty of it)? I can't imagine going for a walk and just casually listening to a computer talk to itself. I feel like I would hate it, and I think I would hate it even more if it was actually a good podcast lol.
According to Google they don't, but... I'd be lying if I said I trusted Google.
Is that a fact or a hallucination? I wish people would try a bit harder, adding words like "I think", "I'm sure", "I doubt", "I read somewhere"... or even post links.
I agree on all points. During discussions at work Ars actually comes up as the old curmudgeons who are seemingly against everything AI. The age of the users on this site probably plays a role in it as I'm sure it skews older, but it's pretty funny to hear people who've never mentioned Ars bring it up just for this.
[...] As a side note, I'm honestly astonished at how anti-AI the Ars crowd is. I'm reasonably senior with code these days, and while AI doesn't code as well as I do at difficult problems, it definitely helps my productivity a ton. So much of code is boilerplate and trivial API pipes and that kind of stuff, you don't need a particularly capable dev to do it.
Thanks for the link:
According to Google they don't, but... I'd be lying if I said I trusted Google.
Seems good enough for me. You're of course free to not trust them, but I think it's important to know that they specifically say they won't. And I don't like people simply stating they will as if it was a fact.
As a Google Workspace or Google Workspace for Education user, your uploads, queries and the model's responses in NotebookLM will not be reviewed by human reviewers, and will not be used to train AI models.
I'm glad to know that you have similar experiences! I'm also perhaps mid-career by tech standards, but I'm in academia. A lot of my examples are similar; for example, if I'm doing research, I'll write a ton of quick prototypes as Jupyter notebooks. At some point, if the project takes off, I might want my code refactored into a Python module or a script (depending on purpose and size); with Claude, as long as I specify a reasonable structure for the codebase, this can be done in minutes instead of hours. I think there's a time when you're senior enough that your time is worth a lot and you don't want to waste it doing things a junior can do, but not senior enough to have a team of juniors to call upon (and even if you did, they're more expensive than a few API calls). I typically use Cline + Sonnet 3.7, but since we use the same underlying model I imagine it's comparable.
I agree on all points. [...]
I think that a lot of software engineers forget that most coding just isn't all that consequential, in the grand scheme of things. I have a good friend who writes firmware for avionics systems for small aircraft-- that stuff needs to be 100% bulletproof, tested for years, and generally has to be the best of the best.
But especially in the age of rapid re-development of web apps, lots of it just doesn't matter as much as people like to think it does. I have been using AI agents (Cursor + Claude 3.7) for a while now and I'm well aware that what I get out of them isn't always the most efficient, the most DRY, the prettiest, etc. But it works really well for a fairly straightforward web application, and when it comes time to replace, rewrite, or refactor it in 18 months, nobody is going to care all that much that it could have been more perfect if only it had been written by hand by a person.
Or another example-- I needed a page built, with a handful of API calls to an existing internal system, that was basically a developer-facing test page to help with a whole bunch of diagnostics and component demos. It probably would've taken me a full day to build it from scratch, but Claude had a very workable solution in under a minute. No security to worry about, no over-optimized UI/UX that had to be run past our Product team, just a utility. It let me cruise forward with the rest of my work and massively helped out with the other tasks in the sprint.
We check and audit what goes into our codebase still and we don't just blindly accept everything that the AI produces, and as mid-career developers we're pretty aware when it starts spitting out garbage. But to hear people say that all AI coding agents are just a waste of compute cycles is pretty short-sighted.
I missed the bit "and will not be saved, stored, or retained on any server operated or controlled by us or any of our agents. Nor will your data be passed through to any third party without their specific court-ordered request".
Thanks for the link: [...]
I think a TTRPG is almost ideal: low consequences to getting it wrong, and if the ruling is right I might still override it if I think it wouldn't be fun at the table. (For my group, keeping things going is more important than getting the rules exactly right.)
[...] And it might give answers that are good enough for an immediate ruling at the table, but I don't have high confidence that they'll actually be correct.
It's a warning and a fact. A caveat before people go dumping documents into the ("Free!") NotebookLM and trying it out for themselves.
Is that a fact or a hallucination? [...]
Sure, but I linked evidence that Google lies about what they say they will or won't do. So...
Seems good enough for me. You're of course free to not trust them, but I think it's important to know that they specifically say they won't. [...]
Google has agreed to a $93 million settlement with the California Attorney General’s Office after a multi-year investigation found the company allegedly lied to users by telling them their location data was not collected or stored for targeted advertising.