That last line is the most telling thing about the issues with all the AI bullshit they're throwing out there.
As other posters have also stated, I was pretty nonplussed at the 'synthetic podcast' functionality, so I took an incredibly dry 116-page procedural manual from my company and chucked it at NotebookLM to see what would happen (public-facing content, nothing sensitive).
The synthetic podcast is... surprisingly good. The only adaptation I made was to prompt it with "explain the core elements of the procedures for someone doing the job for the first time". It spat out a 19-minute back-and-forth simulating a female host looking at the procedures for the first time and speaking with a male host who was more experienced and could field questions from her. I'm only about 7 minutes in, but it's been pretty much entirely on target so far (which isn't too surprising, as it's providing an overview rather than painstaking technical detail).
I'm still not sure who the target audience would be (I don't think I'd subject new employees at my company to a 19-minute synthetic podcast), but the results are better than I expected.
¯\_(ツ)_/¯
Which brings us back to "glorified spell checker."
As other posters have also stated, I was pretty nonplussed at the 'synthetic podcast' functionality, so I took an incredibly dry 116-page procedural manual from my company and chucked it at NotebookLM to see what would happen (public-facing content, nothing sensitive). [...]
Is that a fact or a hallucination?
All content in NotebookLM is captured and trained on.
Agentic coding assistants (Cline for example) have been a killer app for me. They increase my productivity significantly because a lot of coding is frankly trivial, and I can always make the important architectural decisions and jump in to code the difficult bits when the AI code fails to pass my code review, so to speak. It's not cheap, but I get paid enough that doubling my productivity for the cost of LLM inference is easily worth it.
That last line is the most telling thing about the issues with all the AI bullshit they're throwing out there.
They keep heaving that bovine fertilizer at walls, hoping SOMETHING sticks. But so far, for the most part, it's been, "Gosh, that's COOL!" for a few minutes, and then the novelty wears off and the flaws (even if few) are found. Yeah, no people involved in the making of that media, but when what comes out is pretty useless overall, it's not even worth the kWh it took to generate the output.
NKA is the issue. No Killer App. Lots of niche applications, granted. Some of them useful. But for the trillions (by now, I'd think) blown on this tech, they're getting very, very little in practical returns. For that money relentlessly poured into it, they'd have done better with flesh and blood types, even if it took a bit longer.
The flaw in the reasoning is they only calculate the cost of the comparatively few successes, without considering the time, effort, and costs for all the shit that had to be thrown out because it didn't do the job well enough to pass.
I feel this, I just have no interest. All it feels like to me is a way for companies to intrude even further into your life... if that's possible.
It is fascinating watching all this grow and get glued together.
I do feel like this might be my tech breaking point though. The bit where all the younguns use it like water and I'm left shaking my cane at them.
Coding is definitely the killer app. Claude 3.7 is scary good at writing and refactoring complex SQL queries. When refactoring SQL, it’s easy to verify if the output is correct (one way to run that check is sketched below).
Agentic coding assistants (Cline for example) have been a killer app for me. [...]
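On the "easy to verify" point: a minimal sketch of the kind of check that makes SQL refactors low-risk, assuming a SQLite copy of the data. The table, queries, and file name here are illustrative, not from the post:

```python
# Run the original and the refactored query against the same data and
# compare result multisets. Counter is order-insensitive but preserves
# duplicate rows, unlike a plain set.
import sqlite3
from collections import Counter

ORIGINAL = "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
REWRITE = """
    SELECT o.customer_id, SUM(o.amount)
    FROM orders AS o
    GROUP BY o.customer_id
"""

def rows(conn: sqlite3.Connection, sql: str) -> Counter:
    return Counter(conn.execute(sql).fetchall())

conn = sqlite3.connect("sales.db")  # illustrative local copy of the data
assert rows(conn, ORIGINAL) == rows(conn, REWRITE), "refactor changed the results"
print("queries agree on this dataset")
```

Agreement on one dataset isn't a proof of equivalence, but it catches most refactoring mistakes cheaply.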
Problem is, it's hard to trust the answers. I've just spent twenty minutes trying to use the free version of Gemini to help me with using Point Cloud Library (PCL), which is an open source C++ library for processing geometrical data. I'm not very experienced with C++ (I'm mostly a C# guy) so I find it tricky to parse the type and function signatures of PCL methods to figure out what the actual differences between them are. That's rarely directly called out in the method documentation - all overloads of a given method typically have the same description, so I thought "an LLM should be able to elaborate on this a bit for me".
NotebookLM has been absolutely fantastic for quickly checking my D&D notes and rules. I don't use the podcast feature, but asking it about item abilities or names I forgot has been extremely useful.
Yeah, hallucinations are annoying, but the insistence on hallucinating is weird. We do a lot of receipt processing, and no matter how I worded the query, GPT-4o would just make things up on a receipt that wasn't legible. It didn't matter how many times or how I told it that it was okay if it couldn't find something, or couldn't find anything. I would push back and it would be like "you're right, I can't read this"; then I would immediately feed it the same receipt and it would make everything up again. It's my least favorite trait of people in the real world, in a program. When I go and ask somebody who works at the store where something's located and they don't know, instead of just saying "I don't know" they wander around looking randomly, as if I didn't already do that or couldn't do that. Like, it's okay not to know something; just say that and let me find somebody who does. LLMs are like the worst version of that so far.
Problem is, it's hard to trust the answers. [...]
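One mitigation that sometimes helps with the "just say you can't read it" problem is to force structured output with an explicit escape hatch, so a refusal is a valid answer rather than a failure. A sketch using the OpenAI Python SDK; the schema and file name are illustrative, and as the comment above shows, instructions alone don't reliably stop the guessing:

```python
# Ask for JSON with nullable fields plus a "legible" flag, so the model
# has a sanctioned way to answer "I can't read this".
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("receipt.jpg", "rb") as f:  # illustrative file name
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # forces valid JSON back
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text":
                'Return JSON: {"vendor": string|null, "total": string|null,'
                ' "legible": boolean}. If a field is not clearly readable,'
                ' set it to null and set legible to false. Never guess.'},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```

Downstream code can then treat legible=false as "route to a human" instead of trusting invented line items.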
Gemini kept making up method signatures which were completely believable, but not, in fact, actually part of PCL. E.g., saying that PCL would give me an orientation in a different mathematical form from what it actually does. Or telling me that there are three overloads, or one overload, of a specific method which actually had two overloads. And it kept telling me that it had checked the source code and given me the "exact results directly from the source code", while being wrong. Even though, since PCL is a mature open-source project, I have no doubt that its documentation and source code were both included in Gemini's training materials, and I suspect they should also have been included directly in RAG data for my questions. (A crude way to run that source check yourself is sketched at the end of this comment.)
I'd expect that in a TTRPG context, the same behaviour will likely manifest as quoting rules from other editions, or even similarly-named mechanics from completely different games (e.g. quoting Pathfinder or Warhammer Fantasy rules when asked questions about a similarly-named D&D monster ability), or quoting rules that are actually a sort of averaged-out hybrid from multiple different games.
That said, you're more likely to be successful with D&D questions than with questions about more niche games: it's fundamentally a statistical model, so the most-discussed game will have the highest influence on the model training. And it might give answers that are good enough for an immediate ruling at the table, but I don't have high confidence that they'll actually be correct.
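Since the model's claim to have "checked the source code" can't be trusted, that check is easy to run yourself. A crude sketch, assuming a local clone of the PCL repository; the method name below is a placeholder, not the one from the comment:

```python
# Grep the PCL headers for occurrences of a method name to see how many
# overloads actually exist. Crude: a plain regex also matches call
# sites and comments, but it's enough to fact-check an LLM's claim.
import re
from pathlib import Path

PCL_SRC = Path("~/src/pcl").expanduser()  # local clone of the PCL repo
METHOD = "transformPointCloud"            # placeholder method name

pattern = re.compile(rf"\b{METHOD}\s*\(")
hits = []
for ext in ("*.h", "*.hpp"):
    for header in PCL_SRC.rglob(ext):
        text = header.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            if pattern.search(line):
                hits.append(f"{header.relative_to(PCL_SRC)}:{lineno}: {line.strip()}")

print("\n".join(hits) if hits else "no matches found")
```

Reading the handful of matching declarations takes a minute or two and settles the overload question definitively, which, per the comment above, is more than twenty minutes with Gemini managed.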
It would have to be better than Joe Rogan, right?
Does anyone actually want to listen to an AI podcast (I mean, for anything beyond the novelty of it)? I can't imagine going for a walk and just casually listening to a computer talk to itself. I feel like I would hate it, and I think I would hate it even more if it was actually a good podcast lol.
According to Google they don't, but... I'd be lying if I said I trusted Google.
Is that a fact or a hallucination? I wish people would try a bit harder, adding words like "I think", "I'm sure", "I doubt", "I read somewhere"... or even post links.
I agree on all points. During discussions at work Ars actually comes up as the old curmudgeons who are seemingly against everything AI. The age of the users on this site probably plays a role in it as I'm sure it skews older, but it's pretty funny to hear people who've never mentioned Ars bring it up just for this.
[...] As a side note, I'm honestly astonished at how anti-AI the Ars crowd is. I'm reasonably senior with code these days, and while AI doesn't code as well as I do at difficult problems, it definitely helps my productivity a ton. So much of code is boilerplate and trivial API pipes and that kind of stuff, you don't need a particularly capable dev to do it.
Thanks for the link:
According to Google they don't, but... I'd be lying if I said I trusted Google.
Seems good enough for me. You're of course free to not trust them, but I think it's important to know that they specifically say they won't. And I don't like people simply stating they will as if it was a fact.
As a Google Workspace or Google Workspace for Education user, your uploads, queries and the model's responses in NotebookLM will not be reviewed by human reviewers, and will not be used to train AI models.
I'm glad to know that you have similar experiences! I'm also perhaps mid-career by tech standards, but I'm in academia. A lot of my examples are similar; for example, if I'm doing research, I'll write a ton of quick prototypes as Jupyter notebooks. At some point, if the project takes off, I might want my code refactored into a Python module or a script (depending on purpose and size); with Claude, as long as I specify a reasonable structure for the codebase, this can be done in minutes instead of hours. I think there's a time when you're senior enough that your time is worth a lot and you don't want to waste it doing things a junior can do, but not senior enough to have a team of juniors to call upon (and even if you did, they're more expensive than a few API calls). I typically use Cline + Sonnet 3.7, but since we use the same underlying model I imagine it's comparable.
I agree on all points. [...]
I think that a lot of software engineers forget that most coding just isn't all that consequential, in the grand scheme of things. I have a good friend who writes firmware for avionics systems for small aircraft-- that stuff needs to be 100% bulletproof, tested for years, and generally has to be the best of the best.
But especially in the age of rapid re-development of web apps, lots of it just doesn't matter as much as people like to think it does. I have been using AI agents (Cursor + Claude 3.7) for a while now and I'm well aware that what I get out of them isn't always the most efficient, the most DRY, the prettiest, etc. But it works really well for a fairly straightforward web application, and when it comes time to replace, rewrite, or refactor it in 18 months, nobody is going to care all that much that it could have been more perfect if only it had been written by hand by a person.
Or another example-- I needed a page built, with a handful of API calls to an existing internal system, that was basically a developer-facing test page to help with a whole bunch of diagnostics and component demos. It probably would've taken me a full day to build it from scratch, but Claude had a very workable solution in under a minute. No security to worry about, no over-optimized UI/UX that had to be run past our Product team, just a utility. It let me cruise forward with the rest of my work and massively helped out with the other tasks in the sprint.
We check and audit what goes into our codebase still and we don't just blindly accept everything that the AI produces, and as mid-career developers we're pretty aware when it starts spitting out garbage. But to hear people say that all AI coding agents are just a waste of compute cycles is pretty short-sighted.
I missed the bit "and will not be saved, stored, or retained on any server operated or controlled by us or any of our agents. Nor will your data be passed through to any third party without their specific court-ordered request".
Thanks for the link: [...]
I think a TTRPG is almost ideal: low consequences to getting it wrong, and if the ruling is right I might still override it if I think it wouldn't be fun at the table. (For my group, keeping things going is more important than getting the rules exactly right.)
[...] And it might give answers that are good enough for an immediate ruling at the table, but I don't have high confidence that they'll actually be correct.
It's a warning and a fact. A caveat before people go dumping documents into the ("Free!") NotebookLM and trying it out for themselves.
Is that a fact or a hallucination? [...]
Sure, but I linked evidence that Google lies about what they say they will or won't do. So...
Seems good enough for me. You're of course free to not trust them, but I think it's important to know that they specifically say they won't. [...]
Google has agreed to a $93 million settlement with the California Attorney General’s Office after a multi-year investigation found the company allegedly lied to users by telling them their location data was not collected or stored for targeted advertising.