while (true) {

MilleniX

Ars Tribunus Angusticlavius
7,273
Subscriptor++
What's the best modern way to search for the minimum output of f(x,y,z,a,b,c, . . .)? IOW, optimize a multi-input black-box function for a single output?
Back in college I did it by writing my own evolutionary search code, but from what I understand there are now AI-like tools that I can use for problems like this without having to write everything from scratch.
What is out there that I should look into for that? .NET/C# based would be ideal, but I can adapt if the proper tool uses a different language.
Do you know anything at all about your function f? Even the expected dimensionality of the input (good methods differ between lower and higher dimensions)? Smoothness? Some starting-point input vector that's believed to be in the right neighborhood? Broadly, derivative-free optimization is a huge field. If it really is an arbitrary function, you're more or less stuck with a lot of evaluations no matter how you swing it. If it's not decently convex, then you're basically going to have to evaluate over the entire domain of valid inputs, since any local minimum you find might be arbitrarily worse than the global minimum. Do you even need the global minimum, or just something that's locally 'deep enough'?
 

svdsinner

Ars Legatus Legionis
15,131
Subscriptor
Do you know anything at all about your function f? Even the expected dimensionality of the input (good methods differ between lower and higher dimensions)? Smoothness?
I know a reasonable range for all the inputs. The results, in general, will be smooth, except that certain combinations will spike the results to be incredibly bad. (Typical results are 0-10, with 0 being the perfect unicorn value. Bad results might be in the hundreds.)
Some starting-point input vector that's believed to be in the right neighborhood?
Yes. I've been able to futz around to get a few good guesses to start with.
Do you even need the global minimum, or just something that's locally 'deep enough'?
Deep enough. There are also some non-controllable variables that the variables we can control are designed to handle. (IOW, it is a control system that needs to be tuned so that, across a broad range of uncontrollables, it will still produce a solution of < 1.4. Analogous to optimizing all the factors in a car suspension, where it needs to work across all sorts of things the driver will do.)

Hopefully that gives you enough without blurring the issue with too many details.
 

MilleniX

Ars Tribunus Angusticlavius
7,273
Subscriptor++
Yes. I've been able to futz around to get a few good guesses to start with.
Oh, is this just a one-time, or development-time-only thing, rather than the optimizer being part of the product? If so, that means there's much less need to pick an approach robust/fast/cheap enough to run repeatedly in production usage.

If you can at all afford to do lots of evaluations of f, then you could start with sampling over a regular and/or quasi-random set of points, and use the better outputs as either stopping points if they're good enough, or as the starting points for annealing or gradient descent approaches.
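In plain Python, that two-phase idea might look something like the sketch below (illustrative only — `sample_then_refine` and its compass-search refinement are stand-ins for whatever sampler and local method you actually pick, and packaged optimizers will do this better):

```python
import random

def sample_then_refine(f, bounds, n_samples=200, n_refine=100, seed=0):
    """Derivative-free sketch: scatter-sample the box `bounds`, then run a
    simple compass (pattern) search from the best sample found.
    Bounds are not enforced in the refinement phase, for brevity."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Phase 1: sample over the whole box, keep the best point seen.
    best_x, best_y = None, float("inf")
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = f(x)
        if y < best_y:
            best_x, best_y = x, y

    # Phase 2: compass search -- try +/- step along each axis,
    # halve the step whenever a full sweep fails to improve.
    step = [0.1 * (hi - lo) for lo, hi in bounds]
    for _ in range(n_refine):
        improved = False
        for i in range(dim):
            for sgn in (1, -1):
                cand = list(best_x)
                cand[i] += sgn * step[i]
                y = f(cand)
                if y < best_y:
                    best_x, best_y = cand, y
                    improved = True
        if not improved:
            step = [s * 0.5 for s in step]
    return best_x, best_y
```

At 30-90 seconds per evaluation you'd budget the sample and refine counts much lower than the defaults here, but the structure is the same.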

Also, intellectual responsibility requires me to ask - you describe f as a black box function, but do you have any way of directly computing derivatives of f with respect to its inputs? If so, the range of methods available gets much better and cheaper, because you can then also directly follow real gradients, rather than having to estimate them numerically based on finite differences.
 

svdsinner

Ars Legatus Legionis
15,131
Subscriptor
Oh, is this just a one-time, or development-time-only thing, rather than the optimizer being part of the product?
It will be a one-time thing that only needs to be recomputed when I make changes to the function.
If you can at all afford to do lots of evaluations of f, then you could start with sampling over a regular and/or quasi-random set of points, and use the better outputs as either stopping points if they're good enough, or as the starting points for annealing or gradient descent approaches.
Evaluating the function is medium expensive, each evaluation takes 30-90 seconds. Too expensive to brute force it, but cheap enough to evaluate it a few hundred times.

Ideally, there are premade tools that can handle all of the logic of manipulating the inputs to find the best ones. I can write it myself, but I suspect that would be reinventing the wheel.
Also, intellectual responsibility requires me to ask - you describe f as a black box function, but do you have any way of directly computing derivatives of f with respect to its inputs?
No. I can only do finite evaluations and compare them to other finite evaluations.
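For reference, a numerical gradient estimate from finite evaluations looks like the central-difference sketch below (illustrative; note it costs 2n extra evaluations of f per gradient, which at 30-90 seconds each adds up fast — one reason purely derivative-free methods may be the better fit here):

```python
def fd_gradient(f, x, h=1e-5):
    """Central-difference gradient estimate of f at point x.
    Costs 2 * len(x) evaluations of f."""
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad
```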
 

ShuggyCoUk

Ars Legatus Legionis
10,174
Subscriptor++
I spent a rainy hour in a cafe writing some code to generate double word squares from 5-letter words, i.e. where every row and column is a real word...
Code:
TRAMS
REBEL
AFIRE
CEDED
TRESS

My first try never found a square, so I rewrote it using what I thought was a clever algorithm that got it down to 1-3 minutes. Then I randomly woke up this morning with a flash of inspiration (totally obvious in retrospect) and now it finds 5-letter squares in 50-200ms, so almost a 1200x speedup!

It's quite a fun challenge to spend an hour or two on, but if you know your data structures you'll probably get there a lot quicker than I did.
I assume a prefix trie so you can eliminate candidates early? Couple that with highly bit-packed representations to keep memory traffic low and cache hits high?
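For illustration, the prefix-pruning idea might look something like this toy sketch (a tiny hard-coded word list and a plain prefix set rather than a real trie — not anyone's actual code from the thread):

```python
# Ten words chosen so that at least one double word square exists.
WORDS = ["TRAMS", "REBEL", "AFIRE", "CEDED", "TRESS",
         "TRACT", "REFER", "ABIDE", "MERES", "SLEDS"]
N = 5

# Every prefix of every word, so a partial column can be rejected early.
PREFIXES = {w[:i] for w in WORDS for i in range(N + 1)}

def find_square(rows=()):
    """Backtracking search: add one row word at a time, pruning whenever
    any partial column fails to be a prefix of some word."""
    if len(rows) == N:
        return rows
    for w in WORDS:
        cols = ["".join(r[c] for r in rows) + w[c] for c in range(N)]
        if all(col in PREFIXES for col in cols):
            result = find_square(rows + (w,))
            if result:
                return result
    return None
```

Because every word has length 5, a 5-character entry in `PREFIXES` is necessarily a full word, so a completed square automatically has valid columns.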
 

ShuggyCoUk

Ars Legatus Legionis
10,174
Subscriptor++
I'll point out that if you're defining f, then you could probably write it using something like (Py)Torch or Jax, to get automatic differentiation for free.

If you want to try out packaged libraries that might do the optimization process for you, maybe start with what's provided in SciPy and/or scikit-learn.
This. The gradient descent optimisers available to everyone for free now are just amazing.
If it’s differentiable of course.
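As an aside, the "free" derivatives those frameworks provide can be illustrated with a toy forward-mode autodiff using dual numbers (purely illustrative; Torch and JAX mainly use reverse mode and are vastly more sophisticated):

```python
class Dual:
    """Toy forward-mode autodiff value: a number plus its derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def _wrap(self, o):
        return o if isinstance(o, Dual) else Dual(o)
    def __add__(self, o):
        o = self._wrap(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = self._wrap(o)
        # Product rule, carried along with the value.
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f once on a Dual seeded with dx/dx = 1 to get f'(x)."""
    return f(Dual(x, 1.0)).dot
```

`derivative(lambda x: 3 * x * x + 2 * x, 4.0)` returns the exact derivative 26.0 from a single evaluation, with no finite-difference error.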
 
  • Like
Reactions: AndrewZ

Aleamapper

Ars Scholae Palatinae
1,390
Subscriptor
I assume a prefix trie so you can eliminate candidates early? Couple that with highly bit-packed representations to keep memory traffic low and cache hits high?
Yeah a trie was my initial thought, but it's not what I'm using as I had an idea that was simpler - a trie might be quicker though! I put some basic effort into making things cache friendly but tbh once I got the 1200x speedup I kinda left it there. The bit that took most time was trying to get the squares 'random' enough looking without affecting performance, as certain words seem to show up too often, but perhaps that's just because they constrain future words the least.

I could also do an exhaustive search of puzzles to filter words from the dictionary that never get used in a square, speeding things up even more!
 
Last edited:

Vince-RA

Ars Praefectus
5,058
Subscriptor++
Another day, another ridiculous LLM hallucination that seems like it should be trivial to avoid. I use GitHub Copilot quite a bit to help with writing boilerplate crap for Terraform. Mostly it works great, recognizes my naming conventions, etc and saves me a lot of busy work. The other day I asked it how to create a resource of type Microsoft.VirtualMachinesImages/imageTemplates using the azurerm Terraform provider, and it confidently told me in great detail how to use the azurerm_image_template resource.

Which is great, except the azurerm_image_template resource doesn't exist. At all. When I called Copilot on its bullshit, it did apologize for the confusion (lol) and suggested a valid workaround, but it's hard to believe we're still having to sort through stuff (often, too) that could easily be demonstrated as false with about 2 seconds of web searching.
 
  • Haha
Reactions: svdsinner

svdsinner

Ars Legatus Legionis
15,131
Subscriptor
I spent a rainy hour in a cafe writing some code to generate double word squares from 5-letter words, i.e. where every row and column is a real word...
Code:
TRAMS
REBEL
AFIRE
CEDED
TRESS

My first try never found a square, so I rewrote it using what I thought was a clever algorithm that got it down to 1-3 minutes. Then I randomly woke up this morning with a flash of inspiration (totally obvious in retrospect) and now it finds 5-letter squares in 50-200ms, so almost a 1200x speedup!

It's quite a fun challenge to spend an hour or two on, but if you know your data structures you'll probably get there a lot quicker than I did.
One major performance trick that tremendously speeds up a lot of solutions to word problems is to give each word a 32-bit value where bit 1 indicates if there is an A in the word, bit 2 indicates if there is a B, etc. This can make checks to see if a word contains a certain letter, or if two words share letters, a lightning-fast bitwise AND. If your algorithm needs to compare letters between words, this can speed things up by orders of magnitude. Of course, you will have to decide how words with repeated letters will impact this trick.
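A minimal sketch of that bitmask trick (Python for illustration):

```python
def letter_mask(word):
    """26-bit letter set: bit 0 set iff the word contains 'A',
    bit 1 for 'B', and so on."""
    m = 0
    for ch in word.upper():
        m |= 1 << (ord(ch) - ord('A'))
    return m

# "Contains 'E'?" becomes a single AND instead of a character scan:
has_e = (letter_mask("REBEL") & (1 << (ord('E') - ord('A')))) != 0

# "Do two words share any letter?" is one AND (here they share 'R'):
share = (letter_mask("TRAMS") & letter_mask("REBEL")) != 0

# The caveat from the post: the mask records presence only, so the
# repeated E's in REBEL collapse into a single bit.
```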
 
  • Like
Reactions: Aleamapper

Zich

Ars Tribunus Militum
2,533
Subscriptor++
Which is great, except the azurerm_image_template resource doesn't exist. At all. When I called Copilot on its bullshit, it did apologize for the confusion (lol) and suggested a valid workaround, but it's hard to believe we're still having to sort through stuff (often, too) that could easily be demonstrated as false with about 2 seconds of web searching.
I asked for a PowerShell script that wraps a command-line utility that is not readily publicly available. I figured maybe it still had the info, since everything is on the internet somewhere. But nope, instead it confidently hallucinated how the tool works, and got it entirely wrong.
 
I asked for a PowerShell script that wraps a command-line utility that is not readily publicly available. I figured maybe it still had the info, since everything is on the internet somewhere. But nope, instead it confidently hallucinated how the tool works, and got it entirely wrong.
You can try two solutions:

1. Provide a fully fledged description of the tool and its interfaces and arguments.
2. Have it use something it should know about, and make appropriate adjustments as necessary.

A weakness of LLMs is that they are essentially not allowed to not answer the question. If you're using a reasoning model you can describe how to avoid that, by having it ask you to refine your request, but even that only works some of the time.

This is pulled from elsewhere:

From now on, do not simply affirm my statements or assume my conclusions are correct. Your goal is to be an intellectual sparring partner, not just an agreeable assistant. Every time I present an idea, do the following: 1. Analyze my assumptions. What am I taking for granted that might not be true? 2. Provide counterpoints. What would an intelligent, well-informed skeptic say in response? 3. Test my reasoning. Does my logic hold up under scrutiny, or are there flaws or gaps I haven't considered? 4. Offer alternative perspectives. How else might this idea be framed, interpreted, or challenged? 5. Prioritize truth over agreement. If I am wrong or my logic is weak, I need to know. Correct me clearly and explain why. Maintain a constructive, but rigorous, approach. Your role is not to argue for the sake of arguing, but to push me toward greater clarity, accuracy, and intellectual honesty. If I ever start slipping into confirmation bias or unchecked assumptions, call it out directly. Let's refine not just our conclusions, but how we arrive at them.

Try something like that and see what it responds with. I used it after a back-and-forth on a circuit design using components that would be convenient for me but are suboptimal, although not unreasonable. The responses from then on were better.
 
  • Like
Reactions: Pino90

ShuggyCoUk

Ars Legatus Legionis
10,174
Subscriptor++
One major performance trick that tremendously speeds up a lot of solutions to word problems is to give each word a 32-bit value where bit 1 indicates if there is an A in the word, bit 2 indicates if there is a B, etc. This can make checks to see if a word contains a certain letter, or if two words share letters, a lightning-fast bitwise AND. If your algorithm needs to compare letters between words, this can speed things up by orders of magnitude. Of course, you will have to decide how words with repeated letters will impact this trick.
With only 5-letter words over A-Z you can do a simple 6 bits per letter and stay in a 4-byte word. This makes the calculation harder, but prefix storage simpler.
A big lookup table of every prefix becomes tenable without a trie too, possibly indexed per location. It's a load of memory, but not as much as it used to be.

Utterly doesn’t scale though
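A sketch of that packing scheme (illustrative; the +1 offset is one arbitrary choice, so that a zero 6-bit field can mean "no letter"):

```python
def pack(word):
    """Pack a 5-letter A-Z word, 6 bits per letter, first letter in
    the low bits: 5 * 6 = 30 bits, so the word fits in a 4-byte int."""
    v = 0
    for i, ch in enumerate(word.upper()):
        v |= (ord(ch) - ord('A') + 1) << (6 * i)
    return v

def prefix_key(packed, k):
    """The first k letters as a flat integer key -- usable as an index
    into a big lookup table instead of walking a trie."""
    return packed & ((1 << (6 * k)) - 1)
```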
 

hanser

Ars Legatus Legionis
42,130
Subscriptor++
I'm getting closer to getting back in the market, and I keep seeing headlines like "AI Skills Earn Greater Wage Premiums Than Degrees" and can't help but wonder if I should really deep dive into the current bubble bullshit or stand my ground as an old-school (but relatively current) Java dev.
The headline is ambiguous as to whether it means "build AI enabled systems" (like a RAG thing) or "can use LLM competently for various tasks".

I'd probably spend a day or two building a simple RAG thing that uses a commodity LLM for vectorization. There are various tutorials around. Then I'd probably see if I could convert the RAG system using the commodity LLM to an Ollama-based, locally-running LLM just for funsies. That gets you most of the way towards what developers mean when they say they have experience with LLMs.

If you're building the LLMs directly, well that's a completely different niche.
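The retrieval half of such a system can be caricatured in a few lines — a toy with a bag-of-words stand-in where a real system would call an embedding model (nothing here is a real RAG stack; `vectorize` is a placeholder for that embedding call):

```python
import math
from collections import Counter

DOCS = [
    "the optimizer tunes control system gains",
    "word squares use prefix tries for pruning",
    "retrieval augmented generation grounds llm answers in documents",
]

def vectorize(text):
    """Toy bag-of-words 'embedding'; a real system would call an
    embedding model here and get a dense vector back."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs=DOCS):
    """The 'R' in RAG: pick the doc most similar to the query. The
    winner would then be pasted into the LLM prompt as context."""
    vq = vectorize(query)
    return max(docs, key=lambda d: cosine(vq, vectorize(d)))
```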
 

hanser

Ars Legatus Legionis
42,130
Subscriptor++
Arguably the hardest part of RAG is text extraction from an existing corpus, and then RBAC on top of search results. I think RAG as a product you buy is largely a dead end product, unless you're Google, Slack, Atlassian, or Microsoft where you have LLMs-as-a-service already, and you can think hard about security boundaries and how they intersect with your existing permissions model.

IMO, anything with an LLM pretty much immediately becomes a commodity, because LLMs are commodities except for the compute cycles required to run them.

Put another way: if an organization uses your platform as an IdP or has baked your platform's permissions into their organizational onboarding processes, you probably have an advantage. Otherwise don't bother.
 
Last edited:
  • Like
Reactions: ShuggyCoUk

hanser

Ars Legatus Legionis
42,130
Subscriptor++
I have written so much code lately that I can't remember if I've started, let alone finished, whole new features. I literally have to go check by looking at the Swagger docs for the services in production.

🫠

The upside is that this work is on track to double revenue from our top-of-funnel and existing customers. It's just a little crazy to think of how fast it was all created. Earlier in my career, I feel like I could remember every feature that took longer than a few hours to implement. Now I'll work on something for two weeks, and the moment it's tested and released, it's flushed from my brain so I can work on the next thing.

--

Kinda related, but in November of 2021 when I started writing the program that would become the main analytics component for the company I distinctly remember looking at the empty page and thinking "I'm either completely wasting my time, and this company is going to fail, or I'm about to write the most valuable program I've ever written, and am ever likely to write." I definitely sat with that thought for a few minutes before getting going. My last bit of self-talk in that moment was "Better not fuck it up then".
 

gregatron5

Ars Legatus Legionis
11,767
Subscriptor++
= Retrieval-Augmented Generation

There's plenty of companies claiming to make using RAG cheaper/easier in exchange for monies.

AWS has a decent explanation without trying to sell you too hard on their system: https://aws.amazon.com/what-is/retrieval-augmented-generation/
Thanks. Coincidentally the TLDR newsletter had a link to a local RAG setup blog post, which I followed. I already had Ollama, but it walked through setting up Elasticsearch and Kibana in Docker and setting up a connector to Ollama. Still need to read up more on the e5 thing.

I also incidentally learned that on macOS Ollama runs on GPU/NE when run locally, but on CPUs in Docker. Easy way to make your AS MBP feel like an Intel MBP is to query an LLM on CPUs. Toasty!
 

Quarthinos

Ars Tribunus Militum
2,424
Subscriptor
It's amazing what programming knowledge is buried in my brain. I was showering this morning and remembered that 'poke 53774,64' did something useful in Atari BASIC. I knew that poke 16,64 was also part of the incantation. Google lets me know that it disables the Break key. Not really super useful, but it at least stops randos from breaking your carefully crafted program before it's complete.

Now I'm wondering if it's worth the time to try and figure out what exactly setting those two bits high does at the hardware layer.
 

Aleamapper

Ars Scholae Palatinae
1,390
Subscriptor
It's amazing what programming knowledge is buried in my brain. I was showering this morning and remembered that 'poke 53774,64' did something useful in Atari BASIC. I knew that poke 16,64 was also part of the incantation. Google lets me know that it disables the Break key. Not really super useful, but it at least stops randos from breaking your carefully crafted program before it's complete.

Now I'm wondering if it's worth the time to try and figure out what exactly setting those two bits high does at the hardware layer.
I was watching this video the other day about the recent C64 demo Nine, which shows nine sprites on a single scanline, outside the borders even - something 'impossible' on the C64 as it only has 8 sprites, and multiplexing doesn't help - and ended up going down a rabbit-hole of C64 dev videos. All those memory addresses from messing with Basic as a kid, for border colours, sprite registers, disabling run/stop etc came flooding back.

View: https://www.youtube.com/watch?v=MXxSPgt_7Z4


I really want to get into demoscene coding on the C64, but summer is the wrong time!