Blog/2025-12-11/LLMs Excel At Easy Verification Problems
From Rest of What I Know
In [[Blog/2025-12-01/Grounding Your Agent]] I talked about how grounding your agent allows it to make better decisions. This is akin to the approach you would take to debug code. The core device in debugging is the structure of the discovery loop. To reproduce the issue, we go through a loop that looks like:

# Enter the loop
# If the bug still reproduces, reduce the example
# Else, terminate

The end of this provides a [[wikipedia:Minimal Reproducible Example|Minimal Reproducible Example]] (MRE) that you can then use to perform your debugging loop. The classic case, a regression, is a bisection loop over your codebase using the MRE to ground your search.

Using LLMs to retrieve data from their memory will succeed in many cases, it's true. But that's a seductive (and false) god. The majority of an LLM's knowledge is there as a substrate for its intelligence and ability to reason. Much of it can be retrieved, but between the knowledge cutoffs and the fact that they imperfectly recall things, a better use of an LLM is as a reasoning agent rather than a knowledge store.

I suspect this is the primary reason for the high variance between different people's experiences with LLMs. Some, like me, use them in both attended and unattended modes to write code. Others use them only with high attention, and still others find that they generate code that's useless to them. In my experience, all the people I know who talk about using LLMs successfully in semi-attended modes use them after they have transformed the problem into a [[Checkables|checkable]].

LLMs are pretty good at working with a yes/no answer. The primary response from my mesh checker script was just an affirmative or negative on whether the generated mesh was an appropriate solid without self-intersections. That was sufficient for the LLM to iterate on things. In fact, it is not whether or not the problem is [[Tractables|tractable]] that matters so much as whether the problem is a [[Checkables|checkable]].
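The reduction loop above can be sketched in a few lines of Python. The <code>reproduces</code> check and <code>shrink</code> step here are hypothetical stand-ins for your own bug predicate and test-case reducer; only the loop shape is the point.

```python
# A minimal sketch of the reduce-to-an-MRE loop. `reproduces` and
# `shrink` are toy stand-ins: real ones would run your failing test
# and apply a smarter reduction strategy (e.g. delta debugging).

def reproduces(example: str) -> bool:
    """Return True if this input still triggers the bug (stand-in check)."""
    return "bug" in example

def shrink(example: str) -> str:
    """Produce a smaller candidate input (here: drop the first line)."""
    return "\n".join(example.splitlines()[1:])

def minimize(example: str) -> str:
    """Loop: shrink while the bug still reproduces, else terminate."""
    while True:
        candidate = shrink(example)
        # Terminate when shrinking loses the bug or nothing is left.
        if not candidate or not reproduces(candidate):
            return example
        example = candidate

mre = minimize("setup line\nmore setup\nbug line")
```

The returned <code>mre</code> is the smallest input that still reproduces, which is exactly what you want to ground a bisection with.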
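The mesh-checker pattern, a verifier that returns only a yes/no verdict driving a generator in a loop, can be sketched like this. The actual mesh check and the LLM call are replaced by hypothetical stand-ins: a toy checkable (does the candidate parse as JSON?) and a stub generator that improves after a retry.

```python
# A sketch of the generate-and-verify loop. `is_valid` plays the role
# of the mesh checker script (pure yes/no verdict); `generate_solution`
# is a stand-in for an LLM call that gets another attempt on failure.

import json

def is_valid(candidate: str) -> bool:
    """The checkable: a pure yes/no verdict, like the mesh checker."""
    try:
        json.loads(candidate)
        return True
    except ValueError:
        return False

def generate_solution(attempt: int) -> str:
    """Stand-in for the LLM: first attempt is malformed, retry succeeds."""
    return '{"ok": true}' if attempt > 0 else '{"ok": true'

def solve(max_attempts: int = 5):
    """Loop the generator against the verifier until the check passes."""
    for attempt in range(max_attempts):
        candidate = generate_solution(attempt)
        if is_valid(candidate):  # only the verdict feeds back into the loop
            return candidate
    return None
```

Note that the verifier never explains *why* a candidate failed; in practice even that bare verdict is often enough signal for the LLM to converge.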
Certainly, in the extreme, no LLM is going to solve your [[wikipedia:Travelling salesman problem|TSP]] quickly or well with just an "Is this the fastest?" check, but there is a large class of problems where you can easily check at the end whether or not the solution is any good. LLMs are great at these. In particular, if the problem is a checkable, you can often write a program that verifies a solution, then use the LLM in a loop to generate solutions based on that feedback. {{#seo:|description=LLMs excel at easy verification problems, but their knowledge limitations make them better used for reasoning than retrieval.}} [[Category:Blog]] [[Category:Technology]]