Hacker News
18 days ago by bottlepalm

I've hit this point with AI where it's not a simple process, but a long drawn out back and forth.

I'll use AI to design the implementation of a medium sized, cross cutting feature. Review all the details, maybe iterate on just that. Then implement with Claude 4.7 Max - which runs slower, but does a better job. Then review the implementation, then have Codex GPT 5.5 xhigh fast review it - which almost always finds corner cases. Have Claude fix those - Claude is better at writing intuitive maintainable code versus Codex overengineered/shortcut filled code. (Codex is better at finding/fixing bugs and doing reviews - it's annoyingly pedantic)

Then repeat with fresh Claude/Codex instances having them both review the current staged changes and getting feedback, handling the feedback. Then covering it in tests. I mean overall I still implement the feature faster than coding it manually, but I spend a majority of the time going back and forth with reviews, handling corner cases, and at the finish I end up with what I feel is a really solid implementation of whatever feature I'm working on. The v1 feature feels more like a v3 given the amount of iteration it already went through.

18 days ago by aomix

Talking the problem to death with the AI before implementation is a nice zone for me. I feel productive, get good results out of the AI, and still largely understand the code. That’s the part of the AI revolution that I feel has made me a better engineer because I argue about design and architecture all day with a robot.

18 days ago by throwaway7783

I follow the same process. I have a design in mind for the problem at hand, but I don't reveal it to Codex. I go back and forth a bit to see if its proposals are better than mine. I go back and forth on tradeoffs of various approaches. And then I ask it to compare its proposals with mine. I "win" most of the time but there are many times where it shows me a better or simpler approach, or makes me rethink the solution altogether.

Once this is done, the mechanical coding parts are mostly routine (for codex)

18 days ago by a_bonobo

I really like this pattern and use it often, this 'not showing my cards'. The second I hint towards the LLM what I prefer it will become sycophantic and invent nonsense why my preferred solution is better.

I'm sure there's an interesting study on how users 'leak' their preference unintentionally to the LLM; perhaps when users list their options, they often put their preferred option first; but not showing the cards on my hand has been very useful when thinking through a problem with LLMs.
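
A minimal Python sketch of the "not showing my cards" idea, assuming the leak really does come partly from option order. `build_neutral_prompt` is a hypothetical helper invented for illustration: it only builds a string, calls no LLM API, and the prompt wording is an assumption rather than a tested recipe.

```python
import random
from typing import List, Optional


def build_neutral_prompt(problem: str, options: List[str], seed: Optional[int] = None) -> str:
    """Return a prompt that lists design options in random order with neutral labels.

    Hypothetical helper: it builds text only and does not talk to any model.
    """
    rng = random.Random(seed)
    shuffled = options[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)  # random order hides which option the author listed first

    lines = [f"Problem: {problem}", "", "Candidate approaches (order is random):"]
    for index, option in enumerate(shuffled):
        label = chr(ord("A") + index)  # A, B, C, ... implies no ranking
        lines.append(f"{label}. {option}")
    lines.append("")
    lines.append("Compare the tradeoffs of each approach before recommending one.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_neutral_prompt(
        "Cache invalidation for user profiles",
        ["TTL-based expiry", "Event-driven invalidation", "Write-through cache"],
    ))
```

Passing a fixed `seed` makes the ordering reproducible, which may help when comparing answers across fresh sessions.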

18 days ago by yread

> I go back and forth a bit to see if its proposals are better than mine

I find it useful to let it generate benchmarks comparing the approaches. Turns out AI is terrible at guessing what's faster or allocates less.
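
As a rough illustration of what such a benchmark could look like, here is a small Python sketch that measures both time and peak allocation for two candidate approaches. The two string-building functions are placeholder examples (assumptions, not anything from the thread); swap in the approaches actually being compared.

```python
import timeit
import tracemalloc
from typing import Callable, Tuple


def concat_with_plus(n: int) -> str:
    """Approach 1: build a string by repeated concatenation."""
    out = ""
    for i in range(n):
        out += str(i)
    return out


def concat_with_join(n: int) -> str:
    """Approach 2: build the same string with str.join."""
    return "".join(str(i) for i in range(n))


def measure(func: Callable[[int], str], n: int, repeats: int = 5) -> Tuple[float, int]:
    """Return (best seconds per call, peak bytes traced during one call)."""
    # Take the minimum over several repeats to reduce noise from other processes.
    best = min(timeit.repeat(lambda: func(n), number=10, repeat=repeats)) / 10

    # Trace Python-level allocations for a single call.
    tracemalloc.start()
    func(n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


if __name__ == "__main__":
    for func in (concat_with_plus, concat_with_join):
        seconds, peak = measure(func, 10_000)
        print(f"{func.__name__}: {seconds * 1e6:.1f} us/call, peak {peak} bytes")
```

The numbers depend on the machine and interpreter version, so the point is to run it rather than to trust anyone's guess, human or model.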

18 days ago by hackermanai

I think this approach is more common than the hype for actual work. I do something similar: many rounds of back and forth, then I settle on something, often with now-known tradeoffs, and write it by hand as a final guard to spot issues, keep naming consistent, etc.

18 days ago by revv00

i bet you've contributed a lot of training trajectories for those AI's.

18 days ago by mikepurvis

Despite the cynical sibling reply, I also feel like there's real value here. Contrary to the meme, I don't think Claude just tells me I'm brilliant, but really does push back on directions that are unproductive, helps identify when a part is overcomplicated or a dependency has become redundant, etc. Those are important things to have at least a sightline on before getting too deep into the code, even (or maybe especially) in a world where an awful lot of code can be created basically for free.

18 days ago by noduerme

I'm usually the one spotting redundancies and dead branches in Claude's code, not the other way around. But I think either way, what's important is questioning the process and understanding the way the code is working so that you retain a full mental model.

18 days ago by lintfordpickle

>> and still largely understand the code [...] ,that, I feel has made me a better engineer

the cynic in me would say that a good engineer should fully understand the code you write.

I'm not suggesting that AI is the problem here - you could vibe code with the AI and have it explain the reasoning and patterns - or else tell it to use 'simpler' patterns from the outset. For any one problem in software engineering, there are always multiple solutions; some slower, some faster, some more flexible etc. The code you produce should, imo, be at the level that you can understand it.

How can you reason about code you don't fully understand? How can you judge the future impact (technical debt and the cost of maintenance) of your projects?

A.I makes it easier to get yourself into problems early on.

18 days ago by jnovek

> How can you reason about code you don't fully understand?

We all do, though. It takes months for a human to really get to know a project and, unless you’re working at a small startup, you’ll probably never know most of the code outside the corner you work in.

18 days ago by bottlepalm

One strategy I use in the planning phase is even when I know how I'd implement the solution, I ask the Claude/Codex how they would solve the problem or implement the feature without giving them any clues - and then compare their solutions to my own. Often I am pleasantly surprised by alternative ways of doing things and ideas that we integrate into the final design.

18 days ago by didericis

Same. I've been creating "research" documents where I let it do a freeform survey of possible solutions/have it sketch out its own solution. I'll then sketch out a plan based on what I think is good or what I think it missed, and then I'll have it interrogate me for a final PRD document. It then implements the feature in reviewable chunks, and I'll give it feedback or tweak the PRD doc as needed.

Finally feel like I have a good workflow where I can fully benefit from these things without sacrificing my understanding of what they're doing.

18 days ago by ddp26

I like this, though it does leave me feeling more nervous when I really don't know how I'd solve the problem, still requires trust.

18 days ago by scosman

yes exactly. Too many people ask AI to one-shot complex tasks, and wonder why it behaves like a junior asked to rush something.

I have my own skill: 5 rounds of research/planning/test-planning. Interactive with me in loop for all important decisions. Starts with high level shape, then details. Planning can take 2-3 days of my time, then the implementation agent can take many hours (Opus 4.7). It splits the implementation across many phases/commits, each with its own code-review fix loop. Deep code review at the end can take another hour or two. It opens a PR, Gemini reviews, it reads out and resolves those issues.

Projects still take days or weeks, but 5x faster than doing it all myself.

Edit: the skill - https://github.com/scosman/vibe-crafting

18 days ago by atomicnumber3

"yes exactly. Too many people ask AI to one-shot complex tasks, and wonder it behaves like a junior asked to rush something."

Because this version of AI is worth 10 trillion dollars.

While the pragmatic versions from realists you can find all over this thread are ultimately probably less of a speed boost than just having your CEO/local micromanager be conveniently on vacation during critical periods when the work actually gets done.

18 days ago by bsaul

"Because this version of AI is worth 10 trillion dollars."

I wonder how much the real version of AI is worth. I've got a hunch we're going to find out pretty soon.

18 days ago by 59nadir

My personal experience with trying to front-load tons of planning and speccing out with LLMs is that at best it's a small improvement on code quality but with considerably more time spent.

As a result I've abandoned the idea of having LLMs generate code except for very small, localized and tightly scoped things. They really can't produce much more than a function or a small module without shitting the bed (last time I vibecoded was with Opus 4.6, Composer 2 and GPT-5.4). I use it almost entirely as another signal in analysis, which naturally makes it fit in better because all the other signals (reading the code, stepping through the code, writing the code myself) are already there so when the LLM points things out the information it actually renders can be taken in much more easily (and seen through more easily when it's false or irrelevant).

I think it's neat that people find fun ways to develop, but I think dressing up vibecoding in a fancy dress and layering SpecLang, sometimes in multiple steps, on top of it, is an exercise in trying to use the tool more instead of trying to use it in its most useful capacity.

18 days ago by abalashov

I expect you'll be told to try Opus 4.7, and in short, JuSt WaiT FoR ThE NexT MoDel, BRo.

This has been my experience every time I've suggested that there are any sort of inherent ontological/conceptual or computational limits to the sophistication of LLM mimicry.

18 days ago by dawnerd

Even fully planned it’s still no better than a junior dev. You’re leaving out how much back and forth you have the ai do on itself, which you’d have on a junior dev too. In the end does it matter if it’s giving you what you want? Guess not really. But let’s not act like it’s crazy good when you’re still doing a lot of rounds of revisions on something an experienced dev would know to do right the first time.

16 days ago by notgenerated

I think that in general, people need to understand that they need to invest most of their time in the planning phase. High level plan, then spec are the baseline imo

18 days ago by dawnerd

When I use ai to code this is pretty close to my workflow too but I find it ends up taking at best just as long as if I were to write the code myself. In some cases I’ve thrown away what the ai has done and just done it myself. I think that’s just a skill people need to learn - at a certain point you have to cut your losses. I’ve seen some coworkers argue back and forth with an llm trying to get it to do something. Especially true on simpler changes.

18 days ago by theK

I've stumbled upon that too! Funnily I see it having two forms:

1. Some bad idea gets embedded into the context that you just can't argue away

2. Some important idea gets lost in compression and the ai veers off into funland without recourse.

In both cases it is often better to start over or just do it yourself. I sometimes find myself asking for a summary, editing it and then using the edited one to seed a new session.

Edit: s/Finland/funland/

18 days ago by skydhash

That sounds too much like three weeks of work saving you three hours of planning.

In my experience, software engineering is a matter of knowledge. Understanding it and then coming up with a solution. The latter is a flash of insight that comes mostly from experience. Then you gather more information to flesh it out, or brainstorm it with your colleagues.

What you're describing sounds more like a ritual of doing busy work than anything practical. Because tasks vary so much. A feature may be huge, but you take care of it in a day with copy pasting because you already have all the building blocks in other files. And something may be twenty lines of code, but you spent the whole week sweating on it (concurrency stuff maybe). Those ritualistic workflows sound more like someone imagining software development than actually doing it.

18 days ago by bottlepalm

A lot of people say you need to go through at least three versions of something before it is mature - and v3 is not something you can design upfront. You need to see v1 both in code, and at runtime. Use it, get the feedback, and iterate. This is where AI tightens that loop immensely.

Lost you in the last paragraph - features are not "copy pasting because you already have all the building blocks" and "something may be twenty lines of code". Mid sized features often mean tearing up many layers of code across the stack to add in some sort of new capability. Tearing up existing code means there are all sorts of add-on considerations in addition to the feature you are working on.

18 days ago by habinero

> Mid sized features often mean tearing up many layers of code across the stack to add in some sort of new capability

What? No, it shouldn't. I've worked on a lot of codebases and if you have to do this, something is very, very wrong.

18 days ago by TACIXAT

This article doesn't address writing code with AI, just code review. My issue with agentic coding is that I make numerous micro-architectural decisions while programming. I almost never have a full spec up front and develop one as I consider what I am writing.

When using Claude Code or Codex, that is all gone. Claude Code is extremely eager to reach the end goal to the point that it feels like a fever dream to write code with it. In the end, I have low confidence about edge cases and fit into the project's architectural and design goals.

On top of that, I enjoy programming, reverse engineering, etc. and I feel that the LLMs, while able to solve some problems or deliver some features, take that fun away. I'm trying really hard to find a workflow with them that I'm confident in, but I fear that workflow is just chat, search, and being a rubber duck for my thoughts.

18 days ago by HyperL0gi

> This article doesn't address writing code with AI, just code review. My issue with agentic coding is that I make numerous micro-architectural decisions while programming. I almost never have a full spec up front and develop one as I consider what I am writing.

Working with AI forced me to write better specs, but the way I write today is very different. I typically open Codex and have Linear MCP connected, where my chat with the AI will end up writing the issue. It's a lot of back-and-forth where I tell it what I want, the AI does all the code scanning, writes something, I correct something, etc.

The value for me is exactly that I tell it what I want and the AI verifies in the actual code whether that's the path that makes the most sense. In the end I have a pretty detailed spec that I'm much more confident is the correct path.

I find the spec easier to review than a huge PR, so execution is typically much faster and aligned with what I want.

The grill-me skill from Matt Pocock is great for this (https://github.com/mattpocock/skills/blob/main/skills/produc...)

18 days ago by zahlman

> but I fear that workflow is just chat, search, and being a rubber duck for my thoughts.

That's still a lot of benefit, though. I have to agree with Patrick McKenzie on this one (https://x.com/patio11/status/2058631943785488815):

> If the only impact of LLMs professionally was causing people to "think out loud" in a way which was routinely captured by computer systems and then could be operated on by computer systems, that would by itself be one of the most consequential changes in practice in 100 years

18 days ago by aakresearch

> I fear that workflow is just chat, search, and being a rubber duck for my thoughts

This is exactly what I settled upon after my own trying really hard. It is liberating, I have no fear at all!

18 days ago by teaearlgraycold

A lot of programming work is well represented in the training data. For that kind of stuff there’s not much to do regarding architectural decisions. I love to run the LLMs on auto for that work. But for anything not well represented in the training data, which could be anything from mundane stuff in PyQT or a truly novel application, keep them on a short leash or forget them altogether.

18 days ago by redeye100

> represented in the training data

This isn’t a binary is/isn’t thing though. What if only 80% of my task is? How would I know that the other part isn’t, if I haven’t worked it through fully?

What if my task is generally represented, but for my specific context, there are specific details that aren’t?

How would I know until I’ve reasoned through it myself? At that point having the LLM do the work doesn’t add much value

18 days ago by justinlivi

I find myself spending on average more time in LLM review/resolution loops than it would take for me to write the code by hand. Partially because once I'm in the flow I write very very quickly and the code pours out sometimes faster than I can write. But also because the LLM code on the first few tries is generally really really bad. What I find interesting though is that spending the time to personally review and direct the LLM through several iterations of review and revision on average results in higher quality code written in about the same time as I would have written it. This might be particular to me, but seeing several iterations of someone else's code helps me better understand holistically my objective as opposed to whatever happens to come out of my flow-state consciousness.

18 days ago by zahlman

>the code pours out sometimes faster than I can write.

Meaning that you type the code faster than you would normally type prose? Or just what?

17 days ago by mordv

Meaning the code comes to you out of nowhere faster than you could write it down. At peak performance, there is no you in the process, you're just not preventing it from happening. That's what flow state is. The more skills and the better tools you have (such as typing speed, fine tuned IDE), the longer you could stay there.

18 days ago by youre-wrong3

It’s weird how insecure people are on HN that they need to downvote and flag comments when their feelings are upset, instead of learning and accepting that they are wrong and there is room for improvement. They close their eyes, plug their ears and scream until the comments are flagged.

18 days ago by youre-wrong3

[flagged]

18 days ago by wavemode

What an arrogant comment. You have no idea what kind of software the parent commenter is working on. If you think all software can be handled by AI then I'm afraid you're the one who doesn't know what they're doing.

18 days ago by youre-wrong3

People come up with excuses all the time trying to claim their special little project won’t work with AI. Ohhh there’s no training data. Ohh there’s no manual. Oh no digital instructions. Always turns out to be a user issue. People make these claims because they want to feel special. They want to believe no one else can do their job.

18 days ago by youre-wrong3

[flagged]

18 days ago by deviation

I manage a component of an internal compute product which serves ~a billion idempotent use-cases per quarter and I can confidently tell you that you're incorrect.

What I haven't been able to teach AI is the full distributed nature of the system, how we progressively roll out each service (about ~30 unique ones) when we push updates -- and how to read, write, and review my code while keeping all of this in-context (because believe me, if it's not in-context, it is useless to me). Don't get me started on all the containers, K8s configs, endpoint naming conventions...

My entire stack covers bare metal, virtualisation infrastructure, storage infrastructure... I could go on. At a certain scale, it doesn't matter how fast you write something, but whether what you're writing is bulletproof.

18 days ago by youre-wrong3

You literally just proved my point.

18 days ago by janalsncm

Or, if we consider the fact that an LLM’s performance depends on the task’s similarity to others in the training set, it could be that one person is doing a fairly novel task and another is doing something very well represented in online code.

18 days ago by zik

If your AI is writing bad code then you need to change your AI. No current high-end AI should be producing bad code.

18 days ago by redeye100

This sounds like a subjective assessment. I counter with the opinion that most LLMs write technically correct, but bad code. When I read it, it makes me want to gag or poke my eyes out. I spend a lot of time wondering about what kind of person would write it like that, then I realize it’s an LLM

18 days ago by singingtoday

The tool is important, but so is the way you use it. I've seen small LLMs produce good code and frontier LLMs produce poor quality code, depending on context.

18 days ago by flexagoon

This is delusional. Opus 4.7 regularly produces pretty bad code.

18 days ago by newsicanuse

Source trust me bro.

18 days ago by crabmusket

The linked article about getting LLMs to critique each others' code review[1], the magpie tool[2], and also this recent article from Cloudflare about their code review stack[3] are all quite compelling.

I'm fairly AI-skeptical not on grounds of "do they work" but "are they good for the world". I feel that getting AIs to do this kind of review work is a rare case that doesn't outsource thinking and deskill workers. It doesn't trigger the same alarm bells as having the AI write the code (including having the AI fix the issues it discovers). That's setting aside environmental and other ethical concerns, which are still significant to me.

I have been impressed by the recent quality of AI code reviews*, but the experience of interacting with 3 separate AI reviewers via GitHub PRs is pretty terrible. Having more local-oriented and jj/rebase-aware review rounds would be great.

*context: fairly large PHP/Laravel backend and Vue frontend

[1]: https://milvus.io/blog/ai-code-review-gets-better-when-model...

[2]: https://github.com/liliu-z/magpie

[3]: https://blog.cloudflare.com/ai-code-review/

18 days ago by hrideshmg

As a junior, I do actually enjoy going back and forth with the AI discussing different ways to implement something and exploring alternatives.

More often than not, I'd have an architectural idea that I'm not that confident in. The process of talking with the LLM takes a long time but it helps me sharpen the initial approach or even come up with a new one depending on the requirements.

18 days ago by tyrust

Be sure to explicitly ask for critiques or alternatives. In my experience the machine is really susceptible to a sort of anchoring effect.

18 days ago by ex-aws-dude

I've noticed that too, once you get an initial implementation it seems to always find a way to argue for keeping that approach in the name of simplicity

Like "Let's stick with what we have, it's simple and it works." or "That seems overkill, let's not over complicate things"

17 days ago by mattmanser

I've got to the point where sometimes I frame my question as if I disagree with it, just to allow the AI to "agree" with me and actually critique it.

"My team mate wants to X, but I feel like that might be a bad idea. What could go wrong?", etc.

18 days ago by goolz

In this vein, I have a system level memory for Claude to push back and give me direct feedback when possible. So far a success as it helps cut through the sycophancy.

18 days ago by Tallain

[dead]

18 days ago by ChicagoDave

This is how I do it too and I’m an architect/senior dev. Keep it up!

17 days ago by suprjami

When doing this, I particularly like that the LLM sometimes gets things wrong.

It forces me to really understand each thing deeply so that I evaluate it properly.

It is like taking an exam where the exam writer is hostile and sneaks in trick questions. You only spot that the question is wrong when you fully reason through the answer.

17 days ago by RevEng

Huh, good point! When a colleague asks me to review their design or otherwise discuss it, I'm always looking for things they might have missed, assumptions they silently made, or corner cases that could come up. I start from the position that there is likely something missing and I need to find out what. Likewise, when I'm looking at suggestions or code or anything else from an AI, I'm assuming it made some mistakes, made some unstated assumptions, or didn't consider some corner cases, and so I'm having to carefully think through what it says to spot the mistake, rather than casually skimming it and going, "LGTM!" If it were too reliable, I might get lazy and not look too hard knowing that it's probably right anyway so there's no point trying too hard to find something. It's the same thing my juniors will sometimes do to me: don't assume I'm right just because I'm experienced - I still make mistakes too! I want to be questioned on anything that might not make sense, because even if it was intentional, the fact that the reason isn't clear is itself a problem to resolve. And I only know so many things - we all have different experiences and a junior can have just as much they can teach me as a senior.

18 days ago by lubesGordi

As a senior, I do the same.

18 days ago by etothet

“A lot of people seem convinced that the point of AI coding is to write low-quality code as fast as possible.”

A lot of people think a lot of things, but I don’t think the majority of people think the point of using LLMs is so they can produce low-quality code. Do they produce low-quality code sometimes or often? Of course. But they also produce high-quality code very often. And sometimes they just do a “fine” job.

One of the promises - and there are plenty of cases where it’s met and where it falls drastically short - is that agentic coding tools can help us write code faster that is just as good as or better than what a human can write. One of the other big ideal payoffs is that agentic coding can allow non-programmers to create things that previously required programmers to create.

We can debate as to how successful we’ve been toward the two goals above, but I think it’s misguided to say that the majority of people think LLMs should produce lower quality code.

18 days ago by batshit_beaver

> We can debate as to how successful we’ve been toward the two goals above, but I think it’s misguided to say that the majority of people think LLMs should produce lower quality code.

Guessing you’re not at FAANG or similar company. For the last 6 months at least there’s been tremendous pressure from leadership (including highly experienced IC engineers) to let AI take the reins, assumption being that future AI assistants will be able to deal with any level of complexity and tech debt created today.

Given that everyone agrees that reviewing all AI-generated code is impractical (if you let the agents rip at maximum available bandwidth), and that “harness engineering” is at best immature and at worst complete snake oil when it comes to ensuring system stability, maintainability, and quality, I do believe that it’s fair to claim that most engineers are, in fact, supportive of low quality code generated by LLMs.

Fwiw I do see pushback here and there, but only from the lowest rungs on the career ladder - ICs with enough experience to see where this train is headed, but no ability to save it. Management needs to see the results of their policies first, and that will take months or even years to fully play out.

18 days ago by goatlover

Hopefully not, but there was a recent thread with multiple posters arguing that code quality doesn't matter, and quality produced by humans in the past was often terrible. So who cares, ship it was the sentiment. Let the AIs handle the growing maintenance cost, I guess?

Kind of a shocking thing to see argued on HN. Maybe it's just the vibe coders.

18 days ago by dozerly

The vast majority of corporate-employed programmers write bad code. I think maybe 10% of the people I’ve come across have shown any interest or care in the quality of code they write.

There will be a large majority of people who hold these opinions, because they weren’t capable of or didn’t care enough to write good code in the before times

18 days ago by sarchertech

The real problem with these conversations is that code quality isn’t something we have any kind of consensus on.

To a lot of engineers code quality means upper-case C Clean Code. Other engineers are in the Grug brain camp where they think that premature abstraction is the worst kind of code.

But to your point I think the majority of engineers think that high quality code is anything that compiles or passes their (almost definitely insufficient) test suite.

18 days ago by Kiro

> arguing that code quality doesn't matter, and quality produced by humans in the past was often terrible

You're conflating two different things. I'm one of the people arguing for the latter, not because I don't think code quality matters but as a counter to the sudden idealization of handcrafted code.

18 days ago by zahlman

> A lot of people think a lot of things, but I don’t think the majority of people think the point of using LLMs is so they can produce low-quality code.

Hence "seem". Of course people are not in the habit of describing their process output as "low quality", let alone supposing that that's the point. But when people clearly prioritize speed, and when the result is low quality, it's easy to get the impression of intent.

18 days ago by wavemode

Eh, I definitely do think that it has become a mainstream take. Not necessarily that we want lower-quality code, but simply that humans shouldn't be reviewing AI code for quality at all - that is, that code quality doesn't really matter and what matters is that the software works.

This is the entire premise of the concept of "vibe coding", and the concept of non-programmers using coding agents. The idea that there aren't large amounts of people and companies doing these things and/or who consider it "the future" is hard to argue.

18 days ago by customguy

But how do I know if something works if I don't know how it works? By testing (literally) all use cases, every single permutation of variables? For complex programs there might not be enough time and energy in the universe to do that.

If I know what addition is, I can look at a line that does addition and reason about it. If I just check "if it works", for all I know, the actual code is something like

    if (thing == 17)
    {
      shit_the_bed();
    }
    else
    {
      thing += other_thing;
    }

Sure, I can use an LLM to check on the first LLM, and then a third LLM to check on the second, and so on ad infinitum, but none of that, at no point, can give me what "knowing what addition is" gives me.

It's kinda like cheap/fake concrete: If you know something about concrete and what concrete is being used, you can roughly tell if it will last, what it will withstand. If you just go by "seems to work", "looks good", you get collapsing bridges and buildings after a few years, during heavy rainfall etc.
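
To make the point concrete, here is a small Python sketch, assuming a hidden special case like the one in the snippet above. The function names and test inputs are made up for illustration; the idea is only that a plausible-looking set of black-box checks can pass while the hidden branch stays untouched.

```python
from typing import Iterable, Tuple


def add(thing: int, other_thing: int) -> int:
    """Looks like addition from the outside, but hides one special case."""
    if thing == 17:
        # Stand-in for the failure branch in the snippet above.
        raise RuntimeError("hidden special case")
    return thing + other_thing


def passes_black_box(cases: Iterable[Tuple[int, int]]) -> bool:
    """Check `add` against the expected sum without reading its source."""
    for a, b in cases:
        try:
            if add(a, b) != a + b:
                return False
        except RuntimeError:
            return False
    return True


if __name__ == "__main__":
    # A reasonable-looking test set that never happens to use 17.
    typical_cases = [(0, 0), (1, 2), (-5, 5), (1_000, 2_000), (2**31, 1)]
    print(passes_black_box(typical_cases))  # True
    # Only an input that hits the hidden branch exposes the problem.
    print(passes_black_box([(17, 1)]))      # False
```

Reading the three lines of `add` finds the branch immediately, whereas black-box testing only finds it if some test case happens to pass 17 as the first argument.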

18 days ago by jillesvangurp

Regardless of what model you use, agentic coding tools are indeed pretty good at finding issues if you target them a bit. And they have no respect for their own code or any sense of shame. So, you can just point them at their own code with a new thread.

Many AI models seem biased to cutting corners by default when generating code, even when you ask them not to. But a few simple follow up prompts can address that. Simply ask for covering corner cases with tests, test all the known non happy paths, look for weaknesses, verify adherence to SOLID principles, do security audits, etc. It will find issues. With bigger projects, you can actually make it file those issues in gh with labels and priorities. And then you can make it iterate on fixing issues with separate PRs.

On a recent project, I made it implement a simple benchmark test for measuring throughput. I had a hunch it was doing very sub optimal things. I then asked it to look for potential performance bottlenecks and use the benchmark to verify improvements. At that point I already had a lot of end to end tests to verify correctness. So, these performance tweaks were relatively low risk. I got about two orders of magnitude improvement and a lot more graceful behavior when pushed to the limit.

If you have a bit of experience engineering systems, just treat these tools like they are junior developers. Competent but likely to skip some essential steps. So, just double check with a lot of pointed questions "did you do X? If not, do it now". Anything that needs repeated asking, turn it into a guard rail / skill.

There's a bit of effort and skill involved with this. I imagine a lot of less experienced developers might struggle to get good results because they aren't asking for the right things.

18 days ago by bonoboTP

My problem is that it "finds issues" all the time and it never really ends. You go through the list, make a decision on how to go about it, give it back to the AI, it does the changes, you ask for issues again, and there are now new issues, in part due to the fixes from the previous round. Now you again assess each issue, and it's often valid, but you have to ask yourself if it's worth fixing right now and whether the fix is worth the complexity for a super rare edge case, depending on the type of program you make. And often the AI's assessment of what's high or low priority is not great.

So to me this loop really never properly ends so it never feels like I'm done. Which is not great from a psychological point of view.

17 days ago by RevEng

I treat it like other triage tasks: things could always be better, but how much effort does it take and how much better could it be?

There's a common saying that the enemy of good is perfect. It's easy to get stuck in the loop of endlessly polishing something but never actually releasing it, even without AI. It's on us to decide how good is good enough and when to stop.

Over time I've learned to be rather aggressive about cutting out work. I'll quickly ask myself how serious is the issue (does it give wrong answers? block important flows? look embarrassing? or is it just a minor annoyance?) and how much effort would it take (five minutes? two hours? three weeks?). I should be able to make that call in no more than 30 seconds. I skim through the list of 20 suggestions the AI gives, I make plans to iterate on the 3 that are serious, and I simply accept that the rest are "good enough". It's not easy - both to be willing to let issues stand and to make the decision about what is good enough - but it's an important part of the job when triaging lists of bug reports and feature requests, so it's something we need to get good at anyway.
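
As a rough sketch of that triage rule in Python: the severity scale, the cutoff and the cap of three fixes are assumptions made for illustration, and `Issue` and `triage` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical severity scale; higher means more serious.
SEVERITY = {"wrong_answer": 3, "blocks_flow": 3, "embarrassing": 2, "annoyance": 1}


@dataclass
class Issue:
    title: str
    severity: str        # a key of SEVERITY
    effort_hours: float  # rough estimate, made in seconds rather than minutes


def triage(issues: list[Issue], max_fixes: int = 3, min_severity: int = 2) -> list[Issue]:
    """Keep serious issues only, most severe and cheapest first, capped at max_fixes."""
    serious = [i for i in issues if SEVERITY[i.severity] >= min_severity]
    serious.sort(key=lambda i: (-SEVERITY[i.severity], i.effort_hours))
    return serious[:max_fixes]
```

Everything that falls below the cutoff or past the cap is accepted as "good enough" for now.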

18 days ago by qaq

I find that the way to avoid this doom loop is to make sure the solution is not overengineered in the first place. AI will pile on complexity to infinity unless you actively gate it.

18 days ago by KronisLV

> Regardless of what model you use, agentic coding tools are indeed pretty good at finding issues if you target them a bit. And they have no respect for their own code or any sense of shame. So, you can just point them at their own code with a new thread. Many AI models seem biased to cutting corners by default when generating code, even when you ask them not to. But a few simple follow up prompts can address that.

That's more or less all of them: they just generate likely combinations of tokens, and there is no critical thought involved. If you want to approximate that, review iterations are probably the right way to go about it, and without the full conversation context, so that there's no earlier model output like "I'm doing X because it seems like the correct way to go about Y." but rather a fresh context that allows for more critical predictions.

Here's what works for me, can be made into a skill in whatever you use:

  I would like you to do a review loop!
  
  How this works:
  * once implementation is done, all tools must be run and pass: whatever is configured in the project like Ruff, Oxlint and Oxfmt, depending on the tech stack (also don't run such tools directly, look at package.json or similar project files/configurations/run scripts first; like if it's a stack that has compilation, compile the app, if there are tests, then run those; just know that you DO NOT generally need to stand up the whole app); if there is a projectlint-rules folder then that means you probably should run ProjectLint as well (local tool, use projectlint --help or projectlint --docs, or better yet, look at whether package.json or README.md have any instructions on how to run it)
  * once all the code seems okay, you will run THREE parallel sub-agents for code review: each looking at ALL changed code (not each having a different sub-section) and looking for CRITICAL/SERIOUS issues (not nitpicks), with the goal of not missing anything and building consensus
  * whatever CRITICAL/SERIOUS issues are found, if you can confirm that they're real and not false positives, you will then fix and remember to run the tools after, after which you will do another review iteration, followed by a fix iteration if needed and so on
  * remember that the review and fix loop must END with an iteration of the review agents returning that there are no CRITICAL/SERIOUS issues - you cannot just do fixes and say that there is nothing remaining yourself (and also remember that the reviews are done when all of the tools pass, like when the code is linted and formatted etc.)
  * at the end, produce a summary post that has a table, the rows being iterations, the columns for each of the agents (A, B, C) showing FIX/OK and then a column called Iteration summary; the goal for this is to show a summary how many iterations it took and what was fixed, you can also include text alongside the table as normally
The ProjectLint references might need to be removed (replace them with whatever higher-level linting/architecture tools you have, if any), but that's the overall idea. It does use a LOT of tokens, but there's almost always something to fix. Of course, the problem is that sometimes there will be nitpicks or the fixes themselves won't be fully okay, though in general this trends towards slightly better code, even with something like Opus 4.7.
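
The control flow of that prompt can be sketched in Python roughly as follows. This is only an illustration under assumptions: `review_loop`, `run_tools`, `fix` and the `max_iterations` cap are hypothetical, and the reviewers run sequentially here rather than as parallel sub-agents. The step of confirming that a finding is not a false positive is left out.

```python
from typing import Callable

Reviewer = Callable[[str], list[str]]  # takes the changed code, returns serious issues


def review_loop(
    code: str,
    run_tools: Callable[[str], bool],
    reviewers: list[Reviewer],
    fix: Callable[[str, list[str]], str],
    max_iterations: int = 5,
) -> tuple[str, list[list[str]]]:
    """Repeat review -> fix until an iteration where every reviewer reports no issues.

    Returns the final code and a per-iteration log of "OK"/"FIX" per reviewer.
    """
    log: list[list[str]] = []
    for _ in range(max_iterations):
        if not run_tools(code):
            raise RuntimeError("linters/tests must pass before a review counts")
        findings = [reviewer(code) for reviewer in reviewers]  # each sees ALL the code
        log.append(["FIX" if found else "OK" for found in findings])
        issues = sorted({issue for found in findings for issue in found})
        if not issues:
            return code, log  # the loop ends on a clean review, never on a fix
        code = fix(code, issues)
    raise RuntimeError("no clean review within max_iterations")
```

The returned log maps directly onto the summary table the prompt asks for: one row per iteration, one column per reviewer.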

18 days ago by jillesvangurp

This can backfire a bit on token usage, where it gets a bit too trigger-happy running expensive things for trivial changes. I tend to not use sub agents for this reason. I actually manage to cover most of my needs on the $20/month Codex subscription. I might switch to the $200 plan at some point. But right now I just need to be economical as our company is fairly resource constrained. That's also why I prefer Codex over Claude Code. It seems it gets the job done for less money. Another advantage is that it seems to have less need to have things like this spelled out in this level of detail.

Another thing is that unless you are doing really complicated stuff, you probably don't need the latest models running on high. I'm still on 5.4 medium with codex. I see very little reason to change that.

Part of agentic engineering is figuring out how to be economical with tokens and time. You can sacrifice one for the other of course. But there are diminishing returns as well where spending 10x more doesn't actually get you 10x more quality/results.

18 days ago by KronisLV

I just have the Anthropic 100 USD Max plan and it's enough for daily work - I sometimes do hit the 5 hour limits by mid day, but weekly ones usually cap out at around 80% or thereabout, even with this approach. I usually use xhigh, sometimes max; both still result in situations where I need to intervene plenty, even on use cases that aren't that complex (some LLM stuff, mostly web based CRUD, some light data processing, integrations with Jira and GitLab, processing PDFs and so on, sometimes ML training and geospatial work, like the Sentinel-2 satellite data, nothing crazy).

If I had to pay per token, I'd probably look at DeepSeek. In general it feels like it's a bit early for the technology - either our software methods are wasteful, or the hardware hasn't caught up. To me, it appears that we often need to throw more tokens at these problems, not less, since otherwise it's just one-shot slop.

18 days ago by esperent

> once all the code seems okay, you will run THREE parallel sub-agents for code review: each looking at ALL changed code

I did some evals with a prompt like this when I had some subscription tokens to burn, a few months ago. I think using Opus 4.5. What I found was:

1. Running two subagents was somewhat useful

2. Running three started to get redundant

3. Any more than three was pointless (at least when using the same model)

However, even two were getting like 60% the same results.

Much, much more effective was splitting out into audits through different lenses:

* One looking for security issues

* One looking for whether the task was completed successfully

* One looking for performance issues

* One looking for contract/maintainability issues

* One looking at test coverage

Etc.
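
A minimal sketch of the lens split in Python. The lens names follow the list above, while the prompt wording and the `build_audit_prompts` helper are assumptions for illustration.

```python
# Lens names follow the list above; the instruction wording is an assumption.
LENSES = {
    "security": "Look only for security issues.",
    "completeness": "Check only whether the task was completed successfully.",
    "performance": "Look only for performance issues.",
    "maintainability": "Look only for contract/maintainability issues.",
    "tests": "Look only at test coverage.",
}


def build_audit_prompts(diff: str) -> dict[str, str]:
    """Return one review prompt per lens, each covering the full diff."""
    return {
        lens: f"{instruction}\nIgnore everything outside this lens.\n\nDiff:\n{diff}"
        for lens, instruction in LENSES.items()
    }
```

Each prompt would go to its own sub-agent, so the reviewers differ by focus rather than being copies of one another.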

18 days ago by KronisLV

You can get reasonably close with fewer; however, more agents give better signal: e.g. if 3/3 flag something as an issue, the outer one that orchestrates them can view it as something to give more attention to, whereas if it's just 1/3, then it probably calls for more consideration. Ofc more doesn't always imply right.
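
A rough Python sketch of that vote counting, assuming each reviewer's findings have already been normalized to comparable labels (in practice, matching differently worded findings is the hard part). `tally_findings` and `split_by_consensus` are hypothetical names.

```python
from collections import Counter


def tally_findings(reports: list[set[str]]) -> dict[str, int]:
    """Count how many reviewers flagged each issue (one vote per reviewer per issue)."""
    votes = Counter(issue for report in reports for issue in report)
    return dict(votes)


def split_by_consensus(reports: list[set[str]]) -> tuple[list[str], list[str]]:
    """Return (flagged by every reviewer, flagged by only some), each sorted by name."""
    votes = tally_findings(reports)
    unanimous = sorted(i for i, n in votes.items() if n == len(reports))
    partial = sorted(i for i, n in votes.items() if n < len(reports))
    return unanimous, partial
```

The orchestrating agent could then prioritize the unanimous list and weigh the partial one case by case.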

18 days ago by bad_username

We may be in the last golden age of AI, where experienced professionals still exist who can code manually, and AI already exists that can code automatically, and when the former use the latter skillfully, wonders happen. This magical intersection may not exist in the future, or may become very rare.

18 days ago by dozerly

I think as long as it continues to be tangibly better these people will still exist and the intersection will continue to be valuable enough to survive.

18 days ago by josephg

> as long as it continues to be tangibly better these people will still exist

Sure. But how long will that last? LLMs are getting better at programming much faster than I am.

Imagine a plot with time on the X axis and LLM skill on the Y axis. The line goes up and to the right. On the left is GPT3, or GPT3.5 with the very first glimmers of programming ability just a few short years ago. In the middle is Opus 4.7 now.

Where's the intersection point, where AI skill is higher than that of humans? Less than 10 years. I'd guess less than 5 years.

18 days ago by _under_scores_

I think the problem is that coding is not wholly a 'writing code' problem. It's a translation from idea to outcome. Often I think the bad code generated by an LLM is less to do with its 'ability' and more to do with an instruction that hasn't adequately accounted for the range of code that would satisfy the criteria. I'm not sure how a newer model can improve on this per se - sure, there will be improvement on outright mistakes, but for me at least, that's been and gone with more or less any model released in the last 6 months.

18 days ago by vanuatu

I think a better way to think about it is - what are the invariants of our current architecture? Why can't you tell Claude to build you a $1B business, make no mistakes?

I have no doubt they will be better programmers than almost every human that has ever existed. But the role of a SWE will expand to fill the gaps that the LLM paradigm hasn't filled:

- Accountability

- Long term architectural vision, goal setting

- Ever-changing business context

- Mercurial executives, people problems, relationships etc...

18 days ago by throwatdem12311

Token efficiency is going to be the next big thing.

Tokenmaxxing an army of juniors will destroy your business through slop-induced tech debt and API costs. A senior that uses AI but is token efficient will be like rocket fuel.

18 days ago by dogleash

> rocket fuel

Did you write this comment with AI, or can you explain why so many people use the exact same terrible metaphor?

18 days ago by nicman23

people said the same with any innovation

18 days ago by javier123454321

And you act like there hasn't been a loss once we moved away from the master craftsman style of building to the professionalized architect style of building. We cannot make a gothic cathedral anymore. Also, CAD homogenized the built environment significantly. And we have been losing a lot of traditional, artisanal craft forms over the past century.

18 days ago by ant6n

There are a lot of crafts that don’t have real deep experts anymore because the work was 90% automated.

18 days ago by herrherrmann

Did they? Genuine question, because I do wonder if people in some industries in the past were ever anxious about these specific things (especially skill attrition).

18 days ago by dogleash

> I do wonder if people in some industries in the past were ever anxious about these specific things (especially skill attrition).

I've spoken with some people (now in their 60s & 70s) that worried about skill atrophy in their line of work.

First they worried about atrophy. Then they watched skill dry up. Now they know it's not available to buy anywhere. In the better cases the skills still exist, but entirely overseas.

These are people I could recognize as sharp engineers, even if I don't know their domains at all. I had to take them at their word about the value in what was lost. The problem is that it's easy to assume that business (or at least society) would prevent degradation of valuable knowledge over time.
