Hacker News
7 hours ago by wiremine

I'm going to take a contrarian view and say it's actually a good UI, but it's all about how you approach it.

I just finished a small project where I used o3-mini and o3-mini-high to generate most of the code. I averaged around 200 lines of code an hour, including the business logic and unit tests. Total was around 2200 lines. So, not a big project, but not a throwaway script either. The code was perfectly fine for what we needed. This is the third time I've done this, and each time I get faster and better at it.

1. I find a "pair programming" mentality is key. I focus on the high-level code, and let the model focus on the lower level code. I code review all the code, and provide feedback. Blindly accepting the code is a terrible approach.

2. Generating unit tests is critical. After I like the gist of some code, I ask for some smoke tests. Again, peer review the code and adjust as needed.

3. Be liberal with starting a new chat: the models can get easily confused with longer context windows. If you start to see things go sideways, start over.

4. Give it code examples. Don't prompt with English only.

FWIW, o3-mini was the best model I've seen so far; Sonnet 3.5 New is a close second.

7 hours ago by ryandrake

I guess the things I don't like about Chat are the same things I don't like about pair (or team) programming. I've always thought of programming as a solitary activity. You visualize the data structures, algorithms, data paths, calling flow and stack, and so on, in your mind, with very high throughput "discussions" happening entirely in your brain. Your brain is high bandwidth, low latency. Effortlessly and instantly move things around and visualize them. Figure everything out. Finally, when it's correct, you send it to the slow output device (your fingers).

The minute you have to discuss those things with someone else, your bandwidth decreases by orders of magnitude and now you have to put words to these things and describe them, and physically type them in or vocalize them. Then your counterpart has to input them through his eyes and ears, process that, and re-output his thoughts to you. Slow, slow, slow, and prone to error and specificity problems as you translate technical concepts to English and back.

Chat as a UX interface is similarly slow and poorly specific. It has all the shortcomings of discussing your idea with a human and really no upside besides the dictionary-like recall.

5 hours ago by yarekt

That's such a mechanical way of describing pair programming. I'm guessing you don't do it often (understandable if it's not working for you).

For me pair programming accelerates development by much more than 2x. Over time the two of you figure out how to use each other's strengths, and as both of you immerse yourselves in the same context you begin to understand what's needed without speaking every bit of syntax to each other.

In the best cases, as a driver you end up producing high-quality code on the first pass, because you know that your partner will immediately catch anything that doesn't look right. You also go fast because you can sometimes skim over complexities, letting your partner think ahead and share that context load.

I'll leave readers to find all the caveats here

Edit: I should probably mention why I think the Chat Interface for AI is not working like pair programming: as much as it may fake it, AI isn't learning anything while you're chatting with it. It's pointless to argue your case or discuss architectural approaches. An approach that yields better results with Chat AI is to just edit/expand your original prompt. It also feels less like a waste of time.

With Pair programming, you may chat upfront, but you won't reach that shared understanding until you start trying to implement something. For now Chat AI has no shared understanding, just "what I asked you to do" thing, and that's not good enough.

5 hours ago by RHSeeger

I think it depends heavily on the people. I've done pair programming at a previous job and I hated it. It wound up being a lot slower overall.

For me, there's

- Time when I want to discuss the approach and/or code to something (someone being there is a requirement)

- Time when I want to rubber duck, and put things to words (someone being there doesn't hurt, but it doesn't help)

- Time when I want to write code that implements things, which may be based on the output of one of the above

That last bucket of time is generally greatly hampered by having someone else there and needing to interact with them. Being able to separate them (having people there for the first one or two, but not the third) is, for me, optimal.

4 hours ago by ionwake

this is so far removed from anything I have ever heard or experienced. But I know not everyone is the same and it is refreshing to view this comment.

3 hours ago by freehorse

Pair programming is imo great when there is some sort of complementarity between the programmers. It may or may not accelerate output, but it can definitely accelerate learning which is often harder. But as you say, this is not what working with llms is about.

6 hours ago by frocodillo

I would argue that is a feature of pair programming, not a bug. By forcing you to use the slower I/O parts of your brain (and that of your partner) the process becomes more deliberate, allowing you to catch edge cases, bad design patterns, and would-be bugs before even putting pen to paper so to speak. Not to mention that it immediately reduces the bus factor by having two people with a good understanding of the code.

I'm not saying pair programming is a silver bullet, and I tend to agree that working on your own can be vastly more efficient. I do however think that it's a very useful tool for critical functionality and hard problems and shouldn't be dismissed.

5 hours ago by RHSeeger

You can do that without pair programming, though. Both through actual discussions and through rubber ducking.

4 hours ago by TeMPOraL

I guess it depends on the person. My experience is close to that of 'ryandrake.

I've been coding long enough to notice there are times where the problem is complex and unclear enough that my own thought process will turn into pair programming with myself, literally chatting with myself in a text file; this process has the bandwidth and latency on the same order as talking to another person, so I might just as well do that and get the benefit of an independent perspective.

The above is really more of a design-level discussion. However, there are other times - precisely those times that pair programming is meant for - when the problem is clear enough I can immerse myself in it. Using the slow I/O mode, being deliberate is exactly the opposite of what I need then. By moving alone and focused, keeping my thoughts below the level of words, I can explore the problem space much further, rapidly proposing a solution, feeling it out, proposing another, comparing, deciding on a direction, noticing edge cases and bad design up front and dealing with them, all in a rapid feedback loop with test. Pair programming in this scenario would truly force me to "use the slower I/O parts of your brain", in that exact sense: it's like splitting a highly-optimized in-memory data processing pipeline in two, and making the halves communicate over IPC. With JSON.

As for bus factor, I find the argument bogus anyway. For that to work, pair programming would have to be done with the same partner or small group of partners, preferably working on the same or related code modules, daily, over the course of weeks at least - otherwise neither they nor I are going to have enough exposure to understand what the other is working on. But that's not how pair programming worked when I experienced it.

It's a problem with code reviews, too: if your project has depth[0], I won't really understand the whole context of what you're doing, and you won't understand the context of my work, so our reviews of each other's code will quickly degenerate to spotting typos, style violations, and peculiar design choices; neither of us will have time or mental capacity to fully understand the changeset before "+2 LGTM"-ing it away.

--

[0] - I don't know if there's a better, established term for it. What I mean is depth vs. breadth in the project architecture. Example of depth: you have a main execution orchestrator, you have an external data system that handles integrations with a dozen different data storage systems, then you have math-heavy business logic on data, then you have RPC for integrating with GUI software developed by another team, then you have an extensive configuration system, etc. - each of those areas is full of design and coding challenges that don't transfer to any other. Contrast that with an example of breadth: a typical webapp or mobile app, where 80% of the code is just some UI components and a hundred different screens, with very little unique or domain-specific logic. In those projects, developers are like free electrons in metal: they can pick any part of the project at any given moment and be equally productive working on it, because every part is basically the same as every other part. In those projects, I can see both pair programming and code reviews deliver on their promises in full.

4 hours ago by bobbiechen

I agree, chat is only useful in scenarios that 1) are poorly defined, and 2) require a back-and-forth feedback loop. And even then, there might be better UX options.

I wrote about this here: https://digitalseams.com/blog/the-ideal-ai-interface-is-prob...

6 hours ago by throwup238

At the same time, putting your ideas to words forces you to make them concrete instead of nebulous brain waves. I find that the chat interface gets rid of the downsides of pair programming (that the other person is a human being with their own agency*) while maintaining the "intelligent" pair programmer aspect.

Especially with the new r1 thinking output, I find it useful to iterate on the initial prompt as a way to make my ideas more concrete, as much as iterating through the chat interface, which is more hit-and-miss due to context length limits.

* I don't mean that in a negative way, but in a "I can't expect another person to respond to me instantly at 10 words per second" way.

5 hours ago by cortesoft

> At the same time, putting your ideas to words forces you to make them concrete instead of nebulous brain waves.

I mean, isn't typing your code also forcing you to make your ideas concrete?

6 hours ago by godelski

  > I focus on the high-level code, and let the model focus on the lower level code.
Tbh the reason I don't use LLM assistants is because they suck at the "low level". They are okay at mid level and better at high level. I find their actual coding very mediocre and fraught with errors.

I've yet to see any model understand nuance or detail.

This is especially apparent in image models. Sure, they can do hands now, but they still don't get 3D space or temporal movement. It's great for scrolling through Twitter, but the longer you look the more surreal the images get. This even includes the new ByteDance model also on the front page. Coding models, similarly, ignore the context of the codebase, and the results feel more like patchwork. They produce the kind of code you'd be annoyed at a junior dev for writing: not only do you have to go through 10 PRs to make it pass the test cases, but the lack of context builds a lot of tech debt. They'll write unit tests that technically work but don't capture the actual issues, and that could usually be heavily condensed while giving greater coverage. It feels very gluey, like copy-pasting from Stack Overflow while hyper-focused on the immediate outcome instead of understanding the goal. It is too "solution" oriented, doesn't understand the underlying heuristics, and is more frustrating than dealing with the human equivalent who insists something "works" as evidenced by the output. That's like claiming a math proof is correct by looking at just the last line.

Ironically, I think in part this is why chat interface sucks too. A lot of our job is to do a lot of inference in figuring out what our managers are even asking us to make. And you can't even know the answer until you're part way in.

5 hours ago by yarekt

> A lot of our job is to do a lot of inference in figuring out what our managers are even asking us to make

This is why I think LLMs can't really replace developers. 80% of my job is already trying to figure out what's actually needed, despite being given lots of written detail, maybe even a spec or prototype code.

Building the wrong thing fast is about as useful as not building anything at all. (And before someone says "at least you now know what not to do"? For any problem there are infinite number of wrong solutions, but only a handful of ones that yield success, why waste time trying all the wrong ones?)

2 hours ago by godelski

  > Building the wrong thing fast is about as useful as not building anything at all.
SAY IT LOUDER

Fully agree. Plus, you may be faster in the short term but you won't be in the long run. The effects of both good code and bad code compound. "Tech debt" is just a fancy term for "compounding shit". And it is true, all code is shit, but it isn't binary; there is a big difference between stepping in shit and being waist deep in it.

I can predict some of the responses

  Premature optimization is the root of all evil
There's a grave misunderstanding in this adage[0], and I think many interpret it as "don't worry about efficiency, worry about output." But the context is that you shouldn't optimize without first profiling the code, not that you shouldn't optimize![1] I find it also funny revisiting this quote, because it seems like it is written by a stranger in a strange land, where programmers are overly concerned with optimizing their code. These days, I hear very little about optimization (except when I work with HPC people) except when people are saying to not optimize. Explains why everything is so sluggish...

[0] https://softwareengineering.stackexchange.com/a/80092

[1] Understanding the limitations of big O analysis really helps in understanding why this point matters. Usually when n is small, you can have worse big O and still be faster. But the constants we drop off often aren't a rounding error. https://csweb.wooster.edu/dbyrnes/cs200/htmlNotes/qsort3.htm
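To see how much those dropped constants can matter, here's a tiny TypeScript sketch (toy data and sizes I made up, not a rigorous benchmark): an O(n) scan over a 5-element array versus an O(1) Set lookup, where the scan often holds its own or wins simply because n is tiny.

  // Membership check over a 5-element collection. The array scan is O(n) and
  // the Set lookup is O(1), but at this size the constants dominate, so the
  // "worse" algorithm tends to be just as fast or faster.
  const ids = [3, 17, 42, 7, 9];
  const idSet = new Set(ids);

  function hasLinear(x: number): boolean {
    for (const id of ids) if (id === x) return true; // straight scan
    return false;
  }

  function hasSet(x: number): boolean {
    return idSet.has(x); // hashing overhead on every call
  }

  console.time("linear scan");
  for (let i = 0; i < 10_000_000; i++) hasLinear(i % 50);
  console.timeEnd("linear scan");

  console.time("set lookup");
  for (let i = 0; i < 10_000_000; i++) hasSet(i % 50);
  console.timeEnd("set lookup");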

3 hours ago by TeMPOraL

> For any problem there are infinite number of wrong solutions, but only a handful of ones that yield success, why waste time trying all the wrong ones?

Devil's advocate: because unless you're working in heavily dysfunctional organization, or are doing a live coding interview, you're not playing "guess the password" with your management. Most of the time, they have even less of a clue about how the right solution looks like! "Building the wrong thing" lets them diff something concrete against what they imagined and felt like it would be, forcing them to clarify their expectations and give you more accurate directions (which, being a diff against a concrete things, are less likely to be then misunderstood by you!). And, the faster you can build that wrong thing, the less money and time is burned to buy that extra clarity.

5 hours ago by wiremine

> Tbh the reason I don't use LLM assistants is because they suck at the "low level". They are okay at mid level and better at high level. I find their actual coding very mediocre and fraught with errors.

That's interesting. I found assistants like Copilot fairly good at low level code, assuming you direct it well.

2 hours ago by godelski

I have a response to a sibling comment showing where GPT 4o and o1-preview do not yield good results.

  > assuming you direct it well.
But hey, I admit I might not be good at this. Honestly, though, I've found greater value in spending my time reading the docs than in trying to prompt-engineer my way through. And I've given a fair amount of time to trying to get good at prompting. I just can't get it to work.

I do think that when I'm coding with an LLM it _feels_ faster, but when I've timed myself, it doesn't seem that way. It just seems to be less effort (I don't mind the effort, especially because of the compounding rewards).

5 hours ago by lucasmullens

> But with coding models they ignore context of the codebase and the results feel more like patchwork.

Have you tried Cursor? It has a great feature that grabs context from the codebase, I use it all the time.

5 hours ago by pc86

I can't get the prompt because I'm on my work computer, but I have about a three-quarter-page instruction set in the settings of Cursor. It asks clarifying questions a LOT now, and is pretty liberal about adding in commented pseudo-code for stuff it isn't sure about. You can still trip it up if you try, but it's a lot better than stock. This is with Sonnet 3.5 agent chats (Composer, I think it's called?).

I actually cancelled my Anthropic subscription when I started using Cursor, because I only ever used Claude for code generation anyway, so now I just do it within the IDE.

3 hours ago by troupo

> It has a great feature that grabs context from the codebase, I use it all the time.

If only this feature worked consistently, or reliably even half of the time.

It will casually forget or ignore any and all context and any and all files in your codebase at random times, and you never know what set of files and docs it's working with at any point in time.

4 hours ago by godelski

I have not. But I also can't get the general model to work well in even toy problems.

Here's a simple example with GPT-4o: https://0x0.st/8K3z.png

It probably isn't obvious in a quick read, but there are mistakes here. Maybe the most obvious is that, given how `replacements` is built, we need to order it intelligently. This could be fixed by sorting. But is this the right data structure? Not to mention that the algorithm itself is quite... odd

To give a more complicated example, I passed in the prompt from this famous code golf problem[0]. Here are the results; I'll save you the time, the output is wrong: https://0x0.st/8K3M.txt (note, I started command lines with "$" and added some notes for you)

Just for the heck of it, here's the same thing but with o1-preview

Initial problem: https://0x0.st/8K3t.txt

Codegolf one: https://0x0.st/8K3y.txt

As you can see, o1 is a bit better on the initial problem but still fails at the code golf one. It really isn't beating the baseline naive solution: it does 170 MiB/s compared to 160 MiB/s (baseline with -O3). This is something I'd hope it could do really well on, given that the problem is rather famous and so many instances of it should show up. There are tons of variations out there, and parallel FizzBuzz is common in parallelization courses because it teaches important concepts like keeping the output in the right order.

But hey, at least o1 has the correct output... It's just that that's not all that matters.

I stand by this: evaluating code based on output alone is akin to evaluating a mathematical proof based on the result. And I hope these examples make the point why that matters, why checking output is insufficient.

[0] https://codegolf.stackexchange.com/questions/215216/high-thr...

Edit: I want to add that there's also an important factor here. The LLM might get you a "result" faster, but you are much more likely to miss the learning process that comes with struggling. That struggle makes you much faster (and more flexible) not just next time but in many situations where even a subset is similar. Which, yeah, it's totally fine to glue shit together when you don't care and just need something, but there's a lot of missed value if you need to revisit any of that. I do have concerns that people will plateau at junior levels. I hope it doesn't cause seniors to revert to juniors, which I've seen happen without LLMs. If you stop working on these types of problems, you lose the skills. There's already an issue where we rush to get output, and it has clear effects on the stagnation of devs. We have far more programmers than ever, but I'm not confident we have a significantly larger number of wizards (the percentage of wizards is decreasing). There are fewer people writing programs just for fun. But "for fun" is one of our greatest learning tools as humans. Play is a common trait you see in animals, and it exists for a reason.

6 hours ago by rpastuszak

I've changed my mind on that as well. I think that, generally, chat UIs are lazy and not very user-friendly. However, when coding I keep switching between two modes:

1. I need a smart autocomplete that can work backwards and mimic my coding patterns

2. I need a pair programming buddy (of sorts, this metaphor doesn't completely work, but I don't have a better one)

Pair development, even a butchered version of the so called "strong style" (give the driver the highest level of abstraction they can use/understand) works quite well for me. But, the main reason this works is that it forces me to structure my thinking a little bit, allows me to iterate on the definition of the problem. Toss away the sketch with bigger parts of the problem, start again.

It also helps me to avoid yak shaving, getting lost in the detail or distracted because the feedback loop between me seeing something working on the screen vs. the idea is so short (even if the code is crap).

I'd also add a 5th: use prompts to generate (boring) prompts. For instance, I needed a simple #tag formatter for one of my markdown sites. I am aware that there's a not-so-small list of edge cases I'd need to cover. In this case I'd write a prompt with a list of basic requirements and ask the LLM to: a) extend it with good practice and common edge cases, b) format it as a spec with concrete input/output examples. This works a bit like the point you made about generating unit tests (I do that too, in tandem with this approach).
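For illustration, here's roughly where that generated spec might land once the edge cases are pinned down, written as a minimal sketch of the formatter itself. The specific rules (skip inline code, skip "#42"-style references, skip headings) are my own guesses, not the actual spec.

  // Hypothetical #tag formatter for a markdown site; the edge-case choices are
  // illustrative assumptions.
  function formatTags(markdown: string): string {
    return markdown
      .split(/(`[^`]*`)/)              // keep inline code spans untouched
      .map((chunk, i) => {
        if (i % 2 === 1) return chunk; // odd chunks are the code spans
        return chunk.replace(
          /(^|\s)#([a-z][\w-]*)/g,     // "#42" and "# Heading" don't match
          (_, pre, tag) => `${pre}<a class="tag" href="/tags/${tag}">#${tag}</a>`
        );
      })
      .join("");
  }

  console.log(formatTags("ship it #todo, see issue #42 and `#not-a-tag`"));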

In a sense 1) is autocomplete 2) is a scaffolding tool.

5 hours ago by yarekt

Oh yea, point 1 for sure. I call Copilot regex on steroids.

Example:

- Copy-paste a table from a PDF datasheet into a comment (it'll be badly formatted with newlines and whatnot, doesn't matter)

- Show it how to do the first line

- Autocomplete the rest of the table

- Check every row to make sure it didn't invent fields/types
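A sketch of what that looks like in code (the register table and values here are invented, not from a real datasheet):

  // Pasted table from the PDF, first entry written by hand, the rest
  // autocompleted, then every row checked against the datasheet.
  //
  // Register   Addr   Width  Reset
  // CTRL       0x00   8      0x01
  // STATUS     0x04   8      0x00
  // BAUD       0x08   16     0x1A0B
  interface RegisterDef { name: string; addr: number; width: number; reset: number; }

  const REGISTERS: RegisterDef[] = [
    { name: "CTRL",   addr: 0x00, width: 8,  reset: 0x01 },   // written by hand
    { name: "STATUS", addr: 0x04, width: 8,  reset: 0x00 },   // the kind of row autocomplete fills in
    { name: "BAUD",   addr: 0x08, width: 16, reset: 0x1a0b },
  ];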

For this type of workflow the tools are a real time saver. I've yet to see any results for the other workflows. They usually just frustrate me by either starting to suggest nonsense code without full understanding, or it's far too easy to bias the results and make them stuck in a pattern of thinking.

6 hours ago by ryandrake

> I've changed my mind on that as well. I think that, generally, chat UIs are a lazy and not very user friendly. However, when coding I keep switching between two modes:

> 1. I need a smart autocomplete that can work backwards and mimic my coding patterns

> 2. I need a pair programming buddy (of sorts, this metaphor doesn't completely work, but I don't have a better one)

Thanks! This is the first time I've seen it put this clearly. When I first tried out CoPilot, I was unsure of how I was "supposed" to interact with it. Is it (as you put it) a smarter autocomplete, or a programming buddy? Is it both? What was the right input method to use?

After a while, I realized that for my personal style I would pretty much entirely use method 1, and never method 2. But, others might really need that "programming buddy" and use that interface instead.

6 hours ago by echelon

I work on GenAI in the media domain, and I think this will hold true with other fields as well:

- Text prompts and chat interfaces are great for coarse grained exploration. You can get a rough start that you can refine. "Knight standing in a desert, rusted suit of armor" gets you started, but you'll want to take it much further.

- Precision inputs (mouse or structure guided) are best for fine tuning the result and honing in on the solution itself. You can individually plant the cacti and pose the character. You can't get there with text.

7 hours ago by dataviz1000

I agree with you.

Yesterday, I asked o3-mini to "optimize" a block of code. It produced very clean, functional TypeScript. However, because the code is reducing stock option chains, I then asked o3-mini to "optimize for speed." In the JavaScript world, this is usually done with for loops, and it even considered aspects like array memory allocation.

This shows that using the right qualifiers is important for getting the results you want. Today, I use both "optimize for developer experience" and "optimize for speed" when they are appropriate.
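Roughly the kind of rewrite being described, sketched with an invented option-chain shape (not the commenter's actual code):

  // Invented shape, purely for illustration.
  interface OptionQuote { strike: number; openInterest: number; }

  // "Optimize for developer experience": clear and functional, but it
  // allocates an intermediate array.
  function totalOpenInterest(chain: OptionQuote[], maxStrike: number): number {
    return chain
      .filter(q => q.strike <= maxStrike)
      .reduce((sum, q) => sum + q.openInterest, 0);
  }

  // "Optimize for speed": a single pass with a plain for loop and no
  // intermediate allocations, the style the model reportedly switched to.
  function totalOpenInterestFast(chain: OptionQuote[], maxStrike: number): number {
    let sum = 0;
    for (let i = 0; i < chain.length; i++) {
      if (chain[i].strike <= maxStrike) sum += chain[i].openInterest;
    }
    return sum;
  }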

Although declarative code is just an abstraction, moving from imperative jQuery to declarative React was a major change in my coding experience. My work went from telling the system how to do something to simply telling it what to do. Of course, in React, especially at first, I had to explain how to do things, but only once, to create a component. After that, I could just tell the system what to do. Now, I can simply declare the desired outcome, the what. It helps to understand how things work, but that level of detail is becoming less necessary.
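As a minimal sketch of that shift (my own example, not from the comment):

  import $ from "jquery";
  import React from "react";

  // Imperative (jQuery): spell out *how* to mutate the DOM on every change.
  function renderCount(count: number) {
    $("#count").text(String(count));
    $("#count").toggleClass("warn", count > 10);
  }

  // Declarative (React): describe *what* the UI should be for a given state
  // and let the framework figure out the how.
  function Count({ count }: { count: number }) {
    return <span className={count > 10 ? "warn" : ""}>{count}</span>;
  }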

9 hours ago by taeric

I'm coming around to the idea that chat is a bad UI pattern, period. It is a great record of correspondence, I think. But it is a terrible UI for doing anything.

By and large, I assert this is because the best way to do something is to do that thing. There can be correspondence around the thing, but the artifacts that you are building are separate things.

You could probably take this further and say that narrative is a terrible way to build things. It can be a great way to communicate them, but being a separate entity, it is not necessarily good at making any artifacts.

8 hours ago by zamfi

With apologies to Bill Buxton: "Every interface is best at something and worst at something else."

Chat is a great UI pattern for ephemeral conversation. It's why we get on the phone or on DM to talk with people while collaborating on documents, and don't just sit there making isolated edits to some Google Doc.

It's great because it can go all over the place and the humans get to decide which part of that conversation is meaningful and which isn't, and then put that in the document.

It's also obviously not enough: you still need documents!

But this isn't an "either-or" case. It's a "both" case.

8 hours ago by packetlost

I even think it's bad for generalized communication (ie. Slack/Teams/Discord/etc.) that isn't completely throwaway. Email is better in every single way for anything that might ever be relevant to review again or be filtered due to too much going on.

7 hours ago by goosejuice

I've had the opposite experience.

I have never had any issue finding information in slack with history going back nearly a decade. The only issue I have with Slack is a people problem where most communication is siloed in private channels and DMs.

Email threads are incredibly hard to follow though. The UX is rough and it shows.

7 hours ago by packetlost

I hard disagree. Don't have a conversation? Ask someone who does to forward it. Email lets the user control how to organize conversations. Want to stuff a conversation in a folder? Sure. Use tags religiously? Go for it. Have one big pile and rely on full-text search and metadata queries? You bet. Only the last of these is possible with the vast majority of IM platforms because the medium just doesn't allow for any other paradigm.

The fact that there's a subject header alone leads people to both stay on topic and have better thought out messages.

I agree that email threads could have better UX. Part of that is the clients' insistence on appending the previous message to every reply. This is completely optional, though, and should probably be turned off by default for simple replies.

8 hours ago by taeric

Anything that needs to be filtered for viewing again pretty much needs version control. Email largely fails at that, just as hard as other correspondence systems do. That said, we have common workflows that use email to build reviewed artifacts.

People love complaining about the email workflow of git, but it is demonstrably better than any chat program for what it is doing.

8 hours ago by packetlost

I don't think I agree with this. Sure, many things should be versioned, but I don't think most correspondence requires it, and correspondence is email's primary purpose.

8 hours ago by SoftTalker

Yes, agree. Chatting with a computer has all the worst attributes of talking to a person, without any of the intuitive understanding, nonverbal cues, even tone of voice, that all add meaning when two human beings talk to each other.

7 hours ago by TeMPOraL

That comment made sense 3 years ago. LLMs already solved "intuitive understanding", and the realtime multimodal variants (e.g. the thing behind "Advanced Voice" in the ChatGPT app) handle tone of voice in both directions. As for nonverbal cues, I don't know yet - I got live video enabled in ChatGPT only a few days ago and didn't have time to test it, but I would be surprised if it couldn't read the basics of body language at this point.

Talking to a computer still sucks as a user interface - not because a computer can't communicate on multiple channels the way people do, as it can do that now too. It sucks for the same reason talking to people sucks as a user interface - because the kinds of tasks we use computers for (and that aren't just talking with/to/at other people via electronic means) are better handled by doing than by talking about them. We need an interface to operate a tool, not an interface to an agent that operates a tool for us.

As an example, consider driving (as in, realtime control - not just "getting from point A to B"): a chat interface to driving would suck just as badly as being a backseat driver sucks for both people in the car. In contrast, a steering wheel, instead of being a bandwidth-limiting indirection, is an anti-indirection - not only does it let you control the machine with your body, but the control is direct enough that over time your brain learns to abstract it away, and the car becomes an extension of your body. We need more tangible interfaces like that with computers.

The steering wheel case, of course, would fail with "AI-level smarts" - but that still doesn't mean we should embrace talking to computers. A good analogy is dance - it's an interaction between two independently smart agents exploring an activity together, and as they do it enough, it becomes fluid.

So dance, IMO, is the steering wheel analogy for AI-powered interfaces, and that is the space we need to explore more.

7 hours ago by ryandrake

> We need an interface to operate a tool, not an interface to an agent that operates a tool for us.

Excellent comment and it gets to the heart of something I've had trouble clearly articulating: We've slowly lost the concept that a computer is a tool that the user wields and commands to do things. Now, a computer has its own mind and agency, and we "request" it to do things and "communicate" with it, and ask it to run this and don't run that.

Now, we're negotiating and pleading with the man inside of the computer, Mr. Computer, who has its own goals and ambitions that don't necessarily align with your own as a user. It runs what it wants to run, and if that upsets you, user, well tough shit! Instead of waiting for a command and then faithfully executing it, Mr. Computer is off doing whatever the hell he wants, running system applications in the background, updating this and that, sending you notifications, and occasionally asking you for permission to do even more. And here you are as the user, hobbled and increasingly forced to "chat" with it to get it to do what you want.

Even turning your computer off! You used to throw a hardware switch that interrupts the power to the main board, and _sayonara_ Mr. Computer! Now, the switch does nothing but send an impassioned plea to the operating system to pretty please, with sugar on top, when you're not busy could you possibly power off the computer (or mostly power it off, because off doesn't even mean off anymore).

6 hours ago by smj-edison

This is one reason I love what Bret Victor has been doing with Dynamicland[1]. He's really gone all in on trying to engage as many senses as possible and make the whole system understandable. One of his big points is that the future of technology is helping us understand more, not defer our understanding to something else.

[1] https://dynamicland.org/

EDIT: love your analogy to dance!

7 hours ago by taeric

I think this gets to how a lot of these conversations talk past each other. A chat interface for getting a ride from a car is almost certainly doable, so long as the itinerary and other details remain separate things. By and large, you are basically using a chat bot to be a travel agent, no?

But, as you say, a chat interface would be a terrible way to actively drive a car. And that is a different thing, but I'm growing convinced many will focus on the first idea while staving off the complaints of the latter.

In another thread, I assert that chat is probably a fine way to order up something that fits a repertoire that trained a bot. But, I don't think sticking to the chat window is the best way to interface with what it delivers. You almost certainly want to be much more actively "hands on" in very domain specific ways with the artifacts produced.

8 hours ago by aylmao

I would also call it having all the worst attributes of a CLI, without the succinctness, OS integration, and program composability of one.

7 hours ago by 1ucky

You should check out MCP by Anthropic, which solves some of the issues you mentioned.

8 hours ago by taeric

Yeah, this is something I didn't make clear in my post. Chat between people is the same bad UI. People read in the aggression that they bring to their reading. And get mad at people who are legit trying to understand something.

You have some of the same problems with email, of course. Losing threading, in particular, made things worse. It was a "chatification of email" that caused people to lean in to email being bad. Amusing that we are now seeing chat applications rise to replace email.

2 hours ago by SoftTalker

Yeah this is part of why RTO is not an entirely terrible idea. Remote work has these downsides -- working with another person over a computer link sucks pretty hard, no matter how you do it (not saying WFH doesn't have other very real upsides).

8 hours ago by Suppafly

I like the idea of having a chat program; the issue is that it's horrible to have a bunch of chat programs integrated into every application you use that are all separate and incompatible with each other.

I really don't like the idea of chatting with an AI though. There are better ways to interface with AIs and the focus on chat is making people forget that.

7 hours ago by tux1968

We need an LSP-like protocol for AI, so that we can amortize the configuration over every place we want such an integration. AISP?

7 hours ago by lytedev

I think they're working on it? MCP: https://www.anthropic.com/news/model-context-protocol

8 hours ago by themanmaran

I'm surprised that the article (and comments) haven't mentioned Cursor.

Agreed that copy pasting context in and out of ChatGPT isn't the fastest workflow. But Cursor has been a major speed up in the way I write code. And it's primarily through a chat interface, but with a few QOL hacks that make it way faster:

1. Output gets applied to your file in a git-diff style. So you can approve/deny changes.

2. It (kinda) has context of your codebase so you don't have to specify as much. Though it works best when you explicitly tag files ("Use the utils from @src/utils/currency.ts")

3. Directly inserting terminal logs or type errors into the chat interface is incredibly convenient. Just hover over the error and click the "add to chat"

8 hours ago by dartos

I think the wildly different experiences we all seem to have with AI code tools speaks to the inconsistency of the tools and our own lack of understanding of what goes into programming.

I've only been slowed down with AI tools. I tried for a few months to really use them and they made the easy tasks hard and the hard tasks opaque.

But obviously some people find them helpful.

Makes me wonder if programming approaches differ wildly from developer to developer.

For me, if I have an automated tool writing code, it's because I don't want to think about that code at all.

But since LLMs don't really act deterministically, I feel the need to double-check their output.

That's very painful for me. At that point I'd rather just write the code once, correctly.

7 hours ago by kenjackson

I use LLMs several times a day, and I think for me the issue is that verification is typically much faster than learning/writing. For example, I've never spent much time getting good at scripting. Sure, probably a gap I should resolve, but I feel like LLMs do a great job at it. And what I need to script is typically easy to verify, I don't need to spend time learning how to do things like, "move the files of this extension to this folder, but rewrite them so that the name begins with a three digit number based on the date when it was created, with the oldest starting with 001" -- or stuff like that. Sometimes it'll have a little bug, but one that I can debug quickly.
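For that specific example, the script an LLM hands back might look something like this Node/TypeScript sketch (the paths and extension are placeholders, and it leans on the filesystem reporting creation times, so treat it as something to verify rather than a vetted tool):

  import * as fs from "node:fs";
  import * as path from "node:path";

  const srcDir = "./inbox";    // placeholder
  const destDir = "./sorted";  // placeholder
  const ext = ".pdf";          // placeholder

  fs.mkdirSync(destDir, { recursive: true });

  // Collect matching files with their creation times, oldest first.
  const files = fs.readdirSync(srcDir)
    .filter(f => path.extname(f) === ext)
    .map(f => ({ name: f, birth: fs.statSync(path.join(srcDir, f)).birthtimeMs }))
    .sort((a, b) => a.birth - b.birth);

  // Move each file, prefixing its name with a three-digit index (001 = oldest).
  files.forEach((f, i) => {
    const prefix = String(i + 1).padStart(3, "0");
    fs.renameSync(path.join(srcDir, f.name), path.join(destDir, `${prefix}-${f.name}`));
  });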

Scripting assistance by itself is worth the price of admission.

The other thing I've found it good at is giving me an English description of code I didn't write... I'm sure it sometimes hallucinates, but never in a way that has been so wrong that it's been apparent to me.

5 hours ago by shaan7

I think you and the parent comment are onto something. I also feel like the parent, since I find it relatively difficult to read code that someone else wrote. My brain easily gets biased into thinking that the cases the code is covering are the only possible ones. On the flip side, if I were writing the code, I am more likely to determine the corner cases. In other words, writing code helps me think; reading just biases me. This makes it extremely slow to review an LLM's code, at which point I'd just write it myself.

Very good for throwaway code though, for example a PoC which won't really be going to production (hopefully xD).

5 hours ago by skydhash

Your script example is a good one, but the nice thing about scripting is when you learn the semantics of it. Like the general pattern of find -> filter/transform -> select -> action. It's very easy to come up with a one-liner that can be trivially modified to adapt it to another context. More often than not, I find LLMs generate overly complicated scripts.

7 hours ago by sangnoir

> But since LLMs don't really act deterministically, I feel the need to double-check their output.

I feel the same

> That's very painful for me. At that point I'd rather just write the code once, correctly.

I use AI tools augmentatively, and it's not painful for me, perhaps slightly inconvenient. But for boiler-plate-heavy code like unit tests or easily verifiable refactors[1], adjusting AI-authored code on a per-commit basis is still faster than me writing all the code.

1. Like switching between unit-test frameworks

8 hours ago by aprilthird2021

I think it's about what you're working on. It's great for greenfield projects, etc. Terrible for complex projects that plug into a lot of other complex projects (like most of the software those of us not at startups work on day to day)

8 hours ago by dartos

It's been a headache for my greenfield side projects and for my day-to-day work.

Leaning on these tools just isn't for me right now.

I like them most for one off scripts or very small bash glue.

8 hours ago by lolinder

I like Cursor, but I find the chat to be less useful than the super advanced auto complete.

The chat interface is... fine. Certainly better integrated into the editor than GitHub Copilot's, but I've never really seen the need to use it as chat: I ask for a change and then it makes the change. Then I fix what it did wrong and ask for another change. The chat history aspect is meaningless and usually counterproductive, because it's faster for me to fix its mistakes than keep everything in the chat window while prodding it the last 20% of the way.

5 hours ago by tarsinge

I was very skeptical of AI-assisted coding until I tried Cursor and experienced the super autocomplete. It is ridiculously productive. For me it's to the point that it makes Vim obsolete, because pressing tab correctly finishes the line or code block 90% of the time. Every developer with an opinion on AI assistance should at least download Cursor and start editing a file.

8 hours ago by themanmaran

Agreed, the autocomplete definitely gets more mileage than the chat. But I frequently use it for terminal commands as well. Especially AWS CLI work.

"how do I check the cors bucket policies on [S3 bucket name]"

7 hours ago by fragmede

> while prodding it the last 20% of the way.

hint: you don't get paid to get the LLM to output perfect code, you get paid by PRs submitted and landed. Generate the first 80% or whatever with the LLM, and then finish the last 20% that you can write faster than the LLM yourself, by hand.

6 hours ago by reustle

Depends on the company. Most of the time, you get paid to add features and fix bugs, while maintaining reliability.

End users don't care where the code came from.

7 hours ago by jeremyjh

That is exactly what GP was pointing out, and why they said they do not prod it for it the last 20%.

3 hours ago by koito17

I'm not familiar with Cursor, but I've been using Zed with Claude 3.5 Sonnet. For side projects, I have found it extremely useful to provide the entire codebase as context and send concise prompts focusing on a single requirement. Claude handles "junior developer" tasks well when each unit of work is clearly separated.

Zed makes it trivial to attach documentation and terminal output as context. To reduce risk of hallucination, I now prefer working in static, strongly-typed languages and use libraries with detailed documentation, so that I can send documentation of the library alongside the codebase and prompt. This sounds like a lot of work, but all I do is type "/f" or "/t" in Zed. When I know a task only modifies a single file, then I use the "inline assist" feature and review the diffs generated by the LLM.

Additionally, I have found it extremely useful to actually comment a codebase. LLMs are good at unstructured human language, it's what they were originally designed for. You can use them to maintain comments across a codebase, which in turn helps LLMs since they get to see code and design together.

Last weekend, I was able to re-build a mobile app I made a year ago from scratch with a cleaner code base, better UI, and implement new features on top (making the rewrite worth my time). The app in question took me about a week to write by hand last year; the rewrite took exactly 2 days.

---

As a side note: a huge advantage of Zed with locally-hosted models is that one can correct the code emitted by the model and force the model to re-generate its prior response with those corrections. This is probably the "killer feature" of models like qwen2.5-coder:32b. Rather than sending extra prompts and bloating the context, one can just delete all output from where the first mistake was made, correct the mistake, then resume generation.

4 hours ago by mkozlows

Windsurf is even more so this way -- it'll look through your codebase trying to find the right files to inspect, and it runs the build/test stuff and examines the output to see what went wrong.

I found interacting with it via chat to be super-useful and a great way to get stuff done. Yeah, sometimes you just have to drop into the code, and tag a particular line and say "this isn't going to work, rewrite it to do x" (or rewrite it yourself), but the ability to do that doesn't vitiate the value of the chat.

8 hours ago by matthewsinclair

Yep. 100% agree. The whole "chat as UX" metaphor is a cul-de-sac that I'm sure we'll back out of sooner or later.

I think about this like SQL in the late 80s. At the time, SQL was the "next big thing" that was going to mean we didn't need programmers, and that management could "write code". It didn't quite work out that way, of course, as we all know.

I see chat-based interfaces to LLMs going exactly the same way. The LLM will move down the stack (rather than up), and much more appropriate task-based UX/UI will be put on top of the LLM, coordinated through a UX/UI layer that is much more sympathetic to the way users actually want to interact with a machine.

In the same way that no end users ever touch SQL these days (mostly), we won't expose the chat-based UX of an LLM to users either.

There will be a place for an ad-hoc natural language interface to a machine, but I suspect it'll be the exception rather than the rule.

I really don't think there are too many end users who want to be forced to seduce a mercurial LLM using natural language to do their day-to-day tech tasks.

2 hours ago by jug

I think a counterpoint to this is that SQL has a specific and well-defined meaning, and it takes effort to get what you actually want right. Communication with an AI, however, can be specific about context and requirements in some places and intentionally open-ended in others, where we want to give the AI leeway. The great thing is that humans _and_ AI now quite clearly understand when a sentence is non-specific and when precision matters. So I think it's hard to come up with a more terse or approachable competitor to the sheer flexibility of language. In a way, it's a similar reason that engineers across the world have been typing text commands into a terminal for some 80 years now.

7 hours ago by sangnoir

> The whole "chat as UX" metaphor is a cul-de-sac that I'm sure we'll back out of sooner or later.

Only when someone discovers another paradigm that matches or exceeds the effectiveness of LLMs without being a language model.

5 hours ago by daxfohl

Or DSLs like cucumber for acceptance tests. Cute for simple things, but for anything realistic, it's more convoluted than convenient.

9 hours ago by croes

Natural language isn't made to be precise; that's why we use a subset of it in programming languages.

So you either need lots of extra text to remove the ambiguity of natural language when you use AI, or you need a special, precise subset to communicate with the AI, and that's just programming with extra steps.

9 hours ago by Klaster_1

A lot of extra text usually means prior requirements, meeting transcripts, screen share recordings, chat history, Jira tickets and so on - the same information developers use to produce a result that satisfies the stakeholders and does the job. This seems like a straightforward direction, solvable with more compute and more efficient memory. I think this will be the way it pans out.

Real projects don't require an infinitely detailed specification either, you usually stop where it no longer meaningfully moves you towards the goal.

The whole premise of AI developer automation, IMO, is that if a human can develop a thing, then AI should be able to as well, given the same input.

9 hours ago by cube2222

We are kind of actually there already.

With a 200k token window like Claude has you can already dump a lot of design docs / transcripts / etc. at it.

9 hours ago by rightisleft

It's all about the context window. Even the new Mistral Codestral-2501 256K CW does a great job.

If you use Cline with any large-context model, the results can be pretty amazing. It's not close to self-guiding; you still need to break down and analyze the problem and provide clear and relevant instructions, i.e. you need to be a great architect. Once you are stable on the direction, it's awe-inspiring to watch it do the bulk of the implementation.

I do agree that there is space to improve over embedded chat windows in IDEs. Solutions will come in time.

9 hours ago by mollyporph

And Gemini has 2m token window. Which is about 10 minutes of video for example.

7 hours ago by layer8

This premise in your last paragraph can only work with AGI, and weā€™re probably not close to that yet.

9 hours ago by throwaway290

idk if you think all those jira tickets and meetings are precise enough (IMO sometimes the opposite)

By the way, remind me why you need design meetings in that ideal world?:)

> Real projects don't require an infinitely detailed specification either, you usually stop where it no longer meaningfully moves you towards the goal.

The point was that specification is not detailed enough in practice. Precise enough specification IS code. And the point is literally that natural language is just not made to be precise enough. So you are back where you started

So you waste time explaining in detail and rehashing requirements in this imprecise language until you see the code you want to see. Which would have been faster to just... idk... type.

9 hours ago by Klaster_1

That's a fair point. I'd love to see Copilot conclude that it can't resolve a particular conundrum and communicate with other people so everyone makes a decision together.

9 hours ago by falcor84

Even if you have superhuman AI designers, you still need buy-in.

9 hours ago by oxfordmale

Yes, let's devise a more precise way to give AI instructions. Let's call it pAIthon. This will allow the powers that be, like Zuckerberg, to save face and claim that AI has replaced mid-level developers, and enable developers to rebrand themselves as pAIthon programmers.

Joking aside, this is likely where we will end up, just with a slightly higher-level programming interface, making developers more productive.

9 hours ago by dylan604

man, pAIthon was just sitting right there for the taking

6 hours ago by oxfordmale

Thanks for pointing it out :-)

8 hours ago by undefined
[deleted]
9 hours ago by pjc50

There was a wave of this previously in programming: https://en.wikipedia.org/wiki/The_Last_One_(software)

All the same buzzwords, including "AI"! In 1981!

9 hours ago by spacemanspiff01

Or a proposal/feedback process. E.g., you are hired by a non-technical person to build something; you generate requirements and a proposed solution. You then propose that solution, and they give feedback.

Having a feedback loop is the only viable way to do this. Sure, the client could give you a book on what they want, but often people do not know their edge cases, what issues may arise, etc.

8 hours ago by spolsky

I don't think Daniel's point is that Chat is generically a clunky UI and therefore Cursor cannot possibly exist. I think he's saying that to fully specify what a given computer program should do, you have to provide all kinds of details, and human language is too compressed and too sloppy to always include those details. For example, you might say "make a logon screen" but there are an infinite number of ways this could be done and until you answer a lot of questions you may not get what you want.

If you asked me two or three years ago I would have strongly agreed with this theory. I used to point out that every line of code was a decision made by a programmer and that programming languages were just better ways to convey all those decisions than human language because they eliminated ambiguity and were much terser.

I changed my mind when I saw how LLMs work. They tend to fill in the ambiguity with good defaults that are somewhere between "how everybody does it" and "how a reasonably bright junior programmer would do it".

So you say "give me a log on screen" and you get something pretty normal with Username and Password and a decent UI and some decent color choices and it works fine.

If you wanted to provide more details, you could tell it to use the background color #f9f9f9, but part of what surprised me and caused me to change my mind on this matter was that you could also leave that out and you wouldn't get an error; you wouldn't get white text on a white background; you would get a decent color that might be #f9f9f9 or might be #a1a1a1, but you saved a lot of time by not thinking about that level of detail and you got a good result.
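To picture the kind of "decent default" being described, here's a minimal sketch of the sort of logon screen a model might produce when you leave the details out (this markup is my guess, not an actual model output):

  import React, { useState } from "react";

  // Sensible-default logon screen: username, password, and a quiet background
  // color the model picked on its own (here #f9f9f9, as in the example above).
  export function LoginScreen({ onLogin }: { onLogin: (user: string, pass: string) => void }) {
    const [user, setUser] = useState("");
    const [pass, setPass] = useState("");
    return (
      <form
        style={{ background: "#f9f9f9", padding: 24, maxWidth: 320, margin: "80px auto" }}
        onSubmit={e => { e.preventDefault(); onLogin(user, pass); }}
      >
        <input placeholder="Username" value={user} onChange={e => setUser(e.target.value)} />
        <input placeholder="Password" type="password" value={pass} onChange={e => setPass(e.target.value)} />
        <button type="submit">Log in</button>
      </form>
    );
  }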

5 hours ago by skydhash

Which no one really argues about. But writing code was never the main issue in a software project. If you open any book about software engineering, there's barely any mention of coding. The issue is the process of figuring out what code to write and where to put it, in a practical and efficient way.

In your example, the issue is not with writing the logon screen (you can find several examples on GitHub, and a lot of CSS frameworks have form snippets). The issue is making sure that it works and integrates well with the rest of the project, as well as being easy to maintain.

8 hours ago by zamfi

Yeah, and in fact this is about the best-case scenario in many ways: "good defaults" that get you approximately where you want to be, with a way to update when those defaults aren't what you want.

Right now we have a ton of AI/ML/LLM folks working on the first clear challenge: better models that generate better defaults, which is great, but it will never solve the problem 100%. That's the second, less-clear challenge: there will always be times you don't want the defaults, especially as your requests become more and more high-level. It's the MS Word challenge reconstituted in the age of LLMs: everyone wants 20% of what's in Word, but it's not the same 20%. The good defaults are good except for that 20% you want to be non-default.

So there need to be ways to say "I want <this non-default thing>". Sometimes chat is enough for that, like when you can ask for a different background color. But sometimes it's really not! This is especially true when the things you want are not always obvious from limited observations of the program's behavior, where even just finding out that the "good default" isn't what you want can be hard.

Too few people are working on this latter challenge, IMO. (Full disclosure: I am one of them.)

9 hours ago by Edmond

This is about relying on requirements-type documents to drive AI-based software development. I believe this will ultimately be integrated into all the AI dev tools, if it isn't already. It is really just additional context.

Here is an example of our approach:

https://blog.codesolvent.com/2024/11/building-youtube-video-...

We are also using the requirements to build a checklist, the AI generates the checklist from the requirements document, which then serves as context that can be used for further instructions.

Here's a demo:

https://youtu.be/NjYbhZjj7o8?si=XPhivIZz3fgKFK8B

9 hours ago by wongarsu

Now we just need another tool that allows stakeholders to write requirement docs using a chat interface

7 hours ago by mlsu

I can't wait for someone to invent a new language, maybe a subset of English, that is structured enough to half-well describe computer programs. Then train a model with RLHF to generate source code based on prompts in this new language.

It will slowly grow in complexity, strictness, and features, until it becomes a brand-new programming language, just with a language model and a SaaS sitting in the middle of it.

A startup will come and disrupt the whole thing by simply writing code in a regular programming language.

4 hours ago by fullstackwife

> Who is hiring 2035:

> Looking for a low level engineer, who works close to the metal, will work on our prompts
