Hacker News
8 months ago by wiremine

I'm going to take a contrarian view and say it's actually a good UI, but it's all about how you approach it.

I just finished a small project where I used o3-mini and o3-mini-high to generate most of the code. I averaged around 200 lines of code an hour, including the business logic and unit tests. Total was around 2200 lines. So, not a big project, but not a throwaway script. The code was perfectly fine for what we needed. This is the third time I've done this, and each time I get faster and better at it.

1. I find a "pair programming" mentality is key. I focus on the high-level code, and let the model focus on the lower level code. I code review all the code, and provide feedback. Blindly accepting the code is a terrible approach.

2. Generating unit tests is critical. After I like the gist of some code, I ask for some smoke tests. Again, peer review the code and adjust as needed.

3. Be liberal with starting a new chat: the models can get easily confused with longer context windows. If you start to see things go sideways, start over.

4. Give it code examples. Don't prompt with English only.

FWIW, o3-mini was the best model I've seen so far; Sonnet 3.5 New is a close second.

8 months ago by ryandrake

I guess the things I don't like about Chat are the same things I don't like about pair (or team) programming. I've always thought of programming as a solitary activity. You visualize the data structures, algorithms, data paths, calling flow and stack, and so on, in your mind, with very high throughput "discussions" happening entirely in your brain. Your brain is high bandwidth, low latency. Effortlessly and instantly move things around and visualize them. Figure everything out. Finally, when it's correct, you send it to the slow output device (your fingers).

The minute you have to discuss those things with someone else, your bandwidth decreases by orders of magnitude and now you have to put words to these things and describe them, and physically type them in or vocalize them. Then your counterpart has to input them through his eyes and ears, process that, and re-output his thoughts to you. Slow, slow, slow, and prone to error and specificity problems as you translate technical concepts to English and back.

Chat as a UX interface is similarly slow and poorly specific. It has all the shortcomings of discussing your idea with a human and really no upside besides the dictionary-like recall.

8 months ago by yarekt

That's such a mechanical way of describing pair programming. I'm guessing you don't do it often (understandable if it's not working for you).

For me pair programming accelerates development to much more than 2x. Over time the two of you figure out how to use each other's strengths, and as both of you immerse yourself in the same context you begin to understand what's needed without speaking every bit of syntax between each other.

In the best cases, as a driver you end up producing high quality code on the first pass, because you know that your partner will immediately catch anything that doesn't look right. You also go fast because you can sometimes skim over complexities, letting your partner think ahead and share that context load.

I'll leave readers to find all the caveats here.

Edit: I should probably mention why I think Chat Interface for AI is not working like Pair programming: As much as it may fake it, AI isn't learning anything while you're chatting to it. It's pointless to argue your case or discuss architectural approaches. An approach that yields better results with Chat AI is to just edit/expand your original prompt. It also feels less like a waste of time.

With Pair programming, you may chat upfront, but you won't reach that shared understanding until you start trying to implement something. For now Chat AI has no shared understanding, just "what I asked you to do" thing, and that's not good enough.

8 months ago by RHSeeger

I think it depends heavily on the people. I've done pair programming at a previous job and I hated it. It wound up being a lot slower overall.

For me, there's

- Time when I want to discuss the approach and/or code to something (someone being there is a requirement)

- Time when I want to rubber duck, and put things to words (someone being there doesn't hurt, but it doesn't help)

- Time when I want to write code that implements things, which may be based on the output of one of the above

That last bucket of time is generally greatly hampered by having someone else there and needing to interact with them. Being able to separate them (having people there for the first one or two, but not the third) is, for me, optimal.

8 months ago by skue

> I'm guessing you don't do it often (understandable if it's not working for you). For me pair programming accelerates development to much more than 2x.

The value of pair programming is inversely proportional to the expertise of the participant. Junior devs who pair with senior devs get a lot out of it, senior devs not so much.

GP is probably a more experienced dev, whereas you are the type of dev who says things like "I'm guessing that you…".

8 months ago by ionwake

this is so far removed from anything I have ever heard or experienced. But I know not everyone is the same and it is refreshing to view this comment.

8 months ago by viraptor

> but you won't reach that shared understanding until you start trying to implement something.

That's very much not my experience. Pairing on design and diagrams is as or more useful than on the code itself. Once you have a good design, the code is pretty simple.

8 months ago by frocodillo

I would argue that is a feature of pair programming, not a bug. By forcing you to use the slower I/O parts of your brain (and that of your partner) the process becomes more deliberate, allowing you to catch edge cases, bad design patterns, and would-be bugs before even putting pen to paper so to speak. Not to mention that it immediately reduces the bus factor by having two people with a good understanding of the code.

I’m not saying pair programming is a silver bullet, and I tend to agree that working on your own can be vastly more efficient. I do however think that it’s a very useful tool for critical functionality and hard problems and shouldn’t be dismissed.

8 months ago by RHSeeger

You can do that without pair programming, though. Both through actual discussions and through rubber ducking.

8 months ago by TeMPOraL

I guess it depends on a person. My experience is close to that of 'ryandrake.

I've been coding long enough to notice there are times where the problem is complex and unclear enough that my own thought process will turn into pair programming with myself, literally chatting with myself in a text file; this process has the bandwidth and latency on the same order as talking to another person, so I might just as well do that and get the benefit of an independent perspective.

The above is really more of a design-level discussion. However, there are other times - precisely those times that pair programming is meant for - when the problem is clear enough I can immerse myself in it. Using the slow I/O mode, being deliberate is exactly the opposite of what I need then. By moving alone and focused, keeping my thoughts below the level of words, I can explore the problem space much further, rapidly proposing a solution, feeling it out, proposing another, comparing, deciding on a direction, noticing edge cases and bad design up front and dealing with them, all in a rapid feedback loop with tests. Pair programming in this scenario would truly force me to "use the slower I/O parts of your brain", in that exact sense: it's like splitting a highly-optimized in-memory data processing pipeline in two, and making the halves communicate over IPC. With JSON.

As for bus factor, I find the argument bogus anyway. For that to work, pair programming would have to be done with the same partner or small group of partners, preferably working on the same or related code modules, daily, over the course of weeks at least - otherwise neither they nor I are going to have enough exposure to understand what the other is working on. But that's not how pair programming worked when I've experienced it.

It's a problem with code reviews, too: if your project has depth[0], I won't really understand the whole context of what you're doing, and you won't understand the context of my work, so our reviews of each other's code will quickly degenerate to spotting typos, style violations, and peculiar design choices; neither of us will have time or mental capacity to fully understand the changeset before "+2 LGTM"-ing it away.

--

[0] - I don't know if there's a better, established term for it. What I mean is depth vs. breadth in the project architecture. Example of depth: you have a main execution orchestrator, you have an external data system that handles integrations with a dozen different data storage systems, then you have math-heavy business logic on data, then you have RPC for integrating with GUI software developed by another team, then you have an extensive configuration system, etc. - each of those areas is full of design and coding challenges that don't transfer to any other. Contrast that with an example of breadth: a typical webapp or mobile app, where 80% of the code is just some UI components and a hundred different screens, with very little unique or domain-specific logic. In those projects, developers are like free electrons in metal: they can pick any part of the project at any given moment and be equally productive working on it, because every part is basically the same as every other part. In those projects, I can see both pair programming and code reviews deliver on their promises in full.

8 months ago by hinkley

Efficient, but not always more effective.

8 months ago by hmcdona1

This is going to sound out of left field, but I would venture to guess you have very high spatial reasoning skills. I operate in much the same way and only recently connected the dots that that skill might be what my brain leans on so heavily while programming and debugging.

Pair programming is endlessly frustrating beyond just rubber ducking, because I'm having to exit my mental model, communicate it to someone else, and then translate and relate their inputs back into my mental model, which is not exactly rooted in language in my head.

8 months ago by bobbiechen

I agree, chat is only useful in scenarios that are 1) poorly defined, and 2) require a back-and-forth feedback loop. And even then, there might be better UX options.

I wrote about this here: https://digitalseams.com/blog/the-ideal-ai-interface-is-prob...

8 months ago by godelski

  > I focus on the high-level code, and let the model focus on the lower level code.
Tbh the reason I don't use LLM assistants is because they suck at the "low level". They are okay at mid level and better at high level. I find its actual coding very mediocre and fraught with errors.

I've yet to see any model understand nuance or detail.

This is especially apparent in image models. Sure, it can do hands but they still don't get 3D space nor temporal movements. It's great for scrolling through Twitter but the longer you look the more surreal they get. This even includes the new ByteDance model also on the front page.

But with coding models they ignore context of the codebase and the results feel more like patchwork. They feel like what you'd be annoyed at with a junior dev for writing because not only do you have to go through 10 PRs to make it pass the test cases but the lack of context just builds a lot of tech debt. How they'll build unit tests that technically work but don't capture the actual issues and usually can be highly condensed while having greater coverage. It feels very gluey, like copy pasting from stack overflow when hyper focused on the immediate outcome instead of understanding the goal. It is too "solution" oriented, not understanding the underlying heuristics and is more frustrating than dealing with the human equivalent who says something "works" as evidenced by the output. This is like trying to say a math proof is correct by looking at just the last line.

Ironically, I think in part this is why chat interface sucks too. A lot of our job is to do a lot of inference in figuring out what our managers are even asking us to make. And you can't even know the answer until you're part way in.

8 months ago by yarekt

> A lot of our job is to do a lot of inference in figuring out what our managers are even asking us to make

This is why I think LLMs can't really replace developers. 80% of my job is already trying to figure out what's actually needed, despite being given lots of text detail, maybe even spec, or prototype code.

Building the wrong thing fast is about as useful as not building anything at all. (And before someone says "at least you now know what not to do": for any problem there are an infinite number of wrong solutions, but only a handful that yield success, so why waste time trying all the wrong ones?)

8 months ago by godelski

  > Building the wrong thing fast is about as useful as not building anything at all.
SAY IT LOUDER

Fully agree. Plus, you may be faster in the short term but you won't be in the long run. The effects of both good code and bad code compound. "Tech debt" is just a fancy term for "compounding shit". And it is true, all code is shit, but it isn't binary; there is a big difference between stepping in shit and being waist deep in it.

I can predict some of the responses

  Premature optimization is the root of all evil
There's a grave misunderstanding in this adage[0], and I think many interpret it as "don't worry about efficiency, worry about output." But the context is that you shouldn't optimize without first profiling the code, not that you shouldn't optimize![1] I find it also funny revisiting this quote, because it seems like it is written by a stranger in a strange land, where programmers are overly concerned with optimizing their code. These days, I hear very little about optimization (except when I work with HPC people) except when people are saying to not optimize. Explains why everything is so sluggish...

[0] https://softwareengineering.stackexchange.com/a/80092

[1] Understanding the limitations of big O analysis really helps in understanding why this point matters. Usually when n is small, you can have worse big O and still be faster. But the constants we drop off often aren't a rounding error. https://csweb.wooster.edu/dbyrnes/cs200/htmlNotes/qsort3.htm
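To make the dropped-constants point concrete with made-up numbers: suppose the "better" algorithm costs 100·n·log2(n) steps and the "worse" one costs 2·n² steps.

  at n = 20:    100 * 20 * log2(20) ≈ 8,600 steps    vs    2 * 20^2 = 800 steps
  crossover:    100 * n * log2(n) = 2 * n^2    =>    n ≈ 440

So until n reaches the mid-hundreds, the asymptotically "worse" algorithm is the faster one.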

8 months ago by TeMPOraL

> For any problem there are infinite number of wrong solutions, but only a handful of ones that yield success, why waste time trying all the wrong ones?

Devil's advocate: because unless you're working in a heavily dysfunctional organization, or are doing a live coding interview, you're not playing "guess the password" with your management. Most of the time, they have even less of a clue about what the right solution looks like! "Building the wrong thing" lets them diff something concrete against what they imagined and felt it would be, forcing them to clarify their expectations and give you more accurate directions (which, being a diff against a concrete thing, are less likely to then be misunderstood by you!). And, the faster you can build that wrong thing, the less money and time is burned to buy that extra clarity.

8 months ago by lucasmullens

> But with coding models they ignore context of the codebase and the results feel more like patchwork.

Have you tried Cursor? It has a great feature that grabs context from the codebase, I use it all the time.

8 months ago by troupo

> It has a great feature that grabs context from the codebase, I use it all the time.

If only this feature worked consistently, or reliably even half of the time.

It will casually forget or ignore any and all context and any and all files in your codebase at random times, and you never know what set of files and docs it's working with at any point in time.

8 months ago by pc86

I can't get the prompt because I'm on my work computer but I have about a three-quarter-page instruction set in the settings of cursor, it asks clarifying questions a LOT now, and is pretty liberal with adding in commented pseudo-code for stuff it isn't sure about. You can still trip it up if you try, but it's a lot better than stock. This is with Sonnet 3.5 agent chats (composer I think it's called?)

I actually cancelled my Anthropic subscription when I started using Cursor because I only ever used Claude for code generation anyway, so now I just do it within the IDE.

8 months ago by godelski

I have not. But I also can't get the general model to work well in even toy problems.

Here's a simple example with GPT-4o: https://0x0.st/8K3z.png

It probably isn't obvious in a quick read, but there are mistakes here. Maybe the most obvious is that, given how `replacements` is built, we need to order it intelligently. This could be fixed by sorting. But is this the right data structure? Not to mention that the algorithm itself is quite... odd

To give a more complicated example I passed the same prompt from this famous code golf problem[0]. Here's the results, I'll save you the time, the output is wrong https://0x0.st/8K3M.txt (note, I started command lines with "$" and added some notes for you)

Just for the heck of it, here's the same thing but with o1-preview

Initial problem: https://0x0.st/8K3t.txt

Codegolf one: https://0x0.st/8K3y.txt

As you can see, o1 is a bit better on the initial problem but still fails at the code golf one. It really isn't beating the baseline naive solution. It does 170 MiB/s compared to 160 MiB/s (baseline with -O3). This is something I'd hope it could do really well on, given that this problem is rather famous and so many occurrences of it should show up. There are tons of variations out there, and it is common to see parallel fizzbuzz in a class on parallelization, since it can teach important concepts like keeping the output in the right order.

But hey, at least o1 has the correct output... It's just that that's not all that matters.

I stand by this: evaluating code based on output alone is akin to evaluating a mathematical proof based on the result. And I hope these examples make the point why that matters, why checking output is insufficient.

[0] https://codegolf.stackexchange.com/questions/215216/high-thr...

Edit: I want to add that there's also an important factor here. The LLM might get you a "result" faster, but you are much more likely to miss the learning process that comes with struggling. Because that makes you much faster (and more flexible) not just next time but in many situations where even a subset is similar. Which yeah, totally fine to glue shit together when you don't care and just need something, but there's a lot of missed value if you need to revisit any of that.

I do have concerns that people will be plateaued at junior levels. I hope it doesn't cause seniors to revert to juniors, which I've seen happen without LLMs. If you stop working on these types of problems, you lose the skills. There's already an issue where we rush to get output and it has clear effects on the stagnation of devs. We have far more programmers than ever, but I'm not confident we have significantly more wizards (the percentage of wizards is decreasing). There's fewer people writing programs just for fun. But "for fun" is one of our greatest learning tools as humans. Play is a common trait you see in animals and it exists for a reason.

8 months ago by wiremine

> Tbh the reason I don't use LLM assistants is because they suck at the "low level". They are okay at mid level and better at high level. I find its actual coding very mediocre and fraught with errors.

That's interesting. I found assistants like Copilot fairly good at low level code, assuming you direct it well.

8 months ago by godelski

I have a response to a sibling comment showing where GPT 4o and o1-preview do not yield good results.

  > assuming you direct it well.
But hey, I admit I might not be good at this. But honestly, I've found greater value in my time reading the docs than spending trying to prompt engineer my way through. And I've given a fair amount of time to trying to get good at prompting. I just can't get it to work.

I do think that when I'm coding with an LLM it _feels_ faster, but when I've timed myself, it doesn't seem that way. It just seems to be less effort (I don't mind the effort, especially because of the compounding rewards).

8 months ago by rpastuszak

I've changed my mind on that as well. I think that, generally, chat UIs are lazy and not very user friendly. However, when coding I keep switching between two modes:

1. I need a smart autocomplete that can work backwards and mimic my coding patterns

2. I need a pair programming buddy (of sorts, this metaphor doesn't completely work, but I don't have a better one)

Pair development, even a butchered version of the so called "strong style" (give the driver the highest level of abstraction they can use/understand) works quite well for me. But, the main reason this works is that it forces me to structure my thinking a little bit, allows me to iterate on the definition of the problem. Toss away the sketch with bigger parts of the problem, start again.

It also helps me to avoid yak shaving, getting lost in the detail or distracted because the feedback loop between me seeing something working on the screen vs. the idea is so short (even if the code is crap).

I'd also add 5.: use prompts to generate (boring) prompts. For instance, I needed a simple #tag formatter for one of my markdown sites. I am aware that there's a not-so-small list of edge cases I'd need to cover. In this case I'd write a prompt with a list of basic requirements and ask the LLM to: a) extend it with good practice, common edge cases b) format it as a spec with concrete input / output examples. This works a bit similar to the point you made about generating unit tests (I do that too, in tandem with this approach).
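To give a feel for why that edge-case list isn't small, here's a naive first pass at such a #tag formatter (a hypothetical sketch, not the actual code; the LLM-expanded spec is what ends up enumerating the rest of the rules):

  // Naive first pass at a #tag formatter: link "#tag" when it starts a word.
  // Edge cases a fuller spec would pin down: tags inside code spans or fenced
  // blocks, markdown headings that start with "#", trailing punctuation,
  // unicode tags, and tags that already sit inside links.
  function formatTags(markdown: string): string {
    return markdown.replace(
      /(^|\s)#([\p{L}\p{N}_-]+)/gu,
      (_match, lead: string, tag: string) => `${lead}[#${tag}](/tags/${tag})`
    );
  }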

In a sense 1) is autocomplete 2) is a scaffolding tool.

8 months ago by yarekt

Oh yea, point 1 for sure. I call copilot regex on steroids.

Example:

- copy paste a table from a pdf datasheet into a comment (it'll be badly formatted with newlines and whatnot, doesn't matter)

- show it how to do the first line

- autocomplete the rest of the table

- check every row to make sure it didn't invent fields/types
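A tiny sketch of what that looks like in practice (made-up datasheet values; the comment is the raw paste, the first entry is the hand-written example the autocomplete extends):

  // Raw paste from the PDF datasheet (badly formatted, doesn't matter):
  //   VDD   Supply voltage        1.8   3.6   V
  //   TAMB  Ambient temperature   -40   85    degC
  //   FOSC  Oscillator frequency  1     16    MHz
  const limits = [
    { name: "VDD", description: "Supply voltage", min: 1.8, max: 3.6, unit: "V" },
    // ...tab through the remaining rows, then check each one against the comment
  ];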

For this type of workflow the tools are a real time saver. I've yet to see any results for the other workflows. They usually just frustrate me, either by starting to suggest nonsense code without full understanding, or because it's far too easy to bias the results and get them stuck in a pattern of thinking.

8 months ago by ryandrake

> I've changed my mind on that as well. I think that, generally, chat UIs are a lazy and not very user friendly. However, when coding I keep switching between two modes:

> 1. I need a smart autocomplete that can work backwards and mimic my coding patterns

> 2. I need a pair programming buddy (of sorts, this metaphor doesn't completely work, but I don't have a better one)

Thanks! This is the first time I've seen it put this clearly. When I first tried out CoPilot, I was unsure of how I was "supposed" to interact with it. Is it (as you put it) a smarter autocomplete, or a programming buddy? Is it both? What was the right input method to use?

After a while, I realized that for my personal style I would pretty much entirely use method 1, and never method 2. But, others might really need that "programming buddy" and use that interface instead.

8 months ago by echelon

I work on GenAI in the media domain, and I think this will hold true with other fields as well:

- Text prompts and chat interfaces are great for coarse grained exploration. You can get a rough start that you can refine. "Knight standing in a desert, rusted suit of armor" gets you started, but you'll want to take it much further.

- Precision inputs (mouse or structure guided) are best for fine tuning the result and homing in on the solution itself. You can individually plant the cacti and pose the character. You can't get there with text.

8 months ago by dataviz1000

I agree with you.

Yesterday, I asked o3-mini to "optimize" a block of code. It produced very clean, functional TypeScript. However, because the code is reducing stock option chains, I then asked o3-mini to "optimize for speed." In the JavaScript world, this is usually done with for loops, and it even considered aspects like array memory allocation.
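As a rough sketch of the kind of difference that second prompt makes (hypothetical option-chain shape, not the code o3-mini actually produced):

  type OptionQuote = { strike: number; openInterest: number };

  // "Optimize": clean, functional, allocates intermediate arrays
  const totalOpenInterest = (chain: OptionQuote[]): number =>
    chain.filter((o) => o.openInterest > 0)
         .reduce((sum, o) => sum + o.openInterest, 0);

  // "Optimize for speed": single pass, no intermediate allocations
  function totalOpenInterestFast(chain: OptionQuote[]): number {
    let sum = 0;
    for (let i = 0; i < chain.length; i++) {
      const oi = chain[i].openInterest;
      if (oi > 0) sum += oi;
    }
    return sum;
  }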

This shows that using the right qualifiers is important for getting the results you want. Today, I use both "optimize for developer experience" and "optimize for speed" when they are appropriate.

Although declarative code is just an abstraction, moving from imperative jQuery to declarative React was a major change in my coding experience. My work went from telling the system how to do something to simply telling it what to do. Of course, in React—especially at first—I had to explain how to do things, but only once to create a component. After that, I could just tell the system what to do. Now, I can simply declare the desired outcome, the what. It helps to understand how things work, but that level of detail is becoming less necessary.

8 months ago by taeric

I'm growing to the idea that chat is a bad UI pattern, period. It is a great record of correspondence, I think. But it is a terrible UI for doing anything.

In large part, I assert this is because the best way to do something is to do that thing. There can be correspondence around the thing, but the artifacts that you are building are separate things.

You could probably take this further and say that narrative is a terrible way to build things. It can be a great way to communicate them, but being a separate entity, it is not necessarily good at making any artifacts.

8 months ago by zamfi

With apologies to Bill Buxton: "Every interface is best at something and worst at something else."

Chat is a great UI pattern for ephemeral conversation. It's why we get on the phone or on DM to talk with people while collaborating on documents, and don't just sit there making isolated edits to some Google Doc.

It's great because it can go all over the place and the humans get to decide which part of that conversation is meaningful and which isn't, and then put that in the document.

It's also obviously not enough: you still need documents!

But this isn't an "either-or" case. It's a "both" case.

8 months ago by packetlost

I even think it's bad for generalized communication (ie. Slack/Teams/Discord/etc.) that isn't completely throwaway. Email is better in every single way for anything that might ever be relevant to review again or be filtered due to too much going on.

8 months ago by goosejuice

I've had the opposite experience.

I have never had any issue finding information in slack with history going back nearly a decade. The only issue I have with Slack is a people problem where most communication is siloed in private channels and DMs.

Email threads are incredibly hard to follow though. The UX is rough and it shows.

8 months ago by packetlost

I hard disagree. Don't have a conversation? Ask someone who does to forward it. Email lets the user control how to organize conversations. Want to stuff a conversation in a folder? Sure. Use tags religiously? Go for it. Have one big pile and rely on full-text search and metadata queries? You bet. Only the last of these is possible with the vast majority of IM platforms because the medium just doesn't allow for any other paradigm.

The fact that there's a subject header alone leads people to both stay on topic and have better thought out messages.

I agree that email threads could have better UX. Part of that is the client's insistence on appending the previous message to every reply. This is completely optional though and should probably be turned off by default for simple replies.

8 months ago by esafak

In Slack people don't even consistently use threads, because they are not forced to, so conversations are strewn all over the place, interleaved with one another. Slack has no model of a discussion in the first place.

8 months ago by taeric

Anything that needs to be filtered for viewing again pretty much needs version control. Email largely fails at that, as hard as other correspondence systems. That said, we have common workflows that use email to build reviewed artifacts.

People love complaining about the email workflow of git, but it is demonstrably better than any chat program for what it is doing.

8 months ago by packetlost

I don't think I agree with this. Sure, many things should be versioned, but I don't think most correspondence requires it, and correspondence is email's primary purpose.

8 months ago by SoftTalker

Yes, agree. Chatting with a computer has all the worst attributes of talking to a person, without any of the intuitive understanding, nonverbal cues, even tone of voice, that all add meaning when two human beings talk to each other.

8 months ago by TeMPOraL

That comment made sense 3 years ago. LLMs already solved "intuitive understanding", and the realtime multimodal variants (e.g. the thing behind "Advanced Voice" in the ChatGPT app) handle tone of voice in both directions. As for nonverbal cues, I don't know yet - I got live video enabled in ChatGPT only a few days ago and didn't have time to test it, but I would be surprised if it couldn't read the basics of body language at this point.

Talking to a computer still sucks as a user interface - not because a computer can't communicate on multiple channels the way people do, as it can do that now too. It sucks for the same reason talking to people sucks as a user interface - because the kind of tasks we use computers for (and that aren't just talking with/to/at other people via electronic means) are better handled by doing than by talking about them. We need an interface to operate a tool, not an interface to an agent that operates a tool for us.

As an example, consider driving (as in, realtime control - not just "getting from point A to B"): a chat interface to driving would suck just as badly as being a backseat driver sucks for both people in the car. In contrast, a steering wheel, instead of being a bandwidth-limiting indirection, is an anti-indirection - not only does it let you control the machine with your body, the control is direct enough that over time your brain learns to abstract it away, and the car becomes an extension of your body. We need more tangible interfaces like that with computers.

The steering wheel case, of course, would fail with "AI-level smarts" - but that still doesn't mean we should embrace talking to computers. A good analogy is dance - it's an interaction between two independently smart agents exploring an activity together, and as they do it enough, it becomes fluid.

So dance, IMO, is the steering wheel analogy for AI-powered interfaces, and that is the space we need to explore more.

8 months ago by ryandrake

> We need an interface to operate a tool, not an interface to an agent that operates a tool for us.

Excellent comment and it gets to the heart of something I've had trouble clearly articulating: We've slowly lost the concept that a computer is a tool that the user wields and commands to do things. Now, a computer has its own mind and agency, and we "request" it to do things and "communicate" with it, and ask it to run this and don't run that.

Now, we're negotiating and pleading with the man inside of the computer, Mr. Computer, who has its own goals and ambitions that don't necessarily align with your own as a user. It runs what it wants to run, and if that upsets you, user, well tough shit! Instead of waiting for a command and then faithfully executing it, Mr. Computer is off doing whatever the hell he wants, running system applications in the background, updating this and that, sending you notifications, and occasionally asking you for permission to do even more. And here you are as the user, hobbled and increasingly forced to "chat" with it to get it to do what you want.

Even turning your computer off! You used to throw a hardware switch that interrupts the power to the main board, and _sayonara_ Mr. Computer! Now, the switch does nothing but send an impassioned plea to the operating system to pretty please, with sugar on top, when you're not busy could you possibly power off the computer (or mostly power it off, because off doesn't even mean off anymore).

8 months ago by smj-edison

This is one reason I love what Bret Victor has been doing with Dynamicland[1]. He's really gone all in on trying to engage as many senses as possible and make the whole system understandable. One of his big points is that the future of technology is helping us understand more, not defer our understanding to something else.

[1] https://dynamicland.org/

EDIT: love your analogy to dance!

8 months ago by taeric

I think this gets to how a lot of these conversations go past each other? A chat interface for getting a ride from a car is almost certainly doable? So long as the itinerary and other details remain separate things? At large, you are basically using a chat bot to be a travel agent, no?

But, as you say, a chat interface would be a terrible way to actively drive a car. And that is a different thing, but I'm growing convinced many will focus on the first idea while staving off the complaints of the latter.

In another thread, I assert that chat is probably a fine way to order up something that fits a repertoire that trained a bot. But, I don't think sticking to the chat window is the best way to interface with what it delivers. You almost certainly want to be much more actively "hands on" in very domain specific ways with the artifacts produced.

8 months ago by aylmao

I would also call it having all the worst attributes of a CLI, without the succinctness, OS integration, and program composability of one.

8 months ago by 1ucky

You should check out MCP by Anthropic, which solves some of the issues you mentioned.

8 months ago by hakfoo

The idea of chat interfaces always seemed to be to disguise available functionality.

It's a CLI without the integrity. When you bought a 386, it came with a big book that said "MS-DOS 4.01" and enumerated the 75 commands you can type at the C:\> prompt and actually make something useful happen.

When you argue with ChatGPT, its whole business is to not tell you what those 75 commands are. Maybe your prompt fits its core competency and you'll get exactly what you wanted. Maybe it's hammering what you said into a shape it can parse and producing marginal garbage. Maybe it's going to hallucinate from nothing. But it's going to hide that behind a bunch of cute language and hopefully you'll just keep pulling the gacha and blaming yourself if it's not right.

8 months ago by taeric

Yeah, this is something I didn't make clear on my post. Chat between people is the same bad UI. People read in the aggression that they bring to their reading. And get mad at people who are legit trying to understand something.

You have some of the same problems with email, of course. Losing threading, in particular, made things worse. It was a "chatification of email" that caused people to lean in to email being bad. Amusing that we are now seeing chat applications rise to replace email.

8 months ago by SoftTalker

Yeah this is part of why RTO is not an entirely terrible idea. Remote work has these downsides -- working with another person over a computer link sucks pretty hard, no matter how you do it (not saying WFH doesn't have other very real upsides).

8 months ago by Suppafly

I like the idea of having a chat program, the issue is that it's horrible to have a bunch of chat programs all integrated into every application you use that are separate and incompatible with each other.

I really don't like the idea of chatting with an AI though. There are better ways to interface with AIs and the focus on chat is making people forget that.

8 months ago by tux1968

We need an LSP-like protocol for AI, so that we can amortize the configuration over every place we want such an integration. AISP?

8 months ago by lytedev

I think they're working on it? MCP: https://www.anthropic.com/news/model-context-protocol

8 months ago by themanmaran

I'm surprised that the article (and comments) haven't mentioned Cursor.

Agreed that copy pasting context in and out of ChatGPT isn't the fastest workflow. But Cursor has been a major speed up in the way I write code. And it's primarily through a chat interface, but with a few QOL hacks that make it way faster:

1. Output gets applied to your file in a git-diff style. So you can approve/deny changes.

2. It (kinda) has context of your codebase so you don't have to specify as much. Though it works best when you explicitly tag files ("Use the utils from @src/utils/currency.ts")

3. Directly inserting terminal logs or type errors into the chat interface is incredibly convenient. Just hover over the error and click the "add to chat"

8 months ago by dartos

I think the wildly different experiences we all seem to have with AI code tools speaks to the inconsistency of the tools and our own lack of understanding of what goes into programming.

I’ve only been slowed down with AI tools. I tried for a few months to really use them and they made the easy tasks hard and the hard tasks opaque.

But obviously some people find them helpful.

Makes me wonder if programming approaches differ wildly from developer to developer.

For me, if I have an automated tool writing code, it’s bc I don’t want to think about that code at all.

But since LLMs don’t really act deterministically, I feel the need to double check their output.

That’s very painful for me. At that point I’d rather just write the code once, correctly.

8 months ago by kenjackson

I use LLMs several times a day, and I think for me the issue is that verification is typically much faster than learning/writing. For example, I've never spent much time getting good at scripting. Sure, probably a gap I should resolve, but I feel like LLMs do a great job at it. And what I need to script is typically easy to verify, I don't need to spend time learning how to do things like, "move the files of this extension to this folder, but rewrite them so that the name begins with a three digit number based on the date when it was created, with the oldest starting with 001" -- or stuff like that. Sometimes it'll have a little bug, but one that I can debug quickly.
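For example, a sketch of the kind of script that request yields (hypothetical details: ".log" files, an "archive" folder, and the file's birthtime as the creation date):

  import { readdirSync, statSync, renameSync, mkdirSync } from "node:fs";
  import { join } from "node:path";

  mkdirSync("archive", { recursive: true });

  readdirSync(".")
    .filter((name) => name.endsWith(".log"))
    .map((name) => ({ name, created: statSync(name).birthtimeMs }))
    .sort((a, b) => a.created - b.created)           // oldest first
    .forEach((file, i) => {
      const prefix = String(i + 1).padStart(3, "0"); // oldest gets 001
      renameSync(file.name, join("archive", `${prefix}-${file.name}`));
    });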

Scripting assistance by itself is worth the price of admission.

The other thing I've found it good at is giving me an English description of code I didn't write... I'm sure it sometimes hallucinates, but never in a way that has been so wrong that it's been apparent to me.

8 months ago by shaan7

I think you and the parent comment are onto something. I also feel like the parent since I find it relatively difficult to read code that someone else wrote. My brain easily gets biased into thinking that the cases that the code is covering are the only possible ones. On the flip side, if I were writing the code, I am more likely to determine the corner cases. In other words, writing code helps me think, reading just biases me. This makes it extremely slow to review a LLM's code at which point I'd just write it myself.

Very good for throwaway code though, for example a PoC which won't really be going to production (hopefully xD).

8 months ago by skydhash

Your script example is a good one, but the nice thing about scripting is when you learn the semantics of it. Like the general pattern of find -> filter/transform -> select -> action. It's very easy to come up with a one-liner that can be trivially modified to adapt it to another context. More often than not, I find LLMs generate overly complicated scripts.

8 months ago by aprilthird2021

I think it's about what you're working on. It's great for greenfield projects, etc. Terrible for complex projects that plug into a lot of other complex projects (like most of the software those of us not at startups work on day to day)

8 months ago by dartos

It’s been a headache for my greenfield side projects and for my day to day work.

Leaning on these tools just isn’t for me rn.

I like them most for one off scripts or very small bash glue.

8 months ago by sangnoir

> But since LLMs don’t really act deterministically, I feel the need to double check their output.

I feel the same

> That’s very painful for me. At that point I’d rather just write the code once, correctly.

I use AI tools augmentatively, and it's not painful for me, perhaps slightly inconvenient. But for boiler-plate-heavy code like unit tests or easily verifiable refactors[1], adjusting AI-authored code on a per-commit basis is still faster than me writing all the code.

1. Like switching between unit-test frameworks

8 months ago by lolinder

I like Cursor, but I find the chat to be less useful than the super advanced auto complete.

The chat interface is... fine. Certainly better integrated into the editor than GitHub Copilot's, but I've never really seen the need to use it as chat—I ask for a change and then it makes the change. Then I fix what it did wrong and ask for another change. The chat history aspect is meaningless and usually counterproductive, because it's faster for me to fix its mistakes than keep everything in the chat window while prodding it the last 20% of the way.

8 months ago by tarsinge

I was very skeptical of AI-assisted coding until I tried Cursor and experienced the super autocomplete. It is ridiculously productive. For me it's to the point it makes Vim obsolete, because pressing tab correctly finishes the line or code block 90% of the time. Every developer with an opinion on AI assistance should just try downloading Cursor and start editing a file.

8 months ago by themanmaran

Agreed, the autocomplete definitely gets more mileage than the chat. But I frequently use it for terminal commands as well. Especially AWS CLI work.

"how do I check the cors bucket policies on [S3 bucket name]"

8 months ago by fragmede

> while prodding it the last 20% of the way.

hint: you don't get paid to get the LLM to output perfect code, you get paid by PRs submitted and landed. Generate the first 80% or whatever with the LLM, and then finish the last 20% that you can write faster than the LLM yourself, by hand.

8 months ago by reustle

Depends on the company. Most of the time, you get paid to add features and fix bugs, while maintaining reliability.

End users don’t care where the code came from.

8 months ago by jeremyjh

That is exactly what GP was pointing out, and why they said they do not prod it for the last 20%.

8 months ago by koito17

I'm not familiar with Cursor, but I've been using Zed with Claude 3.5 Sonnet. For side projects, I have found it extremely useful to provide the entire codebase as context and send concise prompts focusing on a single requirement. Claude handles "junior developer" tasks well when each unit of work is clearly separated.

Zed makes it trivial to attach documentation and terminal output as context. To reduce risk of hallucination, I now prefer working in static, strongly-typed languages and use libraries with detailed documentation, so that I can send documentation of the library alongside the codebase and prompt. This sounds like a lot of work, but all I do is type "/f" or "/t" in Zed. When I know a task only modifies a single file, then I use the "inline assist" feature and review the diffs generated by the LLM.

Additionally, I have found it extremely useful to actually comment a codebase. LLMs are good at unstructured human language, it's what they were originally designed for. You can use them to maintain comments across a codebase, which in turn helps LLMs since they get to see code and design together.

Last weekend, I was able to re-build a mobile app I made a year ago from scratch with a cleaner code base, better UI, and implement new features on top (making the rewrite worth my time). The app in question took me about a week to write by hand last year; the rewrite took exactly 2 days.

---

As a side note: a huge advantage of Zed with locally-hosted models is that one can correct the code emitted by the model and force the model to re-generate its prior response with those corrections. This is probably the "killer feature" of models like qwen2.5-coder:32b. Rather than sending extra prompts and bloating the context, one can just delete all output from where the first mistake was made, correct the mistake, then resume generation.

8 months ago by stitched2gethr

I think this misses the point. It seems like the author is saying we should move from imperative instructions to a declarative document that describes what the software should do.

Imperative:

- write a HTTP server that serves jokes

- add a healthcheck endpoint

- add TLS and change the serving port to 443

Declarative:

- a HTTP server that serves jokes

- contains a healthcheck endpoint

- supports TLS on port 443

The differences here seem minimal because you can see all of it at once, but in the current chat paradigm you'd have to search through everything you've said to the bot to get the full context, including the side roads that never materialized.

In the document approach you're constantly refining the document. It's better than reviewing the code because (in theory) you're looking at "support TLS on port 443" instead of a lot of code, which means it can be used by a wider audience. And ideally I can give the same high level spec to multiple LLMs and see which makes the best application.

8 months ago by ygouzerh

Good explanation! As an open reflection: will a declarative document be as detailed as the imperative version? Often, between the specs the product team provides (which we can consider the "descriptive" document) and the implementation, many sub-specs get created by the tech team as they uncover important implementation details. It's like a rabbit hole.

For example, for a signup page, we could have:

- Declarative: sign up the user using their email address

- Imperative: to do the same, we will need to implement an SMTP library, which means discovering that we need an SMTP server, so now we need to choose which one. And when purchasing an SMTP server plan, we discover that there are rate limits, so now we need to add some bot protection to our signup page (IP rate limit only? ReCaptcha? Cloudflare bot protection?), etc.

Which means that, in the end, the imperative code is kind of the ultimate implementation spec.

8 months ago by bze12

I could imagine a hybrid where declarative statements drive the high-level, and lower-level details branch off and are hashed out imperatively (in chat). Maybe those detail decisions then revise the declarative statements.

The source of truth would still be the code though, otherwise the declarative statements would get so verbose that they wouldn't be any more useful than writing the code itself.

8 months ago by skydhash

The issue is that there's no execution platform for declarative specs, so something will be translated to imperative, and that is where the issue lies. There's always an imperative core which needs to be deterministic, or its output needs to be verified. LLMs are not the former, and the latter option can take more time than just writing the code.

8 months ago by croes

Natural language isn't made to be precise; that's why we use a subset in programming languages.

So you either need lots of extra text to remove the ambiguity of natural language if you use AI or you need a special precise subset to communicate with AI and that’s just programming with extra steps.

8 months ago by Klaster_1

A lot of extra text usually means prior requirements, meeting transcripts, screen share recordings, chat history, Jira tickets and so on - the same information developers use to produce a result that satisfies the stakeholders and does the job. This seems like a straightforward direction solvable with more compute and more efficient memory. I think this will be the way it pans outs.

Real projects don't require an infinitely detailed specification either, you usually stop where it no longer meaningfully moves you towards the goal.

The whole premise of AI developer automation, IMO, is that if a human can develop a thing, then AI should be able too, given the same input.

8 months ago by throwaway290

idk if you think all those jira tickets and meetings are precise enough (IMO sometimes the opposite)

By the way, remind me why you need design meetings in that ideal world?:)

> Real projects don't require an infinitely detailed specification either, you usually stop where it no longer meaningfully moves you towards the goal.

The point was that specification is not detailed enough in practice. Precise enough specification IS code. And the point is literally that natural language is just not made to be precise enough. So you are back where you started

So you waste time explaining in detail and rehashing requirements in this imprecise language until you see what code you want to see. Which was faster to just... idk.. type.

8 months ago by Klaster_1

That's a fair point, I'd love to see Copilot come to a conclusion that they can't resolve a particular conundrum and communicates with other people so everyone makes a decision together.

8 months ago by falcor84

Even if you have superhuman AI designers, you still need buy-in.

8 months ago by cube2222

We are kind of actually there already.

With a 200k token window like Claude has you can already dump a lot of design docs / transcripts / etc. at it.

8 months ago by rightisleft

It's all about the context window. Even the new Mistral Codestral-2501 256K CW does a great job.

If you use Cline with any large-context model the results can be pretty amazing. It's not close to self-guiding; you still need to break down and analyze the problem and provide clear and relevant instructions, i.e. you need to be a great architect. Once you are stable on the direction, it's awe-inspiring to watch it do the bulk of the implementation.

I do agree that there is space to improve over embedded chat windows in IDEs. Solutions will come in time.

8 months ago by mollyporph

And Gemini has a 2M token window. Which is about 10 minutes of video, for example.

8 months ago by layer8

This premise in your last paragraph can only work with AGI, and we’re probably not close to that yet.

8 months ago by oxfordmale

Yes, let's devise a more precise way to give AI instructions. Let's call it pAIthon. This will allow the powers that be, like Zuckerberg, to save face and claim that AI has replaced mid-level developers, and enable developers to rebrand themselves as pAIthon programmers.

Joking aside, this is likely where we will end up, just with a slightly higher programming interface, making developers more productive.

8 months ago by dylan604

man, pAIthon was just sitting right there for the taking

8 months ago by oxfordmale

Thanks for pointing it out :-)

8 months ago by undefined
[deleted]
8 months ago by pjc50

There was a wave of this previously in programming: https://en.wikipedia.org/wiki/The_Last_One_(software)

All the same buzzwords, including "AI"! In 1981!

8 months ago by empath75

AIs actually are very good at this. They wouldn't be able to write code at all otherwise. If you're careful in your prompting, they'll make fewer assumptions and ask clarifying questions before going ahead and writing code.

8 months ago by 9rx

> If you're careful in your prompting

In other words, if you replace natural language with a programming language then the computer will do a good job of interpreting your intent. But that's always been true, so...

8 months ago by benatkin

Being careful in your prompting doesn’t imply that. That can also be thought of as just using natural language well.

8 months ago by LordDragonfang

> they'll make fewer assumptions and ask clarifying questions before going ahead and writing code.

Which model are you talking about here? Because with ChatGPT, I struggle with getting it to ask any clarifying questions before just dumping code filled with placeholders I don't want, even when I explicitly prompt it to ask for clarification.

8 months ago by oxfordmale

AI is very good at this. Unfortunately, humans tend to be super bad at providing detailed verbal instructions.

8 months ago by nomel

Then those same humans won't be able to reason about code, or the problem spaces they're working in, regardless, since it's all fundamentally about precise specifics.

8 months ago by indymike

Languages used for day to day communication between humans do not have the specificity needed for detailed instructions... even to other humans. We use out-of-band context (body language, social norms, tradition, knowledge of a person) quite a bit more than you would think.

8 months ago by croes

AI is a little bit like Occam's razor: when you say hoofbeats, you get horses. Bad if you need Zebras.

8 months ago by matthewsinclair

Yep. 100% agree. The whole "chat as UX" metaphor is a cul-de-sac that I'm sure we'll back out of sooner or later.

I think about this like SQL in the late 80s. At the time, SQL was the "next big thing" that was going to mean we didn't need programmers, and that management could "write code". It didn't quite work out that way, of course, as we all know.

I see chat-based interfaces to LLMs going exactly the same way. The LLM will move down the stack (rather than up) and much more appropriate task-based UX/UI will be put on top of the LLM, coordinated through a UX/UI layer that is much more sympathetic to the way users actually want to interact with a machine.

In the same way that no end-users ever touch SQL these days (mostly), we won’t expose the chat-based UX of an LLM to users either.

There will be a place for an ad-hoc natural language interface to a machine, but I suspect it’ll be the exception rather than the rule.

I really don’t think there are too many end users who want to be forced to seduce a mercurial LLM using natural language to do their day-to-day tech tasks.

8 months ago by jug

I think a counterpoint to this is that SQL has a specific and well-defined meaning, and it takes effort to get what you actually want right. However, communication with an AI can sometimes request a specific context or requirements, but also be intentionally open-ended where we want to give the AI leeway. The great thing here is that humans _and_ AI now quite clearly understand when a sentence is non-specific and when it carries great importance. So, I think it's hard to come up with a more terse or approachable competitor to the sheer flexibility of language. In a way, I think it's a similar reason that engineers across the world still type text commands into a terminal screen, some 80 years on.

8 months ago by sangnoir

> The whole "chat as UX" metaphor is a cul-de-sac that I'm sure we'll back out of sooner or later.

Only when someone discovers another paradigm that matches or exceeds the effectiveness of LLMs without being a language model.

8 months ago by amedviediev

I actually came to the same conclusion. I am currently working on a side project that's an AI powered writing app for writers, and while I still provide chat because that seems to be the expectation, my goal is to abstract all the AI assistance a writer might need into curated UI options.

8 months ago by daxfohl

Or DSLs like cucumber for acceptance tests. Cute for simple things, but for anything realistic, it's more convoluted than convenient.

8 months ago by spolsky

I don't think Daniel's point is that Chat is generically a clunky UI and therefore Cursor cannot possibly exist. I think he's saying that to fully specify what a given computer program should do, you have to provide all kinds of details, and human language is too compressed and too sloppy to always include those details. For example, you might say "make a logon screen" but there are an infinite number of ways this could be done and until you answer a lot of questions you may not get what you want.

If you asked me two or three years ago I would have strongly agreed with this theory. I used to point out that every line of code was a decision made by a programmer and that programming languages were just better ways to convey all those decisions than human language because they eliminated ambiguity and were much terser.

I changed my mind when I saw how LLMs work. They tend to fill in the ambiguity with good defaults that are somewhere between "how everybody does it" and "how a reasonably bright junior programmer would do it".

So you say "give me a log on screen" and you get something pretty normal with Username and Password and a decent UI and some decent color choices and it works fine.

If you wanted to provide more details, you could tell it to use the background color #f9f9f9, but part of what surprised me and caused me to change my mind on this matter was that you could also leave that out and you wouldn't get an error; you wouldn't get white text on a white background; you would get a decent color that might be #f9f9f9 or might be #a1a1a1, but you saved a lot of time by not thinking about that level of detail and you got a good result.

8 months ago by zamfi

Yeah, and in fact this is about the best-case scenario in many ways: "good defaults" that get you approximately where you want to be, with a way to update when those defaults aren't what you want.

Right now we have a ton of AI/ML/LLM folks working on this first clear challenge: better models that generate better defaults, which is great—but also will never solve the problem 100%, which is the second, less-clear challenge: there will always be times you don't want the defaults, especially as your requests become more and more high-level. It's the MS Word challenge reconstituted in the age of LLMs: everyone wants 20% of what's in Word, but it's not the same 20%. The good defaults are good except for that 20% you want to be non-default.

So there need to be ways to say "I want <this non-default thing>". Sometimes chat is enough for that, like when you can ask for a different background color. But sometimes it's really not! This is especially true when the things you want are not always obvious from limited observations of the program's behavior—where even just finding out that the "good default" isn't what you want can be hard.

Too few people are working on this latter challenge, IMO. (Full disclosure: I am one of them.)

8 months ago by skydhash

Which no one really argues about. But writing code was never the main issue in software projects. And if you open any book about software engineering, there's barely any mention of coding. The issue is the process of finding what code to write and where to put it in a practical and efficient way.

In your example, the issue is not with writing the logon screen (you can find several examples on GitHub, and a lot of CSS frameworks have form snippets). The issue is making sure that it works and integrates well with the rest of the project, as well as being easy to maintain.

8 months ago by jakelazaroff

I agree with the premise but not with the conclusion. When you're building visual things, you communicate visually: rough sketches, whiteboard diagrams, mockups, notes scrawled in the margins.

Something like tldraw's "make real" [1] is a much better bet, imo (not that it's mutually exclusive). Draw a rough mockup of what you want, let AI fill in the details, then draw and write on it to communicate your changes.

We think multi-modally; why should we limit the creative process to just text?

[1] https://tldraw.substack.com/p/make-real-the-story-so-far

8 months ago by Edmond

This is about relying on requirements type documents to drive AI based software development, I believe this will be ultimately integrated into all the AI-dev tools, if not so already. It is really just additional context.

Here is an example of our approach:

https://blog.codesolvent.com/2024/11/building-youtube-video-...

We are also using the requirements to build a checklist, the AI generates the checklist from the requirements document, which then serves as context that can be used for further instructions.

Here's a demo:

https://youtu.be/NjYbhZjj7o8?si=XPhivIZz3fgKFK8B

8 months ago by wongarsu

Now we just need another tool that allows stakeholders to write requirement docs using a chat interface
