Yon – a topos-oriented language with a content-addressed lattice heap

1 week ago/89 comments/yon-lang.org

Hello everyone. Over the last two years I've spent part of my free time, as a developer, stretching the limits of my knowledge. I'm not a mathematician, but I discovered that formalizing concepts in mathematical language can still improve symbolic reasoning about the concepts themselves. I used both books and AI, and I followed the development of the latter, mostly with a critical eye. I have several open projects, and observations and explorations in one of them led me to ask what the current limits of reasoning, of logic, and of mathematics itself are. So I explored categories and topoi, starting above all from Mazzola's theory of music. I wondered whether this could influence type theory in programming, and I ran some experiments. Out of this came this programming language, Yon, inspired by Yoneda and by morphisms. From another project I drew observations on the Leech lattice; from yet another, some experiments with mmap and coordinate-based allocation in a structure that would be advantageous, again, in a topological sense.

The language certainly has mistakes here and there, and I wrote the documentation in a hurry; the work took 3 weeks in total. It compiles to LLVM for performance reasons, and for now I preferred to avoid a VM and a GC. It contains unusual data structures that turn out to be performant. It's worth a look, and I hope it will win some converts, and that someone will want to help me with its development. I'd love for it to bring fresh stimuli to programming and maybe open a few new frontiers.

A few concrete details, for those who want to look under the hood. The compiler is a real pipeline, not an interpreter: an OCaml frontend takes .yon source into a custom MLIR dialect I called "topos", where the categorical constructs live as first-class operations; its lowering passes take everything down to LLVM IR and from there to a native executable. A single command, yonc, drives the whole chain, and you can stop at any intermediate stage to see what a categorical construct actually becomes on its way to silicon.

The runtime is where the Leech lattice observations ended up. The heap is content-addressed over Λ₂₄: every value is mapped to a lattice point and canonicalized under the Conway group Co₀ (via libmmgroup), so the same content always lives at the same address. That buys three things I would now find hard to give up:

- equality is a single machine comparison no matter how big the value is (string equality benches flat at ~17 ns up to 32,768-character strings, because it compares handles, never bytes);
- deduplication is global and automatic, with no interning logic in user code;
- giving up the GC stopped being a renunciation, since cells are immutable and content-addressed, so there is nothing to trace and nothing to move.

I kept concurrency deliberately simple: no threads, no shared mutable state. A program splits into isolated "Spaces" (separate processes, with isolation enforced by the MMU) that talk over shared-memory channels with explicit failure semantics.

About what is verified and what is just hope: the ground truth is a regression suite of 112 examples plus a cross-Space scenario suite, with identical exit codes on Linux x86-64 and macOS Apple Silicon (Intel Macs: untested). Every snippet in the book on the site (21 chapters plus appendices) was compiled and run before being written down. The benchmarks appendix declares its environment and method; I tried not to publish any number without one. The limits of 1.0 are written down as well, in a baseline document that lists every fixed pool (256 heaps per chain, 64 Spaces, 16 concurrent RPC sessions, and so on). The rationale is that a hard limit that fails loudly is a specification, while a soft limit that degrades silently is a bug.
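The equality and deduplication claims above describe hash-consing (interning) in general: store each distinct content once, hand out a slot index, and compare indices instead of bytes. A minimal Python sketch of that general idea follows. It uses an ordinary dict as the index, not a Leech-lattice mapping or Co₀ canonicalization, and `InternTable`, `intern`, and `get` are hypothetical names, not Yon's API. Note where the cost goes: the full byte comparison happens once, at intern time, inside the dict lookup.

```python
class InternTable:
    """Minimal hash-consing table: equal content always maps to one slot index."""

    def __init__(self) -> None:
        self._slots: list[bytes] = []        # slot index -> stored content
        self._index: dict[bytes, int] = {}   # content -> slot index

    def intern(self, content: bytes) -> int:
        # Hashing plus a full byte comparison happen here, once per intern call.
        handle = self._index.get(content)
        if handle is None:
            handle = len(self._slots)
            self._slots.append(content)
            self._index[content] = handle
        return handle

    def get(self, handle: int) -> bytes:
        return self._slots[handle]

    def __len__(self) -> int:
        return len(self._slots)


table = InternTable()
a = table.intern(b"x" * 32_768)
b = table.intern(b"x" * 32_768)   # same content, so the same handle; no new slot
c = table.intern(b"y")
print(a == b, a == c, len(table))  # True False 2
```

After interning, `a == b` is one integer comparison regardless of content size, which is the effect the flat ~17 ns string-equality benchmark describes; the per-value work has moved to construction time.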
For the license I went with the GCC model: compiler and toolchain are AGPLv3, the runtime is AGPLv3 with an explicit linking exception, so the language itself stays free, and the programs you write in it are entirely yours, under any license you choose.

Site + book: https://yon-lang.org
Repo: https://github.com/yon-language/yon (tag v1.0.0)

Happy to answer anything: the topos dialect, why a lattice rather than a hash, what the categorical constructs lower to, what broke along the way.

11 days ago by mccoyb

I'm not sure where or how to convey this, because I've seen several of these languages designed with AI, documentation created using AI, etc. -- posted on Hacker News in the last few months or so, and I've responded to each one with roughly the same feedback (and I'm assuming good faith: that the intent is that the poster wishes to grow as a language designer).

Your audience, or whoever you aim your work at, should be treated with respect. Otherwise, why should they give you the time of day? Why would you expect them to respond positively to effort alone when effort (in code and in shit prose) is extremely cheap right now? Their time is not cheap ...

When I read the documentation, and it is extremely clear that you haven't taken the time to clarify your ideas, when much of it is LLM prose, when much of the content introduces highfalutin ideas without motivation, blending categorical concepts (which, by the way, should never be mixed with vague prose claims about the language), violating my reader context model, preventing me from understanding what problem exactly your language design is solving (where is that problem stated clearly?), it is a waste of my time.

> The work took 3 weeks in total ... it's worth a look, and I hope it will win some converts, and that someone will want to help me with its development.

You've gone too fast, too much is vague, nothing is clear.

I'd delete everything, start over, and try and explain just one of the ideas clearly. Seriously. This sounds harsh, but it's honestly the correct approach to something as subtle and nuanced as programming language design.

11 days ago by VoidWarranty

This reads to me like someone's mania project. I wish OP the best and hope they can get some rest.

11 days ago by skulk

> Your audience, or whoever you aim your work at, should be treated with respect.

I just want to amplify this point. As I was reading this, the LLMisms kept jumping out at me and each one felt like the author looking at me and deciding that my time spent reading this prose wasn't actually worth anything to them.

OP: I want YOUR thoughts, not the next token predictions of a gigantic pile of matrix multiplications. I want your awkward sentences, grammar mistakes, half-baked thoughts, self-doubt, silly jokes. I don't want this pile of grandiose mechanical slop completely devoid of humanity.

11 days ago by solomonb

Personally I don't want to read the codebase AND book of someone 3 weeks into a mania focused on a subject it is unclear they have any prior experience with. It's disrespectful for someone to think they can produce something worthy of consuming another human's time under those constraints.

9 days ago by dkersten

> You've gone too fast, too much is vague, nothing is clear.

Contrast to when Clojure was released: Rich Hickey had spent years thinking about, researching, and refining the concepts. It was easy to understand what the language is. And it shows in the design quality as even now, almost two decades later, the language has changed surprisingly little and is still really good.

11 days ago by nathan_compton

I have to second this. I find the AI written documentation extremely loathsome, hard to read, and somehow both pretentious and lazy.

Please, I beg everyone, stop posting AI slop.

11 days ago by TimorousBestie

Regrettably, the beatings are going to continue until morale improves.

11 days ago by jrmg

Just a comment: this sounds a lot like when someone I knew mildly succumbed to AI psychosis, and thought he, with Gemini, had made some physics/metaphysics breakthrough. If you’re losing sleep and feeling distressed or euphoric, maybe lay off for a few days, no matter how hard that is. Talk to friends and/or family about unimportant things. Get outside for a while. Go back to old hobbies (reading, hiking, just going to coffee shops or thrift stores - whatever) and then reassess.

This language looks interesting, but I don’t understand the concepts. Does this stuff make sense to other people?

> The heap is content-addressed over Λ₂₄: every value is mapped to a lattice point and canonicalized under the Conway group Co₀ (via libmmgroup), so the same content always lives at the same address.

What is ‘Λ₂₄’? What is a ‘lattice point’?

> giving up the GC stopped being a renunciation, since cells are immutable and content-addressed, so there is nothing to trace and nothing to move

This kind of sounds like you’re saying that there’s nothing to free, which implies that nothing takes up memory, which I presume is not the case. Do you mean everything is immutable and content-addressed (like Git)? Doesn’t stuff still need to be freed somehow when the program’s done with it? Otherwise memory will grow forever.

11 days ago by leecommamichael

Agreed. Everything is a weird mixture of poetry and mathematics jargon. Basically every page of the book contains some esotericism which makes empty claims. It's completely divorced from reality.

11 days ago by dirkt

> Does this stuff make sense to other people?

Nope, and I actually learned about applications of category theory to programming languages at university.

I tried to get an idea about the main points, and then stumbled over

> a thing is what you can observe of it.
>
> [...]
>
> Content addressing is extensionality made physical (chapter 11): two values indistinguishable by observation are not merely equal, they are the same slot

That only works in a category because you have enough functions (a countably or uncountably infinite number of them) that you can compose and "test", so you don't need (or don't care about) the "value" itself.

But on a real computer that doesn't work, because you can't go beyond a countable number, and even then you run into the halting problem pretty soon. So equality in this model is not computable. Which is sort of bad if you want to somehow store values "in the same slot" just based on observability. It might work for string literals, and even for concatenated strings, but not in general.

Picking some random lattice (a lattice is a partially ordered structure with some extra conditions) as a base of addressing doesn't help...

So yes, crackpot AI slop. The words sort of make sense, but there's nothing solid behind it, and as soon as you look at details it falls apart.

10 days ago by amenn

I am using flat memory comparisons (memcmp)

11 days ago by canyp

I didn't even get that far; I found the syntax annoying.

11 days ago by esafak

https://en.wikipedia.org/wiki/Leech_lattice
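For readers wondering what a "lattice point" of Λ₂₄ is: in the usual coordinates (scaled up by √8), points of the Leech lattice are integer vectors in ℤ²⁴ whose coordinates satisfy congruence conditions built from the binary Golay code. The 196,560 figure quoted elsewhere in the thread as the slot count per heap is the number of shortest nonzero vectors (the lattice's kissing number), and it breaks down by coordinate shape as follows. This is standard lattice arithmetic, not a description of how Yon stores values.

```latex
% Minimal vectors of \sqrt{8}\,\Lambda_{24} (squared length 32, i.e. norm 4 in \Lambda_{24})
\begin{aligned}
(\pm 4, \pm 4, 0^{22}) &:\ \binom{24}{2} \cdot 2^{2} = 276 \cdot 4 = 1104 \\
(\pm 2^{8}, 0^{16})    &:\ 759 \cdot 2^{7} = 97152
  && \text{(support is an octad; even number of minus signs)} \\
(\mp 3, \pm 1^{23})    &:\ 24 \cdot 2^{12} = 98304
  && \text{(signs fixed by a Golay codeword)} \\
\text{total}           &:\ 1104 + 97152 + 98304 = 196560
\end{aligned}
```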

11 days ago by jrmg

Maybe I just don’t have the mathematics knowledge to understand it, but that doesn’t really tell me how you could represent one in memory, or use one as a backing store for a hash-addressed data structure.

11 days ago by danieltanfh95

There is nothing physics/metaphysics about this. If you don’t understand the terms, don’t pretend you do and write slop as a comment, it is really not that different from using LLM to generate slop.

11 days ago by itishappy

The parent comment is not suggesting that Yon is about physics/metaphysics.

Understanding is important for readers. Demonstrating understanding is important for writers of both technical documentation and internet comments, and of critical importance in the era of AI.

11 days ago by danieltanfh95

Understanding goes both ways. OP was just sharing something they thought was interesting. The Ted Chiang piece was horribly written logically and yet it was "written well" in prose. We should look past the writing and learn (if any) the interesting parts.

11 days ago by ModernMech

"If you don’t understand the terms, don’t pretend you do"

The comment you're replying to explicitly says "This language looks interesting, but I don’t understand the concepts." so I'm not sure what you're trying to say. Their note about physics/metaphysics was about "someone [they] knew", not TFA.

11 days ago by danieltanfh95

Then why even insinuate that these are similar? It's just using it to heavily suggest it is crankery.

11 days ago by nvme0n1p1

What if it's pure nonsense, therefore impossible for anyone to understand. Does that mean all criticism is "slop" and nobody's allowed to comment on it?

11 days ago by danieltanfh95

This is multiple logical fallacies in one comment and definitely a comment I would mark in the "pure nonsense" bin. Not all criticism is slop, but anything ad hominem (personal attacks), argumentum ad populum (appeal to popularity), or argumentum ad verecundiam (appeal to authority) is not useful.

11 days ago by cjs_ac

The documentation is a work of art. Every time I try to work out what just one of the unexplained ideas is, it just introduces new unexplained ideas. I don't know where these ideas came from, how they fit together, or why putting them together is useful. I certainly don't know why I would want to write a program in this language, as opposed to any other language I already know.

11 days ago by skrebbel

Sibling comment suggests maybe it’s AI psychosis and that would clarify a lot.

11 days ago by ModernMech

Reminds me a lot of Urbit docs in that sense.

11 days ago by Chinjut

I have a PhD in category theory and know what the Leech lattice is and I still don't understand what is going on here. What is the value of using the Leech lattice to store memory?

10 days ago by amenn

[flagged]

11 days ago by esafak

Could you name a few languages you had in mind while developing this, their respective problems, and how your language improves them, feature by feature?

> Yon allocates into xleech2, a content-addressed heap whose geometry is the Leech lattice Λ24: exactly 196,560 slots per heap.

What is the computational complexity of memory allocation into this Leech lattice? What applications did you have in mind where making allocation a maths problem in order to save time on comparisons makes sense? What is going to happen when a program exhausts your little heap?
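The thread does not say what Yon actually does on exhaustion. The OP's post does say hard limits are meant to fail loudly, so here is a hypothetical Python sketch of that policy for a content-addressed pool, assuming exhaustion raises an error rather than evicting anything. `FixedPool` and `HeapExhausted` are illustrative names, and the dict index stands in for whatever Yon uses; allocation here is an expected O(1) dict lookup, which says nothing about the cost of Co₀ canonicalization.

```python
class HeapExhausted(RuntimeError):
    """Raised when a fixed-capacity pool has no free slot left."""


class FixedPool:
    """Content-addressed pool with a hard slot limit that fails loudly."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._index: dict[bytes, int] = {}   # content -> slot index

    def intern(self, content: bytes) -> int:
        handle = self._index.get(content)
        if handle is not None:
            return handle                    # duplicates never consume a slot
        if len(self._index) >= self.capacity:
            raise HeapExhausted(
                f"pool full: {self.capacity} distinct values already stored"
            )
        handle = len(self._index)
        self._index[content] = handle
        return handle


pool = FixedPool(capacity=3)
for word in [b"a", b"b", b"a", b"c"]:
    pool.intern(word)                        # b"a" twice still uses one slot
try:
    pool.intern(b"d")
except HeapExhausted as err:
    print(err)  # pool full: 3 distinct values already stored
```

With the 196,560 slots per heap quoted above, the equivalent would be `FixedPool(196_560)`. Because of deduplication, the limit counts distinct values, not allocations.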

11 days ago by jrmg

I have trouble with the idea that these lattice structures could be less computationally complex or less likely to collide than a good simple hash table. I guess they could be more guaranteed to have stable access times?

The more I try to understand, the more it appears that they are a hash table (a hash-addressed structure, to be pedantic), but with way more complicated backing than a hash table.

10 days ago by amenn

[dead]

11 days ago by leecommamichael

I found this page, https://yon-lang.org/book/coming-from

It's such a weird mixture of poetry and math that it's hard to tell what's going on. I suspect the author does not speak English as a first language (or at all?) and has used an LLM to generate this stuff.

11 days ago by iterateoften

I've noticed that since 5.5, GPT has been adding "lattice" to a lot of things. Not sure if it is the new Gremlins.

11 days ago by canyp

> Content addressing is extensionality made physical (chapter 11)

Actually, that's in chapter 12; 11 is the standard library. Maybe the LLM got confused because the chapters are 0-indexed.

I was curious about that topic but it seems over my head. I don't think it works outside of mathematics? In programming, one can have two objects that are identical in both structure and value but have different identities. It's why lisp has eq, eql, equal, etc. How'd you get around that other than adding an identity property?

Also:

> A handle, what your variables actually hold for strings, sections, lists, trees, is that slot index, carried as an f64

Why does the handle need floating point?

11 days ago by TimorousBestie

> Why does the handle need floating point?

I don’t know if Yon does this (the documentation is gibberish) but it’s possible to use f64 NaNs to hold convenient metadata. I had a professor who wrote a bespoke teaching language (roughly based on Scheme) that did that.
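The trick described here is usually called NaN-boxing. Below is a minimal Python sketch of the general technique, not Yon's documented layout: the 48-bit payload, the single tag bit, and the names `box`/`unbox` are all assumptions for illustration. The tag bit keeps ordinary NaNs produced by arithmetic from being mistaken for handles.

```python
import math
import struct

# IEEE 754 binary64: exponent all ones plus the top mantissa bit = quiet NaN.
QNAN = 0x7FF8_0000_0000_0000
TAG = 0x0001_0000_0000_0000            # assumed tag bit marking "this NaN is a handle"
PAYLOAD_MASK = 0x0000_FFFF_FFFF_FFFF   # assumed: low 48 bits hold the slot index


def box(index: int) -> float:
    """Pack a slot index into the payload of a tagged quiet NaN."""
    if not 0 <= index <= PAYLOAD_MASK:
        raise ValueError("index does not fit in 48 bits")
    bits = QNAN | TAG | index
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def unbox(value: float) -> int:
    """Recover the slot index, rejecting floats that are not tagged handles."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    if (bits & (QNAN | TAG)) != (QNAN | TAG):
        raise ValueError("not a boxed handle")
    return bits & PAYLOAD_MASK


h = box(196_559)
print(math.isnan(h), unbox(h))  # True 196559
print(h == h)                   # False: NaN never compares equal as a float
```

The last line bears on the uint64 question below: with NaN-boxed handles, equality has to compare the raw 64-bit patterns, because float `==` on two NaNs is always false. What the float representation buys is that arithmetic on a handle yields a NaN instead of a plausible-looking address.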

11 days ago by ModernMech

Here's an implementation of such: https://docs.rs/nanval/latest/nanval/

11 days ago by canyp

I still don't get what the advantage is over an unsigned integer. Yes, fp64 has unused bits. But why are you going to involve the FPU at all when a uint64 does the trick as well? Plus with a uint64 you get all the flexibility of what bits to dedicate to the address vs metadata.

Edit: I guess one advantage is that, if we later treat the handle like a pointer, NaN math gets you NaN again, whereas the uint64 math might get you an invalid address, or you'd need extra logic to check that the uint64 is not a valid handle?

10 days ago by amenn

[dead]

11 days ago by KnuthIsGod

I know category theory and the Leech lattice.

I assure you, brethren, that this project is unmitigated AI derived fresh organic faeculent material.

A pile of steaming sh*t like our esteemed elders used to say.
