Rio de Janeiro's "homegrown" LLM appears to be a merge of an existing model

14 hours ago/178 comments/github.com

I have no affiliation with them but here's what I think happened:

1. They claim the official model is based on Qwen 397B. It's likely they didn't disclose Nex Pro at all because Nex itself is based on the same base model (not saying they shouldn't).

2. The improvement would come from merging the weights PLUS on-policy distillation. The confusion is that the uploaded model didn't have the distillation at all.

3. It's important to notice they didn't advertise the model besides posting it on Reddit 2 days ago. It went viral organically, over the weekend, and during Brazil's World Cup debut (Brazilians will understand). Of course the mayor of Rio took the opportunity to capitalize on the free coverage, but that wasn't done in conjunction with the researchers.

4. I don't see why they would disclose Qwen 397B as base and mention the SwiReasoning paper but not mention Nex if all they did was to merge both models.

5. In any case, what they are claiming is easily verifiable once (if) they upload the right model.

7 hours ago by throwa356262

Regarding #2

https://news.ycombinator.com/item?id=48529544

6 hours ago by matheusmoreira

I'm honestly impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is probably the last headline I ever expected to read on HN.

3 hours ago by airstrike

Worth reminding everyone that Lua was also created in Rio, though admittedly at PUC rather than by the government.

Rio has a strong engineering talent pool, along with many other major capitals in Brazil

3 hours ago by matheusmoreira

Brazil does have talent. Mauro Carvalho Chehab is a Linux kernel maintainer. Elixir was created by José Valim, a Brazilian. I have also created my own programming language.

What Brazil doesn't have is a history of properly rewarding talent, which often causes it to migrate elsewhere. So it's definitely surprising when any sort of technological development happens in Brazil: it implies someone who stayed managed to get something done, most likely for much less than what that something is actually worth, while also being crushed by extremely high taxes that essentially double the cost of computer hardware.

5 hours ago by cscheid

Yes! That "prefeitura do Rio" huggingface URL is definitely shocking to read to this Brazilian as well (I'm assuming you and parent also are from your usernames).

3 hours ago by Aurornis

> 2. The improvement would come from merging the weights PLUS on-policy distillation. The confusion is that the uploaded model didn't have the distillation at all.

They merged the base model with another lab’s fine tuned model. The improvements could have come from getting some of the fine tuned weights from the other model.

If they really had a better performing model that they “accidentally” forgot to upload, they could have uploaded the correct file by now.

2 hours ago by smus

What do you mean, World Cup debut? Haven't they won 5?

an hour ago by alxndresp

They meant their first, opening game of this current World Cup tournament

9 hours ago by hintymad

> Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen — across all 60 layers and every component of the network. Other finetunes cannot be explained as interpolations.

I find it amazing how robust the current deep learning models are. A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
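To make the "simple linear combination" concrete, here is a minimal sketch of a 0.6/0.4 blend applied tensor by tensor. It is an illustration under assumptions, not the code anyone in this story actually ran: the function name `merge_linear`, the toy tensor names, and the values are all made up, and real checkpoints would hold large tensors rather than short Python lists.

```python
def merge_linear(a, b, weight_a=0.6):
    """Blend two checkpoints tensor by tensor: weight_a * a + (1 - weight_a) * b.

    Each checkpoint is modeled as a dict mapping a tensor name to a flat
    list of floats (a stand-in for a real weight tensor).
    """
    if a.keys() != b.keys():
        raise ValueError("checkpoints must have identical tensor names")
    merged = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch in {name}")
        merged[name] = [
            weight_a * x + (1.0 - weight_a) * y
            for x, y in zip(a[name], b[name])
        ]
    return merged


# Toy stand-ins for the fine-tune and the base model (hypothetical values).
finetune = {"layer0.w": [1.0, 2.0], "layer0.b": [0.5]}
base = {"layer0.w": [0.0, 1.0], "layer0.b": [1.5]}
blend = merge_linear(finetune, base, weight_a=0.6)
```

Nothing here involves gradients or training data, which is why a merge like this is cheap compared with post-training.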

8 hours ago by Aurornis

> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.

Enhanced it on a couple benchmarks, supposedly.

The game is to turn knobs until you get a benchmark run that shows an improvement, then ship it. There are a lot of fine tunes and chimera models on HuggingFace that are supposedly better at some specific test, but when you use them for anything else they're usually worse.

This happens with a lot of the models that are modified to remove censorship. They succeed in getting the model to emit previously censored outputs, but the overall output quality decreases.

8 hours ago by andai

They seem to have deleted most of the README now, but the archived version has benchmarks.

https://web.archive.org/web/20260614082641/https://huggingfa...

And the Nex benchmarks for comparison

https://huggingface.co/nex-agi/Nex-N2-Pro

Rio seems to be about halfway between Qwen 3.5 and Nex, as you'd expect?

5 hours ago by monster_truck

I don't think your last point is correct. Ablation, when done correctly, seems to increase the quality and typically the performance too.

15 minutes ago by tredre3

That is something often claimed by heretics. My experience couldn't diverge more, however. All heretic (and abliterix) models I've tried are worse than the original. It's not immediately obvious if all you do is ask 2-3 questions and marvel at how it didn't refuse, but try using them for real over longer 8k+ contexts and it falls apart real fast.

They're more prone to getting stuck in loops, becoming unresponsive, and hallucinating more (presumably because of the reduced desire to not answer).

I've tried all the popular heretic peddlers, but if you have one that you can vouch for maybe I've simply missed it.

4 hours ago by Aurornis

Abliteration is a brute force technique that removes or silences parts of the model. It reduces performance because the abliterated elements aren’t perfectly isolated to censorship so other aspects suffer.
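For readers unfamiliar with the mechanics, a rough sketch follows. It assumes the commonly described form of the technique, in which a single "refusal direction" is projected out of a weight matrix's outputs; the comment above does not spell out a mechanism, so treat this as an assumption. The function name `ablate_direction` and the toy matrix are hypothetical.

```python
import math


def ablate_direction(weight, direction):
    """Project `direction` out of the outputs of `weight`.

    weight: list of rows (rows x cols), used as output = weight @ x.
    direction: list of length rows; it is normalized to a unit vector r.
    Returns W' = W - r (r^T W), so r . (W' @ x) is 0 for every x.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    r = [d / norm for d in direction]
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction.
    proj = [sum(r[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [
        [weight[i][j] - r[i] * proj[j] for j in range(cols)]
        for i in range(rows)
    ]


# Toy example: remove everything the matrix writes along the first axis.
toy_weight = [[1.0, 2.0], [3.0, 4.0]]
ablated = ablate_direction(toy_weight, [1.0, 0.0])
```

In the toy example the whole first row is zeroed, including anything unrelated that happened to be written along that axis, which is the "not perfectly isolated" problem in miniature.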

Many of the “uncensored” model providers also do some fine tuning on the models. Some of them target better benchmarks or other measures, but outside of the benchmarks and metrics they’re fine tuned for they are generally noticeably worse than the original model.

5 hours ago by manquer

> game is to turn knobs until you get a benchmark run that shows an improvement, then ship it

I.e., reinforcement learning against a weak reward function: the benchmark is insufficiently complex and not sufficiently representative of the real world.

The "game", i.e. the decision tree, can be modeled as a multi-armed bandit problem: deploying finite resources (compute) toward exploitation or exploration.

The main issue is that each training or fine-tuning run is very expensive, so the number of pulls at the slot machine, so to speak, is pretty limited today.

8 hours ago by x312

This works because Nex itself is a finetune of Qwen3.5 (https://huggingface.co/nex-agi/Nex-N2-Pro). It's merging Qwen3.5 with a Qwen3.5 finetune.

I don't believe this would work on two LLMs that have different pretraining. Even if it did, you would need two LLMs that have the exact same internal activation shapes, dimensions, expert counts, and token vocabulary. Realistically, that would never happen outside of finetunes or academic experiments.
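The shape requirement can be sketched as a pre-merge check. This is only an illustration: the function name `merge_blockers`, the tensor names, and the shapes are invented, and matching shapes are necessary for an elementwise merge but say nothing about whether the result would be any good.

```python
def merge_blockers(shapes_a, shapes_b):
    """List the reasons two checkpoints cannot be merged elementwise.

    Each argument maps a tensor name to its shape tuple. An empty result
    means every tensor lines up by name and shape.
    """
    problems = []
    for name in sorted(shapes_a.keys() | shapes_b.keys()):
        if name not in shapes_a or name not in shapes_b:
            problems.append(f"{name}: present in only one checkpoint")
        elif shapes_a[name] != shapes_b[name]:
            problems.append(f"{name}: {shapes_a[name]} vs {shapes_b[name]}")
    return problems


# Hypothetical shapes: a base model, a fine-tune of it, and an unrelated model.
base_shapes = {"embed": (1000, 64), "layer0.w": (64, 64)}
finetune_shapes = {"embed": (1000, 64), "layer0.w": (64, 64)}
other_shapes = {"embed": (32000, 128), "block0.w": (128, 128)}

same_family = merge_blockers(base_shapes, finetune_shapes)
different_family = merge_blockers(base_shapes, other_shapes)
```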

7 hours ago by hashmap

not this exact thing, no, because the functional circuits dont appear in the same places across models. but if you find where they are you can do something like branch between some of the middle functional circuits between models and it kinda just works, or even do one after the other. you cant just like swap any two layers cause a bunch of em bend hyperbolic curvature to do hierarchical stuff deep in the poincare ball and the geometries get all bonkers, but before and after they do that things are relatively flat, and the geometries are more or less transferrable up to rigid rotation if they're each trained on large enough data.

7 hours ago by oofbey

Correct. We used to think that because NN optimization is non-convex there are all these local minima. Now we know that once you get past the very early parts of training from random init, the loss surface is fairly smooth, and not really convex, but close enough in a bunch of ways - linear combinations of trained models are pretty much always valid combinations. You can think of fine tunings as deltas on the original model which can be summed together successfully. I think this paper first showed that to me: https://arxiv.org/pdf/1802.10026 which was 8 years ago now.
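The "deltas that can be summed" view can be sketched in a few lines. The names `task_vector` and `apply_deltas` and the toy numbers are assumptions for illustration; whether a sum of deltas actually behaves well depends on the models involved.

```python
def task_vector(finetuned, base):
    """Delta a fine-tune adds on top of its base model (elementwise)."""
    return [f - b for f, b in zip(finetuned, base)]


def apply_deltas(base, deltas, scale=1.0):
    """Add each delta, times `scale`, onto a copy of the base weights."""
    out = list(base)
    for delta in deltas:
        out = [o + scale * d for o, d in zip(out, delta)]
    return out


# Toy flattened weights: one base model and two fine-tunes of it.
base_w = [1.0, 1.0]
tuned_a = [1.5, 1.0]
tuned_b = [1.0, 0.0]

combined = apply_deltas(
    base_w, [task_vector(tuned_a, base_w), task_vector(tuned_b, base_w)]
)
```

With a single delta and `scale=0.6`, this reduces to the 0.6/0.4 interpolation between a fine-tune and its base discussed elsewhere in the thread.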

9 hours ago by woadwarrior01

It's a well-known idea[1], although it's still surprising that something so simple even works.

[1]: https://arxiv.org/abs/2203.05482

8 hours ago by kolanos

This team could have stopped here and still had something interesting (albeit not novel) to show. But the hype cycle was too tempting.

5 hours ago by tarruda

What I find fascinating is the idea that there might be a set of "secret" tweaks that when applied to those weights (or even smaller models) could result in an intelligence simulation that could vastly surpass even something like Fable.

11 hours ago by unrvl22

The municipality of Rio de Janeiro (via its IT company IplanRIO) released Rio-3.5-Open-397B, presented as a homegrown Qwen3.5 fine-tune that beats comparable open models on benchmarks. The linked issue argues it's actually a weighted merge of ~60% Nex-N2 Pro + ~40% Qwen3.5-397B-A17B - Nex-N2 having been released about a week earlier.

9 hours ago by DonsDiscountGas

I didn't know model merging like that was possible. (Obviously possible from a pure software standpoint but I'm surprised it's effective)

8 hours ago by bwhitty

As another poster above linked, it’s been shown to be effective since 2022: https://arxiv.org/abs/2203.05482

6 hours ago by nightpool

it works because Nex N2 is also a derivative of the original base Qwen model. If it was two completely unrelated models it wouldn't work.

10 hours ago by Lucasoato

So the problem isn’t the missing attribution to Qwen, but the fact that they didn’t mention Nex-N2 Pro, right?

9 hours ago by Aurornis

The problem is that they claimed to have made a big achievement with their home grown post training, and they expected to receive a lot of praise for it.

Then researchers looked at the weights and there is no post training at all.

They are now attributing both models they merged, but their excuse for the lack of post training is to claim they accidentally uploaded the wrong files.

8 hours ago by serial_dev

I’d believe they accidentally uploaded the wrong files if they uploaded the correct ones. To state that they accidentally uploaded something else and then not upload the correct version means they probably do not have anything and either hope people forget about this or they are scrambling to have something that is at least close to their original claim.

10 hours ago by clear-octopus

[dead]

10 hours ago by zinodaur

Oh no, someone is profiting off of their work without proper attribution!?!?

9 hours ago by Aurornis

This is an open weights model based on other open weights models.

The dispute is that they released it with claims about having done some post training that improved the outputs. It was discovered that the model was not post trained like they claimed.

The HF page now says it’s a merge of models, which wasn’t there before. They’re trying to claim they accidentally uploaded the wrong model to HF and that they’ll upload the real one soon.

Basically, they thought they could splice two open weights models together and claim their team had accomplished some amazing post training, but they weren’t smart enough to realize that other researchers would discover that there wasn’t any post training.

9 hours ago by moritzwarhier

Thanks for the factual clarification. This is so important when everyone already has their trigger finger on politics. Not meaning that politics are irrelevant here, see sister comment by jobim.

But it's impossible to form a nuanced opinion when political association has a higher priority than the facts; which, again, don't look flattering for the implementers.

9 hours ago by iknowstuff

How do they just splice two models together?

9 hours ago by Aurornis

The Nex N2 model they merged is based on Qwen 3.5, so you can swap pieces of one into the other. They found a combination of the two that did well on some benchmarks and shipped it.

In the early days of Llama there were a lot of experiments like this. There were even some interesting combinations of models where they stacked layers of different models together or even added more layers with interesting results.

But announcing that you spliced two models together isn't very impressive in 2026, so they announced that they had done their own post training and outdid the big labs. They thought nobody would look close enough to notice.

9 hours ago by ninja3925

Out of curiosity, how was it discovered? You would have to look for it to find this linear combination.
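One plausible way to test for it, sketched here under assumptions and not necessarily what the authors of the linked issue did: for each tensor, fit the single coefficient alpha that best explains the candidate as base + alpha * (finetune - base), then look at what is left over. If alpha comes out the same for every tensor and the residual is near zero, the candidate is an interpolation. The name `fit_blend` and the toy values are hypothetical.

```python
import math


def fit_blend(candidate, finetune, base):
    """Least-squares alpha with candidate ~ base + alpha * (finetune - base).

    All three arguments are flat lists of floats for the same tensor.
    Returns (alpha, residual_norm); a residual near zero means the
    candidate lies on the line between base and finetune.
    """
    d_ft = [f - b for f, b in zip(finetune, base)]
    d_c = [c - b for c, b in zip(candidate, base)]
    denom = sum(x * x for x in d_ft)
    if denom == 0.0:
        raise ValueError("finetune equals base; alpha is undefined")
    alpha = sum(x * y for x, y in zip(d_ft, d_c)) / denom
    residual = math.sqrt(sum((y - alpha * x) ** 2 for x, y in zip(d_ft, d_c)))
    return alpha, residual


# Toy tensor: a true 0.6/0.4 blend versus an unrelated set of weights.
base_t = [0.0, 1.0, -2.0]
finetune_t = [1.0, 3.0, 0.0]
blended_t = [0.6 * f + 0.4 * b for f, b in zip(finetune_t, base_t)]
unrelated_t = [5.0, -1.0, 0.5]

alpha_blend, res_blend = fit_blend(blended_t, finetune_t, base_t)
alpha_other, res_other = fit_blend(unrelated_t, finetune_t, base_t)
```

You still have to guess which fine-tune to test against, which is presumably where a recently released model from the same family becomes the obvious suspect.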

an hour ago by s1artibartfast

How do you feel about the government or government contractors saying they did a bunch of work when they did nothing instead?

10 hours ago by internet2000

Attribution isn't the relevant part. Lying about your lab's capabilities is.

10 hours ago by Planktonne

That's also something all the AI companies have been doing.

10 hours ago by dofm

Lying about model capability is right now the lingua franca of the cloud AI business model, almost; they yes-and each other's lies because they are in a position of needing to generate interest, including going as far as needing to trigger regulatory capture.

(It's not news to anyone who has worked in sales-led businesses that salespeople are prone to believing the claims of other salespeople, I guess).

8 hours ago by low_tech_love

They’re using public money to “train” this.

8 hours ago by vips7L

Sounds like the whole AI movement.

8 hours ago by themafia

It seems to me like the lies are both for the same reason. To capture attention and profits that are not deserved.

10 hours ago by outside2344

But the whole game is lying and stealing isn't it?

9 hours ago by jordz

Can someone please explain or link to some information about how models are merged? Is this genuinely merging weights mathematically or some kind of distillation (presumably not if they’ve done zero training as the post suggests).

9 hours ago by calebkaiser

This is a good starting point: https://huggingface.co/docs/peft/developer_guides/model_merg...

But yes, in general, merging refers to techniques that directly blend the weights of different models mathematically. It had a big moment of popularity ~2 years ago, with many so-called "Frankenmodels" popping up on leaderboards.

I tend to think of merging as belonging to the same general umbrella as things like "abliteration", or other techniques that surgically modify the weights of a model without a traditional training/tuning loop. Maxime Labonne is a great person to follow if you're interested in this general area.

10 hours ago by fkozlowski

I'm honestly surprised that they even had the inclination to attempt creating a model. I guess it's bullish that a municipal IT department had the guts to try this?

9 hours ago by Havoc

Merges and fine tunes are within reach of individuals with some money to burn so I’m sure a muni can do it

8 hours ago by axus

I like the [dead] comment theory that they proposed a huge LLM training budget to the government, kept most of the money, and released a cheap merge to justify the grift.

5 hours ago by dormento

This would be so very Brazilian of them.

Source: am Huelander.

6 hours ago by seba_dos1

It's kinda weird to claim extraordinary results in such a case though, as that brings a lot of eyes to it.

5 hours ago by mgambati

Nothing weird. The mayor wanted something to brag about. That's Rio, my friend.

3 hours ago by fkozlowski

Ah that makes sense

6 hours ago by matheusmoreira

That's essentially Brazil's standard operating procedure. Wouldn't be surprising if that turned out to be the case.

Still, I'm actually impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is the last headline I expected to read on HN.

4 hours ago by aaronbrethorst

They really missed out by not calling it Neuromancer.
