There's been a rush of releases of reasoning models in the past couple of weeks. This one looks interesting, too.
I found the following video from Sam Witteveen to be a useful introduction to a few of those models:
In what way did they "release" this? I can't find it on Hugging Face or Ollama, and they only seem to have a "try online" link in the article. "Self-sovereign intelligence", indeed.
They released it in the same sense OpenAI released GPT-4: there is an online demo you can chat with, and a form to contact sales for API access.
they didn't
"what is the population of manhattan below central park"
ChatGPT-o1-preview: 647,000 (based on 2023 data, breaking it down by community board area): https://chatgpt.com/share/674b3f5b-29c4-8007-b1b6-5e0a4aeaf0... (this appears to be the most correct, judging from census data)
DeepThought-8B: 200,000 (based on 2020 census data)
Claude: 300-350,000
Gemini: 2.7M during peak times (a strange definition of population!)
I followed up with DeepThought-8B: "what is the population of all of manhattan, and how does that square with only having 200,000 below CP" and it cut off its answer, but in the reasoning box it updated its guess to 400,000 by estimating as a fraction of land area.
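For what it's worth, that land-area method is easy to reproduce by hand. A rough back-of-envelope sketch (the population figure is the approximate 2020 census count; the area fraction is my own rough guess, not the model's):

    # Back-of-envelope version of the land-area estimate.
    # Both figures are approximations I'm supplying, not the model's.
    manhattan_pop_2020 = 1_694_000   # 2020 census, approx.
    fraction_below_cp = 0.25         # rough guess: share of land area below 59th St
    print(round(manhattan_pop_2020 * fraction_below_cp))  # ~424,000

which lands in the same ballpark as the model's revised 400,000 guess.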
Legally, you cannot name Llama 3-based models like that. You have to include "Llama" in the name:
https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blo...
Too bad :)
Facebook trained the model on an Internet's worth of copyrighted material without any regard for licenses whatsoever - even if model weights are copyrightable, which is an open question, you're doing the exact same thing they did. Probably not a bulletproof legal defense though.
Can't wait until Meta sue them so we can have a judgment on whether or not models weights are subject to copyright.
Model weights are (abstractly speaking) a very intensive, concentrated form of website scraping, yes?
What does US law say about scraping? Does "fair use" play a role?
Yes, and there have already been court cases ruling that AI training on copyrighted data is fair use, because it's technically no different from any other form of art: everything is built on ideas seen elsewhere, and there are no new ideas anymore.
Am I wrong to think that "reasoning model" is a misleading marketing term?
Isn't it an LLM with an algo wrapper?
Whether you bake the behaviour in or wrap it in an external loop, you need to train/tune for the expected behaviour. Generic models can do chain of thought if asked, but will be worse than a specialised one.
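To illustrate the "if asked" part, here's a minimal sketch of eliciting chain of thought from a generic model purely by prompting (the client library and model name are my assumptions, not anything from the article):

    # Minimal sketch: chain of thought via prompting alone.
    # Assumes the OpenAI Python client; any instruct model behaves similarly.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "A train leaves at 3:40pm and arrives at 6:15pm. "
                       "How long is the trip? Think step by step, "
                       "then state the final answer.",
        }],
    )
    print(resp.choices[0].message.content)

The specialised "reasoning" models are tuned so they produce those intermediate steps without being asked.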
They're not baking anything in. Reasoning, as it is defined by AI marketing departments, is just beam search.
Could you educate me on what beam search is? Or link a good resource?
EDIT: https://www.width.ai/post/what-is-beam-search
So the wider the beam, the better the outcome?
Yep, no reasoning, just a marketing term for "more accurate probabilities".
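To make that concrete, here's a toy beam search sketch (the next-token distribution is a stand-in for a real model's softmax output):

    # Toy beam search over next-token log-probabilities.
    import heapq, math

    def next_token_probs(seq):
        # stand-in for an LLM's softmax output over the vocabulary
        return {"a": 0.5, "b": 0.3, "<eos>": 0.2}

    def beam_search(beam_width=3, max_len=5):
        beams = [(0.0, [])]  # (cumulative log-prob, token sequence)
        for _ in range(max_len):
            candidates = []
            for logp, seq in beams:
                if seq and seq[-1] == "<eos>":
                    candidates.append((logp, seq))  # finished; carry over
                    continue
                for tok, p in next_token_probs(seq).items():
                    candidates.append((logp + math.log(p), seq + [tok]))
            # a wider beam keeps more candidate sequences alive each step
            beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        return beams

    for logp, seq in beam_search():
        print(f"{math.exp(logp):.3f}  {' '.join(seq)}")

A wider beam explores more alternatives per step before committing, which is what "more accurate probabilities" cashes out to in practice.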
AI marketing departments are fond of anthropomorphic language but it's actually just regular beam search.
The same way they now call a completely closed-source binary blob full of copyright infringement "open-source".
"reasoning model" means nothing so I don't think it's misleading.
Reasoning means "inference" or "deduction" to me, or at least some process related to first order logic.
The known upper bound for transformers' on-the-fly computation abilities is a complexity class called DLOGTIME-uniform TC^0.
There is a lot to unpack there, but if you take FO as being closed under conjunction (∧), negation (¬) and universal quantification (∀), you will find that DLOGTIME-uniform TC^0 is equal to FO + Majority gates.
So be careful about that distinction.
To help break the above down:
DLOGTIME = constructible by a RAM or TM in logarithmic time
uniform = a single description generates the circuit for every input size (arbitrary circuit families are the default convention)
TC^0 = constant-depth threshold circuits
Even NP == SO-E, the second-order queries where the second-order quantifiers are only existentials.
DLOGTIME-uniform TC^0 is a WAY smaller class than most people realize, but anything that is an algorithm or a program basically is logic, with P being FO + least fixed point (FO + transitive closure only gets you NL) or half a dozen other known mappings.
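Stated compactly, these are the standard descriptive-complexity correspondences being invoked (textbook results, summarized here for reference):

    \begin{aligned}
    \text{DLOGTIME-uniform } TC^0 &= FO[M]          && \text{(FO plus majority quantifiers)} \\
    NP &= \exists SO                                && \text{(Fagin's theorem)} \\
    P  &= FO[\text{LFP}]                            && \text{(least fixed point, on ordered structures)} \\
    NL &= FO[\text{TC}]                             && \text{(transitive closure)}
    \end{aligned}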
Transformers can figure out syntax, but if you dig into that DLOGTIME part, you will see that semantic correctness isn't really an option... thus the need to leverage the pattern matching and finding of pre-training as much as possible.
I asked it "Describe how a device for transportation of living beings would be able to fly while looking like a sphere" and it just never returned an output
I asked it to just count letters in a long word and it never returned an output (been waiting for 30 minutes now)
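(For contrast, the task itself is a one-liner outside the model; the word here is my own example, since the comment doesn't say which one was used:)

    # Counting letters is trivial in code.
    word = "pneumonoultramicroscopicsilicovolcanoconiosis"
    print({c: word.count(c) for c in sorted(set(word))})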
It isn't pleased you ask it such questions.
Blaine is a pain
Given the name they gave it, someone with access should ask it for the "Answer to the Ultimate Question of Life, the Universe, and Everything".
If the answer is anything other than a simple "42", I will be thoroughly disappointed. (The answer has to be just "42", not a bunch of text about the Hitchhiker's Guide to the Galaxy and all that.)
Deep Thought didn't answer right away either.
"Right away". lol.
The reasoning steps look reasonable and the interface is simple and beautiful, though DeepThought-8B fails to disambiguate "the ruliad" (the technical concept from Wolfram physics) from the company's name, Ruliad. Maybe that isn't in the training data, because it misunderstood the problem when asked "what is the simplest rule of the ruliad?" and went on to reason about the company's core principles. Cool release; waiting for the next update.
xD, gotta love how your first question to test a model is about a "ruliad". It's not even in my iOS dictionary.