I'm actually really excited for this!
I noticed recently there weren't any good open source hardware projects for voice assistants with a focus on privacy. There's another project I've been thinking about where I think the privacy aspect is important, and figuring out a good hardware stack has been a process. The project I want to work on isn't exactly a voice assistant, but it has the same ultimate hardware requirements.
Something I'm kinda curious about: it sounds like they're planning on a sorta batch-manufacturing-by-resellers type of model, which I guess is pretty standard for hardware sales. But why not do a sorta "group buy" approach? I guess there's nothing stopping it from happening in conjunction.
I've had an idea floating around for a site that enables group buys for open source hardware (or 3d printed items), that also acts like or integrates with github wrt forking/remixing
I'm also very excited. I've had some ESP32 microphones before, but they were not really able to understand the wake word, sometimes failing even when it was quiet and you were sitting next to the mic.
This one looks like it can recognize your voice very well, even when music is playing.
Because... when it works, it's amazing. You get that Star Trek wake word (KHUM-PUTER!), you can connect your favorite LLM to it (ChatGPT, Claude Sonnet, Ollama), you can control your home automation with it and it's as private as you want.
I ordered two of these, if they are great, I will order two more. I've been waiting for this product for years, it's hopefully finally here.
As a side note, it always slightly puzzles me when I see "voice interface" and "private" used together. Maybe it takes living alone to issue voice commands and feel some privacy.
(Yes, I do understand that "privacy" here is mostly about not sending it for processing to third parties.)
Private meaning that a big American corporation is not listening and using my voice to either track me or teach their own AI service with it.
> Yes, I do understand that "privacy" here is mostly about not sending it for processing to third parties.
Then why does it puzzle you?
> Maybe it takes living alone to issue voice commands and feel some privacy
Perhaps your definition of "private" is more stringent than most people's. Collective privacy exists, for example "The family would appreciate some privacy as they grieve". It is correct to term something "private" when it is shared with your entire household, but no one else.
I don't like these interfaces because unless they are button activated or something, they must be always listening and sending sound from where you are to a 3rd party server. No thanks. Of course this could be happening with my phone, but at least it would take a malicious action to record me 24/7.
I'm trying to understand. Is there an SDK I can use to enhance this? Or is this a package product?
I'm really hoping it's the former. But I don't see any information about how to develop with this.
Yep, ESPHome SDK. It's all open source and well-documented:
Some notable blog posts, docs and a video on the wake words and voice assistant usage:
https://community.home-assistant.io/t/on-device-wake-word-on...
https://esphome.io/components/voice_assistant.html
https://www.home-assistant.io/voice_control/create_wake_word...
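To give a sense of what the ESPHome side looks like, here's a minimal sketch of a DIY voice satellite config. The board choice, pin assignments, and I2S wiring below are assumptions for illustration only; check the voice_assistant docs above for your actual hardware:

```yaml
esphome:
  name: voice-satellite

esp32:
  board: esp32-s3-devkitc-1  # assumed board
  framework:
    type: esp-idf

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

api:  # lets Home Assistant discover the device and stream audio

i2s_audio:
  i2s_lrclk_pin: GPIO5  # assumed wiring
  i2s_bclk_pin: GPIO6

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO7

voice_assistant:
  microphone: mic
  use_wake_word: true  # wake word detection can also run on-device, depending on setup
```

From there the Assist pipeline in Home Assistant handles the STT/intent/TTS legs.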
A group buy for an existing product makes sense. Want to buy a 24TB Western Digital hard drive? It's $350. But if you and your 1000 closest friends get together, the price can be $275.
But for a first time unknown product? You get a lot fewer interested parties. Lots of people want to wait for tech reviews and blog posts before committing to it. And group buys being the only way to get them means availability will be inconsistent for the foreseeable future. I don't want one voice assistant. I want 5-20, one for every space in my house. But I am not prepared to commit to 20 devices of a first run, and I am not prepared to buy one and hope I'll get the opportunity to buy more later if it doesn't flop. Stability of the supply chain is an important signal to consumers that the device won't be abandoned.
> But for a first time unknown product? You get a lot fewer interested parties. Lots of people want to wait for tech reviews and blog posts before committing to it.
I used to think so too. But then Kickstarter proved that actually, as long as you have a good advertising style, communicate well, and get lucky, you can get people to contribute literal millions for a product that hasn't even reached the blueprints stage yet.
Kickstarter isn't a group buy.
> I am not prepared to buy one and hope I'll get the opportunity to buy more later
As long as this thing works and there's demand for it, I doubt we'll ever run out of people willing to connect an XU316 and some mics to an ESP32-S3 and sell it to you with HA's open source firmware flashed to it, whether or not HA themselves are still willing to.
I agree! I mean, just look at the market for Meshtastic devices! So many options! Or devices with WLED pre-installed! It'll take a lot for the ESP32 to go out of style.
There are two types of "group buy": the one you illustrated, but also one focused not only on saving bucks but also on helping small, independent makers/producers sell their usually more sustainable or more private product (which is also usually more expensive due to the lack of economies of scale).
Kickstarter shows that a lot of people feel different.
Kickstarter isn't a group buy. Similar, but not the same.
I invested in Mycroft and it flopped. Here's hoping some others can go where they couldn't.
I think Mycroft was unfortunately just ahead of its time. STT was just becoming good enough, but NLU wasn't quite there yet. Add in that you're up against Apple, Google, and Amazon, who were able to add integrations like music and subsidize the crap out of their products.
I just think this time around is different. The open-source Whisper model gives them amazing STT, and LLMs can far more easily be adapted for the NLU portion. The hardware is also dirt cheap, which makes it better suited to a narrow use case.
I guess the difference here is that HA has a huge community already. I believe the estimate was around 250k installations running actively. I suspect a huge chunk of the HA users venn diagram slice fits within the voice users slice.
Our estimates are more than a million active instances https://analytics.home-assistant.io/
IIRC one of the main devs behind this device came from Mycroft.
Yep, Mike Hansen was on the live stream launching the new device. He also notably created Rhasspy [1], which is open-source voice assistant software for Raspberry Pi (when connected to a microphone and speaker).
OP's username checks out.
I believe Mycroft was killed in part due to a patent troll:
https://www.theregister.com/AMP/2023/02/13/linux_ai_assistan...
Hopefully the troll is no longer around
I think another part is that there is a failure mechanism on their boards that was recently identified: https://community.openconversational.ai/t/sj-201-sj201-failu...
The short version, from the post, is that there are 4 capacitors that are only rated for 6.3v, but the power supply is 12v. Eventually one of these capacitors will fail, causing the board to stop working entirely.
It would be hard for a company to stay in business when they are fighting a patent troll lawsuit and having to handle returns on every device they sold through kickstarter.
Your idea about group buys is really intriguing. I wonder if the community might organically set something like that up once there's enough interest.
We need more projects like Home Assistant. I started using it recently and was amazed. They sell their own hardware, but the whole setup is designed to work on any other hardware. There are detailed docs for installation on your own hardware. And it works amazingly well.
Same for their voice assistant. You can buy their hardware and get started right away, or you can place your own mics and speakers around the home and it will still work. You can buy your own beefy hardware and run your own LLM.
The possibilities with Home Assistant are endless. Thanks to this community for breaking the barriers created by big tech.
I am working on automation of phones (open source) - https://github.com/BandarLabs/clickclickclick
I haven't quite been able to get the Llama vision models working, but I suppose with new releases in the future, it should work as well as Gemini at finding bounding boxes of UI elements.
It's a great project overall, but I've been frustrated by how anti-engineer it has been trending.
Install the Node-RED add on. I use that to do the tricky stuff.
Install the whole thing on top of stock Debian ("supervised") and you get a full OS to use.
You get a fully integrated MQTT broker with full provisioning - you don't need a webby API - you have an IoT one instead!
This is a madly fast moving project with a lot of different audiences. You still have loads of choice all tied up in the web interface.
+1 on installing supervised on stock debian. It feels like any other software and I still get to keep full control of my system.
I'm currently running HA, Frigate, and Pi-hole on the same machine.
Or the Digital Alchemy addon. Lets you write your automations using TypeScript.
Do you mean the move away from YAML first configs?
I was originally somewhat frustrated, but overall, it's much better (let's be honest, YAML sucks) and more user friendly (by that I mean having a form with pre-filled fields is easier than having to copy paste YAML).
Yes, config is a major part of it. But also a lack of good APIs, very poor dev documentation, and not-great logging. A general "take it or leave it" attitude, not interested in enabling engineers to build.
It's worse though when you need to add a ton of custom sensors at once, e.g., for properly automating a Solar PV + Battery solution.
Oh thank god. I just started using HA a few months ago, and all this YAML is so confusing when I try to write it with ChatGPT: constant syntax errors or other random errors.
How so?
I'm a different user, but I can say I've been frustrated with their refusal to support OIDC/OAuth/literally any standard login system. There is a very long thread on their forums documenting the many attempts by people to contribute this feature. [0] The devs simply shut it down every time, with little to no explanation.
I run many self-hosted applications on my local network. Home Assistant is the only one I'm running that has its own dedicated login. Everything else I'm using has OIDC support, or I can at least unobtrusively stick a reverse proxy in front to require OIDC login.
[0] https://community.home-assistant.io/t/open-letter-for-improv...
Edit: things like this [1] don't help either, where one of the HA devs threatens to relicense a dependency so that NixOS can't use it, because… he doesn't want them to? The license permits them to. Seemed very against the spirit of open source to me.
> We need more projects like home assistant
Isn't openHAB an existing popular alternative?
HA long ago blew past OpenHAB in functionality and community.
Unless you have a hard-on for JVM services, HA is the better XP these days.
When I was evaluating both projects about 5 years ago, I went with openHAB because they had native apps with native controls (and thus nicer design, imo). At the time, HA was still deep in YAML config files that needed validation before saving, etc. Not great UX.
Nowadays, HA has more of the features I would want and other external projects exist to create your own dashboards that take advantage of native controls.
Today I'm using Homey because I'm still a sucker for design and UX after a long day of coding boring admin panels in the day job, but I think in another few years, when the hardware starts to show its age, I will move to Home Assistant. Hell, there exists an integration to bring HA devices into Homey, but that would require running two hubs and potentially duplicating functionality. We shall see.
> HA long ago blew past OpenHAB in [...] community.
Home Assistant seems insurmountable on that specific metric; it seems to be the single biggest project in terms of contributions from a wide community. Makes sense: Home Assistant tries to do a lot of things, and succeeds at many of them.
I think they meant "projects with a culture and mindset like homeassistant", not just a competitor to the existing project.
Completely agree! Home Assistant feels like a breath of fresh air in a space dominated by big tech's walled gardens.
It's too bad it's sold out everywhere. I've tried the ESP32 projects (little cube guy) for voice assistants in HA, but its mic/speaker weren't good enough. When it did hear me (and I heard it), it did an amazing job. For the first time I talked to a voice assistant that understood "Turn off office lights" to mean "Turn off all the lights in the office" without me giving it any special grouping (like I have to do in Alexa, and then it randomly breaks). It handled a ton of requests that are easy for any human but Alexa/Siri trip up on.
I cannot wait to buy 5 or more of these to replace Alexa. HA is the brain of my house and up till now Alexa provided the best hardware to interact with HA (IMHO) but I'd love something first-party.
I'm definitely buying one for robotics, having a dedicated unit for both STT and TTS that actually works and integrates well would make a lot of social robots more usable and far easier to set up and maintain. Hopefully there's a ROS driver for it eventually too.
How did you find it for music tasks?
I didn't test that. I normally just manually play through my Sonos speaker groups on my phone. I don't like the sound from the Echos, so I'm not in the habit of asking them to do anything related to music.
Right now I only use Alexa for smart house control and setting timers
If it's possible for the hardware to facilitate a use case, the employees working on the product will try to push the limits as far as they possibly can in order to manufacture interesting and challenging problems that will get them higher performance ratings and promotions. They will rationalize away privacy violations by appealing to their "good intentions" and their amazing ability to protect information from nefarious actors. In their minds they are working for "the good guys" who will surely "do the right thing."
At various times in the past, the teams involved in such projects have at least prototyped extremely invasive features with those in-home devices. For example, one engineer I've visited with from a well-known in-home device manufacturer worked on classifiers that could distinguish between two people having sex and one person attacking another in audio captured passively by the microphones.
As the corporate culture and leadership shift over time, I have marginal confidence that these prototypes will perpetually remain undeveloped or on-device only. Apple, for instance, has decided to send a significant amount of personal data to their "Private Cloud" and is taking the tactic of opening "enough" of its infrastructure for third-party audit to make an argument that the data they collect will only be used in a way that the user is aware of and approves. Maybe Apple can get something like that to a good enough state, at least for a time. However, they're inevitably normalizing the practice. I wonder how many competitors will be equally disciplined in their implementations.
So my takeaway is this: If there exists a pathway between a microphone and the Internet that you are not in 100% control over, it's not at all unreasonable to expect that anything and everything that microphone picks up at any time will be captured and stored by someone else. What happens with that audio will -- in general -- be kept out of your knowledge and control so long as there is insufficient regulatory oversight.
Open source
Yeah, OP is comparing this to Google/Amazon/Apple/etc devices but this is being developed by the nonprofit that manages development on Home Assistant and in cooperation with their large community of users. It's a very different attitude driving development of voice remotes for Home Assistant vs. large corporations. They've been around for a while now and have a proven track record of being actual, serious advocates for data privacy and user autonomy. Maybe they won't be forever, but then this thing is open source.
The whole point is that you control what these things do, and that you can run these things fully locally if you want with no internet access, and run your own custom software on them if that's what you want to do. This is a product for the Home Assistant community that will probably never turn much of a profit, nor do I expect it is intended to.
> Yeah, OP is comparing this to Google/Amazon/Apple/etc devices
Thanks; it seems I actually needed to spell that out in my post.
That's a pretty timely release considering Alexa and the Google assistant devices seem to have plateaued or are on the decline.
Curious what you mean by that.
For me, the Alexa devices I own have gotten worse. They can't do simple things (setting a timer used to be instant; now it takes 10-15 seconds of thinking, assuming it heard properly), and playing music is a joke (it will try to play through Deezer even though I disabled that integration months ago, and then will default to Amazon Music instead of Spotify, which is set as the default).
And then even simple skills can't understand what I'm asking 60% of the time. The first maybe 2 years after launch it seemed like everything worked pretty well, but since then it's been a frustrating decline.
Currently they are relegated to timers and music, and they can't even manage those half the time anymore.
It is, I think, a common feeling among Echo/Alexa users. Now that people are getting used to the amazing understanding capabilities of ChatGPT and the likes, it probably increases the frustration level because you get a hint of how good it could be.
I believe it boils down to two main issues:
- The narrow AI systems used for intent inference have not scaled with the product features.
- Amazon is stuck and can't significantly improve it using general AI due to costs.
The first point is that the speech-to-intent algorithms currently in production are quite basic, likely based on the state of the art from 2013. Initially, there were few features available, so the device was fairly effective at inferring what you wanted from a limited set of possibilities. Over time, Amazon introduced more and more features to choose from, but the devices didn't get any smarter. As a result, mismatches between actual intent and inferred intent became more common, giving the impression that the device is getting dumber. In truth, it's probably getting somewhat smarter, but not enough to compensate for the increasing complexity over time.
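As a toy illustration of that first point (hypothetical code, not anything Amazon actually ships): a pattern-based speech-to-intent matcher is reliable while the intent catalog is short, but any phrasing outside its patterns simply fails, and those failures multiply as features are added.

```python
import re

# A tiny catalog of intents; real assistants have thousands of these,
# which is exactly where the mismatches start piling up.
INTENTS = [
    ("set_timer", re.compile(r"set (?:a )?timer for (?P<minutes>\d+) minutes?")),
    ("lights_off", re.compile(r"turn off (?:the )?(?P<room>\w+) lights?")),
    ("play_music", re.compile(r"play (?P<artist>.+)")),
]

def infer_intent(utterance: str):
    """Return (intent_name, slots) for the first matching pattern, else None."""
    text = utterance.lower().strip()
    for name, pattern in INTENTS:
        match = pattern.search(text)
        if match:
            return name, match.groupdict()
    return None  # unrecognized phrasing: the "it got dumber" experience

print(infer_intent("set a timer for 10 minutes"))  # ('set_timer', {'minutes': '10'})
print(infer_intent("kill the office lights"))      # None
```

An LLM handles the "kill the office lights" phrasing trivially, which is the gap the second point is about.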
The second point is that, clearly, it would be relatively straightforward to create a much smarter Alexa: simply delegate the intent detection to an LLM. However, Amazon can't do that. By 2019, there were already over 100 million Alexa devices in circulation, and it's reasonable to assume that number has at least doubled by now. These devices are likely sold at a low margin, and the service is free. If you start requiring GPUs to process millions of daily requests, you would need an enormous, costly infrastructure, which is probably impossible to justify financially, and perhaps even infeasible given the sheer scale of the product.
My prediction is that Amazon cannot save the product, and it will die a slow death. It will probably keep working for years but will likely be relegated by most users to a "dumb" device capable of little more than setting alarms, timers, and providing weather reports.
If you want Jarvis-like intelligence to control your home automation system, the vision of a local assistant using local AI on an efficient GPU, as presented by HA, is the one with the most chance of succeeding. Beyond the privacy benefits of processing everything locally, the primary reason this approach may become common is that it scales linearly with the installation.
If you had a cloud-based solution using Echo-like devices, the problem is that you'd need to scale your cloud infrastructure as you sell more devices. If the service is good, this could become a major challenge. In contrast, if you sell an expensive box with an integrated GPU that does everything locally, you deploy the infrastructure as you sell the product. This eliminates scaling issues and the risks of growing too fast.
That's interesting, because I have a bunch of Echos of various types in my house and my timers and answers are instant. Is it possible your internet connection is wonky, or you have a slow DNS server or congested Wi-Fi? I don't have the absolute newest devices, but the one in my bedroom is the very original Echo that I got during their preview stage, the one in my kitchen is the Echo Show 7", and I have a bunch of puck ones and spherical ones (don't remember the generations) around the house. One did die at one point after years of use and got replaced, but it was in my kid's room, so I suspect it was subject to some abuse.
Amazon also fired a large number of people from the Alexa team last year. I don't really think Alexa is a major priority for Amazon at this point.
I don't blame them. Sure, there are millions of devices out there, but some people might own five devices, so there aren't as many users as there are devices, and the devices aren't making Amazon any money once bought, not like the Kindle.
Frankly, I know shockingly few people who use Siri/Alexa/Google Assistant/Bixby. It's not that voice assistants don't have a use, but it is a much, much smaller use case than initially envisioned, and there's no longer the money to fund the development; the funds went into blockchain and LLMs. Partly the decline is because it's not as natural an interface as we expected; secondly, to be actually useful, the assistants need access to control things that we may not be comfortable with, or which may pose a liability to the manufacturers.
That aligns with some of the frustration I've heard from others. It's surprising (and disappointing) how these platforms, which seemed to have so much potential early on, have started to feel more like a liability.
GH is basically abandonware at this stage, it seems. They just seem to break random things, and there haven't been any major updates/features for ages (and Gemini is still a way off for most).
Google Home's Nest integration is recent and top-notch though.
Hopefully in a year they'll have rolled out the Gemini integration and things will be back on track.
I was an early adopter of google home, have had several generations (including the latest). I quite like the devices, but the voice recognition seems to be getting worse not better. And the Pandora integration crashes frequently.
In addition, it's a moron. I'm not sure it's actually gotten dumber, but in the age of chatgpt, asking google assistant for information is worse than asking my 2nd grader. Maybe it will be able to quote part of a relevant web page, but half the time it screws that up. I just want it to convert my voice to text, submit it to chatgpt or claude, and read the response back to me.
All that said, the audio quality is good and it shows pictures of my kid when idle. If they suddenly disappeared I would replace them.
On the Google side it's become basically useless for anything beyond interacting with local devices and setting timers and reminders (in other words, the things that FOSS should be able to do very easily). Its only edge over other options used to be answering questions quickly without having to pull out a screen, but now it refuses to answer anything (likely because Google Search has removed their old quick answers in favor of Gemini answers).
I don't fully understand the cloud upsell. I have a beefy GPU. I would like to run the "more advanced" models locally.
By "I don't fully understand," I mean just that. There's a lot of marketing copy, but there's a lot I'd like to understand better before plopping down $$$ for a unit. The answers might be reasonable.
Ideally, I'd be able to experiment with a headset first, and if it works well, upgrade to the $59 unit.
I'd love to just have a README, with a getting started tutorial, play, and then upgrade if it does what I want.
Again: none of this is a complaint. I assume much of this is coming once we're past the preview edition, or is perhaps there and my search skills are failing me.
You can do exactly that - set up an Assist pipeline that glues together services running wherever you want, including a GPU node for faster-whisper. The HA interface even has a screen where you can test your pipeline with your computer's microphone.
It's not exactly batteries-included, and it doesn't exercise the on-device wake word detection that satellite hardware would provide, but it's doable.
But I don't know that the unit will be an "upgrade" over most headsets. These devices are designed to be cheap and low-power, and they have to function in tougher scenarios than speaking directly into a boom mic.
It's an upgrade mostly because putting on a headset to talk to an assistant means it's not worth using the assistant.
Does it use Node-RED for the pipeline?
No, all of the voice parts are either inbuilt or direct addons.
Finding microphones that look nice, can pick up voice at high enough quality to extract commands and that cover an entire room is surprisingly hard.
If this device delivers on audio quality it's totally worth it at $59.
I've found it quite hard to find decent hardware with both the input capability needed for wakeword and audio capture at a distance, whilst also having decent speaker quality for music playback.
I started using the Box-3 with heywillow, which did amazing input and processing using ML on my GPU, but the speaker is awful. I built a speaker of my own using a Raspberry Pi Z2W, a DAC, and some speakers in a 3D-printed enclosure I designed, and added a shim to the server so that responses came from my speaker rather than the cheap/tiny speaker in the Box-3. I'll likely do the same now with the Voice PE, but I'm hoping that the Grove connector can be used to plonk it on top of a higher-quality speaker unit and make it into a proper music player too.
As soon as I have it in my hands, I intend to get straight to work looking at a way to modify my speaker design to become an addon "module" for the PE.
100%. For a lot of users that have WAF and time available to contend with, this is a steal.
Bear in mind that a $50 Google Home or Alexa mini(?) is always going to be whatever Google deems it to be. This is an open device which can be whatever you want it to be. That's a lot of value in my eyes.
In many cases the issue isn't the microphone but the horrid amount of reflections the sound produces before reaching it. A quite good microphone can be built using cheap yet very clean capsules like the AOM-5024L-HD-F-R (80 dB S/N), which is ~$3 at Mouser. But room acoustics is a lot more important, and also a real pain in the ass (if not a bank account drain) when done professionally, although usually carpets, wood furniture, curtains to cover glass, and sound panels on concrete walls can be more than enough.
This device is just the mic/speaker/wake-word part. It connects to Home Assistant to do the decoding and automation. You can test it right now by downloading Home Assistant and running it on a Pi or a VM. You can run all the voice assist stuff locally if you want. There are services for the speech-to-text, text-to-speech, and what they call intents, which are simple things like "turn off the lights in the office". The cloud offering from Nabu Casa not only funds the development of Home Assistant but also gives remote access if you want it. As part of that, you can choose to offload some of the voice/text services to their cloud so that if you are just running it on a Pi it will still be fast.
I can't speak to Home Assistant specifically, but the last time I looked at voice models, supporting multiple languages and doing it really well just happens to require a model with a massive amount of RAM, especially to run at anything resembling real time.
It'd be awesome if they open sourced that model though, or published what models they're using. But I think it's unlikely to happen because Home Assistant is sorta a funnel to Nabu Casa.
That said, from what I can find, it sounds like Assist can be run without the hardware, either with or without the cloud upgrade. So you could definitely use your own hardware, headset, speakers, etc. to play with Assist.
shrug whisper seems to do well on my GPU, and faster than realtime.
Found what I was thinking of [1]
Part of my misremembering is that I was thinking of the smaller/IoT use case which, alongside the 10GB VRAM requirement for the large multilingual model, felt infeasible -shrug-
[1] https://git.acelerex.com/automation/opcua.ts/-/project_membe...
I've been using it to generate subtitles for home movies, for an aging family member who is losing their hearing, and it's phenomenal
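In case it's useful to anyone doing the same: once a Whisper-style model (e.g. faster-whisper's transcribe()) gives you (start, end, text) segments, turning them into an .srt file is mostly formatting. A stdlib-only sketch, with made-up stand-in segments:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render (start, end, text) tuples as an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)

# Stand-in segments; in practice these come from the transcription model.
segments = [(0.0, 2.5, "Hello there."), (2.5, 5.0, "How are you?")]
print(to_srt(segments))
```

Write the result to `movie.srt` next to the video file and most players will pick it up automatically.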
One thing that makes me nervous: Home Assistant has an extremely weak security model. There is recent support for admin users, and that's about it. I'm sort of okay with the users on an installation having effectively unrestricted access to all entities and actions. I'm much less okay with an LLM having this sort of access.
An actually good product in this space IMO needs to be able to define specific sets of actions and allow agents to perform only the permitted actions.
You can already choose which entity to expose to the LLMs
Had to laugh a bit at the caveat about powerful hardware. Was bracing myself for GPU and then it says N100 lol
I mean, comparatively, many people are hosting their Home Assistant on a Raspberry Pi, so it is relatively powerful :D
And the CM5 is nearly equivalent in terms of the small models you can run. Latency is nearly the same, though you can get a little fancier if you have an N100 system with more RAM and "unlocked" thermals (many N100 systems cap the power draw because they don't have the thermal capacity to run the chip at max turbo).
If we're being fair you can more like, walk models, not run them :)
A 125H box may be three times the price of an N100 box, but the power draw is about the same (6W idle, 28W max, with turbo off anyway), and with the Arc iGPU the prompt processing is in the hundreds, so near-instant replies to longer queries are doable.