- Scan containers and Pods for vulnerabilities or misconfigurations.
- Run containers and Pods with the least privileges possible.
- Use network separation to control the amount of damage a compromise can cause.
- Use firewalls to limit unneeded network connectivity and encryption to protect confidentiality.
- Use strong authentication and authorization to limit user and administrator access as well as to limit the attack surface.
- Use log auditing so that administrators can monitor activity and be alerted to potential malicious activity.
- Periodically review all Kubernetes settings and use vulnerability scans to help ensure risks are appropriately accounted for and security patches are applied.
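To make the "least privileges" bullet above concrete, here is a minimal sketch of a Pod securityContext that drops root, capabilities, and privilege escalation (the name and image are placeholders, not anything from the guidance):

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-app                        # hypothetical name
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.0  # placeholder image
          securityContext:
            runAsNonRoot: true
            runAsUser: 10001
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]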
> and encryption to protect confidentiality
Probably the hardest part about this. Private networks with private domains. Who runs the private CA, updates DNS records, issues certs, revokes them, keeps the keys secure, embeds the CA chain in every container's trust store, and enforces validation?
That is a shit-ton of stuff to set up (and potentially screw up), which will probably take a small team months to complete. How many teams are actually going to do this, versus just terminating TLS at the load balancer and running everything in the cluster in plaintext?
For as fundamental and important as encryption-in-transit is, it's always baffled me that there isn't a simpler, easier way to accomplish it on private networks. Everyone knows it's important, and everyone wants to do it, but it's just such a pain in the ass and so prone to error that even some top security leaders will tell you not to bother because it's such a footgun.
We really need something to help make the process simpler, like how Let's Encrypt made public HTTPS so much easier to do for even the smallest of websites.
In some senses it's differently complex, but WireGuard or similar may be simpler since it sits lower in the OSI stack and every application gets it "for free".
If Operating Systems had TLS built into the TCP/IP stack exposed by the kernel/system, you would never need to shim it in anywhere. You would just make a system call and use an open file descriptor/socket. One of the many programming-in-1970s-style things we still have not fixed.
But 1) kernel hackers won't implement it, 2) app devs are too possessive of their stack/codebase to just use one standard implementation/interface, and 3) security people are too paranoid to leave something "so important" up to the OS, so they'd rather everyone implement it poorly and in a fragmented way.
I doubt it's a coincidence.
We learned recently that for a long time, the primary producer of cryptographic telephones was a single Swiss company. Owned by the CIA.
If security were easy, a lot of intelligence agencies would have a bad day.
Security doesn't have to be this hard. But the powers that be seem to prefer complex, complicated systems, like DNS or SELinux.
It could be easier. Much easier.
The thing is, you can use Let's Encrypt for private networks too. For example, I use a DNS challenge to get a wildcard certificate for a subdomain of my personal site, but those names only resolve inside my house. The wildcard cert isn't essential for this - you could get individual ones - but it was easier for my home lab.
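If you're doing the same thing inside Kubernetes, the DNS-challenge approach can be automated with cert-manager. A rough sketch, assuming a Cloudflare-managed zone (the issuer name, email, and secret names are placeholders):

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-dns                    # placeholder name
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: admin@example.com               # placeholder contact
        privateKeySecretRef:
          name: letsencrypt-account-key
        solvers:
          - dns01:
              cloudflare:                      # assumes your zone is hosted on Cloudflare
                apiTokenSecretRef:
                  name: cloudflare-api-token
                  key: api-token

Certificates for names that never resolve publicly can still be issued and renewed this way, since only the DNS TXT record has to be reachable.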
Service mesh solutions like Istio / Consul Connect+Vault can help a lot with this.
Depending on the existing size and complexity of your stack those months can be cut down to weeks or even days.
I don't mean to trivialize the time and expertise needed to set up and manage, but if you can afford to run a microservice architecture on k8s already it's definitely not untenable.
Encryption today is pretty much a requirement for any regulated business and required practice for any sane shop, with or without Kubernetes. The only difference is that those services are communicating within the cluster's internal network rather than across different machines in the servers' VLANs.
If anything, setting the whole thing up within the Kubernetes ecosystem can be much easier with the available operators and automation frameworks like cert-manager and/or Istio.
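For the Istio route, mesh-wide mTLS is a single resource once the sidecars are in place; a minimal sketch:

    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: default
      namespace: istio-system                  # root namespace = applies mesh-wide
    spec:
      mtls:
        mode: STRICT                           # reject plaintext traffic between workloads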
> That is a shit-ton of stuff to set up (and potentially screw up) which will take a small team probably months to complete.
Agree! This is why that "Kubernetes Hardening Guidance" is for the NSA, not for startups.
Resource needs aside, keeping up basic AppSec/InfoSec hygiene is strongly recommended. There are also tons of startups trying to provide solutions/services for this. A lot of the time, it's worth the money.
This guidance is provided by the NSA, not for the NSA.
I wonder about that. What are the attack vectors within a K8s cluster that necessitate intra-cluster transport encryption?
Who scans the vulnerability scanners? Genuine question. How does the community/ecosystem solve this problem of auditability?
We deal with this by having multiple vulnerability scanners. Product A and Product B both scan your active environment. Product A scans Product B. Product B scans Product A. Additionally, make the vendors of those products sign NDAs so your threat actors, other than insiders, don't necessarily even know who they are. An attacker then needs to not only compromise both, but figure out who they are in the first place.
To this I'd add what is colloquially referred to as a "Chinese wall", so that even insiders aren't aware of the full picture.
For anyone who hasn't read it:
https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...
Are there any people working seriously on this? I'm aware of efforts for OCaml (http://gallium.inria.fr/~scherer/drafts/camlboot.pdf), but that's it.
If your threat profile says you need to audit your vulnerability scanners, you audit your vulnerability scanners. There's not really a problem there right?
NIST also says: if your scanner finds a vulnerability, it's up to you to VALIDATE that it's not a false-positive.
False-positives abound on these scanners.
I've never had to. I wanted feedback from people who have.
That was the issue in the SolarWinds hack: https://www.npr.org/2021/04/16/985439655/a-worst-nightmare-c...
I am a mostly non-technical person, but why do we need to resort to firewalls etc. when we could employ a UNIX-like file permission system for network access? Wouldn't it be awesome if we could allow any installed software to contact ONLY whitelisted domains? Of course this excludes web browsers, but you get the idea.
How about our mainstream OSes incorporating that kind of permission system, similar to what mobile OSes already have today?
It's a fair question, and it certainly is possible to have firewalls on a per-server basis. We do that for incoming traffic primarily. The catch is that if the server itself gets compromised, then you can't count on those rules still being enforced.
Having dedicated network appliances acting as firewalls means from a security perspective you need to compromise the local machine and then also compromise a dedicated, hardened external system as well. It vastly ups the difficulty barrier.
Firewalls do a lot more than block ports and services.
Think of them as defence-in-depth that protects against accidental misconfiguration, software bugs, local exploits, etc.
SELinux
I didn't know that, learned something today. Thank you!
Again, as a non-technical person, why a piece of software needs access to the entire internet instead of whitelisted domains specific to its requirements is beyond me, since we already know how the UNIX permission system works. Is it so hard to extend that to networks, especially since everything is a file in UNIX? Kindly pardon my ignorance :-)
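For what it's worth, inside Kubernetes the closest built-in analogue is a NetworkPolicy restricting a workload's egress. Vanilla NetworkPolicy only understands pod selectors, namespaces, and CIDRs rather than domain names (FQDN rules need a CNI like Cilium), but the idea is the same. A rough sketch with made-up names:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: restrict-egress                    # hypothetical name
      namespace: my-app                        # hypothetical namespace
    spec:
      podSelector:
        matchLabels:
          app: billing                         # placeholder label
      policyTypes: ["Egress"]
      egress:
        - to:
            - ipBlock:
                cidr: 203.0.113.10/32          # the one upstream API this app may call
          ports:
            - protocol: TCP
              port: 443

In practice you would also need an egress rule allowing DNS to kube-dns, or the pod can't resolve anything at all.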
> - Use network separation to control the amount of damage a compromise can cause.
> - Use firewalls to limit unneeded network connectivity and encryption to protect confidentiality.
Are we still on this? Why isn't anyone pushing for zero trust? A concept made significantly easier to achieve thanks to container orchestration.
Some useful guidance here, although worth noting that some of it is a bit dated (k8s security can move quickly).
Most notably from a scan through, they're mentioning PodSecurityPolicy, but that's deprecated and scheduled to be removed in 1.25.
There will be an in-tree replacement but it won't work the same way. Out of tree open source options would be things like OPA, Kyverno, jsPolicy, k-rail or Kubewarden.
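As a flavour of what those look like, here's a rough Kyverno sketch (one rule, not a full PSP replacement) that blocks privileged containers:

    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: disallow-privileged                # hypothetical policy name
    spec:
      validationFailureAction: enforce         # block, rather than just audit
      rules:
        - name: privileged-containers
          match:
            resources:
              kinds:
                - Pod
          validate:
            message: "Privileged containers are not allowed."
            pattern:
              spec:
                containers:
                  - =(securityContext):
                      =(privileged): "false"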
We've actually already moved the official guidance from PSPs to OPA and that's what the primary DevSecOps reference implementation has used for about two months now.
"We" being the DoD, but our guidance is the NSA guidance. I'm not sure why it hasn't made it into the policy pdf, but the actual official IAC has been using OPA since April.
That's awesome. I know a lot of work is going into things like P1.
I scale some large K8s in fed (not DoD)... ATO is fun. Actually unsure how I'd position something like OPA (I actually envisioned them being key back in '17 when working in the Kessel Run realm... called them, and they hadn't been exposed to fed at the time).
Legit question / maybe dumb: where is the DoD at in general perimeter security? Outside looking in, and everything before a container runs - network and OS primarily, cloud envs as well. A lot of Fed needs help here before they can comprehend even basic Kubernetes authorization. It's also generally more important (at least from a controls perspective) in non-DoD environments than something like security context in pods.
P1 has been leading the pack here. Most of the guidance mentioned in this guide has been coming from the CSO's office [0] for a while. We're using OPA extensively for not just container level policies but blocking column/cell level access in queries. We have multiple roles [1] to help Kessel Run, Space CAMP, and other software factories with this.
[0] https://software.af.mil/dsop/documents/ [1] https://boards.greenhouse.io/raft
> Some useful guidance here, although worth noting that some of it is a bit dated.
Is there any digital security guidance from the feds to which that doesn't apply? :)
Everybody wants small gov, until they don't.
This is why I think big vs little government is really missing the forest for the trees in a lot of contexts (unless your overall goal is to minimize taxes and regulations at all costs). It's really a debate about the nature of bureaucracy. Process vs nimble. You can organize things to promote either, depending on your actual goals.
Unfortunately small government activists have recognized this and have enacted policies that promote incompetence as much as possible. "Good enough for government work" is a choice, not an inevitability.
In-tree replacement is coming in v1.22...as in, just a few weeks away. It uses admission controllers, just like OPA/Kyverno et al, hence the current guidance to use one of those.
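For reference, the in-tree replacement (Pod Security admission) is driven by namespace labels rather than a bound policy object; opting a namespace into a profile looks roughly like this:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a                                      # hypothetical namespace
      labels:
        pod-security.kubernetes.io/enforce: baseline
        pod-security.kubernetes.io/warn: restricted     # warn about anything that wouldn't pass "restricted"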
PaaS solutions can't cover everything that PSP was covering though.
Out of curiosity, which bits were you thinking of? OPA, Kyverno, et al. have policies which (AFAIK) hit all the bits of the Kubernetes PSS.
I used to study and focus on security a lot more and keep up with trends. After several interviews this year, I realize a lot of jobs prioritize leetcode over everything else. It's pretty annoying, and it makes me wonder: if the focus for tech workers is leetcode above all else, then no wonder so many companies have insecure apps and servers.
I applied for a job that wanted someone who has experience with SAML. I've actually written my own hobby IDP, and I can diagram the handshake off the top of my head. I've spent a lot of time learning how to write custom decorators to handle access restrictions. I failed my interview because they wanted me to leetcode some shit with 3d geometric volumes. I'm sorry but what does this have to do with SAML or security?
Wow that's dumb. I've done some reading on 3d computational geometry for hobbyist game engine reasons, and in my admittedly limited experience, very few of the algorithms involved are intuitive enough to be derivable in an interview setting.
This sucks -- it is a lose-lose situation. I've seen this kind of thing happen all too often.
You interview them as well. You give me dumb, unrelated coding questions - you are out.
If you can't reverse a doubly linked binary prefix tree in O(1) then how can you be trusted with security?! /s :(
Yes but in most circumstances, quick security is better than linear security; not sure about bubble security though.
Bubble security sounds like a good idea. You know, put everything into its own isolated little bubble.
Best I can do is Bogo security
Consider that the typical company is running servers/instances that haven't been updated or rebooted in 6 months to 3 years. Never mind the multiple-year-old software dependencies in their apps...
You are correct. Especially at big companies, programmers program and security is just some rules dropped on them from above.
You might be playing the long game. I think a CTO might benefit from knowing both app dev and security.
Thanks! Yeah, articles like this I would have studied in greater detail in the past, but this year I realize I need to improve my leetcode/algo times, so long term I'll keep focused on security and important topics. But in the meantime... time to zig-zag a binary tree :(
At my company the head of security is also the chief programmer. Not sure if that's a good thing but he's got 30 years experience and likes to tell war stories.
The elephant in the room here is that almost all containers, according to artifacthub.io etc., are a complete tire fire.
The DoD maintains its own registry of hardened container images, which they call the Iron Bank. I guess they can't issue guidelines to the general public saying you should use these, but the DoD has to use them. Which kind of sucks, because they may be hardened, but they also break all the time, because the people responsible for hardening them can't possibly understand all the myriad subtleties involved in building and deploying software packaged with dependencies in the same way the actual software vendors do. They make some serious rookie mistakes, like straight copying executables out of a Fedora image into a UBI image, which works perfectly fine when a brand-new UBI release happens and it's on the same glibc as Fedora, then immediately stops working and all your containers break when Fedora updates.
They may suck at building containers, but this also sounds like a release management issue. Both the producers and consumers of the release need a test suite to validate the new artifacts before they can make it into a pipeline to eventually deliver to a customer use case. (But also they should 100% not be copying random binaries)
For what it's worth I've seen worse from corporations. Bad hires lead to bad systems.
I work on Platform One, and we deploy new versions of these containers weekly and have never had them break in that way. In the beginning, when I was on the Kubernetes team, we struggled with the containers just not working at all, but they have gotten better.
Now I work on deployments, and we run every container from IB and have few issues. If you find problems, report the images and they will fix them pretty quickly.
There are good free/OSS container scanners; check out Trivy. No reason not to use one.
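If you want to keep the scan in-cluster, one low-effort option is a one-off Job wrapping the CLI; a sketch (the image tag and scan target are placeholders):

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: trivy-scan                         # hypothetical name
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trivy
              image: aquasec/trivy:latest
              # fail the Job if HIGH/CRITICAL findings exist
              args: ["image", "--severity", "HIGH,CRITICAL", "--exit-code", "1", "nginx:1.21"]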
A lot of this applies to containers in general. Not complaining - it's well written - but I wish they would break out the non-Kubernetes container stuff into general container-security advice for people.
This is a great point. And containers don't even really exist in the first place, so really there should be a family of docs (or at least one) about securing the various namespaces, cgroups, etc. in modern Linux releases, and a doc about how to secure them in combination with each other.
How do I know that this advice is useful and does not put me in danger?
Example: the NSA recommends using RSA encryption.
https://www.theverge.com/2013/12/20/5231006/nsa-paid-10-mill...
You don't use this guide as a bible, but take it into account and compare it with other common security advice in the field. If you get similar results, it's most likely a good list of advice.
This isn't for regular people; this is telling third parties what they need in order to try to sell something to the NSA.
so... a lot of this can be done with Fairwinds' OSS tool Polaris... https://github.com/FairwindsOps/polaris
feels good that we've been addressing this for a bit already tbh. (disclaimer, I work for fairwinds)
How did you (/they) come up with the name Polaris?
If I had to guess it's a nautical theme, following Kubes. Fair Winds (sailing), Polaris (North Star, used for navigation.)
basically this. yes.
What yields the lowest risk - spending a ton of time hardening one cluster, or building multiple clusters to reduce the blast radius of bugs and misconfigurations?
> What yields the lowest risk - spending a ton of time hardening one cluster, or building multiple clusters to reduce the blast radius of bugs and misconfigurations?
Not sure this is a valid dichotomy.
If you are spinning up multiple clusters, you are presumably doing so in an automated fashion. If so, then the effort of hardening is very similar. It doesn't really matter where you do it.
Multiple clusters may have a smaller blast radius, but will have a larger attack surface. Things may be shared between them (accounts? network tunnels? credentials to a shared service?) in which case an intrusion in one puts everyone else at risk.
> If so, then the effort of hardening is very similar. It doesn't really matter where you do it.
Nope. If the clusters are separate, it limits how damaging a compromise of the cluster is. This is why cloud providers don't stick you on the same k8s cluster as another tenant.
> Multiple clusters may have a smaller blast radius, but will have a larger attack surface. Things may be shared between them (accounts? network tunnels? credentials to a shared service?) in which case an intrusion in one puts everyone else at risk.
It's not really clear what you're trying to say here. If someone compromises credentials shared between all clusters, that's the same as compromising credentials used by one mega cluster.
> Nope. If the clusters are separate it limits how damaging a compromise of the cluster is.
But if the clusters are configured similarly, a flaw in one is likely present in the others. GP's point is that if you invest in hardening, you can easily apply it to multiple clusters.
> It's not really clear what you're trying to say here.
I assume they mean having more clusters present means there are more opportunities to be compromised (e.g. more credentials to leak, more API servers to target, possible version skew, etc.).
You can't skip "spending a ton of time hardening one cluster" anyways.
Having multiple clusters may help reduce the blast radius of _certain_ attacks, to some degree. However, managing multiple clusters is a lot more difficult than managing one, and you will potentially replicate bad practices and vulnerabilities to multiple places and increase the maintenance burden.
If I could go back, single cluster. Any benefits you get from going multi-cluster can be achieved by configuring a single cluster correctly.
The one benefit you get is protection from bugs in Kubernetes itself and a reduced blast radius. Even if you could produce a secure and H/A cluster, you still leave yourself open to Kubernetes bugs and configuration mistakes such as adding a network policy that blocks all communication across all namespaces.
Multiple clusters protects you from these types of configuration mistakes by reducing the blast radius and providing an additional landing zone to roll out changes over time.
And making it so that "many clusters" look exactly like "one cluster" is one of the goals the kcp prototype was exploring (although still early) because I hear this ALL the time:
1. 1 cluster was awesome
2. Many clusters means I rebuild the world
3. I wish there was a way to get the benefits of one cluster across multiples.
Which I believe is a solvable problem and partially what we've been poking at over at https://github.com/kcp-dev/kcp (although it's still so early that I don't want to get hopes up).
If you have 2 clusters, wouldn't you just blue/green them for rolling changes?
Except for security and fault isolation of course.
>a single cluster correctly
Can you elaborate?
At a high level, almost anything you would want to use multiple clusters for can be done on a single cluster, using e.g. node pools, affinity, and taints to ensure that workloads only run on the machines you want them to. As a simple example, you can set up a separate node pool for production, and use node affinity and/or taints to ensure that only production workloads can run there.
One exception, as others have mentioned, is blast radius - with a single cluster, a problem with Kubernetes itself could take down everything.
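A concrete (hypothetical) sketch of that pattern: taint the production nodes, then give only production workloads the matching toleration and node selector:

    # node side, e.g. via your node pool tooling:
    #   kubectl taint nodes prod-node-1 env=prod:NoSchedule
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: checkout                           # hypothetical production workload
    spec:
      replicas: 2
      selector:
        matchLabels: {app: checkout}
      template:
        metadata:
          labels: {app: checkout}
        spec:
          nodeSelector:
            env: prod                          # only land on nodes labelled env=prod
          tolerations:
            - key: "env"
              operator: "Equal"
              value: "prod"
              effect: "NoSchedule"
          containers:
            - name: checkout
              image: registry.example.com/checkout:1.0   # placeholder image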
At our very large org we do both. At least two clusters per region to isolate platform changes, all hardened to the same standards using automated tooling.