- Scan containers and Pods for vulnerabilities or misconfigurations.
- Run containers and Pods with the least privileges possible.
- Use network separation to control the amount of damage a compromise can cause.
- Use firewalls to limit unneeded network connectivity and encryption to protect confidentiality.
- Use strong authentication and authorization to limit user and administrator access as well as to limit the attack surface.
- Use log auditing so that administrators can monitor activity and be alerted to potential malicious activity.
- Periodically review all Kubernetes settings and use vulnerability scans to help ensure risks are appropriately accounted for and security patches are applied.
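To make the "least privileges" bullet above concrete, here is a minimal sketch of a Pod securityContext that drops root, capabilities, and privilege escalation (the name and image are placeholders, not anything from the guidance):

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-app                        # hypothetical name
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.0  # placeholder image
          securityContext:
            runAsNonRoot: true
            runAsUser: 10001
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]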
> and encryption to protect confidentiality
Probably the hardest part about this. Private networks with private domains. Who runs the private CA, updates DNS records, issues certs, revokes them, keeps the keys secure, embeds the CA chain in every container's trust store, and enforces validation?
That is a shit-ton of stuff to set up (and potentially screw up), which will probably take a small team months to complete. How many teams are actually going to do this, versus just terminating TLS at the load balancer and running everything in the cluster in plaintext?
For as fundamental and important as encryption-in-transit is, it's always baffled me that there isn't a simpler, easier way to accomplish it on private networks. Everyone knows it's important, and everyone wants to do it, but it's just such a pain in the ass and so prone to error that even some top security leaders will tell you not to bother because it's such a footgun.
We really need something to help make the process simpler, like how Let's Encrypt made public HTTPS so much easier to do for even the smallest of websites.
In some senses it's differently complex, but WireGuard or similar may be simpler since it sits lower in the OSI stack and every application gets it "for free".
If Operating Systems had TLS built into the TCP/IP stack exposed by the kernel/system, you would never need to shim it in anywhere. You would just make a system call and use an open file descriptor/socket. One of the many programming-in-1970s-style things we still have not fixed.
But 1) kernel hackers won't implement it, 2) app devs are too possessive of their stack/codebase to just use one standard implementation/interface, and 3) security people are too paranoid to leave something "so important" up to the OS, so they'd rather everyone implement it poorly and in a fragmented way.
I doubt it's a coincidence.
We learned recently that for a long time, the primary producer of cryptographic telephones was a single Swiss company. Owned by the CIA.
If security were easy, a lot of intelligence agencies would have a bad day.
Security doesn't have to be this hard. But the powers that be seem to prefer complex, complicated systems, like DNS or SELinux.
It could be easier. Much easier.
The thing is, you can use Let's Encrypt for private networks too. For example, I use a DNS challenge to get a wildcard certificate for a subdomain of my personal site, but those names only resolve inside my house. The wildcard cert isn't essential for this - you could get individual ones - but it was easier for my home lab.
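If you're doing the same thing inside Kubernetes, the DNS-challenge approach can be automated with cert-manager. A rough sketch, assuming a Cloudflare-managed zone (the issuer name, email, and secret names are placeholders):

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-dns                    # placeholder name
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: admin@example.com               # placeholder contact
        privateKeySecretRef:
          name: letsencrypt-account-key
        solvers:
          - dns01:
              cloudflare:                      # assumes your zone is hosted on Cloudflare
                apiTokenSecretRef:
                  name: cloudflare-api-token
                  key: api-token

Certificates for names that never resolve publicly can still be issued and renewed this way, since only the DNS TXT record has to be reachable.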
Service mesh solutions like Istio / Consul Connect+Vault can help a lot with this.
Depending on the existing size and complexity of your stack those months can be cut down to weeks or even days.
I don't mean to trivialize the time and expertise needed to set up and manage, but if you can afford to run a microservice architecture on k8s already it's definitely not untenable.
Encryption today is pretty much a requirement for any regulated business and required practice for any sane shop, with or without Kubernetes. The only difference is that those services are communicating within the cluster's internal network rather than across different machines in the servers' VLANs.
If anything, setting the whole thing up within the Kubernetes ecosystem can be much easier with the available operators and automation frameworks like cert-manager and/or Istio.
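For the Istio route, mesh-wide mTLS is a single resource once the sidecars are in place; a minimal sketch:

    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: default
      namespace: istio-system                  # root namespace = applies mesh-wide
    spec:
      mtls:
        mode: STRICT                           # reject plaintext traffic between workloads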
> That is a shit-ton of stuff to set up (and potentially screw up) which will take a small team probably months to complete.
Agree! This is why that "Kubernetes Hardening Guidance" is for the NSA, not for startups.
Resource needs aside, keeping up basic AppSec/InfoSec hygiene is strongly recommended. There are also tons of startups trying to provide solutions/services for this. A lot of the time, it's worth the money.
This guidance is provided by the NSA, not for the NSA.
I wonder about that. What are the attack vectors within a K8s cluster that necessitate intra-cluster transport encryption?
Who scans the vulnerability scanners? Genuine question. How does the community/ecosystem solve this problem of auditability?
We deal with this by having multiple vulnerability scanners. Product A and Product B both scan your active environment. Product A scans Product B. Product B scans Product A. Additionally, make the vendors of those products sign NDAs so your threat actors, other than insiders, don't necessarily even know who they are. An attacker then needs to not only compromise both, but figure out who they are in the first place.
To this I'd add what is colloquially referred to as a "Chinese wall", so that even insiders aren't aware of the full picture.
For anyone who hasn't read it:
https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...
Are there any people working seriously on this? I'm aware of efforts for OCaml (http://gallium.inria.fr/~scherer/drafts/camlboot.pdf), but that's it.
If your threat profile says you need to audit your vulnerability scanners, you audit your vulnerability scanners. There's not really a problem there right?
NIST also says: if your scanner finds a vulnerability, it's up to you to VALIDATE that it's not a false-positive.
False-positives abound on these scanners.
I've never had to. I wanted feedback from people who have.
That was the issue in the SolarWinds hack: https://www.npr.org/2021/04/16/985439655/a-worst-nightmare-c...
I am a mostly non-technical person, but why do we need to resort to firewalls etc. when we could employ a UNIX-like file permission system for network access? Wouldn't it be awesome if we could allow any installed software to contact ONLY whitelisted domains? Of course this excludes web browsers, but you get the idea.
How about our mainstream OSes incorporating that kind of permission system, similar to what mobile OSes already have today?
It's a fair question, and it certainly is possible to have firewalls on a per-server basis. We do that for incoming traffic primarily. The catch is that if the server itself gets compromised, then you can't count on those rules still being enforced.
Having dedicated network appliances acting as firewalls means from a security perspective you need to compromise the local machine and then also compromise a dedicated, hardened external system as well. It vastly ups the difficulty barrier.
Firewalls do a lot more than block ports and services.
Think of them as defence-in-depth that protects against accidental misconfiguration, software bugs, local exploits, etc.
SELinux
I didn't know that, learned something today. Thank you!
Again, as a non-technical person, why a piece of software needs access to the entire internet instead of whitelisted domains specific to its requirements is beyond me, since we already know how the UNIX permission system works. Is it so hard to extend that to networks, especially since everything is a file in UNIX? Kindly pardon my ignorance :-)
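For what it's worth, inside Kubernetes the closest built-in analogue is a NetworkPolicy restricting a workload's egress. Vanilla NetworkPolicy only understands pod selectors, namespaces, and CIDRs rather than domain names (FQDN rules need a CNI like Cilium), but the idea is the same. A rough sketch with made-up names:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: restrict-egress                    # hypothetical name
      namespace: my-app                        # hypothetical namespace
    spec:
      podSelector:
        matchLabels:
          app: billing                         # placeholder label
      policyTypes: ["Egress"]
      egress:
        - to:
            - ipBlock:
                cidr: 203.0.113.10/32          # the one upstream API this app may call
          ports:
            - protocol: TCP
              port: 443

In practice you would also need an egress rule allowing DNS to kube-dns, or the pod can't resolve anything at all.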
> - Use network separation to control the amount of damage a compromise can cause.
> - Use firewalls to limit unneeded network connectivity and encryption to protect confidentiality.
Are we still on this? Why isn't anyone pushing for zero trust? A concept made significantly easier to achieve thanks to container orchestration.
Some useful guidance here, although worth noting that some of it is a bit dated (k8s security can move quickly).
Most notably from a scan through, they're mentioning PodSecurityPolicy, but that's deprecated and scheduled to be removed in 1.25.
There will be an in-tree replacement but it won't work the same way. Out of tree open source options would be things like OPA, Kyverno, jsPolicy, k-rail or Kubewarden.
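As a flavour of what those look like, here's a rough Kyverno sketch (one rule, not a full PSP replacement) that blocks privileged containers:

    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: disallow-privileged                # hypothetical policy name
    spec:
      validationFailureAction: enforce         # block, rather than just audit
      rules:
        - name: privileged-containers
          match:
            resources:
              kinds:
                - Pod
          validate:
            message: "Privileged containers are not allowed."
            pattern:
              spec:
                containers:
                  - =(securityContext):
                      =(privileged): "false"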
We've actually already moved the official guidance from PSPs to OPA and that's what the primary DevSecOps reference implementation has used for about two months now.
"We" being the DoD, but our guidance is the NSA guidance. I'm not sure why it hasn't made it into the policy pdf, but the actual official IAC has been using OPA since April.
That's awesome. I know a lot of work is going into things like P1.
I scale some large K8s in fed (not DoD)... ATO is fun. Actually unsure how I'd position something like OPA (I actually envisioned them being key back in '17 when working in the Kessel Run realm... called them, and they hadn't been exposed to fed at the time).
Legit question / maybe dumb: where is the DoD at in general perimeter security? Outside looking in, and everything before a container runs - network and OS primarily, cloud envs as well. A lot of Fed needs help here before they can comprehend even basic Kubernetes authorization. It's also generally more important (at least from a controls perspective) in non-DoD environments than something like security context in pods.
P1 has been leading the pack here. Most of the guidance mentioned in this guide has been coming from the CSO's office [0] for a while. We're using OPA extensively for not just container level policies but blocking column/cell level access in queries. We have multiple roles [1] to help Kessel Run, Space CAMP, and other software factories with this.
[0] https://software.af.mil/dsop/documents/ [1] https://boards.greenhouse.io/raft
> Some useful guidance here, although worth noting that some of it is a bit dated.
Is there any digital security guidance from the feds to which that doesn't apply? :)
Everybody wants small gov, until they don't.
This is why I think big vs little government is really missing the forest for the trees in a lot of contexts (unless your overall goal is to minimize taxes and regulations at all costs). It's really a debate about the nature of bureaucracy. Process vs nimble. You can organize things to promote either, depending on your actual goals.
Unfortunately small government activists have recognized this and have enacted policies that promote incompetence as much as possible. "Good enough for government work" is a choice, not an inevitability.
In-tree replacement is coming in v1.22...as in, just a few weeks away. It uses admission controllers, just like OPA/Kyverno et al, hence the current guidance to use one of those.
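For reference, the in-tree replacement (Pod Security admission) is driven by namespace labels rather than a bound policy object; opting a namespace into a profile looks roughly like this:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a                                      # hypothetical namespace
      labels:
        pod-security.kubernetes.io/enforce: baseline
        pod-security.kubernetes.io/warn: restricted     # warn about anything that wouldn't pass "restricted"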
PaaS solutions can't cover everything that PSP was covering though.
Out of curiosity, which bits were you thinking of? OPA, Kyverno, et al. have policies which (AFAIK) hit all the bits of the Kubernetes PSS.
I used to study and focus on security a lot more and keep up with trends. After several interviews this year, I realize a lot of jobs prioritize leetcode over everything else. It's pretty annoying, and it makes me wonder: if the focus for tech workers is leetcode above all else, then no wonder so many companies have insecure apps and servers.
I applied for a job that wanted someone who has experience with SAML. I've actually written my own hobby IDP, and I can diagram the handshake off the top of my head. I've spent a lot of time learning how to write custom decorators to handle access restrictions. I failed my interview because they wanted me to leetcode some shit with 3d geometric volumes. I'm sorry but what does this have to do with SAML or security?
Wow that's dumb. I've done some reading on 3d computational geometry for hobbyist game engine reasons, and in my admittedly limited experience, very few of the algorithms involved are intuitive enough to be derivable in an interview setting.
This sucks -- it is a lose-lose situation. I've seen this kind of thing happen all too often.
You interview them as well. You give me dumb, unrelated coding questions - you are out.
If you can't reverse a doubly linked binary prefix tree in O(1) then how can you be trusted with security?! /s :(
Yes but in most circumstances, quick security is better than linear security; not sure about bubble security though.
Bubble security sounds like a good idea. You know, put everything into its own isolated little bubble.
Best I can do is Bogo security
Consider that the typical company is running servers/instances that haven't been updated or rebooted in 6 months to 3 years. Never mind the multiple-year-old software dependencies in their apps...
You are correct. Especially at big companies, programmers program and security is just some rules dropped on them from above.
You might be playing the long game. I think a CTO might benefit from knowing both app dev and security.
Thanks! Yeah, articles like this I would have studied in greater detail in the past, but this year I realize I need to improve my leetcode/algo times, so long term I'll keep focused on security and important topics. But in the meantime... time to zig-zag a binary tree :(
At my company the head of security is also the chief programmer. Not sure if that's a good thing but he's got 30 years experience and likes to tell war stories.
The elephant in the room here is that almost all containers, according to artifacthub.io etc., are a complete tire fire.
The DoD maintains its own registry of hardened container images, which they call the Iron Bank. I guess they can't issue guidelines to the general public saying you should use these, but the DoD has to use them. Which kind of sucks, because they may be hardened, but they also break all the time, because the people responsible for hardening them can't possibly understand all the myriad subtleties involved in building and deploying software packaged with dependencies in the same way the actual software vendors do. They make some serious rookie mistakes, like straight copying executables out of a Fedora image into a UBI image, which works perfectly fine when a brand-new UBI release happens and it's on the same glibc as Fedora, then immediately stops working and all your containers break when Fedora updates.
They may suck at building containers, but this also sounds like a release management issue. Both the producers and consumers of the release need a test suite to validate the new artifacts before they can make it into a pipeline to eventually deliver to a customer use case. (But also they should 100% not be copying random binaries)
For what it's worth I've seen worse from corporations. Bad hires lead to bad systems.
I work on Platform One, and we deploy new versions of these containers weekly and have never had them break in that way. In the beginning, when I was on the Kubernetes team, we struggled with the containers just not working at all, but they have gotten better.
Now I work on deployments, and we run every container from IB and have few issues. If you find problems, report the images and they will fix them pretty quickly.
There are good free/OSS container scanners; check out Trivy. No reason not to use one.
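If you want to keep the scan in-cluster, one low-effort option is a one-off Job wrapping the CLI; a sketch (the image tag and scan target are placeholders):

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: trivy-scan                         # hypothetical name
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trivy
              image: aquasec/trivy:latest
              # fail the Job if HIGH/CRITICAL findings exist
              args: ["image", "--severity", "HIGH,CRITICAL", "--exit-code", "1", "nginx:1.21"]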
A lot of this applies to containers in general. Not complaining - it's well written - but I wish they would break out the non-Kubernetes container stuff into general container-security advice for people.
This is a great point. And containers don't even really exist in the first place, so really there should be a family of docs (or at least one) about securing the various namespaces, cgroups, etc. in modern Linux releases, and a doc about how to secure them in combination with each other.
How do I know that this advice is useful and does not put me in danger?
Example: the NSA recommends using RSA encryption.
https://www.theverge.com/2013/12/20/5231006/nsa-paid-10-mill...
You don't use this guide as a bible, but take it into account and compare it with other common security advice in the field. If you get similar results, it's most likely a good list of advice.
This isn't for regular people; this is telling third parties what they need in order to try to sell something to the NSA.
so... a lot of this can be done with Fairwinds' OSS tool Polaris... https://github.com/FairwindsOps/polaris
feels good that we've been addressing this for a bit already tbh. (disclaimer, I work for fairwinds)
How did you (/they) come up with the name Polaris?
If I had to guess it's a nautical theme, following Kubes. Fair Winds (sailing), Polaris (North Star, used for navigation.)
basically this. yes.
What yields the lowest risk - spending a ton of time hardening one cluster, or building multiple clusters to reduce the blast radius of bugs and misconfigurations?
> What yields the lowest risk - spending a ton of time hardening one cluster, or building multiple clusters to reduce the blast radius of bugs and misconfigurations?
Not sure this is a valid dichotomy.
If you are spinning up multiple clusters, you are presumably doing so in an automated fashion. If so, then the effort of hardening is very similar. It doesn't really matter where you do it.
Multiple clusters may have a smaller blast radius, but will have a larger attack surface. Things may be shared between them (accounts? network tunnels? credentials to a shared service?) in which case an intrusion in one puts everyone else at risk.
> If so, then the effort of hardening is very similar. It doesn't really matter where you do it.
Nope. If the clusters are separate, it limits how damaging a compromise of the cluster is. This is why cloud providers don't stick you on the same k8s cluster as another tenant.
> Multiple clusters may have a smaller blast radius, but will have a larger attack surface. Things may be shared between them (accounts? network tunnels? credentials to a shared service?) in which case an intrusion in one puts everyone else at risk.
It's not really clear what you're trying to say here. If someone compromises credentials shared between all clusters, that's the same as compromising credentials used by one mega cluster.
> Nope. If the clusters are separate it limits how damaging a compromise of the cluster is.
But if the clusters are configured similarly, a flaw in one is likely present in the others. GP's point is that if you invest in hardening, you can easily apply it to multiple clusters.
> It's not really clear what you're trying to say here.
I assume they mean having more clusters present means there are more opportunities to be compromised (e.g. more credentials to leak, more API servers to target, possible version skew, etc.).
You can't skip "spending a ton of time hardening one cluster" anyways.
Having multiple clusters may help reduce the blast radius of _certain_ attacks, to some degree. However, managing multiple clusters is a lot more difficult than managing one, and you will potentially replicate bad practices and vulnerabilities to multiple places and increase the maintenance burden.
If I could go back, single cluster. Any benefits you get from going multi-cluster can be achieved by configuring a single cluster correctly.
The one benefit you get is protection from bugs in Kubernetes itself and a reduced blast radius. Even if you could produce a secure and H/A cluster, you still leave yourself open to Kubernetes bugs and configuration mistakes such as adding a network policy that blocks all communication across all namespaces.
Multiple clusters protects you from these types of configuration mistakes by reducing the blast radius and providing an additional landing zone to roll out changes over time.
And making it so that "many clusters" look exactly like "one cluster" is one of the goals the kcp prototype was exploring (although still early) because I hear this ALL the time:
1. 1 cluster was awesome
2. Many clusters means I rebuild the world
3. I wish there was a way to get the benefits of one cluster across multiples.
Which I believe is a solvable problem and partially what we've been poking at over at https://github.com/kcp-dev/kcp (although it's still so early that I don't want to get hopes up).
If you have 2 clusters, wouldn't you just blue/green them for rolling changes?
Except for security and fault isolation of course.
>a single cluster correctly
Can you elaborate?
At a high level, almost anything you would want to use multiple clusters for can be done on a single cluster, using e.g. node pools, affinity, and taints to ensure that workloads only run on the machines you want them to. As a simple example, you can set up a separate node pool for production, and use node affinity and/or taints to ensure that only production workloads can run there.
One exception, as others have mentioned, is blast radius - with a single cluster, a problem with Kubernetes itself could take down everything.
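A concrete (hypothetical) sketch of that pattern: taint the production nodes, then give only production workloads the matching toleration and node selector:

    # node side, e.g. via your node pool tooling:
    #   kubectl taint nodes prod-node-1 env=prod:NoSchedule
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: checkout                           # hypothetical production workload
    spec:
      replicas: 2
      selector:
        matchLabels: {app: checkout}
      template:
        metadata:
          labels: {app: checkout}
        spec:
          nodeSelector:
            env: prod                          # only land on nodes labelled env=prod
          tolerations:
            - key: "env"
              operator: "Equal"
              value: "prod"
              effect: "NoSchedule"
          containers:
            - name: checkout
              image: registry.example.com/checkout:1.0   # placeholder image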
At our very large org we do both. At least two clusters per region to isolate platform changes, all hardened to the same standards using automated tooling.