During my gym session, I ran across a hard problem: for SSH access, how could one replicate Cloudflare Access and avoid requiring a VPN to SSH into production assets?
What is Cloudflare Access?
Looking at Cloudflare Access, all they did was copy Bless, turn it into a service, and make Cloudflare’s edge a jumphost. It is still early days, but it works really well and is coming along.
What is Bless?
“…SSH Certificates are an excellent way to authorize users to access a particular SSH host, as they can be restricted for a single use case, and can be short lived. Instead of managing the authorized_keys of a host, or controlling who has access to SSH Private Keys, hosts just need to be configured to trust an SSH CA.
BLESS is an SSH Certificate Authority that runs as an AWS Lambda function and is used to sign SSH public keys….”
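To make the CA mechanics concrete, here is a minimal sketch using plain ssh-keygen, the same primitive Bless automates inside a Lambda. The key names, identity, and principal (“alice”) are made up for illustration:

```shell
#!/bin/sh
set -e
# Work in a throwaway directory
dir=$(mktemp -d)
cd "$dir"

# 1. Create an SSH CA keypair (Bless keeps the CA key inside the Lambda)
ssh-keygen -t ed25519 -f ssh_ca -N "" -C "demo-ssh-ca" >/dev/null

# 2. Create a user keypair (normally the operator already has one)
ssh-keygen -t ed25519 -f id_user -N "" -C "alice" >/dev/null

# 3. Sign the user's public key: identity "alice", principal "alice",
#    valid for only 5 minutes -- this is the short-lived certificate
ssh-keygen -s ssh_ca -I alice -n alice -V +5m id_user.pub

# 4. Inspect the resulting certificate (key ID, principals, validity)
ssh-keygen -L -f id_user-cert.pub
```

Hosts then only need to trust ssh_ca.pub (via sshd’s TrustedUserCAKeys) rather than carrying per-user authorized_keys entries.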
How would this work?
Traditionally, a well-architected AWS account has users (and their associated credentials) tied to MFA. One could move to aws-okta (or okta-awscli, which obtains role credentials via an Okta AWS assume-role flow). To make Bless work, the AWS role(s) carry a policy that allows operators to use a KMS key, which in turn has a policy allowing Encrypt based on their user (an IAM policy variable). The role also has access to the Lambda that generates the SSH cert. The Bless client generates a kmsauth token by calling Encrypt on the KMS key. It then invokes the Lambda, passing in the token and the username and asking for a cert. The Lambda verifies the token, maps it to a user, and uses that info to generate a cert specific to that user.
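The “encrypt based on their user” piece might look something like the policy statement below. This is a hedged sketch, not Bless’s shipped configuration: the account ID, role name, and context value are invented, and the encryption-context keys (from/to/user_type) follow the kmsauth convention. The idea is that a caller can only mint a token claiming to be themselves, because the `from` context must match the IAM username variable:

```json
{
  "Sid": "AllowKmsauthEncryptAsSelf",
  "Effect": "Allow",
  "Action": "kms:Encrypt",
  "Resource": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
  "Condition": {
    "StringEquals": {
      "kms:EncryptionContext:from": "${aws:username}",
      "kms:EncryptionContext:to": "bless-production",
      "kms:EncryptionContext:user_type": "user"
    }
  }
}
```

The Lambda then decrypts the token and trusts the `from` field in the encryption context, since KMS enforced that only that user could have produced it.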
It’s a little fuzzy, from just looking at Bless, how RBAC comes into play. Does the user cert include group info? Or do you have to add the users named in the certs to groups on each machine as part of the CA trust process?
It helps if all nodes have info on all users and limit access via groups.
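OpenSSH itself has hooks for this part: the host trusts the CA globally, and a per-account principals file decides which certificate principals may log in as which local account. A sketch, with paths chosen for illustration:

```
# /etc/ssh/sshd_config (excerpt)
# Trust any user certificate signed by this CA
TrustedUserCAKeys /etc/ssh/bless_ca.pub
# For a login as account %u, allow only the principals listed in this file
AuthorizedPrincipalsFile /etc/ssh/principals/%u
```

So /etc/ssh/principals/deploy would contain one cert principal per line (e.g. alice, bob), and only certs carrying those principals can log in as deploy. RBAC then reduces to managing those small principals files per host role.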
That seems… complex. Like every time you get a new engineer who needs SSH access to production assets, there’s a Puppet/Salt/Ansible/kill-and-fill run that updates every host with the new user info?
Pre-generate nsscache files, deliver those cache files to all hosts, and point /etc/passwd.cache, /etc/shadow.cache (and the others) at the downloaded files. I used to install users via Salt, but after ~500 users it started becoming unmanageable. The .cache versions of the files are in the exact same format as the non-.cache versions: you’re really just shipping around passwd, shadow, and group files, but in a way that won’t trash your systems if one of them is buggered. Nsscache populates the data for libnss-cache, and since you are distributing the data yourself, one doesn’t need nsscache on every system.
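Because the .cache files share the passwd(5) format, sanity-checking them before shipping is easy with standard tools. A small sketch, using a made-up entry and file path:

```shell
#!/bin/sh
set -e
# A made-up one-line passwd-format cache file
cat > /tmp/passwd.cache.demo <<'EOF'
alice:x:1001:1001:Alice Example:/home/alice:/bin/bash
EOF

# Same colon-separated 7-field format as /etc/passwd, so awk just works:
# fail on malformed lines, then print name, uid, and shell
awk -F: 'NF != 7 { print "bad line: " $0; exit 1 }
         { print $1, $3, $7 }' /tmp/passwd.cache.demo
```

On the hosts, libnss-cache is then enabled in /etc/nsswitch.conf (e.g. `passwd: files cache`) so lookups fall through to the shipped cache files.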
Where may I find additional information about Lyft’s Bless fork?
Additional information may be found at https://eng.lyft.com/blessing-your-ssh-at-lyft-a1b38f81629d . That post covers Lyft’s first implementation and the release of their version of the Bless stack.
https://github.com/lyft/bless (another fork)
https://github.com/uber/pam-ussh (don’t try it for anything other than sudo. Thar be dragons.)