HIPAA Cloud thoughts with an architect

Today, while my red team automation was deceptively infiltrating the architect’s pet project, the architect asked me what one should do to keep the red team automation out of his cloud-enabled, PHI-tainted pet project, and specifically how to obtain and maintain a simple baseline.  We started to brainstorm what a purple teamer could efficiently and effectively execute.  Before you know it, we had a list.  Why?  Everyone loves lists of random thoughts.  Isn’t that why Word Clouds are popular?  The format below is not meant to be cohesive or comprehensive.  Just random thoughts translated into bytes on the Internet.  There is a difference between providing legal / security advice and providing business advice.  Specifically, if you take these off-the-cuff thoughts and I learn later that I am being held responsible because someone relied on that advice to their detriment – no bueno.

 

No single failure points and defense in depth
Goals – not putting all eggs in one basket.  Instead, execute for redundancy and consensus.  Did you ever bring in cookies for coworkers only to have a random BART rider knock the tin out of your hand, causing the cookies to escape to the far corners of the station?  Well, try not to put all of the cookies into one tin.  Minimize the blast radius.  Consensus is covered below.

  
In practice
AWS Admin account – one person has the 2FA token.  The other has the password.  One can go further but this is a great start for pet projects.

 

Lock down Production Access
Add MFA to SSH.  A FIDO U2F key is a start.  Arguably, require all SSH sessions to be XP / “pair programmed” via key separation.

 
Special laptops for PROD access
Set aside some special machines in the office that are in a locked room.  Only used for SSH access.  Chromebooks work well.  Throw a dropcam / Nest camera in said room to record enter / exit.  Wipe machines on a regular basis.  Disable Intel management engine if possible.
 

Heavily audit SSH access
Not only set up the above bastion hosts, but restrict who has access and let the stakeholders know via group communications whenever they are accessed.  Nothing quite like having an executive ask “why were you logged in at 4am last night?  The notification woke up my wife….”  Wake people up when certain commands are issued.  Reliably log every action and keystroke which goes through the bastion host.  OnionID goes a long way.  Storage of logs is just as important, but that is a different hairball solved by log management / SIEM efforts.  DR needs to guarantee storage of every action for >10 years.  Immutable logging is a must.
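One way to make tampering with a bastion audit trail detectable is to chain each record to the hash of the one before it.  The sketch below is only an illustration of that idea, not a replacement for a real log pipeline; the record fields (user, command) and the GENESIS placeholder are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def _digest(body):
    # Hash a canonical JSON encoding so the same record always hashes the same.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log, user, command):
    """Append an audit record whose hash also covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"user": user, "command": command, "prev": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Return False if any record was edited, reordered, or cut from the middle."""
    prev_hash = GENESIS
    for entry in log:
        body = {"user": entry["user"], "command": entry["command"], "prev": entry["prev"]}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

Note that a chain like this does not notice records chopped off the end; the latest hash would have to be anchored somewhere the attacker cannot reach (write-once storage, a second account, the group chat).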
 

Special rules for who gets production access 

Create a culture where production access is taken seriously.  Obvious controls like background checks.  The real political challenge – stay safe and bond the employees.  $$.  Or make copies of their local and intl. identifications, fingerprint stamps, and anything else needed to issue an intl. arrest warrant in the Philippines. 

 
Cold Storage of secrets / break glass
Stage 1 – on-premise safe for 24/7 availability.  Separate the passcode and key between parties.   Get a bank deposit box.  The end result being some MFA devices, USB drives, and paper backups in the deposit box(es) and / or safe.

A mature stage – Keys are generated offline in a secure environment, then split via Shamir’s secret sharing.  This enables redundancy and requires a quorum to restore a key.  Key holders are distributed and follow protocol during key signing ceremonies to verify their identity and assure the integrity of the keys / ceremony.  HashiCorp Vault works well.  Once again, the dropcam / Nest cameras are a great detective control.
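For anyone who has not seen Shamir’s secret sharing up close, here is a toy sketch of the split / restore math: the secret is the constant term of a random polynomial over a prime field, each key holder gets one point on it, and any quorum of points recovers the constant term by Lagrange interpolation.  The prime, the function names, and the integer encoding of the secret are assumptions for the example.  Do not use this for real keys; use audited tooling.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split_secret(secret, threshold, shares):
    """Split an integer secret into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for the chosen prime")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 from a quorum of points."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With split_secret(key, 3, 5), five holders each get a share and any three of them can restore the key, which is the redundancy-plus-quorum property described above.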
 

SIEMs / Log Management
Logging everything which happens across infrastructure.  Storage is cheap.  
Amazing audit trails required.
Deterrent and detective controls.
Need to design for low-latency and high variety logs.  
Probably Kinesis, CloudTrail, VPC Flow Logs, etc…  Too many log sources to mention here.  Expect a large variety of sources, event formats, and event rates at the best of times.  I have never heard anyone say "Gee, I wish we had less logging."  Correction: except when running dtrace or light system tracing on a Production system.

CONTEXTUAL ALERTING!

 

Anomaly Detection
Warnings
Push warning to group communications.  Typical events are brute-forcing and hitting rate-limiting.

Errors
Trigger paging to wake someone up even if it is the middle of the night.  These require immediate attention.  An unusual movement of customer data is a typical event.

Critical Issues
Key phrase: contextual alerting with adaptive responses.  These trigger kill switches that gracefully shut down the relevant systems / services.  These kill switches require their own special ceremony (think manufacturing maintenance ISO9600 efforts to place physical locks on broken machinery) to re-enable.   Typical events include unauthorized access to certain machines / services.
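The three tiers above boil down to a routing table from event type to response.  A minimal sketch follows, assuming made-up event names and action labels; the choice to treat unknown events as Errors (page a human) is my assumption, not a rule from the list.

```python
from enum import Enum


class Severity(Enum):
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


# Hypothetical event names mapped to the tiers described above.
EVENT_SEVERITY = {
    "brute_force": Severity.WARNING,
    "rate_limit_hit": Severity.WARNING,
    "unusual_customer_data_movement": Severity.ERROR,
    "unauthorized_access": Severity.CRITICAL,
}

# Hypothetical action labels; a real system would call chat, paging, and shutdown hooks.
RESPONSE = {
    Severity.WARNING: "notify_group_chat",
    Severity.ERROR: "page_on_call",
    Severity.CRITICAL: "trigger_kill_switch",
}


def respond(event_type):
    """Pick the response for an event; unknown events page a human (assumption)."""
    severity = EVENT_SEVERITY.get(event_type, Severity.ERROR)
    return RESPONSE[severity]
```

The contextual part would live in how EVENT_SEVERITY is computed (who, from where, at what hour), which is deliberately left out of this sketch.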
 
Deployments
Consensus-based deploys

ANYONE MAY PROPOSE A CHANGE.  CONSENSUS IS REQUIRED BEFORE THE CHANGE IS APPLIED

If one is not adopting this philosophy, most likely the architects are stuck in 90s / early 00s design patterns (The M&M candy security model) or antiquated technologies are in use.  Zero Proof / Trust security models work great and are easy to implement.

Typical workflow – every pull request requires approvals (however that is achieved, +1 via GitHub?).  Each branch or repo may have its own sensitivity value (more consensus vs. less based upon the risk).
If someone in development is infected with malware, increase the consensus numbers across the board.  This protects everyone while allowing rational growth.     
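As a rough sketch of that workflow: the number of required approvals depends on the sensitivity of the repo or branch, and goes up by one across the board while an incident (say, a malware infection in development) is active.  The sensitivity labels, the base numbers, and the +1 bump are all assumptions for illustration.

```python
# Hypothetical baseline approvals per sensitivity level.
BASE_APPROVALS = {"low": 1, "medium": 2, "high": 3}


def required_approvals(sensitivity, incident_active=False):
    """More sensitive repos need more consensus; an active incident adds one more."""
    required = BASE_APPROVALS[sensitivity]
    if incident_active:
        required += 1
    return required


def may_merge(author, approvers, sensitivity, incident_active=False):
    """A change may land only with enough distinct approvers other than the author."""
    distinct = set(approvers) - {author}
    return len(distinct) >= required_approvals(sensitivity, incident_active)
```

Excluding the author and de-duplicating approvers keeps one compromised account from supplying its own consensus.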

Apply the same rationale to deployments.   Immutable Docker instances.  Just need a Dockerfile, Docker Compose file, and envars? to deploy.  Try to avoid Vagrant.  If Vagrant is required, zero out the empty space in the partitions.  Kubernetes vs. Docker vs. CoreOS vs. <insert technology stolen from VMWare ThinApp> - find what works best for your skillset and future growth.  
 

Cloud

Benchmarks go a long way.  See github.com/cloudsriseup for simple scripts to execute to close out the Big 4 (AWS/GCP/Azure/Alibaba) cloud providers’ gaps.  These scripts cover the benchmark / STIG / “standards” gaps – minus the “Hardware 2FA for AWS Root account” and similar items.  

Make the account(s) / services HIPAA / HITRUST adherent – another story for another time.  SafeNet / Gemalto HSM on-premise backup to AWS’s Gemalto HSM is a pain, but you will need to test it periodically to ensure happiness, unlike Isaac Hall (https://techcrunch.com/2012/09/05/recurly-failure/).

Strive to obtain HIPAA / HITRUST expert determination.  Close gaps identified by expert determination.  Keep determination in Legal share and court-approved escrow storage.

 

Red Team drills

If the red team wins, everyone loses.  Gamify while closing out executives’ risk concerns.

You know what you designed.  Red Team tells you what was implemented.

 

Bounty program

Fresh set of eyes. 

The benefits of 1000 one trick ponies instead of 3 one trick ponies are beyond reproach.

 

3rd party greybox tests

Afford the best vendor for the appropriate service.  For instance, I wouldn’t let Spider Labs near my systems.  But I would let Bishop Fox take on my mobile applications.

 

PII / PHI handling vendors

This will become the bane of your professional existence.  AWS will “accelerate maintenance schedules with 2 hour notifications in their commitment to your security.”  In other words, we are busy patching our EC2 infrastructure and our high availability is lacking.  Our lack of planning is now your emergency.

The best S3 bucket data breaches were not with the data owners, but with the third party data custodians who allowed ANY AWS IAM identity to manipulate the bucket or flat out allowed anonymous access.  That defeats the purpose of the goals above.
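A quick sanity check one can ask of a custodian (or run on one’s own buckets) is whether the bucket policy contains an unconditional Allow for a wildcard principal.  The sketch below only parses a policy document passed in as a JSON string; it does not call AWS and does not look at bucket ACLs or Block Public Access settings.  The function name is made up for the example.

```python
import json


def _as_list(value):
    # Policy elements may be a single value or a list of values.
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy_json):
    """Return Allow statements with Principal "*" or {"AWS": "*"} and no Condition."""
    policy = json.loads(policy_json)
    risky = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if principal == "*":
            is_wildcard = True
        elif isinstance(principal, dict):
            is_wildcard = "*" in _as_list(principal.get("AWS", []))
        else:
            is_wildcard = False
        # A Condition may still narrow access, so only flag unconditional grants.
        if is_wildcard and "Condition" not in stmt:
            risky.append(stmt)
    return risky
```

An empty result is not proof of safety; it only means this one particular foot-gun is absent from the policy document.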

If the vendor is handling your customers’ PHI, get a sub BAA, not a BA contract from said vendor.  Very different in scope and legal liabilities.

 

Incident Response

It is a lifestyle, not a process.  Practice makes permanent.  Rehearse or perform resilience testing.  Nothing works quite like pulling random cables out of devices.

Everything changes if one has to handle clean vs. white vs. ISO 1 rooms.  See how long it takes for one to introduce a new electronic device into an ISO 1 room.

 

New engineer / ops onboarding
Enough said.  Pour the koolaid on thick and employ simple, actionable opportunities to change behaviors. Nothing quite like saying welcome by escalating phishing attacks from the CEO while HR is socially engineering the new hire.