Based in the San Francisco Bay Area, Securesql is a blog by John Menerick. His insights dissect complex systems, offering a masterclass in cyber guardianship through expert analysis and cutting-edge protective strategies.

Intro to Cloud Native Multi-Tenant Security Models

Every year, I spend some time architecting and writing Terraform, Sentinel, CloudFormation, and Python security controls for cloud native workloads on the major IaaS providers, including Aliyun and Rackspace. This year, I decided to break apart my IaaS assumptions, especially with AWS and Docker Enterprise focusing heavily on their security capability maturity models. I paid particular attention to multi-tenancy workloads. At the 30,000 foot view, these are my pillars:

  • Trust

  • Segregation

  • Confidentiality & Privacy

  • Visibility & Manageability

  • Portability & Interoperability

  • Reliability & Resiliency

  • Identity

  • Compliance

The 24,000 foot view applies security models, patterns, and techniques for compliance-driven security, as observed by Hoff:

image Copy 3.png

Skydiving quickly to release the parachute at 1,000 feet, the picture looks like this for any environment, including hybrid and on-premise models with their corresponding layers of abstraction:

image Copy 4.png

With so many projects solving various aspects of cloud computing, the problem quickly evolves into the challenge of R&D'ing multi-tenancy services. The major challenge is building a frictionless and cost-effective security model around the various multi-tenancy patterns. For argument's sake, let's assume this is the typical flow for a multi-tenant R&D effort:

1.PNG
  • Design Time: At design time, each pattern treats the tenant as a logical entity. These patterns focus on tenant-aware application design, which allows high-level design decisions to implement, measure, and manipulate each tenant independently. Patterns included in this category are:

    • Tenant as a state

    • Tenant aware measurement

    • Tenant aware logging

  • Development Time: Patterns in this category work with components of the application to incorporate various tenant management aspects. These patterns require modifying the application components. The patterns in this category are:

    • At least one multi-tenant component

    • Multi-tenant database schema

    • Re-entrant and tenant-specific components

  • Runtime: Patterns that focus on the dynamic binding of components and connectors are included in this category. They tailor the behavior of the application to meet the specific requirements of tenants at runtime. These patterns are:

    • Dynamic resource allocation

    • Dynamic architecture
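
To make the design-time patterns concrete, here is a minimal sketch of "tenant as a state" combined with "tenant aware logging", using only the Python standard library. The names `current_tenant`, `TenantFilter`, and `handle_request`, and the tenant ids, are hypothetical and chosen for illustration. It assumes the tenant is known at the start of each request.

```python
import contextvars
import io
import logging

# Hypothetical name: holds the tenant for the current request or task.
current_tenant = contextvars.ContextVar("current_tenant", default="unknown")


class TenantFilter(logging.Filter):
    """Stamp every log record with the tenant active in this context."""

    def filter(self, record):
        record.tenant = current_tenant.get()
        return True


def build_logger(stream):
    """Create a logger whose output lines always carry the tenant id."""
    logger = logging.getLogger("tenant_demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("tenant=%(tenant)s %(message)s"))
    handler.addFilter(TenantFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, tenant_id, action):
    """Bind the tenant for the duration of one request, then restore it."""
    token = current_tenant.set(tenant_id)
    try:
        logger.info(action)
    finally:
        current_tenant.reset(token)


buffer = io.StringIO()
log = build_logger(buffer)
handle_request(log, "acme", "invoice created")
handle_request(log, "globex", "user deleted")
log_lines = buffer.getvalue().splitlines()
```

Because the tenant travels in a context variable rather than a function argument, every log line is attributable to a tenant without each call site having to remember to pass it.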

An interesting challenge in collapsing the data security model to the workload is handling the workload's multi-tenancy trade-offs. Engineers struggle to choose an adequate architectural style for multi-tenant software systems. Bad choices result in poor performance, low scalability, limited flexibility, and weak security assurances, and they obstruct software evolution. The chart below gives an overview of the consequences of the different multi-tenancy patterns, so one can assess the weight of each consequence for a specific situation. Based on the consequences and weights, one may select a subset of patterns to evaluate in more depth.

In the pattern names, A is the application server and D is the datastore (typically a relational database or graph database). The second letter of each pair refers to how isolated or shared the server, datastore, and/or infrastructure is. For the datastore, that ranges from a dedicated database instance, to a shared database server, to one database with multiple schemas. For instance, AI DC is a shared application server and instance handling all tenant data flows, connected to a single database server with tenants separated by the database's schema. Concretely, that is one Node.js instance connected to a single PostgreSQL instance running a single database called Users, with all of the customers separated by schema. More information and the diagrams below come from https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.398.8253&rep=rep1&type=pdf

AI, DC

2.PNG
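
As a rough illustration of the AI DC example above (one shared application instance, one PostgreSQL database, one schema per tenant), the sketch below shows how a shared instance might scope each transaction to a tenant's schema. The `tenant_` schema prefix, the slug format for tenant ids, and both function names are assumptions. `search_path` only controls name resolution, so a real deployment would also need per-tenant roles and privileges.

```python
import re

# Assumption: tenant identifiers are short lowercase slugs.
_TENANT_RE = re.compile(r"[a-z][a-z0-9_]{0,30}")


def tenant_schema(tenant_id):
    """Map a tenant id to its schema name, rejecting anything unsafe."""
    if not _TENANT_RE.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{tenant_id}"


def scoped_statements(tenant_id, query):
    """Return the statements a shared app instance would run for one tenant.

    SET LOCAL limits the search_path change to the current transaction, so
    the next tenant served on the same pooled connection does not inherit it.
    """
    schema = tenant_schema(tenant_id)
    return [
        "BEGIN",
        f'SET LOCAL search_path TO "{schema}"',
        query,
        "COMMIT",
    ]


statements = scoped_statements("acme", "SELECT id, email FROM users")
```

Validating the tenant id before it reaches the SQL string matters here because the schema name is an identifier and cannot be passed as a bound query parameter.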

Evaluating the 12 distinct patterns against 17 different criteria leads to the following chart:

4.PNG

A value of 1 indicates a negative effect. A score of 5 implies a positive effect. The last column shows the σ² value, indicating how much the criterion is affected by the choice of a specific multi-tenant pattern. In the data security domain, we place significant value on those patterns which optimize for availability, integrity, confidentiality, authenticity, and software complexity. A typical pattern prioritizing those weights may look like this:

23.PNG

with these weights

image Copy 2.png

Merging

432.PNG

with

image.png

results in a landscape like

image Copy 1.png
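
The merge of scores and weights can be sketched in a few lines of Python. Everything in this block is made up for illustration: the pattern labels `P1` to `P3`, the three criteria, the scores, and the weights are hypothetical and are not taken from the charts above. It also assumes the σ² column is a population variance and that weights are combined as a weighted average, neither of which is confirmed here.

```python
from statistics import pvariance

# Hypothetical scores per criterion (1 = negative effect, 5 = positive effect).
scores = {
    "confidentiality": {"P1": 5, "P2": 3, "P3": 1},
    "availability": {"P1": 4, "P2": 4, "P3": 4},
    "cost": {"P1": 1, "P2": 3, "P3": 5},
}

# Hypothetical weights expressing a security-first priority.
weights = {"confidentiality": 3, "availability": 2, "cost": 1}


def criterion_variance(scores):
    """Variance per criterion: how much the pattern choice moves that criterion."""
    return {c: pvariance(list(by_pattern.values())) for c, by_pattern in scores.items()}


def weighted_totals(scores, weights):
    """Weighted average score per pattern across all criteria."""
    total_weight = sum(weights.values())
    patterns = next(iter(scores.values())).keys()
    return {
        p: sum(weights[c] * scores[c][p] for c in scores) / total_weight
        for p in patterns
    }


variance = criterion_variance(scores)
totals = weighted_totals(scores, weights)
best = max(totals, key=totals.get)
```

A criterion with zero variance (availability in this made-up data) cannot distinguish between patterns, so its weight has no influence on which pattern ranks first.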

Utilizing a well-accepted orchestration service, service mesh, and container approach allows one to truly collapse the data security model around the code itself. That is a challenge when the underlying hardware and operating systems operate in a walled garden security model involving enclaves, kernel, userland, and similar patterns akin to:

234.PNG

underneath the *aaS abstractions

423.PNG

Sadly, Asbestos never took off for running the underlying workloads of the current container services and technologies. It provides novel labeling and isolation mechanisms to contain, via policy, inter-process communication and system-wide information flows, such that formal methods prove that it isn't possible for a process to conduct an unauthorized process call:

image Copy 5.png

which may look akin to

image Copy 6.png
image Copy 7.png

The end result being "Asbestos should support efficient, unprivileged, and large-scale server applications whose application-defined users are isolated from one another by the operating system, according to application policy"

http://www.scs.stanford.edu/~dm/home/papers/efstathopoulos:asbestos.pdf
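
The general idea of label-based information flow control can be sketched as below. This is a deliberately simplified model that uses plain sets of compartments, and it is not Asbestos's actual label semantics, which are richer. The `Process` class, the `send` function, and the tenant names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    # Compartments whose data this process has already observed.
    taint: set = field(default_factory=set)
    # Compartments this process is allowed to observe.
    clearance: set = field(default_factory=set)


def send(sender, receiver):
    """Deliver a message only if the receiver is cleared for all sender taint.

    On delivery the receiver becomes tainted with everything the sender had
    seen, so the restriction follows the data through later sends.
    """
    if not sender.taint <= receiver.clearance:
        return False
    receiver.taint |= sender.taint
    return True


worker_a = Process("worker_a", taint={"tenant_a"}, clearance={"tenant_a"})
worker_b = Process("worker_b", taint={"tenant_b"}, clearance={"tenant_b"})
proxy = Process("proxy", clearance={"tenant_a", "tenant_b"})

direct_leak = send(worker_a, worker_b)    # blocked: worker_b lacks tenant_a clearance
to_proxy = send(worker_a, proxy)          # allowed: proxy is cleared for tenant_a
indirect_leak = send(proxy, worker_b)     # blocked: proxy now carries tenant_a taint
```

The last call is the interesting one: the proxy is trusted with both tenants' data, yet once it has seen tenant A's data it can no longer forward anything to tenant B's worker.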

As a result, we are left with our current landscape and implementations. (As a side note, this is troubling, as more specialized attention and publication is focused on CPU microcode vulnerabilities with popular data center vendors.) What one would hope for when collapsing the security model to the workload, akin to "A new event process abstraction provides lightweight, isolated contexts within a single process, allowing the same process to act on behalf of multiple users while preventing it from leaking any single user's data to any other user", isn't possible with the existing landscape.

For instance, let's take a stab at Kubernetes' multi-tenancy aspects. Focusing less on the container security controls, one would expect to see seccomp, AppArmor, kernel namespaces, cgroups, capabilities, and an unprivileged OS server. These are a few of the controls needed to enable HARD multi-tenancy in Kubernetes with defense in depth. The existing proposals are lacking. Fail-safe defaults, complete mediation, educated deputy, least privilege, and least common mechanism are extremely hard to apply to the Kubernetes API. By default, Kubernetes shares everything, along with many, many different broken drivers and plugins. The simplest method is moving every workload out of the DEFAULT namespace, akin to:

2234.PNG

to a stronger defense in depth isolation akin to:

4123.PNG
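
As one small, hedged illustration of moving workloads out of the default namespace, the sketch below generates a per-tenant Namespace and a default-deny NetworkPolicy as JSON that `kubectl apply -f` can consume. The `tenant-` name prefix, the `tenant` label key, and the helper names are assumptions. A NetworkPolicy only takes effect when the cluster's network plugin enforces it, and this is only one layer of the isolation discussed above.

```python
import json


def tenant_namespace(tenant):
    """Namespace manifest dedicated to a single tenant."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": f"tenant-{tenant}", "labels": {"tenant": tenant}},
    }


def default_deny_policy(tenant):
    """NetworkPolicy that selects every pod and allows no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": f"tenant-{tenant}"},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def lands_in_default_namespace(workload):
    """For namespaced workload manifests: an unset namespace means 'default'."""
    return workload.get("metadata", {}).get("namespace", "default") == "default"


bundle = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [tenant_namespace("acme"), default_deny_policy("acme")],
}
manifests_json = json.dumps(bundle, indent=2)

unscoped_pod = {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "web"}}
scoped_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web", "namespace": "tenant-acme"},
}
```

A check like `lands_in_default_namespace` could run in CI to reject manifests before they reach the cluster. Starting from deny-all and then adding narrow allow rules per tenant follows the fail-safe defaults principle mentioned above.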

While we could spend hours debating the various multi-tenancy aspects of Kubernetes and the multi-tenancy criteria weights chart above, there exists a working group that thinks through and implements solutions to these challenges. Please join https://docs.google.com/document/d/1fj3yzmeU2eU8ZNBCUJG97dk_wC7228-e_MmdcmTNrZY/edit# if you wish to learn more and join us for debates. As a result of the trade-offs we make in Kubernetes and the Cloud Native Computing Foundation, it takes extremely specialized engineers and architects to work around them and collapse the data security model around the workload, such that one never has to trust the underlying container, cluster, and *aaS. The next post will detail how to compensate for those trade-offs and the various technical controls one may apply.
