Well-Architected: Security

Overview

The Security and Compliance pillar describes the architectural thinking needed to design, build and run an application or workload on hybrid cloud. The primary objective is to build a system that protects itself against threats that could cause a loss of confidentiality, integrity or availability.

Security domains and capabilities

The security controls of a system are delivered through five primary security domains:

Application security
Data security
Identity and access management
Infrastructure and endpoint security
Detect and respond

IBM developed the security blueprint by mapping both IBM and industry enterprise architecture models. Decomposing the five security domains produced five capability groups for each domain. This helps ensure that the domains and capabilities are comprehensive enough to cover all security control requirements.

Architectural guidance

Effective security and compliance in a hybrid cloud environment requires architectural guidance through:

Principles that guide the overall architecture.
Practices that guide the overall architecture development and delivery process.

Resources providing further guidance for the development of a secure and compliant architecture.
Next Steps in your journey to deploy a secure and compliant solution

The following sections elaborate the principles, practices and anti-patterns for architecting effective security and compliance.

Security domains and capabilities

The Table below shows the decomposition of the five security domains into five capability groupings together with other supporting security domains.

The Governance, Risk and Compliance domain governs delivery of the primary security domains. The Supporting Capabilities domain supports the primary security domains enabling effective delivery of security.

Physical security and personnel security are normally delivered outside the remit of technology security teams but are essential to the effective operation of the primary security domains.

Security domains - Security capabilities

Governance, risk, and compliance - Strategy, architecture, and governance; Security policy and processes; Risk and compliance; Audit and regulatory; Security awareness and education
Application security - Secure development lifecycle; Threat modeling and requirements management; Application runtime security; Application security testing; Application defect and risk management
Data security - Data lifecycle management; Data loss prevention; Data access, integrity, and monitoring; Encryption; Key and certificate lifecycle management
Identity and access management - Identity lifecycle management; Identity governance; Access and role management; Privileged identity and access management; Secrets management
Infrastructure and endpoint security - Platform protection; Endpoint protection; Edge protection; Core network protection; Multi-environment security management
Detect and respond - Vulnerability lifecycle management; Security testing; Threat detection; Threat investigation and response; Threat intelligence and hunting
Supporting capabilities - Incident, problem, and change management; Asset and configuration management; Performance, capacity and service management; Business continuity and resilience; Architecture and operations management

Principles

An application that processes sensitive business data needs to be well-designed for security and compliance to offer adequate protection. Experience from the design, build and run of hybrid cloud workloads has shown that a number of guiding principles support effective delivery of a secure and compliant system.

In the past few years zero trust has been a big influence on guiding principles. Traditionally, once inside the network perimeter, trusted users and software can traverse the network, and access everything within the network. Zero trust suggests that security controls must not rely on implicit trust. A system shouldn't trust a user or entity based solely on its location (for example, inside the corporate network), the device used, or any other singular attribute. From this, three security principles are suggested:

Least Privilege
Continuous Verification
Assume Breach

You can find further information in the IBM Zero Trust Field Guide.

Together with zero trust, we defined a set of security principles based on other external guiding principles, such as those in the OWASP Developer Guide, and on the experiences of cloud security architects within IBM.

We recommend the adoption of the following security architecture guiding principles for security and compliance:

Adding security to a solution late in the design and development lifecycle often results in costly re-work and usability sacrifices that detract from the solution's capability and user experience. Designing solutions to be secure at the outset helps to ensure an appropriate balance of usability, interoperability, resilience and security for the final solution.

Secure by design is the principle that the design, development and delivery practices of a project must include security and compliance practices delivered by a skilled and experienced team. Security design principles, such as those that follow, guide the architectural thinking practices.

The design process needs to start with the use of enterprise design thinking to focus on the required user outcomes for risk, compliance and security stakeholders, both internal and external to an organization. The external stakeholders include customers, governments, and regulators. The internal stakeholders include those managing risk, compliance, and security.

The design process continues with architectural thinking to define the architecture characteristics, architectural decisions, functional architecture and cloud deployment model. Definition of characteristics, such as resilience, performance and scalability, is then completed for the security services.

After definition of the requirements and architecture, engineering of the security functionality and infrastructure can take place including following the Secure by default principle.

IBM has a long-standing approach to security and privacy by design used in the development of products. The published IBM Redpaper on Security in Development: The IBM Secure Engineering Framework is useful to review.

Where users or software within a system have excessive privileges, there is a risk of misuse, either accidental or deliberate. Threat actors may compromise the privileged accounts of internal users and use the rights to traverse the network and systems.

Least privilege is the principle that states that the system should give users or software the minimum capabilities needed to carry out their intended task. The system should implement this from the highest level in an application down to an individual connection between two software components.

To implement the principle of least privilege, you must understand the assets in your dynamic IT environment. Privileged access isn't just a people problem; you need to know what applications have access to what data. The process of discovery, classification, and risk assessment is continuous. Bring together risk data from your digital assets to uncover new business level risk insights to help you establish the right policies.
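As a minimal sketch of least privilege, assume a hypothetical in-memory grant table. The principal names and permission strings below are invented for illustration and don't come from any IBM product. Access is denied unless a grant exists for the exact action requested:

```python
# Hypothetical grant table: each principal holds only the permissions its task needs.
GRANTS = {
    "billing-service": {"invoices:read", "invoices:write"},
    "report-job": {"invoices:read"},
}


def is_allowed(principal: str, permission: str) -> bool:
    """Deny by default: allow only if this exact permission was granted."""
    return permission in GRANTS.get(principal, set())
```

Here the reporting job can read invoices but can't write them, and an unknown principal can do nothing. The same deny-by-default idea applies equally to software components and to people.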

A user or software may have a secure context at the point of initial identification and authentication for a session, but a threat actor may compromise the active session and use it to further compromise a system.

Continuous verification is the principle that a user or software must be constantly verified to evaluate whether they still have a secure context. The intent of continuous verification is to find security issues and vulnerabilities as early as possible, ideally before threat actors do.

Continuous verification confirms that entities are who, or what, they claim to be by using challenges such as a second-factor authentication request. A workload may require updating to support continuous verification of users and software through solutions that use a Zero Trust Network Architecture (ZTNA). Organizations focused on delivering best-in-class user experiences hesitate to disrupt the user with unnecessary requests; however, this level of identity assurance is critical for zero trust.
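One way to balance assurance against user disruption is to challenge only when the session context changes or becomes stale. The sketch below is illustrative: the `SessionContext` fields and both time limits are assumptions, not values from any standard or product.

```python
from dataclasses import dataclass


@dataclass
class SessionContext:
    seconds_since_verified: int
    device_id: str
    verified_device_id: str
    sensitive_action: bool


# Assumed policy values, for illustration only.
MAX_AGE_SECONDS = 900
SENSITIVE_MAX_AGE_SECONDS = 60


def needs_step_up(ctx: SessionContext) -> bool:
    """Return True when the session should be re-verified, for example with a second factor."""
    if ctx.device_id != ctx.verified_device_id:
        return True  # the context changed since the last verification
    if ctx.seconds_since_verified > MAX_AGE_SECONDS:
        return True  # the last verification is stale
    # Sensitive actions require a much more recent verification.
    return ctx.sensitive_action and ctx.seconds_since_verified > SENSITIVE_MAX_AGE_SECONDS
```

A real deployment would typically evaluate many more signals, such as location and device posture, but the shape of the decision is the same.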

A threat actor may already have breached an organization. If an organization only looks for threats at the external boundary and not for threats that have used a valid mechanism to bypass the boundary control, it may not discover a breach for a long time.

Assume breach is the principle that an organization assumes a threat actor has already breached its systems. The intent is to detect a threat as early as possible even if a compromise has bypassed the boundary controls.

The approach requires threat management to add proactive internal detection, monitoring for early warning signs, threat hunting, and the use of proven runbooks. Detection and response must be tightly integrated with the insights and enforcement layers, sharing context and dynamically adjusting access control policies in response to identified threats.

An information system will have vulnerabilities in the software, firmware, and hardware. Configurations and software will change, and new vulnerabilities will appear. If one layer of defense is vulnerable, the system needs to remain secure.

Defense in depth is the principle that a system has multiple layers of defense so that if one layer fails, another layer remains in place to protect the sensitive data in the system. Using a layered approach to a security strategy ensures that an organization can stop an attacker at a subsequent layer when one layer of defense is breached.

The enterprise security strategy and architecture must include measures that offer protection across the subsequent layers of the traditional network computing model. Generally, an organization needs to plan security from the most basic (system level security) through the most complex (transmission level security).

An organization’s attack surface is the sum of vulnerabilities, pathways or methods—sometimes called attack vectors—that threat actors can use to gain unauthorized access to the network or sensitive data, or to carry out a cyberattack. As organizations increasingly adopt cloud services and hybrid (on-premises/work-from-home) work models, their networks and associated attack surfaces are becoming larger and more complex by the day.

Minimize attack surface is the principle that a system should reduce the scope for attack vectors so that it reduces the possible ways in which a threat actor can gain access. Security experts divide the attack surface into three sub-surfaces: The digital attack surface, the physical attack surface, and the social engineering attack surface.

The digital attack surface potentially exposes the organization’s cloud and on-premises infrastructure to any threat actor with an internet connection. Common attack vectors include network access, software vulnerabilities, enablement of unused resources, shadow IT and out-of-date software.
The physical attack surface exposes assets and information typically accessible only to users with authorized access to the organization’s physical office or endpoint devices. This includes attacks by malicious insiders, device theft, and “baiting” - an attack in which threat actors leave malware-infected USB drives in public places with the hope of tricking users into plugging the devices into their computers and unintentionally downloading malware.
The social engineering attack surface manipulates people into sharing information they shouldn’t share, downloading software they shouldn’t download, visiting websites they shouldn’t visit, sending money to criminals, or making other mistakes that compromise their personal or organizational assets or security.

An organization should assess the risk to the sensitive data being processed by the system by examining each of these attack surfaces.

Users could have access to highly sensitive data that enables a bypass of controls, such as encryption keys, or have the rights to perform highly privileged actions, such as transferring large sums of money. Even when a system monitors users, they may abuse their rights, resulting in catastrophic business consequences.

Separation of duties is the principle that powerful actions, or access to highly sensitive data, require more than one user to complete. It gives an organization the ability to divide administrative functions across individuals without overlapping responsibilities, so that one user doesn't hold unlimited authority.

With activities such as managing encryption keys, the organization will require encryption key administrators to manage different parts of a key so that no single administrator has knowledge of the whole key. With business transactions, the application may require a person requesting a transaction to have one or more approvers for it to complete.
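The idea of giving each administrator only part of a key can be sketched with simple XOR secret splitting, in which every share is needed to rebuild the key and any smaller set of shares reveals nothing about it. This is an illustration of the concept only. The function names are hypothetical, and production systems would normally rely on a hardware security module or key management service rather than code like this.

```python
import secrets


def split_key(key: bytes, parts: int) -> list[bytes]:
    """Split key into `parts` shares; all shares are needed to rebuild it."""
    if parts < 2:
        raise ValueError("need at least two shares")
    # All but the last share are random; the last is the key XORed with the rest.
    shares = [secrets.token_bytes(len(key)) for _ in range(parts - 1)]
    last = key
    for share in shares:
        last = bytes(a ^ b for a, b in zip(last, share))
    return shares + [last]


def combine_shares(shares: list[bytes]) -> bytes:
    """XOR all shares together to recover the original key."""
    result = bytes(len(shares[0]))  # starts as all zero bytes
    for share in shares:
        result = bytes(a ^ b for a, b in zip(result, share))
    return result
```

For example, `split_key(key, 3)` would give three administrators one share each, and `combine_shares` recovers the key only when all three shares are presented together.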

For some industries, a regulator may require separation of duties for powerful functions as a part of regulatory guidance. Separation of duties helps businesses comply with government regulations and simplifies the management of authorities.

Products or services contain many different points of configuration, and there is a desire to make them as easy to use as possible. As a result, a product may ship with an insecure configuration, such as open network ports or permission for easy-to-remember passwords.

Secure by default is the principle that an organization receives products or services configured in a secure state out of the box. The products are set up to protect against the most prevalent threats and vulnerabilities without end users having to take additional steps to secure them.
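In code, secure by default often means that the secure choice is what you get when you specify nothing. The following sketch uses a hypothetical `ServiceConfig` class whose field names and values are assumptions chosen for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class ServiceConfig:
    """Hypothetical service configuration whose defaults are the secure choice."""
    tls_enabled: bool = True
    admin_api_public: bool = False
    min_password_length: int = 14  # assumed value, not a recommendation
    open_ports: list[int] = field(default_factory=lambda: [443])


# A caller who supplies nothing gets the secure state.
default_config = ServiceConfig()
```

Weakening the configuration then requires an explicit, reviewable choice such as `ServiceConfig(admin_api_public=True)`, rather than being the starting point.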

A system may be securely configured at the start, but a developer can subsequently make a change that modifies the configuration or a threat actor makes a change through a vulnerability that compromises the system.

Continuous compliance is the principle that the security configuration of the system is constantly checked, starting with the system development through to the ongoing running system. The intent of continuous compliance is to find security issues and vulnerabilities before threat actors do.

Ideally, compliance checks are all automated and cover all security configurations, including the cloud platform, the container platform, middleware and compute images such as virtual servers and containers. With compute images, a better approach may be to use Infrastructure as Code (IaC) to replace the whole image regularly.
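An automated compliance check can be as simple as comparing the observed configuration of a resource with an agreed baseline and reporting any drift. The sketch below assumes a hypothetical baseline with three settings; real baselines would come from the organization's control framework.

```python
# Assumed baseline for illustration only.
BASELINE = {
    "encryption_at_rest": True,
    "public_access": False,
    "tls_min_version": "1.2",
}


def find_drift(resource_config: dict) -> dict:
    """Return settings that differ from the baseline as {setting: (expected, actual)}."""
    return {
        key: (expected, resource_config.get(key))
        for key, expected in BASELINE.items()
        if resource_config.get(key) != expected
    }
```

A missing setting is reported as drift with an actual value of `None`, so an incomplete configuration can't pass silently. Running the same check during development and on the running system gives the continuous coverage described above.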

Practices

An application that processes sensitive business data needs to be well-designed for security and compliance to offer adequate protection. Experience from the design, build and run of hybrid cloud workloads has shown that a number of practices support effective delivery of a secure and compliant system.

These suggested practices help deliver complete security and compliance. Another way of looking at this list is that security capabilities or services are just applications running on hybrid cloud infrastructure. A security architect for these services is an application and infrastructure architect for the domains of security and compliance. Ensure that the activities completed for each security service receive the same attention to detail that a critical business application would receive.

Shared responsibilities for the delivery of security and compliance in a hybrid cloud environment depend on the service models, compute platforms, and cloud service providers used by the workload or application. A security architect must document and communicate the responsibilities covering the design, build and run of the security services. Without clarity, gaps may exist in the security of the solution.

Each dimension has a different impact on shared responsibilities:

Cloud service model - The NIST definition of cloud computing describes a different scope of responsibilities depending on the cloud service model, such as IaaS, PaaS, SaaS or distributed cloud. Security services are part of the shared responsibilities. For example, the shared responsibilities for IBM Cloud include identity and access management, and security and regulation compliance. The shared responsibilities for these categories vary depending on the cloud service model.
Compute platform - A workload may be hosted on one or more compute platforms, such as bare metal servers, virtual servers, container platforms and serverless. Each compute platform has a different set of shared responsibilities. For example, bare metal servers require the owner of the workload to be responsible for all the security controls on the platform except for the physical hardware and network integration.
Cloud service provider - The responsibilities also change depending on the cloud service provider. For example, the Kubernetes platforms from each cloud service provider are different (unless you are using Red Hat OpenShift) as each consists of a different set of curated open source packages.

So why should a security architect care about the differences in service ownership? The team providing security operations may have predefined security services to support the running of the service. The security services may be a part of the cloud platform or may be on-premises requiring extensive integration.

For example, an internal security operations team may use a different technology from a global systems integrator (GSI) such as IBM Consulting. For the internal security team their security control plane hosting the security services may be on-premises, whereas the GSI may be using cloud security services provided by another cloud service provider.

For the Software-as-a-Service (SaaS) cloud service model, only application security administration is available to the consumer of the cloud service. Security is normally built into the application as a part of the service, with limited scope to control the security configuration. In this case, the SaaS provider holds most of the security operations responsibilities.

Without documented responsibilities, there might not be owners assigned to design, build and run security components that are a dependency to the running of the workload. Understanding shared responsibilities for the security services enables the solution architecture to meet the operational needs and avoids a rework of the solution at a later stage in the project.
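Documented responsibilities can be checked mechanically for gaps. The sketch below assumes a hypothetical responsibility matrix keyed by security component and by the design, build and run phases; the component and team names are invented for illustration.

```python
PHASES = ("design", "build", "run")

# Hypothetical matrix: component -> phase -> owning team.
responsibilities = {
    "secrets management": {
        "design": "security architecture",
        "build": "platform team",
        "run": "security operations",
    },
    "certificate management": {
        "design": "security architecture",
        "build": "platform team",
    },
}


def find_unowned(matrix: dict) -> list[tuple[str, str]]:
    """List (component, phase) pairs that have no documented owner."""
    return [
        (component, phase)
        for component, owners in matrix.items()
        for phase in PHASES
        if not owners.get(phase)
    ]
```

With the data above, `find_unowned(responsibilities)` reports that nobody is assigned to run certificate management, which is the kind of gap that is far cheaper to find during design than in production.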

Security services in a hybrid cloud architecture use different cloud service models, compute platforms and cloud service providers. These services need an agreed control plane architecture to give consistent management and oversight of security and compliance.

With an enterprise that has on-premises data center workloads, the control plane may reside in an on-premises data center to be resilient to a failure of a cloud service provider. However, an on-premises security service won't necessarily have the ability to fully manage the cloud platform. Therefore, architect a control plane architecture to report security status and identify risks across the different cloud service providers.

With born-in-the-cloud organizations, the control plane could be hosted in another cloud, in a point of presence (PoP), or in a co-location data center. The security control plane may need to remain available even if a cloud service provider fails.

As a security architect, it's important to document and agree an architectural decision to ensure alignment on the security control plane architecture across an organization. Make this decision early on in the project to ensure the solution meets the strategy of the organization.

Security isn't primarily about deploying security capabilities; it's about protecting business data and processes from threats. Use a systematic approach to trace the data flows through the system to identify the security controls to protect data in transit, at rest and in use.

Architects must identify sensitive data for protection by starting with a system context diagram to identify the system boundary and external interactions that initiate the data flows through the system. Internal data flows of the application are then examined using a diagram that describes the functional components, such as a component diagram.

As architects move onto the implementation, they examine the data flows between the components deployed onto the cloud infrastructure using a cloud architecture diagram. IBM has created a technical diagram design language that enables a design that's independent of a cloud service provider.

To identify the threats to these data flows, use threat modeling both for the workload functional components and the infrastructure for the workload.

Together these techniques and artifacts offer a repeatable and consistent approach to architectural thinking for security and compliance.

When processing the most sensitive data, consider another layer of protection by using confidential computing and homomorphic encryption.

A system needs to meet many different control requirements, and the list can get long and difficult to manage. Instead of working with many individual requirements, work with processes, services or capabilities that group the requirements into delivery capabilities. For example, identity lifecycle management or unauthorized component detection are potential delivery capabilities.

Follow these steps to manage compliance more effectively:

Create a single compliance framework with traceability back to the original sources.
Map the requirements in the compliance framework to a set of delivery capabilities.
Ensure compliant capabilities at all stages of design, build, test and operation.
Continue to comply with requirements in production.
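The first two steps can be sketched as a small data structure in which every requirement records its original source and the delivery capability it maps to. The requirement IDs and source names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical framework entries: each requirement keeps a reference to its source.
requirements = [
    {"id": "REQ-1", "source": "Regulation A, section 4", "capability": "identity lifecycle management"},
    {"id": "REQ-2", "source": "Internal policy 7.2", "capability": "identity lifecycle management"},
    {"id": "REQ-3", "source": "Industry standard B", "capability": None},
]


def group_by_capability(reqs: list[dict]) -> dict[str, list[str]]:
    """Map each delivery capability to the requirement IDs it must satisfy."""
    grouped = defaultdict(list)
    for req in reqs:
        if req["capability"]:
            grouped[req["capability"]].append(req["id"])
    return dict(grouped)


def unmapped(reqs: list[dict]) -> list[str]:
    """Requirement IDs not yet assigned to a delivery capability."""
    return [req["id"] for req in reqs if not req["capability"]]
```

Grouping turns a long list of requirements into a short list of capabilities to design and operate, while `unmapped` highlights requirements that would otherwise have no owner.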

The security architect must ensure the security capabilities have associated Service Level Objectives (SLOs), responsibility agreements and so on.

There is often a need to ensure data from one client, line of business or environment doesn't leak to another. Separation needs to take place for the running of the workload and storage of data. It could be as simple as a different table in a database or separate physical servers.

Hybrid cloud offers many options to enforce separation of data processing. Separation should take place between:

Workloads or applications - for example, accounting vs human resources
Environments - for example, development, test and production
Clients - for example, customer A vs customer B
Location - for example, United Kingdom vs United States

There are many different options for segregation within a hybrid cloud environment that offer different capabilities and assurance:

Enterprise Account
Cloud Account
Resource Group
Virtual Private Cloud
Execution domain - for example, a container or server image
Project
Key Management Domain
Physical devices

The organizational policies must define what type of separation is appropriate for the workload that's processing the data. For example, the policy may require that the hosting of the production customer data must not be in a non-production environment. The minimum level of separation needed is a different cloud account as that's where the management of identity and access is taking place.
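A separation policy of this kind can be expressed as checks over where data is placed. The sketch below encodes two assumed rules: production customer data must only be in production, and a cloud account must not be shared between environments. The `Placement` fields and classification labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    dataset: str
    classification: str  # for example "production-customer" or "synthetic"
    environment: str     # "development", "test" or "production"
    cloud_account: str


def violations(placements: list[Placement]) -> list[str]:
    """Flag placements that break the two assumed separation rules."""
    problems = []
    envs_by_account: dict[str, set[str]] = {}
    for p in placements:
        if p.classification == "production-customer" and p.environment != "production":
            problems.append(f"{p.dataset}: production customer data in {p.environment}")
        envs_by_account.setdefault(p.cloud_account, set()).add(p.environment)
    for account, envs in envs_by_account.items():
        if len(envs) > 1:
            problems.append(f"{account}: shared by environments {sorted(envs)}")
    return problems
```

Other separation boundaries, such as resource groups or key management domains, could be checked in the same way by adding fields and rules.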

Related to the topic of data separation is that of data sovereignty. Certain countries are imposing constraints on where the processing and access of data can take place. The "Addressing Regulations and Driving Innovation with Sovereign Cloud" blog post provides a discussion of a hierarchy of needs for data sovereignty.

Threat modeling is often thought of as a technique to identify and validate the security controls for data flowing through a system.

However, threat modeling has a second purpose - to identify threats for monitoring in a threat monitoring system. The security architect must work to identify a prioritized list of threats. Then define the required threat detection use cases for the threat detection capability. Implement and test the detection use cases with incident response runbooks to enable effective response to threats.

Ensure documented traceability between the threats, threat detection use cases and incident response runbooks to ensure complete design and delivery for the project.
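Traceability of this kind lends itself to a simple automated check. The sketch below assumes hypothetical identifiers for threats, detection use cases and runbooks; in practice these would come from the threat model and the security operations tooling.

```python
# Hypothetical identifiers for illustration.
threats = ["T1 credential theft", "T2 data exfiltration", "T3 privilege escalation"]
detections = {"T1 credential theft": "UC-01", "T2 data exfiltration": "UC-02"}
runbooks = {"UC-01": "RB-credential-reset"}


def traceability_gaps(threats: list[str], detections: dict, runbooks: dict):
    """Return threats lacking a detection use case, and use cases lacking a runbook."""
    no_detection = [t for t in threats if t not in detections]
    no_runbook = [uc for uc in detections.values() if uc not in runbooks]
    return no_detection, no_runbook
```

With the data above, the check reports that privilege escalation has no detection use case and that use case UC-02 has no runbook, so neither would be ready for an effective response.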

With on-premises data centers of the past, security operations teams completed activities in a matter of days or hours and didn't require stringent service levels. With Infrastructure as Code (IaC) automation, the loss of a centralized security service, such as secrets or certificate management, has an immediate impact on the availability of an application. Therefore, security services require service levels and an architecture to meet the availability requirements.

Define for each security capability or service:

Hours of service
Incident, problem and change response times
Skills and experience of operations staff
Detailed tested procedures to match the service requirements
Automation to match the service requirements

If the internal security operations team can't meet the demands of a cloud native workload, consider whether a managed security services provider is appropriate.

If you think about it, many security services today need resiliency and service levels that are at least as good as the workload they support.
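A little arithmetic makes the point concrete. As a sketch, assume the workload can't run without the security service and that the two fail independently (both are simplifying assumptions). The combined availability is then the product of the individual availabilities. The figures are illustrative, not measured values.

```python
def composite_availability(*availabilities: float) -> float:
    """Availability of a chain of services that must all be up, assuming independent failures."""
    result = 1.0
    for value in availabilities:
        result *= value
    return result


def downtime_minutes_per_30_days(availability: float) -> float:
    """Expected downtime in minutes over a 30-day period."""
    return (1.0 - availability) * 30 * 24 * 60


# Illustrative figures: a 99.95% workload that depends on a 99.5% secrets service.
combined = composite_availability(0.9995, 0.995)
```

Under these assumptions, the combined availability is about 99.45%, or roughly 237 minutes of downtime in 30 days, compared with about 22 minutes for the workload on its own.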

Old ways of working meant that security was an afterthought, which made security configuration and patching difficult to complete because they weren't addressed early in the development lifecycle. With DevOps working practices, the security build standards need to be:

Included in the development and test environments.
Consistent from initial development to full production.
Continuously tested for ongoing compliance.

The build processes must prevent insecure deployment of a workload at all stages of development and operation to ensure security isn't an afterthought.
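One way to do this is a deployment gate that applies the same checks at every stage. The sketch below assumes a hypothetical build report and an illustrative set of required checks; the check names aren't taken from any particular tool.

```python
# Assumed build standard for illustration.
REQUIRED_CHECKS = ("image_scanned", "no_critical_vulnerabilities", "secrets_scan_passed")


def deployment_allowed(build_report: dict) -> bool:
    """Allow deployment only if every required check explicitly passed.

    A missing result counts as a failure, so skipping a check can't open the gate.
    """
    return all(build_report.get(check) is True for check in REQUIRED_CHECKS)
```

Calling the same function in the development, test and production pipelines keeps the standard consistent, so an insecure build is stopped at the earliest stage rather than at release.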

Resources

The previous sections elaborated on the principles, practices and anti-patterns for architecting effective security and compliance. There are many other sources of information that will support you on your journey to ensure effective security and compliance for hybrid cloud.

Industry control catalogs

Many organizations have created security policies or control frameworks by unifying legal and regulatory frameworks, and industry standards with adaptation to meet organization risk tolerances. For other organizations, where do they start?

There are a number of existing control frameworks and control catalogs that help:

NIST Cybersecurity Framework - The National Institute of Standards and Technology (NIST) is a United States non-regulatory agency that has created a framework for managing cybersecurity risk.
Cloud Security Alliance Cloud Controls Matrix

To develop deeper and more advanced security and privacy controls, the NIST SP 800-53 security and privacy controls catalog provides a good starting point. IBM has used NIST SP 800-53 to develop the IBM Cloud Framework for Financial Services discussed below.

Further help is available from IBM Cybersecurity Services (CSS) in selecting the right security controls for your business. IBM CSS has a specialist team on governance, risk and compliance who can advise and develop controls documentation.

Cybersecurity for hybrid cloud

Hybrid cloud has brought additional complexity to the design, delivery and management of security and compliance for identities, data and workloads. The Cybersecurity Services team in IBM Consulting is there to help innovate and transform cybersecurity for businesses to drive growth and competitive advantage.

Learn how the IBM Cybersecurity Services team can help you in this journey.

IBM Cloud for Financial Services

IBM Cloud has taken the best practices to develop a comprehensive approach to design, delivery and operation of a cloud platform to support regulated workloads for financial services organizations.

The solution consists of four key components that speed up deploying security and compliance for hybrid cloud:

Controls Framework
Reference Architecture
Deployable Architecture
Continuous Compliance

The following sections summarize the capabilities and point to further sources of information for exploring them.

Controls framework

The foundation of any security solution is a set of security requirements that a system needs to comply with. IBM has developed the IBM Cloud Framework for Financial Services, together with industry partners, forming the foundation of security and compliance for regulated workloads. The framework consists of 565 control requirements derived from NIST SP 800-53.

A mapping of the controls framework to the Cloud Security Alliance Cloud Controls Matrix demonstrates broad applicability across the industry.

Explore and download the control framework in the IBM Cloud documentation.

Reference architecture

Integrating the controls framework with a set of best practices for developing software and services produced reference architectures for VMware, virtual private cloud (VPC), Red Hat OpenShift and distributed cloud.

Deployable architectures

An effective way of consistently deploying security and compliance is through automation. IBM has taken the IBM Cloud Framework for Financial Services, designed Reference Architectures and then developed Deployable Architectures that automate their deployment.

Explore the documentation on the deployable architectures and try deploying a simple virtual private cloud deployable architecture yourself.

Continuous compliance

A deployed compliant reference architecture requires continuous assessment for compliance with the control requirements it has been designed and built to meet. Continuous compliance of a hybrid cloud platform is provided through IBM Cloud Security and Compliance Center (SCC), which demonstrates compliance against the IBM Cloud Framework for Financial Services and other security benchmarks from NIST and the Center for Internet Security (CIS).

SCC provides an integrated suite for demonstrating compliance of a hybrid cloud platform, cloud native workloads and servers. Detailed documentation is available on Security and Compliance Center, Security and Compliance Center Workload Protection and Tanium Comply for endpoint security management.

IBM Well-Architected Framework Pillars

Hybrid and Portable

Resiliency

Efficient Operations

Security and Compliance

Performance

Financial Operations and Sustainability

Next steps