DevOps with controls
Faster releases with cloud infrastructure platforms
Many companies are shifting to a DevOps approach, where software deployment and the ongoing operations of production environments are both incorporated into the software development lifecycle. Rather than throwing completed code over the wall from development to operations when software is ready for release, DevOps makes the release into production part of the overall development process. This typically relies on having a programmable infrastructure, such as cloud and software-defined networking (SDN), so that the whole environment is defined as code. Cloud infrastructure platforms are well suited to this requirement.
By using DevOps, companies can move faster than ever. However, with the dynamic, fast-changing nature of the cloud, it's difficult for key stakeholders in IT, operations, and security to be assured that corporate compliance policies are being met. Applying a discovery-focused, event-driven automation approach can give organizations the configuration controls they need to stay secure in the cloud.
Quick overview of DevOps
DevOps is an approach that deploys software quickly when new releases become available. One of the primary goals of DevOps is to help organizations move quickly. New releases are exposed to real-world users, customers, and usage as soon as possible and are tested for functionality and value to increase the benefit of the application. Most organizations use a cloud infrastructure as the operating environment for their DevOps teams. A simplified process might include the following steps:
- Commit code.
- Automatically package the application for deployment.
- Run a set of regression tests for code checking and smoke tests for production environment readiness. Run other tests, such as security/vulnerability probing.
- Depending on the test results:
  - Failed tests: Generate alerts and logs for development, operations, and security teams. Return to coding.
  - Passed tests: Automatically launch into production.
- Run in production until the next release.
This whole process, from end to end, is often referred to as a DevOps routine or playbook.
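The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real pipeline tool: the `run_pipeline` function, the `PipelineResult` type, the package naming, and the lambda test functions are all hypothetical stand-ins for a packaging step and a test suite.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineResult:
    deployed: bool
    failed_tests: List[str] = field(default_factory=list)
    alerts: List[str] = field(default_factory=list)


def run_pipeline(commit_id: str, tests: Dict[str, Callable[[str], bool]]) -> PipelineResult:
    """Package a commit, run every test, then either deploy or alert."""
    package = f"app-{commit_id}.tar.gz"  # stand-in for a real packaging step

    # Run all tests (regression, smoke, security) and collect the failures.
    failed = [name for name, test in tests.items() if not test(package)]

    if failed:
        # Failed tests: alert development, operations, and security; do not deploy.
        teams = ("development", "operations", "security")
        alerts = [f"{team}: {commit_id} failed {', '.join(failed)}" for team in teams]
        return PipelineResult(deployed=False, failed_tests=failed, alerts=alerts)

    # Passed tests: launch into production automatically.
    return PipelineResult(deployed=True)


# Hypothetical test functions; real ones would exercise the packaged build.
good = run_pipeline("abc123", {"regression": lambda p: True, "smoke": lambda p: True})
bad = run_pipeline("def456", {"regression": lambda p: True, "security": lambda p: False})
print(good.deployed, bad.deployed, bad.failed_tests)  # True False ['security']
```

Note that nothing in this sketch checks how the infrastructure itself is configured. The tests decide only whether the application is launched.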
But not all DevOps teams know how to provision infrastructure correctly to keep in line with the organization's security and compliance requirements. And to compound the complexity, most DevOps approaches include the following mitigation mechanism for any problems: Revert the whole environment. DevOps playbooks almost always include rollback mechanisms.
Figure 1. DevOps playbook
By Medrecs (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
The cloud for DevOps
The cloud is uniquely positioned to work as the infrastructure for a DevOps approach for many reasons, including:
- The cloud is software defined; it is programmable and flexible.
- The cloud is software driven and has many layers. The customer is responsible for designing each layer, from network configuration through service accounts and credentials to data protection and encryption settings.
- Cloud infrastructure is typically billed on a utility-style, pay-as-you-go model. This means that total application running costs can be discretely determined, and companies can do accurate ROI measurements to determine the value of an application.
Most organizations think that they can use cloud infrastructure in the same way that they use traditional IT infrastructure, namely through a centralized IT team. However, approaches like service catalogs, provisioning templates, and limited console access fail at scale because the needs of different teams are too diverse, and IT struggles to keep up with them. Eventually, serious cloud adoption becomes decentralized, and many teams access cloud infrastructure. This leads to the fundamental problem: Not all of these teams are experts in infrastructure, and many simply don't know or don't understand the security requirements of the organization or how to implement them in software-defined cloud environments. When coupled with the complexity of large enterprises, business units, distributed teams, and various compliance regimes, this problem becomes more and more difficult to manage at scale.
DevOps approach to fixing problems
Many DevOps teams automate both the deployment and rollback of new applications and application releases. If they identify a problem, the entire environment is torn down.
This complete teardown often implies downtime for the environment, followed by an exhaustive root-cause analysis. While this is a good exercise for long-term organizational improvement, it's a challenge to immediate application availability.
A secondary challenge is that test coverage might not include every condition that should be considered when deciding whether to keep or roll back a release. Tests often focus on application behavior, code correctness, and sometimes security assessment. Development teams tend to dominate DevOps groups, yet they might lack the background to check for infrastructure optimization or configuration.
Putting a set of security and configuration controls (sometimes referred to as guard rails) in place can give key executive stakeholders the protection and peace of mind that they need to let DevOps teams operate quickly, without putting the organization at risk.
Security perception of DevOps
The main concern that large organizations have about DevOps is the lack of security focus. This is often a conscious trade-off: Let the DevOps teams move quickly because they are working on critical new ventures for the organization. However, security is often sacrificed so that the teams and their environments are not constrained.
One of the reasons that the security perception of DevOps is so negative is that with cloud infrastructure, it can be hard to manage the number of security controls and configurations that need to be programmed. In a simple cloud application architecture design, there are at least ten unique controls to implement and configure correctly. A few examples:
- Are cloud audit logs enabled for all the infrastructure resources?
- Are firewall rules configured correctly?
- Are SSL certificates used at the right points?
- Are the SSL certificates valid and immune from known vulnerabilities?
- Are service accounts correctly used and secured for the services launching?
- Do service accounts have the appropriate permissions, policies, and key rotation?
- Are subnets sized appropriately?
- Are network access control lists (ACLs) being used and correctly configured?
- Is encryption used on data at rest where needed?
- Is encryption in transit used and correctly configured where needed?
- Is communication between application tiers properly secured with service accounts?
- Are static assets properly secured with appropriate permission sets?
Figure 2. Sample cloud architecture
Image credit: DivvyCloud; Used with permission
What controls are needed?
One of the key challenges for organizations in cloud adoption is defining the policies and standards that they want in place for their cloud environments. These requirements can vary by application, and they often come from multiple stakeholders. For instance:
- Production environments can have the highest level of security:
  - Require MFA for all admin accounts
  - Only firewall port 443 is allowed to be open
  - Audit trail must be enabled
- Test and development environments might have a focus on controlling costs yet not allow any access from outside the organization:
  - No ACLs on cloud networks, but no public web access
  - No instances over eight cores (hybrid cost control and enforced focus on scaling out instead of scaling up)
  - Maximum age of 30 days for any compute-based instance, database, or storage volume
- Back office applications likely need to connect back to the corporate network:
  - SSH access must be open to the corporate network IP range
  - Databases backed up four times a day
  - All IP traffic routed over VPN
Customers can put these configuration controls in place with different tools, ranging from those available from each cloud provider to open source inspection tools or commercial software. These tools can inspect and take action for the problems identified. Organizations can use tags, naming conventions, cloud region, or account placement to determine which sets of configuration controls apply to each application or workload, in addition to global policies that apply universally.
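As a rough sketch of how tags might select the applicable controls, the following Python example combines a global check with environment-specific checks taken from the list above. The `Resource` fields, the `environment` and `owner` tag names, and the `has-owner-tag` global policy are assumptions made for illustration; they do not come from any particular cloud provider or tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Resource:
    name: str
    tags: Dict[str, str]
    open_ports: Set[int]
    cores: int
    age_days: int


# A check returns True when the resource complies with the control.
Check = Callable[[Resource], bool]

# Hypothetical global policy that applies to every resource.
GLOBAL_CHECKS: Dict[str, Check] = {
    "has-owner-tag": lambda r: "owner" in r.tags,
}

# Environment-specific controls, selected by the "environment" tag.
ENVIRONMENT_CHECKS: Dict[str, Dict[str, Check]] = {
    "production": {
        "only-port-443-open": lambda r: r.open_ports <= {443},
    },
    "development": {
        "max-eight-cores": lambda r: r.cores <= 8,
        "max-age-30-days": lambda r: r.age_days <= 30,
    },
}


def violations(resource: Resource) -> List[str]:
    """Return the names of all controls the resource fails."""
    env = resource.tags.get("environment", "")
    checks = {**GLOBAL_CHECKS, **ENVIRONMENT_CHECKS.get(env, {})}
    return [name for name, check in checks.items() if not check(resource)]


web = Resource("web-1", {"environment": "production", "owner": "team-a"}, {22, 443}, 4, 3)
build = Resource("build-7", {"environment": "development"}, set(), 16, 45)
print(violations(web))    # ['only-port-443-open']
print(violations(build))  # ['has-owner-tag', 'max-eight-cores', 'max-age-30-days']
```

Keeping each control as a small named function makes it easy to report which policy failed and to reuse the same check in several environments.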
Security requirements likely come from different sources, as well. In the previous examples, for instance, the likely stakeholders would be:
|Policy|Likely stakeholders|
|---|---|
|Port 443 traffic only|Security team, CISO, SecOps|
|No instances over eight cores|Finance, CFO, CIO|
|SSH access to corporate IP range|Operations team, IT, TechOps|
|Database backups|Operations team, TechOps, CIO, DR team|
Figure 3. Corporate-wide policies overlaid with business unit policies and application, project, or product-specific policies
Image by DivvyCloud; used with permission
A commonly successful approach in cloud automation is to start with visibility. Having a tool that continuously monitors and takes inventory of the cloud environment is not only useful but necessary to ensure that all cloud infrastructure is evaluated against those controls.
Some organizations rely on gathering this inventory from various deployment tools that should "check in" when new applications launch. Another approach is to use server agents to report back to a central inventory. Both of these approaches rely on the agents or deployment tools being installed, properly configured, and accessible.
An alternative approach is to use the cloud API layers to perform an exhaustive and repeated query of the environments. While this consistently leads to full infrastructure visibility, it has the trade-off of not being able to query deeply into the operating system or application tiers. As such, many organizations employ a combined approach and use APIs to consolidate the data.
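One way to picture the combined approach is to treat the cloud API listing as the authoritative inventory and enrich it with agent data where an agent has reported in. The sketch below is only illustrative: the record shapes (`id`, `type`, `os`) and the `merge_inventory` function are hypothetical, and a real tool would fetch the records from provider APIs rather than use literals.

```python
from typing import Dict, List


def merge_inventory(api_records: List[dict], agent_reports: List[dict]) -> Dict[str, dict]:
    """Start from the API listing, then enrich entries with agent data."""
    # Every resource the API returns is in the inventory, agent or not.
    inventory = {
        rec["id"]: {**rec, "os": None, "agent_seen": False} for rec in api_records
    }
    for report in agent_reports:
        entry = inventory.get(report["id"])
        if entry is not None:
            # The agent supplies operating-system detail the API cannot see.
            entry["os"] = report["os"]
            entry["agent_seen"] = True
    return inventory


# Hypothetical data: two VMs found through the API, one with a working agent.
api_records = [{"id": "vm-1", "type": "vm"}, {"id": "vm-2", "type": "vm"}]
agent_reports = [{"id": "vm-1", "os": "linux"}]

inventory = merge_inventory(api_records, agent_reports)
unmanaged = [rid for rid, entry in inventory.items() if not entry["agent_seen"]]
print(unmanaged)  # ['vm-2']
```

In this sketch, a resource without an agent still appears in the inventory, so it can be evaluated against infrastructure-level controls even though nothing is known about its operating system.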
Once the infrastructure is discovered, the applicable configuration controls are identified and evaluated for each virtual machine, application, network, storage asset, and workload. When problems are identified, the customer has two options: receive a notification and react, or set predetermined automations to take action in near real time.
A simple example is:
- The customer puts a control mechanism in place that disallows SSH access to production virtual machines.
- A new application is launched into production. The application defines a cloud network environment that inadvertently opens port 22 to the world.
- The cloud automation tools discover the new application and determine that it is a production environment.
- The tools compare the new application to the applicable production policy and determine that there is a problem.
- The customer has a predefined response that creates a notification, logs an event to the audit trail, and closes the firewall port.
This gives key executive stakeholders both security controls and peace of mind when applying DevOps and broader cloud adoption. These two factors together can help the organization be agile and move fast.
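The SSH example above might look like the following in code. This is a simplified sketch under stated assumptions: the `Firewall` type, the in-memory `notifications` and `audit_trail` lists, and the `enforce_no_public_ssh` function are hypothetical, and closing the port is simulated rather than done through a cloud provider API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Firewall:
    name: str
    environment: str
    world_open_ports: Set[int] = field(default_factory=set)


# Stand-ins for a notification channel and an audit log.
notifications: List[str] = []
audit_trail: List[str] = []


def enforce_no_public_ssh(firewall: Firewall) -> bool:
    """Return True if a violation was found and remediated."""
    # The control applies only to production, and only when port 22 is open.
    if firewall.environment != "production" or 22 not in firewall.world_open_ports:
        return False

    # Predefined response: notify, log to the audit trail, close the port.
    notifications.append(f"Port 22 was open to the world on {firewall.name}")
    audit_trail.append(f"closed port 22 on {firewall.name}")
    firewall.world_open_ports.discard(22)  # stand-in for a cloud API call
    return True


# A newly discovered production application that inadvertently exposes SSH.
new_app = Firewall("shop-frontend", "production", {22, 443})
print(enforce_no_public_ssh(new_app), sorted(new_app.world_open_ports))  # True [443]
```

Because the function does nothing when the firewall is already compliant, it can safely run on every discovery cycle.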
Figure 4. Cloud Configuration Control Process Diagram
Image credit: DivvyCloud; Used with permission
The cloud enables organizations to move rapidly. The true value, aside from potential cost savings, is the agility that the cloud allows. Customers can build, modify, and tear down environments at unprecedented speeds. By combining software-defined infrastructure with application release pipelines, companies can enable a truly agile DevOps approach to releases. This dramatically shortens the time between releases and between the release and receiving real-world user data that leads to continuous application improvement.
While the cloud is potentially transformative, its real-time nature and broader access to infrastructure bring a new set of challenges to key stakeholders for security, cost controls, and policy compliance. Using a discovery-focused monitoring tool that can check and enforce controls can help remove key concerns for cloud adoption. Organizations can safely move forward faster with the cloud.