A 2012 Ponemon Institute report (see Resources) on the status of application security states that "seventy-one percent of developers feel security is not adequately addressed during the software development life cycle." This statement is disconcerting, as organizations now have new technologies to incorporate into their secure systems development life cycle (SDLC). These new technologies—namely, cloud computing and big data—will put further stress on an organization's secure development process, if it has one at all.
This article provides an overview of cloud computing and big data, their vulnerabilities and weaknesses from an application security standpoint, and how to securely develop applications on these platforms using a secure SDLC process.
Secure development primer
To assimilate the cloud and big data paradigms into a secure SDLC process, an organization must first incorporate security into its SDLC and consistently follow it. An SDLC is a development process that focuses on at least five phases for developing quality software: requirements, design, development, testing, and implementation. Organizations must incorporate security into each phase of this process. Whether this is done using a specific process model like Microsoft®'s security development life cycle (SDL; see Figure 1) or (ISC)²'s best practices (see Figure 2), by leveraging the Open Web Application Security Project's (OWASP) best practices, or by incorporating a custom framework, a secure SDLC is now a necessity for development teams.
Figure 1. Microsoft's SDL process
Figure 2. (ISC)²'s secure coding best practices
Cloud computing primer
Beyond understanding what a secure SDLC process is, it's important for an organization to understand what cloud computing is and how it can help the business find economies of scale and a renewed focus on the organization's core competencies. Cloud computing is a rebranding, if you will, of the old application service provider (ASP) model. However, true cloud offerings have some additional nuances beyond the ASP model—namely, resource pooling, on-demand functionality, multi-tenancy, and rapid elasticity. These attributes mean that you can achieve economies of scale by converting fixed capital expenses (CapEx) into variable operating expenses (OpEx), paying for what you use, when you use it.
The US National Institute of Standards and Technology (NIST) helped to further define what the cloud is by establishing cloud service models and cloud deployment models. Cloud service models describe how an organization consumes the cloud based on its business requirements: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), or Software as a Service (SaaS). Note that a cloud consumer has the greatest flexibility with IaaS and the least with SaaS. However, with the flexibility of IaaS comes the cloud consumer's responsibility to administer, monitor, and manage the environment. So, for IaaS, the consumer builds the (virtual) server starting at the operating system layer. For PaaS, the consumer builds the database, application, and business rules and loads the data. For SaaS, the consumer only has to load the data into the pre-built application.
NIST has also defined the cloud deployment models, which include public, private, hybrid, and community clouds (see Figure 3). An example of a public cloud model is Google Docs, where the application and data are stored in Google's data center somewhere. In this model, the consumer in essence has a floor in the Google "high-rise" apartment.
Figure 3. NIST clouds
Most large organizations leverage the cloud internally in the form of a private cloud. An example of a private cloud is an organization that uses cloud attributes (for example, resource pooling, on-demand functionality, multi-tenancy, or rapid elasticity) within its own data center for information processing. Note that you can have a private cloud within a cloud service provider's (CSP) facility, as well. Examples include Terremark's customers as well as those who use Amazon Virtual Private Cloud (Amazon VPC) for their Amazon Web Services™ (AWS) service line (see Figure 4). An example of a hybrid consumer would be a pharmaceutical company that uses a private cloud to store its research and development data, sends the data securely (via Secure Sockets Layer or Transport Layer Security) to a public cloud for computation, and then sends the results back to the private cloud. In essence, a hybrid model uses the best of both the public and private worlds. A community cloud leverages resource pooling to a large extent; one example is several schools in a school district sharing server resources for information processing.
Figure 4. Amazon VPC
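The secure hand-off in the hybrid example above rests on TLS. A minimal Python sketch of building a client-side TLS context before any research data leaves the private cloud (the settings shown are the standard-library defaults, not a specific vendor's requirements):

```python
import ssl

# Build a client context that verifies the public cloud endpoint's
# certificate chain and hostname before any data is transmitted.
context = ssl.create_default_context()

# Refuse legacy protocol versions; TLS 1.2 is the floor here.
context.minimum_version = ssl.TLSVersion.TLSv1_2

# With these defaults, verify_mode is CERT_REQUIRED (unverifiable peers
# are rejected) and check_hostname is True (certificates issued for a
# different server are rejected).
```

A socket wrapped with this context (via `context.wrap_socket`) then carries the research data over an authenticated, encrypted channel.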
Big data primer
Beyond the cloud, big data is a new paradigm for industry. Oracle defines big data as an aggregation of data from three sources: traditional (structured) data, sensory data (logs, metadata), and social (social media) data. Big data is often stored in non-relational, distributed databases using new technology paradigms such as NoSQL (Not only Structured Query Language). There are four types of non-relational database management systems (non-RDBMSs): column based, key-value, graph, and document based. These non-RDBMSs aggregate the source data while analytical programs, such as MapReduce, analyze the information. Once big data is aggregated and analyzed, organizations can use this information for market research, supply chain research, process optimization, security incident analysis, or trend analysis.
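MapReduce's map/shuffle/reduce pipeline can be sketched in a few lines of Python. The log records and field names below are invented for illustration (counting security events per source IP, the kind of incident analysis mentioned above):

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (key, 1) pair per record -- here, one per source IP.
    for record in records:
        yield record["source_ip"], 1

def shuffle_phase(pairs):
    # Shuffle: group all values emitted under the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: collapse each key's values into a single aggregate.
    return {key: sum(values) for key, values in groups.items()}

logs = [
    {"source_ip": "10.0.0.5"},
    {"source_ip": "10.0.0.9"},
    {"source_ip": "10.0.0.5"},
]
event_counts = reduce_phase(shuffle_phase(map_phase(logs)))
# event_counts: {"10.0.0.5": 2, "10.0.0.9": 1}
```

In a real framework such as Hadoop, the map and reduce functions run in parallel across the cluster and the shuffle is handled by the runtime; the data flow is the same.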
Scenarios in which big data is a value-add include having market research data available to support a decision to outsource or in-source, engage in an acquisition or merger, move into a new market, or leave one. Once thought of as a technology solely for academia, non-RDBMS systems are now reaching critical mass in industry. Leading technology service providers such as Twitter have begun to use them, and individuals and companies consume those provider offerings. Non-RDBMSs are becoming the preferred database architecture for organizations using Web 2.0 technologies because their open source nature yields cost savings: organizations do not have to invest in traditional relational database licensing or local hardware. Budget permitting, I advocate that an organization provision new positions to administer and manage big data systems, while cross-training analysts, programmers, project managers, and traditional RDBMS administrators. How this is done depends on the needs of the business, but you'll find such specialty roles within organizations that already leverage big data platforms, such as Yahoo! and Facebook. If your organization does dedicate resources to big data, remember to use these systems to augment, not replace, your existing RDBMS investments for storing and analyzing data.
Enterprises will continue to use RDBMS and non-RDBMS systems concurrently. These systems have their similarities, but their differences should be noted as well. For example, non-RDBMSs distribute data across multiple computers, which affects an organization's state of privacy compliance when data spans multiple jurisdictions. Non-RDBMSs create, read, update, and delete data through an application programming interface (API) call rather than through a database connection (for example, Open Database Connectivity or Java™ Database Connectivity) as RDBMS systems do. Non-RDBMSs also differ from RDBMSs in the way they treat data. For example, tables in a non-RDBMS are referred to as domains or namespaces (as in Amazon DynamoDB, shown in Figure 5). Also, non-RDBMS Data Definition Language, or metadata, is not as easily queried as in an RDBMS. In addition, most non-RDBMSs have moved away from using SQL for Data Manipulation Language calls; many use NoSQL instead. Finally, a non-RDBMS requires a running API service rather than a database server instance, which often leads to a considerably lower OpEx.
Figure 5. Amazon DynamoDB
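The API-call style of access described above can be illustrated with a toy in-memory key-value "domain." The class and method names below are invented for the sketch; they are not the real DynamoDB SDK, which issues the same kinds of calls over HTTP:

```python
class KeyValueDomain:
    """Toy stand-in for a non-RDBMS domain/namespace accessed via API calls."""

    def __init__(self, name):
        self.name = name
        self._items = {}

    def put_item(self, key, attributes):
        # Create or update: there is no SQL INSERT or UPDATE, just a call.
        self._items[key] = dict(attributes)

    def get_item(self, key):
        # Read: returns the item's attribute map, or None if absent.
        return self._items.get(key)

    def delete_item(self, key):
        # Delete: removing a missing key is a no-op, not an error.
        self._items.pop(key, None)

users = KeyValueDomain("users")
users.put_item("u1", {"name": "alice", "role": "analyst"})
users.put_item("u1", {"name": "alice", "role": "admin"})   # update in place
record = users.get_item("u1")
# record: {"name": "alice", "role": "admin"}
```

Note that each item is just an attribute map keyed by an identifier; there is no shared schema to query, which is the metadata limitation noted above.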
Bringing it all together
By running an API service, such as Gemini Cloudian, non-RDBMS users can save on CapEx and OpEx, and they can easily build applications using non-RDBMS technology. CSPs such as AWS are counting on this. AWS has an offering called Amazon SimpleDB that various startups (such as Flipboard, Kehalim, Livemocha, and LOUD3R) use to deliver solutions quickly to market. Non-RDBMS consumers connect their NoSQL databases to web applications by using third-party software products (for example, Cloudian) or by writing their own software. Organizations use non-RDBMSs to gain enhanced scalability, elasticity (sharding), modularity, portability, and interoperability while pairing NoSQL platforms with programming languages such as the Java language (see Figure 6), Web 2.0 technologies like Ruby on Rails (a web application framework focused on dynamic content), or enhanced web services and service-oriented architecture.
Figure 6. Using Java in Amazon DynamoDB
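The elasticity-through-sharding mentioned above boils down to a stable key-to-shard mapping. A minimal hash-based sketch (the keys and shard count are illustrative):

```python
import hashlib

def shard_for(key, shard_count):
    # Hash the key so that items spread evenly across shards and the
    # same key always routes to the same shard.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

shard = shard_for("user:alice", 4)   # always the same shard for this key
```

Simple modulo sharding like this reshuffles most keys when the shard count changes; production NoSQL systems typically use consistent hashing so that adding a node moves only a fraction of the keys.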
Tips and tricks
With the use of big data solutions often coupled with web development projects, and with CSPs offering their own non-RDBMS/NoSQL solutions, a natural first step in extending an organization's secure SDLC process to the cloud or big data environments is to deploy that process on web development projects. Because web or thin-client applications present specific vulnerabilities, such as cross-site scripting (XSS) and cross-site request forgery (XSRF), the development team must know how to write secure code for this environment. Examples include providing proper training and awareness for conducting static application security testing (SAST) and dynamic application security testing (DAST) against the vulnerabilities found within the OWASP Top Ten, updating threat modeling tasks and tools, extending the organization's development and quality assurance (QA) environments to the cloud to test in like environments, and retooling defense-in-depth strategies for big data systems (for example, non-RDBMS/NoSQL).
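Of the vulnerabilities named above, XSS is the most common, and its core defense is encoding untrusted input on output. A minimal Python sketch using the standard library (the `render_comment` helper is hypothetical):

```python
import html

def render_comment(user_input):
    # Encode untrusted input before embedding it in HTML so that
    # markup characters render as inert text instead of executing.
    return "<p>" + html.escape(user_input) + "</p>"

safe = render_comment('<script>alert("xss")</script>')
# safe: '<p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>'
```

Real applications get the same effect from an auto-escaping template engine; the point is that the encoding happens at output time, in the context (HTML body, attribute, URL) where the data lands.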
Securing big data (that is, non-RDBMS/NoSQL) systems requires a focus on the application and middleware tiers, because these back-end solutions leverage an open architecture. To secure this architecture, an organization must incorporate additional safeguards and controls on the tiers beyond the data layer, especially if the system is located in the cloud. Examples of additional safeguards and controls include:
- Encryption. Data in motion, data in use
- Enhanced identity and access management (IAM) solutions. Security Assertion Markup Language, Representational State Transfer, two-factor authentication/one-time passwords
- Adding an explicit segregation or separation of duties. That is, make a clear delineation between those who can read and write changes and those who own and read the data.
- Logical access controls. Virtual firewalls, web application firewalls (such as Imperva), XML firewalls, database activity monitoring with the use of database firewalls (DBFs, such as Oracle DBFs)
- Enabling enhanced accounting, auditing, and accountability practices. In particular, using security information and event management tools
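The accounting and accountability item above can be made concrete: chaining audit entries with an HMAC lets a SIEM detect after-the-fact tampering with the log. A sketch with Python's standard library (key handling is simplified for illustration; a real deployment would pull the key from a vault):

```python
import hashlib
import hmac

def append_entry(log, secret, message):
    # Chain each entry's tag to the previous tag so that altering or
    # removing any earlier entry invalidates everything after it.
    prev_tag = log[-1][1] if log else b"\x00" * 32
    tag = hmac.new(secret, prev_tag + message.encode(), hashlib.sha256).digest()
    log.append((message, tag))

def verify_log(log, secret):
    # Recompute the chain from the start; any mismatch means tampering.
    prev_tag = b"\x00" * 32
    for message, tag in log:
        expected = hmac.new(secret, prev_tag + message.encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False
        prev_tag = tag
    return True

secret = b"audit-signing-key"   # illustrative; keep real keys out of code
log = []
append_entry(log, secret, "user=alice action=read table=patients")
append_entry(log, secret, "user=bob action=write table=patients")
# verify_log(log, secret) -> True; rewriting any entry makes it False
```

The chained tags also support the segregation-of-duties item: writers can append entries, but only the key holder can produce a log that verifies.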
Once these safeguards and controls are in place, you must test their effectiveness. However, the DAST offerings available for the cloud are limited, because CSPs restrict consumers' ability to scan their cloud environments. Beyond that, most vulnerability scanners have not yet been updated to scan a big data environment. There are some exceptions from a cloud perspective: AWS has partnered with a company called Core, whose Core CloudInspect is allowed to scan an organization's IaaS-based Amazon Elastic Compute Cloud instances, and the Microsoft® SQL Azure™ PaaS-based solution allows McAfee's Database Security Scanner to test a cloud consumer's security. Beyond these exceptions, organizations can mitigate the risks of deploying code to cloud or big data systems with out-of-date vulnerability scanners by following proven secure coding conventions, such as incorporating privacy and security by design and writing prepared statements for input validation. In addition to legacy scanning limitations, organizations face another challenge with testing cloud and big data code: incorporating secure and proper change and configuration management procedures.
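The prepared-statement convention mentioned above binds user input as data rather than splicing it into query text. A minimal sketch with Python's built-in sqlite3 module (any DB-API driver works the same way; the table and helper are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def find_user(name):
    # The ? placeholder binds the input as a value; it is never
    # concatenated into the SQL text, so injection payloads are inert.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()

find_user("alice")          # [(1, 'alice')]
find_user("' OR '1'='1")    # [] -- the payload matches nothing
```

Contrast this with string concatenation (`"... WHERE name = '" + name + "'"`), where the same payload would rewrite the query and return every row.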
Roadblocks and land mines
Many organizations may find challenges in extending their secure SDLC to the cloud and big data in the way they execute change and configuration management requests. When leveraging the cloud for development or QA work, your integrated development environment may not work properly when checking code in and out of the development/QA environment. This breakdown may lead to additional work to keep consistent, up-to-date, and defect-free code in the library. To mitigate this risk, an organization should extend its IAM platform, via single sign-on, to the cloud for its development environment. Beyond instituting proper change and configuration management tasks, organizations need to assimilate the threat models, attack vectors, and testing of their third-party vendor products into their secure SDLC processes.
When an organization leverages the cloud or big data, it must assimilate the new threat models, attack vectors, and testing of these third-party products and services into its environment. Examples include testing a CSP's PaaS environment, testing an appliance-based cloud or big data environment, or testing the security associated with a third-party web service or API. Specific focal points to incorporate include testing for input validation, memory overflow, encryption key management, and the handling of graceful versus ungraceful exits.
An organization must incorporate specific actions into its secure SDLC process to mitigate the risks of introducing the cloud or big data into the enterprise. These actions include testing the security posture of vendor software and hardware products, updating change and configuration management processes, understanding the limitations of existing security assessment tools, and extending the organization's secure SDLC process in an iterative manner to web development projects. This knowledge must be coupled with an enhanced understanding of what the cloud and big data consist of, the vulnerabilities found in each, and how to remediate those vulnerabilities. Per Ponemon's research, if your organization can do all this, you will be part of a select minority.
Resources
- Read the Ponemon Institute research report, "2012 Application Security Gap Study: A Survey of IT Security & Developers".
- Find more information on the Hadoop framework and its use in industry.
- Find more information on NoSQL and security.
- Find more information on the OWASP Top Ten web application vulnerabilities.
- Check out (ISC)².
- Amazon's AWS offers a host of cloud-based technologies for organizations and projects of all sizes.
- Explore developerWorks Cloud computing, where you will find valuable community discussions and learn about new technical resources related to the cloud.
- Follow developerWorks on Twitter.
- Learn more about Microsoft's Windows Azure™ platform.
- Learn more about Core CloudInspect.
- Check into Imperva.
Get products and technologies
- Check out Amazon DynamoDB, still in beta at the time of writing.
- Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement service-oriented architecture efficiently.
- Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.