Open source software has become a major driver in the development of complex applications. To reduce R&D effort and accelerate development, developers often take widely available open source code, sometimes modifying it for their purpose, and integrate it into their projects. There is nothing wrong with this. If a piece of code that accomplishes what you need is already available, it certainly makes sense to take advantage of it.
There is a misconception that because open source is publicly available, it can be used without restriction. The use of open source software is regulated through terms specified in a license. All open source software licenses come with obligations that must be complied with. Obligations can include anything from attribution (mentioning the pedigree of the code and including the original license in the distributed work), to having to make your source code publicly available. There may be certain open source license obligations that are not a good fit with your business model. It is important to know what open source content is in your code, so you can have a full understanding of the license terms and be in compliance with their obligations before your product is released.
Methods for managing open source software
Organizations can discover which open source software packages are used in their code and understand license obligations by ensuring open source code is properly adopted and managed. Code pedigree, and the obligations associated with open source software, can be managed either manually or automatically. Manual methods of code examination can provide a high-level view of third-party content in a code portfolio. However, for all but the smallest projects, manual audits are inaccurate and time consuming, and they require those examining the code to have deep knowledge of open source licenses and their obligations. Automated solutions are increasingly deployed at various stages of the software development life cycle, with organizations overlaying open source license management on their existing quality assurance processes.
Setting up an open source adoption process
Like any other quality management process, automated open source software license management can be integrated and applied at various stages in a development lifecycle. The earlier this process is applied, the sooner open source software usage is handled and the lower the effort and cost associated with correcting potential issues. Figure 1 suggests points in the software lifecycle where you could manage open source software.
Figure 1. Managing open source software at different points in the development lifecycle
In 2011, the team at Protecode studied the (then) current practices around adoption and use of open source software in more than seventy technology companies worldwide. The best practices in managing open source adoption were consolidated into an open source software adoption process (OSSAP), a series of necessary and optional steps that can be fully automated and integrated with existing development tools within a technology organization.
Establishing a policy
Best practice in open source license management is a lifecycle approach: policy-based, automated, and continuous license management while the software is being developed. Establishing a structured open source software adoption process with a lifecycle approach ensures that risks are addressed as early as possible, which minimizes their impact.
The first step of OSSAP is establishing a licensing policy. This policy determines which types of open source licenses or copyright holders are acceptable or unacceptable for use in the organization. Licensing policies also include actions to take if license violations are detected. Representatives from engineering, product management, and legal are involved in drafting this policy.
An open source software policy helps manage potential open source risks associated with legal, technical, and support aspects of a software product. An open source policy that is shared within an organization establishes confidence within the development community that the company understands and values the use of open source software, and that the organization is taking steps to maximize its benefits and minimize the risks.
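A licensing policy of the kind described above can be captured as data and checked automatically. The following is a minimal sketch, not Protecode's implementation: the `LicensePolicy` class, the `Verdict` values, the license identifiers, and the file paths are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    NEEDS_REVIEW = "needs_review"


@dataclass
class LicensePolicy:
    # Hypothetical policy: the identifiers are SPDX-style strings used for illustration.
    approved: set = field(default_factory=set)
    rejected: set = field(default_factory=set)

    def evaluate(self, license_id: str) -> Verdict:
        """Map a detected license to the action the policy prescribes."""
        if license_id in self.rejected:
            return Verdict.REJECTED
        if license_id in self.approved:
            return Verdict.APPROVED
        # Anything the policy does not mention is escalated to a human reviewer.
        return Verdict.NEEDS_REVIEW


policy = LicensePolicy(
    approved={"MIT", "BSD-3-Clause", "Apache-2.0"},
    rejected={"AGPL-3.0-only"},
)

# Hypothetical scan results: file path -> detected license.
detected = {
    "lib/json.c": "MIT",
    "lib/net.c": "AGPL-3.0-only",
    "lib/codec.c": "MPL-2.0",
}
report = {path: policy.evaluate(lic) for path, lic in detected.items()}

for path, verdict in sorted(report.items()):
    print(f"{path}: {verdict.value}")
```

Treating unlisted licenses as "needs review" rather than silently approving or rejecting them keeps the legal and engineering representatives who drafted the policy in the loop for cases it did not anticipate.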
The next step is analysis of the existing software portfolio to establish a baseline. This is followed by regular analysis and approval of third-party code or new code developed internally as part of the development process.
Figure 2 demonstrates the general structure of the OSSAP steps.
Figure 2. The general structure of the OSSAP steps
Accurately detecting open source
The goal of any automated open source software adoption process is accuracy. Automated tools work by comparing the content of an organization's code portfolio to a database of millions of open source and third-party software files. This is achieved by:
- Examining software directories
- Examining folder, subfolder, and even file names in a directory
- Examining text files that indicate the existence of certain third-party packages and licenses
- "Scrubbing" software files and detecting statements such as "copyright xxx" or "licensed under yyy"
These techniques can detect open source content if the files are used in an application in their entirety. However, their effectiveness is limited in a number of scenarios, such as when:
- The package name, folder names, or file names are changed.
- Directory structures are modified.
- Identifying text files such as readme.txt or license.txt are removed.
- Header information in the files is modified so that original copyright information or license attributions are lost.
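The "scrubbing" technique listed earlier can be sketched in a few lines. This is an illustrative sketch only: the function name `scrub_header`, the two regular expressions, and the 30-line header window are assumptions, and the patterns cover just the two phrasings quoted above ("copyright xxx" and "licensed under yyy").

```python
import re

# Assumed patterns for the two statement styles mentioned in the text.
COPYRIGHT_RE = re.compile(r"copyright\s+(?:\(c\)\s*)?(.+)", re.IGNORECASE)
LICENSE_RE = re.compile(r"licensed\s+under\s+(?:the\s+)?(.+)", re.IGNORECASE)


def scrub_header(text, max_lines=30):
    """Collect copyright and license statements from the top of a file."""
    findings = {"copyrights": [], "licenses": []}
    for line in text.splitlines()[:max_lines]:
        # Drop leading comment markers such as "/*", " *", "#", or "--".
        cleaned = line.strip().lstrip("/*#- ").strip()
        match = COPYRIGHT_RE.search(cleaned)
        if match:
            findings["copyrights"].append(match.group(1).strip())
        match = LICENSE_RE.search(cleaned)
        if match:
            findings["licenses"].append(match.group(1).strip().rstrip("."))
    return findings


sample = """/*
 * Copyright (c) 2011 Example Corp.
 * Licensed under the MIT License.
 */
int main(void) { return 0; }
"""

result = scrub_header(sample)
print(result)
```

As the list above points out, this kind of text matching finds nothing once the header has been edited or removed, which is why it is usually combined with content-based comparison.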
We suggest using deep-scanning techniques to detect code similarity more accurately. This method involves comparing a software file signature to a large reference database of hundreds of millions of known open source software files, then looking for similarity between the signature under examination and the reference signatures in the database. This step can be as basic as looking for exact matches in the database or as sophisticated as trying to match the code structure of a source file, in whole or in part, to the billions of lines of code in the reference database.
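To make the two ends of that spectrum concrete, here is a minimal sketch of signature comparison. It is not the algorithm used by any particular scanner: it assumes a SHA-256 hash of whitespace-normalized text for exact matching and a Jaccard similarity over overlapping three-line "shingles" for partial matching. All function names are hypothetical.

```python
import hashlib


def normalize(source):
    """Collapse whitespace and drop blank lines so formatting changes are ignored."""
    return [" ".join(line.split()) for line in source.splitlines() if line.strip()]


def exact_signature(source):
    """Signature for exact matching: a hash of the normalized file content."""
    return hashlib.sha256("\n".join(normalize(source)).encode("utf-8")).hexdigest()


def shingles(source, k=3):
    """Set of overlapping k-line windows, used for partial matching."""
    lines = normalize(source)
    if len(lines) < k:
        return {"\n".join(lines)} if lines else set()
    return {"\n".join(lines[i:i + k]) for i in range(len(lines) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity between the shingle sets of two files (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


reference = """int add(int a, int b) {
    int sum = a + b;
    log_value(sum);
    check(sum);
    return sum;
}"""

# Same code with different indentation: still an exact match after normalization.
reindented = reference.replace("    ", "\t")
# One line changed: no longer an exact match, but still partially similar.
modified = reference.replace("return sum;", "return sum + 1;")

print(exact_signature(reference) == exact_signature(reindented))
print(exact_signature(reference) == exact_signature(modified))
print(round(similarity(reference, modified), 3))
```

A production scanner would compare against a reference database rather than a single file, and would need an index over the shingle hashes to keep lookups fast; that part is outside the scope of this sketch.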
The next section describes an example of an existing, real-world approach to managing open source software in the cloud using ProtecodeCloud™.
Managing open source in the cloud
Powered by Protecode's GIPS database, ProtecodeCloud enables organizations to set up their own OSSAP procedure and manage open source compliance in a hosted environment. This is especially suitable for organizations that have already moved their development operations to the cloud. ProtecodeCloud is now available to IBM® SmartCloud Enterprise users, and is easy to set up by initiating a Protecode instance and purchasing file credits from Protecode's self-serve portal.
An IBM SmartCloud user can easily launch an instance of ProtecodeCloud within the SmartCloud ecosystem. ProtecodeCloud can be accessed through any web browser via an HTTPS link and appropriate credentials. A typical scanning process consists of four steps:
- Establish and digitally capture a policy, defining acceptable software parameters and providing direction to the automatic analysis engine.
- Select the software project under assessment and launch the automated scan. Automated scanning is very fast and once launched does not require intervention.
- Sign off on or confirm the results once the automated scan completes. In the sign-off/confirmation process, the machine results are examined and the information is approved or complemented, or missing information is provided. The sign-off process is manual; depending on the size of the portfolio under assessment, it can take anywhere from half a day to several days.
- Generate reports once the confirmation process is complete. Various reports, designed for different audiences such as licensing teams, export control staff, release managers, or customers, can be generated.
Establishing a licensing policy in ProtecodeCloud
A number of parameters can be defined to align the policy with an organization's needs. You can set a size threshold for the files to scan, defined by the number of lines of code, or by the number of bytes in the case of binaries. You can adjust the sensitivity of the scanner for detecting code snippets (the more sensitive the setting, the smaller the code fragments that can be matched to the reference database). You can also choose to ignore common patterns, such as those created by code generators.
Define acceptable licenses in a policy by either choosing from a set of OSI-approved licenses or specifying their variations or custom licenses. You can define copyrights or authors that are acceptable or objectionable, and specify search terms (such as "crypto") that you want to find in your software portfolio. Figure 3 shows how easy it is to create a policy.
Figure 3. Create a policy
After a policy is in place, files are ready to be analyzed. The analyzer web interface can be accessed by clicking the analyzer tab in the ProtecodeCloud portal. A simple interface lets you select the software packages you want to analyze, optionally name the resulting report, and click Run to start the analysis process. Figure 4 shows the interface for analyzing your code portfolio.
Figure 4. Analyze a code portfolio
Generating online reports
After the analysis is completed, a report is generated that lists all open source and other third-party projects, files, or snippets in your code portfolio. The overview contains all of the information about the folder that was scanned, including statistics such as the number of files scanned, the number of file extensions observed in the portfolio, and the software languages detected.
All data provided in the online report contains hyperlinks for more information. The report contains the top five open source or proprietary licenses found and a link to a complete list of licenses encountered within the package.
The report also lists the number of files matching open source projects fully or partially. It also highlights files that are not in the public domain but either:
- Contain information within the files (such as licensing, copyrights, or authorship) or
- Contain no information at all (possibly proprietary code that does not contain header information).
Other information is presented in the overview, for example:
- Encryption content detected during scan
- Common patterns that were ignored
- Number of files smaller than the detection threshold set in the policy
Figure 5 shows a sample of the overview summary.
Figure 5. Overview summary
The Full File Matches tab lets you explore the details of files that fully match files in the public domain. The report can be sorted by the detected projects, licenses, or copyrights. All information is hyperlinked. For example, clicking on the hyperlinked project brings up more detail on that open source project.
File details can also be explored. The content of any file in a portfolio, the matching file in the public domain, and information on how licensing or copyright for a file was derived are all presented.
Other data such as any security vulnerabilities reported on a detected open source project, whether the open source software project includes encryption content, and any published export control information are also presented.
The Partial File Matches tab allows you to explore files that contain modified open source content. A diff function allows the user to visually examine the matching file snippets, with the local file and the open source software file aligned side by side and the matches color-coded for ease of assessment.
The third step in a portfolio assessment process is manual sign-off or confirmation of the machine-generated report. Clicking on a file, a folder, or a whole project brings up the confirmation window. At this stage, the information provided by the analyzer can be approved. Files and projects can be approved or rejected for use within company products.
The final step in a scan activity is reporting on the project. ProtecodeCloud has a number of reporting capabilities that cater to different purposes and audiences.
The License Obligations Report (LOR) provides an actionable list of obligations derived from the license terms and your method of using the open source packages in your project. While not a replacement for legal counsel, the LOR provides a checklist for release managers to ensure they are in compliance with the various license terms found in the portfolio, depending on the product attributes (for example, whether the product provides Digital Rights Management or DRM functions) and usage scenarios.
Based on your answers, LOR generates a bulleted list of actions you must take in order to be in compliance with the specific licenses you selected. Furthermore, the action list can be traced back to the original text of the license for further information. Figure 6 offers a glimpse of the LOR screen.
Figure 6. The License Obligations Report (LOR) screen
The License Compatibility Report (LCR) provides a holistic view of your software portfolio and checks to see if the terms of the licenses in your portfolio are compatible with each other. For example, the terms of the GPL license may conflict with the terms of the EPL (Eclipse Public License) or Apache license in some circumstances. Figure 7 shows the LCR screen.
Figure 7. The License Compatibility Report (LCR) screen
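The idea behind a compatibility check can be sketched as a pairwise lookup over the licenses found in a portfolio. This is only an illustration of the technique: the `KNOWN_CONFLICTS` table is a hypothetical stand-in built from the GPL/EPL and GPL/Apache examples mentioned above, the identifiers are assumed SPDX-style strings, and real compatibility depends on license versions and on how the components are combined.

```python
from itertools import combinations

# Hypothetical conflict table for illustration; a real tool would use a
# reviewed, versioned rule set rather than this hard-coded set of pairs.
KNOWN_CONFLICTS = {
    frozenset({"GPL-2.0-only", "EPL-1.0"}),
    frozenset({"GPL-2.0-only", "Apache-2.0"}),
}


def find_conflicts(licenses):
    """Return every pair of licenses in the portfolio flagged as conflicting."""
    unique = sorted(set(licenses))
    return [
        pair
        for pair in combinations(unique, 2)
        if frozenset(pair) in KNOWN_CONFLICTS
    ]


# Licenses detected across a hypothetical portfolio (duplicates are expected).
portfolio = ["MIT", "GPL-2.0-only", "Apache-2.0", "MIT"]
conflicts = find_conflicts(portfolio)

for first, second in conflicts:
    print(f"Potential conflict: {first} and {second}")
```

Storing each pair as a `frozenset` makes the lookup independent of the order in which the two licenses were detected.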
The Export Control Classification Number (ECCN) report gives you the ECCNs for packages found in your software. You might need this information if your software is intended for export (Figure 8).
Figure 8. The Export Control Classification Number (ECCN) screen
The Security Vulnerability Report lists all security vulnerabilities associated with the open source software components used in the project, based on information found in the National Vulnerability Database (Figure 9).
Figure 9. The Security Vulnerability Report
Other report functions available include:
- The Encryption Package Report displays encryption packages and crypto information found in your portfolio.
- The Concatenated License List is a single file containing all open source projects in your product with their full license text. This list is ready to be shipped with your product.
- The Printable Report provides a full software bill of materials. This can be distributed throughout the organization or attached to the final product.
ProtecodeCloud fully supports Software Package Data Exchange (SPDX) standards. SPDX is an emerging industry standard, driven by the Linux® Foundation, for communicating information about the components, licenses, and copyrights associated with a software package both internally and across the software supply chain. SPDX was designed to streamline software supply chain management.
SPDX files can be used to communicate the composition of a software package between business partners for quality and compliance purposes, or within an organization for the additional objective of code portfolio management. Information included in an SPDX file includes who generated the license information, whether it was generated manually or automatically, who reviewed it, what the license is, authorship or copyright associated with the package, and the files within that package.
ProtecodeCloud can read SPDX files during analysis of a software package, allow a user to add comments, and automatically generate a new SPDX file for the analyzed package. Protecode's implementation of the SPDX standard allows complete flexibility through a number of pre-filled information panels that can be complemented by an auditor. Both declared and automatically discovered information, as well as entries and comments by an auditor can be captured and included in the final SPDX file, generated at a click and on demand. Users have a choice of SPDX output files and can select between RDF (Resource Description Framework), spreadsheet, or SPDX Tag Values.
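To give a feel for the tag-value output format, the sketch below writes a small fragment using a handful of SPDX tag names (`SPDXVersion`, `DataLicense`, `PackageName`, `PackageLicenseDeclared`, `FileName`, `FileChecksum`, `LicenseConcluded`). It is deliberately incomplete: it omits fields that the specification requires (such as creation information), the version string is an assumption, and the function name, package, and file contents are hypothetical. It does not represent ProtecodeCloud's output.

```python
import hashlib


def spdx_tag_value(package_name, declared_license, files):
    """Render a partial SPDX tag-value fragment for one package.

    `files` maps a relative file name to a (content_bytes, concluded_license) tuple.
    """
    lines = [
        "SPDXVersion: SPDX-2.3",  # assumed version string
        "DataLicense: CC0-1.0",
        "",
        f"PackageName: {package_name}",
        f"PackageLicenseDeclared: {declared_license}",
        "",
    ]
    for name, (content, concluded) in sorted(files.items()):
        sha1 = hashlib.sha1(content).hexdigest()
        lines += [
            f"FileName: ./{name}",
            f"FileChecksum: SHA1: {sha1}",
            f"LicenseConcluded: {concluded}",
            "",
        ]
    return "\n".join(lines)


# Hypothetical package with two scanned files.
document = spdx_tag_value(
    "demo-package",
    "MIT",
    {
        "src/main.c": (b"int main(void) { return 0; }\n", "MIT"),
        "src/util.c": (b"/* helper */\n", "NOASSERTION"),
    },
)
print(document)
```

Anything intended for exchange with partners should be generated and validated with tooling that follows the full SPDX specification rather than with a hand-rolled writer like this one.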
Remember, application development doesn't just involve the creation of an app; it also includes successfully deploying that application. And the word "success" can mean legal success as much as it does technology success.
Organizations that develop applications can take full advantage of the benefits of open source software while complying with their obligations and managing quality risks. To maintain a complete picture of all open source and other third-party components in a software portfolio and to ensure compliance with organizational policies and open source software licenses, a systematic open source software adoption process or OSSAP is indispensable.
ProtecodeCloud enables organizations to implement key steps of OSSAP, complementing other software development activities within the IBM SmartCloud environment. Over time, policy-based scanning and analysis creates a software inventory for the organization, keeping track of all internal and external software attributes. Various reporting capabilities, such as a complete list of all third-party components, license obligations and license compatibility reports, and reports on security vulnerabilities, export control, and encryption content, satisfy compliance and quality objectives within the organization.
Support for SPDX documents and automatic generation of the text of all licenses within a portfolio can smooth supply-chain interactions and reduce the effort required to comply with the attribution requirements of various licenses. Managed adoption of open source software within IBM SmartCloud, enabled by the ProtecodeCloud scanning solution and its techniques, allows the benefits of open source software to be fully realized without compromising compliance or quality in a software project.