After years and much money invested in technology to record and store data on virtually every transaction and from the vast array of instrumented objects, customers want to get more mileage out of that information. Businesses want information that is more timely and useful, particularly if it can directly and positively affect growth and profitability.
Data analysis spans many problem domains, including retail sales, fraud detection, customer acquisition and retention, security, and financial services, and therefore involves many technologies. This article describes key standards and technologies that support building solutions in these domains, along with the value they deliver.
For years, the IT industry has spent untold time and money creating systems to record data and transactions. In addition, the number of data-producing devices is growing exponentially. Vast storage systems are available to hold this data, and fast networks carry it between data centers and the machines that process it. Businesses want to capitalize on this investment by turning the available data into timely, useful insights that feed growth and profitability.
What is business analytics?
Business analytics is technology that delivers immediate and actionable insights into how a business is performing. It enables you to spot and analyze trends, patterns, and anomalies so you can plan, budget, and forecast resources. The goal is to make smarter decisions that lead to better and more profitable outcomes. The opportunity to create business value through data is enhanced by the sheer volume of available data. The challenge lies in producing analytic output that creates this value in a cost-effective manner. Business analytics refers to the analysis and organization of data and the delivery of meaningful business information on time and in convenient forms. For example, real-time alerts or executive dashboards are forms of presentation that show high-level measurements of corporate performance. By delivering information online, rather than in static reports, business analytic tools allow you to know relevant business facts sooner while allowing you to "drill down" to examine details by clicking a chart to see the numbers behind it.
Business analytics is not a single product or technology, but a technology domain that requires many products to interoperate. An analytics system analyzes data that is likely stored in disparate databases and warehouses in various data formats. In addition, the system might also incorporate real-time data feeds to analyze in conjunction with historical data. While the data is analyzed, rules might be applied, predictive or optimization models incorporated, and different forms of output produced depending on the scenario or problem being solved.
Consider a retail store trying to retain existing customers. The customer's product-buying history might be stored in one database while the customer's transaction history is in another. From this data, the store can glean what types of products are purchased, how much a particular customer has spent on them at different times of the year, how purchasing offers influence buying decisions, and more. The store also has real-time data that is not stored in those databases, such as which products are moving onto and off of its shelves right now, based on live sales data. Using all of this data, a predictive model can be built to estimate, with a stated level of confidence, how likely a particular customer is to purchase incoming or existing products at the store. This model can then be combined with business rules, customer demographics, and historical buying patterns and choices to make intelligent decisions. For example, a store might take action in real time through a special offer at the point of sale, or it might determine the best time to offer and advertise incentives and sales, and whom to target with them. Analytics can yield useful insights into customer trends and behavior and help ensure that customers learn about specific, targeted offers.
A scenario like this combines multiple databases of historical information, real-time data feeds, predictive or optimization models, business rules, and a user interface dashboard, all working in concert even though none of them was necessarily designed or developed to solve this particular problem. Because these products and systems must communicate closely, standards are the best way to handle the complex interactions among them. Customers benefit from standards because they know that their data, rules, predictive models, and so on are stored in an open format or are accessible in an open way, not controlled by a single vendor. Standards give customers the freedom not to be locked into a particular tool set, data format, or protocol. In addition, standards allow disparate systems to work together even though they were not built with each other in mind.
The focus of business analytics is to develop new insights into and understanding of a business by applying statistical methods and analysis to this data, leading to better-informed decisions. Business analytics software can provide these and other actionable insights, for these and other types of problems, by analyzing huge amounts of data in a short period of time.
Analysis of data
Data analysis is not new; however, some of the challenges today include these:
- The vast amount of data that you must (or can) process to produce accurate and actionable results
- The speed at which you need to analyze data to produce results
- The type of data that you analyze—structured versus unstructured
Amount of data
Analytic systems today must be able to handle Internet-scale data volumes. Online data is growing rapidly, and terms like terabyte, petabyte, and exabyte are commonly used. (See Table 1.)
Table 1. Definitions and estimations of data volumes
|Unit|Examples|
|---|---|
|Gigabyte: 1024 megabytes|4.7 gigabytes: a single DVD|
|Terabyte: 1024 gigabytes|1 terabyte: about two years' worth of non-stop MP3s (assumes one megabyte per minute of music); 10 terabytes: the printed collection of the U.S. Library of Congress|
|Petabyte: 1024 terabytes|1 petabyte: the amount of data stored on a stack of CDs about 2 miles high, or 13 years of HD-TV video; 20 petabytes: the storage capacity of all hard disk drives created in 1995|
|Exabyte: 1024 petabytes|1 exabyte: about one billion gigabytes; 5 exabytes: all words ever spoken by mankind|
In 2002, there were about five exabytes of data online. By 2009, that total had increased to 281 exabytes, a 56-fold increase in seven years. According to Forrester Research Inc., the total amount of data warehoused by enterprises is doubling every three years.
Internet-scale refers to data sizes in the terabyte and petabyte range, and to the ability to scale processing to handle that much data in a timely manner. The data to be processed includes both stored data and real-time streaming data. Virtually everything is electronically recorded today: video and audio surveillance, banking transactions, purchasing transactions, email traffic, instant messaging traffic, Internet searches, medical images and records, and more.
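The two growth figures describe rather different rates. A minimal sketch of the arithmetic, using only the numbers quoted above and assuming constant compound growth between the endpoints (the text does not say the growth was smooth):

```python
import math


def annual_growth(start: float, end: float, years: float) -> float:
    """Constant compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)


online = annual_growth(5, 281, 7)   # 5 EB online in 2002 -> 281 EB in 2009
warehoused = 2 ** (1 / 3) - 1       # Forrester: doubling every three years

print(f"online data: {online:.0%} per year, doubling every {doubling_time(online):.1f} years")
print(f"warehoused data: {warehoused:.0%} per year")
```

By these figures, online data grew roughly 78% per year (doubling about every 1.2 years), while warehoused enterprise data grows about 26% per year.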
For example, consider the simple scenario of driving home from work and stopping to buy gas. As you leave your place of work and walk to your car, you are likely recorded on video surveillance cameras. As you drive, your cell phone might be sending GPS location information that is recorded. You then receive a text message while driving home; your carrier stores the time and content of such messages. You wait to answer it until you pull into the gas station, where another set of video surveillance cameras records the activity. Your gas purchase transaction is then recorded, along with the frequent buyer card that you scanned at the pump. The gas station happens to be in a high-crime area that the city is monitoring with technology such as ShotSpotter (see Resources for a link). ShotSpotter uses microphones positioned in various locations to record and listen for gunshots. If a gunshot is heard, authorities are notified immediately and video of the area is captured. Therefore, while you are at the gas station, audio is being analyzed and recorded.
A sizeable portion of the rise in warehoused data is, and will continue to be, attributable to Electronic Medical Records (EMRs). EMRs and advances in medical imaging, along with the length of time the records must be kept (seven years according to U.S. federal law), will continue to drive massive growth in warehoused data, at a scale previously unthinkable. In addition, video and audio feeds are extremely costly to store because of the large volumes collected and their poor compression characteristics. This high volume makes real-time analysis of such data important, because it lets you selectively store only the pertinent parts.
Data is being recorded everywhere from virtually everything that moves, and many things that don't. In addition to a typically recorded transaction, many innocuous objects, such as parking lots, buildings, and street corners, are instrumented and record large volumes of data around the clock.
As the amount of stored data grows constantly and exponentially, so does the amount of data that a business analytics system must process to produce relevant results. Consider that Twitter processes seven terabytes of data every day, while Facebook processes 10 terabytes each day. The CERN Large Hadron Collider generates 40 terabytes every second. Without analytic systems that scale to these volumes, the data collected loses value.
To put this volume in perspective, Yahoo! reported using Hadoop to sort one petabyte of data in about 16 hours (see Resources to learn more about these benchmarks). The sort used about 3,800 nodes, each with two quad-core 2.5 GHz processors. All other things being equal, sorting an exabyte on the same cluster would take about 1,000 times longer (strictly, 1,024 times), or almost two years.
Business analytic systems also process real-time streaming data that has not yet been stored. The speed at which large volumes of stored data and real-time data are processed is critical to producing key insights in a timely manner. In some business analytics use cases, a correct answer delivered too late is effectively the wrong answer. The business analytics system must be able to handle large volumes of data, process them efficiently, and reach a result within a window of time that is relevant to the user. For example, a facial recognition system working off a real-time video feed is far more valuable if it reports that a wanted suspect was at a specific location one minute after the fact rather than one day after.
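The extrapolation can be checked with a few lines of arithmetic. This sketch uses the 16.25-hour figure from the benchmark report listed in Resources and assumes, as the text does, that sort time grows linearly with data volume; because comparison sorts are O(n log n), the linear figure is if anything an underestimate.

```python
# Figures from the Yahoo! petabyte sort benchmark.
PB_SORT_HOURS = 16.25          # one petabyte sorted in 16.25 hours
PETABYTES_PER_EXABYTE = 1024


def extrapolated_sort_hours(petabytes: float) -> float:
    """Naive linear extrapolation of sort time on the same ~3,800-node cluster."""
    return PB_SORT_HOURS * petabytes


hours = extrapolated_sort_hours(PETABYTES_PER_EXABYTE)
print(f"{hours:,.0f} hours = {hours / 24:,.0f} days = {hours / (24 * 365):.1f} years")
```

This prints `16,640 hours = 693 days = 1.9 years`, consistent with "almost two years."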
Structured versus unstructured data
Most data produced today is unstructured. Unstructured means that the data carries no explicit meaning that a computer program can understand; whatever meaning it has is latent. Structured data has semantic meaning attached, which makes it easier for a program to interpret. For example, the following text message or email contains unstructured data:
Hi Joe, call me...my numbers are home – 919-555-1212, office – 919-555-1213, cell – 919-555-1214.
A human reading this message grasps its latent meaning and can tell you what the home, office, and cell numbers are. If you represent the same data in HTML, it looks structured because of its layout and the nested organization of the HTML elements. To an analytical system, however, the data is still unstructured because no meaning is associated with it. HTML, emails, text messages, blogs, video, and audio all represent unstructured information. If the relevant phone number information is put into HTML, you might have this:
<h1>List of Numbers</h1>
<b>HNumber: 919-555-1212</b>
<b>ONumber: 919-555-1213</b>
<b>CNumber: 919-555-1214</b>
The HTML has structure, but not the kind of structure that attaches meaning to the data. As far as an analytics processing system is concerned, this data is still unstructured. Likewise, if you used XML without a schema, the data would be unstructured in the same way as the HTML:
<ListOfNumbers>
  <HNumber>919-555-1212</HNumber>
  <ONumber>919-555-1213</ONumber>
  <CNumber>919-555-1214</CNumber>
</ListOfNumbers>
XML is often referred to as semi-structured: there is structure in the relationships among the data, but not with regard to the meaning of the data. With a schema, the XML above becomes structured, because the schema provides a way to attach meaning to the data: you know that the HNumber, ONumber, and CNumber elements represent the home, office, and cell phone numbers, respectively. Databases also contain structured data; data stored in rows and columns with a schema allows a computer program to understand its meaning.
Some of the value of different analytics products is their ability to process large amounts of unstructured data to discover the latent meaning. Consider the text message, HTML, and schema-less XML examples above. A computer program can figure out that those are likely phone numbers because they match a pattern of three digits, followed by a separator [in the form of a hyphen (-), a period (.), or a space ( )], followed by three more digits, a separator, then four digits. More processing can be done to infer that the three numbers are from North Carolina due to the 919 area code. You can imagine a similar algorithm for an international number with a country code.
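The pattern described above translates directly into a regular expression. A minimal sketch, assuming North American-style numbers; the one-entry area-code table is purely illustrative, and a real system would consult a complete area-code database:

```python
import re
from typing import List, Optional, Tuple

# Three digits, a separator (hyphen, period, or space), three digits,
# another separator, then four digits.
PHONE = re.compile(r"\b(\d{3})[-. ](\d{3})[-. ](\d{4})\b")

# Tiny illustrative lookup; a real system would use a full area-code table.
AREA_CODE_REGION = {"919": "North Carolina"}


def find_phone_numbers(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (normalized number, inferred region or None) for each phone-like match."""
    found = []
    for match in PHONE.finditer(text):
        area, exchange, line = match.groups()
        found.append((f"{area}-{exchange}-{line}", AREA_CODE_REGION.get(area)))
    return found


message = ("Hi Joe, call me...my numbers are home - 919-555-1212, "
           "office - 919-555-1213, cell - 919.555.1214.")
for number, region in find_phone_numbers(message):
    print(number, region)
```

The cell number here uses periods to show that differently separated numbers are normalized to one form; each of the three numbers is reported with the region North Carolina.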
Structured data is simpler to process because the program has more information beforehand about what the data means, which is more efficient than spending compute cycles to infer it. Much of today's data growth, however, is in unstructured data, so it is critical for systems to process it efficiently and to correctly determine the meaning it contains. Emails, text messages, and audio and video streams are among the largest categories of unstructured data today. This type of data continues to grow unabated, making its efficient processing critical to the continued success of business analytic systems.
While the amount, speed, and type of data all challenge business analytic systems, great strides are being made in addressing these issues. Processing of huge datasets that used to take weeks now takes minutes. Real-time feeds can be processed efficiently while the data is still in motion, on scale-out clusters of commodity machines with fail-over capability. This kind of processing enables applications that were unthinkable just a few years ago. Software standards play an important role in realizing the full benefit of this area of computing.
Predictive analytics
Predictive analytics uses historical data from various sources to make predictions about future events or behavior. Each prediction comes with a level of confidence.
Data in motion analytics
Data "in motion" analytics is the analysis of data before it comes to rest on a hard drive or other storage medium. Given the vast amount of data collected today, it is often not feasible to store the data before analyzing it. Even when you have the space to store the data first, storing and then analyzing it takes additional time, and that delay is unacceptable in many use cases.
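The essence of in-motion analysis is that each event is examined as it arrives and only a bounded working set is kept. A minimal sketch using a sliding time window; the product names, window length, and threshold are hypothetical, and a real deployment would read from a live feed rather than a list:

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, product id)


def rolling_sales_alerts(events: Iterable[Event], window_s: float,
                         threshold: int) -> Iterator[Tuple[float, str, int]]:
    """Yield (timestamp, product, count) whenever a product's sales within the
    last `window_s` seconds reach `threshold`. Only events still inside the
    window are held in memory; nothing is written to storage."""
    window: Deque[Event] = deque()
    counts: Dict[str, int] = {}
    for ts, product in events:
        window.append((ts, product))
        counts[product] = counts.get(product, 0) + 1
        # Evict events that have fallen out of the time window.
        while window and window[0][0] <= ts - window_s:
            _, old = window.popleft()
            counts[old] -= 1
        if counts[product] >= threshold:
            yield ts, product, counts[product]


live_feed = [(0, "milk"), (1, "milk"), (2, "bread"), (3, "milk"), (20, "milk")]
print(list(rolling_sales_alerts(live_feed, window_s=10, threshold=3)))
```

Because the function is a generator, it can consume an unbounded stream; memory use depends on the window length, not on the total volume of data.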
Data at rest analytics
Because of the vast amounts of stored data, technology is needed to sift through it, make sense of it, and draw conclusions from it. Much data is stored in relational or OLAP stores, but an increasing share is not stored in a structured manner. With the explosive growth of unstructured data, technology is required to provide analytics on relational, non-relational, structured, and unstructured data sources.
Business rules
Rules define or constrain some aspect of the business so that the system can make more intelligent decisions. Because rules are stored outside the application logic, a business person can easily add or modify them without taking the system offline.
Reporting
Reports take the form of user interface dashboards of varying complexity.
This section describes some of the key standards and their relevance and value to supporting data analysis.
UIMA (Unstructured Information Management Architecture) is an OASIS standard; IBM chaired its technical committee (see Resources). UIMA is a framework for processing unstructured information, discovering the latent meaning, relationships, and relevant facts contained in that data, and representing those findings in an open, standard form. For example, UIMA can be used to ingest plain text and determine the people, places, and organizations it contains, along with relationships such as "is friends with" or "is married to." These findings are represented in a data structure defined by the UIMA standard.
UIMA defines four terms to help in understanding its role and purpose:
- Artifact—A piece of unstructured content
- Analysis—Assigns semantics to an artifact
- Analytic—Software that performs the analysis
- Artifact metadata—The result of analysis of an artifact by an analytic
Consider a large collection of fast food restaurant surveys, which together form a large body of unstructured text. This information is analyzed to find the most common reasons for complaints, to identify the names and locations of the stores with the most complaints, and, for each type of complaint, to see which stores generated the most. You can use UIMA to glean this type of information so you can see trends in complaints, including which complaint types are becoming rarer and which are increasing.
Referring to Figure 1, the raw survey data represents the artifact (1), as it is unstructured content. The analysis assigns meaning to the artifacts (2). For example, stores 15 and 38 have the most complaints about the desserts, while store 27 has reduced its complaints by half since the last survey. The analytic is typically proprietary software that performs this analysis and produces the artifact metadata (3). The artifact metadata is contained in a data structure known as the Common Analysis Structure (CAS).
Figure 1. High-level view of UIMA
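To make the roles concrete, the following sketch mirrors the survey example in plain Python: the survey text is the artifact, a function plays the analytic, and annotation objects are the artifact metadata, held in a toy stand-in for the CAS. This is not the Apache UIMA API (which is Java-based and considerably richer); the class names, keyword list, and survey text are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Annotation:
    """Artifact metadata: a span of the artifact with an assigned meaning."""
    begin: int
    end: int
    kind: str
    value: str


@dataclass
class AnalysisResult:
    """Toy stand-in for a CAS: the artifact plus the metadata found in it."""
    artifact: str
    store: str
    annotations: List[Annotation] = field(default_factory=list)


# Hypothetical keyword-to-category map; a real analytic would use NLP.
COMPLAINT_KEYWORDS = {"dessert": "desserts", "cold": "food temperature", "slow": "service speed"}


def complaint_analytic(result: AnalysisResult) -> None:
    """The 'analytic': annotates the first occurrence of each complaint keyword."""
    text = result.artifact.lower()
    for keyword, category in COMPLAINT_KEYWORDS.items():
        start = text.find(keyword)
        if start != -1:
            result.annotations.append(
                Annotation(start, start + len(keyword), "Complaint", category))


surveys = [AnalysisResult("The dessert was stale.", "Store 15"),
           AnalysisResult("Slow service and a melted dessert.", "Store 38"),
           AnalysisResult("Fries were cold.", "Store 27")]
for survey in surveys:
    complaint_analytic(survey)

complaints = Counter((s.store, a.value) for s in surveys for a in s.annotations)
print(complaints.most_common())
```

Because every analytic writes its findings into the same shared structure, a second, independently developed analytic could read these annotations and add its own, which is the interoperability the CAS is meant to provide.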
One goal of UIMA is to support interoperability of analytics. The CAS allows for the sharing of these results across analytics. This approach benefits customers by allowing them to share the data representations and interfaces between various tools and products that support UIMA. Given the example in Figure 1, an analytic could interoperate with a tool that performs the analysis on the artifacts if both supported UIMA. This ability enables various tools to interoperate and allows customers to choose different vendors for the analysis of their unstructured data.
UIMA supports a common representation of artifacts and artifact metadata that is independent of the artifact's original representation. It also allows platform-independent interchange of artifacts and artifact metadata, and lets you discover, reuse, and compose independently developed analytics. Furthermore, UIMA provides interoperability among independently developed analytics. UIMA is the leading technology in this area and is backed by Apache open source implementations. The 1.0 specification was completed in March 2009, with no further work planned. (For a link to the UIMA specification, see Resources.)
PMML (Predictive Model Markup Language) is an XML-based markup language developed by the Data Mining Group (DMG), to which IBM contributes (see Resources). A PMML document represents a predictive model created by analyzing historical data for various insights.
For example, assume that a telecommunications company wants to analyze historical data to predict, with some level of certainty, whether customers will drop their land-line service in favor of cell service. The algorithm (1 in Figure 2) looks at historical data and produces parameters for an equation across multiple input fields (age, salary, marital status, home owner or renter, level of education, and so on) that best predict whether the customer is likely to drop the service. The algorithm produces a PMML model (2), which is the input to a scoring process (3). The scoring process outputs a prediction (4) of whether a particular customer is likely to drop the service, along with an indicator of the confidence of this prediction. Higher confidence that you will lose a customer might dictate a more aggressive response.
Figure 2. High-level view of PMML
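A minimal sketch of the scoring step (3 and 4 in Figure 2), assuming the model is a logistic regression. The PMML document, its two input fields, and its coefficients are invented for illustration, and the scorer handles only a RegressionModel with NumericPredictor elements and logit normalization; it is not a complete PMML consumer.

```python
import math
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical churn model: field names and coefficients are invented for illustration.
PMML_DOC = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Hypothetical land-line churn model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="tenure_years" optype="continuous" dataType="double"/>
    <DataField name="churn" optype="categorical" dataType="string">
      <Value value="yes"/>
      <Value value="no"/>
    </DataField>
  </DataDictionary>
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="tenure_years"/>
      <MiningField name="churn" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="-1.0" targetCategory="yes">
      <NumericPredictor name="age" coefficient="0.05"/>
      <NumericPredictor name="tenure_years" coefficient="-0.2"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="no"/>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_0"}


def load_table(doc: str, category: str = "yes") -> Tuple[float, Dict[str, float]]:
    """Return (intercept, {field: coefficient}) for one RegressionTable."""
    root = ET.fromstring(doc)
    for table in root.iterfind("pmml:RegressionModel/pmml:RegressionTable", NS):
        if table.get("targetCategory") == category:
            coefficients = {p.get("name"): float(p.get("coefficient"))
                            for p in table.iterfind("pmml:NumericPredictor", NS)}
            return float(table.get("intercept")), coefficients
    raise ValueError(f"no RegressionTable for targetCategory={category!r}")


def score(record: Dict[str, float], doc: str = PMML_DOC) -> float:
    """Probability of churn: logistic transform of the linear predictor."""
    intercept, coefficients = load_table(doc)
    y = intercept + sum(c * record[name] for name, c in coefficients.items())
    return 1.0 / (1.0 + math.exp(-y))


print(f"{score({'age': 60, 'tenure_years': 5}):.2f}")
```

For this hypothetical 60-year-old customer with five years of tenure, the sketch prints `0.73`. The point of the design is that the model arrives as a document: any tool that can produce this XML can hand its model to any scorer that can read it.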
PMML is a model exchange standard for sharing models between vendors. PMML provides applications with vendor-independent models, with the goal that proprietary issues and incompatibilities no longer stand in the way of exchanging models between applications. Users can therefore develop models in one vendor's application and use other vendors' applications to visualize, analyze, evaluate, and use them. Because PMML is an XML-based standard, the specification takes the form of an XML schema.
Industry adoption of PMML is strong, as the following examples of current adopters indicate. (For a link to a web page, see Resources.)
- Augustus / Open Data Group
- Pervasive DataRush
- Salford Systems
RIF (Rule Interchange Format) is a W3C standard whose working group IBM co-chaired. RIF represents the executable form of a business rule in XML. Business rules can be used in business analytic systems in various ways; they determine specific actions that the system takes based on various conditions and inputs. For example, a mortgage lending company would have rules to determine whether a person qualifies for a loan, with factors such as income, debt, and credit score all playing a role. A rule might take this form: if the borrower has income above X, debt less than Y, and a credit score above Z, the borrower qualifies for a given loan amount. Different vendors have their own proprietary ways of writing rules, but RIF provides a common, interoperable format for their executable form.
RIF was designed primarily for the interchange of rules between rule engines. It delivers value by providing interoperability between rule execution systems and preventing lock-in by rule vendors. Users can create their business rules with various tools and still run them on any rule execution system that supports RIF.
RIF became a W3C Recommendation in June 2010, so industry adoption is still developing, as the following list of reference implementations indicates. (For a link to a web page, see Resources.)
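The mortgage rule can be sketched with its parameters kept as data, outside the code that evaluates them, which is what lets a business user change rules without redeploying. This is plain Python and JSON, not RIF syntax; the thresholds standing in for X, Y, and Z and the loan amounts are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

# Rules stored as data (a JSON document), separate from the application code.
# The thresholds (X, Y, Z in the text) and loan amounts are hypothetical.
RULES_JSON = """[
  {"min_income": 60000, "max_debt": 20000, "min_credit_score": 700, "loan_amount": 250000},
  {"min_income": 40000, "max_debt": 10000, "min_credit_score": 650, "loan_amount": 150000}
]"""


@dataclass(frozen=True)
class Borrower:
    income: float
    debt: float
    credit_score: int


def qualifying_amount(b: Borrower, rules: List[Dict[str, int]]) -> int:
    """Largest loan amount among the rules the borrower satisfies, or 0 if none fire."""
    return max(
        (r["loan_amount"] for r in rules
         if b.income > r["min_income"]
         and b.debt < r["max_debt"]
         and b.credit_score > r["min_credit_score"]),
        default=0,
    )


rules = json.loads(RULES_JSON)
print(qualifying_amount(Borrower(75_000, 15_000, 720), rules))
```

This prints `250000`. In a RIF-based system, the rule itself (not just its parameters) would travel in a standard XML form that different rule engines can execute.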
- Oracle (OBR)
- STI Innsbruck (IRIS)
- WebSphere ILOG JRules
These implementations were built against the RIF standard while it was being developed. Several of these companies might implement the full standard, although that is not assured.
XBRL (eXtensible Business Reporting Language) is an XML-based standard from XBRL International for financial reporting. XBRL is relevant because various governments and regulators have mandated or adopted it as the standard format for financial reports. As its use grows, so does the importance of analyzing XBRL documents and the data they contain.
Traditionally, reports are produced in HTML or PDF. These formats are easy for a human to read, but they are not structured. XBRL, by contrast, is structured because it is XML with a well-known schema. It is not very human readable, but a computer program can determine the meaning of the data, which makes the document far more useful for automated analysis.
Recently, the SEC began to require 500 of the largest public companies to file their financial statements using XBRL, and the requirement will gradually expand to include smaller public companies. Companies with a market capitalization above $5 billion began filing in XBRL in 2009; this year, they must also submit financial statements with more detailed tagging of footnotes. Companies with a market capitalization above $700 million must make their initial XBRL submission without detailed tagging of footnotes. Since October 2007, all publicly held Korean firms have been required to file their periodic and other financial reports electronically in XBRL format. In Japan, the Tokyo Stock Exchange (TSE), which accounts for 90% of all trades on Japanese stock exchanges, has required all listed entities to file their financial information in XBRL format since 2008.
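Because every XBRL fact is tagged with a concept, a context (the entity and period), and a unit, a program can pull figures out without guessing. A minimal sketch, assuming a heavily simplified instance document: only the `xbrli` and `iso4217` namespaces are real; the `ex` taxonomy, company, and figures are hypothetical, and real filings use published taxonomies such as US GAAP.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# A heavily simplified XBRL instance; the "ex" namespace and all values are hypothetical.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
    xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="FY2009">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/ids">ACME</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2009-01-01</xbrli:startDate>
      <xbrli:endDate>2009-12-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:Revenues contextRef="FY2009" unitRef="USD" decimals="0">1250000</ex:Revenues>
  <ex:NetIncome contextRef="FY2009" unitRef="USD" decimals="0">150000</ex:NetIncome>
</xbrli:xbrl>"""

EX_NS = "{http://example.com/taxonomy}"


def extract_facts(doc: str) -> Dict[Tuple[str, str], float]:
    """Map (concept name, context id) to the numeric value of each 'ex' fact."""
    facts = {}
    for element in ET.fromstring(doc):
        if element.tag.startswith(EX_NS):
            concept = element.tag[len(EX_NS):]
            facts[(concept, element.get("contextRef"))] = float(element.text)
    return facts


facts = extract_facts(INSTANCE)
margin = facts[("NetIncome", "FY2009")] / facts[("Revenues", "FY2009")]
print(f"net margin FY2009: {margin:.1%}")
```

This prints `net margin FY2009: 12.0%`. Extracting the same figure from a PDF report would require the kind of pattern-based inference described earlier for unstructured data.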
XBRL has been adopted and mandated across several of the most mature world economies. Table 2 identifies several of the XBRL adoptions across the globe.
Table 2. XBRL adoption
|Country|Organization|Use|
|---|---|---|
|Netherlands|Dutch Tax Authority|Corporate tax returns|
|Australia|Australian Prudential Regulation Authority (APRA)|Prudential filings|
|Jamaica|Bank of Jamaica|Financial companies' registered filings|
|United States|Federal Financial Institutions Examination Council (FFIEC)|Call report modernization|
|United States|Securities and Exchange Commission|XBRL voluntary filer program|
|Belgium|National Bank of Belgium|Belgian companies' annual account filings|
|Japan|Bank of Japan|Financial services companies' filings|
|Spain|Bank of Spain|COREP filings|
|Canada|Ontario Securities Commission (OSC)|Voluntary filer program|
|Japan|Tokyo Stock Exchange (TSE)|TSE registrant financial report filings|
Web Ontology Language (OWL) is a high-level language for representing ontologies of information or models. For example, suppose the data states that Joe is a human, is a male, and is married to Jane, and that Sam is a human, is a male, is married to Sue, and is a husband. If the ontology defines a husband as a male human who is married, you can deduce that Joe is also a husband, even though this was never stated. OWL is being explored for this purpose because XML Schema has limited semantics, so deducing similar facts from XML data requires more human involvement. With OWL, you can deduce knowledge programmatically more easily, which makes OWL useful for exchanging models and using them in rule-based systems.
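The deduction about Joe depends on a definition of husband. In OWL, that definition would be written as a class axiom and the inference left to an OWL reasoner; the plain-Python sketch below only mimics that single inference over subject-predicate-object facts, and the definition used (a married male human) is an assumption made for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

FACTS: Set[Triple] = {
    ("Joe", "type", "Human"), ("Joe", "type", "Male"), ("Joe", "marriedTo", "Jane"),
    ("Sam", "type", "Human"), ("Sam", "type", "Male"), ("Sam", "marriedTo", "Sue"),
    ("Sam", "type", "Husband"),
}


def infer_husbands(facts: Set[Triple]) -> Set[Triple]:
    """Apply the definition Husband = Human and Male and marriedTo someone,
    returning the original facts plus any newly deduced Husband facts."""
    inferred = set(facts)
    humans = {s for s, p, o in facts if p == "type" and o == "Human"}
    for person in humans:
        is_male = (person, "type", "Male") in facts
        is_married = any(s == person and p == "marriedTo" for s, p, _ in facts)
        if is_male and is_married:
            inferred.add((person, "type", "Husband"))
    return inferred


new_facts = infer_husbands(FACTS) - FACTS
print(new_facts)
```

The only new fact is that Joe is a husband; Sam's status was already asserted. With OWL, the definition itself is part of the exchanged model, so any compliant reasoner reaches the same conclusion.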
The following depicts a retail scenario that uses the various standards mentioned previously.
Figure 3 shows the high-level components in this scenario. The components consist of:
- Databases that contain historical data (data at rest)
- Feeds of real-time data (data in motion)
- Engines that perform the analytics on that data
- Predictive analytics
- Business rules
- User interfaces using dashboards to display results or alerts, while allowing user interactions
Figure 3. Components of the scenario
Figure 4 shows current and future key integration points between the components in Figure 3, where the standards discussed previously interact and provide interoperability benefits. Historical data is stored in a variety of formats and standards, such as XML, CSV, XLS, PDF, DITA, and XBRL. The analytic engines frequently use UIMA. Predictive analytics and business rules commonly use the PMML and RIF standards, respectively.
Figure 4. Key integration points
The next several figures step through the scenario and explain the value that the standards bring. The standards play an important role, especially when you deploy this type of solution into an existing heterogeneous customer environment. This scenario depicts a large retail store solution that is attempting to use historical and real-time data to increase sales, retain existing customers, and attract new ones.
Figure 5 shows the retail chain's historical data in different databases and stored in various data formats. This scenario includes data such as customer transaction data, preferences, purchasing history, demographic information, survey data, customer call center notes and recordings, and so on. In addition, a real-time data feed is provided. This feed might include data such as up-to-the-minute transactions per store or region, live transaction data per customer or group of customers, live customer call center feeds, video surveillance feeds, products en route to various store locations, and so on.
Figure 5. Historical and real-time data
Each successive figure uses shading to indicate the new portion of the picture that was added. Figure 6 shows Hadoop used for historical data analysis to provide analytics on structured and unstructured data. For example, the analysis of this historical data might reveal information about buying patterns for particular customers, purchasing preferences, attitudes on competing retailers, and more. Note the introduction of the UIMA standard to share the analytical output with other systems to enable interoperability.
Figure 6. Historical data analysis
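Hadoop's MapReduce model splits such an analysis into a map phase, a shuffle that groups intermediate results by key, and a reduce phase. The sketch below runs all three phases in one process to show the shape of a per-customer spending analysis; on a cluster, Hadoop distributes the map and reduce tasks and performs the shuffle itself (with Hadoop Streaming, the mapper and reducer would be separate scripts reading standard input). The record format and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (customer id, product category)

# Hypothetical historical purchase records: "customer_id,category,amount".
RECORDS = [
    "c1,groceries,52.10",
    "c2,electronics,399.99",
    "c1,electronics,24.50",
    "c1,groceries,31.40",
]


def mapper(line: str) -> Iterator[Tuple[Key, float]]:
    """Map phase: emit ((customer, category), amount) for one record."""
    customer, category, amount = line.split(",")
    yield (customer, category), float(amount)


def shuffle(pairs: Iterable[Tuple[Key, float]]) -> Dict[Key, List[float]]:
    """Group values by key; on a cluster, Hadoop performs this step between phases."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: Key, values: List[float]) -> Tuple[Key, float]:
    """Reduce phase: total spend for one customer and category."""
    return key, round(sum(values), 2)


pairs = (pair for line in RECORDS for pair in mapper(line))
totals = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(totals)
```

Because each mapper call sees one record and each reducer call sees one key, both phases parallelize across machines, which is what lets the same logic scale to warehouse-sized data.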
Figure 7 shows the introduction of a real-time analysis engine. These engines can ingest and process real-time in-motion data that is structured or unstructured. In addition, you can feed results from the historical analysis into the real-time engine to help discover additional insights. For example, consider a historical analysis that shows sales of a particular product are best during the weekend days but sluggish otherwise. Furthermore, the real-time analysis shows that the particular product is low in inventory and that the weekend is approaching. An alert can be raised about this situation in hopes of correcting it.
Figure 7 also shows a two-way connection between the real-time analysis engine and the historical data in the databases. The engine might use historical data to correlate with the real-time data, and it might also store data periodically. For example, assume that the real-time data contains audio feeds from customer call centers. You would not want to store every minute of every call, but you might store a random sample of calls for later quality review. Calls in which the system detects an angry customer could also be recorded for later review and analysis.
Figure 7. Real-time data analysis
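The inventory example in Figure 7 combines a pattern learned from historical data with a live measurement. A minimal sketch, assuming the historical analysis has produced each weekday's share of a product's weekly sales; the shares, stock levels, and look-ahead period are hypothetical:

```python
from datetime import date

# Hypothetical result of historical analysis: share of a product's weekly unit
# sales by weekday (Monday=0 ... Sunday=6); weekends sell best.
WEEKDAY_SHARE = {0: 0.08, 1: 0.08, 2: 0.09, 3: 0.10, 4: 0.15, 5: 0.25, 6: 0.25}


def restock_alert(today: date, on_hand: int, weekly_units: int, days_ahead: int = 3) -> bool:
    """True if real-time on-hand stock will not cover the demand expected over
    the next `days_ahead` days, given the historical weekday pattern."""
    expected = weekly_units * sum(
        WEEKDAY_SHARE[(today.weekday() + d) % 7] for d in range(1, days_ahead + 1))
    return on_hand < expected


print(restock_alert(date(2010, 6, 17), on_hand=50, weekly_units=200))  # a Thursday
```

On a Thursday, the next three days carry 65% of weekly sales, so 50 units on hand against an expected 130 raises an alert; the same stock on a Monday would not.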
Figure 8 shows predictive analytics as part of the scenario. Modeling tools can be used to create a predictive model in PMML. This PMML model can be stored in the database and understood by a real-time analysis engine. For example, in this case you might use the predictive PMML model to determine the likelihood that a particular set of facts from the real-time and historical data will lead to a customer switching loyalties and shopping at a competitor. As the real-time analysis engine processes data, it can use this model to score the facts it uncovers. This scoring allows the engine to derive further insights into the data it is processing.
Figure 8. Predictive analytics
Figure 9 shows that you can inject new PMML models into the analysis engine in real time. This capability is powerful because you can create and deploy new models, based on the data currently being collected, while the system is running.
Figure 9. Real-time PMML model injection
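One way to picture model injection: the engine holds a reference to its current model, and deploying a new model simply swaps that reference while events keep flowing. A minimal sketch, assuming a hypothetical engine class and plain callables as models; in practice the callable would wrap a model parsed from a PMML document. This is not the API of any particular product.

```python
import threading
from typing import Callable, Dict

Model = Callable[[Dict[str, float]], float]


class ScoringEngine:
    """Hypothetical real-time engine that scores each event with the current model.
    deploy() replaces the model while events continue to be scored."""

    def __init__(self, model: Model) -> None:
        self._model = model
        self._lock = threading.Lock()

    def deploy(self, new_model: Model) -> None:
        with self._lock:
            self._model = new_model

    def score(self, event: Dict[str, float]) -> float:
        with self._lock:
            model = self._model  # take the current model; scoring runs outside the lock
        return model(event)


engine = ScoringEngine(lambda e: 0.1)                  # initial model
print(engine.score({"visits": 3}))
engine.deploy(lambda e: min(1.0, 0.25 * e["visits"]))  # newly built model
print(engine.score({"visits": 3}))
```

This prints `0.1` and then `0.75`. Holding the lock only while reading the reference keeps a slow model from blocking deployments.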
Figure 10 depicts the introduction of business rules into the scenario. As the real-time analysis engine processes incoming and historical data looking for sales trends, it can invoke rules created with a business rules management system to make additional intelligent decisions. For example, a rule might say: "If customer A, B, or C (part of your Gold customers) hasn't had a purchasing transaction in the last N days, and if their survey data indicates that they may move to a competitor, offer them a specific discount."
Figure 10 also shows the RIF standard, which represents an executable form of a rule. This form lets different vendors' rule systems share rules, so customers are not locked into a particular rule vendor.
Figure 10. Business rules deployment
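The Gold-customer rule combines facts from several sources: the customer tier, the last purchase from transaction data, and a churn signal from survey analysis. A minimal sketch of the rule's logic, assuming hypothetical field names and default values for N and the discount; in the scenario, the rule would be authored in a business rules management system rather than hard-coded.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class CustomerFacts:
    customer_id: str
    tier: str                      # e.g. "Gold"
    last_purchase: date            # from real-time and historical transaction data
    survey_suggests_leaving: bool  # from analysis of survey data


def retention_discount(c: CustomerFacts, today: date,
                       n_days: int = 30, discount: float = 0.15) -> Optional[float]:
    """Fire the rule: a Gold customer with no purchase in the last n_days whose
    survey data suggests a move to a competitor gets `discount`; otherwise None."""
    inactive = today - c.last_purchase > timedelta(days=n_days)
    if c.tier == "Gold" and inactive and c.survey_suggests_leaving:
        return discount
    return None


today = date(2010, 6, 15)
print(retention_discount(CustomerFacts("A", "Gold", date(2010, 5, 1), True), today))
```

This prints `0.15`: customer A has not purchased in 45 days and the survey data flags a possible move to a competitor.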
Figure 11 shows how dashboards and visualization features are used. These features combine the real-time information being processed with historical data stored in traditional or OLAP databases, and surface the results as real-time alerts or informational dashboards.
Figure 11. Dashboards and visualization
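Dashboard "drill down" amounts to re-aggregating the same facts at a finer grain, as an OLAP store does. A minimal sketch with hypothetical sales figures (the store numbers echo the survey example but the amounts are invented):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sales facts: (region, store, amount).
SALES: List[Tuple[str, str, float]] = [
    ("East", "Store 15", 1200.0),
    ("East", "Store 38", 800.0),
    ("West", "Store 27", 950.0),
]


def rollup(rows, depth: int) -> Dict[tuple, float]:
    """Total the amount grouped by the first `depth` dimensions:
    depth=1 gives per-region totals, depth=2 drills down to per-store totals."""
    totals: Dict[tuple, float] = defaultdict(float)
    for *dims, amount in rows:
        totals[tuple(dims[:depth])] += amount
    return dict(totals)


print(rollup(SALES, 1))   # dashboard view
print(rollup(SALES, 2))   # after drilling down
```

The region view shows East at 2000.0 and West at 950.0; drilling down reveals how the East total splits between its two stores.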
With the explosion of collected and available data, coupled with the expectation of gaining new and additional insights from that data, the pressure is on to handle, efficiently process, and make sense of data in volumes previously unimaginable. To achieve these goals requires multiple systems and technologies, both legacy and new, working together. This integration between technologies calls for standards to enable the interoperability required to integrate the data, products, and technologies to efficiently achieve the goals expected by business and consumers.
Resources
- ShotSpotter: Visit the ShotSpotter website.
- PMML Powered: See the list of companies that have adopted PMML, courtesy of the Data Mining Group.
- Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds: Read more about the results and rules for sort benchmarks.
- Implementations - RIF: View a summary of the implementation reports received by the W3C.
- OASIS Unstructured Information Management Architecture (UIMA) TC: Read more about standardizing semantic search and content analytics in the UIMA project and specification at OASIS.
- Apache UIMA: Learn about the Apache UIMA project through its documentation and source code.
- PMML 4.0 - General Structure of a PMML Document: Explore how to use XML to represent mining models in the PMML project and specification at the Data Mining Group.
- RIF: Check out the RIF project and specification at W3C.
- XBRL: Visit the XBRL project and specification at XBRL International to learn more about this language for the electronic communication of business and financial data.
- The Apache Hadoop project: Learn about the Hadoop framework, which allows for the distributed processing of large data sets across clusters.
- OWL Web Ontology Language Overview: Read further about the Web Ontology Language at W3C.