Where to start data mining in wholesale distribution

Apply predictive analytics in distribution from start to big data


Predictive analytics and its associated analytical processes are the subject of much writing, and if you think you're seeing more of it in many different forums, you're correct. These writings aren't restricted to dry, technical discussions, either. You can read about predictive analytics (an updated and expanded term for data mining) in IT and technical magazines and websites, business operations and distribution industry magazines, and even in common news magazines.

Although predictive analytics cannot yet be considered mainstream in the wholesale distribution industry, it is becoming more widely used. As with many technologies, adoption starts with the largest enterprises and flows to mid-sized companies. Concurrent with increasing adoption is the proliferation of tools, both commercial and open source. There are so many tools at this point that anyone not versed in the subject might get lost in the process of picking a tool set.

Defining predictive analytics

To get started, let's look at what predictive analytics is not:

  • Not reporting. Presenting summarized information from a transactional database is useful, but it is not predictive analytics. Predictive analytics uses statistical processes to present business users with information that cannot be gleaned using traditional reporting.
  • Not online analytical processing (OLAP), data cubes, or in-memory databases. Although the advent of nonrelational data storage technologies is a boon for delivering information to business users, it is not predictive analytics. Not to belittle the advancements in performance of in-memory databases and OLAP engines, but merely placing historical information in these formats does not increase insight for the business decision-maker.
  • Not spreadsheets. This one is on the fence. The most popular spreadsheet application does offer statistical functions that go beyond the usual max, min, sum, and average calculations, although few people are familiar with them. It can perform several types of regression that are useful for predicting future trends. That notwithstanding, spreadsheets are severely limited in the amount of data they can handle, the speed at which they can process it, and their ability to apply predictions (that is, to make a prediction on new data and communicate that prediction to others).

As to what predictive analytics is, that can vary depending on whom you speak with. My generic definition is that predictive analytics is the process of analyzing data using automated statistical processes and summarizing the results into useful information. The form of that information can vary greatly as well, but for the distributor, it should be in a format that business decision-makers can act on or that can be coded into applications for automatic inclusion in enterprise resource planning (ERP)-based business logic.

Predictive analytics is useful because your ERP system and other, non-ERP databases hold too much data for a single person to absorb, analyze, and act on. Order history data, customer relationship management (CRM) data, and purchasing and inventory data come in and accumulate in the ERP systems at a pace that is steady and manageable for the servers. You have reports that summarize this information, and executives and line-of-business (LOB) users refer to those reports continually. However, this historical information by itself provides neither predictive nor prescriptive recommendations. That is where predictive analytics goes to work.

The concepts, techniques, and tools that large distributors use can be successfully applied to operations and data at mid-sized distributors. Let's discuss where you can leverage the data in your ERP application using predictive analytics. Then, you can explore tools for performing and deploying predictive analytics and big data as well as how you can use the tools and concepts of big data on unstructured or semi-structured data.

Examples of predictive analytics in distribution

Quick Internet searches with relevant terms yield many examples of applying predictive analytics in different functional departments. Here are a few of my favorites.


A long-used application of predictive analytics is in purchasing optimization. Mid-sized distributors have often installed systems that observe the inventory and order history of individual products to recommend purchasing quantities and schedules. The net effect is to reduce inventory levels.

Seasonality is often a hidden component in overstocking. Predictive analytics spots seasonality trends. More impressively, some distributors use predictive analytics to identify sequential seasonality. For example, a distributor of holiday decorations may find through the predictive analytics process that sales of artificial Christmas trees and sales of strands of decorative lights show the same seasonal trends but are five days apart in their peaks and progression.
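To make sequential seasonality concrete, here is a minimal sketch of one way to look for it: shift one product's daily sales against another's and keep the shift with the highest correlation. The technique (lagged Pearson correlation) is my choice for illustration, not a description of any particular product, and the sales figures, function names, and the 10-day search window are all hypothetical.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_lag(leader, follower, max_lag):
    """Return (lag, correlation) for the day shift at which follower best tracks leader."""
    best = (0, pearson(leader, follower))
    for lag in range(1, max_lag + 1):
        # Compare leader on day d with follower on day d + lag.
        r = pearson(leader[:-lag], follower[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


def season(peak_day, days=60):
    """Hypothetical daily unit sales: a simple triangular seasonal peak."""
    return [max(0, 20 - abs(day - peak_day)) for day in range(days)]


trees = season(peak_day=25)    # made-up sales of artificial trees
lights = season(peak_day=30)   # made-up sales of light strands
lag, r = best_lag(trees, lights, max_lag=10)
print(f"lights follow trees by {lag} days (correlation {r:.3f})")
```

With this synthetic data the search settles on a lag of five days. Real sales histories are noisier, so treat the reported lag as a lead to investigate, not as a purchasing rule.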


Customer credit is always a tough subject to deal with. Once your company has built up a history, you can apply predictive analytics to your CRM and accounts receivable (AR) files to monitor individual customers and groups of customers. Usually, credit is extended to a new customer based on a report from an outside agency. Rarely is that report reviewed again until there is a problem with the customer's AR in days outstanding or total credit extended. Predictive analytics models can examine the history of customers that have gone bad and look for warning signs. Some distributors have combined the AR and CRM files in a predictive model and found that an increasing delay in customers returning phone calls is a severe warning sign.
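As a rough illustration of the callback warning sign, the sketch below fits a least-squares trend line to each customer's callback delays and flags accounts where the delay is climbing. The account IDs, the delay figures, and the threshold of half a day per call are all invented for the example; a production model would combine this signal with the AR data.

```python
def slope(values):
    """Least-squares slope of values plotted against their position (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def flag_accounts(callback_delays, min_slope=0.5):
    """Return account IDs whose callback delay grows by at least min_slope days per call."""
    return sorted(
        account
        for account, delays in callback_delays.items()
        if slope(delays) >= min_slope
    )


# Hypothetical CRM extract: days each customer took to return the last six calls.
callback_delays = {
    "ACME-001": [1, 1, 2, 1, 2, 1],
    "BOLT-002": [1, 2, 3, 5, 6, 8],
    "CRATE-003": [4, 4, 3, 4, 3, 3],
}

flagged = flag_accounts(callback_delays)
print(flagged)
```

Only the account whose delays rise steadily is flagged here. Note that the third account is slow but stable, which is a different situation from one that is getting slower.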


Breaking customers into groups is a natural thing that most organizations do to plan and target. Although a heuristic segmentation is easy to adopt, applying predictive analytics techniques can create a more fine-tuned segmentation system. The segmentation model is then applied to prospective and new customers. Specific uses include getting new customers' ordering patterns to match those of longstanding, good customers in the same segment more quickly.

Another great use of a customer segmentation model in distribution is for customer life cycle management. Knowing how customers progress through different stages of being your customer will help your company design programs and incentives that help maintain those customers.
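One common way to build a data-driven segmentation is k-means clustering, sketched below in plain Python. This is my illustration of the idea, not the method any specific tool uses. The two features (orders per quarter and average lines per order), the customer values, and the starting centroids are hypothetical, and real features on different scales should be normalized first.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to point, that is, the point's segment."""
    return min(range(len(centroids)), key=lambda i: dist2(point, centroids[i]))


def kmeans(points, centroids, iterations=20):
    """Refine the starting centroids until assignments stop changing."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical history: (orders per quarter, average lines per order) per customer.
history = [(2, 1), (3, 2), (2, 2), (12, 8), (11, 9), (13, 10)]
segments = kmeans(history, centroids=[history[0], history[3]])

# Apply the fitted model to a new customer, as described above.
new_customer = (10, 7)
print("new customer falls in segment", nearest(new_customer, segments))
```

The last two lines are the part that matters for the business: once the segments exist, scoring a prospective or new customer is a cheap lookup.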


Some of the concepts of using predictive analytics in marketing carry over directly to the sales department, too. Knowing the life cycle of a customer helps the salesperson identify places where the distributor may be slowly losing business.

Sales departments have many other potential uses for predictive analytics. My favorite is the cross-sell model: an automated or semi-automated system that presents products that the customer is most likely to purchase but has not purchased already. Adding lines to an order is one of the most direct ways distributors can increase margins. True predictive analytics is more than just displaying the top products in a department for cross-selling. Oftentimes, the customer will already be purchasing those. The best cross-sell models act almost as personal shopping assistants in recommending products that have a positive but non-obvious association.

The cross-sell model is also useful in converting lower-value customers to higher value. Think of it as enabling the salesperson to increase what a customer is ordering one item at a time. Such a method is subtle and effective.
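Here is a small sketch of how a cross-sell recommendation can differ from a top-sellers list. It scores product pairs by lift (how much more often two products appear on the same order than chance would suggest) and recommends only items the customer does not already buy. Lift is one standard association measure that I picked for illustration; the order data, product names, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_lift(orders):
    """Lift for every product pair seen together on at least one order."""
    n = len(orders)
    item_counts = Counter()
    pair_counts = Counter()
    for order in orders:
        items = sorted(set(order))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    # lift = P(a and b) / (P(a) * P(b)), rearranged to use whole-number counts.
    return {
        (a, b): both * n / (item_counts[a] * item_counts[b])
        for (a, b), both in pair_counts.items()
    }


def recommend(customer_items, orders, top_n=3, min_lift=1.0):
    """Products not yet bought that are associated (lift > min_lift) with ones that are."""
    scores = {}
    for (a, b), value in pair_lift(orders).items():
        if value <= min_lift:
            continue
        for owned, candidate in ((a, b), (b, a)):
            if owned in customer_items and candidate not in customer_items:
                scores[candidate] = max(scores.get(candidate, 0.0), value)
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical order history: each order is the set of products on it.
orders = [
    {"gloves", "safety-glasses"},
    {"gloves", "safety-glasses", "tape"},
    {"gloves", "tape"},
    {"tape", "boxes"},
    {"tape", "boxes", "labels"},
    {"boxes", "labels"},
]

suggestions = recommend({"gloves", "tape"}, orders)
print(suggestions)
```

In this made-up data, tape is the top seller, but the customer already buys it. The model instead suggests safety glasses, which sell less often overall but show up alongside gloves far more than chance would predict.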

IBM tools and software solutions

As mentioned, there are hundreds of tools that can do some or all of what you need: some old, some new, some commercial, and some open source. If you are looking for an IBM-based tool, there is a clear progression from test to production to big data, which is currently the ultimate application of data mining to business data.

IBM® SPSS® Statistics is the base package you need to start out in predictive analytics. Many courses (maybe you even remember your college stats course) are available to guide you through the processes of analyzing data. As you progress, you add other modules to SPSS. The intent with using SPSS Statistics is to get a base statistics knowledge and be able to apply that knowledge to your data. I can assure you that even though this is the starting step, you and your company will reap a lot of value from the insight into patterns and trends that you discover.

After demonstrating within your company the utility of using statistical analysis to gain insight, you will see the absolute return on investment (ROI) and be easily able to get management to move to the next level. That next level is IBM SPSS Modeler. This is where you will be able to get down and dirty with applying predictive analytics to your data in a production environment.

There are two versions of SPSS Modeler that apply to structured data: Professional and Server. As a mid-sized company, you will almost certainly start with Professional, and then move to Server as you deploy the predictive models into automated processing tasks. (See Related topics for more information.)

Along the way, you will also see the benefits of a data warehouse. Nothing slows down the predictive analytics process, or leads to erroneous models, like bad data. Using a cleansed data source (that is, a data warehouse or data mart) takes a huge step out of a data mining project. IBM helps here by providing a family of data warehouse packs, all within the IBM InfoSphere® product line. There are several warehouse packs you can use, depending on the areas of the company you're working on. Check out InfoSphere Warehouse Packs for Customer Insight, Market and Campaign Insight, and Supply Chain Insight.

The above progression gets you playing with the Fortune 500 in predictive analytics. Read on for a look at the future.

Big data

Big data refers to analysis of unstructured and semi-structured data. In the general discussion, the term often refers to text analytics, including looking at Twitter, Facebook, and other social media sites to glean information. This kind of data does not fit the traditional mold of transaction data and so is different to work with.

Unstructured data can take other forms for the distributor, as well. Think of radio-frequency ID tags in products and their movement records in the warehouse, around the warehouse, and to the customer. If your company runs its own delivery fleet, the truck sensors can provide another source of valuable information to help optimize the delivery process.

Analyzing this mass of unstructured data is the realm of big data.

IBM provides a clear path to big data. Within the InfoSphere product family is the BigInsights product line. This line is built on several open source products, including Apache Hadoop for data storage and several administrative and query languages designed to work with massive amounts of unstructured data.

You can start at no cost by downloading and installing the IBM InfoSphere BigInsights Basic Edition (see Related topics). As you progress in the amount of data you have and the types of analysis you want, you have some choices to make.

The next step for software is IBM InfoSphere BigInsights Enterprise Edition. Here, you can use all the power of the included tools to extract insight. Your choice is where to deploy it.

The underlying technologies in BigInsights are designed to run on commodity hardware (with certain exceptions for some control nodes). Not everyone has a lot of hardware just hanging around, so IBM can also host BigInsights in the IBM cloud. You can configure how many storage and processing nodes you want and pay only for the hourly processing you use.


Predictive analytics will make a difference in your distribution organization. Sifting through the success stories, you see benefits in almost every department. Many times, the application of predictive analytics leads to thousands or millions of small, incremental improvements. Each single improvement is almost unnoticeable, but when multiplied across thousands of customers or millions of transactions, the net for the company is huge.

It's difficult to believe that there will be less data in your company in the future or that there will be less analysis. Getting the skills and knowledge of the tools now is the best way to stay relevant in your organization. Getting into predictive analytics now means you will lead within your company while still being able to lean on the experiences of people outside your organization.

Related topics

