Gain keen insights from big data with Datameer on IBM SoftLayer

How prebuilt analytics apps can change your life

Analyze and visualize data with Datameer

Datameer lets you easily integrate all of your data into Hadoop. It's an end-to-end platform that eliminates the complexity of big data analytics tasks. You can arrive at data-driven decisions in minutes, not months. Datameer is the one-stop shop to get all of your data into Hadoop, analyze that data, and visualize the insights in your preferred format.

The Datameer Analytics App Market is the world's first marketplace for prebuilt analytic applications that let you simply plug in your own data and see the final results graphically. You don't have to build anything.

If your data is collected from distributed sources, varies in structure, grows in scope, and arrives at varying speeds, Datameer can help you achieve data "virtualization." When your data is spread across the cloud, legacy databases, and spreadsheets on your desktop, Hadoop is helpful but not sufficient to make sense of it. With Datameer, you can integrate all of your data into Hadoop by following a wizard. Built-in connectors to all common structured and unstructured data sources streamline big data integration. You simply indicate in Datameer:

  • What data to bring into Hadoop and how
  • Whether it's a one-time import or streamed in as new data is added
  • Whether to import on a schedule that you determine

Analytics

With Datameer, big data analytics is as simple as using a spreadsheet. To build an analysis, use the wizard to:

  • Select which data to work with in a spreadsheet
  • Choose from over 250 prebuilt analytic functions
  • Use iterative point-and-click analytics at the speed of thought with Datameer's Smart Sampling technology

Datameer integrates with multiple Hadoop platforms, such as Cloudera, Hortonworks, and MapR. Datameer uses IBM BigInsights®, which is a dependable and enterprise-ready implementation of Apache Hadoop. Datameer and Cloudera together provide a complete big data analytics solution. With Cloudera's enterprise-scale data hub, you can centralize and cost-effectively store all of your data in its original fidelity in Hadoop. Any standards-compliant big data analytics platform can be attached to the Datameer platform.

Visualization

Data analytics tools help to reveal pragmatic insights that should be presented in a user-preferred format. Datameer's WYSIWYG Business Infographic, packaged with Designer, provides drag-and-drop visualizations regardless of the data type, size, or source. You start with a blank HTML5 canvas to design Infographic reports that will automatically update every time your data updates. You can import any image, embed a video, write free-form text, and customize ad infinitum. Thanks to HTML5, your visualizations are consumable on any device.

System requirements

Recommended hardware for a production environment includes:

  • 1U server
  • 2 quad core CPUs
  • 8+ GB RAM
  • 2 x 1 TB hard drives (recommended available disk space is 250 GB)
  • RAID 0 striping
  • RAID 1 mirroring
  • Redundant power
  • Failover requires a standby server with the same configuration
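
The hardware list above can be turned into a quick preflight check before you install anything. The following is a minimal sketch, assuming a Linux host that provides nproc, /proc/meminfo, and df. The function names and the choice to express "2 quad core CPUs" as 8 cores are illustrative assumptions, not part of Datameer.

```shell
# Preflight sketch: compare a host against the recommended minimums.
# Thresholds mirror the hardware list; all names here are illustrative.
MIN_CORES=8       # 2 quad core CPUs
MIN_RAM_GB=8      # 8+ GB RAM
MIN_DISK_GB=250   # recommended available disk space

# meets_minimum LABEL ACTUAL REQUIRED: print a verdict, return 0 if sufficient.
meets_minimum() {
  if [ "$2" -ge "$3" ]; then
    echo "OK: $1 = $2 (need >= $3)"
  else
    echo "LOW: $1 = $2 (need >= $3)"
    return 1
  fi
}

# kb_to_gb KILOBYTES: convert to whole gigabytes, rounded down.
kb_to_gb() {
  echo $(( $1 / 1024 / 1024 ))
}

# preflight [DIRECTORY]: check cores, RAM, and free disk space under DIRECTORY.
preflight() {
  target=${1:-/usr/local}
  rc=0
  cores=$(nproc)
  # MemTotal excludes memory reserved by the kernel, so a machine sold as
  # 8 GB can report slightly less and round down to 7.
  ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  disk_kb=$(df -Pk "$target" | awk 'NR==2 {print $4}')
  meets_minimum "CPU cores" "$cores" "$MIN_CORES" || rc=1
  meets_minimum "RAM (GB)" "$(kb_to_gb "$ram_kb")" "$MIN_RAM_GB" || rc=1
  meets_minimum "Free disk (GB)" "$(kb_to_gb "$disk_kb")" "$MIN_DISK_GB" || rc=1
  return $rc
}
```

To use it, run preflight /usr/local on the server. The script does not check the RAID layout, redundant power, or the standby server, which need to be verified separately.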

Table 1 shows the supported operating systems for Datameer.

Table 1. Operating systems that support Datameer
Operating system | Version | Comments
Ubuntu 10 | 10.04 LTS | MySQL 5.1.41
Ubuntu 12 | 12.04 LTS | MySQL 5.5
Debian 5 (Lenny) | 5.0.5 | MySQL 5.1.47
Solaris 10 | 10 | MySQL 5.1.30
Red Hat Enterprise Linux (RHEL) | 5.5, 6.x | MySQL 5.0.77
Fedora | 13 | MySQL 5.1.48
Fedora | 14 | MySQL 5.1.60
CentOS | 5.5 | MySQL 5.0.77
CentOS | 6.x | MySQL 5.1.61
Scientific Linux | 6.1 | MySQL 5.1.52

Provisioning a CentOS server on IBM SoftLayer

To provision the virtual machines in the SoftLayer cloud, use the following IP details:

  • Public IP: 158.85.184.55
  • Server IP: 10.122.153.190
  • Server name: datameerpoc.softlayer.com
  • Address: 10.122.153.190 / 158.85.184.55
  • User: root / xxxxx

Installing Datameer

  1. Download Datameer from the Datameer website.
  2. Copy the datameer_apache_1.0.3-4.5.0-1.noarch.rpm file into a directory on the VM by using WinSCP or FileZilla.
  3. Copy the Datameer software to the /usr/local directory, as shown in Figure 1, then give the necessary permissions by entering the following command:
    chmod -R 777 datameer_apache_1.0.3-4.5.0-1.noarch.rpm
    Figure 1. Set file permissions
    command in listing 1 on VM screen
  4. From the VM command line, set the installation location by entering the following command:
    export INSTALL_LOCATION=/usr/local
  5. Before you install Datameer, check whether the Java™ programming language is installed by entering the java -version command.
  6. If the Java language is not installed, install it by entering the following command:
    sudo yum install java-1.7.0-openjdk-devel

    A message displays which version of the Java language was installed along with the dependencies that were installed, as shown in Figure 2.

    Figure 2. Successful Java language installation message
    Java, dependencies installed

    After the installation of the Java language, you can start the Datameer installation.

  7. Expand the archive by entering the following command:
    rpm2cpio datameer_apache_1.0.3-4.5.0-1.noarch.rpm | cpio -idmv

    Files in the archive are listed, as shown in Figure 3.

    Figure 3. Expanded archive
    directory listing

    Refresh the directory path.
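
Steps 4 through 7 can also be collected into one script. The following is a minimal sketch that uses only the commands shown above. The function names and the -y flag (which tells yum to answer yes to its prompts) are additions for illustration, and the script assumes the RPM file is already in /usr/local.

```shell
# Sketch of installation steps 4 through 7 as one function.
# Assumes the RPM from step 2 has already been copied to INSTALL_LOCATION.
RPM=datameer_apache_1.0.3-4.5.0-1.noarch.rpm
export INSTALL_LOCATION=/usr/local

# have_command NAME: return 0 if NAME is an executable on the PATH.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

install_datameer() {
  cd "$INSTALL_LOCATION" || return 1
  if [ ! -f "$RPM" ]; then
    echo "missing $RPM in $INSTALL_LOCATION" >&2
    return 1
  fi
  # Steps 5 and 6: install the JDK only when java is not already present.
  if ! have_command java; then
    sudo yum install -y java-1.7.0-openjdk-devel || return 1
  fi
  java -version
  # Step 7: expand the archive into the current directory.
  rpm2cpio "$RPM" | cpio -idmv
}
```

Run install_datameer from a shell on the VM. The function stops early with a message if the RPM file is not where it expects it.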

Start the Datameer application server

To start the Datameer application server:

  1. Switch to the datameer user and start the server by entering the commands in Listing 1.
    Listing 1. Switch to Datameer user and start server
    su - datameer
    cd /usr/local/Datameer-trial-5.0.1-apache-1.0.3
    cd bin
    ./conductor.sh start
  2. After you start the Datameer server, open a browser session with the URL http://158.85.184.55:8080, which takes you to the Datameer software agreement, as shown in Figure 4.

    Select I agree with the license terms, then click Continue.

    Figure 4. Software agreement
    text of software agreement
  3. You should see the Datameer Dashboard, as shown in Figure 5, which has tabs for Home, Browser, App Market, and Administration.

    On the left side of the Datameer dashboard, the options are Filter, Admin, Analytics, Data, Examples, Images, Users, and Visualization.

    Select Admin on the left of the window.

    Figure 5. Welcome screen
    tutorials to load, analyze, visualize data
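
The application server can take a moment to come up after conductor.sh start returns. Before opening the browser, you can poll the URL from step 2. The following is a minimal sketch, assuming curl is installed. The function name, the retry count, and the delay are illustrative choices.

```shell
# wait_for_http URL [TRIES] [DELAY_SECONDS]
# Poll URL until it answers without an HTTP error, or give up after TRIES attempts.
wait_for_http() {
  url=$1
  tries=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$tries" ]; do
    # --fail makes curl exit non-zero on HTTP errors (status 400 and above).
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      echo "up after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no response from $url after $tries attempt(s)" >&2
  return 1
}
```

For the server in this article, the call would be wait_for_http http://158.85.184.55:8080. A non-zero return code suggests checking that the server started and that port 8080 is reachable.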

Upload the CSV file into the Datameer server

To start uploading the CSV file, click the Browser tab, click the + icon, as shown in Figure 6, then select Data > File upload.

Figure 6. Icon to add items
Icon at upper left above Filter heading
  1. From the New File Upload window, as shown in Figure 7, click Browse. In the File Type field, select CSV/TSV to use our example file, then click Next.
    Figure 7. Specify file type
    specify file type from pulldown
  2. Figure 8 shows the Define Fields tab for our example. The Datameer team provided the data in the sample application. The file shows the ages of people in different cities.
    Figure 8. Define fields
    define fields, rescan schema
  3. On the Data Details tab, you can set the delimiter, schema, and column names, as shown in Figure 9. This article keeps the defaults because the example has no custom schema.
    Figure 9. Data details
    set delimiter/schema/ignore lines, Data Details tab
  4. For the sample size, leave the Sample Records field at 5000 in the Sample tab, as shown in Figure 10, then click Next.
    Figure 10. Sample
    set sample record size, Sample tab
  5. Provide a brief description of the data, as shown in Figure 11, then click Save.
    Figure 11. Save
    describe data, Save tab
  6. Figure 12 shows that the file is loaded successfully in the tool. Select Drop record, leave the other fields as is, then click Next.
    Figure 12. Placeholders
    placeholders, how to handle invalid data
  7. You should see the uploaded file under the Data tab. Specify the file name and click Save. As shown in Figure 13, you can then see all the saved files.
    Figure 13. Saved files
    Saved files
  8. Double-click on the saved file (FileUpload in Figure 13) to see the current status, as shown in Figure 14.
    Figure 14. Current status of file
    last execution, records, preview, total data
  9. Click Link data in new workbook and Browse Data to see the results, as shown in Figure 15.
    Figure 15. Results
    numbered columns with name/age/city
  10. Click Download to see the decision tree, as shown in Figure 16.
    Figure 16. Decision Tree
    create decision tree sheet
  11. Click the Link data in new workbook tab in Figure 14 to see your choices for working with the example data, as shown in Figure 17.
    Figure 17. Options for analysis
    option icons to use smart analytics
  12. Select the Decision Tree Sheet icon, highlighted in the red box in Figure 17, to go to the Settings window shown in Figure 18. Here you can create the sheets or settings you want. Then, drag the columns and drop them in the setting box.
    Figure 18. Settings
    Data and simple, or Advanced settings
  13. Click Create Sheet to see the output, as shown in Figure 19.
    Figure 19. Spreadsheet
    Name/age/city/prediction columns
  14. Select from the toolbar option, highlighted in the red box in Figure 20, to create a Clustering Sheet, Decision Tree Sheet, Recommendation Sheet, Column Dependencies Sheet, and Flip Sheet. (Our example provides just one sample sheet as an introduction to the software.)
    Figure 20. Select sheets
    icons for types of sheets
  15. The data that you loaded will be stored under the Analytics folder in Workbooks, as shown in Figure 21. To see the data, select the Home tab, then select Analytics.
    Figure 21. Stored data
    SkyTestData file highlighted/type .wbk/status ?
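
If you want to try the upload with your own file, it can help to check the CSV before you start the wizard, because rows with the wrong number of fields are the kind of invalid data that the Drop record setting discards. The following is a minimal sketch. The three rows are invented for illustration and are not the Datameer sample data, and the check splits on every comma, so it does not handle quoted fields that contain commas.

```shell
# make_sample_csv FILE: write a tiny name/age/city file (invented rows).
make_sample_csv() {
  cat > "$1" <<'EOF'
name,age,city
Ana,34,Austin
Ben,27,Boston
Cho,45,Chicago
EOF
}

# check_csv FILE EXPECTED_COLUMNS
# Report every line whose field count differs; return non-zero if any do.
check_csv() {
  awk -F',' -v want="$2" '
    NF != want { printf "line %d has %d fields\n", NR, NF; bad = 1 }
    END { exit bad }
  ' "$1"
}
```

For example, make_sample_csv people.csv followed by check_csv people.csv 3 prints nothing and returns 0 when every line has three fields.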

Analyzing the data

To start analyzing data:

  1. In Datameer, click the App Market tab, as shown in Figure 22.
    Figure 22. App Market
    latest/top/installed/My apps choices
  2. Select and install the LinkedIn Pro Network app. Click Authorize Datameer to retrieve data, as shown in Figure 23. You are asked for your LinkedIn credentials.
    Figure 23. Install LinkedIn Pro Network
    Provide OAuth token info

    After providing the relevant details, click OK as prompted. After you log in to the LinkedIn Pro Network, click Save & Run, as shown in Figure 24.

    Figure 24. Save & Run
    example token info, highlighted Save/Run button
  3. Figure 25 shows the first screen of the LinkedIn Pro Network and whether your connections are successful.
    Figure 25. LinkedIn Pro Network
    App ready, results available
  4. Wait until the data is fully loaded, then click Open infographic to see the LinkedIn Statistics screens, as shown in Figure 26. LinkedIn has sorted and visualized your data. For example, you can see how many friends are in your LinkedIn profile, how many mutual friends, where they are in the world, and so on.
    Figure 26. LinkedIn Statistics
    people/companies network, top industries

    Figure 27 shows your friends' locations from all over the world.

    Figure 27. LinkedIn Statistics
    US job locations/countries/job durations

Examples

This section walks through an example from the Datameer App Market.

Click the App Market tab (shown in Figure 22), then install the Tutorial Email Word app. The app gets data from your LinkedIn profile and filters such things as what times you are logged in, how many times you use the program, and so forth.

The time required to load the app varies based on your network speed. When you see Install tutorial Email Word Complexity, click Run. Figure 28 shows the progress of the installation.

Figure 28. Starting Tutorial Email Word Complexity
retrieving data/analyzing, hadoop emails checked

When all of the data has been uploaded into the application, the check mark symbols are green, as shown in Figure 29.

Figure 29. All data uploaded
data/hadoop emails, analytics/email analytics

Click Open Infographic to see the visualization of the email content, as shown in Figure 30.

Figure 30. Infographic
word used together, top words by time of day

To add data sets and link them to each other:

  1. Click the Browser tab (shown in Figure 22).
  2. Click the + icon at the upper left of the window.
  3. Select Analytics > Workbook.

    You should see the Add Data window, as shown in Figure 31.

  4. Select Users > Admin > Applications > Resources, then click Add Data.
Figure 31. Add data
add preview of data to the workbook

From the Simple tab, select partitions to show and download partitioned data, as shown in Figure 32, then click Select All.

Figure 32. Filter by partitions
click section on graph to choose data

The data is loaded, as shown in Figure 33. The columns are filled per the business point of view. You can see the user data from year to year, month to month, day to day, and hour to hour.

Figure 33. Example data
columns of data

Add more data

Add more data by going back to the Add Data window. Select Resources > Customer ..., then click Add Data, as shown in Figure 34.

Figure 34. Add data
select resources, Customer file, click Add Data

As shown in Figure 35, you should see the list of IDs, Users, Email, and so on.

Figure 35. New data
columns for ID/User/Email/Role/Activated/purchase

Here you can join two different data sheets. Click Join Sheet on the toolbar to create a joined sheet, as shown in Figure 36.

Figure 36. Select sheet & column
select sheet, column. drag col to define join

As shown in Figure 37, select remoteUser > User > Clickstream_Data > Customer_Profile/User, then click Create Joined Sheet.

Figure 37. Create joined sheet
new sheet contains data from two or more sheets

Figure 38 shows the combined data sheet.

Figure 38. Combined data sheet
data from 2 sheets based on a key column
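
A joined sheet matches rows from two sheets on a key column, here the user name. Outside Datameer, the same idea can be sketched on two CSV files with awk. This is only an illustration of an inner join, not how Datameer implements it. The file layouts (user name in column 1 of both files, no header row) and the function name are assumptions.

```shell
# join_on_user CLICKS_CSV CUSTOMERS_CSV
# Inner join: keep each click row whose first column (the user) also appears
# in the first column of the customer file, and append the customer's second column.
join_on_user() {
  awk -F',' -v OFS=',' '
    NR == FNR { profile[$1] = $2; next }      # first file read: customers by user
    $1 in profile { print $0, profile[$1] }   # second file read: matching clicks
  ' "$2" "$1"
}
```

Click rows without a matching customer are dropped, which is the behavior of an inner join. The NR == FNR idiom assumes that the customer file is not empty.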

Now that you have joined two sheets, click Add additional Sheet from the current sheet. You should see the Formula Builder window. Select the first column, called Group, highlighted in the red box in Figure 39. Select Grouping and GROUPBY, then click OK.

Figure 39. Formula Builder
select function, create formulas, enter arguments

Select the second column and repeat the previous steps to see the data in Figure 40. The second column is based on the selected objects in the first column. (The second column will display the related attributes of the first column objects.)

Figure 40. Visitors data
visitors, status, traffic, impress...
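
Grouping collects rows that share a value in one column so that other columns can be summarized per group. The following is a minimal sketch of that idea on a CSV file, counting rows per distinct value. It is conceptually similar to a grouping column paired with a count, but it is not Datameer's GROUPBY function, and the function name is an assumption.

```shell
# count_by_column FILE COLUMN
# Print "value,count" for each distinct value in the given 1-based column.
count_by_column() {
  awk -F',' -v col="$2" '
    { n[$col]++ }
    END { for (k in n) print k "," n[k] }
  ' "$1" | sort
}
```

The trailing sort is there because awk does not guarantee the order in which it walks the array of groups.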

To filter the data, click Apply Filter. Select your conditions, then click Create, as shown in Figure 41.

Figure 41. Apply filter to sheet
result contains only record matching conditions
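
A filter keeps only the records that match your conditions. As a rough equivalent outside the tool, the following sketch keeps the rows of a CSV file whose numeric column meets a minimum value. The function name and the single numeric condition are assumptions chosen to keep the example short.

```shell
# filter_min FILE COLUMN MIN
# Keep rows whose 1-based COLUMN, read as a number, is at least MIN.
filter_min() {
  awk -F',' -v col="$2" -v min="$3" '$col + 0 >= min + 0' "$1"
}
```

Adding 0 to both sides forces a numeric comparison, so a value such as 10 is compared as a number and not as text.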

To save the data, click Save from the toolbar, give the file a name, then click Save again, as shown in Figure 42.

Figure 42. Save Workbook
saves workbook in specified folder

All the saved data gets stored in the Workbooks folder, as shown in Figure 43.

Figure 43. Saved data
Sky New Testdata file in Analytics/Workbooks

Visualize the data graphically

To visualize the information, click the + icon at the top left of the window and select Visualization > Infographic. You should see the window shown in Figure 44.

Figure 44. Saved files

Drag the pie chart widget onto the canvas. Drag the data file onto the pie chart to see the results shown in Figure 45.

Figure 45. Infographic
pie chart ABCDE, column data

Click Save from the toolbar, then click Save in the window. Figure 46 and Figure 47 display the items you selected.

Figure 46. Visualize the data graphically
save infographic

From Figure 47 you can select Browser.

Figure 47. Select Browser
save infographic

Administration

If you need to start the application again, enter bin/conductor.sh start.

To stop the application, enter bin/conductor.sh stop.
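
If you start and stop the server often, a small wrapper saves you from changing directories each time. The following is a minimal sketch. It relies only on the start and stop arguments shown above; the restart action is simply a stop followed by a start. The DATAMEER_HOME variable and the function name are assumptions, with the default taken from the directory in Listing 1.

```shell
# Location of the expanded Datameer directory; override it if yours differs.
DATAMEER_HOME=${DATAMEER_HOME:-/usr/local/Datameer-trial-5.0.1-apache-1.0.3}

# datameer_ctl start|stop|restart
datameer_ctl() {
  case "$1" in
    start|stop)
      "$DATAMEER_HOME/bin/conductor.sh" "$1"
      ;;
    restart)
      # Start again only if the stop succeeded.
      "$DATAMEER_HOME/bin/conductor.sh" stop &&
        "$DATAMEER_HOME/bin/conductor.sh" start
      ;;
    *)
      echo "usage: datameer_ctl start|stop|restart" >&2
      return 2
      ;;
  esac
}
```

Run it as the datameer user, for example datameer_ctl restart.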

Conclusion

There are multiple platforms and tools to help extract big insights out of big data, but it's essential to have an end-to-end platform to speed up the analytical process. Datameer is being positioned as the next-generation big data analytics platform for on-premises and off-premises environments. You can mitigate many complexities associated with big data analytics with cloud-based Datameer. Using a sample application, this article showed how to migrate Datameer to the IBM SoftLayer cloud and configure it for optimized performance.


publish-date=11122015