Gain keen insights from big data with Datameer on IBM SoftLayer
How prebuilt analytics apps can change your life
Analyze and visualize data with Datameer
Datameer lets you easily integrate all of your data into Hadoop. It's an end-to-end platform that eliminates the complexity of big data analytics tasks. You can arrive at data-driven decisions in minutes, not months. Datameer is the one-stop shop to get all of your data into Hadoop, analyze that data, and visualize the insights in your preferred format.
The Datameer Analytics App Market is the world's first marketplace for prebuilt analytic applications that let you simply plug in your own data and see the final results graphically. You don't have to build anything.
If you have huge data that is collected from distributed sources, has different structures, has a growing scope, and has varying speed, Datameer can help you achieve data "virtualization." If your data is in the cloud, in legacy databases, and in spreadsheets on your desktop, Hadoop is helpful but not sufficient to make sense out of distributed data. Now, with Datameer, you can integrate all of your data into Hadoop as easily as following a wizard. With built-in connectors to all common structured and unstructured data sources, big data integration is streamlined. You simply indicate in Datameer:
- What data to bring into Hadoop and how
- Whether it's a one-time import or streamed in as new data is added
- Whether the import runs on a schedule that you determine
With Datameer, big data analytics is as simple as using a spreadsheet. To build an analysis, use the wizard to:
- Select which data to work with in a spreadsheet
- Choose from over 250 prebuilt analytic functions
- Use iterative point-and-click analytics at the speed of thought with Datameer's Smart Sampling technology
Datameer integrates with multiple Hadoop platforms such as Cloudera, Hortonworks, and MapR. Datameer uses IBM BigInsights®, which is a dependable and enterprise-ready implementation of Apache Hadoop. Datameer and Cloudera together provide a complete big data analytics solution. With Cloudera's enterprise-scale data hub, you can centralize and cost-effectively store all of your data in its original fidelity in Hadoop. Any standards-compliant big data analytics platform can be seamlessly attached to the Datameer platform.
Data analytics tools help to reveal pragmatic insights that should be presented in a user-preferred format. Datameer's WYSIWYG Business Infographic, packaged with Designer, provides drag-and-drop visualizations regardless of the data type, size, or source. You start with a blank HTML5 canvas to design Infographic reports that will automatically update every time your data updates. You can import any image, embed a video, write free-form text, and customize ad infinitum. Thanks to HTML5, your visualizations are consumable on any device.
Recommended hardware for a production environment includes:
- 1U server
- 2 quad core CPUs
- 8+ GB RAM
- 2 x 1 TB hard drives (recommended available disk space is 250 GB)
- RAID - 0 striping
- RAID - 1 mirroring
- Redundant power
- Failover requires a standby server with the same configuration
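A quick way to compare a candidate server against the list above is a small shell check. The following is a minimal sketch, assuming a Linux host where `nproc`, `/proc/meminfo`, and `df` are available. The function names are hypothetical, and the thresholds (8 cores for 2 quad-core CPUs, 8 GB RAM, 250 GB free disk) are simply read off the recommendations; this is not a Datameer tool.

```shell
#!/bin/sh
# Hypothetical helper: compare a measured value against a recommended minimum.
# Prints an OK or LOW line and returns non-zero when the value is too low.
check_minimum() {
    label=$1; actual=$2; minimum=$3
    if [ "$actual" -ge "$minimum" ]; then
        echo "OK   $label: $actual (recommended >= $minimum)"
    else
        echo "LOW  $label: $actual (recommended >= $minimum)"
        return 1
    fi
}

# Measure the local host and report each item.
# Argument: directory whose file system should hold Datameer (default /usr/local).
# Note: nproc counts logical processors, which can differ from physical cores.
report_hardware() {
    dir=${1:-/usr/local}
    status=0
    cores=$(nproc)
    # MemTotal is in kB; round to the nearest whole GB.
    mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024 + 0.5)}' /proc/meminfo)
    # Column 4 of "df -Pk" is the available space in kB.
    disk_gb=$(df -Pk "$dir" | awk 'NR == 2 {print int($4 / 1024 / 1024)}')
    check_minimum "CPU cores" "$cores" 8 || status=1
    check_minimum "RAM (GB)" "$mem_gb" 8 || status=1
    check_minimum "Free disk in $dir (GB)" "$disk_gb" 250 || status=1
    return $status
}
```

Calling `report_hardware` on the target server prints one line per item. RAID level, redundant power, and failover readiness cannot be verified this way and still need a manual check.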
Table 1 shows the supported operating systems for Datameer.
Table 1. Operating systems that support Datameer
|Operating system||Version||Database|
|Ubuntu 10||10.04 LTS||MySQL 5.1.41|
|Ubuntu 12||12.04 LTS||MySQL 5.5|
|Debian 5 (Lenny)||5.0.5||MySQL 5.1.47|
|Solaris 10||10||MySQL 5.1.30|
|Red Hat Enterprise Linux (RHEL)||5.5, 6.x||MySQL 5.0.77|
|Scientific Linux||6.1||MySQL 5.1.52|
Provisioning a CentOS server on IBM SoftLayer
To provision the virtual machines in the SoftLayer cloud, use the following IP details:
- Public IP:
- Server IP:
- Server name:
root / xxxxx
- Download Datameer from the Datameer website.
- Drag the datameer_apache_1.0.3-4.5.0-1.noarch.rpm file into a directory on the VM by using WinSCP or FileZilla.
- Copy the Datameer software to the /usr/local directory,
as shown in Figure 1, then give the necessary permissions by entering the
following command:
chmod -R 777 datameer_apache_1.0.3-4.5.0-1.noarch.rpm
Figure 1. Set file permissions
- From the VM command line, export the package by entering the following
- Before you install Datameer, check whether the Java™ programming language is installed by
- If the Java language is not installed, install it by entering the following command:
sudo yum install java-1.7.0-openjdk-devel
A message shows which version of the Java language was installed, along with the dependencies that were installed, as shown in Figure 2.
Figure 2. Successful Java language installation message
After the installation of the Java language, you can start the Datameer installation.
- Expand the archive by entering the following command:
rpm2cpio datameer_apache_1.0.3-4.5.0-1.noarch.rpm | cpio -idmv
Files in the archive are listed, as shown in Figure 3.
Figure 3. Expanded archive
Refresh the directory path.
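The Java check and the archive expansion in the steps above can be combined in a short script. This is a sketch under stated assumptions: it detects Java with `command -v java`, which is one common approach and not necessarily the check used in this article, and the function names are hypothetical. The package names are the ones used in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is found on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Print the Java version if Java is present; otherwise install OpenJDK
# with the yum package named in the steps above.
ensure_java() {
    if have_command java; then
        java -version
    else
        sudo yum install java-1.7.0-openjdk-devel
    fi
}

# Expand an RPM into the current directory without installing it,
# using the same rpm2cpio | cpio pipeline as the steps above.
expand_rpm() {
    rpm_file=$1
    if [ ! -f "$rpm_file" ]; then
        echo "Package not found: $rpm_file" >&2
        return 1
    fi
    rpm2cpio "$rpm_file" | cpio -idmv
}
```

From the directory that holds the package, `ensure_java && expand_rpm datameer_apache_1.0.3-4.5.0-1.noarch.rpm` would run both steps and stop if the Java step fails.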
Start the Datameer application server
To start the Datameer application server:
- Switch to the datameer user and start the server by entering the commands in Listing 1.
Listing 1. Switch to Datameer user and start server
su - datameer
cd /usr/local/Datameer-trial-5.0.1-apache-1.0.3
cd bin
./conductor.sh start
- After you start the Datameer server, open a browser session with the
URL http://22.214.171.124:8080, which takes you to the
Datameer software agreement, as shown in Figure 4.
Select I agree with the license terms, then click Continue.
Figure 4. Software agreement
- You should see the Datameer Dashboard, as shown in Figure 5, which
has tabs for Home, Browser, App Market, and Administration.
On the left side of the Datameer dashboard, the options are Filter, Admin, Analytics, Data, Examples, Images, Users, and Visualization.
Select Admin on the left of the window.
Figure 5. Welcome screen
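If the browser cannot connect right after `./conductor.sh start`, the server might still be starting. The following sketch polls the server URL from the command line until it answers. It assumes `curl` is installed on the machine you run it from, and `wait_for_server` is a hypothetical helper name, not part of Datameer.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Arguments: URL, number of attempts, seconds to sleep between attempts.
wait_for_server() {
    url=$1; attempts=$2; delay=$3
    i=1
    while [ "$i" -le "$attempts" ]; do
        # curl exits non-zero when no connection can be established;
        # any HTTP response, whatever its status code, counts as an answer here.
        if curl --silent --output /dev/null --max-time 5 "$url"; then
            echo "Server is answering at $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "No answer from $url after $attempts attempts" >&2
    return 1
}
```

For example, `wait_for_server http://<server-ip>:8080 30 2` retries for about a minute; replace the placeholder with the public IP of your VM.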
Upload the CSV file into the Datameer server
To start uploading the CSV file, click the Browser tab, click the + icon, as shown in Figure 6, then select Data > File upload.
Figure 6. Icon to add items
- From the New File Upload window, as shown in Figure 7, click
Browse. In the File Type field,
select CSV/TSV to use our example file, then click
Figure 7. Specify file type
- Figure 8 shows the Define Fields tab for our
example. The Datameer team provided the data in the sample
application. The file shows the ages of people in different cities.
Figure 8. Define fields
- On the Data Details page, you can enter the Delimiter,
Schema, and Column names from
the Data Details tab, as shown in Figure 9. In this
article, we kept the default data because we don't have any customer
Figure 9. Data details
- For sample size, leave the Sample Records field at 5000
in the Sample tab, as shown in Figure 10, then click
Figure 10. Sample
- Provide a brief description of the data, as shown in Figure 11, then
Figure 11. Save
- Figure 12 shows that the file is loaded successfully in the tool.
Select Drop record, leave the other fields as is,
then click Next.
Figure 12. Placeholders
- You should see the uploaded file under the Data tab.
Specify the file name and click Save. As shown in
Figure 13, you can then see all the saved files.
Figure 13. Saved files
- Double-click on the saved file (FileUpload in Figure 13) to see the current status, as shown in Figure 14.
Figure 14. Current status of file
- Click Link data in new workbook and Browse
Data to see the results, as shown in Figure 15.
Figure 15. Results
- Click Download to see the decision tree, as shown in
Figure 16. Decision Tree
- Click the Link data in new workbook tab in Figure 14 to see your choices for working with
the example data, as shown in Figure 17.
Figure 17. Options for analysis
- Select the Decision Tree Sheet icon, highlighted in the red box in
Figure 17, to go to the Settings window shown in Figure 18. Here you
can create the sheets or settings you want. Then, drag the columns and
drop them in the setting box.
Figure 18. Settings
- Click Create Sheet to see the output, as shown in
Figure 19. Spreadsheet
- Select from the toolbar option, highlighted in the red box in Figure 20, to create a Clustering Sheet, Decision Tree Sheet, Recommendation
Sheet, Column Dependencies Sheet, or Flip Sheet. (Our example
provides just one sample sheet as an introduction to the software.)
Figure 20. Select sheets
- The data that you loaded will be stored under the Analytics folder in
Workbooks, as shown in Figure 21. To see the data, select the
Home tab, then select Analytics.
Figure 21. Stored data
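Before uploading a CSV file as described above, it can help to confirm the delimiter, the column names, and the record count from the command line, so that the values entered on the Data Details and Sample tabs match the file. The sketch below is one way to do that with `awk`. It assumes the first line is a header and that fields contain no quoted delimiters; `csv_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: summarize a delimited text file before uploading it.
# Arguments: file path, delimiter (defaults to a comma).
# Assumes a header on line 1 and no quoted fields that contain the delimiter.
csv_summary() {
    file=$1; delim=${2:-,}
    awk -F"$delim" '
        NR == 1 { columns = NF; header = $0; next }
        NF != columns { ragged++ }          # row whose field count differs from the header
        END {
            printf "header: %s\n", header
            printf "columns: %d\n", columns
            printf "records: %d\n", (NR > 0 ? NR - 1 : 0)
            printf "rows with unexpected field count: %d\n", ragged
        }
    ' "$file"
}
```

If the record count is well above the Sample Records value of 5000, the preview in the wizard shows only part of the file. A non-zero count of unexpected rows suggests that the delimiter is wrong or that some records are malformed.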
Analyzing the data
To start analyzing data:
- In Datameer, click the App Market tab, as shown in
Figure 22. App Market
- Select and install the LinkedIn Pro Network. Click
Authorize Datameer to retrieve data, as shown in Figure 23. You will be asked for your LinkedIn profile
Figure 23. Install LinkedIn Pro Network
After providing the relevant details, click OK as prompted. After you log in to the LinkedIn Pro Network, click Save & Run, as shown in Figure 24.
Figure 24. Save & Run
- Figure 25 shows the first screen of the LinkedIn Pro Network and
whether your connections are successful.
Figure 25. LinkedIn Pro Network
- Wait until the data is fully loaded, then click Open
infographic to see the LinkedIn Statistics screens, as shown
in Figure 26. LinkedIn has sorted and visualized your data. For
example, you can see how many friends are in your LinkedIn profile,
how many mutual friends, where they are in the world, and so on.
Figure 26. LinkedIn Statistics
Figure 27 shows your friends' locations from all over the world.
Figure 27. LinkedIn Statistics
This section walks through an example from the Datameer App Market.
Click the App Market tab (shown in Figure 22), then install the Tutorial Email Word app. The app gets data from your LinkedIn profile and filters such things as what times you are logged in, how many times you use the program, and so forth.
The time required to load the app varies based on your network speed. When you see Install tutorial Email Word Complexity, click Run. Figure 28 shows the progress of the installation.
Figure 28. Starting Tutorial Email Word Complexity
When all of the data has been uploaded into the application, the check mark symbols are green, as shown in Figure 29.
Figure 29. All data uploaded
Click Open Infographic to see the visualization of the email content, as shown in Figure 30.
Figure 30. Infographic
To add data and link them to each other:
- Click the Browser tab (shown in Figure 22).
- Click the + icon at the upper left of the window.
- Select Analytics > Workbook.
You should see the Add Data window, as shown in Figure 31.
- Select Users > Admin > Applications > Resources, then click Add Data.
Figure 31. Add data
From the Simple tab, select partitions to show and download partitioned data, as shown in Figure 32, then click Select All.
Figure 32. Filter by partitions
The data is loaded, as shown in Figure 33. The columns are filled per the business point of view. You can see the user data from year to year, month to month, day to day, and hour to hour.
Figure 33. Example data
Add more data
Add more data by going back to the Add Data window. Select Resources > Customer ..., then click Add Data, as shown in Figure 34.
Figure 34. Add data
As shown in Figure 35, you should see the list of IDs, Users, Email, and so on.
Figure 35. New data
Here you can join two different data sheets. Click Join Sheet on the toolbar to create a joined sheet, as shown in Figure 36.
Figure 36. Select sheet & column
As shown in Figure 37, select remoteUser > User > Clickstream_Data > Customer_Profile/User, then click Create Joined Sheet.
Figure 37. Create joined sheet
Figure 38 shows the combined data sheet.
Figure 38. Combined data sheet
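Conceptually, a joined sheet pairs rows from two sheets that share a key value. The following sketch illustrates an inner join on a shared first column with the standard `sort` and `join` utilities. It is only a command-line analogy for what the Join Sheet wizard produces, not how Datameer implements it. The file layout (comma-separated, key in column 1, no header row) and the helper name are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: inner-join two comma-separated files on their first column.
# join expects both inputs sorted on the join field, so sorted copies are made first.
# LC_ALL=C keeps the sort order and the join comparison consistent.
join_on_first_column() {
    left=$1; right=$2
    left_sorted=$(mktemp)
    right_sorted=$(mktemp)
    LC_ALL=C sort -t, -k1,1 "$left" > "$left_sorted"
    LC_ALL=C sort -t, -k1,1 "$right" > "$right_sorted"
    LC_ALL=C join -t, -1 1 -2 1 "$left_sorted" "$right_sorted"
    status=$?
    rm -f "$left_sorted" "$right_sorted"
    return $status
}
```

Rows whose key appears in only one of the two files are dropped, which is the usual behavior of an inner join.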
Now that you have joined two sheets, click Add additional Sheet from the current sheet. You should see the Formula Builder window. Select the first column, called Group, highlighted in the red box in Figure 39. Select Grouping and GROUPBY, then click OK.
Figure 39. Formula Builder
Select the second column and repeat the previous steps to see the data in Figure 40. The second column is based on the selected objects in the first column. (The second column will display the related attributes of the first column objects.)
Figure 40. Visitors data
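Grouping collapses rows that share a value in the grouping column and computes one result per group. As a rough command-line analogy (not Datameer's implementation), the sketch below groups a comma-separated file by its first column and counts the rows in each group. The count is an aggregate chosen here for illustration; the function applied to the second column in the workbook may differ. `group_count` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: group a comma-separated file by column 1
# and print "key,row count" for each group, sorted by key.
group_count() {
    awk -F, '
        { count[$1]++ }
        END { for (key in count) printf "%s,%d\n", key, count[key] }
    ' "$1" | LC_ALL=C sort -t, -k1,1
}
```

Each distinct value in the first column appears once in the output, followed by the number of input rows that carried it.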
To filter the data, click Apply Filter. Select your conditions, then click Create, as shown in Figure 41.
Figure 41. Apply filter to sheet
To save the data, click Save from the toolbar, give the file a name, then click Save again, as shown in Figure 42.
Figure 42. Save Workbook
All the saved data gets stored in the Workbooks folder, as shown in Figure 43.
Figure 43. Saved data
Visualize the data graphically
To visualize the information, click the + icon at the top left of the window and select Visualization > Infographic. You should see the window shown in Figure 44.
Figure 44. Saved files
Drag the pie chart widget onto the canvas. Drag the data file onto the pie chart to see the results shown in Figure 45.
Figure 45. Infographic
Click Save from the toolbar, then click Save in the window. Figure 46 and Figure 47 display the items you selected.
Figure 46. Visualize the data graphically
From Figure 47 you can select Browser.
Figure 47. Select Browser
If you need to start the application again, enter
To stop the application, enter
There are multiple platforms and tools to help extract big insights out of big data, but it's essential to have an end-to-end platform to speed up the analytical process. Datameer is being positioned as the next-generation big data analytics platform for on-premises and off-premises environments. You can mitigate many complexities associated with big data analytics with cloud-based Datameer. Using a sample application, this article showed how to migrate Datameer to the IBM SoftLayer cloud and configure it for optimized performance.