IBM Support

Austin Meetup on TensorFlow for Machine Learning

Technical Blog Post


Abstract

Austin Meetup on TensorFlow for Machine Learning

Body

Last week, I participated in a special event!

IBM has been holding various "Hackathons" and "Meetups" as a new way to reach out to prospective clients. IBM sponsored a meetup at the Austin Executive Briefing Center (EBC) to discuss Machine Learning with TensorFlow on IBM Power systems, October 26, 2017.

This was a joint event, co-sponsored by [IBM Watson/Cognitive Austin] and [Big Data/AI Revealed] meetup groups. Special thanks to my colleague Cathy Cocco, IBM Executive IT Architect with the IBM Austin EBC, for coordinating this event with their organizers.

meetup_logo
(What is a Meetup? [Meetup.com is an online social networking website that facilitates in-person local group meetings. Meetup allows members to find and join groups unified by a common interest, such as books, games, pets, technology, careers or hobbies. In 2017, there are 32 million users with 280 thousand groups available across 182 countries.)

Here was the agenda for the event:

Time Description
6:30-7:00pm Registration, Pizza & Soft drinks
7:00-7:30pm Tensorflow 101 presentation
7:30-7:45pm Demo: Using TensorFlow for Financial Market Predictions on IBM POWER Systems
7:45-8:00pm Lightning Talk: IBM Data Science Experience
8:00-8:30pm Networking
Clarisse Taaffe-Hedglin: Intro to TensorFlow on IBM Power servers

Our guest speaker was my colleague Clarisse Taaffe-Hedglin, IBM Cognitive Senior Technical Architect, part of the same Worldwide Client Centers team that I work in. She flew in from Charlotte, NC.

Her topic was TensorFlow, an open source [Machine Learning] framework. TensorFlow was originally developed by Google, but was made open source in November 2015.

Machine Learning is popular in a variety of industries, from self-driving cars and trucks, speech recognition and video surveillance, to what movie to watch next on Netflix. There are three aspects to Machine Learning:

one-internet-minute-2017
  • Data: Start with the data you want to analyze. This could be IoT sensor data, security logs, or social media feeds. Check out all that happens in an "Internet Minute"!
     
  • Compute: While mathematical computations can be performed on traditional CPUs, some frameworks are optimized and accelerated with Graphical Processing Units (GPU). These GPU can perform Teraflops of single and double precision calculations.
     
  • Technique: As methodology have gotten more complicated over the years, frameworks have evolved to match.

The [TensorFlow] framework is now one of the most popular among data scientists. You can download it for free at [Github].

Clarisse showed the various programming/calculation tools used by data scientists. The top five were: Python, R, SQL language, MapReduce, and Microsoft Excel.

Mathematical models come in many flavors. Clarisse explained they can be used to identify clusters of data that might have similar properties, or to perform classification, or linear regression. The results can be "descriptive", gaining a better understanding of what already is, or "predictive" for what might be.

Some frameworks like Chainer or Torch are more flexible, using a dynamic Build-by-Run approach. However, these do not scale well. Theano and TensorFlow, on the other hand, employ a Define-then-Run approach, which scales better for larger projects. With the growth in popularity with TensorFlow, the Theano framework has been "functionally stabilized".


 
Clarisse Taaffe-Hedglin: Financial Markets Demo
Austin-Meetup-20171026_191703

For the demo, Clarisse had historical stock closing data for USA, Australia and Asian stock markets. The hypothesis: We can determine a Buy/Sell for USA stocks based on the closing results of non-American stock results? This is a classic "Binary Classification" model. The other stock markets close 4-16 hours before the U.S. markets open, so this has real-world applicability.

Since the data was in different monetary units, she did some cleanup to normalize the data, removing out the trends, and converting everything to U.S. Dollars (USD).

Clarisse used "Supervised Learning" on 80 percent subset of the data, and then used the other 20 percent remaining data to validate how well it did.

As with any model, you measure how good it is by how close it results in the correct answer. Wrong answers are weighted by how bad they are. This is often referred to as "Loss" or "Cost". Different models can therefore be compared by minimizing the loss.

Using a simple y=wx+b mathematical model, she ran 30,000 iterations. After 5,000 iterations, the model was already guessing correctly 55 percent of the time, by the time we hit 30,000 this was up to 68 percent accuracy.

TensorFlow also supports "hidden layers", basically intermediate variables that are then used in subsequent layers for more complicated calculations. This is the way our brain works with neural networks. With two added layers, she re-ran the 30,000 iterations, and now was up to 73 percent accuracy.

Normally, this kind of analysis would take hours or days, but since TensorFlow takes advantage of the IBM Power8 CPU and NVidia Tesla K80 GPU in the IBM Power server, the whole thing finished in five minutes!

The demo is available to IBMers and IBM Business Partners on the [IBM Systems Client Experience Portal].


 
Tuhin Mahmed: Lightning Talk on IBM Data Science Experience (DSX)

Tuhin Mahmed, IBM Software Developer, is the organizer for the Big Data/AI meetup group. He wants to promote the idea of "Lightning Talks" where each person presents for just 10-15 minutes. This is a variant of the popular [Pecha Kucha] events.

To get things started, he presented 10-15 minutes on [IBM Data Science Experience], or DSX for short. Taking Multiple Listing Service (MLS) real estate data of closing prices on houses sold in a range of zip codes from the Austin Area, he mapped these on x-y axis. The x axis was square feet, and the y axis was closing price.

Using DSX, he was able to develop a mathematical model that estimates house closing prices based on their zip code and square footage.

This was a simple example, but it showed the power of Jupyter Notebooks, and how anyone can get a 30-day free trial of DSX for their own experimentation.

Currently, being a data scientist is more of an art than a science. This is one of those fields that takes only a few months to learn, but years to master.

Rather than building a model from scratch, data scientists can take existing models, and modify them to fit their needs. There are a variety of existing models available in what is called the "Model Zoo". Google has over 2,000 projects already.

Those interested in trying this out TensorFlow for themselves were directed to [Nimbix], a Cloud Service Provider that offers POWER servers with NVidia GPUs.

There were about 50 attendees, more than half identified themselves data scientists. As the first inaugural sponsored event for the IBM Austin EBC, I think this was a success!

If you are in the Austin area, the next meetup will be at the [Capital Factory] on Brazos Street on November 30, 2017.

technorati tags: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,

[{"Business Unit":{"code":"BU054","label":"Systems w\/TPS"},"Product":{"code":"HW206","label":"Storage Systems"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB26","label":"Storage"}}]

UID

ibm16156819