As the end of June rushes towards me, I find myself reflecting on my career at IBM, my colleagues at the IBM Research – Almaden lab, and the many other good things that I will miss! I am retiring from IBM on June 30th and will head across country to take on a new challenge: Dean of the new College of Information and Computer Sciences at UMass Amherst.
I started at IBM Research 36 years ago (!), when the research team was down on “plant site” (Cottle Road) in Building 28. The database team at the time was about fifteen people; only Ted Codd, the inventor of relational databases, was an IBM Fellow. IBM had just shipped its first relational database system, SQL/DS. DB2 was in process, but not yet birthed. We worked on big mainframe systems, on CRT (cathode ray tube) terminals, debugging in hexadecimal. Our research projects were focused on distributed databases and highly available systems – topics that turned out to be years ahead of their time, but have become essential to our modern world.
In August, Laura Haas will leave California behind to join the University of Massachusetts Amherst as Dean of the College of Information and Computer Sciences
Fast forward 36 years, and a lot has changed. We sit now in a beautiful lab surrounded by nature; a lab and setting so beautiful that it turns every work day into a magical experience. Deer, turkeys, cows, and our own totally confused roadrunner are just a few of the creatures we share this site with. Walk out the door at dusk to the sound of coyotes baying in the hills, and cares melt away (or at least fade to the background for a while).
IBM researcher Laura Haas was named an IBM Fellow in 2009
The database team has grown and changed with the field. We have produced a dozen more Fellows – I’m proud to be one of them. Now we cover a broad range of interesting topics, and we are embedded in teams throughout the Almaden Lab. While some of us specialize in particular data types (text, for example, or radiological images), or in particular domains (genomic or financial data), my focus has been on the integration of diverse types of data, often from distributed stores.
From R* (pronounced R-star), the first homogeneous distributed relational database, to Starburst, an extensible relational database that could store diverse types of data, came the idea for Garlic – a distributed database for heterogeneous data types and stores. Garlic was one of the first federated database systems, pioneering a type of “lazy” integration, where data was integrated only as needed, in sharp contrast to the reigning integration engines of the day, which built carefully planned warehouses that collected data together “eagerly” – in other words, ahead of time. Warehouses were expensive to plan and build, so Garlic and its ilk provided a valuable, lighter-weight alternative for more dynamic applications, where the data needed might change over time, or where end users were exploring the data.
Laura Haas, center, receives the Anita Borg Institute’s Technical Leadership Award in 2010
Experience with Garlic, and especially, working with clients, made me realize that we needed to make it dramatically easier to do the integration. So I turned my attention from integration engines to integration tools. One challenging problem in integrating data is to specify how data should be combined and what the semantics of the result should be. In thinking about this problem, I wished out loud that I could just draw lines from one schema to the other and the system would figure it out from there. From that rough description, Clio was born – and a whole branch of database theory was developed to support and enable it. Many papers, prototypes, products and awards resulted.
Most recently, with the Accelerated Discovery Lab, we’ve taken the idea of helping people integrate and analyze data to the next level, creating physical and virtual environments and tools to help people gain insights from data faster. With this project, I’ve had the opportunity to work with a diverse group of people (chemists, biologists, computer scientists of all ilks, data scientists, product teams, marketing and business people) to please and delight a diverse group of clients in such industries as retail, agriculture, airlines, finance, mining, healthcare, medical research – and that’s just to name a few!
So a lot has changed, but many things have not – and I hope never will. There are still new and exciting ideas floated in the halls every day. We create and lead projects that can revolutionize science and make the world a better place. We work together across disciplines and boundaries to be “Famous for our science, vital to IBM — and the world.”