Sharing knowledge: finding the right tools to help data scientists communicate with your business
By Greg Filla
Sharing results with Notebooks and Shiny apps
It’s often said that data science is a team sport. Everyone now uses data in their day-to-day work—which means data scientists, data engineers, developers and business analysts need to find effective ways to work together to deliver the insight that the business needs. However, many organizations are still struggling to put the tools and processes in place to support seamless collaboration between these different skill sets.
At a higher level, the data science team also needs to be able to communicate with their fellow data scientists and the rest of the organization, which is not always an easy task. To someone who only has a shallow knowledge of statistics, the work of the data science team may seem arcane and incomprehensible unless it is explained in clear business terms. For data science to be a genuine source of competitive advantage, its findings need to be actionable. If businesspeople can’t understand the results, they won’t be able to use them to support decision-making.
So how can we help data scientists communicate better, both within their own teams, and with stakeholders throughout the organization? The right technology can act as an enabler for more productive collaboration.
Sharing knowledge within the data science team
For data scientists to benefit from working together, they need to be able to understand not only the results of each other’s work, but also the methods that produced them. As in any science, if you can’t reproduce the outcome of an experiment, you can’t validate that the results are correct. Reproducibility is an insurance policy that helps the data science team confirm that the guidance they are giving the business is sound, and not based on faulty research or a freak result.
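To make the idea of reproducibility concrete, here is a minimal sketch of the kind of cell a data scientist might put in a notebook. It is an illustration only: the function name, the seed value, and the simulated measurements are all assumptions, not part of any IBM tooling. The point is simply that when every source of randomness is pinned to an explicit seed, a colleague who re-runs the cell gets exactly the same number.

```python
import random
import statistics


def run_experiment(seed: int, n: int = 1000) -> float:
    """Simulate a small study (hypothetical): draw n noisy measurements
    and return their mean."""
    # A local generator seeded explicitly, so the result does not depend
    # on hidden global state or on the order in which cells were run.
    rng = random.Random(seed)
    samples = [rng.gauss(100.0, 15.0) for _ in range(n)]
    return statistics.fmean(samples)


# Two runs with the same seed produce identical results.
first = run_experiment(seed=42)
second = run_experiment(seed=42)
print("Reproducible:", first == second)
```

Recording the seed alongside the result is a design choice rather than a requirement, but it lets a reviewer tell a genuine finding from a freak result by re-running the same experiment with a different seed.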
Understanding the methodology of other data scientists’ work is also vital to encourage learning and development within the team, and to avoid reinventing the wheel. If two different studies rely on the same dataset, it’s possible some of the data preparation work performed during the first study will also be relevant to the second. By enabling data scientists to build on each other’s efforts instead of starting from scratch with each new project, the team can save time and become more productive.
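As a rough illustration of building on each other’s efforts, the sketch below shows one shared data preparation function being reused by two different analyses. Everything here is hypothetical: the field names ("region", "revenue"), the cleaning rules, and the two example studies are assumptions chosen to keep the example short.

```python
from typing import Dict, Iterable, List


def prepare_records(raw_rows: Iterable[dict]) -> List[dict]:
    """Shared cleaning step: drop incomplete rows, normalize region names,
    and convert revenue to a float."""
    cleaned = []
    for row in raw_rows:
        if row.get("revenue") is None or not row.get("region"):
            continue  # skip rows with a missing revenue or region
        cleaned.append({
            "region": row["region"].strip().lower(),
            "revenue": float(row["revenue"]),
        })
    return cleaned


def total_revenue_by_region(rows: List[dict]) -> Dict[str, float]:
    """First study: total revenue per region."""
    totals: Dict[str, float] = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["revenue"]
    return totals


def average_revenue(rows: List[dict]) -> float:
    """Second study: average revenue per record."""
    return sum(row["revenue"] for row in rows) / len(rows)


raw = [
    {"region": " North ", "revenue": "10"},
    {"region": "north", "revenue": 5},
    {"region": "South", "revenue": None},
]
clean = prepare_records(raw)  # prepared once, used by both studies
print(total_revenue_by_region(clean))
print(average_revenue(clean))
```

Because both studies start from the same prepared records, a fix to the cleaning rules only needs to be made in one place.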
IBM® Data Science Experience offers a toolset designed to help data scientists share their results and methods with their peers. Integrated Jupyter Notebooks allow users to combine datasets and code with visualizations and documentation in a single user-friendly environment.
Using these notebooks, data scientists can not only share the results of their research, but also narrate and demonstrate the whole process of producing those results through embedded Markdown text. Their colleagues can then take the notebook, examine the methodology, review the code, and re-run all the calculations against the data. This makes it much easier to identify and correct any shortcomings, and to ensure the findings are robust before communicating them to the business.
On top of the standard open-source Jupyter Notebook functionality, Data Science Experience provides enterprise features that help you share notebooks more easily and securely. With a few clicks, you can publish a notebook to the company’s data catalog, push it to GitHub, or even share it on social media. At the same time, you can specify which parts of the notebook should be accessible to which users, protecting sensitive data and allowing you to customize the output for different audiences. Once a notebook is published, users can also add comments, which are automatically timestamped and tracked to provide a full history of how the commentary around the notebook has evolved over time.
Sharing knowledge with the business
While Jupyter Notebooks are a very powerful knowledge-sharing tool for members of the data science team, they may not be the best way to communicate with a business audience. Decision-makers often don’t need to know the methods behind the data, and don’t have the statistical expertise to understand them.
If you ask most line-of-business executives how they would prefer to interact with data, it’s likely that they will request the same kind of web-based interactive reports and dashboards that they are accustomed to seeing from their organization’s data warehouse. There’s just one problem: your data scientists should be spending their time working on research projects, not developing business intelligence tools.
That’s where Shiny apps come in. Shiny is a package for the open-source R statistical programming language, and is integrated with RStudio within IBM Data Science Experience. It enables data scientists to create web applications to showcase their results with just a few simple lines of code.
By combining Shiny with flexdashboard, another R package, you can quickly build interactive dashboards that allow users to filter data and change parameters, so that they can answer questions for themselves.
Again, Data Science Experience makes it easy to use Shiny and flexdashboard by providing a simple way to deploy and share these web apps and dashboards. Once the data science team is ready to communicate its results to the business, it can rapidly deploy an app and share links with the rest of the organization.
Democratizing data science
If data science is to truly be a team sport, then you need to give your players the right equipment to succeed. To deprive your data science team of collaboration tools like Jupyter Notebooks would be the equivalent of depriving a football team of its playbook, making it much more difficult to coordinate around a common objective and support each other’s efforts.
To go one stage further, a data science team that fails to communicate with the business is like a sports team playing in an empty stadium. The greatest victories count for nothing if nobody is watching. Shiny apps act like a 100-foot jumbotron, presenting the highlights of your research in super-high definition—and helping you win new fans across the business.
If you want to give your data science team the best possible chance of a winning record, sign up for a trial of IBM Data Science Experience today, and explore how its collaboration and sharing capabilities can help you become more productive.