Weather Company Data Limited Edition |
The Weather Company® |
Included with Cloud Pak for Data |
- About this offering
- 90-day access to cloud-based APIs that enable you to obtain historical weather data, current
conditions, and forecast conditions.
- Use cases
- You can use weather data to optimize operations, reduce overhead costs, increase safety, and
uncover new revenue opportunities. For example, you can:
- Predict power outages with greater accuracy so that you can restore power to customers
faster
- Reduce utility costs with smarter vegetation management
- Improve flight safety, efficiency, and performance
- Keep policyholders safe while reducing insurance claims and fraud
- Improve supply chain visibility and minimize weather-related disruptions
- Transport people and goods more safely
- Industry accelerators
- The following industry accelerators can help you get started with this data set:
- Get started
- For details, see https://www.ibm.com/weather.
|
Document Layout Analysis Data (Image analysis)
|
IBM |
Included with Cloud Pak for Data |
- About this offering
- This offering is comprised of two data sets:
- The PubLayNet data set contains images of research papers and articles, along with annotations
that identify elements such as text, titles, lists, tables, and figures.
File types: JPG, JSON
- The PubTabNet data set contains tables in image and HTML format.
File types: PNG, JSON
- Use cases
- Build models that can:
- Identify the layout of unstructured documents, such as PDF files
- Interpret the structure and content of image-based tables
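The following Python sketch shows one way to summarize layout annotations after you download them. It assumes that the JSON annotations follow a COCO-style structure with `images`, `categories`, and `annotations` keys. The sample document, file name, and bounding boxes are invented for illustration and are not taken from the data set.

```python
import json
from collections import Counter

# Invented sample in a COCO-style layout (an assumption, not real data).
SAMPLE_JSON = """
{
  "images": [{"id": 1, "file_name": "page_0001.jpg", "width": 612, "height": 792}],
  "categories": [
    {"id": 1, "name": "text"},
    {"id": 2, "name": "title"},
    {"id": 4, "name": "table"}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 2, "bbox": [50, 40, 500, 30]},
    {"id": 11, "image_id": 1, "category_id": 1, "bbox": [50, 90, 500, 200]},
    {"id": 12, "image_id": 1, "category_id": 1, "bbox": [50, 300, 500, 150]},
    {"id": 13, "image_id": 1, "category_id": 4, "bbox": [50, 470, 500, 250]}
  ]
}
"""


def count_elements_by_type(annotation_doc):
    """Count annotated layout elements per category name."""
    # Map numeric category IDs to readable names such as "text" or "table".
    names = {c["id"]: c["name"] for c in annotation_doc["categories"]}
    return Counter(names[a["category_id"]] for a in annotation_doc["annotations"])


counts = count_elements_by_type(json.loads(SAMPLE_JSON))
for name, total in counts.most_common():
    print(name, total)
```

A per-category count like this is a quick check that the annotation file loaded correctly before you start training a layout model.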
- Get started
- For details, see:
|
Visual Question Answering Data (Image analysis)
|
IBM |
Included with Cloud Pak for Data |
- About this offering
- This offering is comprised of a single data set.
The VizWiz - Visual Question Answering data
set contains numerous images for training, testing, and validating your model. Each training image
and validation image has a set of questions and answers that are associated with the
image.
File types: JSON
- Use cases
- Build applications that can interpret images for visually impaired people
- Create educational or recreational applications that can generate a set of questions and answers
about an image
- Create an image retrieval system
- Get started
- For details, see the VizWiz - Visual Question Answering data set on the IBM Developer site.
|
Finance and Contract Report Data (Natural language processing)
|
IBM |
Included with Cloud Pak for Data |
- About this offering
- This offering is comprised of two data sets:
- The Contracts Proposition Bank data set contains approximately 1000 compliance sentences from
IBM's publicly available contracts. The sentences focus on 60 predicates that are specific to
contract compliance.
File types: CoNLL-U
- The Finance Proposition Bank data set contains approximately 1000 finance sentences from IBM's
publicly available annual financial reports. The sentences focus on 40 predicates that are specific
to financial reporting.
File types: CoNLL-U
- Use cases
- Build natural language processing models to:
- Analyze contracts to identify content related to agreement terms, intellectual property
protection, limitation of liability, warranty terms, and so on.
- Analyze financial reports to identify content related to market risk, investment outcomes,
financial health, and so on.
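Both data sets use the CoNLL-U file type. The following Python sketch reads CoNLL-U text into per-sentence token lists. It assumes the ten standard tab-separated CoNLL-U columns; the Proposition Bank files might carry additional columns, which this sketch ignores. The sample sentence is invented for illustration and is not taken from either data set.

```python
# The ten standard CoNLL-U columns. Extra columns, if any, are ignored.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

# Invented sample sentence (an assumption, not real data).
SAMPLE = (
    "# text = The supplier shall deliver the goods .\n"
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tsupplier\tsupplier\tNOUN\t_\t_\t4\tnsubj\t_\t_\n"
    "3\tshall\tshall\tAUX\t_\t_\t4\taux\t_\t_\n"
    "4\tdeliver\tdeliver\tVERB\t_\t_\t0\troot\t_\t_\n"
    "5\tthe\tthe\tDET\t_\t_\t6\tdet\t_\t_\n"
    "6\tgoods\tgoods\tNOUN\t_\t_\t4\tobj\t_\t_\n"
    "7\t.\t.\tPUNCT\t_\t_\t4\tpunct\t_\t_\n"
    "\n"
)


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                sentences.append(tokens)
                tokens = []
        elif line.startswith("#"):
            # Sentence-level comments and metadata are skipped.
            continue
        else:
            # zip() drops any columns beyond the ten standard ones.
            tokens.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    if tokens:
        # Handle input that does not end with a blank line.
        sentences.append(tokens)
    return sentences


sentences = parse_conllu(SAMPLE)
verbs = [t["lemma"] for t in sentences[0] if t["upos"] == "VERB"]
print(verbs)
```

Extracting verb lemmas in this way is a possible first step toward locating the predicates that the sentences focus on.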
- Get started
- For details, see the following data sets on the IBM Developer site:
|
Rich Text Data (Natural language processing)
|
IBM |
Included with Cloud Pak for Data |
- About this offering
- This offering is comprised of more than 10 data sets:
- The WikiText-103 data set contains more than 1 million tokens from good or featured
articles on Wikipedia.
File types: TXT
- The Groningen Meaning Bank data set contains more than 1 million multi-sentence texts with
annotations for parts of speech, named entities, lexical categories, and other natural language
structural phenomena.
File types: TXT
- The IBM Debater® Mention Detection Benchmark data
set contains 3000 sentences that are annotated with mentions so that they can be mapped to the
relevant concepts in a knowledge base.
File types: ANN
- The Forum Classify data set contains 100 discussion threads. Each message in a thread is
classified as a question, repeat question, clarification, further details, solution, positive
feedback, negative feedback, or junk.
File types: XML
- The Forum Summarize data set contains more than 113,000 discussion threads. The data set
contains information about the structure and metadata (title, posts, user IDs, and so on) of each
thread.
File types: XML
- The IBM Debater Sentiment Lexicon of IDiomatic
Expressions (SLIDE) data set contains 5000 idioms that are annotated with their sentiment.
File types: TSV
- The IBM Debater Claim Sentences Search data set
contains 1,490,000 sentences. The sentences contain information about a series of preselected
topics. The sentences can be used as claims (phrases that are used to support an
argument).
File types: CSV
- The IBM Debater® Wikipedia Category Stance data set contains information about more than 4600
Wikipedia pages. Each page discusses one of 132 concepts. The pages are annotated with their stance
(for or against) on the concept.
File types: CSV
- The IBM Debater Thematic Clustering of Sentences data set contains 692 Wikipedia articles. The
sentences in each article are annotated with the thematic cluster that they belong to.
File types: CSV
- The IBM Debater Wikipedia Oriented Relatedness data set identifies the relatedness between
different concepts on Wikipedia. The data set is composed of more than 19,000 pairs of related
concepts.
File types: CSV
- The IBM Debater Multi Word Term Relatedness Benchmark data set identifies the relatedness
between multiword terms, including acronyms and named entities. The data set is composed of more
than 9,800 labeled pairs of terms.
File types: CSV
- The Nutch data set includes execution log files from Nutch, an open source web crawler
application. The log files were generated both before and after changes were committed to the
application. The data set highlights the difference in execution
behavior based on the changes that were committed.
File types: CSV, JSON
- The IBM Debater Labeled Emphasized Words in Speech data set contains more than 4000 sentences
from speeches that were given to an audience. The sentences are annotated with the words that were
emphasized most when the sentence was spoken.
File types: TXT
- The IBM Debater Sentiment Composition Lexicons data set contains more than 66,000 words and
262,000 two-word phrases. Each word and phrase is annotated with a positive or negative sentiment
score. For example, absolute bliss has a positive sentiment score, but absolute chaos
has a negative sentiment score.
File types: TXT, XLSX
- The IBM Debater Concept Abstractness data set contains 100,000 words, 100,000 two-word phrases,
and 100,000 three-word phrases that represent concepts. Each word and phrase is annotated with the
degree of abstractness. For example, a bad dream is more abstract than a hammer.
File types: CSV
- Use cases
- Build natural language processing models to:
- Discover the content of documents
- Search the contents of documents
- Classify and organize documents
- Generate article or product recommendations
- Determine the topic of a document
- Retrieve relevant information
- Identify similarities between documents
- Detect plagiarism
- Analyze customer sentiment
- Create plans based on customer sentiment
- Create applications that better emulate human speech by predicting which words should be
emphasized when converting text to speech
- Analyze log files to identify differences in behavior between two versions of an
application
- Get started
- For details, see the following data sets on the IBM
Developer site:
|
Speech Command Data (Audio analysis)
|
IBM |
Included with Cloud Pak for Data |
- About this offering
- This offering is comprised of a single data set.
The TensorFlow Speech Commands data set
contains a set of audio files that contain core words, auxiliary words, or background
noises.
File types: WAV
- Use cases
- Build systems that are capable of recognizing spoken commands. For example, you can build:
- Voice-activated assistants
- Voice-operated IoT devices
- Get started
- For details, see the TensorFlow Speech Commands data set on the IBM Developer site.
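Before you train a model on WAV files, it can help to check the basic properties of each clip. The following Python sketch uses only the standard library `wave` module. The `describe_wav` and `make_silent_wav` helpers are hypothetical names, and the one-second, 16 kHz, mono, 16-bit clip is a generated stand-in for a real file, not a statement about how the data set is encoded.

```python
import io
import wave


def describe_wav(source):
    """Return basic properties of a WAV file, given a path or a file-like object."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def make_silent_wav(seconds=1, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence as a stand-in clip."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buffer.seek(0)               # rewind so the clip can be read back
    return buffer


info = describe_wav(make_silent_wav())
print(info)
```

To inspect a downloaded clip, pass its file path to `describe_wav` instead of the generated buffer.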
|
IBM Debater® Data (Audio analysis)
|
IBM |
Included with Cloud Pak for Data |
- About this offering
- This offering is comprised of three data sets:
- The IBM Debater Recorded Debating #1 data set contains 60 argumentative speeches given by expert debaters. The data set covers 16 controversial topics. The data set also includes transcripts of the recordings and an annotated list of claims that could be used to support the argument.
File types: WAV, CSV, TXT
- The IBM Debater Recorded Debating #2 data set contains 200 argumentative speeches given by expert debaters. The data set covers 50 controversial topics. The data set also includes transcripts of the recordings and an annotated list of claims that could be used to support the argument.
File types: WAV, CSV, TXT
- The IBM Debater Recorded Debating #3 data set contains audio recordings of 400 argumentative speeches given by expert debaters. The data set covers 200 controversial topics. The data set also includes transcripts of the recordings and an annotated list of claims that could be used to support the argument.
File types: WAV, CSV, TXT
- Use cases
- Build systems like IBM Debater that are capable of
understanding and rebutting arguments. For example, you could use the system to understand potential
counterarguments for legal cases, legislation, and public policy.
- Get started
- For details, see the following data sets on the IBM Developer site:
|
Historical Weather Data (Time series analysis)
|
IBM |
Included with Cloud Pak for Data |
- About this offering
- This offering is comprised of a single data set.
The NOAA Weather Data – JFK Airport data set
includes over 114,000 hourly observations of weather data from JFK Airport. The weather data
includes visibility, temperature, wind speed and direction, humidity, dew point, and
pressure.
File types: CSV
- Use cases
- Build models that can generate weather predictions.
- Get started
- For details, see the NOAA Weather Data – JFK Airport data set on the IBM Developer site.
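As a starting point for time series work, you can aggregate hourly observations into daily values. The following Python sketch computes a daily mean temperature from CSV rows by using only the standard library. The column names `timestamp` and `temperature_f`, the timestamp format, and the sample rows are assumptions for illustration; check the header row of the downloaded file for the actual names.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Invented sample rows with hypothetical column names (not real observations).
SAMPLE_CSV = """timestamp,temperature_f,wind_speed_mph
2018-01-01 00:00,20.0,10
2018-01-01 01:00,22.0,12
2018-01-02 00:00,30.0,5
2018-01-02 01:00,,7
2018-01-02 02:00,34.0,6
"""


def daily_mean_temperature(csv_file):
    """Return {ISO date: mean temperature} from hourly rows, skipping gaps."""
    readings = defaultdict(list)
    for row in csv.DictReader(csv_file):
        value = row["temperature_f"].strip()
        if not value:
            # Hourly data often has gaps; skip rows without a reading.
            continue
        day = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M").date()
        readings[day.isoformat()].append(float(value))
    # ISO date strings sort chronologically.
    return {day: sum(v) / len(v) for day, v in sorted(readings.items())}


means = daily_mean_temperature(io.StringIO(SAMPLE_CSV))
print(means)
```

To run the sketch against the downloaded data, open the CSV file with `open(path, newline="")` and pass the file object to `daily_mean_temperature`.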
|
Activity Verification Data (Video analysis)
|
IBM |
Included with Cloud Pak for Data |
- About this offering
- This offering is comprised of a single data set.
The Video-Text Compliance data set is a
series of videos that show atomic activities. The videos are accompanied by text instructions and
compliance labels.
File types: MP4, CSV
- Use cases
- Build models that can determine whether the person being monitored is performing a task
according to an associated set of text-based instructions.
- Get started
- For details, see the Video-Text Compliance data set on the IBM Developer site.
|
Core Science Data (Video analysis)
|
IBM |
Included with Cloud Pak for Data |
- About this offering
- This offering is comprised of a single data set.
The Double Pendulum Chaotic data set is a
series of videos that show the motion of a double pendulum over the course of 21 different runs. The
data set also includes an annotated list of frames.
File types: H.264, CSV
- Use cases
- Build models that can generate spatiotemporal predictions for the behavior of a chaotic system.
- Get started
- For details, see the Double Pendulum Chaotic data set on the IBM Developer site.
|
People data |
People Data Labs |
Separately priced |
- About this offering
- Access more than 1 billion profiles of people from around the world. The data covers more than
150 data points, including professional experience, interests, and social
profiles.
You can purchase a bulk data license or you can purchase access to the APIs.
- Use cases
- Build models that help you:
- Enrich inbound sales
- Identify new prospects
- Deduplicate existing data
- Get started
- For details, see the following pages on the People Data Labs website:
|