IBM co-organizes ISWC Semantic Web Challenge on answer type prediction

The International Semantic Web Conference (ISWC) 2020, the premier international forum for the Semantic Web and Linked Data Community, is being held November 1 – 6, 2020. IBM Research AI is proud to participate in this conference as a platinum sponsor.

This year, IBM Research co-organized a challenge, SeMantic AnsweR Type prediction (SMART), in collaboration with the Smart Data Analytics (SDA) group, Fraunhofer IAIS, the University of Bonn, and the University of Paderborn. The challenge is part of IBM Research's broader effort to create more intelligent, flexible Question Answering (QA) systems through neurosymbolic techniques (i.e., the combination of deep learning and symbolic reasoning).

Each year, ISWC accepts a set of Semantic Web Challenge proposals to establish competitions that advance state-of-the-art solutions in a given problem domain. The SMART challenge is novel in that it uses fine-grained answer type classification over two well-known Semantic Web ontologies, DBpedia (~760 classes) and Wikidata (~50K classes).

In QA, the goal is to answer a natural language question directly, going beyond document retrieval or search. Answer type prediction has proven useful in knowledge base question answering (KBQA), in which the answer is extracted from a structured knowledge base, and in TableQA, in which answers are extracted from tables.

Question or answer type classification plays a key role in QA. Questions can generally be classified by their Wh-terms (Who, What, When, Where, Which, Whom, Whose, Why). Similarly, answer type classification determines the type of the expected answer based on the question. In the literature, answer type classification is typically performed as a short-text classification task over a set of coarse-grained types, for instance, either 6 or 50 types in the TREC QA task.
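
To make the idea of Wh-term-based classification concrete, here is a minimal Python sketch. It is not the method used by any SMART participant. The mapping from Wh-terms to labels is an assumption made for illustration (the labels loosely follow TREC-style coarse classes), and `coarse_answer_type` is a hypothetical helper name.

```python
import re

# Illustrative mapping from Wh-terms to TREC-style coarse labels.
# This mapping is an assumption for the sketch, not the SMART taxonomy.
WH_TO_COARSE = {
    "who": "HUM",
    "whom": "HUM",
    "whose": "HUM",
    "where": "LOC",
    "when": "NUM",
    "why": "DESC",
    "what": "ENTY",
    "which": "ENTY",
}


def coarse_answer_type(question: str) -> str:
    """Return a coarse label for the first Wh-term found, else 'UNKNOWN'."""
    for token in re.findall(r"[a-z]+", question.lower()):
        if token in WH_TO_COARSE:
            return WH_TO_COARSE[token]
    return "UNKNOWN"
```

A rule like this only yields a handful of coarse labels, which is why fine-grained prediction over hundreds or thousands of ontology classes is a harder problem.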

Through this challenge, IBM Research hopes to advance new best practices in the field of QA. A total of eight teams from universities across the United States, Europe and Japan participated in this challenge, and the leaderboards for both datasets are shown below. The participants will present their work on November 5 at the ISWC conference in sessions 6A and 8A.

DBpedia Dataset Leaderboard 

System                    Accuracy  NDCG@5  NDCG@10
Setty et al.              0.98      0.80    0.79
Nikas et al.              0.96      0.78    0.76
Perevalov et al.          0.98      0.76    0.73
Kertkeidkachorn et al.    0.96      0.75    0.72
Ammar et al.              0.94      0.62    0.61
Vallurupalli et al.       0.88      0.54    0.52
Steinmetz et al.          0.74      0.54    0.52
Bill et al.               0.79      0.31    0.30
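
For readers unfamiliar with the NDCG@5 and NDCG@10 columns above, the following Python sketch computes the standard NDCG@k with binary relevance for a ranked list of predicted types. The challenge's official evaluation script may compute gains differently, so treat this as an assumption-laden illustration. The function names and the example class names are hypothetical inputs, and the ranked list is assumed to contain no duplicates.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (rank 1 has discount 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(predicted, relevant, k):
    """Binary-relevance NDCG@k: a predicted type scores 1 if it is in `relevant`."""
    gains = [1.0 if t in relevant else 0.0 for t in predicted]
    # Ideal ranking places all relevant types first, up to k positions.
    ideal_dcg = dcg_at_k([1.0] * min(len(relevant), k), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Example inputs (illustrative only): the correct type is ranked second.
score = ndcg_at_k(["dbo:Person", "dbo:City"], {"dbo:City"}, k=5)
```

Because of the logarithmic discount, a system is rewarded more for placing correct types near the top of its ranked list than for merely including them somewhere in the top k.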


Wikidata Dataset Leaderboard 

System                    Accuracy  MRR
Setty et al.              0.97      0.68
Kertkeidkachorn et al.    0.96      0.59
Vallurupalli et al.       0.85      0.40
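
The MRR column above refers to mean reciprocal rank. The sketch below shows the standard definition in Python: for each question, take the reciprocal of the rank of the first correct type, then average over questions. This is a generic illustration under that assumption, not the challenge's official scoring code, and the function names are hypothetical.

```python
def reciprocal_rank(predicted, relevant):
    """Return 1/rank of the first relevant item in `predicted`, or 0.0 if none."""
    for rank, item in enumerate(predicted, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (predicted_list, relevant_set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(p, r) for p, r in runs) / len(runs)
```

For example, if the first correct type appears at rank 1 for one question, at rank 2 for a second, and not at all for a third, the MRR is (1 + 0.5 + 0) / 3 = 0.5.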



Principal Research Scientist and Manager, AI, Knowledge Induction

Nandana Mihindukulasooriya

Post-Doctoral Researcher, MIT-IBM Watson AI Lab
