Wikipedia is renowned for its thoroughness, its widespread accessibility and the trust it has earned. Key to these qualities is its community-based creation and maintenance. This massive compilation of knowledge, spanning some 300 languages and drawing 25 billion views a month, is a reliable, collaborative and open source of information used by countless people every day.

However, with the rise of AI, making that knowledge accessible to machines posed a new challenge for the organizations that develop and support Wikipedia. Wikidata, the linked, open platform that makes Wikipedia's data available to thousands of developers across the open source landscape, needed to make its massive, multilingual knowledge graph (about 120 million entries and 2.4 billion edits to date) more accessible to and usable by large language models (LLMs).

After test-driving several vector databases, Wikimedia Deutschland, the organization that develops Wikidata, turned to DataStax Astra DB on IBM watsonx.data. Compared with computing vectors locally, the highly scalable, low-latency Astra DB made queries 30 times faster, a critical factor for retrieval augmented generation (RAG) applications. Wikimedia Deutschland also cut development time by 90%, because its development team can now focus on innovation rather than on hosting and maintaining data infrastructure.