Real Web 2.0: The Wikipedia family

Enrich your Web site with the lesser-known fruits of the Wikimedia project

You know Wikipedia, but do you know of the dozens of related sites that provide user-generated content that is just as valuable? Many of the related sites under the Wikipedia umbrella are very useful to Web developers. Learn how to enrich your information space with resources beyond Wikipedia, including examples of widgets applying data from these sites.


Uche Ogbuji, Partner, Zepheira, LLC

Uche Ogbuji is Partner at Zepheira, LLC, a solutions firm specializing in the next generation of Web technologies. Mr. Ogbuji is lead developer of 4Suite, an open source platform for XML, RDF and knowledge-management applications and lead developer of the Versa RDF query language. He is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can find more about Mr. Ogbuji at his Weblog Copia.



04 November 2008


Wikipedia ranks as one of the most popular and well-known Web sites ever. Everyone from kids looking to get a leg up on homework to Web developers tapping the power of user-generated content makes Wikipedia the first stop. But in terms of useful information, Wikipedia is merely the centerpiece of a much larger setting. The Wikimedia Foundation is the organization that runs Wikipedia, and much more. Its home page says: "Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment." That's a big claim, and it takes more than one encyclopedia, however gigantic, to fulfill it. You're probably aware that there are numerous language versions of Wikipedia. (I was surprised and gratified to find a respectable number of Wikipedia articles in Igbo, my father tongue.) But do you know how often the other language versions hold useful information that has not been translated into English? Have you heard of Wiktionary, Wikinews, Wikibooks, Wikisource, Wikiversity, and the like? Have you considered some of the benefits you could gain for your Web project by tapping into this vast pool of information? In this article I'll show you around the greater Wikimedia and present code that helps your own site's users "freely share in the sum of all knowledge".

The sites

Here is a quick summary of the sites in the Wikimedia family, besides the well-known Wikipedia.

Wiktionary

Wiktionary is the dictionary counterpart of Wikipedia. Many have expressed skepticism of the practicality of an open content encyclopedia, and it would seem to be an even more daunting task for a much less glamorous endeavor such as a dictionary. The French version is the largest, in terms of the number of "good" entries, closely followed by the English one, which has by far the most overall entries and edits. After that it's a significant drop to the Turkish version, but there are nine language versions with at least 100,000 "good entries", and many versions with close to that number, adding up to an astonishing body of work. Some of the versions grew by using robots to import entries from free sources, such as the French Wiktionary, which includes many entries copied from old, freely licensed dictionaries, such as the Dictionnaire de l'Académie française. Many Wiktionary entries include translations to other languages, so another trick is to bulk-import translations listed in other language versions. Entries range from stubs with no real content (obviously these are not classified as "good" entries) to rich entries that include etymology, examples of use, pronunciation (in phonetic alphabet and sound files), cross references, synonyms, antonyms, variant grammar forms, translations, and even appearance analyses from important textual bodies such as Project Gutenberg.

Wikinews

Wikinews is an outlet for articles on news and current events, with the idea that people knowledgeable of events and involved in events can collaboratively fill in the relevant pages. The guidelines are that stories should be written from a neutral point of view. Wikinews can include stories, multimedia reports, interviews, and more. Coming soon is Wikimedia Radio, eventually to be a constant streaming audio broadcast of various programs and news, drawn largely from Wikinews and other Wikimedia projects. Naturally, Wikinews coverage tends to be slanted towards regions and topics with many interested contributors, which does not lend itself to being comprehensive. In addition, Wikipedia's popularity means that there are usually rapid updates to its articles, even at a pace suitable for news articles, which has often stolen thunder from the Wikinews project.

Wikibooks and Wikiversity

The obvious expansion of an encyclopedia article is to a full book on the topic, and this is the domain of Wikibooks. It includes Wikijunior, a collection of text for children and child education, which might become its own full project soon. Wikiversity was also once a subsection of Wikibooks, but has since become a full Wikimedia site. Wikiversity encourages learning in a group or community setting, with participants editing learning project pages in accompaniment to any hands-on activities that support understanding. Organized into faculties, it focuses on the many support resources that combine with textbooks in an educational setting. Wikibooks hosts the textbooks and also supports collaborative community development, with outlines of Wiki pages getting expanded piecemeal into full books. Books and faculties range from learning languages to computer science, from organic chemistry to law. Educators in the biological sciences should also take note of Wikispecies, a taxonomic directory of life forms, like a modestly structured Wikipedia of organisms.

Wikisource

Working back from all these secondary information sites to original documents, Wikisource, also known as The Free Library, gathers source texts, annotations, translations, and supporting materials. The texts can be works of fiction or non-fiction, historical records, civic documents, or anything else noteworthy and free from copyright restrictions.

Wikiquote

Wikiquote is an open reference site for quotations from history and culture, in multiple languages. There has been some recent controversy about Wikiquote, with some arguing it should be disbanded due to objectionable content and copyright violations. Some think quotations should be folded into the role of Wikisource. Many others, however, think that if there are any content issues at Wikiquote, the community should at least try to resolve them before taking the drastic step of disbanding a wiki. Certainly there seems to be no likelihood that this will happen any time soon.

Wikimedia Commons

Wikimedia Commons is a companion site for the Wikimedia family that hosts images, video, audio, and any other free media files. It's a large repository, containing millions of files. It's also intended to be a cultural repository of such media and seeks to further this through categorization and recognition of notable images.

Working across Wikimedia

The breadth and height of activity in the Wikimedia space opens up many opportunities for cross-pollination and useful applications beyond what the foundation itself provides. This is the spirit of Web 2.0. Users can take presently unintegrated streams of open data and turn them into fresh applications beyond the imagination or ambition of the original publishers.

Google custom search widget

It's not obvious at present how you might search across all the various Wikimedia properties. Some separate projects provide such an aggregate search, but they have various levels of usefulness, and there is no reason not to roll your own. One of Google's initiatives, Google Co-op, includes a custom search engine (CSE) tool which allows you to define and create a search facility tailored to your specifications, and even curated by you through detailed annotations. I created a custom search, "Wikimedia plus", including all the supported languages, and some other interesting, related sites. The main criteria for that search are in Listing 1.

Listing 1. Custom search criteria
*.wikipedia.org/*
*.wiktionary.org/*
*.wikibooks.org/*
*.wikiversity.org/*
*.wikinews.org/*
*.wikimedia.org/*
*.wikiquote.org/*
*.wikisource.org/*
*.wikia.com/*
*.uncyclopedia.org/*

Notice the wildcard form, one of the features of CSE. Google provides a widget form for CSEs. See Listing 2 for a usage example.
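
To get a rough feel for what the wildcard patterns in Listing 1 cover, here is a minimal Python sketch that checks URLs against them using shell-style matching from the standard library. It only approximates the idea: that Google's own matching behaves this way is an assumption, and the helper name in_wikimedia_plus is hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Patterns copied from Listing 1
PATTERNS = [
    "*.wikipedia.org/*",
    "*.wiktionary.org/*",
    "*.wikibooks.org/*",
    "*.wikiversity.org/*",
    "*.wikinews.org/*",
    "*.wikimedia.org/*",
    "*.wikiquote.org/*",
    "*.wikisource.org/*",
    "*.wikia.com/*",
    "*.uncyclopedia.org/*",
]


def in_wikimedia_plus(url):
    """Return True if the URL's host and path match any Listing 1 pattern.

    Assumption: shell-style wildcards approximate the CSE pattern syntax.
    """
    parts = urlsplit(url)
    # Compare against "host/path", treating an empty path as "/"
    target = parts.netloc.lower() + (parts.path or "/")
    return any(fnmatchcase(target, pattern) for pattern in PATTERNS)


if __name__ == "__main__":
    for url in ("http://en.wiktionary.org/wiki/word",
                "http://www.example.com/wiki/word"):
        print(url, in_wikimedia_plus(url))
```

Because each pattern starts with "*.", a subdomain such as a language code is required before the project's domain name.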

Listing 2. Example usage of the "Wikimedia plus" search engine widget
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
<head>
<title>Hello Wikimedia world</title>
</head>
<body>
<p>Use the following gadget to search Wikimedia and selected Wikia sites.</p>
<hr/>
<!-- Remove line breaks in the following script element before using -->
<script src="http://www.gmodules.com/ig/ifr?url=
http://www.google.com/coop/api/016404950850373629424/cse/1bius8lhc7g/gadget
&amp;synd=open&amp;w=320&amp;h=75&amp;title=Wikimedia+plus
&amp;border=%23ffffff%7C3px%2C1px+solid+%23999999&amp;output=js"></script>
</body>
</html>

I split up Google's long URL into several lines for formatting purposes. Just remove line breaks in the script element before trying or adapting the listing. Figure 1 is a snapshot of the resulting page. Google's JavaScript replaces the element with an iframe containing the search form. When you enter a search, the results open in a separate window.
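
If you would rather not join the lines by hand, the single-line src value can be assembled from its parts. The following is a minimal Python sketch, not part of the original demo; the parameter values are taken from Listing 2, while the names GADGET_BASE, GADGET_PARAMS, and gadget_script_src are hypothetical.

```python
from html import escape
from urllib.parse import urlencode

GADGET_BASE = "http://www.gmodules.com/ig/ifr"

# Query parameters taken from the script URL in Listing 2
GADGET_PARAMS = {
    "url": "http://www.google.com/coop/api/016404950850373629424/cse/1bius8lhc7g/gadget",
    "synd": "open",
    "w": 320,
    "h": 75,
    "title": "Wikimedia plus",
    "border": "#ffffff|3px,1px solid #999999",
    "output": "js",
}


def gadget_script_src(params=GADGET_PARAMS):
    """Build the one-line src value, with & escaped for use in XHTML."""
    # safe=":/" leaves the nested gadget URL unescaped, as it is in Listing 2
    query = urlencode(params, safe=":/")
    return escape(GADGET_BASE + "?" + query)


if __name__ == "__main__":
    print('<script src="%s"></script>' % gadget_script_src())
```

Keeping the parameters in a dictionary also makes it easier to adjust the width, height, or title without hand-editing an encoded URL.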

Figure 1. Sample page using the Google Wikimedia plus widget

Word of the day widget

As a second example I'll show how you can create your own widget using the word-of-the-day feed on Wiktionary. Each day the editors feature an interesting word. This widget combines the chosen word with the results of a search for that word on Wikimedia Commons (images, sound files, videos, and so on). Listing 3 (wotd.js) is the widget JavaScript code.

Listing 3 (wotd.js). JavaScript word-of-the-day widget
/*
Word of the day widget code.

Requires jQuery: http://www.jquery.com (tested with version 1.2.6)
*/

var wotd =
{
    //Some variables global to the wotd namespace
    wotdfeedurl: "wotd_feed_proxy",
    commonssearch: "commons_search_proxy/",
    commonsbase: "http://commons.wikimedia.org",
    mediatargetid: "target",
    wotdtargetid: "wotd",

    word: "",
    worddesc: "",
    wiktionarylink: "",

    //Invoked once the main HTML DOM is ready
    loadPage: function()
    {
        wotd.medianode = $("#" + wotd.mediatargetid);
        wotd.wotdnode = $("#" + wotd.wotdtargetid);
        $.get(wotd.wotdfeedurl, wotd.processFeed);
    },

    //Invoked with the result of the AJAX call to the Wiktionary feed
    processFeed: function(feed)
    {
        var item = $("item:first", feed);
        var title = item.find("title").text();
        //Keep the text after the colon and space in the item title
        wotd.word = title.substring(title.indexOf(':')+2);
        wotd.wiktionarylink = item.find("link").text();

        //Update the target spot on the main page with the word of the day link
        wotd.wotdnode.append(wotd.word);
        wotd.wotdnode.attr("href", wotd.wiktionarylink);

        $.get(wotd.commonssearch + wotd.word, wotd.processSearch);
    },

    //Invoked with the result of the AJAX call to the Wikimedia Commons search
    processSearch: function(result)
    {
        //Narrow in on the search results page
        var narrowed = $(result).find(".mw-search-results");
        //Fix up relative link and image URLs
        narrowed.find("a, link").attr("href", function (arr) {
            return wotd.commonsbase + $(this).attr("href");
        });
        narrowed.find("img").attr("src", function (arr) {
            return wotd.commonsbase + $(this).attr("src");
        });
        //Update the target spot on the main page with the search results
        narrowed.find("table td a").each(function (){
            $('<div></div>').html($(this)).appendTo(wotd.medianode);
        });
    }
};

$(document).ready(function()
{
wotd.loadPage();
});

The code is well commented. It uses the jQuery library for Ajax calls and to manipulate the resulting pages. Widgets such as this one need a server-side component as well, because the browser's same-origin restrictions prevent you from making an Ajax request from one domain to another. The JavaScript invokes a relative URL such as wotd_feed_proxy, which is basically a proxy for the remote Wikimedia site pages. This is a common pattern which you can implement with the server tools of your choice. I used the Python/CherryPy server code in Listing 4 (wotd_server.py).
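
The feed handling in processFeed can also be tried outside the browser. Here is a minimal Python sketch of the same steps: take the first item, keep the part of the title after the colon, and read the link. The sample feed, including its title wording and link, is invented for illustration and is an assumption about the feed's shape; the function name first_word_of_the_day is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the proxied RSS feed
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Wiktionary word of the day</title>
    <item>
      <title>Word of the day: serendipity</title>
      <link>http://en.wiktionary.org/wiki/serendipity</link>
    </item>
  </channel>
</rss>"""


def first_word_of_the_day(feed_xml):
    """Return (word, link) from the first item of an RSS 2.0 feed string."""
    root = ET.fromstring(feed_xml)
    item = root.find("./channel/item")
    if item is None:
        raise ValueError("feed contains no items")
    title = item.findtext("title", default="")
    link = item.findtext("link", default="")
    # Same idea as title.substring(title.indexOf(':')+2) in Listing 3,
    # but falls back to the whole title when there is no ": " separator
    _, separator, word = title.partition(": ")
    return (word if separator else title), link


if __name__ == "__main__":
    print(first_word_of_the_day(SAMPLE_FEED))
```

Unlike the substring call in Listing 3, the partition call does not drop characters when a title lacks the separator.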

Listing 4 (wotd_server.py). Python/CherryPy server-side code for word-of-the-day widget
# encoding: utf-8
"""
wotd_server.py

Requires: CherryPy http://www.cherrypy.org"""

import os
import urllib2
import cherrypy

#Wikimedia commons search doesn't like bots, so pretend to be a browser
HEADERS = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' }

#Tell the server where to find local, static files such as index.html
LOCAL_DIR = os.path.join(os.getcwd(), os.path.dirname(__file__))

#The URLs to be proxied
WOTD_FEED_URL = "http://toolserver.org/~cmackenzie/wotd-rss.php"
COMMONS_SEARCH_BASE = "http://commons.wikimedia.org/wiki/Special:Search?search="

#The server code class
class Wotd:
    #Set up a local, static file server
    default = cherrypy.tools.staticdir.handler(
                  section="/", dir=LOCAL_DIR)
    #Proxy the Wiktionary feed
    @cherrypy.expose
    def wotd_feed_proxy(self):
        cherrypy.response.headers['Content-Type'] = 'text/xml'
        return urllib2.urlopen(WOTD_FEED_URL).read()
    #Proxy Wikimedia commons search
    @cherrypy.expose
    def commons_search_proxy(self, word):
        url = COMMONS_SEARCH_BASE + word
        req = urllib2.Request(url, None, HEADERS)
        response = urllib2.urlopen(req)
        cherrypy.response.headers['Content-Type'] = 'text/html'
        return response.read()

#Launch the server
cherrypy.server.socket_port = 8888
cherrypy.server.socket_host = 'localhost'
cherrypy.quickstart(Wotd())

Again the code is well commented. Listing 5 (index.html) is the demo host page for the widget.
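
One detail worth hedging against in the proxy: Listing 4 appends the word to the search URL as-is, so a word of the day containing a space or a non-ASCII character would likely produce an invalid request URL. A minimal sketch of percent-encoding the word first, written for Python 3 (Listing 4 targets Python 2); the function name commons_search_url is hypothetical.

```python
from urllib.parse import quote

# Same search base as in Listing 4
COMMONS_SEARCH_BASE = "http://commons.wikimedia.org/wiki/Special:Search?search="


def commons_search_url(word):
    """Return the Commons search URL for a word, percent-encoded as UTF-8."""
    # safe="" also encodes "/" so the word cannot alter the URL structure
    return COMMONS_SEARCH_BASE + quote(word, safe="")


if __name__ == "__main__":
    print(commons_search_url("ice cream"))
```

The resulting string can be passed to the request call in place of the plain concatenation.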

Listing 5 (index.html). Demo host page for word-of-the-day widget
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
<head>
<title>Wikimedia word of the day</title>
<script src="jquery-1.2.6.js" type="text/javascript"></script>
<script src="json2.js" type="text/javascript"></script>
<script src="wotd.js" type="text/javascript"></script>
</head>
<body>
<p>Wiktionary word of the day: <a id="wotd"></a></p>
<p>Related items on Wikimedia commons:</p>
<div id="target"></div>
</body>
</html>
Figure 2. Sample page using the word-of-the-day widget

Wrap up

If you are interested in the technology and organization for these sites, the place for you is Meta-Wiki, "a website devoted to the coordination and documentation of the Wikimedia Foundation's projects and their related affairs." The Wikimedia Foundation is a non-profit organization, but it has inspired some commercial ventures with similar open information aims (commercial Wiki farms, as they are called). One of the biggest of these is Wikia, which was co-founded by one of the Wikimedia co-founders and hosts a variety of wikis on topics which would not fit in Wikimedia sites (for example the humorous Uncyclopedia, which offers fictional descriptions of real topics). Wikileaks is a wiki for activists, whistle-blowers, and others who want to anonymously publish sensitive documents kept secret by governments, corporations, or other organizations. Because of its special sensitivities, Wikileaks is not a traditional wiki but a hybrid that requires internal review before publishing.

Thanks to the use of the Creative Commons Attribution license on Wikimedia sites, you can freely use all this material, even in commercial work, so long as you link back and clearly attribute to the source. The goals of Wikimedia are staggering, but so is the progress. The creative Webmaster has a tremendous amount of information to put to work.

Resources

Learn

  • Learn more about the Wikimedia Foundation, which "operates some of the largest collaboratively edited reference projects in the world, including Wikipedia, one of the 10 most visited websites in the world." The Wikimedia blog includes useful information and milestones across all these sites. If you're more interested in the technology platform and community organization, get acquainted with Meta-Wiki.
  • Wiktionary is an impressively complete dictionary in 172 languages.
  • Wikibooks is a collection of free content textbooks and annotated texts that anyone can edit. Wikiversity builds on this to support learning communities, their learning materials, and resulting activities.
  • Wikinews is an open content, collaborative site for publishing articles on news and current events.
  • Wikispecies is a taxonomic directory of life forms.
  • Wikisource is a repository of source texts, including annotations, translations, and supporting materials.
  • For a good introduction to jQuery, see "Simplify Ajax development with jQuery", by Jesse Skinner.
  • Check out the Google custom search Wikimedia plus created by the author.
  • Wikia is a commercial organization that hosts numerous public wikis for open information, using the same MediaWiki technology as Wikimedia sites. One of Wikia's most popular sites is the humorous Uncyclopedia.
  • Wikileaks is a site for publishing anonymous leaks and sensitive documents, supporting activism and whistle-blowing.
  • Learn more about Creative Commons in the developerWorks article "Real Web 2.0, Part 8: Mastering the Creative Commons".
  • Expand your site development skills with articles and tutorials that specialize in Web technologies in the developerWorks Web development zone, including previous installments of this column.
  • Stay current with developerWorks technical events and webcasts.

Get products and technologies

  • The Wikimedia sites use the open source MediaWiki implementation, which you can use to host any wikis of your own, if you choose.
  • jQuery is a wonderful JavaScript library with Ajax features, among others.
