Real Web 2.0: Wikipedia, champion of user-generated content

How Wikipedia encourages users to contribute content

Encourage user contribution to your Web site by learning from Wikipedia. Wikipedia builds on open source and respects the geographical variety and potential accessibility needs of its users. It provides tools to help users contribute, but also fosters an atmosphere where contributions are verified and discussed by the broader community.

Uche Ogbuji (uche@ogbuji.net), Partner, Zepheira

Uche Ogbuji is Partner at Zepheira, LLC, a solutions firm specializing in the next generation of Web technologies. Mr. Ogbuji is lead developer of 4Suite, an open source platform for XML, RDF and knowledge-management applications and lead developer of the Versa RDF query language. He is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can find more about Mr. Ogbuji at his Weblog Copia.



04 September 2007


Wikipedia is one of the most widely known and discussed Web 2.0 sites. It's a darling of those who believe that opening up data for free use, reuse, and contribution by users advances society and civilization. It's the very devil to those who believe that such open data leads to a mess of unreliable and uneven knowledge, and it's a gift to outright kooks and spammers. Whether you believe in the information commons or in centralized control, you'll find plenty of debate on the matter, but not in this article. One accomplishment of Wikipedia that no one disputes is the huge amount of user contribution. If you are developing a Web site for an organization trying to reap some of the benefits of the open data revolution (a Web 2.0 buzzword), this is probably what you hope for most of all. You want people to contribute and thus add value to your offering, leading to what marketing professionals call the network effect, where more user contribution creates more value, which attracts more users, and so on. Wikipedia is a great example of a site that has succeeded in generating such an effect, and in this article you will learn elements of Wikipedia's architecture that you can use to grow your own site with user-generated content.

Wikipedia in a nutshell

Wikipedia's article on itself does a good job of giving the gist of the project in its first two paragraphs.

[Wikipedia] is a multilingual, web-based, free content encyclopedia project. Wikipedia is written collaboratively by volunteers from all around the world. With rare exceptions, its articles can be edited by anyone with access to the Internet, simply by clicking the edit this page link. The name Wikipedia is a portmanteau of the words wiki (a type of collaborative website) and encyclopedia. Since its creation in 2001, Wikipedia has grown rapidly into one of the largest reference Web sites.
In every article, links will guide the user to associated articles, often with additional information. Anyone is welcome to add information, cross-references or citations, as long as they do so within Wikipedia's editing policies and to an appropriate standard. One need not fear accidentally damaging Wikipedia when adding or improving information, as other editors are always around to advise or correct obvious errors, and Wikipedia's software, known as MediaWiki, is carefully designed to allow easy reversal of editorial mistakes.

Some concepts from this description are immediate lessons for developers of user-generated content sites:

  • "multilingual": Wikipedia is a very well-internationalized project and goes to great lengths to make people comfortable contributing from every corner of the globe. You don't have to go as far as Wikipedia does, but remember that a less ambitious project is more likely to have a less diverse group of users, and you should consider how to avoid closing doors to diverse contributors. Make any internationalization weakness a known and documented trade-off, and try to leave things open so you can quickly address such weaknesses as your site grows. The same goes for accessibility. If you want users to contribute, you must show that you respect their needs, including the needs of the disabled and of those using limited Web technology such as mobile device browsers.
  • "collaboratively": Wikipedia's success is not just a matter of publishing an "add your contribution here" form and hoping people use it. Its overall setup encourages people to collaborate to generate content. A large part of this lies in the discussion pages attached to each article and in the editorial tagging system, which lets people attract the attention of others who might want to help improve some bit of content.
  • "edited by anyone": Wikipedia does encourage contributors to create accounts, but allows them to chip in without doing so. This is perhaps further than you'd be willing to go because abuse is truly a problem, but it is a tremendous part of Wikipedia's network effect. You'll probably never be as juicy a target as Wikipedia, but Wikipedia's popularity also means there are many contributors who work on cleaning up abusive updates.
  • "editing policies": Publishing a well-considered policy and enforcing adherence to it is key. If you take action against abuse, you want to be able to refer to the clauses in your terms of use that support your action. If users ever get the impression that you are arbitrary in dealing with their contributions, they will be far more reluctant to participate. The first policy to shape is probably your privacy policy, not just because users demand it, but because in many cases it's a legal requirement.
  • "MediaWiki": The software used as the foundation of Wikipedia is open source and available for other projects. Consider opening up at least part of the software you use in your own site as a goodwill gesture for users, who are often conscious that the sites they contribute to in turn contribute to the commons of information.

Picking the right engine

Wikis have been around for a while, often used by software development teams as a crude project management and communications tool. MediaWiki was developed for the sort of large-scale collaboration envisioned for the Wikipedia project. It's a PHP program that uses MySQL or PostgreSQL to store data. One of its most powerful features is what MediaWiki calls templates, which are like tags in other collaborative Web systems, and not unlike stereotypes in object-oriented design. They allow a contributor to flag, for example, that the content of a page is disputed or that it does not properly cite sources. MediaWiki is not the easiest project to set up and administer, but if your collaborative software needs suit a wiki approach, you might want to consider adopting this well-tested software.
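
To make the idea of templates as editorial tags concrete, here is a minimal Python sketch that scans a page's wiki markup for template calls and reports which editorial flags are present. It is an illustration only and not how MediaWiki itself processes templates: the regular expression handles only simple, non-nested calls, and the flag names in EDITORIAL_FLAGS are a hypothetical list chosen for the example.

```python
import re
from typing import List

# Matches the name of a simple, non-nested {{Template}} or {{Template|args}} call.
# Nested templates are deliberately out of scope for this sketch.
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\|[^{}]*)?\}\}")

# Hypothetical set of editorial flags a site might want to track.
EDITORIAL_FLAGS = {"disputed", "unreferenced"}


def template_names(wikitext: str) -> List[str]:
    """Return the lower-cased names of all simple template calls in the text."""
    return [m.group(1).strip().lower() for m in TEMPLATE_RE.finditer(wikitext)]


def editorial_flags(wikitext: str) -> List[str]:
    """Return the editorial flags found on the page, sorted alphabetically."""
    return sorted(set(template_names(wikitext)) & EDITORIAL_FLAGS)


if __name__ == "__main__":
    page = "{{Disputed}}\nSome text. {{Infobox|name=x}} {{unreferenced|date=August 2007}}"
    print(editorial_flags(page))
```

A site could use a check like this to build a work queue of pages that contributors have flagged as needing attention.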

Lower barriers to contribution

User contribution to Web sites is often a spontaneous affair. A person might visit a movie review site, find they disagree completely with reviews already posted, and decide to write their own to balance the points of view. If the person has to fill in a long registration form or jump through any other hoops first, this might kill the impulse. Sometimes you absolutely need some information from would-be contributors, but it's important to gather this information as unobtrusively as possible. Perhaps for initial registration you could just require name, password, and e-mail address. You could make some advanced features available to users who provide address and occupational information. Perhaps you could have promotions where you offer rewards for users who fill in surveys. The best way to manage this systematically is to create a map of all the information you might be able to collect from a user, organized by the degree of effort, the likely expectation of privacy, and the risk attached to each bit of information (credit card numbers, for example, carry high risk). Then design your system to address reward and security according to this user information map.
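
One way to picture such a user information map is as a small table in code. The following Python sketch is only an illustration under stated assumptions: the field names, the three-level scale, the ratings given to each field, and the stage names are all hypothetical, and a real site would choose its own.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class InfoItem:
    name: str
    effort: Level   # how much work it is for the user to supply
    privacy: Level  # how private users are likely to consider it
    risk: Level     # how damaging a leak would be


# Hypothetical map; the ratings are illustrative guesses, not recommendations.
USER_INFO_MAP = [
    InfoItem("name", Level.LOW, Level.LOW, Level.LOW),
    InfoItem("password", Level.LOW, Level.HIGH, Level.HIGH),
    InfoItem("email", Level.LOW, Level.MEDIUM, Level.MEDIUM),
    InfoItem("postal_address", Level.MEDIUM, Level.MEDIUM, Level.MEDIUM),
    InfoItem("occupation", Level.MEDIUM, Level.LOW, Level.LOW),
    InfoItem("survey_answers", Level.HIGH, Level.MEDIUM, Level.LOW),
    InfoItem("credit_card_number", Level.MEDIUM, Level.HIGH, Level.HIGH),
]

# Ask for low-effort items first; tie higher effort to some reward.
STAGE_BY_EFFORT = {
    Level.LOW: "registration",
    Level.MEDIUM: "optional_profile",
    Level.HIGH: "rewarded_survey",
}


def collection_plan(items: List[InfoItem]) -> Dict[str, List[str]]:
    """Group field names by the stage at which the site asks for them."""
    plan: Dict[str, List[str]] = {stage: [] for stage in STAGE_BY_EFFORT.values()}
    for item in items:
        plan[STAGE_BY_EFFORT[item.effort]].append(item.name)
    return plan


def fields_needing_protection(items: List[InfoItem]) -> List[str]:
    """Fields whose privacy expectation or risk is high get extra safeguards."""
    return [i.name for i in items if max(i.privacy, i.risk) == Level.HIGH]


if __name__ == "__main__":
    print(collection_plan(USER_INFO_MAP))
    print(fields_needing_protection(USER_INFO_MAP))
```

Keeping effort separate from privacy and risk means the decision about when to ask for a field stays independent of the decision about how carefully to store it.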

Editing aids

You can get richer information from users if your tools make contributing easy. Wikipedia tries to give users powerful editing tools, one of which is wikEd. By adding a simple template, {{subst:wikEd}}, to your user profile, you can enable an editor that makes it easier to enter good wiki markup. Figure 1 shows an editing session under my Wikipedia account, editing the IBM developerWorks entry. All the extra buttons and highlighting are features enabled by wikEd. Without it, the user just gets a plain text entry box and has to remember all the relevant wiki markup and other features. wikEd can be used with other MediaWiki installations as well.

Figure 1. Editing session on Wikipedia, using wikEd

Trust but verify

Making it easy to contribute using tools and smart design shows your users you trust them to provide valuable content. Unfortunately, the commons of the Internet means many users will abuse that trust, and you must plan for this right away. A lot of abuse comes from content generated by robots, and some sites use techniques such as visual puzzles to try to ensure it's a person filling out a form rather than a robot sending a direct Web request. When considering such techniques, be sure you think of Web accessibility. Blind users can't see your visual puzzles, and users on portable devices might have difficulty with distortion. Your best bet is to build in layers of verification, such as testing for URLs known to be promoted in spam and deploying an editorial team to monitor and verify contributions. If the editorial team members also sometimes post comments of their own, they can serve as a valuable seed for further conversation, increasing the atmosphere of trust with their personal touch. On the most successful sites, the editorial team is the entire community. Wikipedia enhances this interchange by offering a back-channel to each page: a parallel discussion page where folks can negotiate the content of an entry and further explain tagging and other editorial actions.
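
The layered approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production spam filter: the blocklist in SPAM_DOMAINS, the max_links threshold, and the three outcome names are all hypothetical. The first layer rejects contributions that link to known spam domains, and the second holds link-heavy contributions for the editorial team.

```python
import re
from typing import Set
from urllib.parse import urlparse

# Hypothetical blocklist; a real site would load and update this from elsewhere.
SPAM_DOMAINS = {"spam.example", "pills.example"}

# Rough URL matcher, good enough for a sketch.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def spam_domains_in(text: str) -> Set[str]:
    """Return hosts in the text that are blocklisted domains or their subdomains."""
    found = set()
    for url in URL_RE.findall(text):
        try:
            host = (urlparse(url).hostname or "").lower().rstrip(".")
        except ValueError:
            continue  # malformed URL; leave it to the later layers
        if any(host == d or host.endswith("." + d) for d in SPAM_DOMAINS):
            found.add(host)
    return found


def review(text: str, max_links: int = 5) -> str:
    """Return 'reject', 'hold' (queue for human editors), or 'accept'."""
    if spam_domains_in(text):
        return "reject"
    if len(URL_RE.findall(text)) > max_links:
        return "hold"
    return "accept"


if __name__ == "__main__":
    print(review("Great article, thanks!"))
    print(review("Buy now at http://shop.spam.example/deal"))
```

Neither layer relies on a visual puzzle, so checks like these do not add accessibility barriers for legitimate contributors.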


Integration technology

These days users have some general expectations of modern Web sites, and meeting these expectations is key to nurturing a culture of collaboration. For one thing, users expect Web feeds for all the bits and pieces of a site they care about. Wikipedia and the MediaWiki software are particularly rich in Web feeds for tracking updates. This way, if a user is obsessive about an article, they can keep a close eye on related developments. Listing 1 is an example from the live Wikipedia feed for the IBM developerWorks article, with only one entry, representing the latest modification, and with many minor tweaks for formatting purposes.

Listing 1. Snippet from update feed for Wikipedia IBM developerWorks article
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
 <id>http://en.wikipedia.org//w/index.php?title=IBM_DeveloperWorks&amp;action=history</id>
 <title>IBM DeveloperWorks - Revision history</title>
 <link rel="self" type="application/atom+xml"
href="http://en.wikipedia.org//w/index.php?title=IBM_DeveloperWorks&amp;action=history"/>
 <link rel="alternate" type="text/html"
href="http://en.wikipedia.org/w/index.php?title=IBM_DeveloperWorks&amp;action=history"/>
 <updated>2007-08-09T22:55:35Z</updated>
 <subtitle>Revision history for this page on the wiki</subtitle>
 <generator>MediaWiki 1.11alpha</generator>
  <entry>
   <id>
http://en.wikipedia.org/w/index.php?title=IBM_DeveloperWorks&amp;diff=138026693
&amp;oldid=prev</id>
   <title>Chris Chittleborough: Undid revision 137942824 by 76.21.123.38 (talk) -
    old URL works, no need for query string</title>
   <link rel="alternate" type="text/html"
    href="http://en.wikipedia.org/w/index.php?title=IBM_DeveloperWorks&amp;diff=138026693
&amp;oldid=prev"/>
   <updated>2007-06-14T00:18:33Z</updated>
   <summary type="html">
&lt;p&gt;&lt;a href=&quot;/wiki/WP:UNDO&quot; 
title=&quot;WP:UNDO&quot;&gt;Undid&lt;/a&gt; 
revision 137942824 by &lt;a
href=&quot;/wiki/Special:Contributions/76.21.123.38&quot; 
title=&quot;Special:Contributions/76.21.123.38&quot;&gt;76.21.123.38&lt;/a&gt;
(&lt;a href=&quot;/w/index.php?title=User_talk:76.21.123.38&amp;amp;action=edit&quot;
class=&quot;new&quot; title=&quot;User talk:76.21.123.38&quot;&gt;talk&lt;/a&gt;)
 - old URL works, no need for query string&lt;/p&gt;
   </summary>
   <author><name>Chris Chittleborough</name></author>
 </entry>
</feed>

It's in Atom format, and you can see how the title conveys the description of the modification, and the summary provides fine details. (This is just a trimmed version of the summary from the actual feed, which is very long.) Given that the summary content is auto-generated, it would be nice if the feed used Atom's XHTML text type rather than escaped HTML, so that other code could more easily access this detailed information. Perhaps that's a refinement you could make for your own site.
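
To show how a user's own code might consume such a feed, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree module to pull out the title, timestamp, and author of each entry. The sample feed is an abridged copy of Listing 1 with a shortened summary, and the function name latest_changes is just a name chosen for this example.

```python
import xml.etree.ElementTree as ET

# Namespace prefix for Atom elements, in ElementTree's {uri}name notation.
ATOM = "{http://www.w3.org/2005/Atom}"

# Abridged from Listing 1; the summary is shortened for the example.
FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
 <title>IBM DeveloperWorks - Revision history</title>
 <updated>2007-08-09T22:55:35Z</updated>
 <entry>
  <id>http://en.wikipedia.org/w/index.php?title=IBM_DeveloperWorks&amp;diff=138026693&amp;oldid=prev</id>
  <title>Chris Chittleborough: Undid revision 137942824 by 76.21.123.38 (talk) - old URL works, no need for query string</title>
  <updated>2007-06-14T00:18:33Z</updated>
  <summary type="html">&lt;p&gt;Undid revision 137942824&lt;/p&gt;</summary>
  <author><name>Chris Chittleborough</name></author>
 </entry>
</feed>"""


def latest_changes(feed_xml: bytes) -> list:
    """Return one dict per Atom entry with its title, timestamp, and author."""
    root = ET.fromstring(feed_xml)
    changes = []
    for entry in root.findall(ATOM + "entry"):
        changes.append({
            "title": entry.findtext(ATOM + "title", default="").strip(),
            "updated": entry.findtext(ATOM + "updated", default=""),
            "author": entry.findtext(ATOM + "author/" + ATOM + "name", default=""),
        })
    return changes


if __name__ == "__main__":
    for change in latest_changes(FEED.encode("utf-8")):
        print(change["updated"], change["author"], change["title"])
```

Because the summary is delivered as escaped HTML, reading its details would take a second parsing step on the text of the summary element. With the XHTML text type, the same ElementTree calls could walk straight into the summary markup.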


Wrap up

It can be scary to open up your systems to user-contributed content. You have new usability, legal, policy, and data quality issues to deal with, and that's just dealing with users who come in good faith. You also open yourself up to abusive agents, who can wreak all kinds of havoc. As for the latter problem, many people have pointed out that being targeted for abuse is just a part of having a valuable service in the participation economy. A pithy quote called Rafe's Law states that "an Internet service cannot be considered truly successful until it has attracted spammers." Internet pundits such as Cory Doctorow have made similar observations. In general, by using architectural foundations and reusing code and tools from successful projects such as Wikipedia, you can reduce some of the worry about opening up to user contributions.

Give your users all the paraphernalia of modern Web sites so they can write their own code and mash-ups. You'll find several examples of how to do this in previous installments of the Real Web 2.0 series, and in fact several folks have produced Wikipedia search bookmarklets very similar to the IBM developerWorks bookmarklet I presented in the last article. The more ways in which they can use your site, the more opportunity and encouragement they'll have to contribute. Users like to feel some ownership of their data, including the data they produced on someone else's server. By making it available to them to reuse elsewhere you reduce some of their fear that you plan to lock up their information for good. And this is the sort of confidence that motivates users to boost the network effect on your Web site.

Resources

Get products and technologies

  • Try out MediaWiki, the engine of Wikipedia.
  • wikEd is a tool to make editing Wikipedia more convenient, and can be used with other MediaWiki installations as well.
