Capture the collective Web intelligence of an enterprise

How social tagging, rating, commenting, and sharing of Web pages can grow organizational knowledge

Thom Burris (thomburris@in.ibm.com), Senior Software Engineer and Innovation Team Leader, IBM
Thom Burris is a senior software engineer and innovator at IBM. His current focus is on Enterprise 2.0 enablement. Thom speaks often at IBM conferences, evangelizing the ideas and projects he is passionate about. He currently lives in Pune, India.

Summary:  Every enterprise has talented, experienced employees focused on creating value from information. The Web is often the primary source of that information. In this article, learn about a system that enables employees to interact with Web pages and captures their interactions. The result is a valuable repository of Web-based information focused on the business of the enterprise. Also explore a high-level architecture for the mechanisms that create the repository.

Date:  02 Feb 2010
Level:  Introductory

Introduction

As employees do their work, they find and use a collection of Web pages that forms a unique subnet of their intranet and the Internet. It is a collection of Web pages that is directly relevant to the business and mission of the enterprise. If you could capture the user's intent in finding a Web page, or capture his or her thoughts about the subject (or relevance of the page to his or her work), the collection of Web pages would become more like a library: a unique library catalogued and cross-referenced by the users themselves. Think of this library as the collective Web intelligence of an enterprise.

There are existing Web mechanisms that let us capture the thoughts and intent of users in the context of the Web page. The mechanisms are:

  • Social tagging: Associating a word or phrase with a Web page.
  • Rating: Scoring a Web page on a specific scale (for example, 5 stars).
  • Commenting: Entering text on a Web page as part of a conversation about the page content.
  • Sharing: Suggesting a Web page to colleagues.

Users interact with Web pages through these mechanisms. By interacting, users augment the static information on the page with "social" metadata. This social metadata is the connective tissue that connects employees with Web pages in all kinds of interesting and useful ways.

Enabling an enterprise to capture its collective Web intelligence is a fairly straightforward technical challenge. Before discussing approaches to designing and implementing such a system, the next section explores why an enterprise should bother.


Investment in Web-based knowledge

The knowledge embedded in employee Web usage constitutes an important and previously untapped source of organizational knowledge. An enterprise hires smart, knowledgeable, and experienced people at a significant cost. Those smart people use the Web (intranet and Internet) to find information that helps them do their jobs. The information they find is a valuable asset owned by the enterprise, and it reflects an aspect of the business of the enterprise. For example, it may relate to researching a sales lead, following an interesting piece of industry news, or exploring an idea. The time an employee spent finding the Web page is a tangible investment. Furthermore, the time that other employees spend finding (or failing to find) the same information squanders that investment.

A library of collective Web intelligence will capture the organizational knowledge and make it reusable. Services such as content recommendation and expertise discovery can leverage this knowledge. An even more fundamental service is the intranet search. In most enterprises, intranet search is performed with the same technology used for Internet search, and, generally speaking, it’s not very effective. It uses crawling, indexing, link analysis (which on an intranet is of marginal use), and so on to collect and sort search results.

Contrast the intranet/Internet search with the social search: a search that leverages the social data generated by the employees' social interaction with Web pages. For example, when an employee associates the tag "agile" with a Web page using mechanisms for social tagging, two important pieces of information are generated which the social search can leverage:

  • The page is useful.
  • It is unambiguously relevant to the term "agile."

The first piece of information is based on the good assumption that an employee wouldn’t interact with the page unless he or she found it useful. The second is based on trust in the expertise and professionalism of the employee.

People matter!

On the Web, you cannot rely on user identity. If someone with a username thom112 tags a piece of content, the extent to which you can trust it depends on how much you trust thom112.

In an enterprise, however, you can trust user identity. You can also make positive assumptions about the motivations and professional standards of a person. The enterprise community is especially well suited for a library of collective Web intelligence. Reliable user identity is the basis for expertise and community-of-interest discovery.

The power of social search lies in the human capability to process information and filter the viable search results. It doesn't matter whether one employee or 50 tagged the page with the word "agile." A social search would return the page in the search results. This is, frustratingly, often not the case in traditional intranet searches.

To understand how a library of collective Web intelligence of an enterprise might be used to power search, let's look at its structure.

Enterprise social data

When an employee finds a useful Web page, it becomes a piece of valuable property to the enterprise. Why is it valuable?

  • Employees know what they are looking for: a Web page that provides the information they need to do their job.
  • The Web page was found useful by a specific employee. This is important information. Employees are not just information processors; they're members of teams, members of communities, and agents of connection between people, projects, and ideas within an enterprise.
    As employees locate Web pages that are of interest to them, they become aggregators of information. As both connection agents and aggregators of information, employees are conduits through which information can flow to others in the organization.

When users interact with a Web page by tagging, rating, commenting, and sharing, they're associating themselves with the page. They're also associating social metadata with that page. This social metadata is the card catalogue of the library of collective Web intelligence; it provides the connections between information.

Social tagging

Social tagging is when a person associates one or more words or phrases with a Web page. The user is empowered to tag a Web page either by a widget of some sort on the page itself, or with a toolbar or plug-in associated with the Web browser. Fundamentally, a social tagging system captures:

  • The URL of the Web page
  • Tags that a user associated with the page
  • User identity

The social data associated with the act of tagging a Web page is shown in Figure 1:


Figure 1. A user tags a useful Web page
A diagram of three circles, representing Person, URL, and Tags. They are connected to each other in a triangle.

Assume the enterprise has a mechanism for users to tag pages and a mechanism for saving the resulting social data (a system similar to the one described later in this article). Every time an employee tags a Web page that he or she found useful, this trio of data is collected and stored.

And, every time users tag a Web page, they are tagging themselves. For example, when you tag a page with "agile," you're tagging yourself with "agile." In the former, you're explicitly claiming that a Web page relates to the tag "agile." In the latter, the implicit inference is that you relate to "agile" in some sense, be it interest, job role, task, and so on. For this reason, the figure above is closed. The tags are associated with both the URL and the person.
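The trio of data, and the way one tagging act associates the tag with both the page and the person, can be sketched roughly as follows. This is only an illustration: the `Tagging` class, the `index_taggings` function, and the sample e-mail address and URL are hypothetical names, not part of any system described in this article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagging:
    """One tagging interaction: who tagged which URL with which tag."""
    person: str  # hypothetical user identifier, for example an intranet e-mail
    url: str
    tag: str


def index_taggings(taggings):
    """Build the two associations that a single tagging act creates."""
    tags_by_url = defaultdict(set)     # explicit: the page relates to the tag
    tags_by_person = defaultdict(set)  # implicit: the person relates to the tag
    for t in taggings:
        tags_by_url[t.url].add(t.tag)
        tags_by_person[t.person].add(t.tag)
    return tags_by_url, tags_by_person


events = [Tagging("amy@example.com", "http://example.com/scrum", "agile")]
tags_by_url, tags_by_person = index_taggings(events)
```

After the single event above, "agile" is attached to both the URL and the person, which is why the triangle in Figure 1 is closed.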

Each of the other mechanisms for user interaction with pages (commenting, rating, and sharing) has an associated graph connecting a person, URLs, and other social data. For example, consider sharing a page. If an employee finds a page relevant to the work of his or her team members and shares it with them, the representation looks like Figure 2.


Figure 2. A user recommends a useful Web page to colleagues
A diagram showing circles for Person, URL, Person A, Person B, and Person C. The Person circle is connected to URL and each of the other three Person circles. The URL circle is also connected to each of the Person circles.

In this case, two things are captured:

  • Data that can serve to associate Persons A-C with a Web page (URL).
  • Information that can be used to identify a community of interest.

Just as with social networking sites, these networks can grow and be leveraged by a user's interaction with mechanisms on the page itself.

Even with the fairly common mechanisms of social tagging and sharing, you can see how a wide and deep network of socially connected data might begin to grow within an enterprise. This network, a subnet of the overall data in the intranet and Internet, consists of Web pages that employees found useful. But, it's much more than that. These Web pages are connected by the employees themselves and with the social data they create when they interact with the pages. It is indeed the beginnings of the library of an enterprise's collective Web intelligence. It is a network of people and URLs and tags unambiguously connected via human information processing.

The next section looks at how this network of data might be used.


Social search and the graph of social data

In our simple scenario, the graph of socially networked data consists of three types of nodes: Person, URL, and Tag. In a mature network, a given tag may be connected to thousands of URLs and persons. Consider a few basic clustering possibilities with this sort of network of data. The most basic grouping is a cluster of URLs around a tag. Keep in mind that each URL was used by someone, and tagged by him or her, with a term of his or her choosing. The example tag is "agile."


Figure 3. A cluster of URLs tagged with the word 'agile'
A diagram with Tag: agile in the middle. Surrounding it and connected to it are multiple URLs.

If you sort this collection of URLs based on how frequently they were tagged by users with the word "agile," you'd have a simple social search result where the query term is "agile."
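A minimal sketch of that frequency-sorted result, assuming the tagging data is available as simple (person, URL, tag) tuples. The function name `search_by_tag` and the sample data are made up for illustration.

```python
from collections import Counter


def search_by_tag(taggings, query_tag):
    """Return URLs tagged with query_tag, most frequently tagged first."""
    counts = Counter(url for _person, url, tag in taggings if tag == query_tag)
    # Sort by descending count, then by URL so that ties are ordered predictably.
    return sorted(counts, key=lambda url: (-counts[url], url))


taggings = [
    ("amy", "http://example.com/scrum", "agile"),
    ("raj", "http://example.com/scrum", "agile"),
    ("raj", "http://example.com/xp", "agile"),
    ("amy", "http://example.com/orgchart", "hr"),
]
results = search_by_tag(taggings, "agile")
```

Here the page tagged "agile" by two people ranks ahead of the page tagged once, and the page tagged only with "hr" is left out.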

Assume you want to leverage the collective Web intelligence of your organization even further to refine the results to just a few pages that will tell you what you need to know about agile development methodologies. One way to do this is to consider the grouping of URLs around a person (a specific person of your choosing). You choose Amy, a colleague who has experience with software development methodologies. This cluster of URLs would consist of the Web pages Amy has socially interacted with by tagging, sharing, commenting, and so on.


Figure 4. A cluster of URLs interacted with by a user
Similar to diagram in Figure 3. Person: Amy is in the middle. Surrounding are URLs.

You query the emerging network of collective Web intelligence with something like:

 
    “Give me all Web pages relating to TAG=agile and PERSON=Amy”

You're essentially combining these two clusters to get an intersection. The intersection might look like Figure 5:


Figure 5. A simple intersection constituting a useful search
Figures 3 and 4 in one view. The same URL is connected to both 'Tag: agile' and 'Person: Amy'.

Here are the two Web pages you can review to learn about agile programming. You might have found them by other means, but probably not without a fair amount of time and effort. In a standard intranet search, the pages could have been buried way down in the search results or they may have been left out altogether—especially if they were sites from outside the corporate firewall.
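The TAG=agile and PERSON=Amy query can be expressed as a plain set intersection. The sketch below assumes the social data is a list of (person, URL, tag) tuples; the function names and sample URLs are hypothetical.

```python
def urls_for_tag(taggings, tag):
    """Cluster of URLs around a tag (as in Figure 3)."""
    return {url for _person, url, t in taggings if t == tag}


def urls_for_person(taggings, person):
    """Cluster of URLs a person has interacted with (as in Figure 4)."""
    return {url for p, url, _tag in taggings if p == person}


def search(taggings, tag, person):
    """Intersection of the two clusters (as in Figure 5)."""
    return urls_for_tag(taggings, tag) & urls_for_person(taggings, person)


taggings = [
    ("amy", "http://example.com/scrum", "methodology"),
    ("raj", "http://example.com/scrum", "agile"),
    ("amy", "http://example.com/kanban", "agile"),
    ("raj", "http://example.com/xp", "agile"),
    ("amy", "http://example.com/orgchart", "hr"),
]
matches = search(taggings, "agile", "amy")
```

Note that a page qualifies if anyone tagged it "agile" and Amy interacted with it in any way; Amy does not have to be the person who applied that tag.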

By using the social connections between the data in the network of collective Web intelligence, your search can be fine-grained and highly relevant. It can also be extended to include useful and intuitive criteria. In the example, the search intent was for pages related to "agile" that your respected colleague also interacted with. If the search returned too many results, you could refine the query to include other people, pages, tags, rating ranges, and so on. You could also potentially use attributes of people (geographic location, job role, or any other attribute captured by an enterprise) to narrow the search. For example, you could search for pages interacted with by employees based in New York in the sales department focused on the banking industry.


Content and community-based services

Using the network of collective Web intelligence for searching is just the tip of the iceberg. Clustering tags, people, URLs, and so on can go a long way to enabling other useful services, such as:

  • Content recommendation: Given a user's history of social interaction (tags, ratings, the pages they've interacted with), recommend pages he or she might be interested in.
  • Page suggestion: Something akin to Amazon's suggested products. For example, "People who interacted with this page also interacted with the following pages…"
  • Expertise discovery: Help an employee find an expert on a particular topic.
  • Community-of-interest discovery: What groupings of people can be inferred from their social interaction with Web pages? Or, suggest a virtual team around the topics, pages, or attributes.
  • Business intelligence mining: What predictive trends can be extracted from the Web usage of an organization?

These services can be implemented in a variety of ways, ranging in sophistication from simple clustering and intersections to techniques of collaborative filtering and statistical analysis.
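At the simple end of that range, a page suggestion service could be little more than co-occurrence counting. The following sketch assumes interactions are available as (person, page) pairs; `suggest_pages` and the sample data are illustrative only, and a production service would more likely rely on a collaborative filtering library.

```python
from collections import Counter, defaultdict


def suggest_pages(interactions, url, limit=3):
    """'People who interacted with this page also interacted with ...'"""
    pages_by_person = defaultdict(set)
    for person, page in interactions:
        pages_by_person[person].add(page)

    counts = Counter()
    for pages in pages_by_person.values():
        if url in pages:
            # Count every other page this person interacted with.
            counts.update(pages - {url})
    ranked = sorted(counts, key=lambda page: (-counts[page], page))
    return ranked[:limit]


interactions = [
    ("amy", "/scrum"), ("amy", "/kanban"),
    ("raj", "/scrum"), ("raj", "/kanban"), ("raj", "/xp"),
    ("lee", "/xp"),
]
suggestions = suggest_pages(interactions, "/scrum")
```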

A key to the usefulness of the network is obviously critical mass. You need enough data to create intersections, and to make statistical analysis meaningful. However, a mitigating aspect of social data is that low active participation can create high value for everyone. Echoing the example given earlier, all it takes is one user to associate the tag "agile" with a Web page for that page, and that user, to become a transit point on the network of collective Web intelligence relating to "agile." The distribution of users contributing their knowledge to the network is more important than the volume of user interactions.

Now that you've seen the power of collective Web intelligence, the next step is to explore how one might design a system to collect it.


Architecture of a page-based social service application

The goal is to design an application that: lets users interact socially with Web pages; harvests and saves these interactions; and enables aggregation of any social data across the enterprise into a central repository of enterprise social data. The primary features to be implemented are:

  • Page-based or browser-based user interface (UI) that lets users socially interact with any Web page.
  • A service layer that handles requests from the UI controls.
  • A data store with a data model appropriate to the kind of data you're storing.
  • An aggregation component that collects social data from applications in the intranet that already collect social data (so you have a single, central repository).

Since the overall system includes services for all four types of page interaction (tagging, rating, commenting, and sharing), you need to customize the architecture for those parts that will be shared across all the services. One suggested implementation is to use a single database, a single aggregation subsystem, and a single public API for all the services. Figure 6 shows a high-level view of what this architecture might look like.


Figure 6. High-level architecture of the page services
A diagram showing: User interface on the left, including Tagging, Rating, Commenting, and Sharing. In the middle are boxes for Tagging service logic, Rating service logic, Commenting service logic, Sharing service logic. These all point to API, Data, and Aggregation boxes. Stand-alone social applications point to the 'Aggregation' box.

A tagging service application

To see how you might implement this architecture end to end, consider building it for one type of social data: social tagging. You'll need to implement a UI, a service layer, a database, and an aggregation component. A high-level view of the central tagging application architecture is shown below.


Figure 7. An architecture for a social tagging service
The same diagram as in Figure 6, with these changes: There is a tagging widget box with a text field and 'Go' button pointing to the 'Tagging service logic' box. All other boxes are greyed out except data, aggregation, and stand-alone social applications.

The tagging UI

The tagging UI must:

  • Enable users to tag the page.
  • Show the tags that others have associated with the current page.
  • Provide a means by which the current user can be identified.

The details of writing a comprehensive tagging UI are beyond the scope of this article.

Ideally, you want employees to be able to tag any page they interact with, whether it be on the enterprise intranet or an external site. To accomplish this, you can create a UI at the browser level so it can be used with any page.

There are two choices for creating a browser-level UI. You can either:

  • Create and deploy a browser extension, or
  • Create a bookmarklet that pops up a browser window that contains your UI.

The bookmarklet is the easiest to create and distribute. To create and distribute a bookmarklet, you can simply create a link on a Web page. Most major browsers allow users to drag links to the bookmark toolbar at the top of the browser. To distribute the bookmarklet, you need only to make the link available on a site with instructions to users for deploying it on their browser.

Once the bookmarklet is deployed, users click on it when they want to tag the page. The bookmarklet opens a browser window with the HTML markup of your tagging UI (sourced from your central tagging application). Figure 8 shows an example.


Figure 8. Tagging pop-up window from a bookmarklet
A pop-up window with fields for Title and Tags. There are Submit and Cancel buttons at the bottom.

A third approach to the tagging UI is page-based widgets that load on the enterprise intranet's pages from a central service. These UI controls would only surface on intranet pages; they would not enable interaction with external sites. However, for enterprises with significant investment in their intranet, it can be worth the trouble. It puts the UI for page interaction front-and-center to the employees' experience of the intranet. Every page on the intranet can contain a unified interface for page interaction, making it more likely that employees will, in fact, interact with the page. To the extent that an intranet contains Web-based information used by employees to get their work done, the Web intelligence collected using this page-based approach can be significant.

Since the page-based widget would essentially be a service provided by your tagging application to sites on your enterprise intranet, it would be loosely bound to the intranet pages. It would not be coded directly into the page's HTML. This is crucial for adoption and central administration, because it enables sites on your intranet to adopt the widgets with minimal changes to their own pages.

Regardless of which UI you choose to implement, you need the identity of the people doing the tagging. To get user identity, you need some means of enabling users to log in (authenticating themselves to the system). In many enterprise intranets, there is at least one mechanism by which employees can log in. Typically, a cookie is written to the user's browser; it can be used to identify the user across multiple sessions. In that case, your tagging UI code could check for the presence of the cookie and verify that it can resolve the user's identity. If the UI code cannot resolve user identity, it should give a message to users that they should log in. Implementations will vary, but the point is that your tagging UI needs to know who is tagging the page.
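On the server side, the cookie check might look something like the sketch below. The cookie name `intranet_session` and the `sessions` lookup table are assumptions made for illustration; a real intranet would use whatever its single sign-on mechanism provides.

```python
from http.cookies import SimpleCookie

SESSION_COOKIE = "intranet_session"  # hypothetical cookie name


def resolve_user(cookie_header, sessions):
    """Return the user for the session cookie, or None if it cannot be resolved."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(SESSION_COOKIE)
    if morsel is None:
        return None  # no login cookie: the UI should ask the user to log in
    return sessions.get(morsel.value)


# Stand-in for the enterprise session store (an assumption for this sketch).
sessions = {"abc123": "amy@example.com"}
user = resolve_user("intranet_session=abc123; theme=dark", sessions)
anonymous = resolve_user("theme=dark", sessions)
```

When `resolve_user` returns `None`, the tagging UI would show its "please log in" message instead of the tagging controls.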

The central service application

The central tagging application must be able to receive and process requests from your browser-side UI over HTTP. For example, when a user tags a page, the UI sends a request to the service to add a new tag. The service logic receives the tag and writes it to a database, along with the URL and the user's identity.

At a minimum, you'll need to implement the following services:

  • Get all tags for a given URL
  • Add a tag
  • Delete a tag

Servicing browser-side UI requests is fairly straightforward Web application stuff. You might implement this with a simple servlet receiving HTTP GET requests from the widget and returning appropriate responses formatted as JSON. The JSON would then be interpreted by the client-side JavaScript and the appropriate modifications made to the browser-side UI.
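The three services can be sketched as a single dispatch function that takes the query string of a GET request and returns a JSON body. Everything here is an assumption for illustration: the `action`, `url`, and `tag` parameter names, the in-memory `STORE` standing in for the database, and the shape of the JSON response. A servlet or any other Web framework would wrap the same logic.

```python
import json
from urllib.parse import parse_qs

# In-memory stand-in for the database: {url: {(person, tag), ...}}
STORE = {}


def handle_request(query_string, person):
    """Dispatch one widget request and return a JSON response body."""
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    action, url = params.get("action"), params.get("url")
    if not action or not url:
        return json.dumps({"error": "action and url are required"})

    entries = STORE.setdefault(url, set())
    if action == "add" and "tag" in params:
        entries.add((person, params["tag"]))
    elif action == "delete" and "tag" in params:
        entries.discard((person, params["tag"]))
    elif action != "list":
        return json.dumps({"error": "unsupported request"})

    # Every successful request answers with the current tags for the URL.
    tags = sorted({tag for _person, tag in entries})
    return json.dumps({"url": url, "tags": tags})


print(handle_request("action=add&url=http://example.com/scrum&tag=agile", "amy"))
```

Returning the current tag list after every successful request lets the client-side JavaScript refresh the widget from a single response.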

The aggregation subsystem

The aggregator component will be shared by all the social service applications. It aggregates tag data (or rating, commenting, or page sharing data) from any systems within your intranet that have their own mechanisms for collecting social data (for example, applications that don't use the service's widget). Consider an intranet that has a social enablement suite such as IBM Lotus Connections® deployed. Most Connections applications enable users to tag Connections pages via built-in controls. Our central tagging service aggregator would collect these tags from Connections so you can provide federated services, such as social search across the entire intranet, including Connections.

The aggregator component will run on a schedule based on how often a particular source needs its data aggregated. Figure 9 shows a sample model.


Figure 9. Tag aggregation model
A diagram with boxes for API, Data, and Aggregation on the left. The boxes for 'Stand-alone applications' are on the right. These are connected to the Aggregation box via a line that says 'http'. Next to each stand-alone application box is a small icon representing tags.

Rather than force all the applications within your intranet to use your browser-side tagging UI, you simply aggregate the tagging data from any applications that collect their own social data. How do you do this? First, the source application needs to make the data available in a form that you can access. The most common way that applications (such as Lotus Connections) make this data available is with syndicated feeds that are accessible over HTTP.

Assume that the example source application is tooled to provide an Atom feed that contains the following information:

  • The URL of the page that was tagged.
  • The tags associated with the URL by a particular person.
  • The e-mail address of the person who created the tags.

The aggregation component of the tagging service needs to be able to request the feed from the source application's API. Once it has the feed, it must parse it and write the relevant data into the tagging service's database. It should also have some sort of scheduler to go get the feeds at appropriate intervals.
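The parsing step might be sketched as follows. The feed layout is an assumption: the sketch expects each Atom entry to carry the page URL in a `link` element, the tags as `category` elements, and the tagger's address in `author/email`. A real source application may structure its feed differently. Fetching the feed over HTTP and scheduling are left out.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def parse_tag_feed(feed_xml):
    """Yield (url, email, tag) triples from an Atom feed of tagged pages."""
    root = ET.fromstring(feed_xml)
    for entry in root.iter(ATOM + "entry"):
        link = entry.find(ATOM + "link")
        email = entry.findtext(ATOM + "author/" + ATOM + "email")
        if link is None or email is None:
            continue  # skip entries that lack the data we need
        for category in entry.findall(ATOM + "category"):
            term = category.get("term")
            if term:
                yield link.get("href"), email, term


SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <link href="http://example.com/scrum"/>
    <author><email>amy@example.com</email></author>
    <category term="agile"/>
    <category term="scrum"/>
  </entry>
</feed>"""

triples = list(parse_tag_feed(SAMPLE_FEED))
```

Each triple can then be written to the tagging service's database, with the e-mail address used to look up the person.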

The database

The data model for social tagging can be quite simple. It requires at least a URL table, a person table, and a tag table. The table that contains a user's tagging interactions with the page (the TAGGING table in Table 1 below) will have foreign keys (FK) into the URL, person, and tag tables. Table 1 shows an example.


Table 1. Simplified data model for a tagging application
URL          | TAG     | PERSON        | TAGGING
URL_ID       | Tag_ID  | Person_ID     | URL_ID (FK)
URL          | Tag     | email         | Person_ID (FK)
Title        |         | Display_name  | Tag_ID (FK)
Description  |         |               | Created_ts

Each row in the TAGGING table represents a single interaction of a user tagging a Web page.

This example is mainly for illustrating the bare minimum of data model requirements. Your data model will be more extensive; you will almost certainly want to store timestamps and other information.
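One way to try out the model in Table 1 is with an in-memory SQLite database, as in the sketch below. The column types and the `UNIQUE` and `NOT NULL` constraints are assumptions added for illustration; Table 1 specifies only the column names and foreign keys.

```python
import sqlite3

SCHEMA = """
CREATE TABLE url (
    url_id      INTEGER PRIMARY KEY,
    url         TEXT NOT NULL UNIQUE,
    title       TEXT,
    description TEXT
);
CREATE TABLE tag (
    tag_id INTEGER PRIMARY KEY,
    tag    TEXT NOT NULL UNIQUE
);
CREATE TABLE person (
    person_id    INTEGER PRIMARY KEY,
    email        TEXT NOT NULL UNIQUE,
    display_name TEXT
);
CREATE TABLE tagging (
    url_id     INTEGER NOT NULL REFERENCES url(url_id),
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    tag_id     INTEGER NOT NULL REFERENCES tag(tag_id),
    created_ts TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO url (url, title) VALUES (?, ?)",
             ("http://example.com/scrum", "Scrum guide"))
conn.execute("INSERT INTO tag (tag) VALUES (?)", ("agile",))
conn.execute("INSERT INTO person (email, display_name) VALUES (?, ?)",
             ("amy@example.com", "Amy"))
# One row per tagging interaction; the ids are 1 because each table is new.
conn.execute("INSERT INTO tagging (url_id, person_id, tag_id) VALUES (1, 1, 1)")

rows = conn.execute("""
    SELECT person.email, url.url, tag.tag
    FROM tagging
    JOIN url    ON url.url_id = tagging.url_id
    JOIN person ON person.person_id = tagging.person_id
    JOIN tag    ON tag.tag_id = tagging.tag_id
""").fetchall()
```

The join reassembles the person, URL, and tag trio from Figure 1 for each tagging interaction.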


Other considerations

When implementing the system, you might also want to consider normalization and data analysis.

Normalization

Assume you have the situation where some URLs are tagged with the words "org chart" and others are tagged with "org_charts." When you search your tagging application, you want the results to include URLs tagged with any version of the tag. Tag normalization makes this possible.

Tag normalization is the process of modifying a tag to make it as generic as possible. The normalized version of a tag should ideally be the same for all versions of the tag, regardless of case, plurals, or stems. Tag normalization techniques can get very sophisticated, but simple techniques go a long way. For example, simply lower-casing and swapping spaces and dashes for underscores can take care of the majority of cases.
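The simple technique just described (lower-casing, then swapping spaces and dashes for underscores) fits in a few lines. The function name `normalize_tag` is hypothetical.

```python
import re


def normalize_tag(raw_tag):
    """Lower-case a tag and replace runs of spaces or dashes with one underscore."""
    tag = raw_tag.strip().lower()
    return re.sub(r"[\s\-]+", "_", tag)


normalized = [normalize_tag(t) for t in ("Org Chart", "org-chart", "org_chart")]
```

All three inputs above normalize to the same tag. This sketch does not handle plurals or stems, so "org_charts" would still be stored as a separate tag unless a stemming step is added.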

You might also consider URL normalization. It is fairly common for Web pages to be resolved by more than one address. This seems to be especially true in portal environments. For the tagging application, this can be a problem if you query it for the names of everybody who tagged a given URL. If you don't implement URL normalization, you might not get correct results because people may have tagged different versions of the same URL.

To remedy this potential issue, you can store the "raw" URL and a normalized version of it. URL normalization techniques should take W3C URI standards into account (see Resources).
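A conservative URL normalization might look like the sketch below. It applies only a few safe steps: lower-casing the scheme and host, dropping default ports and fragments, and using "/" for an empty path. The function name `normalize_url` is hypothetical, and the sketch ignores user information and IPv6 hosts. Portal-specific rules, such as removing session parameters, would have to be added for a given environment.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(raw_url):
    """Return a normalized form of raw_url for use as a lookup key."""
    parts = urlsplit(raw_url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Drop the fragment: it points inside the page, not to a different page.
    return urlunsplit((scheme, host, path, parts.query, ""))


raw = "HTTP://Example.COM:80/Scrum#intro"
key = normalize_url(raw)
```

Storing both `raw` and `key` keeps the original address available while letting queries group taggings by the normalized form.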

Data analysis

The extent to which a distinct analysis component is required depends on the sophistication you require for the federated services provided by the system. For example, you could implement tag-based search by creating a simple search UI that makes requests directly against the database. This approach may be sufficient for your needs. However, if you implement the entire system (all four social service applications) and have data gathered from tagging, rating, commenting, and sharing, this simplistic approach doesn't really take full advantage of your ability to implement sophisticated services leveraging the collective Web intelligence of your enterprise.

If you need more sophisticated services, you should implement an analysis component. For example, your analysis component could pre-calculate all the URLs related to a specific tag and write this pre-calculated data to a dedicated table for fast lookup. The pre-calculation can involve sophisticated techniques of statistical analysis, collaborative filtering, and so on. There are open source frameworks available, such as Apache Mahout, that provide useful implementations of these techniques (see Resources).


Conclusion

Once you've implemented a system that empowers employees to interact with the Web pages they use, the only thing missing is … to get them to interact with the Web pages they use! This is, of course, not a technical problem but an issue of corporate culture and management.

Creating collective Web intelligence should be an enterprise-wide collaboration. Each page tagged, and each comment entered, is an act of participation in this collaboration. The ideal is to get everyone from all corners of an enterprise to contribute their Web intelligence to the collective. If participation is sparse, but active and well distributed across the enterprise, a valuable network of Web knowledge will still grow. And, as is typical with good social applications, adoption builds momentum slowly as users get used to the model and start to see the benefits of participation.


Resources

Learn

  • Dojo and jQuery are popular JavaScript frameworks that you can use to create widgets and perform cross-domain Ajax.

  • Learn about Ajax history, technologies, rationale, drawbacks, and more.

  • Check out W3C specifications for Uniform Resource Identifiers (URIs). This wiki has some useful criteria for URL normalization.

  • Apache Mahout has information about collaborative filtering and other forms of analysis of social data.

  • developerWorks technical events and webcasts: Stay current with the latest technology.

  • This Web page includes the social features mentioned in this article: tagging, commenting, rating, and sharing. Give them a try! And you can also personalize your developerWorks experience by creating a profile and exploring all of our social offerings at My developerWorks.
