Level: Introductory
Todd Sundsted (todd-p2p@etcee.com), Chief Architect, PointFire, Inc.
01 Mar 2001
Peer-to-peer (P2P) computing promises to be the paradigm with mindshare sufficient to push a number of interesting distributed computing technologies from the shadows into the spotlight. To better understand P2P technology, Todd Sundsted begins this series with a trip back in time to the early 1980s, when the first popular P2P applications came into existence. He explains where P2P computing fits into the broader distributed computing landscape. Finally, he presents the world's simplest peer and, as a means of foreshadowing what is to come, points out its deficiencies. Armed with this information, you will be able to build your own P2P applications in the Java programming language or adapt specific techniques to your own programming ends.
Unless you've been asleep at the wheel for the last nine months, you've heard of peer-to-peer (or P2P) computing. If you believe, as many do, that its use is limited to file sharing or that it is the most important development in the history of computing since the invention of the Internet, you can be forgiven -- the hype generated by some of its more extreme proponents is to blame.
In spite of the hype, P2P computing is important, and it's beginning to look like the paradigm with a large enough slice of mindshare to move a number of promising technologies from the wings into the limelight. Therefore, it's important to understand where P2P computing fits into the broader technological landscape. It's also important to have live examples of P2P code to learn from and to work with, so I will provide code written in the Java programming language.
Over the course of the coming months, I'm going to explore the P2P landscape. To avoid becoming bogged down in peripheral discussions (of which there are many), I'll limit my exploration with respect to the following:
- Due to the nature and use of many popular P2P applications, discussion of P2P technology is often hopelessly entangled with questions of legality, freedom, privacy, control, and copyright. While these questions are all interesting, they shouldn't necessarily drive every discussion of peer-to-peer technology. Therefore, unless one of these issues drives the requirements for a particular application, I'll avoid discussing them entirely.
- The problems that confront the engineer designing and building P2P applications are interesting but not necessarily new. Therefore, in addressing those challenges, I'll borrow from other areas of computer science, as necessary.
In a nutshell, expect this column to focus on the technology -- specifically, content and resource management, trust and security, ownership and rights, communication models, distributed processing, and searching and querying -- and to ignore, for the most part, the social, political, and legal problems that boil around many current P2P applications.
A brief history
Peer-to-peer computing didn't spring into existence in its current form. Rather, it is the child of a number of different parents. Let's consider two of the most significant.
First and most important, P2P computing is the natural result of decentralizing trends in software engineering intersecting with available technology. From an engineering perspective, the trend over the last decade, driven by forces such as enterprise application integration, has clearly been away from monolithic systems and toward distributed systems. This trend was inhibited somewhat by the ease of managing centralized applications, but the growth of the Internet, followed by the rise in importance of B2B transactions, made full-scale distributed computing a business necessity. Intersecting this trend is the growth in the availability of powerful networked computers and inexpensive bandwidth. To be effective, P2P computing requires the availability of numerous, interconnected peers. These two trends combined to form the perfect playground for P2P application research.
Nontechnical social issues were also important. Most of what's driving the current interest in P2P computing unarguably arose as a result of the popularity of products like Napster, Scour, Gnutella, and others of their ilk. They provided the "killer apps" that put a subset of P2P technology in the hands of lots and lots of end users. That first-hand experience, in turn, raised awareness of the power of the P2P paradigm.
However, I must point out that the first P2P applications appeared nearly two decades ago, and many are still in existence. These applications often fail to receive the recognition due them because, while they are P2P at heart, most people never see or feel that part of the application.
Early attempts
P2P computing isn't all that new. The term P2P is, of course, a new invention, but basic P2P technology has been around at least as long as USENET and FidoNet -- two very successful, completely decentralized networks of peers. P2P computing may be even older (I hereby issue a challenge to my readers to find the earliest P2P application -- extra points will be awarded if it's still in use), but these two examples suffice to demonstrate its age. The bottom line is that many of the people using P2P applications today weren't even using computers when the first P2P applications appeared.
USENET, born back in 1979, is the distributed application that provides most of the world with its newsgroups (my favorites are rec.arts.int-fiction and rec.games.int-fiction). Its earliest incarnation was the work of two graduate students named Tom Truscott and Jim Ellis. At the time, nothing like the "on-demand" Internet we know today existed. Files were exchanged in batch over phone lines, often at night when long distance rates were lowest. Consequently, there was no effective way to centralize the function of USENET. The natural result was an extremely decentralized, distributed application -- a structure it retains to this day.
The other outstanding early P2P success is FidoNet. FidoNet, like USENET, is a decentralized, distributed application for exchanging messages. FidoNet was created in 1984 by Tom Jennings as a way to exchange messages between users of different bulletin board systems (BBSs). Because it filled a need, it quickly grew and, like USENET, it remains in use today.
USENET and FidoNet are interesting because both of them, years ago, ran into and overcame many of the problems that modern P2P applications are running into today. Scalability is the most obvious, but they addressed security and a host of other problems as well. For P2P computing to succeed, its proponents must be willing to learn from history.
The technological landscape
Now that we've looked at the origins of P2P, let's play the categorization and classification game. I think almost everyone would agree that P2P computing is a subset of distributed computing. I think almost everyone would also agree that not all distributed computing is peer-to-peer computing. The name "peer-to-peer" suggests an egalitarian relationship between peers and, more importantly, suggests direct interactions between peers.
P2P applications consist of a number of peers, each performing a specific role in the P2P network, in communication with each other. Typically, the number of peers is large and the number of different roles is small. These two factors explain why most P2P applications are characterized by massive parallelization in function. The best example that most of you will be familiar with is the Gnutella network, which consists of a large number of essentially identical peers.
In P2P applications, the interesting problems lie in the interaction between the peers and, to a lesser extent, in the peers themselves. The problems to be solved in P2P computing overlap to a considerable degree with the problems faced in distributed computing -- coordinating and monitoring the activities of independent nodes and ensuring robust, reliable communication between nodes.
But not all distributed computing is P2P computing. Distributed applications like SETI@home and the various distributed.net projects exhibit little interesting peer-to-peer interaction, and are therefore not really P2P, according to the definition above. However, because of the overlap in the problem set, it's worth learning about how they work; we'll look at them in a later column. Incidentally, if you need a name for distributed applications that aren't really peer-to-peer applications, I suggest peer-oriented.
The world's simplest peer (and what's wrong with it)
Building a rudimentary P2P application in the Java language is almost too easy, so I will leave you with one to play with. I provide this application to make a point. It's easy to shove files and messages around in a network. It's difficult to build a robust platform for P2P computing.
My P2P application lacks important features. Some of these missing features (security, for example) are lacking in most popular P2P applications available today. Other missing features (message routing and distributed queries) are available in one or more applications. Gnutella, for example, supports both simple message routing and simple distributed queries.
Figure 1 illustrates the general design of the application. I didn't want to limit the scope to file sharing, so this P2P application manages interactions with abstract resources, represented by the Resource interface. A resource can be anything addressable -- a filesystem, a database, a directory, a phone book.
Figure 1. The broad design
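
To make the idea of an abstract resource more concrete, here is a minimal sketch. It is not the code in p2p.jar: the text names the Resource interface but does not list its methods, so name(), list(), and fetch(), along with the PhoneBookResource class and its sample entries, are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical shape of a resource. The method names are assumptions;
// the real Resource interface in p2p.jar may look quite different.
interface Resource {
    String name();                    // name a peer uses to select this resource
    List<String> list();              // names of the objects the resource manages
    String fetch(String objectName);  // content of one object, or null if absent
}

// A toy resource backed by an in-memory phone book.
class PhoneBookResource implements Resource {
    // TreeMap keeps the entries sorted by name, so list() is deterministic.
    private final Map<String, String> entries = new TreeMap<>();

    PhoneBookResource add(String person, String number) {
        entries.put(person, number);
        return this;
    }

    @Override public String name() { return "phonebook"; }

    @Override public List<String> list() { return new ArrayList<>(entries.keySet()); }

    @Override public String fetch(String objectName) { return entries.get(objectName); }
}

class ResourceDemo {
    public static void main(String[] args) {
        Resource r = new PhoneBookResource()
                .add("alice", "555-0100")
                .add("bob", "555-0199");
        System.out.println(r.name() + " manages " + r.list());
        System.out.println("alice -> " + r.fetch("alice"));
    }
}
```

A filesystem or database resource would implement the same three methods, which is what lets the rest of a peer treat every resource alike.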

The MessageServer class is the heart of the application. This class accepts connections from other peers and routes their messages to the appropriate resource.
The class files are stored in p2p.jar (see Resources to download this file). You start the application from the command line by entering java -jar p2p.jar. The application looks for a properties file named p2p.properties in the directory from which you launched the application. The properties file defines the resources to load and the peers the application knows about. The jar file contains a sample properties file suitable for editing.
A user interacts with the application through a simple command-line interface, as shown in Figure 2. The prompt displays the selected peer (if one has been selected) and the remote resource being accessed (if one is being accessed).
Figure 2. Initial screen and prompt
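
The accept-and-route behavior described for MessageServer can be sketched roughly as follows. This is an illustrative stand-in and not the class shipped in p2p.jar. The class name SketchMessageServer, the one-line wire format ("resource name, a space, then the request"), the error reply, and the use of plain functions as resource handlers are all assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative stand-in, NOT the MessageServer class from p2p.jar.
class SketchMessageServer {
    // Resource name -> handler that turns a request string into a reply.
    private final Map<String, Function<String, String>> resources = new ConcurrentHashMap<>();

    void register(String resourceName, Function<String, String> handler) {
        resources.put(resourceName, handler);
    }

    // Assumed wire format: "<resource> <request>" on a single line.
    String route(String message) {
        String[] parts = message.trim().split("\\s+", 2);
        Function<String, String> handler = resources.get(parts[0]);
        if (handler == null) {
            return "ERROR unknown resource: " + parts[0];
        }
        return handler.apply(parts.length > 1 ? parts[1] : "");
    }

    // Accepts peers one at a time and answers one message per connection.
    // Closing the ServerSocket from another thread makes accept() throw,
    // which ends the loop by propagating the IOException to the caller.
    void serve(ServerSocket server) throws IOException {
        while (!server.isClosed()) {
            try (Socket peer = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         peer.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(
                         peer.getOutputStream(), StandardCharsets.UTF_8), true)) {
                String line = in.readLine();
                if (line != null) {
                    out.println(route(line));
                }
            }
        }
    }

    public static void main(String[] args) {
        SketchMessageServer server = new SketchMessageServer();
        server.register("echo", request -> "OK " + request);
        // Routing can be exercised without opening any sockets.
        System.out.println(server.route("echo hello"));
        System.out.println(server.route("files list"));
    }
}
```

To listen for real peers you would pass serve() a ServerSocket bound to a port of your choosing. Note that this sketch handles one connection at a time and trusts every caller, which are exactly the kinds of shortcuts a robust peer cannot take.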

At any time, the user can get a list of options by entering a question mark (?), as shown in Figure 3. If the user has not selected a peer, this action displays a list of known peers. Otherwise, it displays a list of the resources available on that peer. Peers and resources are selected by entering their names.
Figure 3. Listing peers and resources

In Figure 3, the user selected a peer named "guppy" and then displayed the resources available on guppy.
After selecting the peer and a resource, the user can then browse and access objects managed by that resource, as shown in Figure 4. In the case of a filesystem resource, for example, accessing an object would cause that object to be copied from the remote peer to the local peer.
Figure 4. Listing managed objects

At any time, the user can deselect a resource or a peer by typing a double period (..).
Figure 5. Navigating
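
The navigation rules described above (a question mark lists, a name selects, a double period deselects) amount to a small state machine. The sketch below models only that state. The Navigator class, the prompt format, the peer name "shark", the resource names, and the fixed in-memory table of peers are assumptions; in the real application the peer list comes from p2p.properties and the resource list from the remote peer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the command-line navigation state.
class Navigator {
    private final Map<String, List<String>> resourcesByPeer; // known peers and their resources
    private String peer;      // null until a peer is selected
    private String resource;  // null until a resource is selected

    Navigator(Map<String, List<String>> resourcesByPeer) {
        this.resourcesByPeer = new TreeMap<>(resourcesByPeer); // sorted for stable listings
    }

    // The prompt shows the selected peer and resource, if any (format assumed).
    String prompt() {
        if (peer == null) return "> ";
        if (resource == null) return peer + "> ";
        return peer + "/" + resource + "> ";
    }

    // Applies one line of user input and returns the text to display.
    String handle(String input) {
        String cmd = input.trim();
        if (cmd.equals("?")) {
            if (peer == null) return String.join(" ", resourcesByPeer.keySet());
            if (resource == null) return String.join(" ", resourcesByPeer.get(peer));
            return "(objects managed by " + resource + " would be listed here)";
        }
        if (cmd.equals("..")) {
            // Deselect the resource first, then the peer.
            if (resource != null) resource = null; else peer = null;
            return "";
        }
        if (peer == null) {
            if (!resourcesByPeer.containsKey(cmd)) return "unknown peer: " + cmd;
            peer = cmd;
            return "";
        }
        if (resource == null) {
            if (!resourcesByPeer.get(peer).contains(cmd)) return "unknown resource: " + cmd;
            resource = cmd;
            return "";
        }
        return "(object " + cmd + " would be accessed here)";
    }

    public static void main(String[] args) {
        Map<String, List<String>> known = new HashMap<>();
        known.put("guppy", Arrays.asList("files", "phonebook"));
        known.put("shark", Arrays.asList("files"));
        Navigator nav = new Navigator(known);
        for (String cmd : new String[] {"?", "guppy", "?", "files", "..", ".."}) {
            System.out.println(nav.prompt() + cmd);
            String reply = nav.handle(cmd);
            if (!reply.isEmpty()) System.out.println(reply);
        }
    }
}
```

Keeping the selection state in one small class separates the user interface from the networking code, so either can change without disturbing the other.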

Conclusion
A real P2P application requires much more than my simple application provides. My application needs at least the following elements to be fully functional: proper security including authentication and authorization; reliable message routing and delivery; content and resource management; distributed queries; and naming. We will talk more about these requirements in future columns and, as always, will provide live examples and working code. I will begin next month with security.
Resources
About the author
Todd Sundsted has been writing programs since computers became available in desktop models. Though originally interested in building distributed applications in C++, Todd moved on to the Java programming language when it became the obvious choice for that sort of thing. In addition to writing, Todd is cofounder and chief architect of PointFire, Inc. Contact Todd at todd-p2p@etcee.com.