Say a file server has a really popular file that everyone wants to download all at once. (My latest blog posting?! No, something important, and huge, like the latest release of Red Hat Linux.) How can a server (or even a cluster) possibly have enough bandwidth? Who wants to pay for all those servers and network connections?
The answer is "download swarming." Swarming lets a server that's being hit with numerous requests for the same file deliver that file with more aggregate bandwidth than the server itself has. Impossible, right?
A swarming download is, first of all, a segmented download: the server breaks the file into parts, which can be downloaded concurrently and which let an interrupted download be resumed. Swarming takes segments one step further: as a client downloads segments, it must also upload the segments it already has to other clients that are downloading the file. (The main server redirects new clients to these established clients.) Thus a swarming download has a multiplier effect: the original server uploads to a few clients, which in turn upload to more clients, and so on.
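The multiplier effect is easy to underestimate, so here is a toy round-based model in Python. It assumes, purely for illustration, that in each round every uploader (the origin server, plus every client that already has the file when swarming) sends a complete copy to a fixed number of new clients. Real swarms trade individual segments, so clients start uploading long before they finish; the function `rounds_to_serve` and its `fanout` parameter are hypothetical.

```python
def rounds_to_serve(clients: int, fanout: int, swarming: bool) -> int:
    """Count rounds until every client has the file (a crude toy model).

    Each round the origin server serves `fanout` new clients. With
    swarming, every client that already has the file also serves
    `fanout` new clients in that round.
    """
    have_file = 0  # clients that have finished downloading
    rounds = 0
    while have_file < clients:
        # The server always uploads; finished clients upload only when swarming.
        uploaders = 1 + (have_file if swarming else 0)
        have_file = min(clients, have_file + uploaders * fanout)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for swarming in (False, True):
        print(f"swarming={swarming}: {rounds_to_serve(10_000, 2, swarming)} rounds")
```

With 10,000 clients and a fanout of 2, the server alone needs 5,000 rounds, while the swarm needs only 9, because in this model the number of copies roughly triples every round.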
Why should clients upload and not just download? Because upload throughput regulates download throughput: the more you upload, the more you can download. As one FAQ puts it: "You could hack the source to not upload, but then your download rate would suck. Downloaders engage in tit-for-tat with their peers, so leeches have very little success downloading."
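The tit-for-tat idea from the FAQ can be sketched as a ranking rule: each round, a peer spends its limited upload slots on the peers that have recently uploaded the most to it. This is a simplification, assuming a hypothetical `choose_unchoked` helper and a fixed number of `slots`; it is not BitTorrent's actual code.

```python
def choose_unchoked(bytes_received: dict[str, int], slots: int) -> set[str]:
    """Pick which peers to upload to this round (simplified tit-for-tat).

    Peers that recently uploaded the most to us get our upload slots.
    Ties are broken by peer name so the choice is deterministic.
    """
    ranked = sorted(bytes_received, key=lambda peer: (-bytes_received[peer], peer))
    return set(ranked[:slots])


if __name__ == "__main__":
    # Hypothetical byte counts each peer sent us during the last round.
    recent = {"alice": 900, "bob": 0, "carol": 450, "dave": 300}
    print(sorted(choose_unchoked(recent, slots=2)))
```

A leech such as `bob`, who has sent nothing, ranks last and gets a slot only if there are slots to spare. BitTorrent's real choking algorithm also periodically unchokes a random peer ("optimistic unchoking") so that newcomers with nothing to trade yet can get started; this sketch leaves that out.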
There's an open-source program, BitTorrent, which is all the rage for swarming. It's the brainchild of Bram Cohen, recently praised in "Downloading Hollywood." According to the docs, "Its advantage over plain HTTP is that when multiple downloads of the same file happen concurrently, the downloaders upload to each other, making it possible for the file source to support very large numbers of downloaders with only a modest increase in its load." Wikipedia has an incredibly thorough write-up with pictures and everything. (Although another Wikipedia page seems to say that segmented downloading and swarming downloading are the same thing. Oops.)
A similar effort is Peer Distributed Transfer Protocol (PDTP): "PDTP decreases the amount of bandwidth a server needs to effectively serve files to a large number of clients by having the clients distribute portions of files to each other whenever possible." Interestingly, Onion Networks claims to have invented the swarming approach to file transfer and to have multiple pending patents.
As usual, the collective wisdom of Slashdot was on this years ago: "Finally Real P2P With Brains." It's potentially a way to help handle the Slashdot effect. (Again, a problem no one accuses my blog of ever having caused.)
IBM has Download Director, a Java applet that performs segmented downloads transparently. But it doesn't do swarming (nor would customers want that, I suspect; "I have to upload to download?!"). IBM also has an experimental grid downloader, creatively called "downloadGrid," that runs on an experimental grid computing network called "intraGrid." The grid approach downloads segments concurrently not just from one server but from many servers, whichever can currently serve you best. But that's still not swarming.
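The "whichever server can currently serve you best" idea can be sketched as a greedy scheduler: split the file into byte ranges, then give each range to the server that would finish it soonest, given the work already queued on it. The helpers `split_segments` and `assign_segments` are hypothetical, the per-server throughput figures are assumed to be already measured, and this is not a description of how downloadGrid actually works.

```python
def split_segments(file_size: int, segment_size: int) -> list[tuple[int, int]]:
    """Break a file into (start, end) byte ranges; `end` is exclusive."""
    return [
        (start, min(start + segment_size, file_size))
        for start in range(0, file_size, segment_size)
    ]


def assign_segments(
    segments: list[tuple[int, int]], throughput: dict[str, float]
) -> dict[str, list[tuple[int, int]]]:
    """Greedily give each segment to the server that would finish it soonest.

    `throughput` maps a (hypothetical) server name to its measured bytes/second.
    """
    busy_until = {server: 0.0 for server in throughput}  # seconds of queued work
    plan: dict[str, list[tuple[int, int]]] = {server: [] for server in throughput}
    for start, end in segments:
        size = end - start
        # Pick the server whose queue, plus this segment, would finish earliest.
        best = min(throughput, key=lambda s: busy_until[s] + size / throughput[s])
        busy_until[best] += size / throughput[best]
        plan[best].append((start, end))
    return plan


if __name__ == "__main__":
    segments = split_segments(file_size=1000, segment_size=300)
    print(assign_segments(segments, {"fast": 400.0, "slow": 100.0}))
```

With one server four times faster than the other, the fast server gets the first three 300-byte segments and the slow one picks up the 100-byte tail. A real downloader would likely re-measure throughput as segments complete rather than planning everything up front.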
For more information about grid computing, visit the developerWorks Grid Computing zone. We don't have a Swarming Computing zone, at least not yet.
Bobby Woolf