Level: Introductory
Aashish Patil (aashishp@usc.edu), M.S. candidate, Department of Computer Science, University of Southern California
01 Sep 2001
When you hear the word synchronization, you probably imagine synching your Palm OS or Pocket PC device with a desktop machine. But the synchronization process is crucial to the success of most forms of mobile and pervasive computing. In this article, Aashish Patil explains exactly what synching means, demonstrates various techniques of synching data, and outlines different levels of synching. We will see how wireless network characteristics affect the process of synching and, ultimately, the user and the developer.
Pervasive computing
Trends in the computing industry are moving us toward a future in
which users expect almost any kind of tangible device to be networked. While
we haven't reached that state yet, we already have access to a lot of computing
devices that can be carried around and that possess sufficient computing
power to log on to networks: personal digital assistants, mobile phones,
laptops, etc. And the devices are only getting smaller: Current industry
trials are testing the feasibility of wearable computers. Users who own
these devices expect to be able to access networked data from almost any
place and in almost any situation. And since these users would find a physical,
wired connection to a network impossibly cumbersome, in all probability
they will access network data wirelessly. This is what is popularly known
as pervasive computing (PvC). Pervasive computing networks are huge
distributed systems whose characteristics differ slightly from those of
traditional distributed systems.
Synchronization and the disconnected approach
Certain features are characteristic of wireless networks: high latency,
high costs for a connection (e.g., cellular networks), a network connection
that is sometimes very poor or intermittent, and low bandwidth. Although
there are exceptions, the devices that participate in these networks are
usually resource poor, with low memory, low storage capacity, a weak processor,
and a dependence on a finite power source such as a battery.
All these characteristics -- both of networks and devices -- mean that
developers and users cannot expect the connectivity of PvC devices to a
network to be continuous. There will be times when these devices will work
in isolation from the network -- when the network connection is poor or
nonexistent, for instance. However, a user will continue doing work during
such periods of poor connectivity. This kind of work in isolation is usually
called the disconnected approach, and is standard operating procedure
for PDA users.
Once a network connection has been restored, our user will merge his
work with the main storage system. There will probably be other people
who were doing concurrent work on the same data in isolation. When they
merge their work with the main storage system, conflicts will arise and
must be resolved. This process of merging individual data sets and producing
a conflict-free main storage can be called
data synchronization.
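The merge step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each replica is a plain dictionary, the main storage remembers the last synchronized state (called base here), a missing key is treated the same as a deleted one, and a conflict means that two replicas changed the same key to different values. The function name merge_replicas is hypothetical.

```python
def merge_replicas(base, replica_a, replica_b):
    """Three-way merge of two replicas against their common base.

    Returns (merged, conflicts). A key changed by only one replica is
    accepted. A key changed differently by both is reported as a conflict
    and keeps its base value until someone resolves it.
    Assumption: stored values are never None (None means "absent").
    """
    merged = dict(base)
    conflicts = {}
    for key in set(base) | set(replica_a) | set(replica_b):
        old = base.get(key)
        a = replica_a.get(key)
        b = replica_b.get(key)
        if a == b:            # both agree, or neither touched the key
            value = a
        elif a == old:        # only replica B changed it
            value = b
        elif b == old:        # only replica A changed it
            value = a
        else:                 # both changed it, in different ways
            conflicts[key] = (a, b)
            continue
        if value is None:
            merged.pop(key, None)   # the key was deleted
        else:
            merged[key] = value
    return merged, conflicts
```

What to do with the entries in conflicts is application-specific; the sketch only detects them.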
When to synchronize?
Pervasive devices need not always be in isolation. Sometimes the devices
will be connected to the network and performing changes online. In this
case, synchronization will take place as a continuous process between the
device and the central storage. However, there might be another device
working concurrently, in isolation, on the same data set. In that case,
the conflicts will be resolved later, when the isolated device syncs with
the central store. The user who was initially connected to the data store
might get a notification stating that some of his changes have been invalidated
due to conflicts.
A user working in isolation can explicitly connect a device to the network
and request synchronization. This is how most PDAs synchronize: Users physically
attach a PDA to its cradle and then request a sync. However, synchronization
can also be done transparently. In this case, the device is responsible
for detecting a network connection and initiating a sync. When faced with
an intermittent network connection, it is the device's responsibility to
switch between online and isolated modes; the user shouldn't have to notice
whether he's working in isolation or online. He should continue doing his
work as usual. This is the kind of sync that will be needed for pervasive
computing to succeed. However, this process does introduce complexity in
software and thus increases demands on the resources of the pervasive device.
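A transparent sync of this kind might look roughly like the following Python sketch. The user always writes to the local replica; the client decides on its own whether a change can be sent now or must wait. The class name and the two callables passed to it (is_connected and send) are assumptions made for illustration, standing in for whatever connectivity check and transport a real device would provide.

```python
class TransparentSyncClient:
    """Buffers writes locally and flushes them whenever a link is available."""

    def __init__(self, is_connected, send):
        self.is_connected = is_connected  # assumed callable: () -> bool
        self.send = send                  # assumed callable: (change) -> None
        self.replica = {}                 # local copy the user works on
        self.pending = []                 # changes not yet sent to the server

    def write(self, key, value):
        # The user writes to the replica in the same way, online or not.
        self.replica[key] = value
        self.pending.append((key, value))
        self.try_sync()

    def try_sync(self):
        """Send queued changes in order; stop quietly if the link is down."""
        while self.pending and self.is_connected():
            try:
                self.send(self.pending[0])
            except ConnectionError:
                break                     # link dropped mid-sync; retry later
            self.pending.pop(0)
        return not self.pending           # True when nothing is left to send
```

In practice try_sync would also be called from a timer or a network-status callback, so that changes made while isolated are flushed as soon as the connection returns.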
How to synchronize
For a disconnected or intermittently connected system, the central
data store must be in some way replicated on the pervasive device. Though
it would be convenient to have local access to the entire data store, resource
constraints usually mean that only part will be available on the device.
If the device loses its network connection, the user can continue to work
on the local copy of the data store. The local copy is called a replica,
and the process of replication is essential for pervasive computing
to work. The changes made on the replica are usually noted in a change
log. However, there may be other methods to resolve conflicts. Timestamps
can be used to note the time at which the update was made to the local
replica. This timestamp is used to resolve conflicting changes to the central
data store. One important thing to keep in mind when using timestamps:
The time of the device must be adjusted to the main store's time clock
and the timestamps evaluated in light of this time.
This could be done by normalizing the client's clock with respect to the
server's and storing the resulting offset in a server-side client proxy, so
that the client does not have to renormalize every time he comes out of
isolation and resyncs with the server. The process of normalization could
be carried out during the initial client login. You could implement the
proxy as a thread within the main server side sync module process (module
explained below). You could imagine this proxy to be an extension of the
client on the server side. An expiry time has to be attached for a proxy
and, hence, all the information it stores. Otherwise, if the client gets
disconnected for a long period of time, the proxy could be unnecessarily
occupying server resources.
To serialize access to the central data store, locks can also be used.
However, considering the nature of a wireless network, it is dangerous
to lock the data store from the client itself. A situation might arise
in which the client locks a data item and becomes isolated from the network
for a long period of time, either voluntarily or automatically, due to
a nonexistent network connection. In this case, the locked data item will
remain inaccessible to other clients. Instead of the client locking the
database, a proxy for the client on the server side could do the locking.
This scheme is similar to a multitiered application.
If synchronization is taking place online, what happens to the replica?
Synchronization is usually a bandwidth-intensive process; hence, it can
be a background process instead of the primary one. Only a part of the
bandwidth is consumed by the synching process, and the replica is synced in
a piecemeal manner. Parts of the replica will remain unchanged until their
turn arrives to be synced. The user will continue to make changes to the
replica simultaneously. This can be particularly useful for a poor or intermittent
network connection. It is the responsibility of the sync engine to adapt
to the changes in the network connection.
Many popular database manufacturers have developed small-footprint
versions of their databases, such as IBM's DB2 Everyplace and Oracle's
Oracle Lite. These databases fit on a pervasive device and act as a replica
for the main central database. Syncing software is also provided for syncing
with the central database.
For any application that you write, you must synchronize those data
items that can only be interpreted by your application. For example, if
you are using C++ or Java code, then you will have data items represented by
classes in these particular languages. These data items will need to be
synced with the central data store; a filesystem cannot understand the
meaning of these classes. You must take care of the syncing of these data
items. If you are using a database and these objects have been mapped to
entities in a database, it is the responsibility of the database system
to sync these.
Your application needs to monitor the network connection as well. The
status of the connection will determine whether the app syncs data or caches
it on the local store. Although, as noted above, syncing should generally
be a background activity, you can make it a primary one for your application
if you feel that immediate data synching is critical. For instance, if
the network connection is typically only good for short periods of time,
you would want to make the best use of the bandwidth available. An app
in such a scenario should sync whenever possible so that the user has a
fresh local replica during the next period of disconnection.
You do not have to support disconnected operations; your application
could, for instance, tell the user that he is disconnected from the wireless
network and must stop working. However, considering the characteristics
of a wireless network, I feel that support for disconnected operations
is essential. It does not make sense to prompt the user to stop working
because he is disconnected, then ask him to resume a few minutes later.
The user should get the illusion of uninterrupted operation in the presence
of a poor or nonexistent network connection.
You will also need a module on the server that implements your
application-specific syncing code. With DB2, for example, the DB2 Sync Server
fulfills this purpose. Remember that DB2 is an application too; its designers
have also written code to sync the DB2 data and have developed the DB2 Sync
Server as a server-side syncing module for the client. DB2 is designed to
sync the data it understands, that is, tables, records, and so on.
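The change log and timestamp normalization described earlier in this section can be sketched as follows. The offset is computed once, at login, and then added to every device timestamp before it is compared with the server's. Note one assumption that goes beyond the text: the sketch resolves conflicts by letting the newest normalized write win, which is only one of several possible policies. The names Change, clock_offset, and apply_change_log are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Change:
    key: str
    value: object
    device_time: float      # seconds, as read from the device's own clock


def clock_offset(server_time_at_login, device_time_at_login):
    """Amount to add to a device timestamp to express it in server time."""
    return server_time_at_login - device_time_at_login


def apply_change_log(store, change_log, offset):
    """Apply a device's change log to the central store.

    `store` maps key -> (value, server_timestamp). A change is applied only
    if its normalized timestamp is newer than what the store already holds.
    Returns the list of changes that were rejected as out of date.
    """
    rejected = []
    for change in change_log:
        normalized = change.device_time + offset
        current = store.get(change.key)
        if current is None or normalized > current[1]:
            store[change.key] = (change.value, normalized)
        else:
            rejected.append(change)
    return rejected
```

The rejected list is what the application would report back to the user as invalidated changes.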
Levels of syncing
There are different levels of syncing. On one level, the filesystem
is synced and conflicts to the filesystem are resolved. An example is the
Coda filesystem. Coda caches entire files or directories. There is a client
manager called Venus which handles all the remote filesystem calls and
uses the local filesystem as a cache. Coda stresses availability and hence
has strong support for disconnected operations. This filesystem does not
have good support for database management systems, however.
There may be times when more than just filesystem conflicts
need to be resolved. For example, if you have a database management system
(DBMS) like DB2, concurrent writes to the same location in main storage
aren't the only sources of conflict. Let's say that two or more isolated
pervasive devices are automatically generating primary keys -- that will
cause a problem as well; the filesystem will not know what to do with duplicate
primary keys. The DBMS provides special software that resolves these conflicts.
A typical DBMS could generate a new primary key for one of the conflicting
keys and update the database transparently to the user, or it could inform
the user of the conflict and ask him to provide a new key. The DB2 Sync
Server is one that takes care of this sort of conflict.
Every application will have a different definition of a conflict. When
you build an app, you must provide special software for resolving these
conflicts.
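As an illustration of conflict resolution at this level, the duplicate primary key case could be handled along the following lines. This is a sketch of the first option mentioned above (generating a new key transparently), not a description of how DB2 Sync Server or any other product works. The function merge_rows and the next_key callable are hypothetical; a real DBMS would supply its own key generator.

```python
import itertools


def merge_rows(central, device_rows, next_key):
    """Insert device rows into `central`, re-keying any duplicates.

    `central` maps primary key -> row. `next_key` is a callable that
    returns a key not yet in use (an assumption of this sketch).
    Returns a mapping old_key -> new_key for the rows that were re-keyed.
    """
    rekeyed = {}
    for key, row in device_rows.items():
        if key in central and central[key] != row:
            new_key = next_key()          # same key, different row: conflict
            central[new_key] = row
            rekeyed[key] = new_key
        else:
            central[key] = row            # new key, or an identical row
    return rekeyed


# Usage: a device generated key 2 while isolated, but key 2 already exists.
counter = itertools.count(100)
central = {1: "apples", 2: "pears"}
rekeyed = merge_rows(central, {2: "plums", 3: "figs"}, lambda: next(counter))
```

The returned mapping lets the device update any local references to the old key, or lets the application tell the user which records were renumbered.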
A common protocol
The different levels of syncing that are possible, diversity of applications
that need to be synced, and diversity of devices that these applications
must support for syncing necessitate the use of a common protocol for syncing.
This protocol should work on the different networks available and support
the resource-poor nature of pervasive devices; it must also support all
possible representations of data. SyncML is one such initiative, proposed
by Ericsson, IBM, Palm, Motorola, Psion, and Nokia, and you can find more
information about it in the Resources section
below.
Synchronization, the user, and the developer
The ability to work in isolation is very convenient for the user. There
may be periods when the user does not need a continuous, costly connection
and would prefer to work in isolation, only connecting to the network for
the purposes of syncing. Although the disconnected approach is very convenient,
the user should not remain in isolation for long and should try to periodically
sync his replica with the central storage. Otherwise, he may remain under
the illusion that all his updates are being committed when they are not.
At the end of a long period of time, when he does sync, he may get a host
of conflicts and many of the changes he thought he had committed might
be rolled back. Another disadvantage of working in isolation is that the
user does not know the real state of the central data store. What if the
user is simply reading data from the local replica and not updating it?
If he remains disconnected for a long period of time, there is a good chance
that the data he is reading is stale. Your application could decide on
some method of informing the user of this possibility and ask him to try
to sync his machine with the central data store as soon as possible.
The application developer could define an isolation
interval. This means that it is safe for a user to work in isolation for
the duration of this interval. After the expiry of the interval, the changes
that the user makes are likely to produce conflicts. This does not mean
that there will be no conflicts during the isolation interval, but that
the probability of conflicts increases considerably after the expiry of
this isolation interval. The isolation interval is highly application-specific.
For an application whose rate of data change is high, the interval will
have to be small.
An example of this is when there are a number
of people working online, on a collaborative project, via their PDAs.
For such an application, the rate of data change is high because users
are continuously generating new data. Thus, if a user remains in isolation
for a considerable amount of time, say ten minutes, he is risking the possibility
of reading stale data; also, his changes or suggestions are not reaching
the entire group. If synchronization is being handled transparently, then
the user will never know that he is disconnected from the network; thus,
when the connection is re-established after an extended period of time,
he will get a lot of conflicts. This is very annoying for the user because
he has spent all this time possibly basing suggestions on the stale data
available to him. Thus, he should be provided some form of feedback. This
feedback could come after the expiry of the isolation interval, telling
him that any work he does may not be reflected. Feedback in this case is
more useful and necessary than transparency.
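A minimal Python sketch of an isolation interval with user feedback might look like this. The class name, the warning text, and the injectable clock parameter are all assumptions made for illustration; the clock is injectable only so that the behavior can be checked without waiting.

```python
import time


class IsolationTimer:
    """Tracks how long the replica has gone without a successful sync."""

    def __init__(self, isolation_interval, clock=time.monotonic):
        self.isolation_interval = isolation_interval  # seconds
        self.clock = clock
        self.last_sync = clock()

    def mark_synced(self):
        # Call this after every successful sync with the central store.
        self.last_sync = self.clock()

    def expired(self):
        return self.clock() - self.last_sync > self.isolation_interval

    def warning(self):
        """Return a message for the user, or None while still inside the interval."""
        if not self.expired():
            return None
        return "Working offline: recent changes may not be reflected."
```

For the collaborative example above, the interval passed in would be short (minutes at most); for an application whose data changes rarely, it could be much longer.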
Now consider another example. Suppose stock inventory
is being carried out in a supermarket. In this case, one or more of the
employees could simply take their PDA, perform the inventory, and later
sync the data. A conflict in this example is very unlikely, if not
impossible. The syncing could have been done transparently,
too, and a disrupted connection would not have affected the consistency
of the central data store at all.
One problem that could arise if the isolation
interval is very small (say, a few seconds) is that the feedback to the
user can become a nuisance. One remedy is to prompt the user first after
n quanta of the isolation interval and then, as time progresses, to reduce
the number of quanta between prompts until it finally becomes a single
quantum. Imagine the number of quanta decaying the way a radioactive
sample decays.
This approach may not be applicable if the isolation interval is large.
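The decaying prompt schedule can be made concrete with a short sketch. It assumes one particular decay rule, halving the gap after every prompt, by analogy with a half-life; the text does not fix the rate, so treat the halving as an illustrative choice. The function name prompt_times is hypothetical.

```python
def prompt_times(isolation_interval, n, count):
    """Return the first `count` prompt times, in seconds since disconnection.

    The gap before the first prompt is n isolation intervals. Each later
    gap is half the previous one (rounded down), but never shorter than
    a single isolation interval.
    """
    times = []
    elapsed = 0
    gap = n                               # gap measured in isolation intervals
    for _ in range(count):
        elapsed += gap * isolation_interval
        times.append(elapsed)
        gap = max(1, gap // 2)            # decay, with a floor of one quantum
    return times
```

With a 5-second interval and n = 8, the gaps between prompts are 8, 4, 2, 1, 1, 1 intervals, so prompt_times(5, 8, 6) returns [40, 60, 70, 75, 80, 85].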
Another problem that could arise with an isolation
interval is as follows. Suppose a single database is being used to support
different applications that can have varying isolation intervals. In that
case a different isolation interval should be managed for every application.
A proxy for the client on the server side could do this. This could be
the same server-side client proxy described earlier.
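One way to picture such a proxy is as a small server-side object that keeps one isolation interval per application and carries its own expiry time. The sketch below is an assumption-laden illustration: the class, its method names, and the use of plain numbers for server time are all hypothetical.

```python
class ClientProxy:
    """Server-side stand-in for one client (all names are illustrative)."""

    def __init__(self, client_id, created_at, ttl):
        self.client_id = client_id
        self.expires_at = created_at + ttl    # proxy is discarded after this
        self.intervals = {}                   # application name -> seconds
        self.last_sync = {}                   # application name -> server time

    def register(self, app, isolation_interval, now):
        self.intervals[app] = isolation_interval
        self.last_sync[app] = now

    def record_sync(self, app, now):
        self.last_sync[app] = now

    def overdue_apps(self, now):
        """Applications whose isolation interval has expired for this client."""
        return sorted(app for app, interval in self.intervals.items()
                      if now - self.last_sync[app] > interval)

    def is_expired(self, now):
        # An expired proxy can be dropped to free server resources.
        return now >= self.expires_at
```

A collaborative application and an inventory application sharing one database would each register their own interval, and the server could ask the proxy which of them are overdue for a given client.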
Security is a concern too. How long do you maintain
the server-side client proxy, and how often does the client have to authenticate?
A reasonable approach is for the client to authenticate every time he comes
out of isolation or the isolation interval expires. The drawback here is
similar to that of user feedback for a small isolation interval: if the
isolation interval is small, the resource- and time-consuming authentication
will take place too often. Again, the decaying quantum process just described
could be used. Here the client does not necessarily mean the physical user;
it can also mean the client-side process.
A characteristic feature of these types of systems
occurs at the data storage level as well. There could be some clients who
are working on a data copy in isolation, while others are performing online
syncing of the same data copy. If the server commits changes received from
the online client, then it could be doing injustice to the client working
in isolation. Maybe the client in isolation has performed the changes before
the client performing online syncing and, due to a nonexistent network
connection, he cannot sync with the server. So what could be done about
this? A possible solution is that the server does not commit changes on
those objects that are replicated multiple times on different client machines.
It keeps all changes as pending. It could then perform commits once every
t time units, where t is the isolation interval; this period can be defined
as a commit cycle.
The assumption this solution makes is that all objects for which commits
have been kept pending have the same isolation interval -- they belong
to the same application.
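The commit cycle might be sketched as follows. The server queues every incoming change and applies the queue only when the cycle ends. Applying the queued changes in timestamp order, so that the most recent edit wins regardless of arrival order, is an assumption of this sketch rather than something the scheme requires; timestamps are assumed to be already normalized to server time. The class name is hypothetical.

```python
class CommitCycleServer:
    """Holds changes as pending and commits them once per commit cycle."""

    def __init__(self, isolation_interval, start=0.0):
        self.t = isolation_interval       # length of one commit cycle
        self.next_commit = start + self.t
        self.pending = []                 # (timestamp, key, value) tuples
        self.store = {}

    def receive(self, timestamp, key, value):
        # Nothing is committed on arrival; it waits for the cycle to end.
        self.pending.append((timestamp, key, value))

    def tick(self, now):
        """Commit pending changes, oldest first, if the cycle has ended."""
        if now < self.next_commit:
            return False
        for _, key, value in sorted(self.pending, key=lambda c: c[0]):
            self.store[key] = value       # later timestamps overwrite earlier
        self.pending.clear()
        self.next_commit += self.t
        return True
```

Because nothing is committed before the cycle ends, an isolated client that edited first but reconnected second is still ordered correctly, as long as it reconnects within the cycle.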
The isolation intervals of different clients could
overlap with one another as well as with the server's. What the client considers
the end of an isolation interval may be the midpoint of the isolation interval
at the server. The starting points of the timers for the client and the
server could be approximately synchronized during client logon. The server
could add some additional buffer time to its isolation interval time t.
Thus, t would now be t = isolation interval time + buffer time. The buffer
time is approximate and can be determined empirically. It is not possible
to achieve an extremely high degree of synchronization between the client
and server timer starting points.
What if a client were working in isolation and
could not come online before the expiry of the isolation interval, but
then came online and presented the server with a host of updates? If the
timestamps of the user's updates are later than the timestamp of the last
commit cycle, then they are kept pending for the next commit cycle, or else
they are discarded. It does not seem very appealing or feasible to perform
a rollback and to recommit all changes with respect to the newly arrived
updates.
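The buffered cycle length and the rule for late updates can be written down directly. Both functions below are hypothetical names for the rules stated above, and both assume that update timestamps have already been normalized to server time.

```python
def cycle_length(isolation_interval, buffer_time):
    """t = isolation interval time + buffer time."""
    return isolation_interval + buffer_time


def triage_late_updates(updates, last_commit_time):
    """Split (timestamp, key, value) updates into (pending, discarded).

    Updates made after the last commit cycle wait for the next cycle;
    updates made at or before it are dropped rather than triggering a
    rollback of what was already committed.
    """
    pending = [u for u in updates if u[0] > last_commit_time]
    discarded = [u for u in updates if u[0] <= last_commit_time]
    return pending, discarded
```

An application would normally report the discarded updates back to the user, since from his point of view they looked like committed work.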
Conclusion
Developing applications to maintain synchronization in a wireless network
is a tedious task, but an essential one, in a pervasive computing environment.
If your application does not require special data representation, then
a DBMS could be used to considerably ease the task of maintaining
synchronization. You could also use a mixture of your own synchronization
code and the DBMS's. If the DBMS provides APIs to access its sync engine, then
you can use those to sync your special data. But the important thing is
not to neglect synchronization when designing your PvC application.
With future wireless networks moving towards the
"always on" approach (for example, GPRS and 3G), syncing may become less
of a problem, especially since users will probably spend far less time in
extended periods of involuntary isolation. But it is still prudent to
prepare for the worst-case scenario.
Resources
About the author
Aashish Patil recently received a bachelor's degree in computer engineering from Thadomal Shahani Engineering College in Mumbai, India, and completed a project on WAP-enabled stock trading as a trainee at Tata Consultancy Services. He is currently pursuing a master's degree in computer science at the University of Southern California. You can contact Aashish at aashishp@usc.edu.