Sync traps

A good synchronization process is the key to a usable PvC app


Level: Introductory

Aashish Patil (aashishp@usc.edu), M.S. candidate, Department of Computer Science, University of Southern California

01 Sep 2001

When you hear the word synchronization, you probably imagine synching your Palm OS or Pocket PC device with a desktop machine. But the synchronization process is crucial to the success of most forms of mobile and pervasive computing. In this article, Aashish Patil explains exactly what synching means, demonstrates various techniques of synching data, and outlines different levels of synching. We will see how wireless network characteristics affect the process of synching and, ultimately, the user and the developer.

Pervasive computing

Trends in the computing industry are moving us toward a future in which users expect almost any kind of tangible device to be networked. While we haven't reached that state yet, we already have access to a lot of computing devices that can be carried around and that possess sufficient computing power to log on to networks: personal digital assistants, mobile phones, laptops, etc. And the devices are only getting smaller: Current industry trials are testing the feasibility of wearable computers. Users who own these devices expect to be able to access networked data from almost any place and in almost any situation. And since these users would find a physical, wired connection to a network impossibly cumbersome, in all probability they will access network data wirelessly. This is what is popularly known as pervasive computing (PvC). Pervasive computing networks are huge distributed systems whose characteristics differ slightly from traditional distributed systems.





Synchronization and the disconnected approach

Certain features are characteristic of wireless networks: high latency, high connection costs (e.g., on cellular networks), connections that are sometimes very poor or intermittent, and low bandwidth. Although there are exceptions, the devices that participate in these networks are usually resource poor, with little memory, low storage capacity, a weak processor, and a finite power source such as a battery.

All these characteristics -- both of networks and devices -- mean that developers and users cannot expect the connectivity of PvC devices to a network to be continuous. There will be times when these devices will work in isolation from the network -- when the network connection is poor or nonexistent, for instance. However, a user will continue doing work during such periods of poor connectivity. This kind of work in isolation is usually called the disconnected approach, and is standard operating procedure for PDA users.

Once a network connection has been restored, our user will merge his work with the main storage system. There will probably be other people who were doing concurrent work on the same data in isolation. When they merge their work with the main storage system, conflicts will arise and must be resolved. This process of merging individual data sets and producing a conflict-free main storage can be called data synchronization.





When to synchronize?

Pervasive devices need not always be in isolation. Sometimes the devices will be connected to the network and performing changes online. In this case, synchronization will take place as a continuous process between the device and the central storage. However, there might be another device working concurrently, in isolation, on the same data set. In that case, the conflicts will be resolved later, when the isolated device syncs with the central store. The user who was initially connected to the data store might get a notification stating that some of his changes have been invalidated due to conflicts.

A user working in isolation can explicitly connect a device to the network and request synchronization. This is how most PDAs synchronize: Users physically attach a PDA to its cradle and then request a sync. However, synchronization can also be done transparently. In this case, the device is responsible for detecting a network connection and initiating a sync. When faced with an intermittent network connection, it is the device's responsibility to switch between online and isolated modes; the user shouldn't have to notice whether he's working in isolation or online. He should continue doing his work as usual. This is the kind of sync that will be needed for pervasive computing to succeed. However, this process does introduce complexity in software and thus increases demands on the resources of the pervasive device.





How to synchronize

For a disconnected or intermittently connected system, the central data store must be in some way replicated on the pervasive device. Though it would be convenient to have local access to the entire data store, resource constraints usually mean that only part will be available on the device. If the device loses its network connection, the user can continue to work on the local copy of the data store. The local copy is called a replica, and the process of replication is essential for pervasive computing to work.

The changes made on the replica are usually noted in a change log. However, other methods can be used to resolve conflicts. Timestamps can be used to note the time at which each update was made to the local replica; these timestamps are then used to resolve conflicting changes to the central data store. One important thing to keep in mind when using timestamps: The device's clock must be adjusted to the main store's clock and the timestamps evaluated in light of that adjustment.

This could be done by normalizing the client's clock with respect to the server's and storing the result in a server-side client proxy, so that the client does not have to renormalize every time it comes out of isolation and resyncs with the server. The normalization could be carried out during the initial client login. You could implement the proxy as a thread within the main server-side sync module process (the module is explained below). You can think of this proxy as an extension of the client on the server side. An expiry time has to be attached to the proxy and, hence, to all the information it stores. Otherwise, if the client is disconnected for a long period of time, the proxy will occupy server resources unnecessarily.
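The clock normalization and proxy expiry described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `ClientProxy` class, its `offset` and `expires_at` fields, and the `newer_update` helper are hypothetical names, the offset is estimated once at login, and network latency during that exchange is ignored.

```python
import time


class ClientProxy:
    """Hypothetical server-side stand-in for one client (names are illustrative)."""

    def __init__(self, client_time_at_login, server_time_at_login, ttl_seconds):
        # Offset to add to a client timestamp to express it in server time.
        self.offset = server_time_at_login - client_time_at_login
        # The proxy, and everything it stores, expires after ttl_seconds.
        self.expires_at = server_time_at_login + ttl_seconds

    def to_server_time(self, client_timestamp):
        """Convert a timestamp taken on the client's clock to server time."""
        return client_timestamp + self.offset

    def is_expired(self, server_now=None):
        """True once the proxy has outlived its expiry time."""
        if server_now is None:
            server_now = time.time()
        return server_now >= self.expires_at


def newer_update(update_a, update_b):
    """Pick the later of two (server_time, value) updates; ties keep update_a."""
    return update_b if update_b[0] > update_a[0] else update_a
```

Because the offset lives in the proxy, a client returning from isolation only has to send its raw timestamps; the server converts them with `to_server_time` before comparing updates.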

To serialize access to the central data store, locks can also be used. However, considering the nature of a wireless network, it is dangerous to lock the data store from the client itself. A situation might arise in which the client locks a data item and becomes isolated from the network for a long period of time, either voluntarily or automatically, due to a nonexistent network connection. In this case, the locked data item will remain inaccessible to other clients. Instead of the client locking the database, a proxy for the client on the server side could do the locking. This scheme is similar to a multitiered application.
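One way a server-side proxy could hold a lock without stranding other clients is to give the lock a lease that lapses on its own. The article does not prescribe this mechanism; the `LeasedLock` class and its lease-based expiry are assumptions made for the sake of a small example.

```python
class LeasedLock:
    """Hypothetical lock held by a server-side proxy on behalf of a client."""

    def __init__(self):
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, client_id, now, lease_seconds):
        """Grant the lock if it is free, expired, or already owned by client_id."""
        if self.owner in (None, client_id) or now >= self.expires_at:
            self.owner = client_id
            self.expires_at = now + lease_seconds
            return True
        return False

    def release(self, client_id):
        """Release the lock only if client_id currently owns it."""
        if self.owner == client_id:
            self.owner = None
```

If the client that owns the lock drops off the network, the lease simply runs out and the next request from another client's proxy succeeds.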

If synchronization is taking place online, what happens to the replica? Synchronization is usually a bandwidth-intensive process; hence, it can be a background process instead of the primary one. Only a part of the bandwidth is consumed by the synching process, and the replica is synced in a piecemeal manner. Parts of the replica will remain unchanged until their turn arrives to be synced. The user will continue to make changes to the replica simultaneously. This can be particularly useful for a poor or intermittent network connection. It is the responsibility of the sync engine to adapt to the changes in the network connection.

Many popular database manufacturers have developed small-footprint versions of their databases, such as IBM's DB2 Everyplace and Oracle's Oracle Lite. These databases fit on a pervasive device and act as a replica for the main central database. Syncing software is also provided for syncing with the central database.

For any application that you write, you must synchronize those data items that can only be interpreted by your application. For example, if you are using C++ or Java code, then you will have data items represented by classes in these particular languages. These data items will need to be synced with the central data store; a filesystem cannot understand the meaning of these classes. You must take care of the syncing of these data items. If you are using a database and these objects have been mapped to entities in a database, it is the responsibility of the database system to sync these.

Your application needs to monitor the network connection as well. The status of the connection will determine whether the app syncs data or caches it on the local store. Although, as noted above, syncing should generally be a background activity, you can make it a primary one for your application if you feel that immediate data syncing is critical. For instance, if the network connection is typically only good for short periods of time, you would want to make the best use of the bandwidth available. An app in such a scenario should sync whenever possible so that the user has a fresh local replica during the next period of disconnection.
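The decision between syncing and caching can be sketched as a small client-side class. Everything here is hypothetical: `SyncClient`, the `send` callable that ships one change to the server, and the `on_connection_change` hook that some network monitor is assumed to call.

```python
class SyncClient:
    """Hypothetical client that syncs when online and caches changes otherwise."""

    def __init__(self, send):
        self.send = send          # callable that ships one change to the server
        self.change_log = []      # changes made while disconnected
        self.online = False

    def record_change(self, change):
        """Send the change immediately if online; otherwise log it locally."""
        if self.online:
            self.send(change)
        else:
            self.change_log.append(change)

    def on_connection_change(self, online):
        """Called by the (assumed) network monitor; flushes the log on reconnect."""
        self.online = online
        if online:
            while self.change_log:
                self.send(self.change_log[0])   # may raise if the link drops again
                self.change_log.pop(0)          # drop only after a successful send
```

The user-facing code only ever calls `record_change`, so it behaves the same way whether the device is online or isolated.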

You do not have to support disconnected operations; your application could, for instance, tell the user that he is disconnected from the wireless network and must stop working. However, considering the characteristics of a wireless network, I feel that support for disconnected operations is essential. It does not make sense to prompt the user to stop working because he is disconnected, then ask him to resume a few minutes later. The user should get the illusion of uninterrupted operation in the presence of a poor or nonexistent network connection.

You will also need a module on the server that implements your application-specific syncing code. With DB2, for example, the DB2 Sync Server fulfills this purpose. Remember that DB2 is an application too; its designers have also written code to sync the DB2 data and have developed the DB2 Sync Server as a server-side syncing module for the client. DB2 is designed to sync the data it understands, i.e., tables, records, etc.





Levels of syncing

There are different levels of syncing. On one level, the filesystem is synced and conflicts to the filesystem are resolved. An example is the Coda filesystem. Coda caches entire files or directories. There is a client manager called Venus which handles all the remote filesystem calls and uses the local filesystem as a cache. Coda stresses availability and hence has strong support for disconnected operations. This filesystem does not have good support for database management systems, however.

However, there may be times when more than just filesystem conflicts need to be resolved. For example, if you have a database management system (DBMS) like DB2, concurrent writes to the same location in main storage aren't the only sources of conflict. Let's say that two or more isolated pervasive devices are automatically generating primary keys -- that will cause a problem as well; the filesystem will not know what to do with duplicate primary keys. The DBMS provides special software that resolves these conflicts. A typical DBMS could generate a new primary key for one of the conflicting keys and update the database transparently to the user, or it could inform the user of the conflict and ask him to provide a new key. The DB2 Sync Server is one example of software that takes care of this sort of conflict.
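The first strategy, generating a new primary key for one of the conflicting rows, might look like the sketch below. It is not how the DB2 Sync Server or any particular DBMS works; `merge_rows`, the integer keys, and the dictionary standing in for a table are all assumptions for illustration.

```python
def merge_rows(central, incoming, next_key):
    """Merge incoming {key: row} into central, re-keying rows whose key is taken.

    Returns a {old_key: new_key} map so the device can update its replica.
    """
    remapped = {}
    for key, row in incoming.items():
        if key in central and central[key] != row:
            # Duplicate primary key: find an unused key for the incoming row.
            while next_key in central or next_key in incoming:
                next_key += 1
            remapped[key] = next_key
            central[next_key] = row
            next_key += 1
        else:
            central[key] = row
    return remapped
```

Returning the key map matters: the device that generated the duplicate key needs it to fix up its own replica, or the same conflict will reappear at the next sync.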

Every application will have a different definition of a conflict. When you build an app, you must provide special software for resolving these conflicts.





A common protocol

The different levels of syncing that are possible, the diversity of applications that need to be synced, and the diversity of devices that these applications must support all necessitate a common protocol for syncing. This protocol should work on the different networks available and support the resource-poor nature of pervasive devices; it must also support all possible representations of data. SyncML is one such initiative, put forward by Ericsson, IBM, Palm, Motorola, Psion, and Nokia, and you can find more information about it in the Resources section below.





Synchronization, the user, and the developer

The ability to work in isolation is very convenient for the user. There may be periods when the user does not need a continuous, costly connection and would prefer to work in isolation, only connecting to the network for the purposes of syncing. Although the disconnected approach is very convenient, the user should not remain in isolation for long and should try to periodically sync his replica with the central storage. Otherwise, he may remain under the illusion that all his updates are being committed when they are not. At the end of a long period of time, when he does sync, he may get a host of conflicts and many of the changes he thought he had committed might be rolled back. Another disadvantage of working in isolation is that the user does not know the real state of the central data store. What if the user is simply reading data from the local replica and not updating it? If he remains disconnected for a long period of time, there is a good chance that the data he is reading is stale. Your application could decide on some method of informing the user of this possibility and ask him to try to sync his machine with the central data store as soon as possible.

The application developer could define an isolation interval. This means that it is safe for a user to work in isolation for the duration of this interval. After the expiry of the interval, the changes that the user makes are likely to produce conflicts. This does not mean that there will be no conflicts during the isolation interval, but that the probability of conflicts increases considerably after the expiry of this isolation interval. The isolation interval is highly application-specific. For an application whose rate of data change is high, the interval will have to be small.

An example of this is when there are a number of people working online, on a collaborative project, via their PDAs. For such an application, the rate of data change is high because users are continuously generating new data. Thus, if a user remains in isolation for a considerable amount of time, say ten minutes, he is risking the possibility of reading stale data; also, his changes or suggestions are not reaching the entire group. If synchronization is being handled transparently, then the user will never know that he is disconnected from the network; thus, when the connection is re-established after an extended period of time, he will get a lot of conflicts. This is very annoying for the user because he has spent all this time possibly basing suggestions on the stale data available to him. Thus, he should be provided some form of feedback. This feedback could come after the expiry of the isolation interval, telling him that any work he does may not be reflected. Feedback in this case is more useful and necessary than transparency.

Now consider another example. Suppose stock inventory is being carried out in a supermarket. In this case, one or more of the employees could simply take their PDAs, perform the inventory, and later sync the data. The chances of a conflict occurring in this example are very low, if not nil. The syncing could have been done transparently, too, and a disrupted connection would not have affected the consistency of the central data store at all.

One problem that could arise if the isolation interval is very small (say, a few seconds) is that the feedback becomes a nuisance to the user. One remedy is to prompt the user initially only after n quanta of the isolation interval and then, as time progresses, to reduce the number of quanta between prompts until it reaches a single quantum. Imagine the number of quanta decaying the way a radioactive substance does. This approach may not be applicable if the isolation interval is large.
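A decaying prompt schedule of this kind can be sketched in a few lines. The halving rule below is one reading of the radioactive-decay analogy (a half-life per prompt), not something the scheme requires; the function name and its parameters are hypothetical.

```python
def prompt_times(n, interval, count):
    """Return the first `count` prompt times (seconds since isolation began).

    The gap between prompts starts at n quanta and halves after each prompt,
    never dropping below one quantum of the isolation interval.
    """
    times = []
    elapsed = 0.0
    gap = n
    for _ in range(count):
        elapsed += gap * interval
        times.append(elapsed)
        gap = max(1, gap // 2)
    return times
```

With `n=8` and a five-second interval, the gaps between prompts are 8, 4, 2, 1, 1, 1 quanta, so the user is left alone at first and reminded once per interval only after a long stretch of isolation.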

Another problem that could arise with an isolation interval is as follows. Suppose a single database is being used to support different applications that have varying isolation intervals. In that case, a separate isolation interval has to be managed for every application. A proxy for the client on the server side could do this. This could be the same server-side client proxy described earlier for timestamps and locking.

Security is a concern too. How long do you maintain the server-side client proxy, and how often does the client have to authenticate? A good approach is for the client to authenticate every time it comes out of isolation or the isolation interval expires. The drawback here is similar to that of user feedback with a small isolation interval: If the isolation interval is small, the resource- and time-consuming authentication will take place too often. Again, the decaying quantum process just described could be used. Note that the client here refers not only to the physical user but also to the client-side process.

A characteristic feature of these types of systems occurs at the data storage level as well. There could be some clients who are working on a data copy in isolation, while others are performing online syncing of the same data copy. If the server commits changes received from the online client, then it could be doing an injustice to the client working in isolation. Maybe the client in isolation made its changes before the client performing online syncing but, due to a nonexistent network connection, cannot sync with the server. So what could be done about this? A possible solution is that the server does not immediately commit changes to those objects that are replicated on multiple client machines. It keeps all such changes pending. It could then perform commits every t time units, where t = isolation interval; this period could be defined as a commit cycle. The assumption this solution makes is that all objects for which commits have been kept pending have the same isolation interval -- that is, they belong to the same application.

The isolation intervals of different clients could overlap with one another as well as with the server's. What the client considers the end of an isolation interval may be the midpoint of the isolation interval at the server. The starting points of the timers for the client and the server could be approximately synchronized during client logon. The server could add some additional buffer time to its isolation interval time t. Thus, t would now be t = isolation interval time + buffer time. The buffer time is approximate and can be determined empirically. It is not possible to achieve an extremely high degree of synchronization between the client and server timer starting points.
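A commit cycle with pending changes and a buffered interval can be sketched as follows. The class and method names are hypothetical, timestamps are assumed to be already normalized to server time, and "latest timestamp wins" is just one possible conflict policy chosen to keep the example short.

```python
class CommitCycleServer:
    """Hypothetical server that holds changes pending until the cycle ends."""

    def __init__(self, isolation_interval, buffer_time, start):
        # t = isolation interval time + buffer time, as described above.
        self.cycle = isolation_interval + buffer_time
        self.next_commit = start + self.cycle
        self.pending = []      # (timestamp, key, value) in server time
        self.store = {}

    def submit(self, timestamp, key, value):
        """Queue a change instead of committing it right away."""
        self.pending.append((timestamp, key, value))

    def tick(self, now):
        """Commit pending changes in timestamp order once the cycle has elapsed."""
        if now < self.next_commit:
            return False
        for _, key, value in sorted(self.pending, key=lambda c: c[0]):
            self.store[key] = value   # the latest timestamp wins
        self.pending.clear()
        self.next_commit = now + self.cycle
        return True
```

Because changes are ordered by when they were made rather than when they arrived, an isolated client that edited first but synced second is not penalized for its poor connection, provided it reaches the server before the cycle closes.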

What if a client were working in isolation and could not come online before the expiry of the isolation interval, but then came online and presented the server with a host of updates? If the timestamps of the user's updates are later than the timestamp of the last commit cycle, then they are kept pending for the next commit cycle, or else they are discarded. It does not seem very appealing or feasible to perform a rollback and to recommit all changes with respect to the newly arrived updates.
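The rule for late arrivals reduces to a single comparison per update. In this sketch the function name and the `(timestamp, change)` tuple layout are assumptions, and timestamps are taken to be already expressed in server time.

```python
def triage_late_updates(updates, last_commit_time):
    """Split (timestamp, change) pairs into those kept pending and those discarded.

    Updates made after the last commit cycle wait for the next cycle;
    anything older is dropped rather than triggering a rollback.
    """
    pending = [u for u in updates if u[0] > last_commit_time]
    discarded = [u for u in updates if u[0] <= last_commit_time]
    return pending, discarded
```

An application would normally report the discarded list back to the user, since those are exactly the changes he believed had been committed.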





Conclusion

Developing applications that maintain synchronization over a wireless network is a tedious task, but an essential one in a pervasive computing environment. If your application does not require special data representation, then a DBMS could be used to considerably ease the task of maintaining synchronization. You could also use a mixture of your own synchronization code and a DBMS. If the DBMS provides APIs to access its sync engine, then you can use those to sync your special data. But the important thing is not to neglect synchronization when designing your PvC application.

With future wireless networks moving toward the "always on" approach (for example, GPRS and 3G), syncing may become less of a problem, since users are likely to spend far less time in extended periods of involuntary isolation. But it is still prudent to prepare for the worst-case scenario.



Resources



About the author

Aashish Patil recently received a bachelor's degree in computer engineering from Thadomal Shahani Engineering College in Mumbai, India, and completed a project on WAP-enabled stock trading as a trainee at Tata Consultancy Services. He is currently pursuing a master's degree in computer science at the University of Southern California. You can contact Aashish at aashishp@usc.edu.



