Intermediate scalability with the IBM Tivoli Directory Proxy Server

Level: Intermediate

Ramakrishna Gorthi (rjgorthi@in.ibm.com), Staff Software Engineer, IBM India Software Lab, Pune
Darshan Donni (darshan_donni@in.ibm.com), Staff Software Engineer, IBM India Software Lab, Pune

23 Jul 2007

The IBM® Tivoli® Directory Proxy Server acts as a layer of abstraction over data distributed across multiple directory servers. There are several ways of distributing data across a set of back-end directory servers. This article highlights the pros and cons of subtree-based distribution of data.

Introduction

IBM introduced the proxy server in version 6 of the directory server (TDS). The proxy server is the central component of a distributed directory topology. For a given distributed topology, the data splitting mechanism depends on the type of data being stored. This article discusses scenarios in which subtree-based splitting proves more beneficial than RDN Hash-based splitting. It also describes the steps for setting up a subtree-based distributed directory.

The overall structure of the article is as follows:

  • Introduction to LDAP Proxy Servers
  • The IBM Tivoli Directory Proxy Server
  • Generic terms and concepts associated with TDS Proxy Server
  • Scenarios where subtree-based splitting is required
  • Benefits of subtree-based split over RDN Hash-based split
  • Drawbacks of a subtree-based split compared to an RDN Hash-based split
  • Subtree-based splitting using the TDS 6.0 Proxy Server
  • Advanced proxy scenarios

The following acronyms are used throughout the article. They may or may not be the official names of the respective products.

LDAP: Lightweight Directory Access Protocol
TAM: IBM Tivoli Access Manager
TDS: IBM Tivoli Directory Server
DIT: Directory Information Tree





Introduction to LDAP proxy servers

A directory is basically a collection of objects arranged in a hierarchical structure. It is a data repository that enables users or applications to find resources needed for a particular task. A directory server is typically used in customer environments where the bulk of the transactions are read operations.

Customer requirements with regard to a directory server have changed with time. Now, a directory server is expected to store millions of entries. Storing millions of entries in a single directory server is quite likely to degrade the performance and at the same time introduce hardware scalability issues. Distributed directories were introduced to address these performance and scalability issues.

A proxy server sits in front of multiple directory servers and provides a layer of abstraction between the set of back-end directories and the clients. There are different ways of setting up a distributed directory topology, as mentioned earlier. The proxy server is configured to know the methodology of splitting the data. This makes it possible for the proxy server to fetch the requested data from the back-end directories and relay them to the clients. The interaction between the proxy server and the back-end directory servers is entirely transparent to the clients. A proxy server can also act as a load balancer or a fail-over manager.





IBM Tivoli Directory Proxy Server

IBM Tivoli Directory Server (TDS) implements the Internet Engineering Task Force (IETF) LDAP V3 specifications. It also includes enhancements added by IBM in functional and performance-related areas. TDS uses IBM DB2® as the backing store to provide “per LDAP operation” transaction integrity, high performance operations and online backup and restore capabilities. Tivoli Directory Proxy Server was introduced as part of the 6.0 release of the IBM Tivoli Directory Server.

The key features of Tivoli Directory Proxy Server are:

  • Scalability: Scalability is an essential aspect of directory servers. One of the means of achieving scalability is to distribute directory entries over a set of directory servers, rather than confining them to a single directory server.
    There are different ways of setting up a distributed directory. The currently available mechanisms are:
    • RDN Hash-based splitting: A unique hash value can be derived for a given directory server entry. The hash value is computed on the basis of the RDN of the entry. In a distributed directory topology, this hash value maps to a specific directory server in the back end.
    • Subtree-based splitting: In this splitting mechanism, each subtree can be configured to reside on a separate directory server in the back end.
  • Abstraction: In a distributed directory topology, the proxy server acts as an abstraction over a set of directory servers in the back-end. The proxy server glues the set of directory servers in such a way that users have a single interface to work with. Access to the back-end directory servers is transparent to clients.

In addition to the above, the proxy server has been introduced to provide:
  • Namespace partitioning
  • A unified view of a distributed directory
  • An efficient routing of user requests in the distributed directory topology
  • Load-balancing and fail-over in the context of a server cluster
  • Distributed authentication

See the TDS 6.0 Administration Guide for more information about the TDS 6.0 Proxy Server.





Generic terms and concepts associated with TDS Proxy Server

This section lists some commonly used terms in the context of the TDS Proxy Server. Relevant configuration attributes are given in parentheses where applicable.

  • Split: A given namespace is partitioned into a set of partitions, each of which resides in an independent directory server instance. Each of these partitions is referred to as a split. The namespace being split is referred to by ibm-slapdProxyPartitionBase, while the number of partitions for a given namespace is specified by ibm-slapdProxyNumPartitions.
  • Partition Index (ibm-slapdProxyPartitionIndex): Each partition/split for a given namespace is represented by an index known as the partition index. When resolving a given entry location, the proxy partitioning algorithm returns a hash value, which is then used to resolve the corresponding back-end server. This hash value can be regarded as an alias for partition index.
  • ServerGroup: For a given partition, replication is normally set up across the set of servers serving that partition. If one of the servers in the partition is running, the proxy marks the partition as active. ServerGroup is a means of specifying a set of servers such that, if any of the servers is up, the proxy can mark the relevant partition as active, even if the rest of the servers in the group are down.
  • Global administrative group members: Global administrative group members are users who have been assigned the administrative privileges for accessing entries in the backend server. However, they have no privileges or access rights to any data or operations related to the configuration of the back-end directory server. All global administrative group members have the same set of privileges. For a directory administrator, the global administrative group is a way to delegate administrative rights in a distributed environment to the back-end directory data.
  • Local administrative group members: Local administrative group members are users who have been assigned a subset of administrative privileges. All the local administrative group members have the same set of privileges. For a directory administrator, this group is a means to delegate a limited set of administrative tasks to one or more individual user accounts. These users can perform most of the administrative tasks. Exceptions to these tasks are operations that might increase the privileges of those users, such as changing the password of the directory administrator or clearing the audit log.
  • Connection Pool Size (ibm-slapdProxyConnectionPoolSize): Each proxy can be configured to talk to each of the back-end servers over a set of connections. These connections are in the form of a pool, whereby all the connections are established at the proxy start-up and used when required. This parameter is configurable and can be different for different back-end servers.
  • Proxy DN (ibm-slapdProxyDN): This is the DN that the proxy server uses to bind to the back-end servers. This DN effectively acts on behalf of the user binding to the proxy server.
  • Proxy Target URL (ibm-slapdProxyTargetURL): This attribute is used by the proxy server to specify the URL of the back-end server.
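The ServerGroup rule described above (a partition stays active as long as any one server in the group is up) can be illustrated with a minimal Python sketch. The class names, the `is_up` flag, and the example host names are hypothetical and exist only for illustration; they are not part of the TDS configuration or any TDS API.

```python
from dataclasses import dataclass, field


@dataclass
class BackendServer:
    target_url: str      # plays the role of ibm-slapdProxyTargetURL
    is_up: bool = True   # hypothetical health flag, for illustration only


@dataclass
class ServerGroup:
    """Hypothetical model of a set of servers that replicate one partition."""
    servers: list = field(default_factory=list)

    def partition_is_active(self) -> bool:
        # The partition is considered active while at least one server is up.
        return any(server.is_up for server in self.servers)


group = ServerGroup([
    BackendServer("ldap://serverA.example.com:389"),
    BackendServer("ldap://serverB.example.com:389"),
])

group.servers[0].is_up = False
print(group.partition_is_active())  # True: the second server still serves the partition

group.servers[1].is_up = False
print(group.partition_is_active())  # False: every server in the group is down
```

The sketch only models the activity rule; it does not model how the proxy detects that a back-end server is down.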

Note: The user binding to the proxy server would be considered an administrator if it is either the directory administrator, or a global administrative group member, or a local administrative group member.





Scenarios where subtree-based splitting is required

Two splitting mechanisms were mentioned earlier in the article. The more commonly used of the two is RDN Hash-based splitting. A hash is generated for a given entry based upon three factors: the parent entry, the partition base, and the number of partitions. This hash value is mapped to a specific back-end directory server. When mapping 100 entries across two back-end servers, it's quite possible that, for certain kinds of data, 90 entries are mapped to one back-end directory server and the remaining 10 to the other. Performance is likely to suffer in such scenarios because the split is uneven.
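To make the mapping from hash value to partition concrete, here is a minimal Python sketch. It is not the hash that the proxy server actually uses, which this article does not describe: `zlib.crc32` is a stand-in, the function name `partition_index` is hypothetical, and the sketch assumes that the hash is taken over the RDN immediately below the partition base. The DN handling is deliberately naive and ignores escaped commas.

```python
import zlib
from collections import Counter


def partition_index(dn: str, partition_base: str, num_partitions: int) -> int:
    """Return a 1-based partition index for an entry below partition_base.

    zlib.crc32 is only a stand-in for the proxy's own hash function.
    """
    suffix = "," + partition_base
    if not dn.endswith(suffix):
        raise ValueError(f"{dn!r} is not below {partition_base!r}")
    # Assumption: hash the RDN that sits immediately below the partition base.
    rdn = dn[: -len(suffix)].split(",")[-1]
    return zlib.crc32(rdn.lower().encode("utf-8")) % num_partitions + 1


base = "o=ibm,c=us"
entries = [f"cn=user{i},{base}" for i in range(100)]

# Count how many of the 100 entries land in each of the two partitions.
counts = Counter(partition_index(dn, base, 2) for dn in entries)
for index in sorted(counts):
    print(f"partition {index}: {counts[index]} entries")
```

How even the resulting counts are depends entirely on the RDN values and on the hash function, which is the point the paragraph above makes: the administrator does not control the distribution directly.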

In the subtree-based split, the back-end servers are used as containers for an entire subtree. In other words, if the data is scattered across a set of namespaces, each namespace is mapped to a different directory server. In the example mentioned above, suppose the DIT has the data spread across two subtrees, each with 50 entries. It's possible to map these two subtrees to two different back-end servers. With an RDN Hash-based split, it is quite possible that in the same scenario both subtrees get mapped to a single back-end server, that is, 100 entries on one back-end server and zero entries on the other. In this case, a subtree-based split is more suitable than an RDN Hash-based split, because subtree-based partitioning yields an even split.

It's noteworthy that a subtree might not necessarily be a suffix. The 'n' subtrees below a given suffix can be spread over 'n' back-end directory servers. Hence, in this article, the word "namespace" will mean a subtree or a partition base, rather than a suffix.

The diagram below brings out the difference between the two splitting mechanisms more clearly. Consider that a directory service is configured with the following suffixes:

  • o=ibm,c=us
  • cn=ibmpolicies
  • secAuthority=default

The RDN Hash-based distributed directory looks like this:
Figure 1: RDN Hash-based distributed directory setup

Note: Each namespace is split into a set of partitions. Each partition resides on an independent directory server instance. A given entry will belong to one of the partitions based upon the generated hash value for the entry.

Also note that partition 1 of all the namespaces can be configured to reside in the same directory server instance.

On the other hand, a subtree-based distributed directory looks like this:
Figure 2: Subtree-based distributed directory setup

Note: Each namespace is considered as an independent partition and mapped to an independent directory server instance.

If entries are added under o=ibm,c=us in such a way that 90 percent of the RDNs resolve to hash 1 (partition 1) and the remaining 10 percent resolve to hash 2 (partition 2), the split is fairly uneven. In this case, for both the suffixes, 90 percent of the entries out of the total data set would reside in partition 1 and the rest in partition 2.

If, for example, 50 percent of the entire data set is to fall under o=ibm,c=us and 50 percent under secAuthority=default, it is better to store o=ibm,c=us on back-end server 1 and secAuthority=default on back-end server 2. This would provide a fairly even split and, consequently, better performance.
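The subtree-based mapping in this example can be sketched in a few lines of Python. This is an illustration of the idea only, not the proxy server's implementation: the names `SUBTREE_MAP`, `normalize`, and `route` are hypothetical, and the DN normalization is simplified (it lowercases the DN and ignores escaped commas).

```python
# Each partition base (subtree) is mapped to exactly one back-end server.
SUBTREE_MAP = {
    "o=ibm,c=us": "back-end server 1",
    "secauthority=default": "back-end server 2",
}


def normalize(dn: str) -> str:
    """Lowercase the DN and strip spaces around its components (simplified)."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def route(dn: str) -> str:
    """Return the back-end server that holds the subtree containing dn."""
    ndn = normalize(dn)
    matches = [base for base in SUBTREE_MAP
               if ndn == base or ndn.endswith("," + base)]
    if not matches:
        raise LookupError(f"no partition base covers {dn!r}")
    # If several bases match, the longest (most specific) one wins.
    return SUBTREE_MAP[max(matches, key=len)]


print(route("cn=John Doe, ou=people, o=ibm, c=us"))   # back-end server 1
print(route("cn=Policy1,secAuthority=default"))       # back-end server 2
```

Because the lookup depends only on which subtree an entry belongs to, the administrator can predict the distribution directly from the shape of the DIT.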

The next two sections list the pros and cons of these two splitting mechanisms.





Benefits of subtree-based split over RDN Hash-based split

A subtree-based split has the following advantages over an RDN Hash-based split:

  • Ease of setup: In the case of subtree-based splitting, no RDN hashing is involved. Hence, it's simpler to predict whether a given namespace can lead to an even split. Administrators only need to map each subtree to a different back-end server.
  • Better capacity planning: Capacity planning is easier in case of subtree-based splitting compared to RDN hashing. Since each subtree is mapped to an independent back-end server, it's much easier to plan the number of partitions and the number of back-end servers.
  • Dynamic re-distribution of data: In case of subtree-based splitting, it's quite easy to introduce a new back-end directory server and change the proxy server configuration to accommodate the new server. However, in RDN Hashing, if a new back-end server is introduced, the entire setup would need to be done again, because RDN Hash for a given entry changes with the number of back-end servers. In the context of dynamic re-distribution of data, RDN Hash-based splitting is definitely costlier compared to subtree-based splitting.
  • Scalability: In case the namespace can be divided evenly on the basis of subtrees, subtree-based splitting scales better than RDN Hash-based splitting. The distribution isn't at the entry level but at the subtree level. Hence, if a new subtree is to be added to the distributed directory topology it can be mapped to a new back-end server. If an existing subtree becomes too large to be stored on a single server and if it's feasible to identify two branches that can reside on different back-end servers, the subtree can be split into these two branches. Each of these branches can be mapped to independent back-end servers.
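The re-distribution cost mentioned in the list above can be illustrated with a small Python sketch. It assumes a simple "hash modulo number of partitions" scheme with `zlib.crc32` as a stand-in hash; the proxy server's real hash is not described in this article, so the exact numbers are illustrative only. All names are hypothetical.

```python
import zlib


def partition_index(rdn: str, num_partitions: int) -> int:
    """1-based partition index under a stand-in 'hash modulo N' scheme."""
    return zlib.crc32(rdn.encode("utf-8")) % num_partitions + 1


rdns = [f"cn=user{i}" for i in range(1000)]

# RDN Hash-based split: going from 2 to 3 partitions changes the target
# partition of many existing entries, so their data has to be moved.
moved = [rdn for rdn in rdns
         if partition_index(rdn, 2) != partition_index(rdn, 3)]
print(f"{len(moved)} of {len(rdns)} entries change partition")

# Subtree-based split: adding a mapping for a new subtree leaves every
# existing subtree-to-server mapping untouched.
subtree_map = {"o=ibm,c=us": "Server A", "secAuthority=default": "Server B"}
before = dict(subtree_map)
subtree_map["o=newsubtree"] = "Server C"   # hypothetical new subtree and server
unchanged = all(subtree_map[base] == server for base, server in before.items())
print(f"existing mappings unchanged: {unchanged}")
```

Under this assumed scheme, a large share of the entries changes partition when a server is added, whereas the subtree map only gains one new key.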




Drawbacks of a subtree-based split compared to an RDN Hash-based split

A subtree-based split has the following drawbacks compared to an RDN Hash-based split:

  • Subtree-based distributed directories can't replace RDN Hash-based distributed directories as the ultimate way to attain scalability. If the DIT is breadth-centric and has entries that can't be placed under discrete partition bases, an RDN Hash-based distributed directory setup is more beneficial.




Subtree-based splitting using the TDS 6.0 Proxy Server

This section provides the steps to set up a subtree-based distributed directory using the TDS 6.0 Proxy Server. The setup is based on Figure 2 shown above.

The setup consists of two parts:
  • Setting up the proxy server
  • Setting up the back-end directory servers

Note: The setup is explained assuming that the Web Administration Tool will be used.

Setting up the proxy server

The following steps can be used to set up the proxy server. It is assumed that the two back-end servers in the topology are referred to as Server A and Server B.

  1. Start the directory server instance that is to be used as the proxy server. Because this instance isn't yet configured with the back-end server information, it comes up in configuration mode.
  2. Log on to the proxy server using the Web Administration tool.
  3. In the navigation area, expand the Proxy administration section.
  4. Click Manage proxy properties.
  5. Click the Configure as proxy server check box.

    Configure server as proxy


  6. In the Suffix DN field, as shown in the figure above, enter cn=ibmpolicies and click Add.
  7. In the Suffix DN field, as shown in the figure above, enter o=ibm,c=us and click Add.
  8. In the Suffix DN field, as shown in the figure above, enter secAuthority=default and click Add.
    At this point, the Manage proxy properties window appears as shown:

    List of suffixes


  9. Click OK to save your changes and return to the Introduction Panel.
  10. In the navigation area, click Manage back-end directory servers.
  11. Select the action Add from the drop-down of Select Action and click Go.
  12. In the Hostname field, enter the hostname for Server A.
  13. Enter the port number for Server A.
  14. Enter the number of connections that the proxy server can have with the back-end server in the Connection Pool Size field. The minimum value is 1 and the maximum value is 100.
  15. In the Authentication Method field, specify Simple.

    Add backend server


  16. Click Next.
  17. In the Bind DN field, specify the administration DN or the DN of a member of the administration group. For example, cn=root
  18. In the Bind password field, specify and confirm the administration password.
  19. Click Finish.

    Add backend server credentials


  20. Repeat steps 10 through 19 for Server B.
  21. After finishing the steps above, click Close to save your changes and return to the Introduction panel.
  22. In the navigation area, click Manage partition bases.
  23. On the Partition bases table, click Add.
  24. In the Partition DN field, enter cn=ibmpolicies.
  25. In the Number of partitions field enter 1.
  26. Click OK.

    Add partition base


  27. Select the radio button for cn=ibmpolicies and click View servers.
  28. Verify that cn=ibmpolicies is displayed in the Partition base DN field.
  29. In the back-end directory servers for the partition base table, click Add.
  30. From the Back-end directory server menu, select Server A.
  31. In the Partition Index field enter 1.
  32. Click OK.

    Backend server for cn=ibmpolicies


  33. Repeat steps 22 through 32 for the suffix o=ibm,c=us.
  34. Repeat steps 22 through 32 for the suffix secAuthority=default. The only difference from the earlier steps is that Server B is used as the back-end server rather than Server A.
  35. Restart the proxy server for the changes to take effect.

Note: We have essentially mapped cn=ibmpolicies and o=ibm,c=us to Server A and secAuthority=default to Server B.
It's a good practice to map cn=ibmpolicies to a single back-end server and set up this suffix for replication to all the servers in the distributed directory topology. Manual set up of the global admin group separately on each of the back-end servers is not required if replication is set up.
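As a cross-check of the steps above, the resulting mapping can be written down as data and validated. The following Python sketch is a hypothetical model, not the proxy server's configuration file format: the class names and the `missing_indexes` helper are invented for illustration, and only the attribute names in the comments come from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    partition_base: str   # corresponds to ibm-slapdProxyPartitionBase
    num_partitions: int   # corresponds to ibm-slapdProxyNumPartitions


@dataclass(frozen=True)
class Assignment:
    partition_base: str
    partition_index: int  # corresponds to ibm-slapdProxyPartitionIndex
    backend: str          # label of the back-end server (Server A or Server B)


# The three partition bases added in steps 22 through 34, one partition each.
splits = [
    Split("cn=ibmpolicies", 1),
    Split("o=ibm,c=us", 1),
    Split("secAuthority=default", 1),
]

assignments = [
    Assignment("cn=ibmpolicies", 1, "Server A"),
    Assignment("o=ibm,c=us", 1, "Server A"),
    Assignment("secAuthority=default", 1, "Server B"),
]


def missing_indexes(splits, assignments):
    """Return (partition_base, index) pairs with no back-end server assigned."""
    assigned = {(a.partition_base, a.partition_index) for a in assignments}
    return [(s.partition_base, index)
            for s in splits
            for index in range(1, s.num_partitions + 1)
            if (s.partition_base, index) not in assigned]


print(missing_indexes(splits, assignments))  # [] means every partition has a server
```

A check of this kind is useful mainly when the number of partition bases grows and it becomes easy to forget a back-end assignment for one of them.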

Setting up the back-end directory servers

The following steps can be used to set up the back-end servers:

  1. Start Server A and log on to the server by using the Web Administration tool. This is the server that is configured as the back-end for both cn=ibmpolicies and o=ibm,c=us.
  2. Add an entry cn=user1,cn=ibmpolicies with a password mysecret.
  3. In the navigation area, click Manage entries.
  4. Select the radio button for cn=ibmpolicies and click Expand.
  5. Select the radio button for globalGroupName=GlobalAdminGroup and from the Select action drop-down menu, select Manage members and click Go.

    Manage Global Admin Group members


  6. In the member field, enter cn=user1,cn=ibmpolicies and click Add.
  7. The following message is displayed: You have not loaded entries from the server. Only your changes will be displayed in the table. Do you want to continue? Click OK.
  8. cn=user1,cn=ibmpolicies is displayed in the table. Click OK. cn=user1,cn=ibmpolicies is now a member of the global administration group.
  9. The user cn=user1,cn=ibmpolicies need not be created in every back-end of a distributed directory topology. However, the global admin group on each server needs to be updated to include the membership of this user. The best way to deal with this, as mentioned earlier, is to set up replication between all the back-ends in the distributed directory topology for the subtree cn=ibmpolicies.

    Global Admin Group members

At this point, the topology is set up as shown in Figure 2. The entries under the suffixes o=ibm,c=us and cn=ibmpolicies are stored in Server A. The entries under the suffix secAuthority=default are stored in Server B.





Advanced proxy scenarios

The layout of a DIT varies from customer to customer. While some DITs are depth-centric, others might be breadth-centric. In some cases, it's hard to categorize the DIT as breadth- or depth-centric. In such scenarios, a mix of subtree-based and RDN Hash-based setups is used. The following items describe such setups.

  • Hybrid proxy setup: One way of mixing the subtree-based and the RDN Hash-based solutions is to map some subtrees to a single unique back-end server and partition the rest of the subtrees on the basis of an RDN hash. In other words, depth-centric subtrees can be distributed across independent back-end servers (subtree-based splitting), while the other subtrees can be spread on the basis of hashes (RDN Hash-based splitting). The pictorial representation of this hybrid setup is given below:

    Figure 3: Hybrid proxy setup

    As shown in the figure above, entries under the split o=ibm,c=us can belong to either Partition 1 or Partition 2, based upon the hash-value they resolve to. As far as entries under cn=ibmpolicies and secAuthority=default are concerned, they would go to a single back-end server, because each of these partition bases is mapped to just one back-end server.

  • Hybrid proxy Setup within the same subtree (Nested split): Taking a step further in the setup above, we can have a hybrid setup within the same subtree as well. As shown in Figure 4 below, the subtree o=ibm,c=us is distributed across two partitions on the basis of RDN Hashes. However, each of the subtrees ou=Tivoli,o=ibm,c=us and ou=DB2,o=ibm,c=us is mapped to a single independent back-end. Hence, effectively all the data under o=ibm,c=us except for that under the subtrees ou=Tivoli,o=ibm,c=us and ou=DB2,o=ibm,c=us, would be distributed on the basis of an RDN Hash.

    Figure 4: Hybrid proxy setup within the same subtree


  • Hybrid proxy setup with replication: We can have a hybrid setup with failover capabilities. As shown in Figure 5 below, the subtree o=ibm,c=us is split across two partitions on the basis of RDN hashes. Both partitions are configured for replication in a peer-to-peer topology. Similarly, the partition for secAuthority=default is also configured for replication. If one of the peers for any given partition goes down, the proxy's write requests fail over to the other active peer configured for the partition. This increases the availability of the servers in the distributed directory. Replication can be set up between the back-end servers regardless of whether they are split on the basis of an RDN hash or on the basis of subtrees.
    Another advantage of setting up replication over a given partition is load balancing. The proxy server sends the read requests across the set of replicas for a given partition in a round robin manner. So, a single back-end server need not face all the read requests. They can be evenly distributed across the set of configured back-end servers.

    Figure 5: Hybrid proxy setup with replication


  • Current limitation: As per the current design, a distributed directory topology is set up on the basis of a single partitioning algorithm. There is no mechanism in place that says partition 1 uses algorithm 1 for partitioning, partition 2 uses algorithm 2 for partitioning, and so on. If such a mechanism were implemented, users could write their own custom partitioning plug-ins and attach different partitioning schemes to different subtrees to better load balance their data and requests.
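The nested split described above (o=ibm,c=us hashed across two partitions, with ou=Tivoli,o=ibm,c=us and ou=DB2,o=ibm,c=us each mapped to a single back-end server) can be sketched as a two-step lookup: find the most specific partition base, then hash only if that base has more than one partition. This Python sketch is an illustration under assumptions, not the proxy server's algorithm: `zlib.crc32` stands in for the real hash, all names are hypothetical, and placing the base entry itself in partition 1 is an arbitrary simplification.

```python
import zlib

# Partition base (normalized to lowercase) -> number of partitions.
PARTITION_BASES = {
    "o=ibm,c=us": 2,            # spread by RDN hash
    "ou=tivoli,o=ibm,c=us": 1,  # whole subtree on one back-end server
    "ou=db2,o=ibm,c=us": 1,     # whole subtree on one back-end server
}


def normalize(dn: str) -> str:
    """Lowercase the DN and strip spaces around its components (simplified)."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def resolve(dn: str):
    """Return (partition_base, partition_index) for the entry dn."""
    ndn = normalize(dn)
    matches = [base for base in PARTITION_BASES
               if ndn == base or ndn.endswith("," + base)]
    if not matches:
        raise LookupError(f"no partition base covers {dn!r}")
    base = max(matches, key=len)          # most specific base wins
    num_partitions = PARTITION_BASES[base]
    if num_partitions == 1 or ndn == base:
        return base, 1                    # simplification for the base entry itself
    # Stand-in hash over the RDN immediately below the partition base.
    rdn = ndn[: -len(base) - 1].split(",")[-1]
    return base, zlib.crc32(rdn.encode("utf-8")) % num_partitions + 1


print(resolve("cn=tds,ou=Tivoli,o=ibm,c=us"))  # ('ou=tivoli,o=ibm,c=us', 1)
print(resolve("cn=udb,ou=DB2,o=ibm,c=us"))     # ('ou=db2,o=ibm,c=us', 1)
print(resolve("cn=user1,ou=Sales,o=ibm,c=us")) # base 'o=ibm,c=us', index from the hash
```

Entries under the two nested subtrees never reach the hashing step, which matches the statement above that all data under o=ibm,c=us except those two subtrees is distributed on the basis of an RDN hash.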




Conclusion

This document provides a detailed procedure for setting up a subtree-based distributed directory and discusses the pros and cons of choosing a subtree-based split over an RDN Hash-based split. As part of the advanced proxy scenarios, it also illustrates cases in which a hybrid setup can be used.






About the authors

Ramakrishna Gorthi

Ramakrishna J Gorthi is a developer for the IBM Tivoli Directory Server at the Pune center in India. He has six years of experience in the IT industry, all at IBM, with one year in Level 2 Customer Support for the various versions of the IBM Tivoli Directory Server and the rest in IBM Tivoli Directory Server development and testing. He has authored the TDS IBM Redbook® titled Understanding LDAP. He has written a developerWorks article titled “TAM-TDS Migration.” He holds a degree in Computer Engineering from Pune Institute of Computer Technology, Pune (India). His areas of expertise include IBM Tivoli Directory Server from the Tivoli Security Products and DB2®.


Darshan Donni works for the IBM Tivoli Directory Server Level 2 Customer Support team. He was formerly with the IBM Tivoli Directory Server test team.



