You are currently browsing the tag archive for the ‘LDAP’ tag.
In previous posts i pointed out the benefits of using web services (instead of direct LDAP access) to perform LDAP writes. In this post i ‘m going to take the whole discussion a few steps further and propose a mechanism (based on the basic principles of the LDAP Synchronization Protocol used by OpenLDAP for replication) to keep heterogeneous data systems synced (in our case a database with an LDAP server).
There are two, quite different scenarios when writing to an LDAP server (which actually depend on the client performing the updates):
- The first one is the case where you are using some sort of administration/modification interface (usually a web page) and perform the modifications through it. In this case the user is performing changes online and will usually wait until he gets a positive response from the underlying LDAP server. Even if you use a web services mechanism to perform the LDAP changes the end result will be the same: The web service function will return with a positive (or negative) result along with the corresponding error message if the operation failed at some point. The user is then able to take any necessary actions to make things work (like change a value to meet some constraints, use an attribute value that is not already used if we have a unique restriction on that attribute and so on) or just abandon the operation and move on. The changes performed are targeted, small, contained and serialized. Most of the error handling and decisions are performed by the user himself. The web services interface needs just perform the actions and return with the result without worrying about ‘keeping things in sync’.
- The second scenario is one where you have an external data source of some kind (usually a database) which needs to keep things synchronized with LDAP. Say your main employee source is your payroll system and you need to keep your LDAP synced with that (add people when they join the firm, modify them when needed and delete them when they leave). In this case you will usually create a web services operation interface and perform every operation when needed (call a web service add() operation when you create a user in the payroll system and so on).
The second scenario poses a few serious constraints and problems:
- Before enabling the above mechanism the external data source and LDAP need to be in sync. If your LDAP is empty and your payroll system already contains users you need a way to actually populate (usually with a manual offline mechanism) your Directory.
- You need a queuing mechanism so that changes are buffered and sent
- at a rate that the LDAP server can handle
- even if the LDAP server is down for a while
- Changes should be sent serialized, in the order they were committed to the data source and you cannot afford to lose a modification (that might break future modifications that depend on a previous one).
- An error in an operation should not put the whole synchronization process to a halt and it should be recoverable in most cases.
- If somehow you lose a modification you need a mechanism to resync the two parties.
The above complexities make it obvious that just keeping a changelog and performing the operations committed through the web services mechanism is not actually enough. What is needed is a full, robust, synchronization protocol. The OpenLDAP folks have gone through great pains to design and implement such a mechanism, called the LDAP Syncronization Protocol (described in the OpenLDAP Admin Guide and in RFC 4444). It’s a nice move to use the work done and keep the basic principles when designing a mechanism for our case.
First of all we need to define a key used to synchronize the entries. That key is actually application specific. We could have a unique employee id in our payroll system and decide to keep the same value in the employeenumber attribute in our LDAP entries. Obviously, we could have multiple data sources for each entry (for different attributes), in which case we need to define one key for each data source.
Along with that we have to define a new attribute that will hold the data sources for each entry (if we have just one that step could be skipped). The attribute will be multivalued and have one value for every data source (for instance myCompanyEntryDataSource=payroll). We will need to define a ‘friendlyname’ for each data source in our case.
If we keep multiple entry types (for instance students, faculty, employees) that each need special handling we will need to define an attribute to keep that one too (sometimes eduPersonAffiliation will suffice). If our entries are all of the same type we can skip that one too.
Thankfully, we don’t need to change much in our data source. The only thing that would be required is to keep a Global Data Version (GDV). That would just be a global, unique number that will be incremented for every write (add/delete/modify) operation on the database. The GDV could be maintained as a separate resource and be atomically and transaction-ally maintained for every operation committed (it must be incremented only by one on every operation, the increment operation should be atomic and an write operation on our data must always end with an increment on the GDV value).
We now have to define a few more things:
- Keep the global data version in our LDAP tree base in the form of a multivalued attribute. We keep one value for each data source and distinguish between them by using value tags (based on the data source friendly name we have defined). The GDV could be maintained on our data tree base entry for easy access.
- Create a PresentEntry() web service operation. That operation will just send all data (that a data source holds and shares with the LDAP directory) for an entry and update (or create if necessary) the entry on the LDAP accordingly. If the entry is synced no modification will be performed.
- Add an update timestamp attribute to the schema that will be added on all entries and will be updated by the PresentEntry() operation.
We are now one step before actually defining our synchronization mechanism. The last step is to manually walk our LDAP tree and set the data source attribute accordingly for all the entries that depend on it (if a data source attribute is actually maintained).
- Call a web service called GetDataVersion(). That will return the GDV for our data source (or null if we have never synchronized). If the GDV is stale (compared to the one maintained in the data source) then we can actually start the synchronization. Otherwise there’s no reason to perform any further action.
- Call a web service called StartTotalUpdate(DataVersion, UpdateTimestamp). DataVersion is the GDV of the source, while UpdateTimestamp is just the timestamp of our call to StartTotalUpdate(). We use this function in order to signal the start of the synchronization process as well as to achieve isolation. We don’t allow the sync process to run more than once simultaneously. The operation will return:
- DataCurrent if no action is needed
- UpdateInProgress if another update is already in progress
- NoDataVersion if LDAP does not have a GDV (in the case of initialization)
- Error in case of an error (like the LDAP server is not available)
It will also return a TotalUpdateId unique number which will need to be used by all subsequent web service operations (as an extra argument).
- Call the PresentEntry() operation for all the entries maintained on the data source (which should be synchronized with the LDAP directory). The call will be serialized and the operation on the LDAP could implement a throttling mechanism in order to perform operations with a rate that the LDAP server can handle. WS-ReliableMessaging could also be used to make sure that calls to the operation are ordered and happen only once. Every entry that is updated will also get an update timestamp attribute with the value of the UpdateTimestamp passed in StartTotalUpdate(). If the operation hits an unrecoverable error it will return the error to the caller and the update process will stop.
- Call a web service called EndTotalUpdate(TotalUpdateId, Boolean success, error message). This operations signals the end of the update process. If the update failed at some point we can signal it and also log a corresponding error message. If it was successful we update the GDV attribute in the base entry and run a post update process that will scan all entries (with a data source attribute corresponding to this source) that have an update timestamp attribute with a value other than the UpdateTimestamp. These entries should just be deleted.
When we initiate a total update and assign a totaupdateid we can also assign an idle timeout and an idle timer. Every call to the PresentEntry() operation will zero the timer. If the timer exceeds the idle timeout then we kill the update process and return a corresponding error to a call for the PresentEntry() or EndTotalUpdate() operations.
The above need minimum changes in the data source and only require coding of the PresentEntry() caller and provider. On the other hand even a single update needs walking through all the data maintained which is not scalable for more than a few hundred entries. If we have already implemented operation specific web service operations (add, change, delete etc) and we are willing to maintain a changelog on our data source we can define a process to only send updates to the LDAP and keep the above process for initial syncing and as a fallback if we lose sync.
In this case we need to create a changelog in the data source that will be updated on every write operation with the following data:
- GDV: The Global Data Version that corresponds to this operation
- Entry unique id
- Entry type
- Change Type
- Changed Data
In this case the update process can be defined as follows (quite similar to the total update process actually):
- Call the GetDataVersion()
- Call a StartUpdate() service similar to StartTotalUpdate() (with the same arguments, return values and funcionality)
- Call the operation specific web services. The services perform the operation, update the update timestamp attribute and return success or error. An optimization would be to only create a delete() operation for entry removal and keep the PresentEntry() operation for the rest of you needs.
- Call a EndUpdate() service similar to EndTotalUpdate(). The only difference is that in this case there’s no reason to walk through the tree and search for entries that need deletion.
The above process can be used if the LDAP tree has been initialized and the changelog contains enough data to perform synchronization. Otherwise a total update is performed and we fall back to normal changelog based updates.
I have previously described the advantages of using LDAP Proxy Authorization when performing changes on an LDAP server on behlaf of someone else. Here’s PHP code to actually perform proxy authorization. The code will first check the RootDSE to see if proxy authorization is supported and then perform an ldap search with proxied credentials. I ‘ll be releasing a new version of LUMS shortly containing this and a few other enhancements.
Please read the SASL chapter of OpenLDAP administrator’s guide first. Pay attention to the AuthzTo/AuthzFrom attributes and make sure you set ‘authz-policy to/from’ in order for things to work. Also, the admin guide has a small typo on the authzto/from dn regex definition. The correct form is:
In a previous post i described how a single entry change could lead to replication halt and a need to reinitialize things again. In OpenLDAP, since SyncRepl uses the present phase replication conflicts can be easily fixed. Just make the appropriate changes to the problematic entries and replication will resume and replicas will get synchronized.
The problem arises with Delta-Syncrepl. Since it depends on a Changelog database holding entry changes, when it reaches the problematic query, it will keep on trying to execute it on the replicas and replication will get halted. As a result, it’s important to be able to clear the changelog from the corresponding entries so that replication can continue.
The easiest way to do that is to simply wipe out everything from the changelog database. Syncrepl will fall back to full replication, replicas wil get synchronized and the changelog can be used to facilitate delta-syncrepl after synchronization. Since usually RefreshAndPersist replication will be used and replicas will not be far behind, the accesslog will probably be quite small and deleting it won’t make any large difference in synchronization time.
The hard way is to try and find the faulting entries and delete them. Now it’s a good time to read the slapo-accesslog man page and also look at some example entries in the accesslog database. Here are three entries, corresponding to an entry add. modify and deletion:
# 20090125102756.000001Z, accesslog
reqMod: objectClass:+ top
reqMod: objectClass:+ person
reqMod: objectClass:+ organizationalPerson
reqMod: objectClass:+ inetOrgPerson
reqMod: uid:+ test-user123
reqMod: cn:+ test user
reqMod: sn:+ user
reqMod: givenName:+ test
reqMod: structuralObjectClass:+ inetOrgPerson
reqMod: entryUUID:+ 93a43296-7f16-102d-9d8f-c7eb2283aa50
reqMod: creatorsName:+ cn=manager,dc=ntua,dc=gr
reqMod: createTimestamp:+ 20090125102756Z
reqMod: entryCSN:+ 20090125102756.999906Z#000000#001#000000
reqMod: modifiersName:+ cn=manager,dc=ntua,dc=gr
reqMod: modifyTimestamp:+ 20090125102756Z
# 20090125102958.000001Z, accesslog
reqMod: cn:= test user
reqMod: entryCSN:= 20090125102958.789043Z#000000#001#000000
reqMod: modifiersName:= cn=manager,dc=ntua,dc=gr
reqMod: modifyTimestamp:= 20090125102958Z
# 20090125103004.000001Z, accesslog
As you can see the important parts are the RDN which is based on the request time (reqStart), the request type (reqType with values add, modify and delete – although you can also just use the corresponding values for the objectclass attribute) and the request target DN (reqDN, holding the DN of the entry). Consequently, in order to be able to search the accesslog database efficiently, the administrator should add an index for the reqDN attribute and perform queries in the form: (&(reqDN=<entry DN>)(objectclass=auditAdd|auditModify|auditDelete)).
Depending on the conflicting entry’s final form, the administrator should delete all accesslog entries that will create a conflict and only leave (or not leave at all if there isn’t one) the resolving request.
I was curious to see how long online initialization would take with our server setup. First of all i wanted to test how much time slapadd takes with our user database. Keep in mind that we keep quite a lot of attributes in our user entries (the inetorg* set of objectclasses, eduperson and various service specific attributes), so we ‘ve ended up configuring 20 indexes on our database. That probably means that slapadd will have a lot of work to do when adding entries.
Plain slapadd took around 10 minutes to complete for 23,000 entries while slapadd with the -q option takes only 1,5 mintues!! Talk about a speedup. So it seems the -q option is a must.
The main difference between slapadd and online initialization though is that the later will go through all the modules configured on the ldap server. In our case this configuration includes:
- Changelog overlay (only added to the server configuration for the last init run).
- Audit log (thus all operations will get logged on the audit log file).
- constraint and attribute uniqueness overlays.
- Schema checking.
Consequently, I would assume that online initialization would take more time than the original slapadd (since it actually has to update two databases, the actual user database and the accesslog db as well as log the operations to the audit log).
(note: Please don’t take the actual init times into account, just the difference between them. Right now both replicas are hosted on the same machine as jails so i ‘m planning to update the numbers when i move one to another machine).
The first try on online init failed and seemed to get stack somewhere. Adding Sync to the value of olcLogLevel under cn=config can be very helpful on troubleshooting. It turned out that there’s an obvious chicken-egg problem with the value list attribute constraint of the constraint overlay. When setting up a value list attribute constraint, the list is taken from the result of an LDAP URL. Since during initialization there are no entries in the directory, that LDAP URL will return an empty list and as soon as you try to add an entry with an attribute corresponding to a value list constraint the operation will fail and initialization will get halted. The solution was to disable the value list constraint until initialization finished.
Next try was a success but took much more time than anticipated. One tuning action that had nice results was to increase the checkpoint interval. At first i had configured checkpoint to happen every 60KB’s on the user database and 120KB on the changelog database. Since i also had configured checkpoints to happen every 5 minutes it was a good idea to increase the size to 1MB. This dropped online initialization to 22 minutes (no accesslog overlay), more than double the time for plain slapadd to run on the same data.
Just to have an idea on the performance increase of checkpoint tuning here are the results on my test platform (consumer has a minimum configuration with just the back-hdb configuration and no changelog, auditlog, unqiueness and constraint overlays):
- Checkpoint Size: 120KB, Init Time: 14 minutes
- Checkpoint Size: 1MB, Init Time: 7 minutes (50% decrease!)
After enabling the accesslog overlay, the increase in total operation processing was quite impressive since this time online init took around 33 minutes (compared to 22 minutes).
Since syncrepl will fall back to full present/delete phase if the changelog does not contain enough information or is outdated, it becomes clear that the accesslog database and it’s contents are not mission critical. Consequently, it is advisable to enable the dbnosync configuration directive for the accesslog database. Enabling the setting dropped online init time to around 25 minutes, almost removing the additional accesslog update cost.
Lastly, a couple more notes on tuning the BDB database:
- db_stat -m will give you statistics on the BDB cache usage and hit ratio. It can be a helpful tool to check out if your BDB cachesize setting (set_cachesize in DB_CONFIG) is working properly or is set too low.
- This wiki page on Sun (although targeted on Sun DS) can provide useful hints on determining if your database page size is optimal for your configuration. In OpenLDAP back-hdb the corresponding configuration directive is dbpagesize.
Most of the end user services which use LDAP for authentication/authorization purposes usually feature the following characteristics:
- The LDAP queries performed target a single entry. The most common query filter is uid=<username> which can be easily indexed (equality index) and normally should only have one candidate entry (which means that the LDAP server can just use the index to find the entry in the database).
- Only a small percentage (as small as 10-20%) of the total user population is active at any given moment. That means that your cache memory needs not be as large as your database but only as large as your active working set. The user entries in the working set are the ones that will get accessed a lot and increase your cache hit ratio while the rest of the entries might never get accessed or be asked for very infrequently so maintaining a cache memory large enough to keep them is just a waste of memory. For example in the Greek School Network we ‘ve got around 200,000 ldap entries but in a single day only 40,000-50,000 entries will be accessed. That’s 20-25% percent which means that you can get great performance by tuning entry/directory cache to hold 40-50% of the database. In our case we get 98%(!) entry cache hit ratio with an entry cache size of 60,000 entries.
Although the above are the usual case, there are a few services that involve much heavier LDAP queries. One common example is mailing lists. We ‘ve got 70,000 teachers so an LDAP query to create a mailing list holding all their emails will hit the All-ID’s threshold (thus will not use an index) and will need to look through the whole entry database. Imagine having to run the same type of query for various other user categories (students, user’s belonging to specific organizational units and so on). The load on the LDAP server starts getting bigger as well as the total execution time of all queries.
In our 200K user database the id2entry file is 1,3GB and consequently scanning all the database from disk can take quite some time. That translates to roughly 6KB/entry. If you also take into account the indexes total size (500MB) and a minimum entry cache of around 60,000 entries (~900MB) it’s obvious that you need at least 4GB to keep database on memory and still be able to handle other LDAP queries and return processed entries quickly.
Since in this case you ‘ll hit the All ID’s threshold no matter how you run the LDAP query the only alternative is to try and find a way to lower the database size. The way to do that is Fractional Replication. Fractional Replication allows you to setup a replica that will only request a specific set of attributes for all entries. You obviously have to be a bit careful to always request all attributes required by the objectclasses on your LDAP entries but other than that you can keep the attribute set to the bare minimum to have your heavy LDAP queries working. For the mailing list example you only need to keep the attributes used for entry selection (edupersonaffiliation, edupersonorgunitdn), the mail attributes (mail, mailalternateaddress) and drop the rest. That way you can manage to lower your database size to a fraction of the original and be able to keep it in memory for all required queries lowering your query processing time dramatically.
Fractional Replication is a feature provided by all modern LDAP servers like OpenLDAP and Sun Java DS. Sun DS also provides Class of Service which can be used to dynamically create attribute values and lower your original database size.
We ‘ll be moving to OpenLDAP 2.4 shortly at ntua as our primary LDAP server. I wanted to post a few details on my testing so here goes:
In general the server is quite stable, fast and reliable. The only thing an administrator should really keep an eye on is making sure that the database files have the correct ownership/permissions after running database administration tools (slapadd, slapindex etc).
OpenLDAP has a really nice set of features which can prove quite useful like:
- Per user limits. The administrator can set time and size limits on a per user or user group basis apart from setting it globally. One nice application is to limit anonymous access to only a few entries per query while allowing much broader limits for authenticated users. In my case I found that you have to set both soft and hard limits for things to work correctly.
- MemerOf Overlay. Maintains a memberOf attribute in users which are members of static groups.
- Dynamic Group Overlay. Dynamically creates the member attribute for dynamic groups based on the memberURL attribute value. Since the memberURL is evaluated on every group entry access care should be taken so that only the proper users access the group entries.
- Audit Log Overlay. Maintains an audit text log file with all entry changes.
- Referrential Integrity and Attribute Uniqueness Overlays
- Constraint Overlay. A very handy overlay which allows the administrator to set various constraints on attribute values. Examples are count constraint (for instance on the userpassword attribute), size constraint (jpegPhoto attribute), regular expression constraint (mail attribute) and even value list constraint (through an LDAP URI) which can be very handy for attributes with a specific value set (like edupersonaffiliation). Make sure you use 2.4.13 though since value list constraint will crash the server in earlier versions.
- Monitor Backend in order to monitor directly through LDAP server status, parameters and operations.
- Dynamic Configuration which provides an LDAP view of all the server configuration thus allowing the administrator to dynamically change most of the server configuration. I would not recommend only using dynamic config (although it is possible) since it’s a bit cryptic and hard to administer. What i did is to enable it on top of the regular text slapd.conf in order to be able to dynamically change a few parameters, especially the database read-only feature (which can be used to perform online maintenance tasks like indexing or backing up data).
I ‘ve never given too much attention to LDAP Proxy Authorization till recently when a colleague brought it up. It’s actually a very neat way to perform operations on an LDAP server as a normal user without requiring knowledge of the user’s credentials. As a result you don’t need to setup an all powerful account but you can actually perform actions using a target user’s actual identity, simplifying access control policy on the LDAP server.
The above can come very handy in Single Sign On cases. Imagine a web site which uses Shibboleth to authenticate users (and thus has no knowledge of their password) but also has to perform actions on the user accounts stored on an LDAP server, either directly or by using a web service interface. Since the web site does not have the user’s password, it cannot perform an LDAP Bind with the user’s credentials. What it can do though is to bind as a special user which has proxy authorization privileges, SASL authorize to the user’s DN and perform the corresponding actions.
More information on Proxy Authorization and how to set it up in OpenLDAP can be found in the Administrator’s Guide ‘Using SASL’ chapter. One nice feature in OpenLDAP is that you can limit the accounts to which a user can authorize to to a specific user set defined either by an LDAP URL or a DN regular expression.
We have a 6.X setup running for quite a while (but not servicing requests yet) which will replace the current 5.X setup. In order to keep data current we have the primary 6.X master acting as a consumer of the 5.X master. We ‘ve run into the same problem many times and it seems that 5.X and 6.X act differently when adding an entry with empty RDN (something like uid=, ou=people, dc=<domain>). 5.X will allow the entry to be added while 6.X will reject it. The end result is that every time an entry like that is added (mostly by mistake) in the 5.X infrastructure, replication to the 6.X servers is halted and we need to initialize all of them from the start.
Seems strange that the servers exhibit such a different behaviour. If there’s a way to make them behave… properly i ‘d be glad to learn about it.
We ‘ve setup a DS6.X farm for the Greek School Network. It includes two datacenters (one for north and one for south Greece), each with one write master and two read-only replicas. The write masters are in a multi-master topology while the read-only replicas all get updated from both masters. The idea is to be able to lose up to one data-center without complete ldap service failure. We ‘ve been playing with the servers for about a month now and my impressions so far are:
- The web console is far superior to the previous Java console. Much faster, easily accessible by just a web browser and nicely setup.
- The installation is much harder now and requires far too many commands to get things going. Instead of just a zip with an installer you now have to install the directory server, play with cacao and dscc (directory server control center), add the console in the application server and learn a lot of <whatever>adm commands. Also we found the installation guide to be quite lacking and a bit unclear in a few issues.
- Security should be tighter with the new server since you now have to worry about a dozen ports instead of just the ldap(s) and console ports. You have a few ports for cacao two ports (http/https) for the console and the usual directory server ports
- The documentation quality is a bit degraded compared to previous products. Seems like documentation writing is starting to be outsourced to India as well :) I got the impression that the 5.X documentation was written by the engineers themselves while the 6.X was written from an outside partner.
- Until 6.1 we ‘ve faced quite a lot of stability issues (the servers run Solaris on 64-bit x86). The ldap server would crash for no reason and sometimes replication would fail and replicas would have to be reinitialized. We ‘ve installed 6.2 today and so far so good but i can say i ‘m rather disappointed with the software quality compared to the one i had gotten used to with the 5.X versions.
We haven’t tried to do any stress testing yet to see how the servers behave from a performance point. Online Replica initialization (over a LAN) on the other hand can reach speeds of at least 500 entries/sec (with 256MB import cache) which are quite impressive. I ‘ll try and update this post as things move on.
Ludo posted a nice link to some FS tuning guidelines for a DS
Got my paper included in the 1st LDAP Conference which will take place in Cologne, Germany between 6-7 September. Seems that all of the right people will be there including Kurt Zeilegna (OpenLDAP founder), Howarch Chu (SYMAS – OpenLDAP), Alex Karasulu (ApacheDS), Ludovic Poitou (Sun, OpenDS). Usually i can only find a few presentations in a conference that i feel i must attend. In this case i cannot find a single presentation not worth it’s time. Seems it will be two very busy days.
Abstracts Page: http://www.guug.de/veranstaltungen/ldapcon2007/abstracts.html
Conference Page: http://www.guug.de/veranstaltungen/ldapcon2007/index.html