There are many cases where you want to move from an old, legacy LDAP backend to OpenLDAP. Sometimes this transition requires moving to a new naming context (for instance from o=<company>,c=<country> style naming to dc-based naming) and a lot of schema changes. The problem the administrator usually faces is performing the necessary changes on the actual LDAP data, a task that usually requires writing a script to manipulate an LDIF export. That is always hard, and any error or omission is not easily fixed.
Another way to do things is to use the backends/overlays provided by OpenLDAP to transform the actual online data in such a way that a simple LDAP search on the whole tree will be enough to get an LDIF file ready for import on the new system. The necessary steps include (in the order described):

  • The meta backend to proxy requests to the legacy LDAP server.
  • The rwm overlay to map attributes and objectclasses to new names and delete those that will no longer be needed.
  • The relay backend to perform a suffix massage (if one is required). The suffix massage could be done earlier, but doing it at a later stage provides the advantage of being able to transform DN-syntax values of mapped attributes.

Here's the general shape of a configuration for the above scenario.
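
The sketch below is illustrative rather than drop-in: the legacy server URI, the old o=Example,c=US suffix, the new dc=example,dc=com suffix and the mapped names are all hypothetical, and the exact rwm-map argument order (local vs. foreign name) should be checked against slapo-rwm(5) for your OpenLDAP release.

# slapd.conf (sketch)

# 1. Proxy the legacy server under its original suffix
database        meta
suffix          "o=Example,c=US"
uri             "ldap://legacy.example.com/o=Example,c=US"

# 2. Rename attributes and objectClasses on the fly
overlay         rwm
rwm-map         objectclass  inetOrgPerson  legacyPerson
rwm-map         attribute    mail           rfc822Mailbox
# attributes that are no longer needed can also be dropped with rwm-map;
# see slapo-rwm(5) for the exact form

# 3. Publish the proxied tree under the new dc-based suffix
database        relay
suffix          "dc=example,dc=com"
relay           "o=Example,c=US"
overlay         rwm
rwm-suffixmassage "o=Example,c=US"

With something like this in place, an ldapsearch of the whole dc=example,dc=com tree returns entries already renamed and re-rooted, ready to be fed to the new server.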

In previous posts I pointed out the benefits of using web services (instead of direct LDAP access) to perform LDAP writes. In this post I'm going to take the whole discussion a few steps further and propose a mechanism (based on the basic principles of the LDAP Synchronization Protocol used by OpenLDAP for replication) to keep heterogeneous data systems synced (in our case a database with an LDAP server).

There are two quite different scenarios when writing to an LDAP server (which actually depend on the client performing the updates):

  • The first one is the case where you are using some sort of administration/modification interface (usually a web page) and perform the modifications through it. In this case the user is performing changes online and will usually wait until he gets a positive response from the underlying LDAP server. Even if you use a web services mechanism to perform the LDAP changes the end result will be the same: the web service function will return with a positive (or negative) result along with the corresponding error message if the operation failed at some point. The user is then able to take any necessary actions to make things work (like change a value to meet some constraint, or use an attribute value that is not already taken if there is a uniqueness restriction on that attribute) or just abandon the operation and move on. The changes performed are targeted, small, contained and serialized. Most of the error handling and decisions are performed by the user himself. The web services interface needs only to perform the actions and return the result, without worrying about keeping things in sync.
  • The second scenario is one where you have an external data source of some kind (usually a database) which needs to be kept synchronized with LDAP. Say your main employee source is your payroll system and you need to keep your LDAP synced with it (add people when they join the firm, modify them when needed and delete them when they leave). In this case you will usually create a web services operation interface and perform every operation when needed (call a web service add() operation when you create a user in the payroll system and so on).

The second scenario poses a few serious constraints and problems:

  1. Before enabling the above mechanism the external data source and LDAP need to be in sync. If your LDAP is empty and your payroll system already contains users, you need a way to actually populate your directory (usually with a manual, offline mechanism).
  2. You need a queuing mechanism so that changes are buffered and sent
    1. at a rate that the LDAP server can handle
    2. even if the LDAP server is down for a while
  3. Changes should be sent serialized, in the order they were committed to the data source, and you cannot afford to lose a modification (that might break future modifications that depend on a previous one).
  4. An error in an operation should not bring the whole synchronization process to a halt, and it should be recoverable in most cases.
  5. If somehow you lose a modification you need a mechanism to resync the two parties.

The above complexities make it obvious that just keeping a changelog and performing the operations committed through the web services mechanism is not actually enough. What is needed is a full, robust synchronization protocol. The OpenLDAP folks have gone to great pains to design and implement such a mechanism, called the LDAP Synchronization Protocol (described in the OpenLDAP Admin Guide and in RFC 4533). It makes sense to reuse that work and keep its basic principles when designing a mechanism for our case.

First of all we need to define a key used to synchronize the entries. That key is application specific. We could have a unique employee id in our payroll system and decide to keep the same value in the employeeNumber attribute in our LDAP entries. Obviously, we could have multiple data sources for each entry (for different attributes), in which case we need to define one key for each data source.

Along with that we have to define a new attribute that will hold the data sources for each entry (if we have just one, this step can be skipped). The attribute will be multivalued and have one value for every data source (for instance myCompanyEntryDataSource=payroll). We will also need to define a 'friendly name' for each data source.

If we keep multiple entry types (for instance students, faculty, employees) that each need special handling, we will need to define an attribute to hold the entry type as well (sometimes eduPersonAffiliation will suffice). If our entries are all of the same type we can skip that one too.

Thankfully, we don't need to change much in our data source. The only thing required is to keep a Global Data Version (GDV). That is just a global, unique number that is incremented for every write (add/delete/modify) operation on the database. The GDV could be maintained as a separate resource and be atomically and transactionally maintained for every operation committed (it must be incremented by exactly one on every operation, the increment must be atomic, and a write operation on our data must always end with an increment of the GDV value).

We now have to define a few more things:

  • Keep the global data version in our LDAP tree base in the form of a multivalued attribute. We keep one value for each data source and distinguish between them by using value tags (based on the data source friendly name we have defined). The GDV could be maintained on our data tree base entry for easy access (see the sketch after this list).
  • Create a PresentEntry() web service operation. That operation will just send all data (that a data source holds and shares with the LDAP directory) for an entry and update (or create if necessary) the entry in the LDAP directory accordingly. If the entry is already synced no modification will be performed.
  • Add an update timestamp attribute to the schema that will be added to all entries and will be updated by the PresentEntry() operation.
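
As an illustration only: the attribute names below (myCompanyGDV, myCompanyEntryDataSource, myCompanyUpdateTimestamp) are hypothetical, and tagging options such as ;x-source-payroll have to be allowed through slapd's attributeoptions directive. The base entry and a synced entry could then look roughly like this:

dn: dc=example,dc=com
myCompanyGDV;x-source-payroll: 18423
myCompanyGDV;x-source-studentdb: 9021

dn: uid=jdoe,ou=people,dc=example,dc=com
employeeNumber: 1234
myCompanyEntryDataSource: payroll
myCompanyUpdateTimestamp: 20090301120000Z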

We are now one step away from actually defining our synchronization mechanism. The last step is to manually walk our LDAP tree and set the data source attribute accordingly for all the entries that depend on it (if a data source attribute is actually maintained).

Synchronization Process:

  • Call a web service called GetDataVersion(). That will return the GDV currently stored in LDAP for our data source (or null if we have never synchronized). If that GDV is stale compared to the one maintained in the data source, then we can actually start the synchronization. Otherwise there's no reason to perform any further action.
  • Call a web service called StartTotalUpdate(DataVersion, UpdateTimestamp). DataVersion is the GDV of the source, while UpdateTimestamp is just the timestamp of our call to StartTotalUpdate(). We use this operation to signal the start of the synchronization process as well as to achieve isolation: we don't allow more than one sync process to run simultaneously. The operation will return:
    • DataCurrent if no action is needed
    • UpdateStarted
    • UpdateInProgress if another update is already in progress
    • NoDataVersion if LDAP does not have a GDV (in the case of initialization)
    • Error in case of an error (like the LDAP server is not available)

    It will also return a TotalUpdateId unique number which will need to be used by all subsequent web service operations (as an extra argument).

  • Call the PresentEntry() operation for all the entries maintained on the data source (which should be synchronized with the LDAP directory). The calls will be serialized, and the operation on the LDAP side could implement a throttling mechanism in order to perform operations at a rate that the LDAP server can handle. WS-ReliableMessaging could also be used to make sure that calls to the operation are ordered and happen only once. Every entry that is updated will also get an update timestamp attribute with the value of the UpdateTimestamp passed in StartTotalUpdate(). If the operation hits an unrecoverable error it will return the error to the caller and the update process will stop.
  • Call a web service called EndTotalUpdate(TotalUpdateId, Boolean success, error message). This operation signals the end of the update process. If the update failed at some point we can signal it and also log a corresponding error message. If it was successful we update the GDV attribute in the base entry and run a post-update process that scans all entries (with a data source attribute corresponding to this source) that have an update timestamp attribute with a value other than the UpdateTimestamp. These entries should just be deleted.

When we initiate a total update and assign a TotalUpdateId we can also assign an idle timeout and an idle timer. Every call to the PresentEntry() operation will reset the timer. If the timer exceeds the idle timeout we kill the update process and return a corresponding error to any subsequent PresentEntry() or EndTotalUpdate() call.
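
To make the flow concrete, here is a rough PHP sketch of the caller (data source) side of a total update. The WSDL URL, the argument and return value shapes, and the payroll_current_gdv()/payroll_all_entries() helpers are hypothetical placeholders, and error handling is kept to a bare minimum; only the operation names come from the description above.

<?php
require_once('nusoap.php');

$client = new nusoap_client('https://ws.example.com/sync.php?wsdl', true);

$sourceGdv = payroll_current_gdv();   // hypothetical helper: GDV kept by the data source
$ldapGdv   = $client->call('GetDataVersion', array('source' => 'payroll'));

if ($ldapGdv !== null && $ldapGdv >= $sourceGdv) {
    exit;   // LDAP already carries the current GDV, nothing to do
}

$ts    = gmdate('YmdHis') . 'Z';
$start = $client->call('StartTotalUpdate',
                       array('DataVersion' => $sourceGdv, 'UpdateTimestamp' => $ts));
if ($start['result'] != 'UpdateStarted' && $start['result'] != 'NoDataVersion') {
    exit;   // DataCurrent, UpdateInProgress or Error
}

$success = true;
$errmsg  = '';
foreach (payroll_all_entries() as $entry) {   // hypothetical helper: serialized entry dump
    $res = $client->call('PresentEntry',
                         array('TotalUpdateId' => $start['TotalUpdateId'],
                               'entry'         => $entry));
    if (!empty($res['unrecoverable'])) {
        $success = false;
        $errmsg  = $res['message'];
        break;
    }
}

// On success the provider updates the GDV in the base entry and deletes every entry
// of this data source whose update timestamp attribute differs from $ts.
$client->call('EndTotalUpdate', array('TotalUpdateId' => $start['TotalUpdateId'],
                                      'success'       => $success,
                                      'error'         => $errmsg));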

The above needs minimal changes in the data source and only requires coding the PresentEntry() caller and provider. On the other hand, even a single update requires walking through all the maintained data, which does not scale beyond a few hundred entries. If we have already implemented operation-specific web service operations (add, change, delete etc.) and we are willing to maintain a changelog on our data source, we can define a process that only sends updates to the LDAP server and keep the above process for initial syncing and as a fallback if we lose sync.

In this case we need to create a changelog in the data source that will be updated on every write operation with the following data:

  • GDV: The Global Data Version that corresponds to this operation
  • Entry unique id
  • Entry type
  • Change Type
  • Changed Data

In this case the update process can be defined as follows (quite similar to the total update process actually):

  • Call GetDataVersion().
  • Call a StartUpdate() service similar to StartTotalUpdate() (with the same arguments, return values and functionality).
  • Call the operation-specific web services. The services perform the operation, update the update timestamp attribute and return success or error. An optimization would be to only create a delete() operation for entry removal and keep the PresentEntry() operation for the rest of your needs.
  • Call an EndUpdate() service similar to EndTotalUpdate(). The only difference is that in this case there's no reason to walk through the tree and search for entries that need deletion.

The above process can be used if the LDAP tree has been initialized and the changelog contains enough data to perform synchronization. Otherwise a total update is performed first and we then resume normal changelog-based updates.
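
A similarly rough sketch of the changelog-driven caller, under the same hypothetical helpers and argument shapes as the total-update sketch above, using the delete()/PresentEntry() optimization mentioned earlier:

<?php
require_once('nusoap.php');

$client    = new nusoap_client('https://ws.example.com/sync.php?wsdl', true);
$sourceGdv = payroll_current_gdv();
$ldapGdv   = $client->call('GetDataVersion', array('source' => 'payroll'));

// Not initialized, or the changelog no longer covers LDAP's GDV: do a total update.
if ($ldapGdv === null || !changelog_covers($ldapGdv)) {   // hypothetical check
    run_total_update();                                   // the process sketched earlier
    exit;
}

$ts    = gmdate('YmdHis') . 'Z';
$start = $client->call('StartUpdate',
                       array('DataVersion' => $sourceGdv, 'UpdateTimestamp' => $ts));
if ($start['result'] != 'UpdateStarted') {
    exit;
}

$success = true;
foreach (changelog_since($ldapGdv) as $change) {          // rows ordered by GDV
    if ($change['type'] == 'delete') {
        $res = $client->call('delete',
                             array('UpdateId' => $start['UpdateId'],
                                   'id'       => $change['entry_id']));
    } else {   // add/modify: just present the entry's current data
        $res = $client->call('PresentEntry',
                             array('UpdateId' => $start['UpdateId'],
                                   'entry'    => $change['data']));
    }
    if (!empty($res['unrecoverable'])) {
        $success = false;
        break;
    }
}

$client->call('EndUpdate', array('UpdateId' => $start['UpdateId'],
                                 'success'  => $success,
                                 'error'    => ''));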

I have previously described the advantages of using LDAP Proxy Authorization when performing changes on an LDAP server on behalf of someone else. Here's the idea in PHP: first check the rootDSE to see if proxy authorization is supported and then perform an LDAP search with proxied credentials. I'll be releasing a new version of LUMS shortly containing this and a few other enhancements.

Please read the SASL chapter of the OpenLDAP Administrator's Guide first. Pay attention to the authzTo/authzFrom attributes and make sure you set 'authz-policy to/from' in order for things to work. Also, the admin guide has a small typo in the authzTo/authzFrom dn regex definition. The correct form is:

authzTo: {0}dn.regex:^uid=[^,]*,ou=people,<base>$

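A minimal sketch of the approach (the server URI, manager credentials, target authorization identity and search parameters are placeholders, and the controls array format assumes a reasonably recent PHP ldap extension):

<?php
// OID of the Proxied Authorization control (RFC 4370)
$proxy_authz_oid = '2.16.840.1.113730.3.4.18';

$ds = ldap_connect('ldap://ldap.example.com');
ldap_set_option($ds, LDAP_OPT_PROTOCOL_VERSION, 3);
ldap_bind($ds, 'cn=manager,dc=example,dc=com', 'secret');

// 1. Check the rootDSE to see if the proxy authorization control is supported
$res   = ldap_read($ds, '', '(objectClass=*)', array('supportedControl'));
$entry = ldap_get_entries($ds, $res);
$supported = in_array($proxy_authz_oid, $entry[0]['supportedcontrol'], true);

if ($supported) {
    // 2. Attach the control; its value is the authorization identity (no BER wrapping)
    $ctrl = array(array('oid'        => $proxy_authz_oid,
                        'iscritical' => true,
                        'value'      => 'dn:uid=someuser,ou=people,dc=example,dc=com'));
    ldap_set_option($ds, LDAP_OPT_SERVER_CONTROLS, $ctrl);

    // 3. The search now runs with the proxied identity's privileges and ACLs
    $res = ldap_search($ds, 'ou=people,dc=example,dc=com', '(uid=someuser)');
    var_dump(ldap_get_entries($ds, $res));
}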

Since it was always hard to find documentation, I've gathered the available nuSOAP API documentation here:

SOAP Arrays

I had a problem using arrays with certain WS clients. It seems that nuSOAP uses SOAP arrays, which are deprecated in favor of XML sequences according to WS-I. Here's a small piece of code to create:

  • An array of strings
  • A soap structure
  • An array of the above structure
$server->wsdl->addComplexType(
        'ArrayOfDN',
        'complexType',
        'array',
        $soap_compositor,
        '',
        array(
                'dn' => array('name' => 'dn', 'type' => 'xsd:string',
                        'minOccurs' => '0', 'maxOccurs' => 'unbounded')
        )
);

$server->wsdl->addComplexType(
        'RoleDesc',
        'complexType',
        'struct',
        $soap_compositor,
        '',
        array(
                'cn' => array('name' => 'cn', 'type' => 'xsd:string'),
                'description' => array('name' => 'description', 'type' => 'xsd:string')
        )
);

$server->wsdl->addComplexType(
        'ArrayOfRoleDesc',
        'complexType',
        'array',
        $soap_compositor,
        '',
        array(
                'desc' => array('name' => 'desc', 'type' => 'tns:RoleDesc',
                        'minOccurs' => '0', 'maxOccurs' => 'unbounded')
        )
);

$soap_compositor can have a value of 'all', 'sequence' or 'choice'. In our case the value is 'sequence'.

UTF-8 encoded values (foreign charsets)

Another thing to keep in mind is UTF-8 encoded values (usually in foreign charsets, like Greek in my case). The best approach is to use the iconv() PHP function to transform the native charset (e.g. ISO-8859-7) to UTF-8 for transport and to disable nuSOAP's automatic UTF-8 decoding. You should also make sure to set the HTTP transport charset to UTF-8 instead of the default ISO-8859-1. The corresponding settings are listed below, followed by a short client-side sketch:

  • var $decode_utf8 = false; in the nusoap_client and nusoap_server classes
  • var $soap_defencoding = 'UTF-8'; in the nusoap_base class
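
On the client side, and assuming $native_cn holds a value in the native charset while the WSDL URL and operation name are placeholders, the idea is roughly:

<?php
require_once('nusoap.php');

$client = new nusoap_client('https://ws.example.com/reader.php?wsdl', true);
$client->soap_defencoding = 'UTF-8';   // send the SOAP payload as UTF-8
$client->decode_utf8      = false;     // don't let nuSOAP utf8_decode() returned values

// Convert native charset values to UTF-8 before putting them on the wire
$cn = iconv('ISO-8859-7', 'UTF-8', $native_cn);
$result = $client->call('SearchUser', array('cn' => $cn));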

Document/literal instead of RPC/Encoded

Modern web service clients usually expect web services to be written in document/literal mode. Thus you should call the configureWSDL() and register() functions this way:

$server->configureWSDL('ReaderInterface', "$url", false, 'document');

$server->register('<function name>',
array('argument' => 'xsd:string'),
array('retval' => 'xsd:string'),
false, # namespace
$url . '#<function name>',
'document',
'literal',
'<function documentation>');

In the current nuSOAP version there is no way to omit the namespace attribute in the bindings part of the WSDL. That means that you will probably hit warnings such as:

R2716 WSI-BasicProfile ver. 1.0, namespace attribute not allowed in doc/lit for soapbind:body: “SearchUser”
line 370 of file:/C:/temp/NetBeansProjects/testWSDL3/xml-resources/web-service-references/reader/wsdl/<hostname>/ws/reader.php.wsdl

when attempting to parse the WSDL in your client. For now you have to save the WSDL and remove the namespace attributes by hand. Hopefully the nuSOAP people will have it fixed soon.

SOAP faults

One common problem you may hit is getting an error when dissecting a SOAP fault. The usual error is something like:

“No NamespaceURI, SOAP requires faultcode content to be a QName”

That means that when creating a SOAP fault with new soap_fault($faultcode, $faultactor, $faultstring, $faultdetail) the $faultcode should be something like SOAP-ENV:Client or SOAP-ENV:Server instead of just 'Client' or 'Server', just as it is described in the class prototype in nusoap.php.
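
For instance, a registered server-side function (SearchUser here is just a placeholder) returning a client error would do something like:

function SearchUser($uid) {
    if ($uid == '') {
        // qualified faultcode, as required
        return new soap_fault('SOAP-ENV:Client', '', 'uid argument must not be empty');
    }
    // ... normal processing ...
}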

More of a quick note to self: I had someone in a project meeting (actually a Java evangelist) try to make a point against using PHP for web services and using Java instead. Most of the arguments were targeted towards PHP not being scalable.

As far as PHP web services frameworks are concerned there's always nuSOAP, and I've already made a note about WSO2. About PHP performance and scalability, here are a few arguments:

  • You'll find it easier to develop in PHP than in Java and there are a lot more PHP developers out there than Java folks.
  • PHP with a properly tuned threaded Apache, running under an accelerator (to minimize page compile time) and using some kind of memory caching mechanism (like memcached or mcache to minimize database queries and session handling overhead) can be quite fast and vertically/horizontally scalable.
  • Lastly, web services are not a perfect example to point out scalability issues. They are usually called a small fraction of the times that a web page is called and they are self-contained. They don't need to keep session information and variables floating around; object memory is allocated on web service execution and destroyed when execution ends, meaning that there's no potential garbage collection overhead.

A more thorough examination of the issue can be found here and here.

I ran across an OpenLDAP log file analysis tool. It's written in Perl and will walk through the log file created by OpenLDAP (logging is performed through syslog) and report on various things like (taken from the website):

operations (e.g., Connect, Bind, Unbind) performed per host, unindexed searches, attributes requested, search filters used, total operations per server, and operation breakdowns by day, hour and month.

There's also a statistics collector (it collects data from cn=Monitor) that can be used as input for a graphing application in order to create graphs of server statistics.

Sun, along with its Java Directory Server offering, also provides a set of very useful utilities (as well as a set of LDAP libraries) called the Directory Server Resource Kit (DSRK). It is available in binary form for Solaris and Red Hat Linux so installation is as easy as running unzip.

The utilities it includes are quite helpful, especially if you're running a Java Directory Server (log analyzer, search operations replay tool and so on). The documentation is quite helpful and provides all the necessary usage information. Maybe I'll write a post soon about using them with your Sun DS installation.

One set of utilities included is specifically targeted at benchmarking your LDAP server operations. It is not as extensive as SLAMD but on the other hand it's very easy to understand and use, so it was a good opportunity to run it against our OpenLDAP installation to measure its relative performance, especially with different configuration options.

Just as a side note, I am not trying to measure OpenLDAP performance or compare it with other products; I was just trying to find the best configuration for our setup and the software/hardware limits. The client was not even on the same network as the server, although I made sure they were on a gigabit path.

License notice: According to the license accompanying the DSRK ("3.5 You may not publish or provide the results of any benchmark or comparison tests run on the Licensed Software to any third party without Sun's prior written consent.") I've removed the actual numbers from the post in order to stay out of trouble. If I find that I can include them, this notice will be removed and the numbers added.

All tools are multithreaded and provide a command line option to set the number of threads used. In my case I found that I got the best performance when I set the number of threads equal to the number of real CPU cores. Also, most of the tools allow random searches on the directory by performing substitutions on the provided search filter with random lines from an input file. That way you can simulate real life usage of your directory service.

The first tool I used was the authentication rate measurement tool. I tried two setups: in the first the tool would perform a new connection for each bind, while in the second it would keep a connection open and try rebinding. I only used one specific bind DN since I only wanted to measure the bind operation speed and not LDAP entry searching. In the first case I got 'a few' binds/sec (with a single threaded tool run) while in the second case a multithreaded run performed a few thousand binds/sec. Using tcpdump to check out the network overhead, it became clear that every bind (along with connection establishment and teardown) needed 0.028 seconds with all the network packets transmitted. As a result each thread had a network upper limit of 35 binds/sec, which made it clear that a connect-and-bind-every-time setup could not achieve extraordinary performance.

The second tool was the search rate measurement tool. I always made sure to connect and bind once and use that connection for subsequent searches in order to measure only the LDAP search speed. What I wanted to see was the impact of different BDB and entry cache configurations. Before each run I restarted the LDAP server and performed an ldapsearch of the whole tree in order to prime the caches. All the iterations were random searches on the uid attribute requesting all entry attributes. My database holds about 24,000 entries, has a minimum BDB cache of 4MB and a total size of 109MB. The results were interesting:

BDB cache | Entry cache | Searches/sec | BDB cache hit ratio (id2entry/dn2id/uid/total) | OpenLDAP resident memory
4MB       | zero        | A            | 81% / 87% / 94% / 91%                          | 62MB
8MB       | zero        | B            | 82% / 94% / 97% / 93%                          | 79MB
60MB      | zero        | C            | 97% / 99% / 99% / 99%                          | 141MB
110MB     | zero        | D            | 99% (all)                                      | 142MB
110MB     | 8000        | E            | -                                              | 232MB
119MB     | 24000       | F            | -                                              | 331MB
8MB       | 8000        | G            | -                                              | 211MB

BDB cache hit ratios were taken using db_stat -m. I started by keeping the entry cache at zero entries and increasing the BDB cache in order to check out the difference in performance. The initial run with 4MB produced a few thousand searches/sec, while a BDB cache of 110MB produced a moderate increase while requiring more than double the memory. You may wonder about the negligible difference in memory usage between 60 and 110MB of BDB cache. The reason is that I only performed searches on the uid attribute so, although the total database size is 109MB, the sum of id2entry.bdb, dn2id.bdb and uid.bdb was only 82MB. Since BDB will automatically increase the BDB cache setting by 25%, 60MB was actually 75MB, so increasing the setting to 110MB didn't have any real effect. Having a 99% BDB cache hit ratio though proved helpful, since the search rate increased quite significantly.

The rest of the runs tried to measure the effect of the entry cache. I kept the BDB cache at 110MB in order to keep the database files in memory and only measure the entry cache speedup. The first run was made with an entry cache of 8000 entries (1/3 of the total entries) and the other with an entry cache of 24,000 entries (100% of the entries). The entry cache (at 24,000 entries) provided a substantial increase over both the first run and the 110MB BDB cache run. The memory usage increase was significant though, since resident size increased to around 330MB (compared to 60MB with a 4MB BDB cache, almost 6x more).

The last run tried to use both caches while minimizing memory usage. The server was set up to use 8MB of BDB cache and an 8000-entry entry cache. It performed better than the 110MB BDB cache run but worse than the entry cache runs. That's an important increase from the initial run, and it reaches a large percentage of the results of the combined full BDB/entry cache run with 211MB memory usage (3.5x the 60MB run and 65% of the 330MB one).

It's obvious that adding more caching can change the search rate substantially, although that incurs a large cost in memory usage. Also, adding more cache than your working set will probably have no effect and will just use up valuable memory. Since available memory is usually not unlimited (and the machine used as an LDAP server might also perform other duties as well) I would suggest a strategy like the last run: set the BDB cache to 2x or 3x the minimum required and the entry cache to a reasonable value that holds your working set without consuming all your system memory. If, on the other hand, your database is small or you have enough memory, increasing the caches can only help performance (but always try to cache the working set and not necessarily the whole database).

The last tool used was the entry modification rate measurement tool. I kept the full BDB/entry cache settings in order to keep my database in memory and only measure the modification overhead (and not the entry searching cost). The tool was configured to change the userPassword attribute of a random entry to a random 10 character string. Since the entry selection was totally random, database and disk access would also be random and performance would be much lower than when adding entries. In my setup I had the accesslog database, the audit log and logging enabled, so an entry modification actually meant multiple disk writes to all the corresponding databases and log files. I made two test runs: one with a single thread and one with 8 threads (always connecting and binding once to minimize network and connection overhead). The results were a bit poor:

  • Single threaded: A mods/sec
  • 8 threads: B (B more than A) mods/sec

Measuring I/O usage with systat I found that the disk subsystem was quite saturated, performing around 250-300 transactions/sec, so the above numbers were probably the actual disk limit of the server. I wanted to see what the overhead of the accesslog and audit log was, so I temporarily disabled them and ran the test again. This time the rate reached more than double the number of the plain 8 thread run. The above makes it obvious that adding modules like the audit log or accesslog can be helpful for debugging or for increasing replication performance, but they can also hurt write throughput a lot. As a result, the administrator should always try to keep modules that require disk writes to a minimum.

Most of the tips listed here are included in my previous posts but I thought it would be a good idea to also keep them all in one place (and update it every time I have a new one :)). So here goes:

  • Make sure you index the values that need indexing and only keep the indexes that count. Also take into account that if the candidate entries for a filter are a significant percentage of the total number of entries, you are probably better off with a sequential scan of the whole database compared to random access of a large percentage of the entries. It is very important to index attributes used for attribute uniqueness and for replication (entryCSN and entryUUID for back-hdb and entryCSN, reqEnd, reqResult, reqStart for the accesslog database).
  • The most important cache is the Berkeley DB cache, which will allow you to keep database content in memory and reduce disk access to the minimum. As I showed in previous posts, you can get a significant cache hit ratio by just keeping the internal B-tree pages of all the *.bdb files of your database in memory. To calculate the total size needed you should run db_stat -d for all the bdb files and take the sum of (internal pages * page size); a sketch of this calculation follows after this list. This sum should then be used as the value for the set_cachesize directive in DB_CONFIG. db_stat -m can be used to gather statistics on the BDB cache usage.
  • Since we are on the subject, in case the database page size is not suitable for the kind of data the database is carrying, db_stat -d will show a large number of overflow pages for the problematic database files. You can use the dbpagesize <dbfile> <size> back-hdb directive to set the page size accordingly (but that change requires reinitializing your database).
  • If you can keep the database transaction log files on another disk, that will definitely increase performance.
  • Another idea is to enable buffered I/O for the OpenLDAP log file in syslog.conf (by prefixing the file name with a '-').
  • The entry cache should hold your entry working set. Your database might hold 100K users but you might notice that only 10K use your services each day. In this case it would be a good idea (especially if you are short on memory) to only keep the working set in the entry cache. The IDL cache (idlcachesize) should be 3 times the size of the entry cache.
  • The checkpoint directive value should be set to a large value like 1MB. A nice idea would be to also set it to perform checkpoints every 5 minutes (e.g. checkpoint 1024 5). That way, if you have bursts of entry changes (or you are initializing your database) you will checkpoint every 1MB, or else (during normal operation) every 5 minutes.
  • Since the accesslog database is not mission critical, if you are using it, it would be a good idea to set the dbnosync directive on it.
  • Set 4 threads per CPU core.
  • UPDATE: You should use db_stat -c to take a look at the lock subsystem. If you see the maximum number of locks/lockers/lock objects at any one time reaching the configured maximum, you should increase the maximum values. The settings (in DB_CONFIG) are:
set_lk_max_locks <num>
set_lk_max_lockers <num>
set_lk_max_objects <num>
  • UPDATE2: On Linux systems with a large user base, slapadd will run much better if you use shared memory instead of the default memory-mapped files (the shm_key directive in the back-hdb configuration).
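
Here's a rough sketch of the BDB cache size calculation mentioned above (the database directory is a placeholder and the db_stat output wording can differ between BDB versions):

#!/bin/sh
# Sum (internal pages * page size) over all database files to estimate the
# minimum useful BDB cache, then feed the result to set_cachesize in DB_CONFIG.
cd /var/db/openldap-data || exit 1
total=0
for f in *.bdb; do
    pages=$(db_stat -d "$f" | awk '/Number of tree internal pages/ { print $1 }')
    psize=$(db_stat -d "$f" | awk '/Underlying database page size/ { print $1 }')
    [ -z "$pages" ] && continue
    bytes=$((pages * psize))
    echo "$f: $bytes bytes"
    total=$((total + bytes))
done
echo "Suggested minimum BDB cache: $total bytes"
echo "e.g. in DB_CONFIG: set_cachesize 0 $total 1"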

In a previous post I described how a single entry change could lead to a replication halt and a need to reinitialize things. In OpenLDAP, since syncrepl uses a present phase, replication conflicts can be easily fixed: just make the appropriate changes to the problematic entries and replication will resume and replicas will get synchronized.

The problem arises with delta-syncrepl. Since it depends on a changelog database holding entry changes, when it reaches the problematic operation it will keep trying to execute it on the replicas and replication will halt. As a result, it's important to be able to clear the changelog of the corresponding entries so that replication can continue.

The easiest way to do that is to simply wipe out everything from the changelog database. Syncrepl will fall back to full replication, replicas will get synchronized and the changelog can again be used to drive delta-syncrepl after synchronization. Since usually refreshAndPersist replication will be used and replicas will not be far behind, the accesslog will probably be quite small and deleting it won't make any large difference in synchronization time.

The hard way is to try and find the faulting entries and delete them. Now is a good time to read the slapo-accesslog man page and also look at some example entries in the accesslog database. Here are three entries, corresponding to an entry add, modify and delete:

# 20090125102756.000001Z, accesslog
dn: reqStart=20090125102756.000001Z,cn=accesslog
objectClass: auditAdd
reqStart: 20090125102756.000001Z
reqEnd: 20090125102757.000000Z
reqType: add
reqSession: 17
reqAuthzID: cn=manager,dc=ntua,dc=gr
reqDN: uid=test-user123,dc=ntua,dc=gr
reqResult: 0
reqMod: objectClass:+ top
reqMod: objectClass:+ person
reqMod: objectClass:+ organizationalPerson
reqMod: objectClass:+ inetOrgPerson
reqMod: uid:+ test-user123
reqMod: cn:+ test user
reqMod: sn:+ user
reqMod: givenName:+ test
reqMod: structuralObjectClass:+ inetOrgPerson
reqMod: entryUUID:+ 93a43296-7f16-102d-9d8f-c7eb2283aa50
reqMod: creatorsName:+ cn=manager,dc=ntua,dc=gr
reqMod: createTimestamp:+ 20090125102756Z
reqMod: entryCSN:+ 20090125102756.999906Z#000000#001#000000
reqMod: modifiersName:+ cn=manager,dc=ntua,dc=gr
reqMod: modifyTimestamp:+ 20090125102756Z

# 20090125102958.000001Z, accesslog
dn: reqStart=20090125102958.000001Z,cn=accesslog
objectClass: auditModify
reqStart: 20090125102958.000001Z
reqEnd: 20090125102958.000002Z
reqType: modify
reqSession: 21
reqAuthzID: cn=manager,dc=ntua,dc=gr
reqDN: uid=test-user123,dc=ntua,dc=gr
reqResult: 0
reqMod: cn:= test user
reqMod: entryCSN:= 20090125102958.789043Z#000000#001#000000
reqMod: modifiersName:= cn=manager,dc=ntua,dc=gr
reqMod: modifyTimestamp:= 20090125102958Z

# 20090125103004.000001Z, accesslog
dn: reqStart=20090125103004.000001Z,cn=accesslog
objectClass: auditDelete
reqStart: 20090125103004.000001Z
reqEnd: 20090125103004.000002Z
reqType: delete
reqSession: 22
reqAuthzID: cn=manager,dc=ntua,dc=gr
reqDN: uid=test-user123,dc=ntua,dc=gr
reqResult: 0

As you can see, the important parts are the RDN, which is based on the request time (reqStart), the request type (reqType with values add, modify and delete, although you can also just use the corresponding values of the objectClass attribute) and the request target DN (reqDN, holding the DN of the entry). Consequently, in order to be able to search the accesslog database efficiently, the administrator should add an equality index for the reqDN attribute and perform queries of the form (&(reqDN=<entry DN>)(|(objectClass=auditAdd)(objectClass=auditModify)(objectClass=auditDelete))).
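
For example, to list all accesslog records referring to the test entry shown above (bind DN and suffixes taken from the examples; the index line goes in the accesslog database section):

# slapd.conf, accesslog database section:
#   index reqDN eq
ldapsearch -x -D "cn=manager,dc=ntua,dc=gr" -W -b cn=accesslog \
    "(&(reqDN=uid=test-user123,dc=ntua,dc=gr)(|(objectClass=auditAdd)(objectClass=auditModify)(objectClass=auditDelete)))" \
    reqStart reqType reqDN reqResult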

Depending on the conflicting entry's final form, the administrator should delete all accesslog entries that would create a conflict, leaving only the resolving request (or none at all, if there isn't one).

I was curious to see how long online initialization would take with our server setup. First of all I wanted to test how much time slapadd takes with our user database. Keep in mind that we keep quite a lot of attributes in our user entries (the inetOrg* set of objectclasses, eduPerson and various service specific attributes), so we've ended up configuring 20 indexes on our database. That probably means that slapadd will have a lot of work to do when adding entries.
Plain slapadd took around 10 minutes to complete for 23,000 entries, while slapadd with the -q option takes only 1.5 minutes! Talk about a speedup. So it seems the -q option is a must.
The main difference between slapadd and online initialization though is that the latter will go through all the modules configured on the LDAP server. In our case this configuration includes:

  • Changelog overlay (only added to the server configuration for the last init run).
  • Audit log (thus all operations will get logged on the audit log file).
  • constraint and attribute uniqueness overlays.
  • Schema checking.

Consequently, I would assume that online initialization would take more time than the original slapadd (since it actually has to update two databases, the actual user database and the accesslog db as well as log the operations to the audit log).

(Note: please don't take the actual init times into account, just the difference between them. Right now both replicas are hosted on the same machine as jails, so I'm planning to update the numbers when I move one to another machine.)

The first try at online init failed and seemed to get stuck somewhere. Adding Sync to the value of olcLogLevel under cn=config can be very helpful for troubleshooting. It turned out that there's an obvious chicken-and-egg problem with the value list attribute constraint of the constraint overlay. When setting up a value list attribute constraint, the list is taken from the result of an LDAP URL. Since during initialization there are no entries in the directory, that LDAP URL will return an empty list, and as soon as you try to add an entry with an attribute corresponding to a value list constraint the operation will fail and initialization will halt. The solution was to disable the value list constraint until initialization finished.

The next try was a success but took much more time than anticipated. One tuning action that had nice results was to increase the checkpoint interval. At first I had configured checkpoints to happen every 60KB on the user database and every 120KB on the changelog database. Since I also had configured checkpoints to happen every 5 minutes, it was a good idea to increase the size to 1MB. This dropped online initialization to 22 minutes (no accesslog overlay), more than double the time plain slapadd needs on the same data.

Just to get an idea of the performance increase from checkpoint tuning, here are the results on my test platform (the consumer has a minimal configuration with just the back-hdb configuration and no changelog, auditlog, uniqueness or constraint overlays):

  • Checkpoint Size: 120KB, Init Time: 14 minutes
  • Checkpoint Size: 1MB, Init Time: 7 minutes (50% decrease!)

After enabling the accesslog overlay, the increase in total operation processing time was quite noticeable, since this time online init took around 33 minutes (compared to 22 minutes).

UPDATE:

Since syncrepl will fall back to a full present/delete phase if the changelog does not contain enough information or is outdated, it becomes clear that the accesslog database and its contents are not mission critical. Consequently, it is advisable to enable the dbnosync configuration directive for the accesslog database. Enabling the setting dropped online init time to around 25 minutes, almost removing the additional accesslog update cost.

Lastly, a couple more notes on tuning the BDB database:

  • db_stat -m will give you statistics on the BDB cache usage and hit ratio. It can be a helpful tool to check out if your BDB cachesize setting (set_cachesize in DB_CONFIG) is working properly or is set too low.
  • This wiki page on Sun (although targeted on Sun DS) can provide useful hints on determining if your database page size is optimal for your configuration. In OpenLDAP back-hdb the corresponding configuration directive is dbpagesize.

