If you are setting up a large-scale LDAP infrastructure, I’d suggest you spread the load across multiple servers instead of keeping everything in one large monolithic beast. That way you can easily add more power (by adding another server) if the load increases, and you only lose a fraction of your total capacity if one server goes down due to a software or hardware malfunction. Moreover, it’s usually cheaper to buy several standard servers than one huge multiprocessor, multi-gigabyte memory mountain.

You should split your LDAP servers into two main categories: Write Masters and Read-Only Replicas. Write masters should be the only servers handling write requests, and they also take the complex reads, while read-only replicas serve read requests from other application servers (mail, RADIUS, web servers). The main difference between the two categories is the type of request they service. More specifically:

  • Write Masters should service write requests and search requests that usually return multiple entries (substring queries and the like). Clients of the write masters should be applications like:
    • User Administration Interface
    • End users performing people searches through applications like email readers (which usually support searching a directory for email addresses).
    • Applications that perform complex and time-consuming queries, for instance dynamic mailing lists that run LDAP searches to build their member list on every email sent.
  • Read-only replicas should service queries that usually return only a single entry. This mostly includes user authentication for various services (web access, RADIUS, etc.) and email routing. One very important characteristic of such services is that only a fraction of the total user population actually uses them each day. For example, you might have 1,000,000 users with email access, but you will soon find that only a fraction of them (fewer than 50%) actually use it on any given day. An even smaller percentage uses services like dial-up access, web access and others.
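To make the split concrete, here is a minimal Python sketch of how a client-side dispatcher might decide which pool a request goes to. It is only an illustration under assumptions: the host names are hypothetical, the filter check is a deliberately crude string test (not a full LDAP filter parser), and the names `pick_pool` and `pick_server` are made up for this example.

```python
from itertools import cycle

# Hypothetical host names; substitute your own servers.
WRITE_MASTERS = ["ldap-master1.example.com", "ldap-master2.example.com"]
READ_REPLICAS = [
    "ldap-replica1.example.com",
    "ldap-replica2.example.com",
    "ldap-replica3.example.com",
]

# Operations that change the directory always go to a master.
WRITE_OPS = {"add", "modify", "delete", "modrdn"}

_master_rr = cycle(WRITE_MASTERS)
_replica_rr = cycle(READ_REPLICAS)


def is_simple_lookup(search_filter: str) -> bool:
    """True for a single equality filter such as (uid=jdoe).

    Compound filters (&, |, !), substring or presence filters (*), and
    ordering or approximate matches (>=, <=, ~=) count as complex.
    """
    text = search_filter.strip()
    if len(text) < 2 or text[0] != "(" or text[-1] != ")":
        return False
    inner = text[1:-1]
    if not inner or inner[0] in "&|!" or "*" in inner or "(" in inner:
        return False
    attr, sep, value = inner.partition("=")
    if not sep or not attr or not value:
        return False
    return not any(ch in attr for ch in "<>~:")


def pick_pool(operation: str, search_filter: str = "") -> str:
    """Return "master" for writes and complex searches, else "replica"."""
    if operation in WRITE_OPS:
        return "master"
    if operation == "search" and not is_simple_lookup(search_filter):
        return "master"
    return "replica"


def pick_server(operation: str, search_filter: str = "") -> str:
    """Round-robin over the servers of the chosen pool."""
    pool = pick_pool(operation, search_filter)
    return next(_master_rr if pool == "master" else _replica_rr)
```

With this sketch, an authentication lookup like `pick_server("search", "(uid=jdoe)")` lands on a replica, while a people search such as `pick_server("search", "(cn=*smith*)")` is sent to a master. In practice you would more likely achieve the same split by pointing each application at the right set of servers in its own configuration.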

Splitting your traffic like this lets you set different hardware requirements for each category. Write Masters usually need to perform a lot of disk activity since they service write requests and complex searches (which require heavy indexing), and they also need a cache large enough to find most of the requested entries in memory. In most cases, almost every user entry in your directory will be searched at least once per day. Read-only replicas, on the other hand, only have to service a fraction of the total user population per day. Not only that, but the active user population tends to be steady and to change very slowly, so the users you serviced yesterday will most likely be the same ones you service today. As a result, your cache only needs to hold a fraction of the total user population, yet it will still achieve excellent hit ratios.
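The replica cache argument can be turned into a back-of-the-envelope calculation. The Python sketch below uses a deliberately simple model that is my assumption, not a measurement: requests are spread evenly over the active users, and the cache holds only entries of active users. Under that model, the share of the directory you need in cache is the active share multiplied by the hit ratio you are aiming for. The function names are made up for this example.

```python
def required_cache_fraction(active_fraction: float,
                            target_hit_ratio: float = 0.9) -> float:
    """Fraction of all entries the cache must hold to reach the target.

    Simplifying assumption: every active user is equally likely to be
    looked up, so caching x% of the active users yields an x% hit ratio.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    if not 0.0 <= target_hit_ratio <= 1.0:
        raise ValueError("target_hit_ratio must be between 0 and 1")
    return active_fraction * target_hit_ratio


def cached_entries(total_users: int, active_fraction: float,
                   target_hit_ratio: float = 0.9) -> int:
    """Number of entries to keep in cache under the same assumption."""
    fraction = required_cache_fraction(active_fraction, target_hit_ratio)
    return round(total_users * fraction)


if __name__ == "__main__":
    # 1,000,000 users, half of them active on a given day, 90% hit target.
    print(cached_entries(1_000_000, 0.5))
    # Same directory, but only a quarter of the users are active.
    print(cached_entries(1_000_000, 0.25))
```

Real access patterns are usually skewed toward a core of heavy users, so the actual hit ratio for a given cache size may well be better than this even-spread estimate suggests.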

So, to sum up, your master and read-only replica hardware specifications should look something like the following:

  • Masters should have fast disks (to handle high volumes of disk traffic), RAID protection (losing a master costs more than losing a read-only replica) and enough memory to keep at least 70-80% of the database in memory. For example, if you’re expecting 1,000,000 users, use at least 4-8GB of fast memory on your masters. A strong CPU (a dual-processor machine) is also important, since the masters have to perform tasks like referential integrity, attribute uniqueness and schema checking, while you can omit these on the read-only replicas.
  • Read-only replicas don’t need to be as large as the masters, nor do they need RAID protection. Memory requirements are much lower as well. Depending on your application usage, you might only need to hold 20-50% of your database in memory and still achieve a >90% cache hit ratio. So if you are expecting 1,000,000 users, 2-4GB of memory should be enough. As for CPU, make sure you have enough power that it never gets above 30-40% usage (with a good cache hit ratio, CPU usage should stay low). This headroom is necessary so that you can afford to lose one read-only replica without degrading the performance of the whole LDAP service. Lastly, there’s no need for heavy indexing as on the master servers. Since most queries will involve user authentication and mail routing, you can usually get away with just creating equality indexes on uid, mail and mailalternateaddress.
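The memory and CPU headroom figures can be sanity-checked with a little arithmetic. The Python sketch below is a rough aid under stated assumptions: load is spread evenly across replicas, CPU usage grows linearly with load, and the per-entry cache footprint of 8 KiB used in the example is a hypothetical value chosen only because it lands near the memory ranges quoted above. Measure the real footprint on your own directory server before relying on it.

```python
def cpu_after_failure(replicas: int, cpu_per_replica: float,
                      failed: int = 1) -> float:
    """Per-replica CPU usage once `failed` replicas are out of service.

    Assumes the same total load is redistributed evenly over the
    surviving replicas and that CPU usage scales linearly with load.
    """
    survivors = replicas - failed
    if survivors <= 0:
        raise ValueError("at least one replica must survive")
    return cpu_per_replica * replicas / survivors


def cache_memory_gib(total_users: int, cached_fraction: float,
                     bytes_per_entry: int) -> float:
    """Memory needed to cache a given share of the directory, in GiB."""
    return total_users * cached_fraction * bytes_per_entry / 2**30


if __name__ == "__main__":
    # Three replicas running at 40% CPU: what happens if one dies?
    print(f"{cpu_after_failure(3, 0.40):.0%}")
    # Only two replicas at 40%: the survivor takes the full load.
    print(f"{cpu_after_failure(2, 0.40):.0%}")
    # 1,000,000 users, 50% cached, hypothetical 8 KiB per cached entry.
    print(f"{cache_memory_gib(1_000_000, 0.5, 8 * 1024):.1f} GiB")
```

The CPU check shows why the 30-40% ceiling matters most when you have few replicas: with only two of them, losing one doubles the load on the survivor.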