You are currently browsing the tag archive for the ‘performance’ tag.
More of a quick not to self: I had someone in a project meeting (actually a Java Evangelist) try to make a point against using PHP for web services and using Java instead. Most of the arguments were targeted towards PHP not being scalable.
As far as Web Services PHP frameworks are conserned there’s always nuSoap and i ‘ve already made a note about WSO2. About PHP performance and scalability here’s a few arguments:
- You ‘ll find it easier to develop in PHP than in Java and there are a lot more PHP developers out there than Java folks.
- PHP with a properly tuned threaded Apache, running under an accelerator (to minimize page compile time) and using some kind of memory caching mechanism (like memcached or mcache to minimize database queries and session handling overhead) can be quite fast and vertically/horizontally scalable.
- Lastly, Web Services are not a perfect example to point out scalability issues. They are are usually called a fraction of the percent that a web page is called and they are self-contained. They don’t need to keep session information and variables floating around, object memory is allocated on web service execution and is destroyed when execution ends, meaning that there’s no potential garbage collection overhead.
Most of the tips listed here are included in my previous posts but i thought it would be a good idea to also keep them all in one place (and update it every time i have a new one :)). So here goes:
- Make sure you index the values that need indexing and only keep the indexes that count. Also take into account that if the candidate entries for an entry are a significant percentage of the total entry number you are probably better off with sequential scanning of the whole database, compared to random access of a large percentage of the entries. It is very important to index attributes used for attribute uniqueness and for replication (entryCSN and entryUUID for the back-hdb and entryCSN, reqEnd, reqResult, reqStart for the accesslog database).
- The most important cache is the Berkley DB cache which will allow you to keep database content in memory and reduse disk access to the minimum. As i showed in previous posts you can get a significant cache hit ratio by just keeping the internal B-tree pages for all the *.bdb files of your database. To calculate the total size needed you should run db_stat -d for all the bdb files and take the sum of (internal pages * page size). This sum should then be used as the value for the set_cachesize directive in DB_CONFIG. db_stat -m can be used to gather statistics on the BDB cache usage.
- Since we are on the subject, in case the database page size is not suitable for the kind of data the database is carrying, db_stat -d will show a large amount of overflow pages for the problematic database files. You can use the dbpagesize <dbfile> <size> back-hdb directive to set the page size accordingly (but that change requires reinitializing your database).
- If you can keep the database log file on another disk that will definitely increase performance.
- Another idea is to set buffered i/o for the openldap log file in syslog.conf.
- The entry cache should hold your entry working set. Your database might hold 100K users but you might notice that only 10K use your services each day. In this case it would be a good idea (especially if you are short on memory) to only keep the working set in entry cache. idlcache should be 3 times the size of the entry cache.
- The checkpoint directive value should be set to a large value like 1MB. A nice idea would be to also set it to perform checkpoints every 5 minutes. That way if you have bursts of entry changes (or you are initializing your database) you will checkpoint every 1MB or else (during normal operation) every 5 minutes.
- Since the accesslog database is not mission critical, if you are using it it would be a good idea to set the dbnosync directive.
- Set 4 threads per CPU core.
- UPDATE: You should use db_stat -c to take a look at the lock subsystem. If you see the maximum number of locks/lockers/lock objects at any one time reaching the maximum number possible you should increase the maximum values. The settings are:
set_lk_max_locks <nun> set_lk_max_lockers <num set_lk_max_objects <num>
- UPDATE2: On Linux systems and with large user base slapadd will run much better if you use shared memory instead of the default mmaped files (shm_key directive in back-hdb configuration).
Here are a few basic guidelines on things to keep an eye for when seting up a large scale radius service (most are FreeRADIUS specific):
First of all, when you see performance problems with your radius service always, ALWAYS blame the database first. I ‘ve seen FreeRADIUS crash on me, have memory leaks (in the early first days) but the server never was the actual bottleneck. As for more precise tips:
- Create an on-memory (HEAP) table to hold ‘live’ accounting. When i say live i mean online sessions. You perform an INSERT on Accounting-Start and DELETE on Accounting-Stop. That way you have a really fast and small table which you use for all RADIUS operations as well as for double login detection. If you need to retain sessions between SQL/server restarts you could just create a normal table (instead of heap) and still see big performance gains instead of a large (and always growing) radacct table.
- You obviously still need to keep historical accounting. You can perform that near-online by using the detail file/radrelay mechanism. In accounting also log to a detail file and have a separate radius process read through that and log to a radacct table. That way your main radius server will only have to perform writes to the detail file (which should normally always succeed and be fast enough) and only the secondary radius server (the radrelay process) will have to deal with any problems with the radacct table. As a result you can perform house keeping operations (deleting old entries, statistics extraction) with the main accounting table (radacct) without disturbing your actual radius service (one of the most frequently faced problems i ‘ve had in my own installations). You can use this radrelay process with additional sql modules to also perform online statistics creation.
- Alternatively, you can just use the sql_log module, log the actual sql queries and run them through radsqlrelay instead of using radrelay if that suits yours needs.
- If your database supports views you can do the following: Only add entries in the radacct table on stop packets and create a combined view of the liveacct and radacct tables. That way only have to do one sql query per session on the radacct table instead of two and stil keep full accounting overview.
- Possibly the most frequent problem i ‘ve faced was dealing with double logins. Normally, FreeRADIUS uses an external process (checkrad) which has to be executed on every possible double login in order to query the access server and determine if the user is actually already logged on or if the session is stale. The problem is that this process is time consuming (depending on how long it takes for the access server to respond), involves numerous process creations and usually is a place for big headaches (involving the use of waitpid() calls from the radius server to wait for the checkrad process). You can eliminate all this by just moving double login detection offline. The radius server should just trust the accounting database and immediately reject any already logged-on user. The big step forward is that it also logs a corresponding entry on an on-memory double loginers table for this reject. That table can be checked by an outside processs running every 1 minute. This process can then call checkrad in order to determine if a user is actually online or not. If the session is found to be stale then a corresponding fake accounting stop can be sent to the radius service for the session to be cleared. That way double login detection (in the radius process) will always take a specific amount of time to complete and not depend on access server response and process creating time.
- Another thing that can help in keeping your online accounting table current with the actual sessions is to use accounting-updates. Setup a large accounting-update interval on your access servers (for instance, 4 hours) and add a column on the liveacct table called AcctUpdateTime. On accounting-update update the column with the time of the packet. Create an index for this column and have a separate process run every hour and scan for entries with accounting-update interval * 2 + a few minutes old. These entries are stale and after checking with the access server you can just send a fake accounting-stop so that they get closed on the accounting tables.
- I ‘ve had a few cases where the access server would send an accounting-start multiple times and that would end up being inserted multiple times in the accounting tables. In order to avoid this scenario you can do the following: Use the acct_unique module to create a unique id based on the accouting packet attributes. Create a unique key constaint for the AcctUniqueId column so that any INSERT for the same unique id will fail and fall back to the accounting_start_qeury_alt which is just an update.
- Try using modules like rlm_perl or rlm_python when creating outside scripts. That way you don’t have to wait for process initialization or depend on error prone functions like waitpid() on a threaded application. I ‘ve had a client achieve response time of a few ms with a rlm_perl script which had to perform various sql queries and processing (the client was running a VoIP application so the total response time was one very important factor).
- If you observe sql query slowdowns, 99% you have to check your indexing. Run EXPLAIN SELECT on your queries and add any needed indexes.
- When performing large deletes (like deleting old records from the accounting table) i ‘ve found that it’s always better (at least in MySQL) to do a global LOCK TABLE WRITE around the DELETE.