Here are a few basic guidelines on things to keep an eye on when setting up a large-scale RADIUS service (most are FreeRADIUS specific):

First of all, when you see performance problems with your RADIUS service always, ALWAYS blame the database first. I've seen FreeRADIUS crash on me and leak memory (in its early days), but the server itself was never the actual bottleneck. On to more specific tips:

  • Create an in-memory (HEAP) table to hold 'live' accounting, meaning online sessions only. You perform an INSERT on Accounting-Start and a DELETE on Accounting-Stop. That way you have a really fast and small table which you use for all RADIUS operations as well as for double-login detection. If you need to retain sessions across SQL server restarts you can create a normal table instead of a HEAP one and still see big performance gains compared to a large (and ever-growing) radacct table. A minimal sketch of this setup appears after this list.
  • You obviously still need to keep historical accounting. You can do that near-online by using the detail file/radrelay mechanism: in accounting, also log to a detail file, and have a separate RADIUS process read through it and log to the radacct table. That way your main RADIUS server only has to perform writes to the detail file (which should normally always succeed and be fast enough), and only the secondary RADIUS server (the radrelay process) has to deal with any problems with the radacct table. As a result you can perform housekeeping operations (deleting old entries, statistics extraction) on the main accounting table (radacct) without disturbing your actual RADIUS service (one of the most frequent problems I've had in my own installations). You can also use this radrelay process with additional sql modules to perform online statistics creation.
  • Alternatively, you can use the sql_log module to log the actual SQL queries and run them through radsqlrelay instead of using radrelay, if that suits your needs.
  • If your database supports views you can do the following: only add entries to the radacct table on stop packets, and create a combined view of the liveacct and radacct tables. That way you only have to do one SQL query per session on the radacct table instead of two and still keep a full accounting overview (see the view sketch after this list).
  • Possibly the most frequent problem I've faced is dealing with double logins. Normally, FreeRADIUS uses an external process (checkrad) which has to be executed on every possible double login in order to query the access server and determine whether the user really is already logged on or the session is stale. The problem is that this process is time consuming (depending on how long the access server takes to respond), involves numerous process creations, and is usually a source of big headaches (involving waitpid() calls from the RADIUS server to wait for the checkrad process). You can eliminate all of this by moving double-login detection offline: the RADIUS server just trusts the accounting database and immediately rejects any user who appears to be already logged on. The big step forward is that it also logs a corresponding entry for the reject in an in-memory double-loginers table. That table is checked by an outside process running every minute, which calls checkrad to determine whether each user is actually online. If a session turns out to be stale, a fake Accounting-Stop is sent to the RADIUS service so the session gets cleared (see the double-loginers sketch after this list). That way double-login detection in the RADIUS process always takes a fixed amount of time and does not depend on access server response or process creation time.
  • Another thing that helps keep your online accounting table in sync with the actual sessions is to use accounting updates. Set a large accounting-update interval on your access servers (for instance, 4 hours) and add a column called AcctUpdateTime to the liveacct table. On each accounting update, set that column to the time of the packet. Create an index on the column and have a separate process run every hour, scanning for entries older than twice the accounting-update interval plus a few minutes. These entries are stale; after checking with the access server you can send a fake Accounting-Stop so that they get closed in the accounting tables (see the stale-session sketch after this list).
  • I've had a few cases where the access server would send an Accounting-Start multiple times, and it would end up inserted multiple times in the accounting tables. To avoid this scenario, use the acct_unique module to create a unique id based on the accounting packet attributes, and create a unique key constraint on the AcctUniqueId column so that any INSERT for the same unique id fails and falls back to accounting_start_query_alt, which is just an UPDATE (see the unique-key sketch after this list).
  • Try using modules like rlm_perl or rlm_python when creating outside scripts. That way you don't have to wait for process initialization or depend on error-prone functions like waitpid() in a threaded application. I've had a client achieve response times of a few ms with an rlm_perl script which had to perform various SQL queries and processing (the client was running a VoIP application, so total response time was a very important factor).
  • If you observe SQL query slowdowns, 99% of the time you need to check your indexing. Run EXPLAIN SELECT on your queries and add any needed indexes (see the EXPLAIN sketch after this list).
  • When performing large deletes (like deleting old records from the accounting table), I've found that it's always better (at least in MySQL) to take a global LOCK TABLE WRITE around the DELETE, as in the last sketch after this list.
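
To make some of these tips more concrete, here are a few minimal MySQL sketches. First, the live-accounting table. The liveacct name, the cut-down set of radacct-style columns and the %{...} placeholders (FreeRADIUS attribute expansions, as used in the sql module queries) are all illustrative, so adjust them to whatever your installation actually uses:

    -- A small, fast MEMORY (HEAP) table holding only the online sessions.
    CREATE TABLE liveacct (
      AcctSessionId   VARCHAR(64) NOT NULL DEFAULT '',
      UserName        VARCHAR(64) NOT NULL DEFAULT '',
      NASIPAddress    VARCHAR(15) NOT NULL DEFAULT '',
      FramedIPAddress VARCHAR(15) NOT NULL DEFAULT '',
      AcctStartTime   DATETIME    NOT NULL,
      PRIMARY KEY (AcctSessionId, NASIPAddress),
      KEY username (UserName)
    ) ENGINE=MEMORY;

    -- Accounting-Start: register the session.
    INSERT INTO liveacct
      (AcctSessionId, UserName, NASIPAddress, FramedIPAddress, AcctStartTime)
    VALUES
      ('%{Acct-Session-Id}', '%{User-Name}', '%{NAS-IP-Address}',
       '%{Framed-IP-Address}', NOW());

    -- Accounting-Stop: the session is gone, so is the row.
    DELETE FROM liveacct
    WHERE AcctSessionId = '%{Acct-Session-Id}'
      AND NASIPAddress  = '%{NAS-IP-Address}';

Double-login detection then becomes a cheap SELECT against a table that never grows past the number of concurrent sessions.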
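
For the view-based variant, assuming the same hypothetical liveacct columns plus the stock radacct ones:

    -- One place to query both closed and online sessions; online sessions
    -- simply have no stop time yet.
    CREATE VIEW fullacct AS
      SELECT AcctSessionId, UserName, NASIPAddress,
             AcctStartTime, AcctStopTime
        FROM radacct
      UNION ALL
      SELECT AcctSessionId, UserName, NASIPAddress,
             AcctStartTime, NULL
        FROM liveacct;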
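
Offline double-login detection could be built around something like the following; the doubleloginers table and its columns are hypothetical names:

    -- In-memory queue of rejects that may have been caused by stale sessions.
    CREATE TABLE doubleloginers (
      UserName      VARCHAR(64) NOT NULL,
      NASIPAddress  VARCHAR(15) NOT NULL,
      AcctSessionId VARCHAR(64) NOT NULL,
      RejectTime    DATETIME    NOT NULL,
      PRIMARY KEY (UserName, NASIPAddress, AcctSessionId)
    ) ENGINE=MEMORY;

    -- The per-minute checker picks up the candidates, calls checkrad for
    -- each one, sends a fake Accounting-Stop for any stale session and then
    -- clears the rows it has processed.
    SELECT UserName, NASIPAddress, AcctSessionId FROM doubleloginers;
    DELETE FROM doubleloginers;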
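
For the interim-update housekeeping, a sketch assuming a 4-hour update interval on the access servers:

    -- MEMORY tables default to HASH indexes, so ask for BTREE explicitly;
    -- the range scan below cannot use a HASH index.
    ALTER TABLE liveacct
      ADD COLUMN AcctUpdateTime DATETIME NOT NULL,
      ADD INDEX acctupdatetime USING BTREE (AcctUpdateTime);

    -- On each accounting update, stamp the session.
    UPDATE liveacct SET AcctUpdateTime = NOW()
    WHERE AcctSessionId = '%{Acct-Session-Id}'
      AND NASIPAddress  = '%{NAS-IP-Address}';

    -- Hourly sweep: 2 * 4 hours + 10 minutes of slack. Verify the hits with
    -- checkrad and send a fake Accounting-Stop for confirmed stale sessions.
    SELECT UserName, AcctSessionId, NASIPAddress
      FROM liveacct
     WHERE AcctUpdateTime < NOW() - INTERVAL 490 MINUTE;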
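
The duplicate Accounting-Start protection only needs a unique key; AcctUniqueId is the column the stock FreeRADIUS schema provides for the acct_unique hash:

    -- With this constraint a duplicate Accounting-Start makes the INSERT in
    -- accounting_start_query fail, and FreeRADIUS falls back to
    -- accounting_start_query_alt, which UPDATEs the existing row instead.
    ALTER TABLE radacct ADD UNIQUE KEY acctuniqueid (AcctUniqueId);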
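
An indexing check might look like this; the query is only an example of a common radacct lookup:

    -- If the plan shows type: ALL (a full table scan), the query needs an index.
    EXPLAIN SELECT * FROM radacct
     WHERE UserName = 'bob' AND AcctStopTime IS NULL;

    CREATE INDEX username_stop ON radacct (UserName, AcctStopTime);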
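
And finally the locked bulk delete; the one-year retention period is arbitrary:

    -- Taking the write lock up front lets the DELETE run without fighting
    -- other writers row by row. With the radrelay setup above, only the
    -- relay process is blocked while this runs, never the live RADIUS service.
    LOCK TABLES radacct WRITE;
    DELETE FROM radacct WHERE AcctStopTime < NOW() - INTERVAL 1 YEAR;
    UNLOCK TABLES;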