Speed up web session access
Nov 29th, 2008 by paolo
I recently solved a problem common to many heavy used web application following an idea of a friend of mine.
The problem is how to efficently manage the sessions of a web application when the number of them is high (500,000 for example) and the use of them his quite frequent (˜50,000 pages/day in my case).
A common approach is to store web sessions in a database table where the session identifier and the expiry date are indexed, but due to the fact that each time a session is used (read) is also updated (written), while the number of them grows, the time spent to update the indexes associated to the web session table grows too. The result of this situation, relatively to the hardware you use, is that to update a session the application could take 2-5 seconds: a time that is not adequate to the world wide web.
So a better solution has to be applied or better we have to adapt the databse solution to the number of sessions that has to be managed.
I don't consider my solution particulary smart, but it's very simple, effective and could easily scale...
As I already said, the idea is very simple and is to spread the sessions uniformly over more than one table. The number of tables should be proportional to the sessions to manage.
The solution is built on the default implementation of the database handler for web sessions provided by the ADOdb library for PHP. The default implementation is based on the sessions/adodb-session.php file, that contains the database handler, and on a table with the following schema:
-
CREATE TABLE `sessions` (
-
`sesskey` varchar(32) BINARY NOT NULL DEFAULT '',
-
`expiry` int(11) UNSIGNED NOT NULL DEFAULT '0',
-
`expireref` varchar(64) DEFAULT '',
-
`data` longtext,
-
PRIMARY KEY (`sesskey`),
-
KEY (`expity`)
-
) TYPE=MyISAM
The bigger problem of this solution is to find a fast algorithm to spread the sessions uniformly over more tables. In my case, I solved this problem taking advange from the way the session identifiers are created. Infact, these indentifiers are generated calculating the md5 string from a random integer (see the code below) and I decided to use the first character of the identifier string to choose on which table persist the corresponding session.
To prepare the backend (a MySql database in my case), I created 16 tables like the one described above with the following names:
sessions_0, sessions_1, sessions_2, sessions_3, sessions_4, sessions_5, sessions_6, sessions_7, sessions_8, sessions_9, sessions_a, sessions_b, sessions_c, sessions_d, sessions_e, sessions_f.
Then I've modified the default ADOdb session handler to work on the right table taking as discriminant factor the identifier of the session to manage. The changes I made are very simple, as an example, the code below show the pached lines of the read and write functions:
-
/*
-
Slurp in the session variables and return the serialized string
-
*/
-
$conn =& ADODB_Session::_conn();
-
$data = ADODB_Session::dataFieldName();
-
$filter = ADODB_Session::filter();
-
$table = ADODB_Session::table() . '_' . $key[0];
-
...
-
-
/*!
-
Write the serialized data to a database.
-
If the data has not been modified since the last read(), we do not write.
-
*/
-
{
-
global $ADODB_SESSION_READONLY;
-
-
-
$clob = ADODB_Session::clob();
-
$conn =& ADODB_Session::_conn();
-
$crc = ADODB_Session::_crc();
-
$data = ADODB_Session::dataFieldName();
-
$debug = ADODB_Session::debug();
-
$driver = ADODB_Session::driver();
-
$expire_notify = ADODB_Session::expireNotify();
-
$filter = ADODB_Session::filter();
-
$lifetime = ADODB_Session::lifetime();
-
$table = ADODB_Session::table() . '_' . $key[0];
-
...
The patch to apply to the gc function is a little more complex because this function has to loop on all tables and delete old sessions: this patch is left as an exercise for the reader
For me, the solution described above rocks and I think it can also scale easily taking as descriminant factor 2 or even 3 characters from the session identifier and using the % math operator to limit the tables needed.

