PHP class for parsing an Apache access.log file

I have written a PHP class that can be used to parse the Apache access.log file. The data can then be displayed in a custom format that is easier to read or that only includes the fields you really care about. You could also insert the data into a database or recordset so you can sort, filter, or search it.
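To give a rough idea of what "parsing" a line involves, here is a minimal sketch. It is not the code from my class: `parse_combined_log_line()` is a hypothetical helper, and it assumes the log uses Apache's standard "combined" format. The key names mirror the ones used in the usage example in this post.

```php
<?php
// Hypothetical helper, NOT part of the class described in this post.
// Assumes Apache's "combined" log format:
//   ip identity user [date:time timezone] "method path protocol" status bytes "referer" "agent"
function parse_combined_log_line(string $line): ?array
{
    $pattern = '/^(\S+) (\S+) (\S+) \[([^:]+):(\d{2}:\d{2}:\d{2}) ([+\-]\d{4})\] '
             . '"(\S+) (\S+) (\S+)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/';

    if (!preg_match($pattern, trim($line), $m)) {
        return null; // the line does not match the assumed format
    }

    return [
        'ip'       => $m[1],
        'identity' => $m[2],
        'user'     => $m[3],
        'date'     => $m[4],
        'time'     => $m[5],
        'timezone' => $m[6],
        'method'   => $m[7],
        'path'     => $m[8],
        'protocol' => $m[9],
        'status'   => $m[10],
        'bytes'    => $m[11],
        'referer'  => $m[12],
        'agent'    => $m[13],
    ];
}

// Sample line in the combined format (the example used in the Apache docs).
$sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        . '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        . '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';

$parsed = parse_combined_log_line($sample);
echo $parsed['ip'], ' ', $parsed['method'], ' ', $parsed['path'], ' ', $parsed['status'], PHP_EOL;
// prints: 127.0.0.1 GET /apache_pb.gif 200
```

A strict pattern like this returns null for malformed requests, which scanners send a lot of, so a real parser needs to be more forgiving than this sketch.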

A couple of years ago I set up a Linux box running Apache, PHP and MySQL. A couple of weeks later MySQL and Apache crashed and would not start. After some troubleshooting and a little google-fu I discovered that /var/log/apache2/access.log had grown so large that it took up all available space on the root drive. It turns out that when you run a web server, it tends to attract a lot of attention from shady individuals who will continually scan it for vulnerabilities.

At first, I just blocked all traffic at the firewall except from a few trusted networks. This was not ideal for me, as the purpose of setting up the Apache server was to be able to access it when away from home. This got me thinking about how to find out who was accessing my web server the most so I could block them. This led me back to the access.log file. Reading this file manually and making sense of it was not realistic.
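In code terms, "who is accessing my web server the most" boils down to counting requests per IP address. Here is a minimal sketch of that idea. `top_client_ips()` is a hypothetical helper, not part of my class, and it assumes every line starts with the client address, as it does in Apache's common and combined log formats. The sample addresses are made up, taken from the ranges reserved for documentation.

```php
<?php
// Hypothetical helper, NOT part of the class described in this post.
// Counts requests per client IP and returns the busiest clients first.
function top_client_ips(iterable $lines, int $limit = 10): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Assumption: the client address is the first space-separated field.
        $ip = explode(' ', ltrim($line), 2)[0];
        $ip = trim($ip);
        if ($ip === '') {
            continue; // skip blank lines
        }
        $counts[$ip] = ($counts[$ip] ?? 0) + 1;
    }

    arsort($counts); // highest request count first, keys (IPs) preserved

    return array_slice($counts, 0, $limit, true);
}

// Made-up sample lines for illustration only.
$lines = [
    '203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "-"',
    '198.51.100.23 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "-"',
    '203.0.113.7 - - [01/Jan/2024:00:00:03 +0000] "GET /phpmyadmin/ HTTP/1.1" 404 153 "-" "-"',
    '203.0.113.7 - - [01/Jan/2024:00:00:04 +0000] "GET /admin/ HTTP/1.1" 404 153 "-" "-"',
];

foreach (top_client_ips($lines, 5) as $ip => $hits) {
    echo "$ip: $hits", PHP_EOL;
}
// prints:
// 203.0.113.7: 3
// 198.51.100.23: 1
```

To run it against a real log, you could pass `new SplFileObject('/var/log/apache2/access.log')` instead of the sample array, since that object can be iterated line by line.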

I tried searching for PHP solutions for parsing and displaying the access.log file in a more readable way. I found a few paid solutions, but I was looking for something free. I found a couple of free sources, but none of the code seemed to work properly, so I set out to write it myself. And now I am sharing it with the internet.


<?php
include("class.access_log_parser.php"); // include the class file

$apache_log_parser = new apache_log_parser(); // create an Apache log parser

if ($apache_log_parser->open_log_file('/var/log/apache2/access.log')) { // make sure it opens the log file
    $loopcount = 0; // number of lines processed so far
    while ($line = $apache_log_parser->get_line()) { // while it can get a line
        $loopcount++;
        $parsed_line = $apache_log_parser->format_line($line); // format the line

        // mysql_real_escape_string() was removed in PHP 7, so keep the raw values here
        // and use prepared statements (mysqli or PDO) if you insert them into a database.
        $ip = $parsed_line['ip'];
        $identity = $parsed_line['identity'];
        $log_user = $parsed_line['user'];
        $log_date = $parsed_line['date'];
        $log_time = $parsed_line['time'];
        $timezone = $parsed_line['timezone'];
        $method = $parsed_line['method'];
        $path = $parsed_line['path'];
        $protocol = $parsed_line['protocol'];
        $status = $parsed_line['status'];
        $log_bytes = $parsed_line['bytes'];
        $referer = $parsed_line['referer'];
        $agent = $parsed_line['agent'];
        $clean_line = $line; // the raw, unparsed line

        // do stuff with the data here
        // htmlspecialchars() keeps request data from being rendered as HTML in the browser
        echo htmlspecialchars(" $loopcount: $ip $log_date $log_time $method $path $protocol $status ") . "<br />";
    }

    // close the log file

    // empty the file (optional)
}

GitHub link: