Filtering Apache mod_log_forensic

The Apache access logs only get updated if a response is sent to the client. To get half made requests that result from dropped connections you need something like mod_log_forensic. Regrettably mod_log_forensic generates individual log entries on the order of a kilobyte. It also doesn't have timestamps. To address these I introduce a filtering program into the logging path.

It turns out that ForensicLog's pipe functionality is not very clearly documented and is inconsistently implemented between versions.  You can pipe to a program, but not to a pipeline in Apache 2.4, while Apache 2.2 lets you pipe to a pipeline, but won't work when piping to a bash script with a pipeline in it.  There are also some unexpected (by me) subtleties in terms of I/O buffering in either case.

Here are the key subtleties:

  1. ForensicLog needs fully-qualified paths to the programs being run
  2. Apache 2.4 won't interpret a pipeline and will instead send the pipe character as a positional parameter to the script leading to some odd behavior
  3. Python's I/O is buffered by default which will lead to logging being at risk of crash loss in memory

What I wanted to do was get a log with an ISO-8601 timestamp, the forensic ID, the request line, and the True-Client-IP header.

Here is what I came up with as a filter script: /etc/apache2/forensic_logger:

#!/usr/bin/python -u
import sys
from datetime import datetime

while 1:
  data = sys.stdin.readline()
  timestamp = datetime.utcnow()
  if data == '':
    break
  client_ip = ''
  fields = data.split('|')
  if len(fields) > 2:
  for field in fields[2:]:
    if field[:15] == 'True-Client-IP:':
      client_ip = field[15:]
  sys.stdout.write("{0}|{1}|{2}|{3}\n".format(
    timestamp.isoformat() + 'Z', fields[0], fields[1], client_ip)

It's a fairly simple solution, but you may have already noticed a subtlety: why "python -u"?  It turns out that without unbuffered I/O there is an indeterminate delay in data being output by this script getting to the next stage of the pipeline where we tee to write it to the log.  As a result the logs can wind up being buffered in memory as long as an entire process lifetime which means logs are very likely to be lost in a system failure.

The next piece is integration with Apache to write to the log file (Apache 2.4):

ForensicLog "|/etc/apache2/forensic_logrunner /var/log/apache2/sitename_forensic.log"

Or in Apache 2.2:

ForensicLog "|/etc/apache2/forensic_logger | tee --append /var/log/apache2/sitename_forensic.log"

And /etc/apache2/forensic_logrunner (not needed or usable in Apache 2.2):

#!/bin/bash
/etc/apache2/forensic_logger | tee --append $1