distlog.pl

by Anthony Tonns

Summary:
Have a logfile in common log format that spans multiple days? Want to break it down into daily files? distlog.pl is your script. Using the magic of Perl and regular expressions, it takes a file such as access.old and splits it into files named YYYYMMDD.access, one per day.

Syntax:
distlog.pl file1.log [file2.log file3.log ...]
This creates files named YYYYMMDD.file1 [YYYYMMDD.file2, YYYYMMDD.file3], etc., one for each date found in the corresponding log.

Limitations:

Notes:
Server logs can get funny around midnight. If a request that takes a long time to process comes in JUST before midnight and a request that takes a short time to process comes in JUST after midnight, you might get lines in the log that seem out of order, like:

207.203.100.134 - - [24/Dec/2002:23:59:50 -0400] "GET /list.html HTTP/1.1" 200 1048576 "-" "Elfzilla/1.2"
207.203.100.134 - - [24/Dec/2002:23:59:59 -0400] "GET /list.html HTTP/1.1" 200 1048576 "-" "Elfzilla/1.2"
207.203.100.134 - - [25/Dec/2002:00:00:01 -0400] "GET /list.html HTTP/1.1" 200 1048576 "-" "Elfzilla/1.2"
207.203.100.134 - - [25/Dec/2002:00:00:04 -0400] "GET /list.html HTTP/1.1" 200 1048576 "-" "Elfzilla/1.2"
64.12.181.249 - - [24/Dec/2002:23:59:57 -0400] "GET /cgi-bin/check-list.cgi?count=twice HTTP/1.1" 200 65535 "-" "SantaSled/2.0"
207.203.100.134 - - [25/Dec/2002:00:00:15 -0400] "GET /list.html HTTP/1.1" 200 1047552 "-" "Elfzilla/1.2"
distlog.pl handles this because whenever the date changes, it opens the new day's file for append. Previous data isn't clobbered, and the lines are added to the end of that day's file (as they should be). Remember that some lines may still be out of date/time order within a file, but every line in a given file will have the same date.
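
The append-on-date-change idea described above can be sketched in a few lines of Perl. This is not the distlog.pl source, only a minimal illustration under stated assumptions: the helper names day_of and split_log are made up for this example, lines without a recognizable timestamp are simply skipped, and the output name is built by dropping the input file's last extension.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

# Map common-log-format month abbreviations to two-digit numbers.
my %MONTH;
@MONTH{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} =
    map { sprintf '%02d', $_ } 1 .. 12;

# Hypothetical helper: return "YYYYMMDD" for a log line,
# or undef if the line carries no recognizable timestamp.
sub day_of {
    my ($line) = @_;
    return undef unless $line =~ m{\[(\d{2})/([A-Za-z]{3})/(\d{4}):};
    return undef unless exists $MONTH{$2};
    return "$3$MONTH{$2}$1";
}

# Hypothetical helper: split one log into per-day files named
# YYYYMMDD.<name>, where <name> is the input name minus its last extension.
sub split_log {
    my ($path) = @_;
    my ($name, $dir) = fileparse($path, qr/\.[^.]*/);
    open(my $in, '<', $path) or die "cannot read $path: $!";
    my ($current, $out) = ('');
    while (my $line = <$in>) {
        my $day = day_of($line);
        next unless defined $day;    # assumption: skip unparseable lines
        if ($day ne $current) {
            # The date changed: close the old file and open the new day's
            # file for APPEND, so a day seen earlier is never truncated.
            close $out if $out;
            open($out, '>>', "$dir$day.$name")
                or die "cannot append to $dir$day.$name: $!";
            $current = $day;
        }
        print {$out} $line;
    }
    close $out if $out;
    close $in;
}

split_log($_) for @ARGV;
```

Saved under a hypothetical name such as sketch.pl and run against a file holding the six sample lines above, this would leave three lines in the 20021224 file and three in the 20021225 file, with the late SantaSled line appended after the two earlier Dec 24 lines. Because every open is an append, running it twice on the same input duplicates lines in the daily files.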

Source:
Here is the Perl source to distlog.pl.
Here is the GPL, which is the license for 'distlog.pl'.

Etc:
For those I've worked with at women.com, this script is infamous for its comment:

# i am not a robust script
# please do not put me in a crontab
I hope that you take that advice to heart. :)

