Parser for Apache log files.  This is a port to Python of Peter Hickman's
`Apache::LogRegex Perl module`__.

.. __: http://cpan.uwinnipeg.ca/~peterhi/Apache-LogRegex

Takes the `Apache logging format`__ defined in your ``httpd.conf`` and
generates a regular expression which is used to parse a line from the
log file and return it as a dictionary with keys corresponding to the
fields defined in the log format.

.. __: http://httpd.apache.org/docs/current/mod/mod_log_config.html#formats

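The translation from format string to regular expression can be sketched
roughly as follows.  This is a simplified illustration written for
Python 3, not the module's actual implementation: the names
``format_to_regex`` and ``parse_line`` and the per-directive patterns are
assumptions made for the example.

```python
import re


def format_to_regex(log_format):
    # Hypothetical helper: build (directive names, compiled regex) from an
    # Apache LogFormat string whose fields are separated by single spaces.
    names = []
    patterns = []
    for element in log_format.split(' '):
        # A field is "quoted" when it is wrapped in escaped double quotes.
        quoted = element.startswith('\\"') and element.endswith('\\"')
        if quoted:
            element = element[2:-2]
        names.append(element)
        if quoted:
            # Anything up to the closing quote, allowing backslash escapes.
            patterns.append(r'"((?:[^"\\]|\\.)*)"')
        elif element == '%t':
            # Timestamps are bracketed and contain a space.
            patterns.append(r'(\[[^\]]+\])')
        else:
            # Every other field is a run of non-space characters.
            patterns.append(r'(\S*)')
    return names, re.compile('^' + ' '.join(patterns) + r'\s*$')


def parse_line(names, regex, line):
    # Return a dict keyed by directive, or raise ValueError on a mismatch.
    match = regex.match(line)
    if match is None:
        raise ValueError('Unable to parse: %s' % line.rstrip())
    return dict(zip(names, match.groups()))


names, regex = format_to_regex(r'%h %l %u %t \"%r\" %>s %b')
entry = parse_line(
    names, regex,
    '192.168.0.1 - - [18/Feb/2012:10:25:43 -0500] "GET / HTTP/1.1" 200 561')
```

Keeping the directive strings themselves as dictionary keys means the
result can be read straight against the format string in ``httpd.conf``.
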
Import libraries used in the example:

>>> import apachelog.parser, sys, StringIO, pprint

You should generally be able to copy and paste the format string from
your Apache configuration, but remember to place it in a raw string
using single-quotes, so that backslashes are handled correctly.

>>> format = r'%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'
>>> p = apachelog.parser.Parser(format)

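To see why the raw string matters, compare how Python stores the same
literal with and without the ``r`` prefix.  The variable names ``raw``
and ``plain`` are only for illustration.

```python
# Apache's LogFormat escapes its inner double quotes with backslashes.
raw = r'\"%r\"'      # raw string: the backslashes are kept
plain = '\"%r\"'     # normal string: each \" collapses to a bare "

print(len(raw), len(plain))   # 6 4
print(raw == plain)           # False
```

Without the prefix the backslashes never reach the parser, so the string
no longer matches what is written in the Apache configuration.
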
Now open your log file.  For this example, we'll fake a log file with
``StringIO``.

>>> #log_stream = open('/var/apache/access.log')
>>> log_stream = StringIO.StringIO('\n'.join([
...     '192.168.0.1 - - [18/Feb/2012:10:25:43 -0500] "GET / HTTP/1.1" 200 561 "-" "Mozilla/5.0 (...)"',
...     'junk line',
...     ]))
>>> for line in log_stream:
...     try:
...         data = p.parse(line)
...     except Exception:
...         print("Unable to parse %s" % line.rstrip())
...     else:
...         pprint.pprint(data)
{'%>s': '200',
 '%b': '561',
 '%h': '192.168.0.1',
 '%l': '-',
 '%r': 'GET / HTTP/1.1',
 '%t': '[18/Feb/2012:10:25:43 -0500]',
 '%u': '-',
 '%{Referer}i': '-',
 '%{User-Agent}i': 'Mozilla/5.0 (...)'}
Unable to parse junk line

The return dictionary from the parse method has values for each
directive in the format string.

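Since every value comes back as a string, a caller will often want to
convert a few fields before using them.  The sketch below assumes
Python 3 and works on a hand-written dictionary shaped like the parser's
return value (the values are copied from the example log line above);
the variable names are illustrative and nothing here is part of the
module.

```python
from datetime import datetime

# A dictionary shaped like the parser's return value.
data = {
    '%h': '192.168.0.1',
    '%t': '[18/Feb/2012:10:25:43 -0500]',
    '%r': 'GET / HTTP/1.1',
    '%>s': '200',
    '%b': '561',
}

# Convert string fields into richer Python types.
when = datetime.strptime(data['%t'], '[%d/%b/%Y:%H:%M:%S %z]')
method, path, protocol = data['%r'].split(' ', 2)
status = int(data['%>s'])
# Apache logs "-" for %b when no body bytes were sent.
size = 0 if data['%b'] == '-' else int(data['%b'])
```

Note that ``%b`` in the ``strptime`` pattern parses the month name using
the current locale, so this assumes an English or C locale.
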
You can also re-map the field names by subclassing (or clobbering) the

This module provides three of the most common log formats in the
``FORMATS`` dictionary:

>>> # Common Log Format (CLF)
>>> p = apachelog.parser.Parser(apachelog.parser.FORMATS['common'])
>>> # Common Log Format with Virtual Host
>>> p = apachelog.parser.Parser(apachelog.parser.FORMATS['vhcommon'])
>>> # NCSA extended/combined log format
>>> p = apachelog.parser.Parser(apachelog.parser.FORMATS['extended'])

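For reference, the Apache ``mod_log_config`` documentation gives the
following format strings for these three formats.  The dictionary below
is a stand-in assembled from that documentation: the name
``APACHE_DOC_FORMATS`` is made up for this sketch, and the values stored
in the module's own ``FORMATS`` may differ in detail.

```python
# Format strings as listed in the Apache mod_log_config documentation.
# APACHE_DOC_FORMATS is a hypothetical name, not the module's FORMATS.
APACHE_DOC_FORMATS = {
    'common': r'%h %l %u %t \"%r\" %>s %b',
    'vhcommon': r'%v %h %l %u %t \"%r\" %>s %b',
    'extended': (r'%h %l %u %t \"%r\" %>s %b '
                 r'\"%{Referer}i\" \"%{User-agent}i\"'),
}

# Show how many space-separated fields each format defines.
for name, fmt in sorted(APACHE_DOC_FORMATS.items()):
    print('%-9s %d fields' % (name, len(fmt.split(' '))))
```
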
For some older notes regarding performance while reading lines from a
file in Python, see `this post`__ by Fredrik Lundh.  A further
performance boost can be gained by using psyco_.

.. __: http://effbot.org/zone/readline-performance.htm
.. _psyco: http://psyco.sourceforge.net/

On my system, a loop like::

    for line in open('access.log'):
        p.parse(line)

was able to parse ~60,000 lines / second.  Adding psyco to the mix
upped that to ~75,000 lines / second.

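A throughput figure like this can be measured with a small timing
harness.  The sketch below is a stand-in under stated assumptions: it is
written for Python 3, it times a hand-written regular expression rather
than ``Parser.parse``, and ``lines_per_second`` is a name invented for
the example, so the rate it prints is not comparable to the numbers
quoted above.

```python
import re
import time

# Stand-in for a compiled log-format regex (Common Log Format fields).
LINE_RE = re.compile(
    r'^(\S*) (\S*) (\S*) (\[[^\]]+\]) "([^"]*)" (\S*) (\S*)\s*$')


def lines_per_second(lines):
    # Time how long it takes to match every line, then report the rate.
    start = time.perf_counter()
    parsed = 0
    for line in lines:
        if LINE_RE.match(line) is not None:
            parsed += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very coarse clocks.
    return parsed, len(lines) / max(elapsed, 1e-9)


sample = ('192.168.0.1 - - [18/Feb/2012:10:25:43 -0500] '
          '"GET / HTTP/1.1" 200 561\n')
parsed, rate = lines_per_second([sample] * 10000)
print('parsed %d lines at ~%.0f lines / second' % (parsed, rate))
```

To time the real parser instead, the call to ``LINE_RE.match`` would be
swapped for ``p.parse(line)`` with whatever error handling it needs.
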
__license__ = """Released under the same terms as Perl.
See: http://dev.perl.org/licenses/
"""
__author__ = "Harry Fuecks <hfuecks@gmail.com>"
    "Peter Hickman <peterhi@ntlworld.com>",
    "Loic Dachary <loic@dachary.org>",
    "W. Trevor King <wking@drexel.edu>",