Monkey-patching the Python logging module

Saturday February 08, 2014

I took an interesting foray into the Python standard library the other day. It was the first time that I had dug into CPython's source code myself, so I figured my experience might be interesting to others considering doing the same thing.

Logging is a great thing - I'll just state that flat out. And Python's logging module is pretty darn good as well. An easy-to-use API, easy switching between different levels, output to stdout or to a file, and so on. However, there's one issue that really bugs me about this module. It uses %-format strings instead of the new '{}' style strings. This isn't a massive issue, but it's a thorn in my side when I want to write nice, modern Python code, and then having to switch to these old style format strings. It furthermore means I need to remember all those %05f codes. It's also somewhat sad that the situation stayed the same way in Python3. Quoting from the Python docs:

This method of string formatting is the new standard in Python 3, and should be preferred to the % formatting described in String Formatting Operations in new code.

And yet, in Python3, I still get a TypeError when I do the following command.

logging.error("hello {}", "world")

So, let's not settle for second rate. Is there a way that we can monkey-patch Python's logging module to accept {}-style format strings? (Yes.)

Browsing Python source code

Well, where do we start? I first needed to figure out where that message actually gets created. Let's first pull up the CPython source code. Googling for 'python source code' brought me to the CPython mercurial repo. I could have cloned it to use ack/grep, but I figured that'd take a while. There is a very convenient search bar in the top corner. Unfortunately, this doesn't do code search that local grep would do, but I figured, well, let's search for logging.py and see where that takes us.

That gave me a list of recent commits that touched files like logging.py. There's a file called test_logging.py though and that gave me a few false results. Nonetheless, clicking through these commits leads me to a file Lib/logging/__init__.py. Bingo, sounds exactly like what I'm looking for, since __init__.py denotes the root package definition of a Python library.

(For those following along at home, I specifically looked at this commit).

Clicking through that file's link only brought me to the file at that revision. Let's look at the latest version of this file. There's a gotcha here though: HEAD in Python is Python3, and I'm working on Python2. So I hit branches, click on the 2.7 branch, and only then I click Browse and navigate to that file. And look at that, I'm now looking at 2000 lines of genuine Python source code. I love Python, so it's actually pretty exciting!

Question though, is where do we start looking? Well, from the logging API I use, there is a class method info, so I know that is probably defined somewhere in here. I search for the most complete string of that definition, def info(, and that gives me 3 results on the page. Great.

Okay, one of those definitions calls a function _log(...), so let's search for def _log(. This function makes a record (makeRecord), and then handles it. Okay, I check out def makeRecord. Doesn't look like it actually does much, so go to def handle( instead. Gets a little confusing here, since there are a few def handle(s, but def handle( is conveniently right below that _log definition. Calls callHandlers. Calls handle on what is probably the Handler class.

There are a few instances of each of these definitions, but so long as you double check what class you're in, it should be easy enough to follow such a chain of logic. They call it "emit" a record, so let's look at the emit function referenced in this function. That calls format (note: Gotta be a little careful here. It's calling self.format, which must be a function defined for this class. Not str.format - I wish.)

Now, this format function is a bit tricky. It seems a argument record is the thing that gets printed out, presumably corresponding to some kind of Record class, but there's also some string formatting (with self._fmt) going on within this function itself. The comments help explain the action a bit. From reading that, it sounds like all the user-supplied arguments get formatted by the record, and the _fmt part is internal. (I quickly grepped for the definition of _fmt, and those results confirmed the hypothesis.)

So, now we check out getMessage(). And boom! There's some Unicode handling, and then there's this block of code.

if self.args:
    msg = msg % self.args
return msg

First off, I'm blown away that it took that many levels to get to the formatting. I'm not sure if this is Good, Modular, (Enterprise-Grade?) Software Design, or if this could be made much, much, simpler. Python's been around about as many years as I've been on this planet (although logging's only been around since Python 2.3), so I'll leave the benefit of the doubt with the logging module.

Anyway, after reading the source code, we finally found where the problem was.

Update: After doing all that reading, I discovered at some point after the fact that there is a really fast way to get to the exact same place. Open up the Python terminal, and call

logging.error('asdflkj', 'hello')

Boom - stack trace! And the bottom function in that stack trace is that getMessage that we just found. Nice trick that conveniently short-curcuits about 5 paragraphs of work. Lesson learned: stack traces are good for finding where errors occur, so if you want to find where something occurs, throw an error, and then look at the stack trace!

Making the modifications

Making the modification itself is actually fairly trivial. The whole of the function in CPython source looks like the following:

def getMessage(self):
    """
    Return the message for this LogRecord.

    Return the message for this LogRecord after merging any user-supplied
    arguments with the message.
    """
    if not _unicode: #if no unicode support...
        msg = str(self.msg)
    else:
        msg = self.msg
        if not isinstance(msg, basestring):
            try:
                msg = str(self.msg)
            except UnicodeError:
                msg = self.msg      #Defer encoding till later
    if self.args:
        msg = msg % self.args
    return msg

I'm going to leave the unicode stuff alone, and just modify a single line.

if self.args:
    msg = msg.format(*self.args)
return msg

And now, any code I write, I'm free to use the {-style formatting strings. I created an example gist.

I could create monkeypatchlog.py out of this and import that file whenever I wanted this functionality, and I could also do some kind of style detection to try to figure out if the string is % or { formatted (for compatibility with existing code, let's say.)

Contributing Back

Okay, so we monkey-patched to allow for logging on our end, but what about making this modification accessible to others? 2.x is maintenance, so the chance of this feature getting backported is minimal. Maybe we can introduce this into Python3. In fact, I was curious, and frankly a bit surprised that this behavior hadn't been changed in Python3. So I dug into the logging module in the HEAD branch. And it looks like there is actually support for a keyword style. (And looking at the Python3 docs confirms this.) However, the following still does not work in Python3. (Python 3.3 or Python3.4beta2)

import logging
logging.basicConfig(style="{")
logging.error("hello {}", "world") # Causes an error
logging.error('hello {foo}', {'foo':'1'}) # Prints "%(levelname)s:%(name)s:%(message)s"

In fact, unless I'm mistaken, the style attribute corresponds to the root message (with levelname, time, etc.), not a specific log message. But, that still seems like a problem in user experience (I'd call it a bug) in Python3's logging module. An excellent opportunity to contribute a patch back to Python. I'll save explaining that for another post.

Happy Python wrangling!


Want to receive similar articles? (No spam, promise!)