Monday 5 November 2012

Practical example of str.split's optional argument. Config file parser

Taking my little fun discovery for a ride. Used it to create a simple config file reader together with itertools.

    >>> lines = iter(open('config.cfg'))
    >>> import itertools
    >>> splitonce = lambda s:s.split('=',1)
    >>> trim = lambda s:s.rstrip('\n\r')
    >>> trimmed_lines_generator = itertools.imap(trim, lines)
    >>> split_lines_generator = itertools.imap(splitonce, trimmed_lines_generator)
    >>> settings = dict(split_lines_generator)
    >>> settings
    {'ConfigSetting3': '  Allow leading+trailing spaces!   ', 'eqsign': '=', 'ConfigItem2': 'Settle for no "=" signs!', 'Con
    figItem1': 'Configured'}
    >>>

This of course wouldn't work if I hadn't applied the optional argument to str.split, which I talked about in my previous post. It helps me split the file lines into keys and values by the equal sign, and not worry that the values might include equal signs of their own.

I am getting ahead of myself. In case you are not familiar with CFG files, here is the file I used in the above example. The settings file config.cfg:

    ConfigItem1=Configured
    ConfigItem2=Settle for no "=" signs!
    eqsign==
    ConfigSetting3=  Allow leading+trailing spaces!

Stripped out of the interpreter into cleaner code:

    from itertools import imap

    with open('config.cfg') as fp:
        lines = iter(fp)
        def splitonce(s):
            return s.split('=', 1) # split limit
        def trim(s):
            return s.rtrim('\n\r')

        trimmed_lines_generator = imap(trim, lines)
        split_lines_generator = imap(splitonce,
            trimmed_lines_generator)

        settings = dict(split_lines_generator)

I have used itertools.imap. Using iterators and chaining them together will make larger config files use less memory than if I used list(fp) to get a list of lines. This of course is more of a concern in larger config files.

No comments:

Post a Comment