Thursday, 6 December 2012

Wildcards in Python (the fnmatch module)

Filtering and matching against wildcard patterns is really easy in Python with the fnmatch module. I found this out while reading an article by Dan Carrol in search of a solution to a Django problem of mine.

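The fnmatch function matches a string against a wildcard pattern. It follows the operating system's filename rules for case: both arguments are passed through os.path.normcase first, so matching is case-insensitive on Windows and case-sensitive on POSIX systems. Use fnmatchcase for matching that is always case-sensitive.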

    >>> import fnmatch
    >>> fnmatch.fnmatch('example', 'exampl*')
    True
    >>> fnmatch.fnmatch('example', '*e')
    True
    >>> fnmatch.fnmatch('example', '*es')
    False
    >>> fnmatch.fnmatch('examples', '*es')
    True
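
Since the case behaviour depends on the platform, it is worth checking what you get on yours. Here is a small sketch of the difference; the file name and pattern are made up for illustration, and lower-casing both sides is just one possible way to force a case-insensitive match everywhere.

```python
import fnmatch

name, pattern = 'README.TXT', '*.txt'

# fnmatchcase never normalizes case, so this is False on every platform.
strict = fnmatch.fnmatchcase(name, pattern)

# fnmatch normalizes both arguments with os.path.normcase first, so the
# result follows the platform's filename rules.
platform_dependent = fnmatch.fnmatch(name, pattern)

# One portable way to ignore case: lower-case both sides yourself.
insensitive = fnmatch.fnmatchcase(name.lower(), pattern.lower())

print(strict, platform_dependent, insensitive)
```

Only the middle value should change between operating systems.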

There's also a filter function, to filter a list against a pattern.

    >>> files = ['file.py', 'file.txt']
    >>> fnmatch.filter(files, '*.py')
    ['file.py']
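
filter takes a single pattern. If you need to match against several, a small wrapper does the job. The sketch below is only an illustration: filter_any is a hypothetical helper of mine, not part of the fnmatch module.

```python
import fnmatch

def filter_any(names, patterns):
    """Return the names matching at least one pattern, in original order.

    Hypothetical helper, not part of the fnmatch module.
    """
    return [n for n in names
            if any(fnmatch.fnmatch(n, p) for p in patterns)]

files = ['file.py', 'file.txt', 'notes.md', 'setup.py']
selected = filter_any(files, ['*.py', '*.md'])
print(selected)  # ['file.py', 'notes.md', 'setup.py']
```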

Besides *, the syntax offers ? (any single character), [seq] (any one character in seq) and [!seq] (any one character not in seq). And that's it. It's a very simple, moderately powerful syntax that everybody can use.

    >>> fnmatch.fnmatch('a', '?')
    True
    >>> fnmatch.fnmatch('1', '[13579]')
    True
    >>> fnmatch.fnmatch('4', '[13579]')
    False
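
Square brackets also accept ranges such as [0-9], and a leading ! negates the set. A quick sketch with made-up file names shows how the single-character wildcards differ:

```python
import fnmatch

names = ['log1.txt', 'log2.txt', 'logA.txt', 'log10.txt']

# [0-9] matches exactly one digit, so 'log10.txt' is left out.
one_digit = fnmatch.filter(names, 'log[0-9].txt')

# [!0-9] matches exactly one character that is not a digit.
non_digit = fnmatch.filter(names, 'log[!0-9].txt')

# ? matches any single character, digit or not.
any_single = fnmatch.filter(names, 'log?.txt')

print(one_digit)   # ['log1.txt', 'log2.txt']
print(non_digit)   # ['logA.txt']
print(any_single)  # ['log1.txt', 'log2.txt', 'logA.txt']
```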

The wildcard characters cannot be escaped with backslashes. You can only escape them by wrapping them in square brackets. For example:

    >>> fnmatch.fnmatch('Here is a star: *', r'*\*')
    False
    >>> fnmatch.fnmatch('Here is a star: *', '* [*]')
    True
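
If part of a pattern has to be taken literally (a file name that happens to contain brackets, say), the bracket trick can be automated. The following is a minimal sketch; escape_literal is a hypothetical helper of mine, not something fnmatch provides. Newer Pythons (3.4 and later) ship glob.escape, which as far as I can tell does the same wrapping for these three characters.

```python
import fnmatch

def escape_literal(text):
    """Wrap each wildcard character in brackets so it matches itself.

    Hypothetical helper, not part of the fnmatch module.
    """
    return ''.join('[' + ch + ']' if ch in '*?[' else ch for ch in text)

# Literal prefix 'report[1]' followed by a real wildcard.
pattern = escape_literal('report[1]') + '*'
print(pattern)  # report[[]1]*

escaped_hit = fnmatch.fnmatchcase('report[1].txt', pattern)   # True
escaped_miss = fnmatch.fnmatchcase('report1.txt', pattern)    # False

# Without escaping, [1] is read as a character set instead.
unescaped = fnmatch.fnmatchcase('report1.txt', 'report[1]*')  # True
```

A closing ] on its own needs no escaping, because without an opening bracket it is already literal.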

At first I thought this way of escaping was impractical, but because of it, you can pass user input straight in as a pattern without the user ever getting unexpected results. And if you want the underlying regex (to reuse later), you can always use translate, though the exact string it returns varies between Python versions:

    >>> as_regex = fnmatch.translate('m[ae]tch th?is!')
    >>> as_regex  # output from Python 2.7; newer versions format it differently
    'm[ae]tch\\ th.is\\!\\Z(?ms)'
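
Whatever the translated string looks like on your Python version, it can be handed to re.compile and reused. A small sketch, with candidate strings invented for the example:

```python
import fnmatch
import re

# Translate once, compile once, then reuse the compiled pattern.
regex = re.compile(fnmatch.translate('m[ae]tch th?is!'))

candidates = [
    'match thxis!',           # matches: ? consumes the 'x'
    'metch th-is!',           # matches: [ae] accepts 'e', ? consumes '-'
    'match this!',            # no match: ? needs a character of its own
    'mitch thxis!',           # no match: 'i' is not in [ae]
    'match thxis! and more',  # no match: the regex is anchored at the end
]
hits = [c for c in candidates if regex.match(c)]
print(hits)  # ['match thxis!', 'metch th-is!']

# Going through re also lets you add flags, e.g. to ignore case everywhere.
loose = re.compile(fnmatch.translate('*.TXT'), re.IGNORECASE)
print(bool(loose.match('notes.txt')))  # True
```

Use match rather than search here: the translated regex is anchored at the end only, so match supplies the anchor at the start.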

Because this module uses regular expressions internally and lets me get at the actual regular expression, I can use it as a less error-prone alternative to re in some scenarios.

I think this module is great. It gives me a bit less matching power than regular expressions, but in exchange I can let users say what they want to search for and how. You could arguably do this with regular expressions, but you would end up wasting time and money on documentation and customer support, because regex is error-prone and dangerous.

Check out the docs for more information on this module.
