
Tuesday, 14 May 2013

Python's new Enum class

I used to wish Python had enumerations, like other languages do.

On May 10th, GvR accepted PEP 435, which proposes the addition of an Enum type to the standard library.

Before this, we python coders would need to use plain classes for enums, which doesn't work so well. We can't get enum names by value, for example.

This also means it's not possible to create the list of 2-tuples Django models use for choices. We'd need a class mapping our choice names to their values, but a plain class can't create the good old "choices" structure, so that's pretty useless. However, since our new enums are iterable, we can:

    class ChoiceClass(IntEnum):
        foo = 1
        bar = 2
        baz = 3

    CHOICES = [(e.value, e.name) for e in ChoiceClass]

Creating your enumerations

By inheriting from Enum, you can create your own enumerations. If all your enumeration values are supposed to be of the same type, you can inherit from that type as well as Enum (note that the mixed-in type must come before Enum in the base list).

    class Numbers(int, Enum):  # you can also use IntEnum
        one = 1
        two = 2
        three = 3

    class Text(str, Enum):
        one = 'one'
        two = 'two'
        three = 'three'

Using your enumerations

By iterating over your enum class, you get the enumeration members, which can be queried for their name or value.

    print(' '.join(['{}: {}'.format(number.name, number.value) for number in Numbers]))

Internally, there is some metaclass magic going on to allow us to succinctly declare these enums. The class comes with a few facilities, like __iter__ as I showed above, __call__, which gets enumeration items by value, and __getitem__, which gets them by name.

Keep in mind that enumeration items are actually instances of your enumeration class, which allows us to say isinstance(Text.one, Text).

You can get your enumeration items in several ways.

  • As an attribute: Numbers.one, the Java way
  • By name: Numbers['one'], a tad prettier than using getattr
  • By value: Numbers(1), as a constructor

In a way, the third syntax reminds me of how we convert values using their constructors in python, for instance, using str or int to convert existing objects to string or integer values.
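All three lookup styles can be sketched against the stdlib enum module, where the PEP 435 code eventually landed (at the time of writing you'd import from the ref435 reference module instead):

```python
# A sketch using the accepted PEP 435 API, as found in the stdlib
# enum module; substitute ref435 while the code is still in review.
from enum import IntEnum

class Numbers(IntEnum):
    one = 1
    two = 2
    three = 3

assert Numbers.one is Numbers.one        # as an attribute
assert Numbers['one'] is Numbers.one     # by name, via __getitem__
assert Numbers(1) is Numbers.one         # by value, via __call__
```

All three expressions hand back the very same member object, which is why identity comparison with `is` works.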

You can start using these enums right away. There was talk of porting this back to python 2 on the python-dev mailing list, but right now the code is for python 3 only. I'm going to convert some Django code to use these enums later, because I have three declarations for each Choice I need (SOMETHING_CHOICES, SOMETHING_DICT, SOMETHING_REVERSE), which is just backwards.

You can grab a copy of this module at bitbucket. Beware, because this was only put up for PEP evaluation. You'll want to wear a helmet and keep in mind that this is not stable or supposed to be used in production code. Although Guido has accepted it at the time of writing, it may still be subject to change, since the PEP is not in the "final" status.

That said, grab the "ref435" module from that package and try it out.

    >>> from ref435 import Enum
    >>> class Enumeration(int, Enum):
    ...     potatoes = 1
    ...     apples = oranges = 2
    ... 
    >>> Enumeration.apples is Enumeration.oranges
    True

You can look for reference in the PEP.

See you next time!

Sunday, 31 March 2013

Django-webtest

When testing my views in a Django application, I always use django-webtest. This is a Django binding for WebTest, a library from Paste, which simply talks to a web application using WSGI and acts like a browser in a very basic manner.

For example, submitting forms works as expected: it sends the data contained in the form's inputs, using the method defined in <form method=""> and the URL defined in <form action="">; click()ing links takes you to the URL in their href attribute; and so on.

These are tests which will actually tell you if your templates are broken, so you are not simply testing your views; you are testing the template HTML as well. If you forget to add that {% csrf_token %} to your form, the test will fail. If one of the designers accidentally removes your "edit profile" link, you will know it.

An obviously missing feature is JavaScript support. So, if your app doesn't work without it, you won't be able to use this to the fullest.

Your view testing code with WebTest will look like this:

    def test_delete_object(self):
        object = construct_model_instance_somehow()
        owner = object.owner

        # go to the object's page, logged in as the owner
        details_page = self.app.get(object.get_absolute_url(), user=owner)

        # click delete link
        delete_page = details_page.click('delete')

        form = delete_page.form

        form.submit().follow()

        self.assertFalse(Object.objects.filter(
            some_data='you know').exists())

Every .click() call is an assertion that the page loads with no errors, and it returns the response, so you can actually chain click calls (even though I don't find that so useful myself). If you want to use BeautifulSoup to check that your HTML response contains the correct links, just install BeautifulSoup (3.x for now) in your environment and access response.html to get a soup. Not to mention .showbrowser(), which fires up your browser to show you the HTML directly: a huge time saver for debugging.

In conclusion, very useful stuff. You should use it.

Monday, 25 March 2013

Pip requirements.txt tip

Good guy requirements.txt

Everybody loves requirements.txt files, right? You can list your dependencies, and then together with pip and virtualenv reproduce your application's environment complete with all dependencies in several machines.

In case you don't know, requirements.txt is a file containing a list of packages in this format:

    Django==1.4.4
    Fabric>=1.0
    OtherPackage==6.0
    -e git://some.git/repo/containing/package

It's very simple to use: just run pip install -r <requirements file>. This makes it very easy to automate a deployment step.

Splitting your requirements

If your application needs to split its requirements across several files for some reason, you can do that! Just add this line to a requirements file:

    -r some/requirements.txt

to include another requirements file.

The requirements listed in the included file will get installed too.

I found out about this trick on this Heroku doc page. It makes a lot of sense, seeing that each line of requirements.txt represents a pip install command. We get -e, so why not -r?

This is very useful for Django, where you need a different package set in production and development.
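For instance, a Django project could keep a shared base file and a development-only file that includes it (pip resolves -r relative to the file containing it; the file names and the extra package here are just an illustration):

```
# requirements/base.txt -- needed in every environment
Django==1.4.4

# requirements/dev.txt -- development extras on top of base
-r base.txt
django-debug-toolbar
```

Then production runs pip install -r requirements/base.txt, while developers run pip install -r requirements/dev.txt.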

Saturday, 23 February 2013

Python discovery: urlparse - parse URLs with ease

I had a problem during testing: I wanted to test the query part of a URL generated by a function. I was getting my regex hat out when I started to wonder whether the Python standard library included something that did what I sought.

A quick google search for "python parse url" did the trick. I found out about the urlparse module, which was clearly made for this purpose.

Example of parsing urls with python using the urlparse module:

    >>> from urlparse import urlparse
    >>> urlparse('http://www.example.com/pth?q=123#frag') 
    ParseResult(scheme='http', netloc='www.example.com', path='/pth', params='', query='q=123', fragment='frag')
    >>> parse_result = _
    >>> parse_result.scheme
    'http'
    >>> parse_result.query
    'q=123'
    >>> parse_result.fragment
    'frag'
    >>>

Lesson learned: There are many more things in the Python standard library than one would dare imagine.
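As a follow-up to the original problem (testing the query part), the same module also offers parse_qs to split a query string into its parameters; in Python 3 both functions live in urllib.parse, which is the spelling used in this sketch:

```python
# parse_qs splits a query string into a dict of parameters;
# this uses the Python 3 location of the module (urllib.parse)
from urllib.parse import urlparse, parse_qs

url = 'http://www.example.com/pth?q=123&tag=a&tag=b#frag'
query = parse_qs(urlparse(url).query)

# each key maps to a list of values, since keys may repeat
assert query == {'q': ['123'], 'tag': ['a', 'b']}
```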

Thursday, 31 January 2013

Python Discovery: Locals() and Globals()

Today I will mess with locals() and globals()

You may have encountered locals() and globals() before.

Here is what they do:

    >>> a = 3
    >>> def b():
    ...     c = 4
    ...     print 'locals', locals()
    ...     print 'globals', globals()
    ...
    >>> b()
    locals {'c': 4}
    globals {'a': 3, 'b': <function b at 0x021984B0>, '__builtins__': <module '__builtin__' (built-in)>, '__package__': None, '__name__': '__main__', '__doc__': None}
    >>>

They return a dictionary containing the local or the global scope.

According to this document, "scopes in Python do not nest!". At first it seemed like the python interpreter kept two nested scopes here, but it actually counts as one local scope, plus one extra scope to scan if the variable is not in the local scope.

It seems like manipulating globals() actually manipulates the names you have in your hands, directly. I liked this a lot, and was dumbfounded when I found out about it.

    >>> def b():
    ...     globals()['can_i_inject_a_name_plz'] = 'yes lol'
    ...     print can_i_inject_a_name_plz
    ...
    >>> b()
    yes lol

Yes, I was dumbfounded. I was fascinated. And then I found this:

    >>> def b():
    ...     locals()['injecting'] = 'yes'
    ...     print injecting
    ...
    >>> b()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in b
    NameError: global name 'injecting' is not defined

So I can't inject a name into the local scope. Well, I still have the global scope to play with. I also found that I can do this in the global scope of the interactive interpreter:

    >>> locals()['a'] = 4
    >>> a
    4

It seems like the Python interpreter still has some secrets of its own.

I decided to try to override variables below the current call stack. Manipulating the variables within other scopes would grant me the ultimate power, and made my brain flesh out hundreds of ideas reeking of bad code practice, and stuff which very few people, if any, had ever used before in python:

    >>> globals()['name'] = 3
    >>> def g():
    ...     name = 6
    ...     print name
    ...
    >>> g()
    6

But no. My ideas crumbled. I can't access or change arbitrary names inside the functions I call. A small disappointment, followed by an "Eureka!" moment:

What if the function asked for it using the global keyword?

    >>> def g():
    ...     global locally_defined
    ...     try:
    ...         locally_defined
    ...     except NameError:
    ...         locally_defined = 'Local value'
    ...     print locally_defined
    ...
    >>> g()
    Local value
    >>>
    >>> globals()['locally_defined'] = 'Crazy value' # monkey patch!
    >>>
    >>> g()
    Crazy value
    >>>
    >>> del globals()['locally_defined'] # unpatch!

Or, even better, with the same effect.

    >>> def f():
    ...     global a
    ...     try:
    ...         a
    ...     except NameError:
    ...         a = 'local_value'
    ...     print a
    ...
    >>>

This is all nice, but what I achieved looks a lot like the intricate and dark techniques of "argument passing" and "default arguments". I really don't want to lurk there.

Today's discovery was a failure. I learned much, but none of it is actually usable in real code. Sometime later, I will learn something better.

Saturday, 15 December 2012

Python interpreter: Underscore

I noticed that I used this trick in a previous post, and I feel like I should post about it since it's so unspoken of, and useful at the same time.
When in a CPython interpreter session (your plain old python session), the value given to "_" is always the last value automatically printed by the interpreter as the result of an expression.
    >>> 1+1
    2
    >>> _
    2
    >>> x = 3+5
    >>> _
    2
    >>> print "another value"
    another value
    >>> _
    2
As you can see, "_" is always the value that the interpreter automatically prints. Assignments are not automatically printed, so you can use variable assignment to keep the shortcut from gaining a new value when you want to preserve it for later.
This can be very useful when you don't want to retype the full expression you used to obtain a certain value just to perform further operations on that value. For example, in a Django shell, performing regex searches, testing generators and lists (list(_)), etc.
Of course you can always use the Up arrow on your keyboard to repeat the last line of code and then edit it, if your shell supports history. But sometimes you really can't, and it's more practical to do it this way.
    >>> [1,2,3] + [4,5,6]
    [1, 2, 3, 4, 5, 6]
    >>> len([1,2,3] + [4,5,6])
    6
    >>> [1,2,3] + [4,5,6]
    [1, 2, 3, 4, 5, 6]
    >>> len(_)
    6
But don't worry about using "_" as a variable name inside the interpreter shell. It will simply override the behaviour described above.
    >>> _=100
    >>> _
    100
    >>> 4
    4
    >>> _
    100
    >>>
Trivia: the "underscore shortcut" doesn't show up in locals() or dir()
    >>> locals()
    {'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, '__package__': None}
    >>> dir()
    ['__builtins__', '__doc__', '__name__', '__package__']
    >>>

Thursday, 13 December 2012

A useful hidden Django template filter

https://docs.djangoproject.com/en/dev/ref/templates/builtins/?from=olddocs#pprint

It seems to be really useful for debugging and for creating those initial quick and dirty templates.

I often want to stay off my template code while programming view code. When I create templates at that stage, I make them minimal: I just want visual confirmation of success, not a full template.
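A minimal quick-and-dirty template using it might look like this (my_view_data is a hypothetical name standing in for whatever the view puts in the context):

```
{# dump the context variable with the pprint filter; my_view_data is a placeholder #}
<pre>{{ my_view_data|pprint }}</pre>
```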

Thursday, 6 December 2012

Wildcards in python (the fnmatch module)

Filtering and matching against wildcard patterns is really easy in python, using the fnmatch module. I found this out while looking in an article by Dan Carrol for a solution to a Django problem of mine.

By using the fnmatch function, one can match strings against a pattern using the operating system's case-sensitivity rules. Use fnmatchcase for guaranteed case-sensitive matching.

    >>> import fnmatch
    >>> fnmatch.fnmatch('example', 'exampl*')
    True
    >>> fnmatch.fnmatch('example', '*e')
    True
    >>> fnmatch.fnmatch('example', '*es')
    False
    >>> fnmatch.fnmatch('examples', '*es')
    True

There's also a filter function, to filter a list against a pattern.

    >>> files = ['file.py', 'file.txt']
    >>> fnmatch.filter(files, '*.py')
    ['file.py']

Besides *, ? and [] are also available, and that's it. It's a very simple, moderately powerful syntax which everybody can use.

    >>> fnmatch.fnmatch('a', '?')
    True
    >>> fnmatch.fnmatch('1', '[13579]')
    True
    >>> fnmatch.fnmatch('4', '[13579]')
    False
    >>>

The wildcard characters cannot be escaped with backslashes; you can only escape them with square brackets. For example:

    >>> fnmatch.fnmatch('Here is a star: *', '*\*')
    False
    >>> fnmatch.fnmatch('Here is a star: *', '* [*]')
    True

At first I thought this way of escaping was impractical, but thanks to it you can use unescaped user input without the user ever getting unexpected results. And if you want to get the underlying regex (for reusing later), you can always use translate:

    >>> as_regex = fnmatch.translate('m[ae]tch th?is!')
    >>> as_regex
    'm[ae]tch\ th.is\!\Z(?ms)'

Because this module uses regular expressions internally and lets you get the actual regular expression, I can use it as a less error-prone re in some scenarios.
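For example (a small sketch), the translated pattern can be compiled and reused like any other regular expression:

```python
import fnmatch
import re

# build a regex from a wildcard pattern once, and reuse it
pattern = re.compile(fnmatch.translate('*.py'))

assert pattern.match('script.py')
assert not pattern.match('script.pyc')
```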

I think this module is great. It gives me a bit less matching power than regular expressions, but in exchange I can empower the user by asking them what and how they want to search. You could arguably do this with regular expressions, but you would end up wasting time and money on documentation and customer support, because regex is error-prone and dangerous.

Check out the docs for more information on this module.

Saturday, 1 December 2012

Phasing subtitles using python

I was trying to watch a film, but the subtitles I had were 1 second behind the actors' lines.
Not content with finding other subtitles on the web, I opened up the python interpreter and loaded the file into lines.
A little code followed

    import datetime
    import re

    # matches lines that start with an hh:mm:ss timestamp
    subtime = re.compile(r'^\d\d:\d\d:\d\d')

    lines = open('subs.srt', 'rb').read().splitlines()

    with open('out.srt', 'wb') as outp:
        for line in lines:
            if subtime.findall(line):
                time = datetime.datetime(1, 1, 1, *map(int, line[:8].split(':')))
                time += datetime.timedelta(seconds=1)
                outp.write('%02d:%02d:%02d%s\n' % (
                    time.hour, time.minute, time.second, line[8:]))
            else:
                outp.write(line + '\n')

It was just a few lines of code, showing off quite well a lot of the capabilities of python I love most. Text processing is always a cinch.

Explaining the code

The format of the subtitles was:
    [blank line]
    ID
    hh:mm:ss,ms: [text]
This explains why I had to check if the regex findall returned a match. The regex was ^\d\d:\d\d:\d\d.
When this regex found a line with a subtitle time written on it, I read, updated and wrote the time. Otherwise, I just copied the line verbatim to the output file.
I simply cut the line using slice syntax: [:8] and [8:] got me the line's first eight characters (the hh:mm:ss timestamp) and everything from there onwards, respectively.
I used those first eight characters, split by the colon (:) character, as arguments to the datetime.datetime constructor, in true functional fashion. I had to map a call to int to turn all these number strings into integers.
To update the seconds correctly, I created an instance of datetime.timedelta with seconds set to 1 (which was my estimate of how off the timing was), and added it to the time I got from the split string.
Having forgotten how to do date formatting, I just used string formatting against time.hour, time.minute and time.second, and joined in the rest ([8:]) of the string in the same operation.
It was quite fun, but my friends eventually grew impatient so in the end no film was watched.

Tuesday, 13 November 2012

Dividing by zero for fun and profit

It's Monday, and I'm back from vacation, feeling like I'm wasting my time and inevitably falling victim to the great cycle of life, money and everything. while (42) { }. Fortunately, I have sweet, sweet sarcasm on my side.

Anyway, this post is supposed to be about dividing by zero.

In python, it's a great way to find out whether a certain code path deep inside your call stack is really getting called, and when. You get to write one line which results in a noisy exception, so your pain and confusion are properly turned into a Traceback.

    class ReturnStatement(Statement):
        def __init__(self, returnee):
            1/0
            super(ReturnStatement, self).__init__(returnee, '<return>')

"Oh. This time the exception never fired. I was sure this was supposed to be executed."

That's the kind of thought you are supposed to get, or something along the lines of:

"There, the exception.. Then why is this @!$# method not working if it's being called?"

Anyway, you get a good troubleshooting test just for typing three characters and a new line. Good bargain!

Of course, in JavaScript it's useless.

Dividing by zero there yields Infinity. That's a discussion for another point in time. Maybe. I may never be inclined to speak of that matter again. Hours and hours of agony because of a number holding a completely unpredictable value. Ugh.

In compiled languages, it's mostly useless, too.

Mondays.

Wednesday, 7 November 2012

AAAAAAAAAA

A AAAA AAAAAA AA AAAA AAAA AAA AAAAAAAAAAAAAA.

    >>> def AAAAAAAAAA(s):
    ...     import re
    ...     return re.sub('\w', 'A', s)
    ...
    >>> AAAAAAAAAA('I want to know what Guido eats for breakfast.')
    'A AAAA AA AAAA AAAA AAAAA AAAA AAA AAAAAAAAA.'
    >>> AAAAAAAAAA("I have found a new way to encode messages. But it's irreversible.")

    "A AAAA AAAAA A AAA AAA AA AAAAAA AAAAAAAA. AAA AA'A AAAAAAAAAAAA."
    >>> AAAAAAAAAA("You have a way with words.")
    'AAA AAAA A AAA AAAA AAAAA.'
    >>>

AAAAA AA AAA AA AAAAAAAAAAAAAA AAAA.

And a thousand pointless points are given to whoever decodes this post.

Monday, 5 November 2012

Practical example of str.split's optional argument. Config file parser

Taking my little fun discovery for a ride. Used it to create a simple config file reader together with itertools.

    >>> lines = iter(open('config.cfg'))
    >>> import itertools
    >>> splitonce = lambda s:s.split('=',1)
    >>> trim = lambda s:s.rstrip('\n\r')
    >>> trimmed_lines_generator = itertools.imap(trim, lines)
    >>> split_lines_generator = itertools.imap(splitonce, trimmed_lines_generator)
    >>> settings = dict(split_lines_generator)
    >>> settings
    {'ConfigSetting3': '  Allow leading+trailing spaces!   ', 'eqsign': '=', 'ConfigItem2': 'Settle for no "=" signs!', 'Con
    figItem1': 'Configured'}
    >>>

This of course wouldn't work if I hadn't applied the optional argument to str.split, which I talked about in my previous post. It helps me split the file lines into keys and values by the equal sign, and not worry that the values might include equal signs of their own.

I am getting ahead of myself. In case you are not familiar with CFG files, here is the file I used in the above example. The settings file config.cfg:

    ConfigItem1=Configured
    ConfigItem2=Settle for no "=" signs!
    eqsign==
    ConfigSetting3=  Allow leading+trailing spaces!

Stripped out of the interpreter into cleaner code:

    from itertools import imap

    with open('config.cfg') as fp:
        lines = iter(fp)
        def splitonce(s):
            return s.split('=', 1) # split limit
        def trim(s):
            return s.rstrip('\n\r')

        trimmed_lines_generator = imap(trim, lines)
        split_lines_generator = imap(splitonce,
            trimmed_lines_generator)

        settings = dict(split_lines_generator)

I have used itertools.imap: chaining iterators together makes the parser use less memory than if I had used list(fp) to get a list of lines up front. This is, of course, more of a concern with larger config files.

Saturday, 3 November 2012

Python's string formatting syntax

As you are surely aware, you can use python's % operator to do string formatting. It's part of the reason why the language is so good at text processing.

The left operand is the format string; the right operand is a single format argument, a tuple of format arguments, or a dictionary of named arguments. In the format string, you can use %s to insert a string, %d to insert an integer, %f for a float, etc. It's much like C's printf family of functions.

    >>> '%s %d' % ('a string', 123)
    'a string 123'

This you almost surely know. You might not know that the format string can take a %r specifier, which calls repr on its argument. Now that's useful!

    >>> 'a repr: %r' % ['', ""]
    "a repr: ['', '']"
    >>> 

Or that the format parameters can be in a dictionary.

    >>> '%(a_string)s %(an_int)d' % {'a_string':'a string', 'an_int': 123}
    'a string 123'
    >>> 

If the parameters were passed as a dictionary, the format string will not raise exceptions for extra parameters. So you can use locals() on your (trusted) format string, and format it easily. In a class method, self.__dict__ would also be useful.

    >>> a_number = 3
    >>> '%(a_number)d' % locals()
    '3'

String formatting specifiers will also take their own parameters. The parameters are decimal digits (and an optional separating dot) between the % sign and the character indicating the type.

Here is an ugly syntax representation. Bear with me.

% [0 or -][minimum width (the character count to pad to)][an optional dot][truncation/precision](type character)

That's all, I think. Be careful to add no spaces. The only characters allowed between % and the type indicator are [0-9\.\-]

  • First, the usual percentage sign.

  • Then, add a single zero if you want to fill the string with zeroes (you specify how many characters you would like to fill the string with in the next argument). If you want the string to have trailing spaces instead of leading spaces, add a minus sign.

  • After that, add the number of characters you want to fill the string with. You can skip this.

  • If you want to use the truncation argument (explained below), add a dot here.

  • Add the truncation argument, which is an integer.

For obvious reasons, you can't have trailing zeroes in anything.

The "Truncation" argument is used to truncate a string to its first N characters, to grow (only grow) an integer's digits by zero-filling on the left to that size, or to set the number of decimal digits in a float.

Here are some examples with %s. %s doesn't let you put leading zeroes on your string. You can use str.zfill() for that.

    >>> s = '1234'
    >>> '%.1s' % s
    '1'
    >>> '%.4s' % s
    '1234'
    >>> '%.6s' % s
    '1234'
    >>> '%10s' % s
    '      1234'
    >>> '%10.2s' % s
    '        12'
    >>> '%10.4s' % s
    '      1234'
    >>> '%10.6s' % s
    '      1234'

%f examples:

    >>> f = 10.123
    >>> '%f' % f
    '10.123000'
    >>> '%.4f' % f
    '10.1230'
    >>> '%4f' % f
    '10.123000'
    >>> '%20f' % f
    '           10.123000'
    >>> '%20.2f' % f
    '               10.12'
    >>> '%020.2f' % f
    '00000000000000010.12'
    >>> 

%d can also be told to have leading zeroes or leading spaces. As mentioned above, the "Truncation" part of the specifier can only make the number grow; it wouldn't make sense to let it shrink, since that would change the number's value.

    >>> i = 10
    >>> '%d' % i
    '10'
    >>> '%.1d' % i
    '10'
    >>> '%.4d' % i
    '0010'
    >>> '%4d' % i
    '  10'
    >>> '%4.1d' % i
    '  10'
    >>> '%4.3d' % i
    ' 010'
    >>> '%4d' % i
    '  10'
    >>> '%04d' % i
    '0010'
    >>> 

As you know, in python, characters are indeed strings with len() of 1, so if you want to represent a character you just use %s. But you can also use %c. %c will raise an exception if the input string is not a character, and the extra assertion might prove a little useful. You can add leading and trailing spaces to this parameter too.

Finally, it's worth noting that python has a newer formatting syntax. I haven't seen it used much, and I don't use it either; it's not as practical as the one I described.
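For completeness, here is a small sketch of that newer syntax (str.format); the width and precision parameters carry over after a colon, mirroring the %-style examples above:

```python
# the newer str.format syntax, shown against the examples above
assert '{0} {1}'.format('a string', 123) == 'a string 123'
# strings align left by default, so > asks for right alignment
assert '{:>10.2s}'.format('1234') == '        12'
assert '{:020.2f}'.format(10.123) == '00000000000000010.12'
```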

More information on string formatting can be found in the python documentation.

This has hopefully been a thorough dissection of python's string formatting syntax. Now go out and enjoy the sun!

Monday, 15 October 2012

Python Discovery: Named Tuples

I had known of the named tuple idiom for ages. It is used sparingly in some python APIs, and I feel it is very pythonic, practical, and simple to read and code with.

It's used like this (this example is for the timetuple in datetime.datetime, which is a namedtuple, or looks a lot like one):

    >>> tt = datetime.datetime.now().timetuple()
    >>> tt
    time.struct_time(tm_year=2012, tm_mon=10, tm_mday=15, tm_hour=20, ...)
    >>> tt.tm_year
    2012
    >>> tt[0]
    2012
    >>> tt[3]
    20
    >>> tt.tm_mon
    10
    >>> 

And it's iterable.

    >>> list(tt)
    [2012, 10, 15, 20, 24, 8, 0, 289, -1]

It's basically a tuple which has attributes accessible through dot notation.

The named tuple idiom is perfect for API design, and for those times when a class is what you want, but a tuple would be okay too. For example, 2D dimensions: you could store them in a class with height and width attrs, or in a tuple.

I thought it was just some pythonic pattern, but today I needed to automate it because I found myself making dumb __init__ methods like these:

    def __init__(self, left, right):
        self.left = left
        self.right = right

It is a useful pattern. It's pythonic. It's implementable in pure python with just a little metaprogramming. It could be in the standard library, I thought.

I opened up my python interpreter and entered import collections and dir(collections) looking for NamedTuple or something. namedtuple was there. It wasn't just a coincidence.

collections.namedtuple is a factory for creating namedtuple types, which you can then use. The resulting classes include methods and attributes like _asdict, _fields, index and _replace.

    >>> a = collections.namedtuple('a', 'b, c')
    >>> a(1,2)._asdict()
    OrderedDict([('b', 1), ('c', 2)])
    >>> a(1,2)._fields
    ('b', 'c')
    >>> a(1,2).index(1)
    0
    >>> a(1,2).index(2)
    1
    >>> a(1,2)._replace(b=3)
    a(b=3, c=2)
    >>> 

It is hashable, unlike a dict, and pickleable.

Here is how you create one:

    >>> collections.namedtuple('SomeNamedTuple', 't1 t2')

The first argument is the name of the output class, and the second argument is the list of parameter names, whitespace- and/or comma-delimited.

This creates a named tuple with the __name__ SomeNamedTuple, and the attributes t1 and t2.

Extend it

When you need more stuff out of your namedtuple, you should really subclass it. Keep in mind that it is immutable, so every method that "changes" it should return a new copy instead.

    class Dimensions(collections.namedtuple('DimensionsTuple', 'height width')):
        pass

    >>> Dimensions(10, 30).height
    10
    >>> Dimensions(10, 30).width
    30
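For instance, a hypothetical scaled method would build and return a fresh Dimensions rather than mutate the tuple (a sketch; the method name is mine):

```python
import collections

class Dimensions(collections.namedtuple('DimensionsTuple', 'height width')):
    def scaled(self, factor):
        # the tuple is immutable, so return a new instance
        return Dimensions(self.height * factor, self.width * factor)

d = Dimensions(10, 30)
assert d.scaled(2) == Dimensions(20, 60)
assert d == Dimensions(10, 30)  # the original is untouched
```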

It is rather useful sometimes, so don't forget about the named tuple! You might need it sooner or later.

Sunday, 23 September 2012

Python: "or" and "and" operators yield more stuff than bool

Python's and and or operators don't yield bool. They don't calculate the result of the expressions into booleans, and most certainly do not give us True or False.

They return one of the objects we put in. So they are not only useful in if statements, but they also can't convert values to booleans by themselves (without bool(), that is).

You might be familiar with using or like this since it's a common alternative to ... if ... else .... Take this __init__ method for example:

    def __init__(self, form=None):
        self.form = form or self.make_form()

It is straightforward, readable, and short. Pythonic indeed.

As long as the left operand is true, you don't have to worry even if the right operand is an expression which would throw an error: the right expression will not be evaluated. This is nothing new. We have seen it in if statements in the C language, and I have used it countless times in Java to avoid NullPointerException. It wouldn't make any sense for a machine or virtual machine to evaluate both expressions if the first one already decides the result.

    >>> def get_sth():
    ...     raise NotImplementedError
    ... 
    >>> [1] or get_sth()
    [1]
    >>> [] or get_sth()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in get_sth
    NotImplementedError


A very good use of this is to tentatively get a resource in the manner most likely to succeed, and then try a riskier or less desirable fallback if the first manner yields nothing.

This is awesome for creating intuitively usable classes. For example, a make_form implementation could be a single-line raise NotImplementedError; client code that doesn't want to pass in a form object every time just inherits the class and provides its own make_form method. It's very practical, intuitive and informative behavior for a single expression, isn't it?
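Put together, the pattern could look like this (a sketch; the class names are illustrative):

```python
class FormHandler(object):
    def __init__(self, form=None):
        # fall back to make_form() only when no form is passed in
        self.form = form or self.make_form()

    def make_form(self):
        raise NotImplementedError

class DefaultFormHandler(FormHandler):
    def make_form(self):
        return {'fields': []}  # stand-in for a real form object

# passing a form works on either class; defaults need the subclass
assert FormHandler(form='given').form == 'given'
assert DefaultFormHandler().form == {'fields': []}
```

Note that because of the or, a falsy form argument also triggers make_form, which is usually what you want with the default of None.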

Here is the or expression behavior, described:
  • When one of the operands evaluates to True, it returns the one which evaluates to True.
    • >>> 1 or 0
      1
      >>> 0 or 1
      1
  • When both evaluate to True, it returns the left one (python won't even look at the second operand, so you can rest assured no methods will be called on it).
    • >>> 1 or [1]
      1
      >>> [1] or 1
      [1]
  • When both evaluate to False, it returns the right one.
    • >>> 0 or []
      []
      >>> [] or 0
      0
We can use the last characteristic, too. Sometimes two values are falsy, but we want to use one over the other. (The standard falsy values are the empty list, the empty tuple, 0, False, None, and the empty string.)

The and operator is rather interesting. It gives you:
  • The right operand, when both evaluate to True:
    • >>> True and 1
      1
      >>> 1 and True
      True 
  • The operand which evaluates to False, when one of them is falsy:
    • >>> 19 and 0
      0
      >>> 0 and 19
      0
  • The left operand, when both evaluate to False:
    • >>> 0 and False
      0
      >>> False and 0
      False
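
A few common idioms fall straight out of these rules (a sketch of my own, with made-up values):

```python
# 1. "or" as a default-value operator: the first truthy operand wins.
name = '' or 'anonymous'           # empty string is falsy -> 'anonymous'

# 2. Both operands falsy: "or" yields the right one, so you can
#    normalize "no value" into whichever falsy type you prefer.
tags = None or []                  # always end up with a list

# 3. "and" as a guard: the right side only evaluates when the left is truthy.
d = {'key': 'value'}
value = 'key' in d and d['key']    # no KeyError possible here
```

Idiom 3 is the Python cousin of the Java null-check pattern mentioned at the start of this post.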

Thursday, 20 September 2012

Interesting-c

I'm creating a new programming language!

This language will be created through transcompilation to C. Inspired by the concept of CoffeeScript, and including a lot of Python philosophies and ideas, I will try to create a language which is really easy to use, and yet as fast as C.

One of the most important features will be reflection and metaprogramming facilities, which help developers by describing the structure of their programs at runtime. This makes for good ORMs, serializers, etc.

It will not use classes for encapsulation, but instead a new concept I have created, which builds upon the capabilities of classes and the concept of "is-a" relationships. More on that soon.

interesting-c aims to be able to interpret 99% of existing C programs as interesting-c, so a C program will (almost) always be an interesting-c program. This allows developers to convert to interesting-c gradually.

interesting-c will have a module system. Modules will be able to import other modules as well as pure C source or header files. The syntax will be something like this:

    import "c_module.c";
    import icmodule;

When importing, interesting-c will just add a #include directive to the compiled source, but it will simulate the concept and behavior of a namespace: it will parse the #included file and look for identifiers which the current module can use. Optionally, users will be able to write import module_name as alias to assign the module to an identifier, so its namespace is isolated from the local module namespace. This poses a new problem: C has no concept of a namespace and will not allow the programmer to choose between foo in module1.c and foo in module2.c. It's unclear how I will solve this.

interesting-c will discourage (but still support) preprocessor directives. But there will be a lot more space for safer, more interesting pre-compile-time magic as well as runtime magic. And yet, you will still be able to shoot yourself in the foot and use macros as much as you want.

Early work

I have started creating interesting-c, and it really gives me new interesting problems to solve.

I am using PLY (python-lex-yacc)'s lex to create the lexer. My challenge right now is to preserve as much whitespace as possible, as well as most of the code's appearance, since it's important for users to be able to opt in and out of interesting-c at any time; that will make it easy to adopt.

I have been creating small C programs to test C language features. Programs like these help me test whether and how the language supports something.

    /* Test whether floating point numbers can be octal */

    int main(int argc, char* argv[]){
        float oct = 01.1f;
        return 0;
    }

Monday, 10 September 2012

Python: str.split has a "limit" argument

I have recently made this little discovery. It seems that str.split has a maxsplit argument, which tells it to split into at most a certain number of parts. This could be really useful for text parsing.
I have in the past run into some (rare) situations where I needed to do this, but didn't know of the maxsplit parameter, and ended up using str.join and slices to recreate the rest of the string with the delimiters.

It's a little tedious to do, and it is ugly.
>>> url = '/posts/blog-1/10251/'
>>>
>>> #problem: split the URL into two parts
... #such that first_part == 'posts' and second_part == 'blog-1/10251'
... #first solution: split and join with slices.
...
>>> first_part = url.strip('/').split('/')[0]
>>> second_part = '/'.join(url.strip('/').split('/')[1:])
>>> first_part, second_part
('posts', 'blog-1/10251')
However, if we do this using the maxsplit argument, it becomes much more readable.
>>> #second solution: use unpacking, and str.split() with the maxsplit argument
...
>>> first_part, second_part = url.strip('/').split('/',1)
>>> first_part, second_part
('posts', 'blog-1/10251')
>>>
The maxsplit argument specifies how many splits you would like, not how many fragments you would like: pass n-1 when you want n fragments.

What about splitting by whitespace?

Splitting by whitespace is one of the most powerful features of str.split(). Since I usually invoke it as "".split(), with no arguments at all, I was worried about combining whitespace splitting with the maxsplit argument (which is positional-only), but you can pass None as the separator: "".split(None, 1).
This is nice, since the exact whitespace that used to be there would be impossible to recover with the join-and-slice tactic above (whitespace is not just a single delimiter character).
>>> 'OneFragment TwoFragments ThreeFragments'.split()
['OneFragment', 'TwoFragments', 'ThreeFragments']
>>> 'OneFragment TwoFragments ThreeFragments'.split(maxsplit=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: split() takes no keyword arguments
>>> 'OneFragment TwoFragments ThreeFragments'.split(None, 1)
['OneFragment', 'TwoFragments ThreeFragments']

Split by whitespace, and preserve it.

When you split by whitespace, str.split splits on spaces, tabs, carriage returns and newlines. There are many whitespace characters, and sometimes you want to preserve that information. When splitting and joining back, there is no way to recover it; it's gone. The maxsplit argument, however, preserves the existing whitespace in the unsplit remainder.
>>> 'And together we fled.\r\nWe said:\r\n\t"Hello!"'.split(None, 1)
['And', 'together we fled.\r\nWe said:\r\n\t"Hello!"']
>>> print _[1]
together we fled.
We said:
        "Hello!"

Friday, 7 September 2012

Python: Unpacking works with one item!

One of the advantages of python is that it is very practical to unpack variables from short lists and tuples. Pieces of code which would otherwise be repetitive and ugly (a = lst[0]; b = lst[1]) end up clean, short and easy to read.

>>> a,b = [1,2]
>>> a
1
>>> b
2
>>> 
It's what makes python's multiple return values work so well. When we create code that is going to be used by other modules, it opens up a lot of possibilities, and eases both the writing and the understanding of the client code that uses them.
>>> #create_monkey will use the monkey count in its calculations.
... #This value is very useful for client code, but it is rather expensive to obtain.
...
>>> monkey, new_monkey_count = create_monkey()
>>>

My discovery today is that you can "unpack" a list or tuple even when it contains only one item. Check it out:

ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 12:22:14) [MSC v.1500 64 bit (AMD64)] on win
32
Type "help", "copyright", "credits" or "license" for more information.
>>> a, = [1]
>>> a
1
>>> a, = []
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 0 values to unpack
>>> 

This opens up some possibilities. If we are one hundred percent sure that the list contains a single value, we can unpack it into a variable instead of using [0], which is inherently ugly.
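
As a bonus, single-item unpacking also enforces the "exactly one" assumption for you: if the list turns out empty or longer, you get a ValueError right there instead of silently wrong data. A small made-up example:

```python
# Find the one admin in a list of (name, role) pairs.  The unpacking
# both extracts the value and asserts there is exactly one match.
users = [('ann', 'admin'), ('bob', 'user'), ('cid', 'user')]
(admin_name,) = [name for name, role in users if role == 'admin']
```

Had there been two admins (or none), the assignment itself would have blown up with a ValueError.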

When you are not sure whether the container is empty (or has more than one item), you can always catch the resulting ValueError. Or use this:

>>> (a,) = [] or ['default']
>>> a
'default'
>>> 
However, the comma is very subtle, and future code readers (including the smartass who decided to use this obscure thang) will probably not notice it. This effectively makes refactoring the affected code tricky.

Furthermore, someone may think that it is a syntax error, edit the comma out, and TypeErrors and ValueErrors start popping up everywhere. Subtle bug hunting fun!

There is a little workaround for these readability issues, which is to use full tuple syntax:
>>> (a,) = [1]
>>> a
1
>>>
So it seems like I can use this single-unpacking to do good and not just evil.
It's really interesting, but I am not sure whether I should use it in real code. Seems like I have some meditating to do.
In the meantime, I can show off a little.
>>> a, = 1,
>>> a
1
>>>  

Saturday, 25 August 2012

django-model-permissions

I have created a small library to help create access control in Django projects. It uses a "lock" paradigm, where you can lock certain methods of Django Models (right now it actually works on any class, but I may add features that need tighter integration).

The purpose is to take access control to the Model layer, and out of the Controller (view) layer.

Here's how you use it (taken from the readme):

First, apply locks to some model methods:
class Example(models.Model, LockingMixin):
    class Meta:
        permissions = [
            ('example_can_get_id', 'Can get ID')
        ]

    name = models.CharField(max_length=33)

    @locks.permission('app.example_can_get_id')
    def get_id(self):
        return self.id
Then, instantiate it and use it. When you try to access a locked method, modelpermissions will raise an exception.
model = Example.objects.create(name='name')

try:
    model.get_id()
    assert False #remove this
except PermissionDenied:
    'django.core.exceptions.PermissionDenied was thrown'

But now let's unlock this model instance and try again

model.unlock(user)

model.get_id()
'no exception thrown'
 
Locks aren't limited to permission locks. You can use @locks.conditional and pass it a function that receives the model instance and do more flexible stuff with it.
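
To make the conditional-lock idea concrete, here is a minimal, library-independent sketch of the concept in plain python (the real modelpermissions decorator may differ in name and signature):

```python
class PermissionDenied(Exception):
    pass

def conditional(predicate):
    """Lock a method behind a predicate that receives the instance."""
    def decorator(method):
        def wrapper(self, *args, **kwargs):
            if not predicate(self):
                raise PermissionDenied()
            return method(self, *args, **kwargs)
        return wrapper
    return decorator

class Account(object):
    def __init__(self, owner):
        self.owner = owner

    @conditional(lambda instance: instance.owner is not None)
    def close(self):
        return 'closed'
```

An ownerless Account raises PermissionDenied on close(); any owned one goes through. The predicate can inspect whatever instance state it likes, which is what makes conditional locks more flexible than plain permission strings.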

And the future looks nice, too. Instead of just raising PermissionDenied, I feel this could do more interesting things: calling an alternate method when the user doesn't have enough permissions, returning a different Proxy model from the lock() method (which would then return the model instance itself, for chainability), or a different LockingMixin that locks things only when you explicitly call lock() on the Model.

It's available on github. Try it out!

Tuesday, 14 August 2012

Django Test Coverage

django-test-coverage (https://github.com/srosro/django-test-coverage) is an exciting project that leverages coverage.py in Django projects.
It is most helpful in figuring out whether your project is comprehensively covered by your automated tests. When you run ./manage.py test, django-test-coverage outputs coverage information. It's a little annoying that the coverage information prints out even when tests fail, making you scroll your command line session a bit more, but this should be fixable.
I was looking for a way to check test coverage for a Django application, and noticed that the project didn't have its own setup.py, nor was it ready for Django 1.4, because of a python packaging issue. I forked the project, created a proper python package and moved the files in, then created a setup.py.
Not a big change, but it allowed me to test my application with coverage on Django 1.4, as well as install django-test-coverage much more cleanly, using pip.
My own fork is here: https://github.com/fabiosantoscode/django-test-coverage. You can report any bugs, suggest stuff to be added, fork, pull request...
Small plugins like these make Django an even greater and more useful framework. It's fascinating how a few lines of code, mostly wrapping external functionality, can make such a big difference.