Saturday, 1 December 2012

Phasing subtitles using Python

I was trying to watch a film, but the subtitles I had were 1 second behind the actors' lines.
Rather than look for other subtitles on the web, I opened up the Python interpreter and loaded the file as a list of lines.
A little code followed:

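    import datetime
    import re

    # Matches lines that start with a time such as 00:01:59
    subtime = re.compile(r'^\d\d:\d\d:\d\d')

    with open('subs.srt') as inp:
        lines = inp.read().splitlines()

    with open('out.srt', 'w') as outp:
        for line in lines:
            if subtime.findall(line):
                # Parse hh:mm:ss, shift it by one second, write it back
                time = datetime.datetime(1, 1, 1, *map(int, line[:8].split(':')))
                time += datetime.timedelta(seconds=1)
                outp.write('%02d:%02d:%02d%s\n' % (
                    time.hour, time.minute, time.second, line[8:]))
            else:
                # splitlines() dropped the newline, so put it back
                outp.write(line + '\n')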

It was just a few lines of code, but it shows off many of the Python capabilities I love most. Text processing is always a cinch.

Explaining the code

The format of the subtitles was:
    [blank line]
    ID
    hh:mm:ss,ms: [text]
This explains why I had to check if the regex findall returned a match. The regex was ^\d\d:\d\d:\d\d.
When the regex matched a line with a subtitle time on it, I read the time, updated it and wrote it back. Otherwise, I just copied the line verbatim to the output file.
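As a small sketch of that check, here is the regex run against three sample lines. The sample lines are made up for illustration and only follow the format described above.

```python
import re

# Same pattern as in the post: a line that starts with hh:mm:ss
subtime = re.compile(r'^\d\d:\d\d:\d\d')

# Hypothetical sample lines: a blank line, an ID, and a timed line
samples = ['', '12', '00:01:59,500: Hello there']

for line in samples:
    # findall returns a list; an empty list is falsy, so it can be
    # used directly as the condition of an if statement
    print(repr(line), subtime.findall(line))
```

Only the last sample produces a non-empty list, so only that line would take the time-shifting branch.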
I simply cut the line using slice syntax. [:8] and [8:] got me the first eight characters of the line (the hh:mm:ss part) and everything from the ninth character onwards, respectively.
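To make the two slices concrete, here they are applied to a made-up subtitle line (the text of the line is an assumption, not taken from the real file):

```python
line = '00:01:59,500: Hello there'  # hypothetical subtitle line

head = line[:8]   # characters at indices 0..7, i.e. hh:mm:ss
tail = line[8:]   # everything from index 8 onwards

print(head)  # 00:01:59
print(tail)  # ,500: Hello there
```

Because the two slices share the same boundary, head + tail always gives back the original line.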
I used the first eight characters of the line, split by the colon : character, as arguments to the datetime.datetime constructor, in true functional fashion. I had to map a call to int to turn all these number strings into integers.
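Broken into steps, the constructor call looks roughly like this. The stamp value is a made-up example, and the year, month and day of 1 are just placeholders since only the time of day matters.

```python
import datetime

stamp = '00:01:59'  # hypothetical value of line[:8]

# split gives ['00', '01', '59']; map(int, ...) turns those into numbers
parts = list(map(int, stamp.split(':')))

# The * unpacks the list into the hour, minute and second arguments
time = datetime.datetime(1, 1, 1, *parts)

print(parts)
print(time.hour, time.minute, time.second)
```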
To update the seconds correctly, I had to create an instance of datetime.timedelta with seconds set to 1 (which was my estimate of how far off the time was), and add it to the time I got from the split string.
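The reason for going through timedelta rather than adding 1 to the seconds by hand is that it carries over into minutes and hours. A short sketch with a made-up time:

```python
import datetime

time = datetime.datetime(1, 1, 1, 0, 1, 59)   # 00:01:59 on a dummy date

later = time + datetime.timedelta(seconds=1)
print(later.hour, later.minute, later.second)       # 0 2 0

# A negative value shifts the other way
earlier = time + datetime.timedelta(seconds=-1)
print(earlier.hour, earlier.minute, earlier.second)  # 0 1 58
```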
Having forgotten how to do date formatting, I just used string formatting against time.hour, time.minute and time.second, and joined in the rest [8:] of the string in the same operation.
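For comparison, the date formatting route would probably be strftime. The sketch below puts it next to the hand-rolled version. The values are made up, and calling .time() first is a choice made here to keep the dummy year 1 out of the formatting.

```python
import datetime

stamp = datetime.datetime(1, 1, 1, 0, 2, 0)   # hypothetical shifted time
rest = ',500: Hello there'                    # hypothetical line[8:]

# String formatting against the individual fields, as in the post
by_hand = '%02d:%02d:%02d%s' % (stamp.hour, stamp.minute, stamp.second, rest)

# The same result using strftime on the time-of-day part
with_strftime = stamp.time().strftime('%H:%M:%S') + rest

print(by_hand)
print(with_strftime)
```

Both print the same line, so either approach would do for this job.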
It was quite fun, but my friends eventually grew impatient so in the end no film was watched.
