Publishing nosetests results in Jenkins CI

While trying to publish test results in Jenkins CI via the xUnit plugin, I set up the Post-build Actions section in Jenkins the way the nosetests documentation suggests, that is, I ticked the box named “Publish JUnit test result report” and referenced the correct file.

However, I was repeatedly stumbling upon an error:

WARNING: The file '/var/lib/jenkins/workspace/Project/nosetests.xml' is an invalid file.
WARNING: At line 1 of file:/var/lib/jenkins/workspace/Project/nosetests.xml:cvc-complex-type.3.2.2: Attribute 'skip' is not allowed to appear in element 'testsuite'.
ERROR: The result file '/var/lib/jenkins/workspace/Project/nosetests.xml' for the metric 'JUnit' is not valid. The result file has been skipped.

Turns out, my nosetests.xml file does indeed contain the skip attribute. And that's also in line with the example in the official documentation.
Much to my surprise, though, running a few web searches made me realize there aren't many other users of Jenkins + xUnit plugin + nosetests who have the same problem.

To fix this, it looked like I had to write my own XSL file. All you need to do is create two templates, one matching the "skip" attribute, and the other matching everything else.

The "skip"-matching template will simply get rid of the attribute altogether. The other will pass everything else as it is.

Taking a cue from this StackOverflow answer, the result is as follows:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" omit-xml-declaration="no" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@skip" />
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

Save it as nosetests.xsl. Now, armed with your shiny XSL file, put it in the $JENKINS_HOME/userContent directory, say in userContent/xunit_xsl/nosetests.xsl.

In the job config page, under the Post-Build Actions section, change the type from JUnit to Custom, and provide a reference to the XSL file you've created, in the form of $JENKINS_HOME/userContent/xunit_xsl/nosetests.xsl.

Save, run, and happily enjoy your test results!
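
If you want to check what the stylesheet is meant to do without going through Jenkins, here is a rough Python equivalent. It is only a sketch that uses the standard library's ElementTree instead of XSLT, and the function name and the sample XML are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Made-up sample, shaped like a nosetests report with the offending attribute
SAMPLE = (
    '<testsuite name="nosetests" tests="2" errors="0" failures="0" skip="1">'
    '<testcase classname="pkg.TestThing" name="test_one" time="0.01" />'
    '</testsuite>'
)


def strip_skip_attributes(xml_text):
    """Return xml_text with every 'skip' attribute removed, the rest untouched."""
    root = ET.fromstring(xml_text)
    for element in root.iter():  # iter() includes the root element itself
        element.attrib.pop("skip", None)
    return ET.tostring(root, encoding="unicode")


cleaned = strip_skip_attributes(SAMPLE)
```

Like the stylesheet, it drops "skip" wherever it appears and copies everything else as it is.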


Navigate the stack backward

Took me a bit to figure it out correctly, so here it is: a recursive generator to navigate the stack backward (upward, if you want)

def getUpperFrames(frame):
    yield frame
    # f_back always exists on a frame; it is None on the outermost one
    if frame.f_back is None:
        return
    for upper in getUpperFrames(frame.f_back):
        yield upper

And if you’re on Python 3 – which you should be – then you get the bonus of using yield from instead of for ... yield
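
For reference, a minimal sketch of the yield from variant. The snake_case names are mine, and it stops when f_back is None, which is what the outermost frame reports:

```python
import inspect


def get_upper_frames(frame):
    """Yield frame, then its caller's frame, and so on up to the outermost one."""
    yield frame
    if frame.f_back is not None:
        yield from get_upper_frames(frame.f_back)


def calling_function_names():
    """Names of the functions on the stack, innermost first."""
    return [f.f_code.co_name for f in get_upper_frames(inspect.currentframe())]
```

Calling calling_function_names() from anywhere gives a list that starts with 'calling_function_names' itself and walks up from there.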

Removing latex commands using Python “re” module

Recently I had to sanitize lines in a .tex file where a \textcolor command had been used.
The command was being used the following way: {\textcolor{some_color}{text to color}}.

The main problem was that the command could have appeared any number of times in a line, so I couldn’t just apply a replacement a fixed number of times.
Also, given any color could have been used, a simple “blind replace” was clearly not a good weapon in this case.

I therefore resorted to applying a regex repeatedly until the line was cleaned of any \textcolor command.

In a nutshell:

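import re

def discolor(line):
    # Raw string: otherwise the '\t' in '\textcolor' is read as a tab.
    # Group 2 keeps the braces around the text, which LaTeX treats as a plain group.
    regex = re.compile(r'(.*?)\{\\textcolor\{.*?\}(\{.*?\})\}(.*)')
    while True:
        try:
            line = ''.join(regex.search(line).groups())
        except AttributeError:
            return line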

The key part here is that we match not only the text inside the \textcolor command, but also what comes before and after (the (.*?) and (.*) blocks). We keep joining the three pieces back together until no \textcolor command is left: when that happens, re.search returns None, so accessing .groups() raises an AttributeError, which we catch and use as a sentinel to know when to return.
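
For comparison, the same cleanup can probably be done in one pass with re.sub, which replaces every non-overlapping match by itself. This is a sketch of a different technique, not the loop above, and it assumes neither the color nor the colored text contains nested braces; the names are mine:

```python
import re

# Matches {\textcolor{color}{text}} and captures the {text} group
TEXTCOLOR = re.compile(r'\{\\textcolor\{[^{}]*\}(\{[^{}]*\})\}')


def discolor_sub(line):
    """Replace every {\\textcolor{color}{text}} with {text}."""
    return TEXTCOLOR.sub(r'\1', line)


example = discolor_sub(r'a {\textcolor{red}{hello}} b {\textcolor{blue}{world}} c')
```

Lines without any \textcolor command come back unchanged.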

sort -k

TIL, when you call sort -k3, you’re not just sorting by the third field, but by everything from the third field up to the end of the line.
Not only that: in the case of ties, by default it falls back to comparing the whole line, so in practice the first field decides.

Consider this example.

$ cat data
theta AAA 2
gamma AAA 2
alpha BBB 2
alpha AAA 3

Sorting with -k2 gives:

$ sort data -k2 --debug
sort: using simple byte comparison
gamma AAA 2
     ______
___________
theta AAA 2
     ______
___________
alpha AAA 3
     ______
___________
alpha BBB 2 
     _______
____________

Notice I’ve also added --debug, to show which parts are used in the comparisons.
So, first comes “AAA 2”, then “AAA 3”.
Also, for the two lines that have “AAA 2”, the first field is used, so “gamma” comes before “theta”.

Forget about the ties for now.
To consider field 2 only, rather than field 2 and all following fields, you need to specify a stop. This is done by adding “,2” to the -k switch. More generally, -km,n means “sort by field m up to n, boundaries included”.

$ sort data -k2,2 --debug
sort: using simple byte comparison
alpha AAA 3
     ____
___________
gamma AAA 2
     ____
___________
theta AAA 2
     ____
___________
alpha BBB 2 
     ____
____________

As you can see, only field 2 is taken into account at first.
The “AAA 3” line now comes before the “AAA 2” lines because, field 2 being a tie, the first field is used as a second comparison.

Taking this a step further, to actually only consider field 2 and resort to the original order in case of ties, that is, to have a stable sort, you need to pass the -s switch.

$ sort data -k2,2 -s --debug
sort: using simple byte comparison
theta AAA 2
     ____
gamma AAA 2
     ____
alpha AAA 3
     ____
alpha BBB 2 
     ____

This looks similar to the first snippet, but actually the first two lines in the output are swapped. Here they appear in the original order.
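
If it helps to think about it in Python terms, the three invocations can be mimicked with sorted() and different keys. This is only an analogy under simplifying assumptions (fields split on whitespace, plain ASCII data); the function names are mine:

```python
DATA = ["theta AAA 2", "gamma AAA 2", "alpha BBB 2", "alpha AAA 3"]


def like_sort_k2(lines):
    # -k2: key is field 2 up to the end of the line; ties fall back to the whole line
    return sorted(lines, key=lambda line: (line.split()[1:], line))


def like_sort_k2_2(lines):
    # -k2,2: key is field 2 only; ties fall back to the whole line
    return sorted(lines, key=lambda line: (line.split()[1], line))


def like_sort_k2_2_stable(lines):
    # -k2,2 -s: key is field 2 only; sorted() is stable, so ties keep input order
    return sorted(lines, key=lambda line: line.split()[1])
```

Python's sort is always stable, so the fallback to the whole line has to be spelled out in the key, which is the opposite of sort's default.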

Timezones and DST in Python

It’s incredible how fiddly it is to work with timezones.

Today, 14th of June—and this is important—I was trying to convert a made-up datetime from “Europe/London” to UTC.

I instinctively tried out this:
>>> from datetime import datetime
>>> import pytz
>>> almostMidnight = datetime.now().replace(hour=23, minute=59, second=59, microsecond=999999, tzinfo=pytz.timezone('Europe/London'))
>>> almostMidnight
datetime.datetime(2017, 6, 14, 23, 59, 59, 999999, tzinfo=<DstTzInfo 'Europe/London' GMT0:00:00 STD>)

At this point you will notice it didn’t take into account the DST offset (it should read BST).

As a further confirmation, converting to UTC keeps the same time:
>>> pytz.UTC.normalize(almostMidnight)
datetime.datetime(2017, 6, 14, 23, 59, 59, 999999, tzinfo=<UTC>)

Notice this result would be fine during the winter, so depending on how much attention you pay and when you write the code, you might miss this bug – which is why I love having the same suite of tests always running on a system whose clock is set right after the upcoming DST change.

Even more subtly, if you were to try and convert to a different timezone, a geographical timezone that observes DST, you would see this:
>>> almostMidnight.astimezone(pytz.timezone('Europe/Rome'))
datetime.datetime(2017, 6, 15, 1, 59, 59, 999999, tzinfo=<DstTzInfo 'Europe/Rome' CEST+2:00:00 DST>)

Interesting. Now DST is accounted for. So converting to geographical timezones might also mask the problem.

Long story short, the correct way *I believe* to convert a datetime object to UTC is to create a naive datetime object (no timezone info attached) representing local time, call the “localize” method of the timezone of interest on it, and only then convert. In code:
>>> almostMidnight = datetime.now().replace(hour=23, minute=59, second=59, microsecond=999999)
>>> almostMidnight
datetime.datetime(2017, 6, 14, 23, 59, 59, 999999)
>>> pytz.timezone('Europe/London').localize(almostMidnight).astimezone(pytz.UTC)
datetime.datetime(2017, 6, 14, 22, 59, 59, 999999, tzinfo=<UTC>)

There’s a very nice read on timezones by Armin Ronacher, which I recommend.

Asserting that an exception has not been raised

A quick way to test that some method does not raise an exception is to call it inside a try block and fail the test if that exception gets caught. Sounds logical – and it is – but I keep forgetting it. So here it is for my future me.

Suppose you have a method that could run into an IOError if it can’t open a file, and handles it by printing a message.

import sys

def read_first_line(self, filename):
    try:
        with open(filename, 'r') as f:
            self.first_line = f.readline()
    except IOError:
        print("No such file {0}".format(filename), file=sys.stderr)

Testing that it works correctly when a file does exist is quite simple at this point:

def test_no_exception_is_raised_if_file_exists(self):
    # suppose your instance is in self.sut
    try:
        self.sut.read_first_line('file_that_exists.txt')
    except IOError:
        self.fail("IOError raised but should not have")
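
Put together, a self-contained version could look like the sketch below. The Reader class and the temporary file are assumptions of mine, added only to make the example runnable:

```python
import os
import sys
import tempfile
import unittest


class Reader(object):
    """Hypothetical class hosting read_first_line, for illustration only."""
    first_line = None

    def read_first_line(self, filename):
        try:
            with open(filename, 'r') as f:
                self.first_line = f.readline()
        except IOError:
            sys.stderr.write("No such file {0}\n".format(filename))


class ReaderTest(unittest.TestCase):
    def setUp(self):
        self.sut = Reader()
        # Create a real file so the "file exists" case is actually exercised
        handle, self.path = tempfile.mkstemp()
        with os.fdopen(handle, 'w') as f:
            f.write("hello\nworld\n")

    def tearDown(self):
        os.remove(self.path)

    def test_no_exception_is_raised_if_file_exists(self):
        try:
            self.sut.read_first_line(self.path)
        except IOError:
            self.fail("IOError raised but should not have")
        self.assertEqual(self.sut.first_line, "hello\n")
```

Any test runner that understands unittest (nosetests included) should pick it up.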

Graceful Harakiri

In the past few days I’ve been trying to overcome a problem we saw on our CI environment: tests being abruptly cut if they hang.

If a build takes too long, our CI tool stops it by sending it a SIGTERM signal. That’s fine in general, but if it’s a test (run by the nosetests driver) that’s taking too long to finish, a SIGTERM would cause it to stop immediately, without leaving any trace in the output of where it hung.

What I coded was a plugin, Graceful Harakiri, that intercepts the SIGTERM signal, converts it to an interrupt and, in the process, prints out the current frame, giving away some information about where the test got stuck.

The code is on GitHub – have a look at the description and use it. Feedback is most welcome.
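
The core idea can be sketched with the standard library alone. To be clear, this is not the plugin's code, just a minimal illustration with names of my own, and it leaves out all the nose plugin wiring:

```python
import signal
import sys
import traceback


def sigterm_to_interrupt(signum, frame):
    """Print where execution currently is, then behave like Ctrl-C."""
    sys.stderr.write("SIGTERM received, current stack:\n")
    traceback.print_stack(frame, file=sys.stderr)
    raise KeyboardInterrupt("converted from SIGTERM")


def install_handler():
    """Register the handler for SIGTERM; has to be called from the main thread."""
    return signal.signal(signal.SIGTERM, sigterm_to_interrupt)
```

After install_handler() has been called, a SIGTERM sent to the process shows up as a KeyboardInterrupt at whatever line was running, with the stack printed to stderr first.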

Pdb cheat sheet

I often need a table with all the commands for the Python debugger (Pdb) but so far I couldn’t find a comprehensive one. So I wrote one. Enjoy!

h(elp) or ?             print available commands
h(elp) command          print help about command
q(uit)                  quit debugger
! stmt or exec stmt     execute the one-line statement stmt
p(rint) expr            print the value of expression
pp expr                 pretty-print the value of expression
whatis obj              print the type of obj
bt or w(here)           print stack trace from oldest frame to newest
l(ist)                  list 11 lines of source code around the current line
l m, n                  list from line m to line n
args                    print the arguments of the current function
alias [name [command [parameter parameter ...] ]]  create an alias called name that executes command
unalias name            remove alias
<ENTER>                 repeat the last command entered
n(ext)                  execute the current statement (step over)
s(tep)                  step into function
j(ump) n                set line n as the next line to be executed
r(eturn)                continue execution until the current function returns
c(ontinue)              continue execution until a breakpoint is encountered
unt(il)                 continue execution until a line greater than the current one is reached or the current frame returns
run or restart          (re-)run the program
u(p)                    move one level up in the stack trace
d(own)                  move one level down in the stack trace
b(reak)                 show breakpoints
tb(reak) [filename:]n   set a temporary breakpoint at line n of file filename
b [filename:]n          set a breakpoint at line n of file filename
b fun                   set a breakpoint at function fun
disable bn1 [bn2, ...]  disable breakpoint(s) bn1, bn2, …
enable bn1 [bn2, ...]   enable breakpoint(s) bn1, bn2, …
clear [bn]              clear breakpoint number bn; if no number is specified, clear them all
commands [bn]           specify commands to run at breakpoint bn
condition bn expr       expression expr must evaluate to true in order for breakpoint bn to be hit. If no expression is specified, the breakpoint is made unconditional

String-building performance – a small experiment

I wanted to create a collection of all ASCII letters (both upper- and lowercase) and single digits in Python.

The first (IMO most Pythonic) way that I came up with uses the standard string concatenation operator:

import string
a = string.ascii_letters + string.digits

Then I wondered, would it be faster to use the usual methods to format strings?
Say, a = "%s%s" % (string.ascii_letters, string.digits) or a = "{0}{1}".format(string.ascii_letters, string.digits)?

Only timeit.timeit can tell.

>>> from timeit import timeit
>>> timeit("a = string.ascii_letters + string.digits", setup="import string", number=10000000)
1.4662139415740967
>>> timeit("a = \"%s%s\" % (string.ascii_letters, string.digits)", setup="import string", number=10000000)
2.8996360301971436
>>> timeit("a = \"{0}{1}\".format(string.ascii_letters, string.digits)", setup="import string", number=10000000)
12.925632953643799

Interesting, innit. Now, a word of caution: these results are limited to this particular example. Results may vary (and probably do) if the “domain of the experiment” changes – that is, if you use different strings in terms of number, length, characters, etc.
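
If you want to repeat the experiment on your own machine, something along these lines should do. It is just a sketch: the dictionary, the function name and the smaller default number are my choices, and the timings you get will differ from the ones above:

```python
from timeit import timeit

SETUP = "import string"
CANDIDATES = {
    "concatenation": 'a = string.ascii_letters + string.digits',
    "percent": 'a = "%s%s" % (string.ascii_letters, string.digits)',
    "format": 'a = "{0}{1}".format(string.ascii_letters, string.digits)',
}


def run_benchmark(number=100000):
    """Return a dict mapping each candidate name to its total time in seconds."""
    return {
        name: timeit(stmt, setup=SETUP, number=number)
        for name, stmt in CANDIDATES.items()
    }


if __name__ == "__main__":
    for name, seconds in sorted(run_benchmark().items(), key=lambda kv: kv[1]):
        print("{0:15s} {1:.4f}s".format(name, seconds))
```

Swapping in other strings or other statements is just a matter of editing CANDIDATES.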

Nosetests and the docstrings

For some reason, when you use the -v switch in nosetests, instead of using the test name, the runner uses the docstring (and only its first line, for that matter).

Solution: install the “nose-ignore-docstring” plugin:
sudo easy_install nose-ignore-docstring
and then enable it by adding:
[nosetests]
with-ignore-docstrings=1

to ~/.noserc

For additional information, I’ll point you to the plugin web page on PyPI.