Calling external commands in awk

Today I had a file in which some lines contained a base-10 number. I wanted to convert this number to base 36, keeping the rest of the line formatted as in the original file if possible.
The entry was something along the lines of:

    thisField='49173058277224'

I decided to resort to awk, calling an external command to do the conversion.

Turns out, there are a couple of ways you can tell awk to fire off an external command. One is using system. The other is writing the command as a double-quoted string and piping it, for instance into getline. There’s a nice forum discussion around this topic, and for once it’s not on StackOverflow.

Here’s the first draft of the solution.
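awk '{
    if (/thisField=.([0-9]+)./) {
        # The three-argument match() is a gawk extension; m[0] holds the digits.
        match($1, "([0-9]+)", m);
        cmd = "python -c \"import numpy as np; print(np.base_repr(" m[0] ", 36))\"";
        cmd | getline converted;
        # Close the pipe, or open pipes pile up over a long file.
        close(cmd);
        sub("([0-9]+)", converted);
    }
    print;
}' orig.dat > converted.dat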

The trick is to pipe the output of the command into getline, which stores it in a variable (converted) for later use.
As a side note, this rough solution is extremely slow: a Python interpreter is started, and numpy imported, for every line that matches. Other solutions for this particular problem exist, such as using bc after setting a proper obase or awk only.
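
As a rough illustration of doing the whole job in a single process (this is neither the bc nor the awk-only variant mentioned above), here is a minimal Python sketch that needs no numpy. The names to_base36 and convert_line are made up for this example, and the pattern assumes the thisField='...' layout shown earlier.

```python
import re

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(number):
    """Return the base-36 representation of a non-negative integer."""
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


# Assumed layout: thisField=<one quote character><decimal digits>
FIELD = re.compile(r"(thisField=.)([0-9]+)")


def convert_line(line):
    """Rewrite the decimal number of a matching line in base 36."""
    return FIELD.sub(lambda m: m.group(1) + to_base36(int(m.group(2))), line)


if __name__ == "__main__":
    print(convert_line("thisField='49173058277224'"))
```

Lines that do not match the pattern are returned unchanged, so the function can be mapped over every line of the file.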


Git: apply a lost stash

Today’s scenario: stash save something in order to get the latest source code, stash pop to re-apply the last changes, do something silly and lose all in-flight changes.
Fear not, there’s a way to recover the “lost” stash, assuming you still have the output of the stash pop.
In fact, after the pop operation there should be a line that goes like this:

Dropped refs/stash@{0} (f2acd7f56e93236bbb813a9dab3bba18e124da04)

Armed with the SHA, just cherry-pick it (a stash is stored as a merge commit, hence the -m 1):

git cherry-pick f2acd7f56e93236bbb813a9dab3bba18e124da04 -m 1

Watch out: this will commit your change as it is. You may want to git reset --soft HEAD^1 and git commit --amend accordingly!

Unpickling GitPython datetimes

I’ve been playing around with GitPython recently, in an effort to analyse the relation between commits and software quality.

One by-product of this analysis was a Pandas Series of the number of commits on a given day. Since this turned out to be a time-consuming operation (as I needed to repoint HEAD back in time for each day I was interested in), I opted to pickle the Series. Imagine the horror when, the day after I had run the script, I discovered that unpickling the data raised an exception.

In [4]: commits = pd.read_pickle('commits.pkl')
...
TypeError: __init__() takes at least 2 arguments (1 given)

That error comes from pickle_compat.py, part of the Pandas library.
However, no mention is made of which class actually raised it.

Entering %debug and going up and down the stack didn’t reveal much either, so I decided to go closer to the actual unpickling operation, using cPickle.

In [10]: commits = cPickle.load(open('commits.pkl'))
...
TypeError: ('__init__() takes at least 2 arguments (1 given)', <class 'git.objects.util.tzoffset'>, ())

Still an error, but a more meaningful one. Let’s see what a brief inspection of tzoffset shows.

In [11]: import git.objects.util

In [12]: git.objects.util.tzoffset?
Init signature: git.objects.util.tzoffset(self, secs_west_of_utc, name=None)
Docstring:
File: /opt/bats/lib/python2.7/site-packages/git/objects/util.py
Type: type

So __init__ expects a secs_west_of_utc positional argument (no default).

To still be able to unpickle your data without running the script again, you just need to replace that class with a slightly modified one that supplies the missing argument. Thank partial application (functools.partial) for that.

In [20]: git.objects.util.tzoffset = partial(git.objects.util.tzoffset, secs_west_of_utc=0)
In [21]: commits = pickle.load(open('commits.pkl'))
In [22]:

Job done – thank you functools!
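
To make the mechanism easier to see without GitPython installed, here is a small self-contained sketch in Python 3. TzOffset is a hypothetical stand-in for git.objects.util.tzoffset, and load_with_default_offset is a helper invented for this example; the only thing it borrows from the session above is the idea of swapping the class for a partial while unpickling.

```python
import datetime
import pickle
import sys
from functools import partial


class TzOffset(datetime.tzinfo):
    """Hypothetical stand-in: __init__ needs an argument pickle never passes."""

    def __init__(self, secs_west_of_utc, name=None):
        self._offset = datetime.timedelta(seconds=-secs_west_of_utc)
        self._name = name or "fixed"

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        return datetime.timedelta(0)


# Pickling works, but the payload records "call TzOffset()" with no arguments.
payload = pickle.dumps(TzOffset(3600, "west"))


def load_with_default_offset(data):
    """Unpickle while TzOffset is temporarily replaced by a partial."""
    module = sys.modules[TzOffset.__module__]
    original = module.TzOffset
    module.TzOffset = partial(original, secs_west_of_utc=0)
    try:
        return pickle.loads(data)
    finally:
        module.TzOffset = original  # undo the patch
```

In this stand-in, calling pickle.loads(payload) directly raises a TypeError, while load_with_default_offset(payload) succeeds. The placeholder offset of 0 is then overwritten by the attributes stored in the pickle, so the restored object reports the original offset.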

Using Magic Cookies to run programs remotely as root

Unbelievable how many times I fell for this – and am still falling.

The situation is as follows: you are on a remote box, over SSH with X forwarding enabled. You can run any graphical program (say, wireshark) as that user, but as soon as you try prepending sudo you get: (wireshark:8881): Gtk-WARNING **: cannot open display: .

If you’ve been following me for long enough, you know I’ve been bitten already by a similar problem in the past. The only (minor) difference is that this time I don’t even have a DISPLAY variable set (as root).

So here’s another fix, this time using magic cookies.

Step 1, as normal user type echo $(xauth list ${DISPLAY#localhost}). You’ll get something like this back: machine/unix:25 MIT-MAGIC-COOKIE-1 41f6c7f04a706ca5e490b3edf8a26491

Step 2, as root, run xauth add followed by the line you got as output on the shell, that is: xauth add machine/unix:25 MIT-MAGIC-COOKIE-1 41f6c7f04a706ca5e490b3edf8a26491.

Exit the root shell, confidently type sudo DISPLAY="localhost:25.0" wireshark and enjoy!

Removing latex commands using Python “re” module

Recently I had to sanitize lines in a .tex file where a \textcolor command had been used.
The command was used in the following way: {\textcolor{some_color}{text to color}}.

The main problem was that the command could appear any number of times in a line, so I couldn’t apply the replacement a fixed number of times.
Also, since any color could have been used, a simple “blind replace” was clearly not the right weapon in this case.

I therefore resorted to applying a regex repeatedly until the line was cleaned of any \textcolor command.

In a nutshell:

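import re

def discolor(line):
    # Raw string: in a normal string "\t" would be read as a tab character.
    # DOTALL lets the last group keep a trailing newline.
    regex = re.compile(r'(.*?)\{\\textcolor\{.*?\}(\{.*?\})\}(.*)', re.DOTALL)
    while True:
        try:
            line = ''.join(regex.search(line).groups())
        except AttributeError:
            # search() returned None: no \textcolor command is left.
            return line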

The key part here is that we match not only the text inside the \textcolor command, but also what comes before and after it (the (.*?) and (.*) groups). We keep joining the three pieces back together until no \textcolor command is left: at that point the search returns None, so accessing .groups() raises an AttributeError, which we catch and use as a sentinel to know when to return.
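
For comparison, here is a hedged sketch of a different technique: a single re.sub pass instead of the search loop. The name discolor_all is invented for this example, and the pattern assumes that neither the color nor the colored text contains nested braces. Like the loop above, it keeps the braces around the text.

```python
import re

# Matches {\textcolor{<color>}{<text>}} where neither part contains braces.
TEXTCOLOR = re.compile(r'\{\\textcolor\{[^{}]*\}\{([^{}]*)\}\}')


def discolor_all(line):
    """Replace every textcolor command in the line with its {text} part."""
    return TEXTCOLOR.sub(r'{\1}', line)


sample = r'a {\textcolor{red}{b}} c {\textcolor{blue}{d}} e'
print(discolor_all(sample))  # prints: a {b} c {d} e
```

Since sub replaces all non-overlapping matches, no loop or sentinel is needed, at the price of not handling nested braces.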

Append an item to an OrderedDict

Update 2017/12/07. Since it seems this post is pretty popular… Beware: the following snippet abuses private fields and, more generally, relies on the internal details of another data structure. I don’t recommend using this approach in any code you rely on. OK for learning and investigating OrderedDict’s internals, not OK for prod. Use at your own risk.

I needed a way to append an item to an OrderedDict without creating a new object (too expensive), and I stumbled upon this answer on StackOverflow.

The answer gives a solution to the inverse problem (that is, prepending an item), but was good enough to be modified for my situation, without me needing to delve too much into the details of the OrderedDict data structure (it’s basically a linked list, under the hood).

Enough said, here it is for future reference:

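from collections import OrderedDict

class MyOrderedDict(OrderedDict):
    # Relies on the pure-Python OrderedDict of Python 2.7, where each link
    # is a [prev, next, key] list and root is the sentinel of a circular list.
    def append(self, key, value):
        root = self._OrderedDict__root
        last = root[0]

        if key in self:
            raise KeyError(key)
        else:
            # Splice the new link between the current last link and root.
            root[0] = last[1] = self._OrderedDict__map[key] = [last, root, key]
            dict.__setitem__(self, key, value)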
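
For code that has to survive an interpreter upgrade, a sketch that stays on the public API may be preferable. This is not the approach of the snippet above: it relies only on the documented fact that assigning a new key to an OrderedDict places it at the end. The class name AppendOnlyOrderedDict is made up for this example.

```python
from collections import OrderedDict


class AppendOnlyOrderedDict(OrderedDict):
    """Hypothetical subclass exposing append() through the public API only."""

    def append(self, key, value):
        if key in self:
            raise KeyError(key)
        self[key] = value  # a new key always lands at the end


d = AppendOnlyOrderedDict([("a", 1), ("b", 2)])
d.append("c", 3)
print(list(d.items()))  # prints: [('a', 1), ('b', 2), ('c', 3)]
```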

Graceful Harakiri

In the past few days I’ve been trying to overcome a problem we saw on our CI environment: tests being abruptly cut off if they hang.

If a build takes too long, our CI tool stops it by sending it a SIGTERM signal. That’s fine in general, but if it’s a test (run by the nosetests driver) that’s taking too long to finish, a SIGTERM would cause it to stop immediately, without leaving any trace in the output of where it hung.

What I coded was a plugin, Graceful Harakiri, that intercepts the SIGTERM signal, converts it to an interrupt and, in the process, prints out the current frame, giving away some information about where the test got stuck.

The code is on GitHub – have a look at the description and use it. Feedback is most welcome.
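
The snippet below is not the plugin’s code, just a minimal sketch of the underlying idea using the standard signal and traceback modules on a POSIX system. The function names sigterm_to_interrupt and install_handler are assumptions made for this example.

```python
import os
import signal
import sys
import traceback


def sigterm_to_interrupt(signum, frame):
    """Print where the program currently is, then raise KeyboardInterrupt."""
    sys.stderr.write("SIGTERM received by pid %d, current stack:\n" % os.getpid())
    traceback.print_stack(frame, file=sys.stderr)
    raise KeyboardInterrupt("converted from SIGTERM")


def install_handler():
    """Route SIGTERM through sigterm_to_interrupt (call from the main thread)."""
    signal.signal(signal.SIGTERM, sigterm_to_interrupt)
```

Once install_handler() has run, a SIGTERM no longer kills the process silently: the stack of the interrupted code goes to stderr and the usual KeyboardInterrupt handling takes over.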

Running Coded-UI automated tests from the command line

Short and concise post, more like a memory aid for myself, about running Coded-UI tests without the need of the Visual Studio GUI. Disclaimer: I’m talking about VS 2012 (though this may apply to VS 2010 too).

According to MSDN,

MSTest is used to run load test in Visual Studio 2012. By default, unit tests and coded UI tests use VSTest.Console.exe in Visual Studio 2012. However, MSTest is used for compatibility with test projects created using Visual Studio 2010, or if a .testsettings file is manually added to a Visual Studio 2012 solution containing a unit test project, or coded UI test project.

Indeed you can run Coded-UI tests from the VS 2012 command prompt, by simply issuing VSTest.Console.exe NameOfYourTestSuite.dll. However, you could do the same by issuing MSTest.exe NameOfYourTestSuite.dll without the need of a .testsettings file.

Beware that if you try running your tests using MSTest from within a VS 2010 prompt, you’ll most likely end up having the command whinging like this: Unable to load the test container 'NameOfYourTestSuite.dll' or one of its dependencies. Error details: System.IO.FileNotFoundException: Could not load file or assembly 'Microsoft.VisualStudio.TestTools.UITesting, Version=11.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.

That said, if you want to be able to run similar tests on a machine that does not have VS 2012 installed, you just need to download the Agents for Visual Studio 2012.

Silly policies for the greater good might not be good at all

Documentation overload? – © by Frank Jepsen

There’s a policy at work: technical support cannot hand anything that comes from clients to testers or developers. Anything includes databases, configuration files, logins and passwords, and so forth.

While the aim is undoubtedly right, to protect clients’ privacy, the whole matter is totally pointless, not to say harmful.

Pointless because anyone, upon joining, signs a contract that explicitly says that disclosing any sensitive information is strictly forbidden. Adding an extra level of enforcement is far from being useful.

Harmful because it goes against testers and devs, by hindering their jobs. Ultimately, they are those who solve the problem for the client.

Share your data within your organisation!