Append an item to an OrderedDict

Update 2017/12/07. Since it seems this post is pretty popular… Beware: the following snippet abuses accessing private fields and, more in general, relies on the internal details of another data structure. I don’t recommend using this approach in any code you rely on. OK for learning and investigating OrderedDict‘s internals, not OK for prod. Use at your own risk.

I needed a way to append an item to an OrderedDict, without creating a new object (too consuming) and I stumbled upon this answer on StackOverflow.

The answer gives a solution to the inverse problem (that is, prepending an item), but was good enough to be modified for my situation, without me needing to delve too much into the details of the OrderedDict data structure (it’s basically a linked list, under the hood).

Enough said, here it is for future reference:

class MyOrderedDict(OrderedDict):
    def append(self, key, value):
        root = self._OrderedDict__root
        last = root[0]

        if key in self:
            raise KeyError
            root[0] = last[1] = self._OrderedDict__map[key] = [last, root, key]
            dict.__setitem__(self, key, value)

A web-app to give you your next bus arrival time

404 bus. Actually found!

You go to the bus stop and, by Murphy’s Law, either a) the bus has just gone or b) you’ll wait an endless amount of time for the bus to arrive.
Or at least this is what I experience most of the time.

That’s why I coded a small web-app that uses the TFL live stream data and HTML5 Geolocalization functionality to retrieve a list of the bus stops within 300 m from your location and the list of busses that should arrive from that moment on. And if it can’t find your location, you’re given the possibility to input a postcode, which, thanks to the data at OrdnanceSurvey and some good math, is converted into canonical coordinates and used to present with a similar result.

It was a very nice excuse for me to learn more about Flask (amazingly fast to use), HTML, CSS (first time I use Bootstrap!) and JS, and web-apps in general.

The solution is far from complete and several could be the improvements (handle errors with nice landing pages, allow the user to input a postcode regardless if geolocalization worked or not, add link to maps, …), but as it was more of an experiment, I’m happy with the result so far. It’s hosted at OpenShiftGive it a spin and check out the code.

Fear and loathing when compiling the kernel

Make menuconfig.

Carefully sift through the myriad of kernel options and select only the ones you need, dreaming of how fast your machine will be with the kernel you’re crafting.

Wait for your family to be asleep and spend the rest of the night on the interwebs to find out whether CONFIG_PPS_CLIENT_GPIO should be compiled as a module or built-in, if you should enable CONFIG_CAN as an excuse to buy that dongle you’ve been longing for a while and questioning every “if unsure, say N” – because you might well be unsure, but you certainly don’t want to miss out on that cool feature.

Be pleased with yourself when you have what you think is the perfect kernel for your box.

Save the config, but don’t use the default .config name: give it a different name for no particular reason, like kconfig, and exit.

make menuconfig again, load .config while you want to perform your last few tweaks. Be surprised when you find out CONFIG_HW_RANDOM_AMD, CONFIG_HW_RANDOM_INTEL and CONFIG_HW_RANDOM_VIA are all enabled, though you were sure all architecture-dependent options were already taken care of. Disable the two that don’t apply it and save as .config. Save it also as kconfig, just in case. Realise you’ve just overwritten the result of a dozen nights of lost sleep.

Count to ten, trying not to curse. Curse yourself when you reach 23. Exhale, slowly. Close the lid and get some sleep.

GOTO step 1 the day after.

Be even more pleased with yourself when you have another new shiny .config – also saved with different names to 8 other places, including a USB stick that you buried in the socks drawer and the SD card of your camera. Love the moment when you see you could shave off another handful of options and allow yourself to grin all day long when you go and come back to work. Remember you haven’t shaved since the first make menuconfig when a kid offers you a place on the bus, despite the fact you’re just 32.

make-kpkg your kernel and dpkg -i it. Smile and reboot.

Expect to select your kernel in GRUB and be a bit disappointed that you entirely missed the boot menu. Realise you’re still smiling anyway.

At the splash screen, wonder for a split second why neither keyboard nor touchpad work. Realise you can’t log in into the box and stop smiling suddenly.

Detach yourself and say to no one in particular “alright, I’ll just ssh in from the other box.” Try to remember whether sshd has started.

Start to realise a few things.

Realise in horror it’s not even installed.

Realise that you can’t reboot using the the older kernel, since the boot menu is hidden (you didn’t miss it: it really wasn’t there).

Realise you’re stuck. You’ve sort of locked yourself out your laptop. Realise this is the series of dumbest rookie errors you’ve done in ages.

Realise that, if there were an award for the stupidest thing, you’d win it hands down.

Give up on ssh and think about logging in using the on-screen keyboard. You still need a way to input characters.

Pray that you’ve enabled the support for a USB mouse. It’s kind of basic, but after what happened you’re doubting yourself about every single thing.

Locate in your head where you last saw a mouse in the house and run like there’s no tomorrow praying it’s still there. Find it and almost shed a tear of joy when you carry your mouse back to the laptop with both hands, like you just found the Graal.

Select each character of your 20-char password and be glad you’re alone and you won’t have to change your password(s) after these shenanigans.

Try to fire up the on-screen keyboard now that you’re logged in. Be lost wondering why this doesn’t work. Breathe slowly and resort to the char map. Admittedly not the best, decidedly better than nothing. Take three minutes to write sudo su and the password, char by char. As you can only find letters but no numbers or punctuation charaters in this stupid char map, open another shell and paste ll. Now you can copy paste numbers and some punctuation.

Can’t find the dash but you can write Devanagari in your shell

Open another shell and copy paste h i s t o r y. OK, now you’ve even got a selection of the most common commands. Realize you’re on the road to victory. Or at least, pretend so.

Now copy paste apt-get install sshd. Error. apt-cache search for it. Obviously it’s not sshd. It’s openssh-server. Install it. Error. Can’t find the repo. Move your head towards the top right corner and despair when you understand that wireless support is broken.

You will not be able to ssh in here. Ever.

Panic a bit and try a dpkg -P linux-image-4.1.6. Read a kind notice warning you you’re about to do a(nother) stupid(TM) thing. Wonder how you can select either ‘OK’ or ‘Cancel’ in this dialog, since you can’t paste control characters. Maybe this is a good thing after all.

Decide to edit the GRUB menu. Last time you checked, you needed to edit menu.lst. Learn that things have moved on and you now have to operate on /etc/defaults/grub. Open nautilus as root, thank the gods when you see that gedit is installed and regret all the times you said “I don’t need anything else outside vi“.

See that GRUB_HIDDEN_TIMEOUT_QUIET is set to true and GRUB_TIMEOUT is set to 0. Wonder why people think it’s always best when you hide things from sight. Change to true, 10. update-grub and reboot.

See the GRUB menu in all its uselessness, displaying a countdown from 10 with no other option.

You’re still on that crippled 4.1.6. Repeat all the weird stuff you just did, but at 10x speed. You’re now a semi-pro of on-screen keyboards, char maps and char copy/pasting.

Open /boot/grub/grub.cfg. Pause a second to thank for gedit again. See that the most-recent non-custom kernel is the third entry in the menu. So, without thinking twice set GRUB_DEFAULT from 0 to 2 in /etc/default/grub. Update-grub and reboot.

Accept that you messed up big time, as the system now enters the BIOS without even showing the useless countdown. You’re now on the path to desperation. Change boot settings and get yourself into a busybox environment. Feel like you’ve just seen the Matrix. At least now the keyboard works. <TAB> <TAB>. There are maybe 30 commands. vi is not one of them.

Think about mounting your partition, you’ll think how to modify grub.cfg once you get there.
Try to remember the layout of your HD. Screw it, just do mount -t ext4 /dev/sdaN /tmp/hd for N in 1 .. 6. Cry of joy at N=4. Try /tmp/hd/usr/bin/vi.tiny /tmp/hd/boot/grub/grub.cfg and be relieved watching vi compose the text of the file. Manually change every pointer to vmlinuz-4.1.6 to the last working kernel. Save. Reboot.

Back to normality.

Grep for ‘KEYBOARD‘ and ‘MOUSE‘ just for fun you kconfig (and all other 8 copies) and let your jaw drop when you see that neither CONFIG_KEYBOARD_ATKBD nor CONFIG_MOUSE_PS2 were selected.

GOTO beginning, again. Make menuconfig and go for another spin.

But not tonight.

Decoding RM4SCC for fun

I recently got curious about the bar code I could sometimes found on letters directed at me. I noticed there are just 4 symbols, begins and ends always in the same way, and is the same on all letters, regardless of the sender.

Armed with this basic information, after a bit of research I found out that the code is called Royal Mail 4-State Customer Code. Even more curious, I decided to write a simple decoder for it and, all of a sudden, all the knowledge in signal processing and telecommunications system retuned vivid in my mind, years after I took those classes (which I very much enjoyed, I must admit). Here is how I did it.

TL;DR: I put the code for the rm4sccdec (RM4SCC decoder) on GitHub. Use it at your own risk, as it’s not production-ready and needs some tweaking to reliably scan all types of image. I’ve used Python with OpenCV and numpy.

Step one: image pre-processing

The code does not include any information in the colour, which means we can simply get rid of the colour information and transform the image to greyscale.

Next, we want to maximise the “distance” between the information (the bars) and the noise (the background): this is usually done by thresholding the image. Using a global value for thresholding does not always give good results, especially when different areas of the image are characterised by different illumination. Some more advanced techniques, such as Otsu thresholding method (which I used in my decoder), are a better fit.

Finally, it is possible to have some residual noise, due to the thresholding process, whereby some white pixels are present in black areas and vice-versa. This is called salt’n’pepper noise and can effectively be filtered with median filters, which substitute the value of a pixel with the median of those around. The great advantage is that it preserves the edges of the image.

Step two: feature selection, extraction, and classification

Now we can start thinking about the features defining our symbols. We know we have 4 symbols, which we can call ascenderdescenderlong (Full Height, in the image), and short (Tracker, in the image).

The 4 symbols used in RM4SCC, from Wikipedia The 4 symbols used in RM4SCC, from Wikipedia

The first obvious feature we can select is the vertical position of each bar. After all, that’s the information we need to decode the codeword. However, if we choose the 4 points determining each bar, we’d probably end up complicating the decoding process too much.

An easy way out is to choose the centroid position (just its y-coordinate would be necessary) for each bar. Notice that, though, the long and short bars will share the same feature. If we go along this path, we need another feature to distinguish (at least) the long bar from the short bar. The second obvious feature is therefore the size or, more accurately, the area. This feature will allow us to distinguish easily long from short, but it will be pretty useless for the ascender and the descender.

For the extraction, we need to segment the image and find all the bars, and compute the so-called moments for each of them. The first three moments will be enough for us to get all the features we are interested in.

As a side note, as the segmentation function I have used does not return all segments in order, I had to extract the x-coordinate for each bar so as to be able and sort the vector of symbols.

If the code scanned is reasonably horizontal, we should be able to classify all four symbols pretty easily. For this bit I resorted to K-means clustering, although other classification methods can be used with similar results.

Step three: the actual decoding

If we don’t consider the starting and ending symbols, all symbols inbetween are grouped 4 by 4. For this reason we first need to build a dictionary that maps all valid combinations of 4-symbols group to the correct letter or number.

Finally, a bit of fun when computing the checksum. I translated the algorithm explained here, with the only difference that I wanted to avoid using yet another table to compute the final letter/number so instead I implemented the rule behind it (which boils down to ensuring ‘bit parity’).

Step four: enjoy it!

And possibly fork, improve and re-release :)

Misconfigured Python pretty printers in GDB

I noticed that running an executable in gdb displayed an error – and only the first time you run the program. The error reads:

(gdb) r
Starting program: /home/chris/workinprogress/cpp/book/a.out 
Traceback (most recent call last):
  File "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/", line 63, in 
    from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named 'libstdcxx'
[Inferior 1 (process 7215) exited normally]

The problem is, the libstdcxx directory is not in the path.
The very simple fix is to add the directory to the Python path in gdbinit.

$ cat ~/.gdbinit
import sys
sys.path.insert(0, '/usr/share/gcc-4.8/python')
from libstdcxx.v6.printers import register_libstdcxx_printers
register_libstdcxx_printers (None)

The journey from a crash logged in /var/log/messages to the offending line in the code

Recently I had to investigate a process crash. Usually, I grab the generated coredump and feed it to gdb. Usually, this works. Unless someone has already deleted the core file by mistake.

Fear not, there’s a solution. And here I am to document it before I forget it and spend another half an hour on Google.
Look in /var/log/messages the line that references the crash:

> sudo cat /var/log/messages | grep binary_that_crashed
Feb 30 17:20:22 zeus kernel: [9232272.493642] binary_that_crashed[12868] trap divide error ip:7f11de8d4095 sp:7f11daf7d990 error:0 in[7f11de876000+88000]

Then work out the location in the code by subtracting the address where the lib was loaded from the address of the instruction pointer:

> python -c "print '0x%x' % (0x7f11de8d4095-0x7f11de876000)"

Finally, get the offending line with either addr2line:

> addr2line 0x5e095 -fe /usr/lib/

where the mangled name of the method, _ZN3XYZ14ThisMethodName4plusEv can be demangled using c++filt:

> c++filt _ZN3XYZ14ThisMethodName4plusEv

or objdump:

> objdump -DCgl | grep -B1 5e095
   5e095:  48 f7 f1             div    %rcx

And while we’re more or less on topic, here’s a not-too-random selection of pointers to the stack (ha!):

Asserting that an exception has not been raised

A quick way to test that some method does not raise an exception is to try and raise that exception. Sounds logic – and it is – but I keep forgetting it. So here it is for my future me.

Suppose you have a method that raises an exception if it can’t open a file.

def read_first_line(self, filename):
        with open(filename, 'r') as f:
            self.first_line = f.readline()
    except IOError:
        print("No such file {0}".format(filename), file=sys.stderr)

Testing that it works correctly when a file does exist is quite simple at this point:

def test_no_exception_is_raised_if_file_exists(self):
    # suppose your instance is in self.sut
    except IOError:"IOError raised but should not have")

Graceful Harakiri

In the past few days I’ve been trying to overcome a problem we saw on our CI environment: tests being abruptly cut if they hang.

If a build takes too long, our CI tool stops it by sending it a SIGTERM signal. That’s fine in general, but if it’s a test (run by the nosetests driver) that’s taking too long to finish, a SIGTERM would cause it to immediately stop, without leaving any trace on the output where it hanged.

What I coded was a plugin, Graceful Harakiri, that intercepts the SIGTERM signal, converts it to an interrupt and, in the process, prints out the current frame, giving away some information about where the test got stuck.

The code is on GitHub – have a look at the description and use it. Feedback is most welcome.