dark patterns

Wikipedia defines a dark pattern as “a user interface that has been carefully crafted to trick users into doing things”.

Then, “carefully crafting fake notifications about customers renting cars” clearly falls into the same category.

This is what does. Once you select a city and some dates, you’ll get the list of cars available, and something else: a set of notifications that spawn with a random interval from one another implying other customers are renting cars in the same city you’re looking for a car to rent.

Does this put some pressure on you, hastening you into booking a car before they all disappear? Yes.
Should you trust it? Actually, no.

If you inspect the source code, this is what you’ll see:
Have a look yourself and you’ll find that, as the page loads, a function is set to display some “pre-cooked” notifications in the top right corner, each lasting 4.8 seconds and popping up at random intervals from one another.

Reload the same page a few times and you’ll see those 4 notifications pop up time and again.

Surely that can’t be legal, can it?


Create a valid ISIN

For a suite of tests I recently wrote, I had to create valid ISINs. The official page doesn’t give away much on how the final checksum digit should be computed and the examples on Wikipedia are, in my opinion, not particularly clear.

While I was trying to understand more, I stumbled upon a related unanswered question on Stack Overflow, which asked for essentially what I was trying to do, so I took the time to answer it.

Here I report a slightly modified version of the code snippet linked above, to create the checksum digit for the first 11 characters of an ISIN.

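import string

def digit_sum(n):
    # Sum of the two decimal digits of n (n is at most 18 here).
    return (n // 10) + (n % 10)

# Map '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
alphabet = {letter: value for (value, letter) in
            enumerate(string.digits + string.ascii_uppercase)}

def isinChecksumDigit(isin):
    # Expand every character into its decimal digits, e.g. 'K' -> '20'.
    isin_to_digits = ''.join(str(alphabet[v]) for v in isin)
    isin_sum = 0
    # Walk from the right, doubling every other digit, starting with the rightmost.
    for (i, c) in enumerate(reversed(isin_to_digits), 1):
        if i % 2 == 1:
            isin_sum += digit_sum(2*int(c))
        else:
            isin_sum += int(c)

    # Smallest digit that brings the total up to a multiple of 10.
    checksum_digit = -isin_sum % 10
    return checksum_digit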

Assuming countries is a list of valid country codes, i.e. ISO 3166-1 alpha-2 codes as required by ISO 6166, you can call isinChecksumDigit as follows:

In [1]: isin = '{:2s}{:09d}'.format(random.choice(countries), random.randint(1, 10**9 - 1))

In [2]: isin
Out[2]: 'KR681111517'

In [3]: validIsin = isin + str(isinChecksumDigit(isin))

In [4]: validIsin
Out[4]: 'KR6811115171'
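The reverse operation, checking an ISIN you already have, follows from the same computation. What follows is a minimal self-contained sketch rather than the code from the Stack Overflow answer: the names isin_check_digit and is_valid_isin are made up for this example, and the check covers only the length, the character set and the check digit, not whether the country code actually exists.

```python
import string

# Map '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
ALPHABET = {ch: value for value, ch in
            enumerate(string.digits + string.ascii_uppercase)}


def isin_check_digit(body):
    """Compute the check digit for the first 11 characters of an ISIN."""
    digits = ''.join(str(ALPHABET[ch]) for ch in body)
    total = 0
    # From the right: double every other digit, starting with the rightmost.
    for position, char in enumerate(reversed(digits), 1):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            total += value // 10 + value % 10
        else:
            total += value
    return -total % 10


def is_valid_isin(isin):
    """Hypothetical helper: True if the 12th character matches the check digit."""
    if len(isin) != 12 or any(ch not in ALPHABET for ch in isin):
        return False
    if not isin[-1].isdigit():
        return False
    return isin_check_digit(isin[:11]) == int(isin[-1])


print(is_valid_isin('KR6811115171'))
print(is_valid_isin('KR6811115172'))
```

The first call uses the ISIN generated in the session above, the second the same body with a wrong final digit.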

Calling external commands in awk

Today I had a file in which some lines contained a base 10 number. I wanted to convert this number to base 36, keeping the rest of the line formatted as it was in the original file.
The entry was something along the lines of:


I decided to resort to awk and calling an external command to do the conversion.

Turns out, there are a couple of ways you can tell awk to fire off an external command. One is using system. The other is by simply stating the command in double quotes. There’s a nice forum discussion around this topic, and for once it’s not on StackOverflow.

Here’s the first draft of the solution.

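awk '{
  if (/thisField=.([0-9]+)./) {
    # gawk only: the third argument stores the matched text in the array m
    match($1, "([0-9]+)", m);
    cmd = "python -c \"import numpy as np; print(np.base_repr(" m[0] ", 36))\"";
    cmd | getline converted;
    close(cmd);  # release the pipe, as a new command is spawned for every match
    sub("([0-9]+)", converted);
  }
  print;
}' orig.dat > converted.dat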

The trick is to pipe the output of the command to getline, so that it can be saved into a variable which is used later.
As a side note, this rough solution is extremely slow: a Python interpreter is started, and numpy imported, for every line that matches. Other solutions for this particular problem exist, such as using bc after setting a proper obase, or doing the conversion in awk alone.
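Another way around the slowness is to do the whole job in a single Python process, with no numpy at all. This is only a sketch under assumptions: the field name thisField comes from the awk pattern above, while to_base36 and convert_line are names invented for this example. Unlike the awk draft, which replaces the first run of digits on the line, it replaces the digits that follow thisField=.

```python
import re

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(n):
    """Convert a non-negative integer to an uppercase base 36 string."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def convert_line(line):
    """Rewrite the number after 'thisField=' (optionally preceded by one
    non-digit character such as a quote) in base 36; other lines pass through."""
    return re.sub(r"(thisField=\D?)(\d+)",
                  lambda m: m.group(1) + to_base36(int(m.group(2))),
                  line, count=1)


if __name__ == "__main__":
    for sample in ['thisField="1295" other=12', 'no number here']:
        print(convert_line(sample))
```

To process a real file you would loop over its lines and write convert_line(line) for each of them.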

Changes to root a Kindle Fire from Linux

For once, a post that’s not about testing: it’ll contain some brief notes that come in handy if you’re trying to use Root_with_Restore_by_Bin4ry_v33 on Linux to root a Kindle Fire.
The script won’t run out of the box. To make it work you’ll need to:

  • Make the script and the files under stuff/ executable: chmod -R 755 stuff
  • Edit the script and:
    • add a shebang on the first line: #!/bin/bash. This will fix the error that says read: Illegal option -n
    • Replace wait with sleep. This will fix the error that goes wait: pid 10 is not a child of this shell
  • Replace the adb binary in stuff/ with a more recent one. Run ./adb version to check the version. Android Debug Bridge version 1.0.39 – Revision 3db08f2c6889-android should work fine. This fixes the problem with mounting/remounting the filesystem of the device. The error message misleadingly suggests that you might not have root permissions (specifically, mount : permission denied (are you root ?)).
  • Edit (or create) ~/.android/adb_usb.ini and add the USB ID for the vendor, Amazon. Just add the number in the following format 0xnnnn (e.g. 0x1949). The ID can be found running dmesg after connecting the Kindle to the machine. This fixes the problem whereby the adb server does not see any device connected (even though the OS sees it).

If all goes well, the script should just work at this point.

Of course, YMMV and in any case I’ve got no liability if you screw it up. Happy rooting!

sort -k

TIL, when you call sort -k3, you’re not just sorting by the third field, but by everything from the start of the third field up to the end of the line.
Not only that: in the case of ties, by default sort falls back to comparing the whole line, so the first field comes into play too.

Consider this example.

$ cat data
theta AAA 2
gamma AAA 2
alpha BBB 2
alpha AAA 3

Sorting with -k2 gives:

$ sort data -k2 --debug
sort: using simple byte comparison
gamma AAA 2
theta AAA 2
alpha AAA 3
alpha BBB 2 

Notice I’ve also added --debug, to show which parts are used in the comparisons.
So, first comes “AAA 2”, then “AAA 3”.
Also, for the two lines that have “AAA 2”, the first field is used, so “gamma” comes before “theta”.

Forget about the ties for now.
To consider field 2 only, rather than field 2 and all following fields, you need to specify a stop. This is done by adding “,2” to the -k switch. More generally, -km,n means “sort by field m up to field n, boundaries included”.

$ sort data -k2,2 --debug
sort: using simple byte comparison
alpha AAA 3
gamma AAA 2
theta AAA 2
alpha BBB 2 

As you can see, only field 2 is taken into account at first.
“AAA 3” comes before “AAA 2” because the keys tie on “AAA”, so the whole line, starting with the first field, is used as a second comparison, and “alpha” sorts before “gamma”.

Taking this a step further, to actually only consider field 2 and resort to the original order in case of ties, that is, to have a stable sort, you need to pass the -s switch.

$ sort data -k2,2 -s --debug
sort: using simple byte comparison
theta AAA 2
gamma AAA 2
alpha AAA 3
alpha BBB 2 

This looks similar to the first snippet, but the first two lines in the output are actually swapped. Here they appear in the original order.
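The three behaviours can be mimicked in Python, whose sorted() is stable, to make the role of the key and of the tie-break explicit. This is only an illustration, assuming single-space separators and plain byte order. The function names are made up for the example, and sort’s handling of the leading blank in each key is ignored, as it makes no difference with this data.

```python
DATA = ["theta AAA 2", "gamma AAA 2", "alpha BBB 2", "alpha AAA 3"]


def sort_k2(lines):
    # Like "sort -k2": key runs from field 2 to the end of the line,
    # with the whole line as the last-resort tie-break.
    return sorted(lines, key=lambda l: (l.split(" ", 1)[1], l))


def sort_k2_2(lines):
    # Like "sort -k2,2": key is field 2 only, ties broken by the whole line.
    return sorted(lines, key=lambda l: (l.split(" ")[1], l))


def sort_k2_2_stable(lines):
    # Like "sort -k2,2 -s": key is field 2 only, ties keep the input order.
    return sorted(lines, key=lambda l: l.split(" ")[1])


for name, func in [("-k2", sort_k2), ("-k2,2", sort_k2_2),
                   ("-k2,2 -s", sort_k2_2_stable)]:
    print(name)
    for line in func(DATA):
        print("  " + line)
```

Each function returns the lines in the same order as the corresponding sort invocation shown above.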

A web-app to give you your next bus arrival time

404 bus. Actually found!

You go to the bus stop and, by Murphy’s Law, either a) the bus has just gone or b) you’ll wait an endless amount of time for the bus to arrive.
Or at least this is what I experience most of the time.

That’s why I coded a small web-app that uses the TfL live stream data and the HTML5 Geolocation API to retrieve a list of the bus stops within 300 m of your location, together with the list of buses that should arrive from that moment on. And if it can’t find your location, you can enter a postcode instead, which, thanks to the data from Ordnance Survey and some good math, is converted into canonical coordinates and used to present a similar result.
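To give an idea of the “within 300 m” part, here is a minimal sketch of a distance filter based on the haversine formula. It is not taken from the app: the function names, the (name, latitude, longitude) layout and the sample stops are all made up for the example, and the real code may well delegate this search to the TfL service.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stops_within(stops, lat, lon, radius_m=300):
    """Names of the stops no further than radius_m from (lat, lon)."""
    return [name for name, stop_lat, stop_lon in stops
            if haversine_m(lat, lon, stop_lat, stop_lon) <= radius_m]


# Hypothetical stops: (name, latitude, longitude).
STOPS = [("Near Stop", 51.501, -0.1), ("Far Stop", 51.51, -0.1)]
print(stops_within(STOPS, 51.5, -0.1))
```

One thousandth of a degree of latitude is roughly 111 m, so only the first of the two sample stops falls inside the radius.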

It was a very nice excuse for me to learn more about Flask (amazingly fast to use), HTML, CSS (first time I’ve used Bootstrap!) and JS, and web-apps in general.

The solution is far from complete and there are several possible improvements (handle errors with nice landing pages, allow the user to input a postcode regardless of whether geolocation worked, add links to maps, …), but as it was more of an experiment, I’m happy with the result so far. It’s hosted at OpenShift. Give it a spin and check out the code.

Decoding RM4SCC for fun

I recently got curious about the bar code I sometimes find on letters addressed to me. I noticed that it uses just 4 symbols, always begins and ends in the same way, and is the same on all letters, regardless of the sender.

Armed with this basic information, after a bit of research I found out that the code is called Royal Mail 4-State Customer Code. Even more curious, I decided to write a simple decoder for it and, all of a sudden, all my knowledge of signal processing and telecommunication systems came vividly back to mind, years after I took those classes (which I very much enjoyed, I must admit). Here is how I did it.

TL;DR: I put the code for the rm4sccdec (RM4SCC decoder) on GitHub. Use it at your own risk, as it’s not production-ready and needs some tweaking to reliably scan all types of image. I’ve used Python with OpenCV and numpy.

Step one: image pre-processing

The code does not include any information in the colour, which means we can simply get rid of the colour information and transform the image to greyscale.

Next, we want to maximise the “distance” between the information (the bars) and the noise (the background): this is usually done by thresholding the image. Using a fixed, hand-picked global value for thresholding does not always give good results, especially when different images, or different areas of the same image, are characterised by different illumination. Some more advanced techniques, such as Otsu’s thresholding method (which I used in my decoder), which derives the threshold from the histogram of the image, are a better fit.

Finally, some residual noise may be left after thresholding, whereby some white pixels are present in black areas and vice versa. This is called salt-and-pepper noise and can be filtered effectively with a median filter, which replaces the value of each pixel with the median of those around it. The great advantage is that it preserves the edges in the image.
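To make the two operations concrete, here is a dependency-free sketch of Otsu’s method and of a 3x3 median filter on a greyscale image stored as a list of rows. It is not the code of the decoder, which relies on OpenCV and numpy. The function names are made up for the example, and the median filter simply leaves the border pixels untouched.

```python
def otsu_threshold(pixels):
    """Grey level t (0-255) maximising the between-class variance
    of the two groups 'pixels <= t' and 'pixels > t'."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue  # nothing in the dark class yet
        if w_fg == 0:
            break     # everything is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarise(img):
    """Map every pixel to 0 (dark) or 1 (light) using Otsu's threshold."""
    t = otsu_threshold([p for row in img for p in row])
    return [[1 if p > t else 0 for p in row] for row in img]


def median_filter3(img):
    """3x3 median filter; border pixels are copied unchanged."""
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = sorted(img[j][i]
                            for j in (y - 1, y, y + 1)
                            for i in (x - 1, x, x + 1))
            out[y][x] = window[4]  # median of the 9 values
    return out


grey = [[10, 10, 200, 200, 200] for _ in range(5)]
grey[2][3] = 10  # an isolated dark pixel in the light area
cleaned = median_filter3(binarise(grey))
```

In the small example, the two dark columns survive the pipeline while the isolated dark pixel at row 2, column 3 is removed by the median filter.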

Step two: feature selection, extraction, and classification

Now we can start thinking about the features defining our symbols. We know we have 4 symbols, which we can call ascender, descender, long (Full Height, in the image), and short (Tracker, in the image).

The 4 symbols used in RM4SCC, from Wikipedia

The first obvious feature we can select is the vertical position of each bar. After all, that’s the information we need to decode the codeword. However, if we choose the 4 points determining each bar, we’d probably end up complicating the decoding process too much.

An easy way out is to choose the centroid position (just its y-coordinate is necessary) for each bar. Notice, though, that the long and short bars will share the same value for this feature. If we go down this path, we need another feature to distinguish (at least) the long bar from the short bar. The second obvious feature is therefore the size or, more accurately, the area. This feature will allow us to easily distinguish long from short, but it will be pretty useless for telling the ascender from the descender.

For the extraction, we need to segment the image and find all the bars, and compute the so-called moments for each of them. The first three moments will be enough for us to get all the features we are interested in.

As a side note, as the segmentation function I have used does not return all segments in order, I had to extract the x-coordinate of each bar so as to be able to sort the vector of symbols.

If the code scanned is reasonably horizontal, we should be able to classify all four symbols pretty easily. For this bit I resorted to K-means clustering, although other classification methods can be used with similar results.
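As an illustration of how the two features separate the four symbols, here is a small pure-Python sketch. Note that it swaps K-means for fixed thresholds on the area, to keep the example short. The thresholds 0.5 and 0.83, the function names and the synthetic bars are assumptions of the example, and it also assumes that at least one long bar is present to act as a reference.

```python
def moments(pixels):
    """Raw moments (m00, m10, m01) of a bar given as (x, y) pixel coordinates."""
    m00 = m10 = m01 = 0
    for x, y in pixels:
        m00 += 1
        m10 += x
        m01 += y
    return m00, m10, m01


def features(pixels):
    """Area and centroid of a bar, derived from its first three moments."""
    m00, m10, m01 = moments(pixels)
    return {"area": m00, "cx": m10 / m00, "cy": m01 / m00}


def classify(bars):
    """Label the bars from left to right as long, short, ascender or descender."""
    feats = sorted((features(b) for b in bars), key=lambda f: f["cx"])
    max_area = max(f["area"] for f in feats)
    # Assumption: the largest bar is a long one, so its centroid
    # marks the vertical middle of the code.
    centre_y = next(f["cy"] for f in feats if f["area"] == max_area)
    labels = []
    for f in feats:
        ratio = f["area"] / max_area
        if ratio > 0.83:
            labels.append("long")
        elif ratio < 0.5:
            labels.append("short")
        elif f["cy"] < centre_y:  # image y grows downwards
            labels.append("ascender")
        else:
            labels.append("descender")
    return labels


def bar(x0, top, bottom):
    """Synthetic 2-pixel-wide bar spanning rows top..bottom (inclusive)."""
    return [(x, y) for x in (x0, x0 + 1) for y in range(top, bottom + 1)]


# Bars deliberately listed out of order, as a segmentation step might return them.
BARS = [bar(9, 3, 5), bar(0, 0, 8), bar(6, 3, 8), bar(3, 0, 5)]
print(classify(BARS))
```

Sorting by the x-coordinate of the centroid restores the left-to-right order before labelling, which is the same reason the x-coordinate had to be extracted in the decoder.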

Step three: the actual decoding

If we don’t consider the starting and ending symbols, all symbols in between are grouped in fours. For this reason we first need to build a dictionary that maps every valid 4-symbol group to the correct letter or number.

Finally, a bit of fun when computing the checksum. I translated the algorithm explained here, with the only difference that I wanted to avoid using yet another table to compute the final letter/number so instead I implemented the rule behind it (which boils down to ensuring ‘bit parity’).

Step four: enjoy it!

And possibly fork, improve and re-release :)