TIL: when you call sort -k3, you’re not sorting by the third field alone, but by everything from the start of the third field up to the end of the line.
Not only that: in case of ties, by default it falls back to comparing the whole line, so the first field ends up deciding the order.
Consider this example.
$ cat data
theta AAA 2
gamma AAA 2
alpha BBB 2
alpha AAA 3
$ sort data -k2 --debug
sort: using simple byte comparison
gamma AAA 2
     ______
___________
theta AAA 2
     ______
___________
alpha AAA 3
     ______
___________
alpha BBB 2
     _______
____________
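Notice I’ve also added --debug, to show which parts are used in the comparisons. Under each line, the first underline marks the sort key, and the second marks the whole line, which is used as a last-resort tie-breaker.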
So, first comes “AAA 2”, then “AAA 3”.
Also, the two lines that have “AAA 2” compare equal on the key, so sort falls back to comparing the whole lines, and “gamma” comes before “theta”.
Forget about the ties for now.
To consider field 2 only, rather than field 2 and all following fields, you need to specify a stop. This is done by adding “,2” to the -k switch. More generally, -km,n means “sort by field m up to n, boundaries included”.
$ sort data -k2,2 --debug
sort: using simple byte comparison
alpha AAA 3
     ____
___________
gamma AAA 2
     ____
___________
theta AAA 2
     ____
___________
alpha BBB 2
     ____
____________
As you can see, only field 2 is taken into account at first.
“AAA 3” comes before “AAA 2” because the three “AAA” lines tie on the key, so the whole line is used as a second comparison, and “alpha” sorts before “gamma” and “theta”.
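The same -km,n syntax also covers ranges wider than one field. Here is a rough sketch on made-up four-column data (not the file above), assuming GNU sort; LC_ALL=C is my addition to force the simple byte comparison seen in the debug output, and the variable names are only illustrative.

```shell
# Made-up data: the first two lines tie on fields 2-3 and differ in field 4.
rows='alpha AAA 2 z
gamma AAA 2 a
delta BBB 1 m'

# -k2,3: the key stops at the end of field 3, so field 4 is not part of it.
# The two "AAA 2" lines tie and are ordered by the whole-line fallback.
by_range=$(printf '%s\n' "$rows" | LC_ALL=C sort -k2,3)

# -k2: the key runs to the end of the line, so field 4 takes part ("a" < "z").
by_rest=$(printf '%s\n' "$rows" | LC_ALL=C sort -k2)

printf '%s\n' '-k2,3:' "$by_range" '-k2:' "$by_rest"
```

With -k2,3 the “alpha” line should come first, because the tie is broken on the whole line; with -k2 the “gamma” line should come first, because its fourth field sorts lower.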
Taking this a step further, to actually only consider field 2 and resort to the original order in case of ties, that is, to have a stable sort, you need to pass the -s switch.
$ sort data -k2,2 -s --debug
sort: using simple byte comparison
theta AAA 2
     ____
gamma AAA 2
     ____
alpha AAA 3
     ____
alpha BBB 2
     ____
This looks similar to the output of the first sort command, but the first two lines are swapped: here they appear in their original order.
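To isolate the effect of -s, here is a minimal sketch with two made-up lines that share both field 1 and field 2, so only the tie-breaking rule can decide their order. It assumes GNU sort; LC_ALL=C and the variable names are my own additions for illustration.

```shell
# Made-up data: fields 1 and 2 are identical, only field 3 differs.
ties='alpha AAA 3
alpha AAA 2'

# Default: equal keys fall back to a whole-line comparison,
# so the line ending in 2 moves ahead of the line ending in 3.
default_order=$(printf '%s\n' "$ties" | LC_ALL=C sort -k2,2)

# With -s the fallback is disabled and tied lines keep their input order.
stable_order=$(printf '%s\n' "$ties" | LC_ALL=C sort -s -k2,2)

printf '%s\n' 'default:' "$default_order" 'stable:' "$stable_order"
```

The default run should reorder the two lines even though the key is field 2 only, while the -s run should leave them exactly as they were in the input.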