Fung’s “I can’t believe it can sort” algorithm and others

Kragen Javier Sitaker, 02021-10-05 (updated 02021-12-30) (5 minutes)

Fung published “the simplest (and most surprising) sorting algorithm ever” this year.

The algorithm from the paper is indeed an astonishingly simple sorting algorithm, and it is indeed surprising that it works, particularly since at first glance it would appear to sort the array backwards (see the paper):

void
cantbelievesort(int *p, size_t n)
{
  int tmp;
  for (size_t i = 0; i < n; i++) {
    for (size_t j = 0; j < n; j++) {
      if (p[i] < p[j]) tmp = p[i], p[i] = p[j], p[j] = tmp;
    }
  }
}

That’s 10 lines of code according to David A. Wheeler’s ‘SLOCCount,’ though arguably 9 would be fairer. Its cyclomatic complexity is 3, with a deepest statement nesting level of 4, and it has 5 variables: three locals plus two arguments.

0000000000001369 <cantbelievesort>:
    1369:   f3 0f 1e fa             endbr64 
    136d:   48 89 f8                mov    %rdi,%rax
    1370:   4c 8d 0c b7             lea    (%rdi,%rsi,4),%r9
    1374:   49 39 c1                cmp    %rax,%r9
    1377:   74 23                   je     139c <cantbelievesort+0x33>
    1379:   31 d2                   xor    %edx,%edx
    137b:   48 39 f2                cmp    %rsi,%rdx
    137e:   74 16                   je     1396 <cantbelievesort+0x2d>
    1380:   8b 08                   mov    (%rax),%ecx
    1382:   44 8b 04 97             mov    (%rdi,%rdx,4),%r8d
    1386:   44 39 c1                cmp    %r8d,%ecx
    1389:   7d 06                   jge    1391 <cantbelievesort+0x28>
    138b:   44 89 00                mov    %r8d,(%rax)
    138e:   89 0c 97                mov    %ecx,(%rdi,%rdx,4)
    1391:   48 ff c2                inc    %rdx
    1394:   eb e5                   jmp    137b <cantbelievesort+0x12>
    1396:   48 83 c0 04             add    $0x4,%rax
    139a:   eb d8                   jmp    1374 <cantbelievesort+0xb>
    139c:   c3                      retq

With gcc -Os 9.3.0, that’s 52 bytes, 19 instructions. gcc -Os has regressed; in 4.7.2, that used to be 44 bytes, 17 instructions:

4007cc:       31 c0                   xor    %eax,%eax
4007ce:       eb 22                   jmp    4007f2 <cantbelievesort+0x26>
4007d0:       8b 0c 87                mov    (%rdi,%rax,4),%ecx
4007d3:       44 8b 04 97             mov    (%rdi,%rdx,4),%r8d
4007d7:       44 39 c1                cmp    %r8d,%ecx
4007da:       7d 07                   jge    4007e3 <cantbelievesort+0x17>
4007dc:       44 89 04 87             mov    %r8d,(%rdi,%rax,4)
4007e0:       89 0c 97                mov    %ecx,(%rdi,%rdx,4)
4007e3:       48 ff c2                inc    %rdx
4007e6:       eb 02                   jmp    4007ea <cantbelievesort+0x1e>
4007e8:       31 d2                   xor    %edx,%edx
4007ea:       48 39 f2                cmp    %rsi,%rdx
4007ed:       75 e1                   jne    4007d0 <cantbelievesort+0x4>
4007ef:       48 ff c0                inc    %rax
4007f2:       48 39 f0                cmp    %rsi,%rax
4007f5:       75 f1                   jne    4007e8 <cantbelievesort+0x1c>
4007f7:       c3                      retq

However, arguably, this sort routine is even simpler. It may or may not be surprising that it works:

void
dumbsort(int *p, size_t n)
{
  int tmp;
  for (size_t i = 1; i < n; i++) {
    if (p[i] < p[i-1]) tmp = p[i], p[i] = p[i-1], p[i-1] = tmp, i = 0;
  }
}

I don’t think I invented it, but I can’t remember who did.

By the same metric, that’s only 8 lines of code (also only 8 by my “fairer” metric), its cyclomatic complexity is only 2, its deepest statement nesting level is only 3, and it has only 4 variables (the same two arguments, but two locals instead of three). It compiles to only 15 amd64 instructions, occupying only 43 bytes:

  4007f8:       b8 01 00 00 00          mov    $0x1,%eax
  4007fd:       eb 1e                   jmp    40081d <dumbsort+0x25>
  4007ff:       4c 8d 04 87             lea    (%rdi,%rax,4),%r8
  400803:       48 8d 54 87 fc          lea    -0x4(%rdi,%rax,4),%rdx
  400808:       41 8b 08                mov    (%r8),%ecx
  40080b:       44 8b 0a                mov    (%rdx),%r9d
  40080e:       44 39 c9                cmp    %r9d,%ecx
  400811:       7d 07                   jge    40081a <dumbsort+0x22>
  400813:       45 89 08                mov    %r9d,(%r8)
  400816:       31 c0                   xor    %eax,%eax
  400818:       89 0a                   mov    %ecx,(%rdx)
  40081a:       48 ff c0                inc    %rax
  40081d:       48 39 f0                cmp    %rsi,%rax
  400820:       72 dd                   jb     4007ff <dumbsort+0x7>
  400822:       c3                      retq

Or, with more recent gcc, 55 bytes, 18 instructions:

000000000000139d <dumbsort>:
    139d:   f3 0f 1e fa             endbr64 
    13a1:   b8 01 00 00 00          mov    $0x1,%eax
    13a6:   48 39 f0                cmp    %rsi,%rax
    13a9:   73 28                   jae    13d3 <dumbsort+0x36>
    13ab:   48 8d 14 85 00 00 00    lea    0x0(,%rax,4),%rdx
    13b2:   00 
    13b3:   4c 8d 04 17             lea    (%rdi,%rdx,1),%r8
    13b7:   48 8d 54 17 fc          lea    -0x4(%rdi,%rdx,1),%rdx
    13bc:   41 8b 08                mov    (%r8),%ecx
    13bf:   44 8b 0a                mov    (%rdx),%r9d
    13c2:   44 39 c9                cmp    %r9d,%ecx
    13c5:   7d 07                   jge    13ce <dumbsort+0x31>
    13c7:   45 89 08                mov    %r9d,(%r8)
    13ca:   31 c0                   xor    %eax,%eax
    13cc:   89 0a                   mov    %ecx,(%rdx)
    13ce:   48 ff c0                inc    %rax
    13d1:   eb d3                   jmp    13a6 <dumbsort+0x9>
    13d3:   c3                      retq

Dylan16807 points out that if you tail-recurse to restart the function instead of resetting the index, it gets even simpler.

Topics

Programming (49 notes)
Performance (22 notes)
Algorithms (14 notes)
Assembly-language programming (11 notes)
Small is beautiful (8 notes)
C (8 notes)
Sorting (3 notes)