Revision e7cb0b4455c85b53aeba40f88ffddcf6d4002498 authored by Johannes Schindelin on 11 May 2018, 14:03:54 UTC, committed by Jeff King on 22 May 2018, 03:50:11 UTC
When we started to catch NTFS short names that clash with .git, we only
looked for GIT~1. This is sufficient because we only ever clone into an
empty directory, so .git is guaranteed to be the first subdirectory or
file in that directory.

However, even with a fresh clone, .gitmodules is *not* necessarily the
first file to be written that would want the NTFS short name GITMOD~1: a
malicious repository can add .gitmodul0000 and friends, which sort
before `.gitmodules` and are therefore checked out *first*. For that
reason, we have to test not only for ~1 short names, but for others,
too.
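
To illustrate the ordering problem, here is a minimal Python sketch. It
assumes that entries are written out in byte-wise sorted path order;
.gitmodul0001 is a made-up stand-in for the "friends" mentioned above.

```python
# Hypothetical paths in a malicious repository; only .gitmodules is a
# name that Git treats specially.
paths = [".gitmodules", ".gitmodul0000", ".gitmodul0001"]

# Assumption: entries are checked out in byte-wise sorted path order.
checkout_order = sorted(paths, key=lambda p: p.encode("utf-8"))

for position, path in enumerate(checkout_order, start=1):
    print(position, path)
```

Because "0" (0x30) sorts before "e" (0x65), .gitmodules comes last here,
so an earlier file would be the one to receive GITMOD~1.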

It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
Windows 2000 (i.e., in all Windows versions still supported by Git),
NTFS short names are only generated in the <prefix>~<number> form up to
number 4. After that, a *different* prefix is used, calculated from the
long file name using an undocumented, but stable algorithm.

For example, the short name of .gitmodules would be GITMOD~1, but if it
is taken, and all of ~2, ~3 and ~4 are taken, too, the short name
GI7EBA~1 will be used. From there, collisions are handled by
incrementing the number, shortening the prefix as needed (until ~9999999
is reached, in which case NTFS will not allow the file to be created).
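
The naming order described above can be sketched in Python as follows.
This is a simplification, not NTFS's actual implementation: the function
name and its parameters are hypothetical, the six-character prefix and
the four hash digits are passed in rather than derived, and the
shortening of the prefix for larger numbers is left out.

```python
def short_name_candidates(prefix6, hash4, limit=6):
    """Return the first `limit` short-name candidates, in order (simplified)."""
    # First, up to four names of the form <prefix>~<number>.
    names = [f"{prefix6.upper()}~{n}" for n in range(1, 5)]

    # Then the fall-back form: the first two characters of the prefix,
    # four hash digits, and again ~<number>.
    fallback = (prefix6[:2] + hash4).upper()
    number = 1
    while len(names) < limit:
        names.append(f"{fallback}~{number}")
        number += 1
    return names[:limit]


print(short_name_candidates("gitmod", "7eba"))
```

For .gitmodules, the fifth candidate in this sketch is GI7EBA~1, matching
the example above.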

We'd also want to handle .gitignore and .gitattributes, which suffer
from a similar problem, using the fall-back short names GI250A~1 and
GI7D29~1, respectively.

To accommodate that, we could reimplement the hashing algorithm, but
it is safer and simpler to hard-code the known prefixes. The
algorithm has been reverse-engineered and described at
https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but
still available via https://web.archive.org/.
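
As a rough illustration of matching against known prefixes, here is a
Python sketch. It is not the code in is_ntfs_dotgit(): the table and
function names are hypothetical, the regular prefixes "gitign" and
"gitatt" are assumed by analogy with GITMOD, and trailing characters
that NTFS may ignore are not considered.

```python
import re

# (regular 6-character prefix, fall-back prefix) per protected file.
# The fall-back prefixes are the ones quoted above; "gitign" and
# "gitatt" are assumptions made by analogy with GITMOD.
KNOWN_PREFIXES = {
    ".gitmodules": ("gitmod", "gi7eba"),
    ".gitignore": ("gitign", "gi250a"),
    ".gitattributes": ("gitatt", "gi7d29"),
}


def looks_like_ntfs_short_name(component, dotfile):
    """Return True if `component` could be an NTFS short name of `dotfile`."""
    regular, fallback = KNOWN_PREFIXES[dotfile]
    name = component.lower()  # short names are compared case-insensitively

    # Regular form: <prefix>~1 through <prefix>~4.
    if re.fullmatch(re.escape(regular) + r"~[1-4]", name):
        return True

    # Fall-back form: a leading part of the hashed prefix, "~", a number.
    # Assumption: the base name is always exactly 8 characters long, so
    # the prefix shrinks as the number grows.
    match = re.fullmatch(r"([a-z0-9]*)~([1-9][0-9]*)", name)
    return bool(match) and len(name) == 8 and fallback.startswith(match.group(1))
```

With this sketch, "GITMOD~4", "GI7EBA~1" and "gi7eb~10" are all flagged
for .gitmodules, while "GITMOD~5" is not. The hard-coded fall-back
prefixes are the only part that depends on the hashing algorithm.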

The prefixes can be recomputed by running the following Perl script:

-- snip --
use warnings;
use strict;

sub compute_short_name_hash ($) {
        my $checksum = 0;
        foreach (split('', $_[0])) {
                $checksum = ($checksum * 0x25 + ord($_)) & 0xffff;
        }

        $checksum = ($checksum * 314159269) & 0xffffffff;
        $checksum = 1 + (~$checksum & 0x7fffffff) if ($checksum & 0x80000000);
        $checksum -= (($checksum * 1152921497) >> 60) * 1000000007;

        return scalar reverse sprintf("%x", $checksum & 0xffff);
}

print compute_short_name_hash($ARGV[0]);
-- snap --

E.g., running that with the argument ".gitignore" will
print "250a" (which then becomes "gi250a" in the code).
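
For readers without Perl at hand, the same computation can be written in
Python. This is a line-by-line port of the script above, offered as a
sketch rather than as a reference implementation; only the function name
is carried over from the Perl version.

```python
def compute_short_name_hash(name):
    """Port of the Perl script above: four reversed hex digits for `name`."""
    checksum = 0
    for ch in name:
        # 16-bit rolling checksum with multiplier 0x25.
        checksum = (checksum * 0x25 + ord(ch)) & 0xFFFF

    checksum = (checksum * 314159269) & 0xFFFFFFFF
    if checksum & 0x80000000:
        # Same as the Perl line: negate the value as a signed 32-bit number.
        checksum = 1 + (~checksum & 0x7FFFFFFF)
    checksum -= ((checksum * 1152921497) >> 60) * 1000000007

    # Low 16 bits as hex (not zero-padded, like "%x"), digits reversed.
    return format(checksum & 0xFFFF, "x")[::-1]


for dotfile in (".gitmodules", ".gitignore", ".gitattributes"):
    print(dotfile, "gi" + compute_short_name_hash(dotfile))
```

This prints the prefixes gi7eba, gi250a and gi7d29 for the three file
names, in line with the short names listed earlier.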

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
1 parent 0fc333b
File Mode Size
nedmalloc
poll
regex
vcbuild
win32
apple-common-crypto.h -rw-r--r-- 2.7 KB
basename.c -rw-r--r-- 1.3 KB
bswap.h -rw-r--r-- 4.6 KB
cygwin.c -rw-r--r-- 407 bytes
cygwin.h -rw-r--r-- 108 bytes
fopen.c -rw-r--r-- 931 bytes
gmtime.c -rw-r--r-- 605 bytes
hstrerror.c -rw-r--r-- 530 bytes
inet_ntop.c -rw-r--r-- 4.8 KB
inet_pton.c -rw-r--r-- 6.8 KB
memmem.c -rw-r--r-- 752 bytes
mingw.c -rw-r--r-- 57.2 KB
mingw.h -rw-r--r-- 14.4 KB
mkdir.c -rw-r--r-- 468 bytes
mkdtemp.c -rw-r--r-- 153 bytes
mmap.c -rw-r--r-- 692 bytes
msvc.c -rw-r--r-- 113 bytes
msvc.h -rw-r--r-- 570 bytes
obstack.c -rw-r--r-- 13.8 KB
obstack.h -rw-r--r-- 19.1 KB
pread.c -rw-r--r-- 433 bytes
precompose_utf8.c -rw-r--r-- 4.6 KB
precompose_utf8.h -rw-r--r-- 1.3 KB
qsort.c -rw-r--r-- 1.2 KB
qsort_s.c -rw-r--r-- 1.3 KB
setenv.c -rw-r--r-- 862 bytes
sha1-chunked.c -rw-r--r-- 362 bytes
sha1-chunked.h -rw-r--r-- 81 bytes
snprintf.c -rw-r--r-- 1.5 KB
stat.c -rw-r--r-- 1.1 KB
strcasestr.c -rw-r--r-- 431 bytes
strdup.c -rw-r--r-- 169 bytes
strlcpy.c -rw-r--r-- 247 bytes
strtoimax.c -rw-r--r-- 214 bytes
strtoumax.c -rw-r--r-- 217 bytes
terminal.c -rw-r--r-- 2.5 KB
terminal.h -rw-r--r-- 142 bytes
unsetenv.c -rw-r--r-- 591 bytes
win32.h -rw-r--r-- 878 bytes
win32mmap.c -rw-r--r-- 1.1 KB
winansi.c -rw-r--r-- 15.8 KB
