https://github.com/SoftwareHeritage/swh-model
Revision 574685052348bfc6ed28570f06b9cc4302dfde27 authored by Stefano Zacchiroli on 19 December 2020, 10:54:59 UTC, committed by Stefano Zacchiroli on 30 December 2020, 12:22:47 UTC
Before this change, parse_swhid() overlapped heavily with the attrs-based validators in the SWHID class, and its validation was implemented by hand.

With this change, the coarse-grained validation done by parse_swhid() is delegated to a regex, and the semantic validation of SWHIDs is left to the attrs validators. The regex is also exposed as a module attribute, for client code that wants to validate SWHIDs syntactically without necessarily instantiating SWHID classes (several other modules already do that, using slightly different hand-made regexes, which isn't great).

As part of this change we also clean up the use of ValidationError exceptions, systematically passing the problematic parts of the SWHID as arguments, and make error messages uniform.

This change also brings some speedup in SWHID parsing. On a benchmark parsing ~30 M valid SWHIDs, the previous implementation took ~3:06 minutes and the new one ~2:50 minutes, a ~9% speedup.

Closes T2788
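To illustrate the kind of syntax-only check the commit message describes, here is a minimal sketch of regex-based SWHID validation. The pattern and the names `SWHID_SKETCH_RE` and `is_syntactically_valid` are hypothetical and written for illustration; they are not the regex or the attribute that swh-model exposes. The sketch assumes the published SWHID layout (`swh:1:<type>:<40 lowercase hex digits>` followed by optional `;key=value` qualifiers).

```python
import re

# Hypothetical pattern for illustration only; NOT the regex shipped in swh-model.
SWHID_SKETCH_RE = re.compile(
    r"swh:(?P<version>1)"                  # scheme and scheme version
    r":(?P<type>snp|rel|rev|dir|cnt)"      # object type
    r":(?P<hash>[0-9a-f]{40})"             # 40 lowercase hex digits
    r"(?P<qualifiers>(?:;[a-z]+=[^;\s]+)*)"  # optional ;key=value qualifiers
)


def is_syntactically_valid(swhid: str) -> bool:
    """Return True if the whole string has the coarse shape of a SWHID.

    This only checks syntax; it does not check, for instance, whether a
    given qualifier key is known or allowed for the object type.
    """
    return SWHID_SKETCH_RE.fullmatch(swhid) is not None


if __name__ == "__main__":
    candidate = "swh:1:rev:574685052348bfc6ed28570f06b9cc4302dfde27"
    print(candidate, is_syntactically_valid(candidate))
```

Using `fullmatch` rather than `match` keeps trailing garbage from slipping through, and a check like this lets client code reject malformed identifiers cheaply before doing any semantic validation.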
1 parent 76b744e
Tip revision: 574685052348bfc6ed28570f06b9cc4302dfde27 authored by Stefano Zacchiroli on 19 December 2020, 10:54:59 UTC
SWHID parsing: simplify and deduplicate validation logic
File | Mode | Size |
---|---|---|
bin | ||
docs | ||
swh | ||
.gitignore | -rw-r--r-- | 137 bytes |
.pre-commit-config.yaml | -rw-r--r-- | 1021 bytes |
AUTHORS | -rw-r--r-- | 112 bytes |
CODE_OF_CONDUCT.md | -rw-r--r-- | 3.3 KB |
CONTRIBUTORS | -rw-r--r-- | 45 bytes |
LICENSE | -rw-r--r-- | 34.3 KB |
MANIFEST.in | -rw-r--r-- | 157 bytes |
Makefile | -rw-r--r-- | 163 bytes |
Makefile.local | -rw-r--r-- | 24 bytes |
README.md | -rw-r--r-- | 620 bytes |
mypy.ini | -rw-r--r-- | 571 bytes |
pyproject.toml | -rw-r--r-- | 237 bytes |
pytest.ini | -rw-r--r-- | 135 bytes |
requirements-cli.txt | -rw-r--r-- | 30 bytes |
requirements-test.txt | -rw-r--r-- | 26 bytes |
requirements.txt | -rw-r--r-- | 318 bytes |
setup.cfg | -rw-r--r-- | 247 bytes |
setup.py | -rwxr-xr-x | 2.5 KB |
tox.ini | -rw-r--r-- | 488 bytes |