https://github.com/SoftwareHeritage/swh-model

Revision Message Commit Date
08c69e6 Add missing `content_git_object` This would be useful for the IPFS bridge, and seems good to complete the API in any sense. 27 April 2022, 16:14:23 UTC
2c0701c Bump mypy to v0.942 26 April 2022, 11:40:38 UTC
f82a217 pre-commit: Remove codespell commit-msg hook That hook can be frustrating, as it can discard a long commit message if it finds a typo in it, so it is better to remove it. 21 April 2022, 11:39:46 UTC
85f36f8 Add a SWH_MODEL_OBJECT_TYPES map {cls.object_type: cls} in model.py. It's a piece of information used several times in the swh stack. 11 April 2022, 08:47:46 UTC
9231cd0 Add .git-blame-ignore-revs file with automatic reformatting commits 08 April 2022, 13:15:31 UTC
4c39334 python: Reformat code with black 22.3.0 Related to T3922 08 April 2022, 13:15:09 UTC
953c403 pre-commit, tox: Bump black from 19.10b0 to 22.3.0 black is considered stable since release 22.1.0 and the version we are currently using is quite outdated and not compatible with click 8.1.0, so it is time to bump it to its latest stable release. Please note that E501 pycodestyle warning related to line length is replaced by B950 one from flake8-bugbear as recommended by black. https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#line-length Related to T3922 08 April 2022, 13:13:46 UTC
ba7af74 model/cli: Format click docstring 31 March 2022, 16:34:14 UTC
fb61d50 cli: Indent sphinx docstring with 3 leading spaces (not 4) 30 March 2022, 11:29:35 UTC
cd08c5a cli: Indent sphinx docstring 30 March 2022, 10:59:33 UTC
71e4277 Drop \b from docstring Hypothesis: This makes the documentation build fail. 30 March 2022, 10:24:59 UTC
d78d123 docs/cli: Fix misindented instructions 30 March 2022, 09:29:47 UTC
12257c1 docs/cli: Align and fix cli documentation This changes the deprecated :show-nested: instruction to the :nested: one [1]. This also fixes warning about misdefined block [2] [1] https://sphinx-click.readthedocs.io/en/latest/usage/#directive-click [2] ``` 10:38:38 Warning, treated as error: 10:38:38 /var/lib/jenkins/workspace/DMOD/tests-on-diff/docs/cli.rst:14:Literal block expected; none found. 10:38:38 make: *** [../../swh-docs/Makefile.sphinx:32: sphinx/html] Error 2 10:38:38 make: Leaving directory '/var/lib/jenkins/workspace/DMOD/tests-on-diff/docs' 10:38:38 ERROR: InvocationError for command '/usr/bin/make -I ../.tox/sphinx/src/swh-docs/swh/ -C docs' (exited with code 2) ``` 30 March 2022, 08:56:58 UTC
dbce3cc Add support for None as author or committer of a Revision committer=None happens on some malformed commits generated by old dgit versions; it is possible for author=None to happen for the same reason. 23 March 2022, 09:56:50 UTC
f6ad1ed pytest: Exclude build directory for tests discovery Due to test modules being copied in subdirectories of the build directory by setuptools, it makes pytest fail by raising ImportPathMismatchError exceptions when invoked from root directory of the module. So ignore the build folder to discover tests. 22 March 2022, 10:14:32 UTC
bc3831f Exclude name and email attributes from People comparison These fields are computed attributes and may be removed from the backend storage. This helps when writing tests. 18 March 2022, 09:42:10 UTC
dc5485e Add objects with non-None raw_manifest to TEST_OBJECTS This will be used by swh.storage.backfill's tests. 16 March 2022, 11:19:11 UTC
5359c43 Remove deprecated property 'TimestampWithTimezone.offset' 16 March 2022, 09:29:42 UTC
07c7b4f Revert "Restore 'offset' and 'negative_utc' arguments and make them optional" This reverts commit a0f5436273ef6f5b62a388ae131ed5afa7287d00. This means this commit removes the 'offset' and 'negative_utc' arguments. It also removes the 'negative_utc' attribute (which is not used anymore), but keeps an 'offset' property, which is an alias to 'offset_minutes()'. This is only to keep this commit readable; the next commit will remove this alias. 16 March 2022, 09:29:42 UTC
118c8a4 docs: Explain we prefer dir SWHIDs over rev/rel. 10 February 2022, 16:14:04 UTC
bd38e6f pre-commit: Bump hooks and add new one to check commit message spelling To install the new hook: $ pre-commit install -t commit-msg 10 February 2022, 16:02:56 UTC
839e0dc Add missing __slots__ to HashableObjectWithManifest This should slightly reduce the memory used by dir/rev/rel objects 10 February 2022, 11:13:30 UTC
a0674ad Fix crash in check_entries. 26 January 2022, 11:16:50 UTC
7364adc Fix f-string 26 January 2022, 11:07:46 UTC
a654060 Add method 'TimestampWithTimezone.offset_minutes' A future commit will remove the 'offset' attribute, and rename 'offset_bytes' to 'offset'. Modules using the `.offset` attribute will need to switch to this new method to keep working. 21 January 2022, 11:53:12 UTC
f43f9ac model: Add support for more edge cases in _parse_offset_bytes 17 January 2022, 16:33:38 UTC
e1c3fe8 Fix extlink errors Sphinx >=4.4.0 raises warnings (which are errors for us) when using warnings instead of an extlink role defined in the configuration 17 January 2022, 16:26:17 UTC
a0f5436 Restore 'offset' and 'negative_utc' arguments and make them optional This allows keeping compatibility with existing users of the TimestampWithTimezone constructor 14 January 2022, 12:44:59 UTC
be894c6 TimestampWithTimezone: Make 'offset_bytes' required and remove 'negative_utc'. 'offset' becomes a property instead of an attribute and constructor argument. This also removes both from the output of `.to_dict()`. This is step 6 of https://forge.softwareheritage.org/T3752 This will break packages that still use the constructor directly, ie. swh-storage and swh-loader-git (and tests of swh-loader-core and swh-loader-svn) 13 January 2022, 11:17:48 UTC
dbf185c Fix TimestampWithTimezone.from_dict() on datetimes before 1970 with non-integer seconds 12 January 2022, 13:41:38 UTC
94b00d6 docs: Add anchors to important sections of persistent-identifiers.rst swh-deposit needs to link to some of them. 11 January 2022, 13:27:05 UTC
56460e1 git_objects: Use raw offset_bytes to format dates, and remove format_offset() This is needed when recomputing the manifest of git objects with weird offsets. 07 January 2022, 12:57:52 UTC
c6ec4d8 hashutil: drop all pre-3.6 blake2 workarounds blake2s and blake2b have been provided by the stdlib hashlib since Python 3.6, and we declare 3.7 as minimum Python version supported. 22 December 2021, 14:46:01 UTC
9d517c1 hypothesis_strategies: Generate raw_manifest 22 December 2021, 12:20:53 UTC
2abbf73 model: Exclude 'raw_manifest' from dictionaries when it is null 1. Most objects do not need it so it's a waste of space 2. This means we just extend the existing format (some objects will have that key in their dict) instead of changing it (retroactively adding it to all objects) 22 December 2021, 12:20:52 UTC
f6144a1 model: Add a raw_manifest attribute This will be used to store the original manifest of 'weird' git objects, when we cannot reasonably represent them otherwise. 22 December 2021, 12:20:52 UTC
9e1e14a data-model: Remove mention of modification timestamps We don't store those 21 December 2021, 14:15:18 UTC
e342397 hypothesis_strategies: Generate only consistent directory entry permissions. 21 December 2021, 13:38:47 UTC
9d1d4a3 docs: Update the data model description Revision and release do not generally allow 'arbitrary' metadata; and it was missing ExtIDs and REMD 17 December 2021, 13:56:40 UTC
abb089d Pin mypy and drop type annotations which make mypy unhappy This also drops spurious copyright headers from those files if present. Related to T3812 16 December 2021, 14:56:37 UTC
9b21d2d test_model: Fix compatibility with pytest-xdist Using .now() produces data that differs between xdist processes, as files are imported after forking, and xdist requires consistent data across processes. 15 December 2021, 13:01:20 UTC
59ed64b model: Add a check() method to model objects It calls attr.validate() (which calls the validators), and recomputes the hash of HashableObject instances. A future commit will also make it check the raw_manifest attribute when relevant 08 December 2021, 15:24:44 UTC
5eb6539 model: Deduplicate calls to hashlib. It's just simpler this way 08 December 2021, 15:20:56 UTC
2d0105b model: Add support for None to the type checker For the sake of completeness (a future commit may depend on it). 08 December 2021, 15:17:40 UTC
c322904 Add attribute TimestampWithTimezone.offset_bytes, to store raw Git offsets For now it is filled from 'offset' and 'negative_utc', but it will replace them in a future commit. This is to simplify and add support for more 'weird' offsets we do not currently support. 08 December 2021, 14:46:29 UTC
52d6e0a hypothesis_strategies: Ensure to generate valid directory entry name (again) 08 December 2021, 14:46:09 UTC
4803d78 Revert "hypothesis_strategies: Ensure to generate valid directory entry name" This reverts commit c525484e473720ffcde89534d3ba89093bc8b016. 08 December 2021, 14:46:09 UTC
f6e0a28 from_disk: Implement Directory.__contains__ It makes it easy to check whether a path exists under a root directory. 08 December 2021, 13:42:02 UTC
c525484 hypothesis_strategies: Ensure to generate valid directory entry name Since rDMOD8d96dfedee34203a4118e48a6208ee507511590b, directory entry names are validated in DirectoryEntry model and thus must not contain any slash characters. So update directory_entries_d hypothesis strategy to ensure such names are generated to avoid flaky tests. 07 December 2021, 10:34:39 UTC
37364c2 hashutil: Add support for md5 sum This makes it possible to compute the md5 sum through the hashutil.MultiHash class. Nevertheless, md5 is not put in the DEFAULT_ALGORITHMS set and must be explicitly requested by client code. Related to T2400 06 December 2021, 18:26:31 UTC
243520d test_hashutil: Port tests from unittest to pytest 06 December 2021, 18:26:31 UTC
8d96dfe model: Validate Directory entry names I don't know any instance of these, but there is no harm in checking them. 01 December 2021, 17:20:47 UTC
2ffe5db Give model and swhid objects a nicer repr() 1. hashes are now repr()ed as `hash_to_bytes("1234...")` instead of `b"\x12\x34..."` 2. SWHID objects are now repr()ed as `CoreSWHID.from_string('swh:1:...:1234...')` instead of `CoreSWHID(scheme='swh', version='1', object_type=..., object_id=b'\x12\x34')` 3. enums are now repr()ed as `MyEnum.NAME` instead of `<MyEnum.NAME: 'value'>` Thanks to these three changes, using repr() on a model object now prints a string that can be pasted directly in a `.py` file to write a new test case. 05 November 2021, 10:28:53 UTC
916627e type_validator: Re-allow subclasses The previous commit replaced attrs-strict's type validator with our own, stricter and faster, validator. However, the strictness can be a burden in other packages; for example, swh-storage tests rely on it to insert dummy data that raises exceptions when accessed, and it would be hard to do while using the exact expected type. This commit reverts the strict behavior but keeps the performance optimization: it always checks with type equality first, and if type equality fails (which would raise an error before this commit), it gives the value a 'second chance' by trying isinstance. This means that, outside tests, isinstance should not be used at all, or very rarely. 01 October 2021, 12:31:41 UTC
734b081 model: Replace attrs-strict with stricter validation This reimplements attrs_strict.type_validator(), using type equality instead of isinstance. This makes my checksum validation script (that mostly just instantiates model objects, computes a checksum, then discards them) run twice as fast. 28 September 2021, 09:51:15 UTC
e30eb7d persistent-identifiers.rst: Update references to manifest formats 24 September 2021, 10:08:47 UTC
7034b16 Add module-level docstrings. 23 September 2021, 18:22:32 UTC
4968b74 Move SWHID-related tests to test_swhids.py For consistency, as the classes are now in swhids.py 23 September 2021, 18:01:09 UTC
d5fd652 git_objects: Fix type annotations to accept the old dicts + raise warnings 23 September 2021, 15:13:38 UTC
f56becc Deprecate identifiers.py 1. Add a warning 2. Move identifier/manifest documentation to git_objects.py 3. Remove all imports of that module. Motivation: * SWHID classes were moved to swhids.py * manifest computation functions were moved to git_objects.py * Only reexports and trivial wrappers of model.py remain 23 September 2021, 14:52:31 UTC
df73e74 test_identifiers.py: Make sha1_git literals more consistent. 23 September 2021, 12:42:26 UTC
510df60 Remove identifier_to_bytes and identifier_to_hex They are not used anywhere. 23 September 2021, 12:42:26 UTC
9e8a547 Move manifest computation functions from identifiers.py to git_objects.py Since they are used by the vault for non-identifier-related stuff, I think it makes sense to move them to a new module. identifiers.py is now an empty shell, as all its features were moved to other modules and it only contains reexports and backward-compat functions. Therefore, it should be considered deprecated from now on. 23 September 2021, 12:42:26 UTC
57ae405 Refactor identifiers & model to make *_git_object() functions work on model classes instead of dicts Since we now use these classes everywhere, computing hashes required using to_dict() just to compute identifiers, which can be a performance bottleneck in code computing many checksums. 23 September 2021, 12:42:10 UTC
6a72f88 test_identifiers.py: Fix/update malformed data dicts A future commit will make identifier computation use the attrs classes, which are strict about what they accept. 23 September 2021, 09:08:36 UTC
9ec6832 Move SWHID classes and functions from identifiers.py to swhids.py identifiers.py initially worked only on bare sha1_git. I chose to add the SWHID classes in that module because of the name, but the SWHID code didn't actually interact with the other functions in the module, so it now feels out of place to me. 23 September 2021, 09:08:36 UTC
0dd33cd Add bazaar as supported revision type We're about to have a Bazaar loader 23 September 2021, 08:56:38 UTC
b6f5e30 HashableObject: Add type annotation for 'id' attribute This class is a mixin that only works with classes that define this attribute, so it makes sense to declare it. It also allows generic functions (that take a HashableObject parameter) to access it without 'type: ignore'. 16 September 2021, 12:09:15 UTC
e879497 Run Black 16 September 2021, 12:07:41 UTC
a32652f add a CVS revision type for use with the CVS loader 27 July 2021, 12:23:28 UTC
1545ef7 Add an extid_version field to ExtIDs This allows distinguishing multiple potential versions of the mapping between external objects and their counterparts archived in Software Heritage, for instance when a loader has a backwards-incompatible change that should result in objects being loaded again. The field defaults to zero, in which case it's backwards-compatible with the previous implementation in terms of identifier computation. 23 July 2021, 14:08:30 UTC
153c6e8 make deduplication optional when iterating over the merkle tree 02 July 2021, 09:51:55 UTC
cfb3073 hypothesis_strategies: generate non-ASCII IRIs for origin and authority 'urls'. We agreed a while ago they should be IRIs and not just URIs. This will trigger crashes in swh.storage.cassandra, as it currently expects (wrongly) that origin urls are ASCII. 25 June 2021, 15:24:40 UTC
4009b3a hypothesis_strategies: Restrict size and alphabets for metadata_fetchers and raw_extrinsic_metadata * empty fetcher name or version is not accepted by cassandra (and is nonsensical anyway) * ditto for non-ASCII (and any non-printable is nonsensical) * null bytes/chars are accepted by neither postgresql nor cassandra 25 June 2021, 15:08:16 UTC
90b477e hypothesis_strategies: Generate None metadata instead of {} This is the only value we should use from now on; and the value is ignored by swh-storage anyway. 25 June 2021, 14:13:37 UTC
baff6ba hypothesis_strategies: Add raw_extrinsic_metadata() strategy It will be used by swh-web. 25 June 2021, 09:22:32 UTC
e4566a6 from_disk: get swhid from Content/Directory objects Closes T3393 21 June 2021, 15:06:09 UTC
e09446a encode exclude patterns before extracting regex objects - add typing annotation to avoid such error in the future Fixes T3383 15 June 2021, 16:23:43 UTC
8ec0cf6 Add a TimestampWithTimezone.to_datetime() method 15 June 2021, 09:22:38 UTC
428c170 Fix normalize_timestamp() for datetime < epoch with microsecond>0 The problem was that, for datetime < epoch, the timestamp is negative; since it is a float that includes the microseconds, when both conditions hold (< epoch and microsecond > 0), the computed (int) timestamp was off by one. Add dedicated tests for this. 15 June 2021, 09:22:38 UTC
ae50e43 cli: add recursive option 11 June 2021, 14:25:24 UTC
8540c67 mypy: Fix errors with release >= v0.900 09 June 2021, 13:36:37 UTC
4808fc2 Fix snapshot entries in swh_model_data test data make sure the snapshot id in OriginVisitStatus refers to existing Snapshot objects. 19 May 2021, 13:21:18 UTC
96cc355 tox: Disable coverage in py3-minimal If run after py3-full, it overwrites the full coverage info, so most lines are incorrectly reported as uncovered in Phabricator diffs. 11 May 2021, 09:24:43 UTC
8c904dc identifiers: Expose git_object instead of manifest The git_object is what will be actually useful to the vault. It's also easier to test, because test_identifier.py has the entire git_object in its test data. 11 May 2021, 08:59:28 UTC
523ab64 identifiers: Expose manifest computation Before this commit, manifests were only computed internally before hashing, so they were not available to outside modules. This makes testing the module very painful, because identifier functions can only be tested by checking the hash; so test failures did not show mismatches between the computed manifest and the expected one. Additionally, the 'git bare cooker' of the vault is likely to use these as well, as it needs to format git objects in the same format. 11 May 2021, 08:33:36 UTC
31cb72e Blacklist attr 21.1.0 There is a regression that breaks attr.evolve() when updating attributes that contain an attr class; which we use (eg. for Person or TimestampWithTimezone). v21.2.0 is expected to fix the issue, but won't be released immediately: https://github.com/python-attrs/attrs/issues/804#issuecomment-833471190 06 May 2021, 12:18:22 UTC
df036ef docs/persistent-identifiers: Add guidelines for fixing invalid SWHIDs. 30 April 2021, 11:01:02 UTC
f7e9d5c tox: Add sphinx environments to check sane doc build This makes it possible to check that the package documentation can be built without producing sphinx warnings. The sphinx environment is designed to be used in continuous integration in order to prevent breaking the documentation build when committing changes. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258 28 April 2021, 12:01:35 UTC
446bd2b Fix swh_model_data hardcoded id values and add a test to keep them correct. 23 April 2021, 15:24:08 UTC
1f6b3b9 swh_model_data: add parents to test revision 15 April 2021, 14:37:01 UTC
8d0352c Fix various Sphinx warnings 15 April 2021, 08:20:39 UTC
74b024f identifiers: Fix parsing of SWHID qualifier value containing '=' According to the SWHID specification, it is not forbidden for a qualifier value to contain a '=' character (for instance in origin URL). So update parsing code to handle that special case. 13 April 2021, 12:02:33 UTC
15d5bab identifiers: Fix some invalid ValidationError template string formats Some ValidationError exceptions could not be serialized to string due to these format errors. Related to T3234 13 April 2021, 10:43:55 UTC
f2dba17 docs: Ask readers to install swh.model[cli] to fully use swh-identify Otherwise, they will get an error asking them to install Click (or Dulwich if Click installed and they use -t snapshot) 12 April 2021, 11:18:58 UTC
27a05d6 tox: Check swh-identify can run even if Dulwich isn't installed 12 April 2021, 11:18:58 UTC
c62f13f swh-identify: Hide tracebacks if Click or Dulwich is not installed And show nice human-readable errors instead 09 April 2021, 13:14:52 UTC
eeedac7 Remove accidental dependency of 'swh-identify' on swh-core Was added in be8f1a559d8209710a08ca48d93b7f513fa1c42f 08 April 2021, 14:16:42 UTC
9523be0 Model test data: add Release with no author/date Some releases don't have authors and date fields, this case should be checked in the tests. 26 March 2021, 11:05:51 UTC
af5e461 Truncate RawExtrinsicMetadata.discovery_date to a second This truncation is already enshrined at the identifier level. Truncate the object itself as well, to reduce the possibility of multiple different metadata objects with the same identifier. 18 March 2021, 09:57:02 UTC
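
Commits 428c170 and dbf185c both concern timestamps before 1970 that carry a fractional second. The off-by-one described in 428c170 can be reproduced with the standard library alone. The sketch below is not the swh-model code: `truncated_seconds` and `floored_seconds` are hypothetical helper names, and the floor-division fix is only one plausible way to obtain the right value.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def truncated_seconds(dt: datetime) -> int:
    """Hypothetical buggy variant: int() rounds the float toward zero."""
    return int(dt.timestamp())


def floored_seconds(dt: datetime) -> int:
    """Hypothetical fixed variant: floor division rounds toward -infinity."""
    return (dt - EPOCH) // timedelta(seconds=1)


# Half a second before the epoch, i.e. -0.5 s as a float timestamp.
before_epoch = datetime(1969, 12, 31, 23, 59, 59, 500000, tzinfo=timezone.utc)

# 0 s + 500000 us would mean +0.5 s, which is one second too late.
print(truncated_seconds(before_epoch))
# -1 s + 500000 us is -0.5 s, the intended instant.
print(floored_seconds(before_epoch))
```

The seconds field has to be floored rather than truncated because the microseconds field is always non-negative and is added on top of it.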
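
Commit 74b024f notes that a SWHID qualifier value may itself contain a '=' character, for instance inside an origin URL. A minimal sketch of that parsing rule follows. The function name `split_qualifiers` and the example identifier are illustrative assumptions and not part of the swh-model API; the only point shown is splitting each qualifier on its first '=' rather than on every one.

```python
from typing import Dict, Tuple


def split_qualifiers(swhid: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: split 'core;key=value;...' into its parts."""
    core, *parts = swhid.split(";")
    qualifiers: Dict[str, str] = {}
    for part in parts:
        # partition() splits on the first '=' only, so any further '='
        # characters stay inside the value.
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"qualifier without '=': {part!r}")
        qualifiers[key] = value
    return core, qualifiers


# Illustrative identifier with a made-up hash and origin URL.
example = (
    "swh:1:cnt:" + "0" * 40
    + ";origin=https://example.org/repo?ref=main;lines=5-10"
)
core, qualifiers = split_qualifiers(example)
print(core)
print(qualifiers["origin"])
```

Splitting with `part.split("=")` instead would break the origin value into three pieces, which is the kind of failure the commit message describes.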