https://github.com/SoftwareHeritage/swh-model

Revision Message Commit Date
bf4ab43 identifiers: Remove the deprecated SWHID class Other packages don't use it anymore. 03 March 2021, 09:44:48 UTC
1e924e8 cli: stop using the deprecated SWHID class 03 March 2021, 09:44:27 UTC
8e01199 Add CoreSWHID.to_extended() This is a useful shorthand when generating SWHIDs in tests. 01 March 2021, 09:06:58 UTC
31a8a0f RawExtrinsicMetadata: Use CoreSWHID instead of SWHID for contexts SWHID is deprecated; and CoreSWHID does not support qualifiers at all, so RawExtrinsicMetadata no longer needs to check there are no qualifiers. 01 March 2021, 09:06:58 UTC
752fb81 RawExtrinsicMetadata: Use ExtendedSWHID as target and remove type ExtendedSWHID can identify either a software artifact or an origin, so we no longer need Union[SWHID, str]. Therefore, we no longer need the 'type' attribute, as it was only used to tell when the target is a SWHID and when it's an origin URL. 01 March 2021, 09:06:58 UTC
256bca2 Add a swhid() method to all hashable objects. It can be handy as a shortcut to build SWHID objects. 01 March 2021, 09:06:58 UTC
24b653e Add test for all qualifiers' converters and validators 23 February 2021, 12:48:49 UTC
069b56a Escape semicolon in origin qualifiers. 23 February 2021, 12:48:39 UTC
710fb42 Add test checking SWHID_QUALIFIERS matches the attributes of QualifiedSWHID. 23 February 2021, 12:48:39 UTC
7dead5d Fix qualifier parsing and add tests * Quote/unquote path * Fix line parsing and serializing to properly handle None * Fix error raised by check_visit/check_anchor 23 February 2021, 12:48:39 UTC
172eadb Deduplicate parsing/unparsing tests of the new SWHID classes They were all very similar and only differ in what 'edge' cases they accept 23 February 2021, 12:48:29 UTC
9bcc884 Deduplicate code between CoreSWHID, QualifiedSWHID, and ExtendedSWHID by making them all derive from an abstract class. 23 February 2021, 12:26:27 UTC
d4b20dc Add new class ExtendedSWHID as an alternative to SWHID/QualifiedSWHID Following the discussion on T3034, we decided to replace SWHID with two or three classes: * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers) * CoreSWHID, for "core SWHID" only (standard types + no qualifiers) * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers) This commit adds the last one. It also removes "ori" as a valid object type for CoreSWHID and QualifiedSWHID, as it now only belongs in ExtendedSWHID. 23 February 2021, 12:26:22 UTC
9923765 Use dict instead of temporary SWHID when parsing {Core,Qualified}SWHID. It is cleaner, avoids warnings, and will be needed when introducing ExtendedSWHID in a future commit. 19 February 2021, 13:20:58 UTC
8e91759 QualifiedSWHID: Replace the 'qualifiers' dict with statically defined attributes And store their parsed values (CoreSWHID, tuple of ints, etc.) instead of string. 19 February 2021, 12:49:11 UTC
eba8d84 Add new class CoreSWHID as an alternative to SWHID/QualifiedSWHID Following the discussion on T3034, we decided to replace SWHID with two or three classes: * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers) * CoreSWHID, for "core SWHID" only (standard types + no qualifiers) * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers) This commit adds the second one 19 February 2021, 12:48:27 UTC
690b7f8 Add new class QualifiedSWHID to replace SWHID, and deprecate the latter. Following the discussion on T3034, we decided to replace SWHID with two or three classes: * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers) * CoreSWHID, for "core SWHID" only (standard types + no qualifiers) * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers) Since migrating from SWHID will break existing code, this commit uses the opportunity to modernize it a little, ie.: * `keyword`-only constructor, to get rid of the hacky default values for `object_type` and `object_id` * enum instead of strings for the object type * `bytes` instead of an hex string for the object id * rename `metadata` to `qualifiers` 19 February 2021, 12:47:25 UTC
758eb88 tests: Clean hashutil._blake2_hash_cache after mocking blake2 functions. Depending on the order in which tests are run, these tests may insert lambdas with mocked blake2 functions in their closure to be inserted in hashutil._blake2_hash_cache; causing all future tests to fail. While this does not happen with the default order of tests, it does when using pytest-xdist. 19 February 2021, 09:48:42 UTC
0c16581 Make explicit Python 3 dependency 02 February 2021, 12:52:16 UTC
1bfdf71 Update persistent identifiers doc with pip install info 29 January 2021, 23:53:27 UTC
cad940d Add swh-journal's model-related test data set in swh-model so it's kept up to date when the model evolves, thus preventing swh-journal and swh-model from being unnecessarily coupled. Related to T2970. 26 January 2021, 15:43:13 UTC
9af451f model: Allow new status values not_found and failed to OriginVisitStatus Related to T2961 20 January 2021, 13:00:33 UTC
1ca92a5 Add an optional type field on OriginVisitStatus object The optional nature of the type avoids migrating the (db) data model right now, while still letting us have this type field in kafka messages in the origin-visit-status topic Related to T2443 13 January 2021, 10:14:28 UTC
1d0c321 test_identifiers: Reorder SWHID tests. They were mixed in with snapshot tests. 12 January 2021, 11:24:35 UTC
731d10d test_identifiers: Make sure that {directory,revision,release,snapshot}_identifier() doesn't just return a value from the dict. For example, before this commit, you could replace the code of revision_identifier() with this: def release_identifier(release): return release.get("id", b"") and all tests would still pass. 12 January 2021, 11:17:30 UTC
18fde50 Add missing slots=True for Directory. 04 January 2021, 14:48:26 UTC
5746850 SWHID parsing: simplify and deduplicate validation logic Before this change there was a lot of overlap between parse_swhid() and the attrs-based validators in the SWHID class. Also, the validation implementation in parse_swhid() was done by hand. With this change the coarse-grained validation done by parse_swhid() is now delegated to a regex. The semantic validation of SWHIDs is left to attrs validators. The regex is also exposed as a module attribute, to be used by client code that wants to syntactically validate SWHIDs without necessarily instantiating SWHID classes (we have several other modules doing that already, and they are using slightly different hand-made regexes, which isn't great). As part of this change we also clean up the use of ValidationError exceptions, systematically passing the problematic parts of SWHID as arguments, and make error messages uniform. This change also brings some speed up in SWHID parsing. On a benchmark parsing ~30 M valid SWHIDs, the previous implementation took ~3:06 minutes, the new one ~2:50 minutes, or a ~9% speedup. Closes T2788 30 December 2020, 12:22:47 UTC
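To illustrate the regex-based syntactic check described in 5746850, here is a minimal sketch that covers core SWHIDs only (no qualifiers). The names `CORE_SWHID_RE` and `is_core_swhid` are hypothetical and are not the module attribute exposed by swh.model; the pattern is an assumption based on the published SWHID syntax (`swh:1:<type>:<40 hex digits>`).

```python
import re

# Hypothetical pattern for core SWHIDs (no qualifiers); not the regex
# shipped in swh.model.identifiers.
CORE_SWHID_RE = re.compile(
    r"swh:1:(?P<object_type>cnt|dir|rel|rev|snp):(?P<object_id>[0-9a-f]{40})"
)


def is_core_swhid(text: str) -> bool:
    """Return True if `text` is syntactically a core SWHID."""
    # fullmatch rejects leading/trailing characters, including whitespace.
    return CORE_SWHID_RE.fullmatch(text) is not None
```

A check like this only validates syntax; whether the object type and hash make sense together is a separate, semantic step.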
76b744e model: Make all classes slotted. Unfortunately, sphinx (actually, autodoc) only picks up attributes if they fall in any of these cases: 1. are enum variants 2. are in slots 3. are in __dict__ 4. have an annotation 5. are found using its custom parser (see get_object_members in sphinx/ext/autodoc/importer.py) In theory, option 5 should work for us; unfortunately, autodoc only asks the parser the list of members with a comment. And it's not easy to adapt it to ask the parser for all members, because said parser (sphinx/pycode/parser.py) does not return the class qualname (aka. namespace) for members without comments. So, as I don't want to change the interface of sphinx.pycode.parser, this commit switches to relying on option 2, by adding __slots__ for all attr classes. Additionally, this might have some performance/memory improvement (though I did not check) and will further avoid mutation of these objects. 15 December 2020, 13:10:04 UTC
a3b6a64 Drop backwards-compatibility support for RawExtrinsicMetadata.id All reverse dependencies have been updated to avoid using it now, so it can now be removed, paving the way to recycle it into an intrinsic identifier. 16 November 2020, 17:51:31 UTC
8e121bb model.identifiers: Fix one space too many in error message Related to T2729 12 November 2020, 11:59:51 UTC
559a283 identifiers.parse_swhid: Make SWHIDs with whitespaces invalid So parse_swhid raises a ValidationError when that is detected. Related to T2769 12 November 2020, 11:39:04 UTC
fb504b4 identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid Related to T2769 12 November 2020, 11:39:04 UTC
22c7c88 model.identifiers: Improve error messages in case of invalid SWHIDs Related to T2769 12 November 2020, 10:15:56 UTC
47946aa test: Migrate parse_swhid test cases to pytest Related to T2769 10 November 2020, 17:37:51 UTC
4e3fdc0 Throw fewer deprecation warnings for the RawExtrinsicMetadata id attr This avoids throwing the deprecation warning when id and target are present and have the same value, which makes a from_dict / to_dict round-trip throw no deprecation warnings. 27 October 2020, 14:38:26 UTC
9da17a5 Rename the RawExtrinsicMetadata id field to target This backwards-compatible change prepares the transition to give RawExtrinsicMetadata an `id` field that is computed intrinsically from its contents (using the HashableObject mixin). 26 October 2020, 15:57:49 UTC
498a107 Update the HashableObject interface to take the object itself This will enable a gradual enhancement of the functions in identifiers.py to take model objects directly, and return the bytes of the hash instead of an hex representation. 23 October 2020, 15:12:34 UTC
aa84d8d CONTRIBUTORS: Add Antoine Cezar 23 October 2020, 12:48:18 UTC
2b869aa swh identify: add --exclude 23 October 2020, 09:21:35 UTC
9224c8c Make revision/release identifiers explicitly the hash of a manifest This collapses the shared logic between these two identifier computations into a few more explicit steps: - generate data for the manifest (in either identifier computation); - format the manifest (in the new format_manifest function); - hash the manifest (in the new hash_manifest function). This will enable reusing this logic for more object types, as well as stronger typing for the manifest computation. 14 October 2020, 16:49:54 UTC
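The three steps named in 9224c8c (generate manifest data, format it, hash it) can be sketched as follows. The function names come from the commit message, but the signatures and the exact manifest layout here are assumptions, modelled on Git's object format (a `<type> <length>\0` header followed by the payload, hashed with SHA-1).

```python
import hashlib
from typing import Iterable, Optional, Tuple


def format_manifest(
    headers: Iterable[Tuple[bytes, bytes]], message: Optional[bytes] = None
) -> bytes:
    """Serialize (key, value) headers, then an optional message.

    Assumed layout: one "key value" line per header, a blank line,
    then the message, as in Git commit and tag objects.
    """
    manifest = b"".join(key + b" " + value + b"\n" for key, value in headers)
    if message is not None:
        manifest += b"\n" + message
    return manifest


def hash_manifest(object_type: str, manifest: bytes) -> bytes:
    """Hash a manifest Git-style and return the raw 20-byte SHA-1 digest."""
    header = f"{object_type} {len(manifest)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + manifest).digest()
```

Keeping formatting and hashing separate means a new object type only needs to supply its own header data.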
a251df2 Add a 'unique_key' method on model objects that returns a value suitable for unicity constraints. Motivation: * this is somewhat more of a model concern than a journal/kafka concern IMO * this is one step toward adding support for non-model objects in KafkaJournalWriter Implementation of the unique_key methods comes from `swh.journal.serializers.object_key`. 08 October 2020, 09:17:35 UTC
bdfde82 cli: make SWHIDParamType return SWHID type instead of string 05 October 2020, 08:29:06 UTC
fe3ec55 tox.ini: pin black to the pre-commit version (19.10b0) to avoid flip-flops 02 October 2020, 14:23:49 UTC
362ebf6 Merge the two test_identifiers.py files. I created one in the wrong directory and didn't see the existing one. 29 September 2020, 10:48:29 UTC
be8f1a5 Adapt cli declaration entrypoint to swh.core 0.3 The addition of '-p no:pytest_swh_core' in pytest.ini is needed to prevent pytest from loading the pytest_swh_core plugin which we do not need here and which would require some more dependencies (e.g. requests). 25 September 2020, 13:24:29 UTC
e0b4b94 model: remove deprecated and unused PID methods Use the new SWHID naming convention instead of SWH PID. 18 September 2020, 15:05:02 UTC
a273718 python: Reorder imports with isort Related to T2610 17 September 2020, 15:57:03 UTC
5bda22f pre-commit: Add isort hook and configuration Related to T2610 17 September 2020, 15:57:02 UTC
eff3c63 pre-commit: Update flake8 hook configuration flake8 hook has been removed from https://github.com/pre-commit/pre-commit-hooks so now use the one from https://gitlab.com/pycqa/flake8 17 September 2020, 11:56:04 UTC
7404486 cli: speedup the `swh` cli command startup time move most import statements in functions. Related to T2575. 10 September 2020, 14:19:18 UTC
12fe1f7 model: Fix "unused 'type: ignore' comment" error with mypy 0.782 25 August 2020, 13:32:51 UTC
c85990b Tell pytest not to recurse in dotdirs. pytest wastes a lot of time in .hypothesis and .git; this commit excludes them. 25 August 2020, 08:40:13 UTC
6dd6ace model: Raise error on naive datetimes. We may unknowingly pass naive datetimes to the storage through them, causing the underlying DB to assign them a timezone that might not match the actual one. It already happens in swh.model and swh.loader.package tests. 14 August 2020, 12:12:35 UTC
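A check of the kind 6dd6ace describes can be sketched with the standard library alone. The helper name `require_aware` and the choice of `ValueError` are assumptions made for illustration; the actual validator in swh.model may differ.

```python
import datetime


def require_aware(value: datetime.datetime) -> datetime.datetime:
    """Return `value` unchanged, or raise if it carries no usable timezone."""
    # A datetime is naive when tzinfo is missing or yields no UTC offset.
    if value.tzinfo is None or value.utcoffset() is None:
        raise ValueError("naive datetime: a timezone is required")
    return value
```

Rejecting naive values at construction time keeps the database from silently attaching its own default timezone.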
d1db7b9 model.Content.to_dict: Remove ctime entry when it's None Same as for the field data, it helps for code not yet migrated to use model object. 07 August 2020, 07:53:31 UTC
b1a16b1 model: Add Sha1 alias Related to T645 07 August 2020, 07:50:45 UTC
09dcc04 model: Add final object_type field on metadata related model objects 06 August 2020, 17:00:21 UTC
37cdd84 setup.py: Really use the correct keyword Related to T2105 06 August 2020, 16:43:30 UTC
3b2e6c0 setup.py: Use the correct keywords Related to T2105 06 August 2020, 16:13:47 UTC
dab3d72 setup.py: Migrate from vcversion from setuptools-scm Related to T2105 04 August 2020, 12:06:52 UTC
f9fc106 add ImmutableDict.__repr__ It can help in pytest's diffs 30 July 2020, 13:41:29 UTC
b58d901 Fix incorrectly typed null constants in extra_headers byte strings 29 July 2020, 10:46:18 UTC
8f609e5 Import Mapping from collections.abc instead of collections to fix a deprecation warning. 29 July 2020, 10:46:09 UTC
81f9fbc Declare pytest markers to prevent warnings 29 July 2020, 10:41:12 UTC
3b2d72c Rename MetadataAuthorityType.DEPOSIT to MetadataAuthorityType.DEPOSIT_CLIENT. D3560 20 July 2020, 09:35:27 UTC
bf43536 Rework dia -> pdf pipeline for inkscape 1.0 - Use dia directly to convert from .dia to .svg (inkscape would use dia via a plugin anyway) - Add proper runes to detect inkscape >= 1 and use the export options for that. 09 July 2020, 17:35:21 UTC
0547a51 identifiers: Add to_dict method to SWHID class 08 July 2020, 14:18:40 UTC
52ef52e Use attr instead of NamedTuple to generate SWHID. As NamedTuple inherits from tuple, msgpack serializes it like a tuple, which makes it indistinguishable from a tuple when deserializing, which is an issue for the RPC API. 07 July 2020, 15:34:41 UTC
bea256e Make SWHID immutable and hashable. 07 July 2020, 13:12:44 UTC
06837d5 Implement ImmutableDict.__hash__. 07 July 2020, 13:10:53 UTC
c4dad17 Allow passing an ImmutableDict as argument to ImmutableDict's constructor. It allows easy conversion of Union[ImmutableDict, Dict] to ImmutableDict. 07 July 2020, 13:04:54 UTC
9e475a7 Implement to_dict and from_dict for metadata-related classes. 07 July 2020, 11:31:05 UTC
af0dd1a Add a new ImmutableDict class, and use it in model objects. So they are truly immutable now. 07 July 2020, 11:31:05 UTC
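The ImmutableDict commits above (af0dd1a, c4dad17, 06837d5) describe a read-only mapping that can be built from another instance and hashed. The class below is a minimal sketch under those assumptions, not the swh.model implementation; the name `ImmutableDictSketch` and the tuple-of-items storage are illustrative choices.

```python
from collections.abc import Mapping


class ImmutableDictSketch(Mapping):
    """Read-only mapping backed by a tuple of (key, value) pairs."""

    def __init__(self, data=()):
        if isinstance(data, ImmutableDictSketch):
            # Reuse the existing tuple: it is never mutated.
            self._items = data._items
        elif isinstance(data, Mapping):
            self._items = tuple(data.items())
        else:
            self._items = tuple(data)

    def __getitem__(self, key):
        for k, v in self._items:
            if k == key:
                return v
        raise KeyError(key)

    def __iter__(self):
        return (k for k, _ in self._items)

    def __len__(self):
        return len(self._items)

    def __hash__(self):
        # Order-independent, to stay consistent with Mapping.__eq__;
        # requires hashable values.
        return hash(frozenset(self._items))

    def __repr__(self):
        return f"ImmutableDictSketch({dict(self._items)!r})"
```

Because no `__setitem__` is defined, item assignment raises `TypeError`. Lookups are linear in this sketch, which is acceptable only for small mappings.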
78fc5f7 Add raw metadata to the model. This will allow swh-storage to have a signature for *_metadata_add that is consistent with other *_add endpoints. 07 July 2020, 09:48:19 UTC
a7d9aca Extract the extra_headers from metadata on the Revision model class Add a new extra_headers attribute on Revision and use it for computing the revision's id instead of extracting it from the metadata field. Only accept (bytes, bytes) as extra_header. Add a post init hook to Revision to initialize this new attribute from given metadata, if any, for backward compatibility. Also amend the revision_d hypothesis strategy to generate extra_headers. 06 July 2020, 09:57:55 UTC
1ff0516 identifiers: Rename some functions and types related to SWHIDs When Software Heritage persistent identifiers were introduced, they were not yet abbreviated as SWHIDs. Now that abbreviation is growing adoption, rename some functions and types in swh.model.identifiers for consistency: - PersistentId -> SWHID - persistent_identifier -> swhid - parse_persistent_identifier -> parse_swhid Backward compatibility with previous naming is maintained but deprecation warnings are introduced to encourage the use of the new names. Numerous variables in swh.model codebase have also been renamed accordingly. Also rework and improve documentation. 03 July 2020, 12:11:32 UTC
8863b5c Refactor common loader behavior within from_disk.iter_directory 02 July 2020, 13:09:50 UTC
363b165 Unify object_type some more within the merkle and from_disk modules 02 July 2020, 13:03:04 UTC
40a40f5 model.OriginVisit: Drop obsolete fields Related to T2310 29 June 2020, 09:08:06 UTC
e632abe Tag model entities with their "object_type" This aims at preventing constant usage of isinstance()-based dispatch code when writing generic code handling model entities. For example, the "object_type" argument of JournalWriter.write_addition() has become superfluous now that we only pass model entities, etc. This idea comes from olasd's reading of the mypy doc: https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions This comes with a refactoring of from_dict.DiskBackedContent to make it *not* inherit from model.Content: object_type being Final, it cannot be overloaded. 24 June 2020, 15:39:02 UTC
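The tagged-union idea behind e632abe can be sketched as follows, assuming Python 3.8+ for `typing.Final`. The classes and the `describe` helper are hypothetical stand-ins for model entities, not the swh.model classes.

```python
from typing import Final, Union


class ContentSketch:
    # Final tells type checkers that subclasses may not override the tag.
    object_type: Final = "content"


class DirectorySketch:
    object_type: Final = "directory"


def describe(obj: Union[ContentSketch, DirectorySketch]) -> str:
    """Dispatch on the tag instead of a chain of isinstance() checks."""
    return f"model object of type {obj.object_type}"
```

Generic code can read `obj.object_type` directly, so callers no longer need to pass the type as a separate argument.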
661b7c2 OriginVisitStatus: Allow "created" status Related to T2310 24 June 2020, 07:16:50 UTC
636f8c2 model.OriginVisit: Make obsolete fields optional Related to T2310 23 June 2020, 15:29:53 UTC
f349bdc swh.model.model.OriginVisit: Drop the dateutil.parser.parse use 22 June 2020, 08:14:30 UTC
ba0c4e1 model.hypothesis_strategies: Make metadata always none on origin_visit This is not used. This is broken storage wise (origin-visit-add does not deal correctly with it and it so happens there is no test around it). And finally, this will soon go away with T2310. 16 June 2020, 17:10:53 UTC
f723eb1 Fix the model: Revision.message can be None And adapt the revisions_d() strategy accordingly. 16 June 2020, 08:35:02 UTC
b70b281 Fix message generation in hypothesis strategy releases_d() This can be None, according to the model. 16 June 2020, 08:35:02 UTC
5c5f34f Use the optional() strategy instead of one_of(none(), ...) when possible for the sake of consistency. 16 June 2020, 08:34:54 UTC
a427e18 Allow negative_utc to be None in normalize_timestamp() thus in TimestampWithTimezone.from_dict(). This is needed to help consuming existing (invalid) messages from kafka. Warning: tests added in this revision do not cover the whole normalize_timestamp() function. 15 June 2020, 07:40:43 UTC
3d9f694 Use Tuple instead of List in model declarations. This is a step forward having model objects, declared as frozen, immutable. This requires attrs_strict >= 0.0.7. 03 June 2020, 09:32:05 UTC
340656d Fix origin_visit hypothesis strategies the visit attribute is expected to be strictly positive. 03 June 2020, 09:23:00 UTC
a95646f Exclude [Skipped]Content.ctime from hash/eq computation this attribute is not an intrinsic property of a content object, so it should not be used when comparing or hashing. 29 May 2020, 15:14:31 UTC
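The effect of a95646f can be illustrated with the standard library's `dataclasses` in place of the attrs classes the model uses, so this is a substituted technique and not the project's code. The class name `ContentWithCtime` and its fields are hypothetical.

```python
import datetime
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ContentWithCtime:
    sha1: bytes
    length: int
    # compare=False removes ctime from both __eq__ and the generated __hash__.
    ctime: Optional[datetime.datetime] = field(default=None, compare=False)
```

Two objects that differ only in `ctime` then compare equal and collapse to a single entry in a set or dict.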
29312df Add support for model object anonymization Simply add a BaseModel.anonymize() method. Default implementation returns None, meaning the object is not anonymizable. For Person, the method returns a Person with hashed fullname (and unset name and email). For Revision and Release, the method returns an anonymized version of the object, i.e. with instances of Person replaced by anonymized ones. 20 May 2020, 14:28:01 UTC
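A rough sketch of the Person behaviour described in 29312df follows. The commit message does not say which hash is used, so SHA-256 is an assumption here, and `PersonSketch` is a hypothetical stand-in for the model class.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonSketch:
    fullname: bytes
    name: Optional[bytes] = None
    email: Optional[bytes] = None

    def anonymize(self) -> "PersonSketch":
        """Return a copy with a hashed fullname and no name or email."""
        # SHA-256 is an assumed choice; the source does not name the hash.
        return PersonSketch(fullname=hashlib.sha256(self.fullname).digest())
```

Hashing rather than blanking the fullname keeps distinct authors distinguishable after anonymization.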
cce3036 SWHID spec: fix typos ";;" which made some examples fail 14 May 2020, 13:47:46 UTC
091498e Make aware_datetimes() generate only ISO8601-encodable datetimes. 05 May 2020, 10:03:39 UTC
9f5d266 SWHID spec: full reread Reviewers: rdicosmo Reviewed By: rdicosmo Differential Revision: https://forge.softwareheritage.org/D3108 30 April 2020, 17:06:41 UTC
b80b135 setup.py: add documentation link 29 April 2020, 16:32:31 UTC
08fd228 hypothesis_strategies: Generate aware datetimes instead of naive ones. Production should only use aware datetimes. 29 April 2020, 11:02:22 UTC
0fad886 doc: check-in IANA registration template for the "swh" URI scheme Closes T1003 29 April 2020, 07:34:30 UTC
8367eec Restructure SWHID documentation in preparation for T2385 - merge grammars into a single one - explain better that SWHIDs are made up of core identifier + qualifiers - separate qualifiers into context and fragment ones - add reference to swh-identify 28 April 2020, 18:47:50 UTC
f97d216 SWHID spec: bump version to 1.3 and add last modified date 28 April 2020, 14:04:42 UTC
d230938 SWHID spec: make SWHIDs plural where needed 28 April 2020, 14:04:19 UTC