https://github.com/SoftwareHeritage/swh-model

Revision Author Date Message Commit Date
18fde50 Add missing slots=True for Directory. 04 January 2021, 14:48:26 UTC
5746850 SWHID parsing: simplify and deduplicate validation logic Before this change there was a lot of overlap between parse_swhid() and the attrs-based validators in the SWHID class. Also, the validation implementation in parse_swhid() was done by hand. With this change the coarse-grained validation done by parse_swhid() is now delegated to a regex. The semantic validation of SWHIDs is left to attrs validators. The regex is also exposed as a module attribute, to be used by client code that wants to syntactically validate SWHIDs without necessarily instantiating SWHID classes (we have several other modules doing that already, and they are using slightly different hand-made regexes, which isn't great). As part of this change we also clean up the use of ValidationError exceptions, systematically passing the problematic parts of SWHID as arguments, and making error messages uniform. This change also brings some speedup in SWHID parsing. On a benchmark parsing ~30 M valid SWHIDs, the previous implementation took ~3:06 minutes, the new one ~2:50 minutes, or a ~9% speedup. Closes T2788 30 December 2020, 12:22:47 UTC
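To illustrate the kind of coarse-grained, regex-based syntactic check described in the commit above, here is a minimal sketch. The pattern and the name looks_like_swhid are assumptions made for illustration; they are not the regex or module attribute that swh.model exposes, and the qualifier syntax is deliberately simplified.

```python
import re

# Hypothetical pattern (an assumption, not the regex shipped in swh.model):
# scheme, version 1, object type, 40 lowercase hex digits, then optional
# ";key=value" qualifiers that contain neither ";" nor whitespace.
SWHID_SKETCH_RE = re.compile(
    r"swh:1:(?P<object_type>snp|rel|rev|dir|cnt):(?P<object_id>[0-9a-f]{40})"
    r"(?P<qualifiers>(?:;[a-z]+=[^;\s]+)*)"
)


def looks_like_swhid(text: str) -> bool:
    """Return True if the whole string matches the sketch pattern.

    This is a syntactic check only; semantic validation would be left
    to the model class, as the commit message describes.
    """
    return SWHID_SKETCH_RE.fullmatch(text) is not None
```

Because the check uses fullmatch, any leading, trailing, or embedded whitespace makes the string invalid without instantiating any class.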
76b744e model: Make all classes slotted. Unfortunately, sphinx (actually, autodoc) only picks up attributes if they fall in any of these cases: 1. are enum variants 2. are in slots 3. are in __dict__ 4. have an annotation 5. are found using its custom parser (see get_object_members in sphinx/ext/autodoc/importer.py) In theory, option 5 should work for us; unfortunately, autodoc asks the parser only for the list of members with a comment. And it's not easy to adapt it to ask the parser for all members, because said parser (sphinx/pycode/parser.py) does not return the class qualname (aka. namespace) for members without comments. So, as I don't want to change the interface of sphinx.pycode.parser, this commit switches to relying on option 2, by adding __slots__ for all attr classes. Additionally, this might have some performance/memory improvement (though I did not check) and will further avoid mutation of these objects. 15 December 2020, 13:10:04 UTC
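As a rough illustration of what slotted classes change at runtime, the sketch below uses plain Python classes rather than attrs, which is an assumption made to keep it self-contained. All class and function names are hypothetical.

```python
class DictBacked:
    """Ordinary class: instances carry a per-instance __dict__."""

    def __init__(self, name):
        self.name = name


class Slotted:
    """Slotted class: attributes are limited to the names in __slots__."""

    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name


def can_add_attribute(obj) -> bool:
    """Try to set an attribute that was never declared on the class."""
    try:
        obj.undeclared = 1
    except AttributeError:
        return False
    return True
```

A slotted instance has no __dict__, so undeclared attributes cannot be attached to it. Existing attributes can still be reassigned unless the class is also frozen.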
a3b6a64 Drop backwards-compatibility support for RawExtrinsicMetadata.id All reverse dependencies have been updated to avoid using it now, so it can now be removed, paving the way to recycle it into an intrinsic identifier. 16 November 2020, 17:51:31 UTC
8e121bb model.identifiers: Fix one space too many in error message Related to T2729 12 November 2020, 11:59:51 UTC
559a283 identifiers.parse_swhid: Make SWHIDs with whitespaces invalid So parse_swhid raises a ValidationError when that is detected. Related to T2769 12 November 2020, 11:39:04 UTC
fb504b4 identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid Related to T2769 12 November 2020, 11:39:04 UTC
22c7c88 model.identifiers: Improve error messages in case of invalid SWHIDs Related to T2769 12 November 2020, 10:15:56 UTC
47946aa test: Migrate parse_swhid test cases to pytest Related to T2769 10 November 2020, 17:37:51 UTC
4e3fdc0 Throw fewer deprecation warnings for the RawExtrinsicMetadata id attr This avoids throwing the deprecation warning when id and target are present and have the same value, which makes a from_dict / to_dict round-trip throw no deprecation warnings. 27 October 2020, 14:38:26 UTC
9da17a5 Rename the RawExtrinsicMetadata id field to target This backwards-compatible change prepares the transition to give RawExtrinsicMetadata an `id` field that is computed intrinsically from its contents (using the HashableObject mixin). 26 October 2020, 15:57:49 UTC
498a107 Update the HashableObject interface to take the object itself This will enable a gradual enhancement of the functions in identifiers.py to take model objects directly, and return the bytes of the hash instead of an hex representation. 23 October 2020, 15:12:34 UTC
aa84d8d CONTRIBUTORS: Add Antoine Cezar 23 October 2020, 12:48:18 UTC
2b869aa swh identify: add --exclude 23 October 2020, 09:21:35 UTC
9224c8c Make revision/release identifiers explicitly the hash of a manifest This collapses the shared logic between these two identifier computations into a few more explicit steps: - generate data for the manifest (in either identifier computation); - format the manifest (in the new format_manifest function); - hash the manifest (in the new hash_manifest function). This will enable reusing this logic for more object types, as well as stronger typing for the manifest computation. 14 October 2020, 16:49:54 UTC
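The three steps listed in the commit above (generate data, format the manifest, hash it) can be sketched as follows. This assumes a git-style object hash, that is, SHA-1 over a "<type> <length>\0" header followed by the manifest. The function names are hypothetical and are not the signatures of format_manifest or hash_manifest in swh.model.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def sketch_format_manifest(
    headers: Iterable[Tuple[bytes, bytes]], message: Optional[bytes] = None
) -> bytes:
    """Join "key value" header lines, then a blank line and the message."""
    lines = [key + b" " + value + b"\n" for key, value in headers]
    manifest = b"".join(lines)
    if message is not None:
        manifest += b"\n" + message
    return manifest


def sketch_hash_manifest(object_type: str, manifest: bytes) -> bytes:
    """Hash a manifest the way git hashes an object (an assumption here)."""
    header = object_type.encode("ascii") + b" " + str(len(manifest)).encode("ascii")
    return hashlib.sha1(header + b"\0" + manifest).digest()
```

Returning the raw digest bytes rather than a hex string keeps the hashing step independent of how the identifier is later displayed.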
a251df2 Add a 'unique_key' method on model objects that returns a value suitable for unicity constraints. Motivation: * this is somewhat more of a model concern than a journal/kafka concern IMO * this is one step toward adding support for non-model objects in KafkaJournalWriter Implementation of the unique_key methods comes from `swh.journal.serializers.object_key`. 08 October 2020, 09:17:35 UTC
bdfde82 cli: make SWHIDParamType return SWHID type instead of string 05 October 2020, 08:29:06 UTC
fe3ec55 tox.ini: pin black to the pre-commit version (19.10b0) to avoid flip-flops 02 October 2020, 14:23:49 UTC
362ebf6 Merge the two test_identifiers.py files. I created one in the wrong directory and didn't see the existing one. 29 September 2020, 10:48:29 UTC
be8f1a5 Adapt cli declaration entrypoint to swh.core 0.3 The addition of '-p no:pytest_swh_core' in pytest.ini is needed to prevent pytest from loading the pytest_swh_core plugin which we do not need here and which would require some more dependencies (e.g. requests). 25 September 2020, 13:24:29 UTC
e0b4b94 model: remove deprecated and unused PID methods Use the new SWHID naming convention instead of SWH PID. 18 September 2020, 15:05:02 UTC
a273718 python: Reorder imports with isort Related to T2610 17 September 2020, 15:57:03 UTC
5bda22f pre-commit: Add isort hook and configuration Related to T2610 17 September 2020, 15:57:02 UTC
eff3c63 pre-commit: Update flake8 hook configuration flake8 hook has been removed from https://github.com/pre-commit/pre-commit-hooks so now use the one from https://gitlab.com/pycqa/flake8 17 September 2020, 11:56:04 UTC
7404486 cli: speed up the `swh` cli command startup time Move most import statements into functions. Related to T2575. 10 September 2020, 14:19:18 UTC
12fe1f7 model: Fix "unused 'type: ignore' comment" error with mypy 0.782 25 August 2020, 13:32:51 UTC
c85990b Tell pytest not to recurse in dotdirs. pytest wastes a lot of time in .hypothesis and .git; this commit excludes them. 25 August 2020, 08:40:13 UTC
6dd6ace model: Raise error on naive datetimes. We may unknowingly pass naive datetimes to the storage through them, causing the underlying DB to assign them a timezone that might not match the actual one. It already happens in swh.model and swh.loader.package tests. 14 August 2020, 12:12:35 UTC
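A minimal sketch of rejecting naive datetimes, using only the standard library. The function name require_aware is hypothetical, and this is not the validator used in swh.model.

```python
import datetime


def require_aware(value: datetime.datetime) -> datetime.datetime:
    """Return the datetime unchanged, or raise if it carries no UTC offset."""
    # A datetime is naive when it has no tzinfo, or when its tzinfo
    # cannot provide an offset for it.
    if value.tzinfo is None or value.utcoffset() is None:
        raise ValueError(f"naive datetime not allowed: {value!r}")
    return value
```

Failing early this way avoids leaving a downstream database to guess the timezone.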
d1db7b9 model.Content.to_dict: Remove ctime entry when it's None Same as for the field data, it helps for code not yet migrated to use model object. 07 August 2020, 07:53:31 UTC
b1a16b1 model: Add Sha1 alias Related to T645 07 August 2020, 07:50:45 UTC
09dcc04 model: Add final object_type field on metadata related model objects 06 August 2020, 17:00:21 UTC
37cdd84 setup.py: Really use the correct keyword Related to T2105 06 August 2020, 16:43:30 UTC
3b2e6c0 setup.py: Use the correct keywords Related to T2105 06 August 2020, 16:13:47 UTC
dab3d72 setup.py: Migrate from vcversioner to setuptools-scm Related to T2105 04 August 2020, 12:06:52 UTC
f9fc106 add ImmutableDict.__repr__ It can help in pytest's diffs 30 July 2020, 13:41:29 UTC
b58d901 Fix incorrectly typed null constants in extra_headers byte strings 29 July 2020, 10:46:18 UTC
8f609e5 Import Mapping from collections.abc instead of collections to fix a deprecation warning. 29 July 2020, 10:46:09 UTC
81f9fbc Declare pytest markers to prevent warnings 29 July 2020, 10:41:12 UTC
3b2d72c Rename MetadataAuthorityType.DEPOSIT to MetadataAuthorityType.DEPOSIT_CLIENT. D3560 20 July 2020, 09:35:27 UTC
bf43536 Rework dia -> pdf pipeline for inkscape 1.0 - Use dia directly to convert from .dia to .svg (inkscape would use dia via a plugin anyway) - Add proper runes to detect inkscape >= 1 and use the export options for that. 09 July 2020, 17:35:21 UTC
0547a51 identifiers: Add to_dict method to SWHID class 08 July 2020, 14:18:40 UTC
52ef52e Use attr instead of NamedTuple to generate SWHID. As NamedTuple inherits from tuple, msgpack serializes it like a tuple, which makes it indistinguishable from a tuple when deserializing, which is an issue for the RPC API. 07 July 2020, 15:34:41 UTC
bea256e Make SWHID immutable and hashable. 07 July 2020, 13:12:44 UTC
06837d5 Implement ImmutableDict.__hash__. 07 July 2020, 13:10:53 UTC
c4dad17 Allow passing an ImmutableDict as argument to ImmutableDict's constructor. It allows easy conversion of Union[ImmutableDict, Dict] to ImmutableDict. 07 July 2020, 13:04:54 UTC
9e475a7 Implement to_dict and from_dict for metadata-related classes. 07 July 2020, 11:31:05 UTC
af0dd1a Add a new ImmutableDict class, and use it in model objects. So they are truly immutable now. 07 July 2020, 11:31:05 UTC
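The ImmutableDict commits in this part of the log (the new class, accepting an existing instance in the constructor, and __hash__) suggest a read-only, hashable mapping. The sketch below shows one way to build such a type. The class name FrozenMappingSketch and its internal layout are assumptions, not the swh.model implementation.

```python
from collections.abc import Mapping


class FrozenMappingSketch(Mapping):
    """Read-only mapping backed by a tuple of (key, value) pairs."""

    def __init__(self, data=()):
        if isinstance(data, FrozenMappingSketch):
            # Reuse the already-immutable storage of another instance.
            self._items = data._items
        else:
            self._items = tuple(dict(data).items())

    def __getitem__(self, key):
        for candidate, value in self._items:
            if candidate == key:
                return value
        raise KeyError(key)

    def __iter__(self):
        return (key for key, _ in self._items)

    def __len__(self):
        return len(self._items)

    def __hash__(self):
        # Order-independent, so equal mappings hash equally;
        # this requires the values to be hashable too.
        return hash(frozenset(self._items))

    def __repr__(self):
        return f"FrozenMappingSketch({dict(self._items)!r})"
```

Since Mapping defines no __setitem__, item assignment on an instance raises TypeError, and equality is inherited from Mapping.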
78fc5f7 Add raw metadata to the model. This will allow swh-storage to have a signature for *_metadata_add that is consistent with other *_add endpoints. 07 July 2020, 09:48:19 UTC
a7d9aca Extract the extra_headers from metadata on the Revision model class Add a new extra_headers attribute on Revision and use it for computing the revision's id instead of extracting it from the metadata field. Only accept (bytes, bytes) as extra_header. Add a post init hook to Revision to initialize this new attribute from given metadata, if any, for bw compat. Also amend the revision_d hypothesis strategy to generate extra_headers. 06 July 2020, 09:57:55 UTC
1ff0516 identifiers: Rename some functions and types related to SWHIDs When Software Heritage persistent identifiers were introduced, they were not yet abbreviated as SWHIDs. Now that the abbreviation is gaining adoption, rename some functions and types in swh.model.identifiers for consistency: - PersistentId -> SWHID - persistent_identifier -> swhid - parse_persistent_identifier -> parse_swhid Backward compatibility with previous naming is maintained but deprecation warnings are introduced to encourage the use of the new names. Numerous variables in swh.model codebase have also been renamed accordingly. Also rework and improve documentation. 03 July 2020, 12:11:32 UTC
8863b5c Refactor common loader behavior within from_disk.iter_directory 02 July 2020, 13:09:50 UTC
363b165 Unify object_type some more within the merkle and from_disk modules 02 July 2020, 13:03:04 UTC
40a40f5 model.OriginVisit: Drop obsolete fields Related to T2310 29 June 2020, 09:08:06 UTC
e632abe Tag model entities with their "object_type" this aims at preventing constant usage of isinstance() based dispatch code when writing generic code handling model entities. For example, the "object_type" argument of JournalWriter.write_addition() has become superfluous now we only pass model entities, etc. This idea comes from olasd's reading of mypy doc: https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions This comes with a refactoring of from_disk.DiskBackedContent to make it *not* inherit from model.Content: object_type being Final, it cannot be overloaded. 24 June 2020, 15:39:02 UTC
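The tagged-union idea referenced in the commit above can be sketched with a constant object_type tag on each class, so that generic code dispatches on the tag instead of using isinstance(). The classes and the describe function are hypothetical and much simpler than the real model entities.

```python
from typing import Final, Union


class ContentSketch:
    object_type: Final = "content"  # constant tag shared by all instances

    def __init__(self, length: int):
        self.length = length


class DirectorySketch:
    object_type: Final = "directory"

    def __init__(self, entries):
        self.entries = tuple(entries)


def describe(obj: Union[ContentSketch, DirectorySketch]) -> str:
    """Dispatch on the tag rather than on the concrete class."""
    if obj.object_type == "content":
        return f"content of {obj.length} bytes"
    if obj.object_type == "directory":
        return f"directory with {len(obj.entries)} entries"
    raise ValueError(f"unknown object_type: {obj.object_type!r}")
```

With the tag available on the object itself, a caller no longer needs to pass the object type as a separate argument.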
661b7c2 OriginVisitStatus: Allow "created" status Related to T2310 24 June 2020, 07:16:50 UTC
636f8c2 model.OriginVisit: Make obsolete fields optional Related to T2310 23 June 2020, 15:29:53 UTC
f349bdc swh.model.model.OriginVisit: Drop the dateutil.parser.parse use 22 June 2020, 08:14:30 UTC
ba0c4e1 model.hypothesis_strategies: Make metadata always none on origin_visit This is not used. This is broken storage wise (origin-visit-add does not deal correctly with it and it so happens there is no test around it). And finally, this will soon go away with T2310. 16 June 2020, 17:10:53 UTC
f723eb1 Fix the model: Revision.message can be None And adapt the revisions_d() strategy accordingly. 16 June 2020, 08:35:02 UTC
b70b281 Fix message generation in hypothesis strategy releases_d() This can be None, according to the model. 16 June 2020, 08:35:02 UTC
5c5f34f Use the optional() strategy instead of one_of(none(), ...) when possible for the sake of consistency. 16 June 2020, 08:34:54 UTC
a427e18 Allow negative_utc to be None in normalize_timestamp() thus in TimestampWithTimezone.from_dict(). This is needed to help consuming existing (invalid) messages from kafka. Warning: tests added in this revision do not cover the whole normalize_timestamp() function. 15 June 2020, 07:40:43 UTC
3d9f694 Use Tuple instead of List in model declarations. This is a step forward having model objects, declared as frozen, immutable. This requires attrs_strict >= 0.0.7. 03 June 2020, 09:32:05 UTC
340656d Fix origin_visit hypothesis strategies the visit attribute is expected to be strictly positive. 03 June 2020, 09:23:00 UTC
a95646f Exclude [Skipped]Content.ctime from hash/eq computation this attribute is not an intrinsic property of a content object, so it should not be used when comparing or hashing. 29 May 2020, 15:14:31 UTC
29312df Add support for model object anonymization Simply add a BaseModel.anonymize() method. Default implementation returns None, meaning the object is not anonymizable. For Person, the method returns a Person with hashed fullname (and unset name and email). For Revision and Release, the method returns an anonymized version of the object, i.e. with instances of Person replaced by anonymized ones. 20 May 2020, 14:28:01 UTC
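The anonymization behaviour described in the commit above can be sketched with small frozen dataclasses. The class names are hypothetical, and the use of SHA-256 for the hashed fullname is an assumption, since the commit message does not name the hash function.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Optional


class BaseSketch:
    def anonymize(self):
        """Default: None, meaning the object is not anonymizable."""
        return None


@dataclass(frozen=True)
class PersonSketch(BaseSketch):
    fullname: bytes
    name: Optional[bytes] = None
    email: Optional[bytes] = None

    def anonymize(self) -> "PersonSketch":
        # Keep only a hash of the fullname; name and email are left unset.
        return PersonSketch(fullname=hashlib.sha256(self.fullname).digest())


@dataclass(frozen=True)
class RevisionSketch(BaseSketch):
    message: bytes
    author: PersonSketch
    committer: PersonSketch

    def anonymize(self) -> "RevisionSketch":
        # Copy the object, swapping each person for its anonymized version.
        return replace(
            self,
            author=self.author.anonymize(),
            committer=self.committer.anonymize(),
        )
```

Because the objects are frozen, anonymize() returns new instances and leaves the originals untouched.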
cce3036 SWHID spec: fix typos ";;" which made some examples fail 14 May 2020, 13:47:46 UTC
091498e Make aware_datetimes() generate only ISO8601-encodable datetimes. 05 May 2020, 10:03:39 UTC
9f5d266 SWHID spec: full reread Reviewers: rdicosmo Reviewed By: rdicosmo Differential Revision: https://forge.softwareheritage.org/D3108 30 April 2020, 17:06:41 UTC
b80b135 setup.py: add documentation link 29 April 2020, 16:32:31 UTC
08fd228 hypothesis_strategies: Generate aware datetimes instead of naive ones. Production should only use aware datetimes. 29 April 2020, 11:02:22 UTC
0fad886 doc: check-in IANA registration template for the "swh" URI scheme Closes T1003 29 April 2020, 07:34:30 UTC
8367eec Restructure SWHID documentation in preparation for T2385 - merge grammars into a single one - explain better that SWHIDs are made up of core identifier + qualifiers - separate qualifier into context and fragment ones - add reference to swh-identify 28 April 2020, 18:47:50 UTC
f97d216 SWHID spec: bump version to 1.3 and add last modified date 28 April 2020, 14:04:42 UTC
d230938 SWHID spec: make SWHIDs plural where needed 28 April 2020, 14:04:19 UTC
1379385 SWHID spec: simplify and generalize escaping requirements 27 April 2020, 13:17:50 UTC
3ef4843 SWHID spec: add support for IRI Closes T2379 26 April 2020, 14:44:51 UTC
56cf99a SWHID: deal with escaping in origin qualifiers 24 April 2020, 14:56:47 UTC
3f38808 SWHID doc: improve wording of intrinsic parts v. the rest 24 April 2020, 08:11:45 UTC
1037e88 Add a split_content argument to object_dicts() and objects() strategies Make it possible to generate Content and SkippedContent under different object types (namely "content" and "skipped_content"). Default to False to keep backward compat. 21 April 2020, 12:49:14 UTC
ebd3807 Add a blacklist_types argument to object_dicts() and objects() hypothesis strategies so one can choose not to generate some of the object types. Blacklist "origin_visit_status" by default to prevent breaking dependent packages' tests. 21 April 2020, 12:48:33 UTC
bfba3bd Fix hypothesis strategies alias for origin visit update objects 20 April 2020, 15:37:56 UTC
e5227e2 setup: Update the minimum required runtime python3 version Related to T2367 20 April 2020, 15:37:56 UTC
d52549f CLI: add test for swh identify w/o args and use required=True to check that, as it is the preferred way 17 April 2020, 15:42:16 UTC
7b2cc1f CLI: require explicit "-" to identify via stdin 17 April 2020, 15:25:03 UTC
6ac6cb7 SWHID doc: fix minor grammar issue hat tip to @rdicosmo for noticing 17 April 2020, 15:11:38 UTC
098f76a SWHID doc: fix link in CISE paper reference 17 April 2020, 14:42:46 UTC
36f921b identifiers.py: reference to SWHIDs using explicit anchors 17 April 2020, 14:23:13 UTC
94242ca swh identify: embrace SWHID naming in user-facing doc/messages 17 April 2020, 14:22:41 UTC
4c78d47 PID doc: embrace the SWHID naming 17 April 2020, 14:22:11 UTC
0ab482e PID doc: add reference to CISE paper 17 April 2020, 14:21:46 UTC
2ae347d doc: document identify CLI 16 April 2020, 14:25:14 UTC
401bc17 model: Rename OriginVisitUpdate to OriginVisitStatus This also adapts the hypothesis strategies, using the plural form origin_visit_statuses. That plural form is acceptable because in our context, the statuses are countable. Related to T2310 10 April 2020, 08:43:20 UTC
6f8c66c model: Black formatting 10 April 2020, 08:43:04 UTC
94da010 Add a pyproject.toml file to target py37 for black 08 April 2020, 20:16:56 UTC
bf3f1ce Enable black - blackify all the python files, - enable black in pre-commit, - add a black tox environment. 08 April 2020, 14:53:06 UTC
5d6883b from_disk: path parameter to dir_filter functions 08 April 2020, 09:31:22 UTC
c7c1a57 docs/data-model: Update visits chapter definition Hinting at the origin_visit_update model Related to T2310 02 April 2020, 14:32:02 UTC
64a7f62 model: Make message field optional in Release model A release may have an empty message, for instance those derived from a Mercurial repository. So make that field optional to avoid type validation errors. 02 April 2020, 12:00:30 UTC
074c210 hypothesis: Fix some issues in snapshots strategy and add tests Fix keyword parameters transmission to snapshots_d strategy. Ensure max_size constraint is respected when fixing snapshot aliases. 02 April 2020, 09:45:59 UTC