https://github.com/SoftwareHeritage/swh-model

Revision Message Commit Date
153c6e8 make deduplication optional when iterating over the merkle tree 02 July 2021, 09:51:55 UTC
cfb3073 hypothesis_strategies: generate non-ASCII IRIs for origin and authority 'urls'. We agreed a while ago they should be IRIs and not just URIs. This will trigger crashes in swh.storage.cassandra, as it currently expects (wrongly) that origin urls are ASCII. 25 June 2021, 15:24:40 UTC
4009b3a hypothesis_strategies: Restrict size and alphabets for metadata_fetchers and raw_extrinsic_metadata * empty fetcher name or version is not accepted by cassandra (and is nonsensical anyway) * ditto for non-ASCII (and any non-printable is nonsensical) * null bytes/chars are accepted by neither postgresql nor cassandra 25 June 2021, 15:08:16 UTC
90b477e hypothesis_strategies: Generate None metadata instead of {} This is the only value we should use from now on; and the value is ignored by swh-storage anyway. 25 June 2021, 14:13:37 UTC
baff6ba hypothesis_strategies: Add raw_extrinsic_metadata() strategy It will be used by swh-web. 25 June 2021, 09:22:32 UTC
e4566a6 from_disk: get swhid from Content/Directory objects Closes T3393 21 June 2021, 15:06:09 UTC
e09446a encode exclude patterns before extracting regex objects - add typing annotation to avoid such error in the future Fixes T3383 15 June 2021, 16:23:43 UTC
8ec0cf6 Add a TimestampWithTimezone.to_datetime() method 15 June 2021, 09:22:38 UTC
428c170 Fix normalize_timestamp() for datetime < epoch with microsecond > 0. The problem was that for datetime < epoch, the timestamp is negative, but since it's a float that includes the microseconds, if both are true (< epoch and microsecond > 0), then the computed (int) timestamp was off by one. Add dedicated tests for this. 15 June 2021, 09:22:38 UTC
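The off-by-one described in 428c170 can be reproduced with a small sketch. This is not the code of normalize_timestamp(); the helper names `truncating_split` and `flooring_split` are hypothetical and only the standard library is assumed.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def truncating_split(dt):
    """Buggy variant (hypothetical): int() rounds toward zero, so the
    seconds of a pre-epoch datetime with microseconds are one too high."""
    return int(dt.timestamp()), dt.microsecond


def flooring_split(dt):
    """Variant that floors, so seconds + microseconds / 1e6 is the offset."""
    delta = dt - EPOCH
    # timedelta keeps seconds and microseconds non-negative and lets only
    # `days` go negative, which gives floor semantics without float math.
    return delta.days * 86400 + delta.seconds, delta.microseconds


half_second_before_epoch = datetime(
    1969, 12, 31, 23, 59, 59, 500000, tzinfo=timezone.utc
)
print(truncating_split(half_second_before_epoch))  # (0, 500000), i.e. +0.5 s
print(flooring_split(half_second_before_epoch))    # (-1, 500000), i.e. -0.5 s
```

Working on the timedelta fields rather than on the float timestamp also avoids any floating-point rounding of the microseconds.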
ae50e43 cli: add recursive option 11 June 2021, 14:25:24 UTC
8540c67 mypy: Fix errors with release >= v0.900 09 June 2021, 13:36:37 UTC
4808fc2 Fix snapshot entries in swh_model_data test data make sure the snapshot id in OriginVisitStatus refers to existing Snapshot objects. 19 May 2021, 13:21:18 UTC
96cc355 tox: Disable coverage in py3-minimal If run after py3-full, it overwrites the full coverage info, so most lines are incorrectly reported as uncovered in Phabricator diffs. 11 May 2021, 09:24:43 UTC
8c904dc identifiers: Expose git_object instead of manifest The git_object is what will be actually useful to the vault. It's also easier to test, because test_identifier.py has the entire git_object in its test data. 11 May 2021, 08:59:28 UTC
523ab64 identifiers: Expose manifest computation Before this commit, manifests were only computed internally before hashing, so they were not available to outside modules. This makes testing the module very painful, because identifier functions can only be tested by checking the hash; so test failures did not show mismatches between the computed manifest and the expected one. Additionally, the 'git bare cooker' of the vault is likely to use these as well, as it needs to format git objects in the same format. 11 May 2021, 08:33:36 UTC
31cb72e Blacklist attr 21.1.0 There is a regression that breaks attr.evolve() when updating attributes that contain an attr class; which we use (eg. for Person or TimestampWithTimezone). v21.2.0 is expected to fix the issue, but won't be released immediately: https://github.com/python-attrs/attrs/issues/804#issuecomment-833471190 06 May 2021, 12:18:22 UTC
df036ef docs/persistent-identifiers: Add guidelines for fixing invalid SWHIDs. 30 April 2021, 11:01:02 UTC
f7e9d5c tox: Add sphinx environments to check sane doc build Make it possible to check that package documentation can be built without producing sphinx warnings. The sphinx environment is designed to be used in continuous integration in order to prevent breaking the documentation build when committing changes. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258 28 April 2021, 12:01:35 UTC
446bd2b Fix swh_model_data hardcoded id values and add a test to keep them correct. 23 April 2021, 15:24:08 UTC
1f6b3b9 swh_model_data: add parents to test revision 15 April 2021, 14:37:01 UTC
8d0352c Fix various Sphinx warnings 15 April 2021, 08:20:39 UTC
74b024f identifiers: Fix parsing of SWHID qualifier value containing '=' According to the SWHID specification, it is not forbidden for a qualifier value to contain a '=' character (for instance in origin URL). So update parsing code to handle that special case. 13 April 2021, 12:02:33 UTC
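A minimal sketch of the parsing rule from 74b024f: split each qualifier on the first '=' only, so that a value such as an origin URL may itself contain '='. The function name `parse_qualifier_string` and the sample URL are hypothetical; this is not the swh.model parser and it skips the quoting and validation the real code performs.

```python
def parse_qualifier_string(qualifier_string):
    """Parse 'key=value;key=value' into a dict (illustrative only)."""
    qualifiers = {}
    for part in qualifier_string.split(";"):
        if not part:
            continue
        # partition() splits on the FIRST '=' only; any later '=' stays
        # inside the value instead of producing extra fields.
        key, separator, value = part.partition("=")
        if not separator:
            raise ValueError(f"qualifier without '=': {part!r}")
        qualifiers[key] = value
    return qualifiers


parsed = parse_qualifier_string("origin=https://example.org/repo?id=42;lines=1-5")
print(parsed["origin"])  # https://example.org/repo?id=42
```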
15d5bab identifiers: Fix some invalid ValidationError template string formats Some ValidationError exceptions could not be serialized to string due to these format errors. Related to T3234 13 April 2021, 10:43:55 UTC
f2dba17 docs: Ask readers to install swh.model[cli] to fully use swh-identify Otherwise, they will get an error asking them to install Click (or Dulwich if Click installed and they use -t snapshot) 12 April 2021, 11:18:58 UTC
27a05d6 tox: Check swh-identify can run even if Dulwich isn't installed 12 April 2021, 11:18:58 UTC
c62f13f swh-identify: Hide tracebacks if Click or Dulwich is not installed And show nice human-readable errors instead 09 April 2021, 13:14:52 UTC
eeedac7 Remove accidental dependency of 'swh-identify' on swh-core Was added in be8f1a559d8209710a08ca48d93b7f513fa1c42f 08 April 2021, 14:16:42 UTC
9523be0 Model test data: add Release with no author/date Some releases don't have authors and date fields, this case should be checked in the tests. 26 March 2021, 11:05:51 UTC
af5e461 Truncate RawExtrinsicMetadata.discovery_date to a second This truncation is already enshrined at the identifier level. Truncate the object itself as well, to reduce the possibility of multiple different metadata objects with the same identifier. 18 March 2021, 09:57:02 UTC
975e989 model: Add a swhid() method to RawExtrinsicMetadata. All other hashable objects but ExtId have one. It will be used by swh-deposit. 12 March 2021, 12:58:37 UTC
71be461 Add an ExtID object this object aims at being able to keep in the SWH Archive an SWHID <-> External object ID map, e.g. to be able to keep track of Mercurial ids so the Mercurial loader can be made more efficient. Related to T2849. 10 March 2021, 09:39:28 UTC
fca3658 Fix MetadataAuthority.from_dict(), which was modifying the dict given as argument. 08 March 2021, 15:15:08 UTC
2185f93 model: Remove override of RawExtrinsicMetadata.unique_key(), so it now returns the hash. 04 March 2021, 14:22:13 UTC
4386330 identifiers: Properly define the behavior of raw_extrinsic_metadata on negative timestamps. The rounding algorithm wasn't specified 04 March 2021, 10:19:33 UTC
3ce4125 identifiers: Change the manifest format of raw_extrinsic_metadata to use integer instead of ISO8601 Serializing as ISO8601 makes the hash brittle, because the database may change the timezone silently and/or lose precision in the microseconds. As we do not need a precise timestamp, using an integer is good enough, and is consistent with the git format. The manifest also does not need to contain a timezone, as it only represents the timezone of the system that fetched this metadata, which is useless data. 04 March 2021, 10:19:33 UTC
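The brittleness argument in 3ce4125 can be illustrated with the standard library alone. In this sketch, `manifest_timestamp` is a hypothetical helper, and rounding toward negative infinity is an assumption; the log does not say which rounding rule was chosen for negative timestamps.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def manifest_timestamp(dt):
    """Whole seconds since the epoch (hypothetical helper).

    Dividing two timedeltas with // yields an int rounded toward negative
    infinity; that rounding rule is an assumption of this sketch.
    """
    return (dt - EPOCH) // timedelta(seconds=1)


fetched_utc = datetime(2021, 3, 4, 10, 19, 33, 123456, tzinfo=timezone.utc)
# The same instant, as a database might hand it back in another timezone.
fetched_shifted = fetched_utc.astimezone(timezone(timedelta(hours=1)))

# The ISO8601 strings differ, so a hash over them would differ too ...
print(fetched_utc.isoformat() == fetched_shifted.isoformat())  # False
# ... while the integer form is unaffected by timezone or lost microseconds.
print(manifest_timestamp(fetched_utc) == manifest_timestamp(fetched_shifted))  # True
```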
fc808e1 model: Add 'id' field to RawExtrinsicMetadata So that they can be properly deduplicated and referenced. 04 March 2021, 10:17:44 UTC
f6eab95 identifiers: Add raw_extrinsic_metadata_identifier This will be used to compute an intrinsic identifier for RawExtrinsicMetadata; which can be used for deduplication and referring to it like any other sha1_git instead of needing to use a tuple of its fields. 04 March 2021, 10:17:44 UTC
bf4ab43 identifiers: Remove the deprecated SWHID class Other packages don't use it anymore. 03 March 2021, 09:44:48 UTC
1e924e8 cli: stop using the deprecated SWHID class 03 March 2021, 09:44:27 UTC
8e01199 Add CoreSWHID.to_extended() This is a useful shorthand when generating SWHIDs in tests. 01 March 2021, 09:06:58 UTC
31a8a0f RawExtrinsicMetadata: Use CoreSWHID instead of SWHID for contexts SWHID is deprecated; and CoreSWHID does not support qualifiers at all, so RawExtrinsicMetadata no longer needs to check there are no qualifiers. 01 March 2021, 09:06:58 UTC
752fb81 RawExtrinsicMetadata: Use ExtendedSWHID as target and remove type ExtendedSWHID can identify either a software artifact or an origin, so we no longer need Union[SWHID, str]. Therefore, we no longer need the 'type' attribute, as it was only used to tell when the target is a SWHID and when it's an origin URL. 01 March 2021, 09:06:58 UTC
256bca2 Add a swhid() method to all hashable objects. It can be handy as a shortcut to build SWHID objects. 01 March 2021, 09:06:58 UTC
24b653e Add test for all qualifiers' converters and validators 23 February 2021, 12:48:49 UTC
069b56a Escape semicolon in origin qualifiers. 23 February 2021, 12:48:39 UTC
710fb42 Add test checking SWHID_QUALIFIERS matches the attributes of QualifiedSWHID. 23 February 2021, 12:48:39 UTC
7dead5d Fix qualifier parsing and add tests * Quote/unquote path * Fix line parsing and serializing to properly handle None * Fix error raised by check_visit/check_anchor 23 February 2021, 12:48:39 UTC
172eadb Deduplicate parsing/unparsing tests of the new SWHID classes They were all very similar and only differ in what 'edge' cases they accept 23 February 2021, 12:48:29 UTC
9bcc884 Deduplicate code between CoreSWHID, QualifiedSWHID, and ExtendedSWHID by making them all derive from an abstract class. 23 February 2021, 12:26:27 UTC
d4b20dc Add new class ExtendedSWHID as an alternative to SWHID/QualifiedSWHID Following the discussion on T3034, we decided to replace SWHID with two or three classes: * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers) * CoreSWHID, for "core SWHID" only (standard types + no qualifiers) * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers) This commit adds the last one. It also removes "ori" as a valid object type for CoreSWHID and QualifiedSWHID, as it now only belongs in ExtendedSWHID. 23 February 2021, 12:26:22 UTC
9923765 Use dict instead of temporary SWHID when parsing {Core,Qualified}SWHID. It is cleaner, avoids warnings, and will be needed when introducing ExtendedSWHID in a future commit. 19 February 2021, 13:20:58 UTC
8e91759 QualifiedSWHID: Replace the 'qualifiers' dict with statically defined attributes And store their parsed values (CoreSWHID, tuple of ints, etc.) instead of string. 19 February 2021, 12:49:11 UTC
eba8d84 Add new class CoreSWHID as an alternative to SWHID/QualifiedSWHID Following the discussion on T3034, we decided to replace SWHID with two or three classes: * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers) * CoreSWHID, for "core SWHID" only (standard types + no qualifiers) * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers) This commit adds the second one 19 February 2021, 12:48:27 UTC
690b7f8 Add new class QualifiedSWHID to replace SWHID, and deprecate the latter. Following the discussion on T3034, we decided to replace SWHID with two or three classes: * QualifiedSWHID to replace the existing SWHID (standard types + qualifiers) * CoreSWHID, for "core SWHID" only (standard types + no qualifiers) * ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers) Since migrating from SWHID will break existing code, this commit uses the opportunity to modernize it a little, ie.: * `keyword`-only constructor, to get rid of the hacky default values for `object_type` and `object_id` * enum instead of strings for the object type * `bytes` instead of an hex string for the object id * rename `metadata` to `qualifiers` 19 February 2021, 12:47:25 UTC
758eb88 tests: Clean hashutil._blake2_hash_cache after mocking blake2 functions. Depending on the order in which tests are run, these tests may insert lambdas with mocked blake2 functions in their closure to be inserted in hashutil._blake2_hash_cache; causing all future tests to fail. While this does not happen with the default order of tests, it does when using pytest-xdist. 19 February 2021, 09:48:42 UTC
0c16581 Make explicit Python 3 dependency 02 February 2021, 12:52:16 UTC
1bfdf71 Update persistent identifiers doc with pip install info 29 January 2021, 23:53:27 UTC
cad940d Add swh-journal's model-related test data set in swh-model so it's kept up to date when evolutions are made in the model and thus preventing swh-journal and swh-model from being unnecessarily coupled. Related to T2970. 26 January 2021, 15:43:13 UTC
9af451f model: Allow new status values not_found and failed to OriginVisitStatus Related to T2961 20 January 2021, 13:00:33 UTC
1ca92a5 Add an optional type field on OriginVisitStatus object The optional nature of the type avoids migrating the (db) data model right now, while still allowing this type field in kafka messages in the origin-visit-status topic. Related to T2443 13 January 2021, 10:14:28 UTC
1d0c321 test_identifiers: Reorder SWHID tests. They were mixed in with snapshot tests. 12 January 2021, 11:24:35 UTC
731d10d test_identifiers: Make sure that {directory,revision,release,snapshot}_identifier() doesn't just return a value from the dict. For example, before this commit, you could replace the code of revision_identifier() with this: def release_identifier(release): return release.get("id", b"") and all tests would still pass. 12 January 2021, 11:17:30 UTC
18fde50 Add missing slots=True for Directory. 04 January 2021, 14:48:26 UTC
5746850 SWHID parsing: simplify and deduplicate validation logic Before this change there was a lot of overlap between parse_swhid() and the attrs-based validators in the SWHID class. Also, the validation implementation in parse_swhid() was done by hand. With this change the coarse-grained validation done by parse_swhid() is now delegated to a regex. The semantic validation of SWHIDs is left to attrs validators. The regex is also exposed as a module attribute, to be used by client code that wants to syntactically validate SWHIDs without necessarily instantiating SWHID classes (we have several other modules doing that already, and they are using slightly different hand-made regexes, which isn't great). As part of this change we also clean up the use of ValidationError exceptions, systematically passing the problematic parts of SWHID as arguments, and make error messages uniform. This change also brings some speed up in SWHID parsing. On a benchmark parsing ~30 M valid SWHIDs, the previous implementation took ~3:06 minutes, the new one ~2:50 minutes, or a ~9% speedup. Closes T2788 30 December 2020, 12:22:47 UTC
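Coarse-grained syntactic validation by regex, as described in 5746850, might look like the sketch below. The pattern is written for this example from the core SWHID syntax (scheme version 1, five object types, 40 lowercase hex digits); it is not the regex exposed by swh.model, it ignores qualifiers, and the names `CORE_SWHID_SKETCH_RE` and `looks_like_core_swhid` are hypothetical.

```python
import re

# Hypothetical pattern covering core SWHIDs only (no qualifiers).
CORE_SWHID_SKETCH_RE = re.compile(
    r"swh:1:(?P<object_type>cnt|dir|rel|rev|snp):(?P<object_id>[0-9a-f]{40})"
)


def looks_like_core_swhid(text):
    """Syntactic check only; fullmatch() rejects leading/trailing junk."""
    return CORE_SWHID_SKETCH_RE.fullmatch(text) is not None


candidate = "swh:1:rev:" + "0" * 40
print(looks_like_core_swhid(candidate))        # True
print(looks_like_core_swhid(candidate + " "))  # False: whitespace is invalid
```

Keeping the regex purely syntactic leaves semantic checks to a later validation step, which is the split the commit message describes.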
76b744e model: Make all classes slotted. Unfortunately, sphinx (actually, autodoc) only picks up attributes if they fall in any of these cases: 1. are enum variants 2. are in slots 3. are in __dict__ 4. have an annotation 5. are found using its custom parser (see get_object_members in sphinx/ext/autodoc/importer.py) In theory, option 5 should work for us; unfortunately, autodoc only asks the parser the list of members with a comment. And it's not easy to adapt it to ask the parser for all members, because said parser (sphinx/pycode/parser.py) does not return the class qualname (aka. namespace) for members without comments. So, as I don't want to change the interface of sphinx.pycode.parser, this commit switches to relying on option 3, by adding __slots__ for all attr classes. Additionally, this might have some performance/memory improvement (though I did not check) and will further avoid mutation of these objects. 15 December 2020, 13:10:04 UTC
a3b6a64 Drop backwards-compatibility support for RawExtrinsicMetadata.id All reverse dependencies have been updated to avoid using it now, so it can now be removed, paving the way to recycle it into an intrinsic identifier. 16 November 2020, 17:51:31 UTC
8e121bb model.identifiers: Fix one space too many in error message Related to T2729 12 November 2020, 11:59:51 UTC
559a283 identifiers.parse_swhid: Make SWHIDs with whitespaces invalid So parse_swhid raises a ValidationError when that is detected. Related to T2769 12 November 2020, 11:39:04 UTC
fb504b4 identifiers.parse_swhid: Check the swhid qualifiers and fail if invalid Related to T2769 12 November 2020, 11:39:04 UTC
22c7c88 model.identifiers: Improve error messages in case of invalid SWHIDs Related to T2769 12 November 2020, 10:15:56 UTC
47946aa test: Migrate parse_swhid test cases to pytest Related to T2769 10 November 2020, 17:37:51 UTC
4e3fdc0 Throw fewer deprecation warnings for the RawExtrinsicMetadata id attr This avoids throwing the deprecation warning when id and target are present and have the same value, which makes a from_dict / to_dict round-trip throw no deprecation warnings. 27 October 2020, 14:38:26 UTC
9da17a5 Rename the RawExtrinsicMetadata id field to target This backwards-compatible change prepares the transition to give RawExtrinsicMetadata an `id` field that is computed intrinsically from its contents (using the HashableObject mixin). 26 October 2020, 15:57:49 UTC
498a107 Update the HashableObject interface to take the object itself This will enable a gradual enhancement of the functions in identifiers.py to take model objects directly, and return the bytes of the hash instead of an hex representation. 23 October 2020, 15:12:34 UTC
aa84d8d CONTRIBUTORS: Add Antoine Cezar 23 October 2020, 12:48:18 UTC
2b869aa swh identify: add --exclude 23 October 2020, 09:21:35 UTC
9224c8c Make revision/release identifiers explicitly the hash of a manifest This collapses the shared logic between these two identifier computations into a few more explicit steps: - generate data for the manifest (in either identifier computation); - format the manifest (in the new format_manifest function); - hash the manifest (in the new hash_manifest function). This will enable reusing this logic for more object types, as well as stronger typing for the manifest computation. 14 October 2020, 16:49:54 UTC
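The three steps listed in 9224c8c (generate the data, format the manifest, hash it) can be sketched as follows. The function names are prefixed with `sketch_` because their signatures are assumptions, not those of the format_manifest and hash_manifest functions the commit adds. The hashing step follows the git object convention of prefixing the payload with its type and length; multi-line header values, which git escapes with continuation lines, are not handled.

```python
import hashlib


def sketch_format_manifest(headers, message=None):
    """Join (key, value) byte pairs as 'key value' lines, then an optional
    message separated by a blank line (git commit/tag layout)."""
    manifest = b"".join(key + b" " + value + b"\n" for key, value in headers)
    if message is not None:
        manifest += b"\n" + message
    return manifest


def sketch_hash_manifest(object_type, manifest):
    """SHA1 over '<type> <length>\\0' followed by the manifest bytes."""
    header = f"{object_type} {len(manifest)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + manifest).digest()


# Hypothetical header data; a real revision would carry tree, parent,
# author and committer lines.
manifest = sketch_format_manifest([(b"tree", b"0" * 40)], b"initial commit\n")
print(sketch_hash_manifest("commit", manifest).hex())
```

Separating the formatting from the hashing is what lets tests compare the full manifest bytes instead of only the final digest.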
a251df2 Add a 'unique_key' method on model objects that returns a value suitable for unicity constraints. Motivation: * this is somewhat more of a model concern than a journal/kafka concern IMO * this is one step toward adding support for non-model objects in KafkaJournalWriter Implementation of the unique_key methods comes from `swh.journal.serializers.object_key`. 08 October 2020, 09:17:35 UTC
bdfde82 cli: make SWHIDParamType return SWHID type instead of string 05 October 2020, 08:29:06 UTC
fe3ec55 tox.ini: pin black to the pre-commit version (19.10b0) to avoid flip-flops 02 October 2020, 14:23:49 UTC
362ebf6 Merge the two test_identifiers.py files. I created one in the wrong directory and didn't see the existing one. 29 September 2020, 10:48:29 UTC
be8f1a5 Adapt cli declaration entrypoint to swh.core 0.3 The addition of '-p no:pytest_swh_core' in pytest.ini is needed to prevent pytest from loading the pytest_swh_core plugin which we do not need here and which would require some more dependencies (e.g. requests). 25 September 2020, 13:24:29 UTC
e0b4b94 model: remove deprecated and unused PID methods Use the new SWHID naming convention instead of SWH PID. 18 September 2020, 15:05:02 UTC
a273718 python: Reorder imports with isort Related to T2610 17 September 2020, 15:57:03 UTC
5bda22f pre-commit: Add isort hook and configuration Related to T2610 17 September 2020, 15:57:02 UTC
eff3c63 pre-commit: Update flake8 hook configuration flake8 hook has been removed from https://github.com/pre-commit/pre-commit-hooks so now use the one from https://gitlab.com/pycqa/flake8 17 September 2020, 11:56:04 UTC
7404486 cli: speedup the `swh` cli command startup time move most import statements in functions. Related to T2575. 10 September 2020, 14:19:18 UTC
12fe1f7 model: Fix "unused 'type: ignore' comment" error with mypy 0.782 25 August 2020, 13:32:51 UTC
c85990b Tell pytest not to recurse in dotdirs. pytest wastes a lot of time in .hypothesis and .git; this commit excludes them. 25 August 2020, 08:40:13 UTC
6dd6ace model: Raise error on naive datetimes. We may unknowingly pass naive datetimes to the storage through them, causing the underlying DB to assign them a timezone that might not match the actual one. It already happens in swh.model and swh.loader.package tests. 14 August 2020, 12:12:35 UTC
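A check of the kind described in 6dd6ace can be written with the standard library. The function name `require_aware` and the error message are hypothetical; the log does not say which exception type the model raises, so ValueError is an assumption here.

```python
from datetime import datetime, timezone


def require_aware(dt):
    """Return dt unchanged if it carries a UTC offset, else raise.

    A datetime is naive when tzinfo is missing or yields no offset.
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError(f"naive datetime not allowed: {dt!r}")
    return dt


aware = datetime(2020, 8, 14, 12, 12, 35, tzinfo=timezone.utc)
require_aware(aware)  # passes

try:
    require_aware(datetime(2020, 8, 14, 12, 12, 35))
except ValueError as error:
    print(error)
```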
d1db7b9 model.Content.to_dict: Remove ctime entry when it's None Same as for the field data, it helps for code not yet migrated to use model object. 07 August 2020, 07:53:31 UTC
b1a16b1 model: Add Sha1 alias Related to T645 07 August 2020, 07:50:45 UTC
09dcc04 model: Add final object_type field on metadata related model objects 06 August 2020, 17:00:21 UTC
37cdd84 setup.py: Really use the correct keyword Related to T2105 06 August 2020, 16:43:30 UTC
3b2e6c0 setup.py: Use the correct keywords Related to T2105 06 August 2020, 16:13:47 UTC
dab3d72 setup.py: Migrate from vcversion to setuptools-scm Related to T2105 04 August 2020, 12:06:52 UTC
f9fc106 add ImmutableDict.__repr__ It can help in pytest's diffs 30 July 2020, 13:41:29 UTC
b58d901 Fix incorrectly typed null constants in extra_headers byte strings 29 July 2020, 10:46:18 UTC
8f609e5 Import Mapping from collections.abc instead of collections to fix a deprecation warning. 29 July 2020, 10:46:09 UTC
81f9fbc Declare pytest markers to prevent warnings 29 July 2020, 10:41:12 UTC