09dcc04 | Antoine R. Dumont (@ardumont) | 06 August 2020, 17:00:21 UTC | model: Add final object_type field on metadata related model objects | 06 August 2020, 17:00:21 UTC |
37cdd84 | Antoine R. Dumont (@ardumont) | 06 August 2020, 16:43:30 UTC | setup.py: Really use the correct keyword Related to T2105 | 06 August 2020, 16:43:30 UTC |
3b2e6c0 | Antoine R. Dumont (@ardumont) | 06 August 2020, 16:13:47 UTC | setup.py: Use the correct keywords Related to T2105 | 06 August 2020, 16:13:47 UTC |
dab3d72 | Antoine R. Dumont (@ardumont) | 04 August 2020, 12:06:52 UTC | setup.py: Migrate from vcversion to setuptools-scm Related to T2105 | 04 August 2020, 12:06:52 UTC |
f9fc106 | Valentin Lorentz | 30 July 2020, 13:41:29 UTC | add ImmutableDict.__repr__ It can help in pytest's diffs | 30 July 2020, 13:41:29 UTC |
b58d901 | David Douard | 29 July 2020, 10:43:23 UTC | Fix incorrectly typed null constants in extra_headers byte strings | 29 July 2020, 10:46:18 UTC |
8f609e5 | David Douard | 29 July 2020, 10:42:03 UTC | Import Mapping from collections.abc instead of collections to fix a deprecation warning. | 29 July 2020, 10:46:09 UTC |
81f9fbc | David Douard | 29 July 2020, 10:32:34 UTC | Declare pytest markers to prevent warnings | 29 July 2020, 10:41:12 UTC |
3b2d72c | Valentin Lorentz | 20 July 2020, 09:35:27 UTC | Rename MetadataAuthorityType.DEPOSIT to MetadataAuthorityType.DEPOSIT_CLIENT. D3560 | 20 July 2020, 09:35:27 UTC |
bf43536 | Nicolas Dandrimont | 09 July 2020, 17:35:21 UTC | Rework dia -> pdf pipeline for inkscape 1.0 - Use dia directly to convert from .dia to .svg (inkscape would use dia via a plugin anyway) - Add proper runes to detect inkscape >= 1 and use the export options for that. | 09 July 2020, 17:35:21 UTC |
0547a51 | Antoine Lambert | 08 July 2020, 14:18:40 UTC | identifiers: Add to_dict method to SWHID class | 08 July 2020, 14:18:40 UTC |
52ef52e | Valentin Lorentz | 07 July 2020, 15:34:41 UTC | Use attr instead of NamedTuple to generate SWHID. As NamedTuple inherits from tuple, msgpack serializes it like a tuple, which makes it indistinguishable from a tuple when deserializing, which is an issue for the RPC API. | 07 July 2020, 15:34:41 UTC |
bea256e | Valentin Lorentz | 07 July 2020, 13:12:44 UTC | Make SWHID immutable and hashable. | 07 July 2020, 13:12:44 UTC |
06837d5 | Valentin Lorentz | 07 July 2020, 13:10:53 UTC | Implement ImmutableDict.__hash__. | 07 July 2020, 13:10:53 UTC |
c4dad17 | Valentin Lorentz | 07 July 2020, 13:04:54 UTC | Allow passing an ImmutableDict as argument to ImmutableDict's constructor. It allows easy conversion of Union[ImmutableDict, Dict] to ImmutableDict. | 07 July 2020, 13:04:54 UTC |
9e475a7 | Valentin Lorentz | 03 July 2020, 15:10:08 UTC | Implement to_dict and from_dict for metadata-related classes. | 07 July 2020, 11:31:05 UTC |
af0dd1a | Valentin Lorentz | 03 July 2020, 15:08:19 UTC | Add a new ImmutableDict class, and use it in model objects, so they are now truly immutable. | 07 July 2020, 11:31:05 UTC |
78fc5f7 | Valentin Lorentz | 02 July 2020, 15:53:13 UTC | Add raw metadata to the model. This will allow swh-storage to have a signature for *_metadata_add that is consistent with other *_add endpoints. | 07 July 2020, 09:48:19 UTC |
a7d9aca | David Douard | 01 July 2020, 13:13:59 UTC | Extract the extra_headers from metadata on the Revision model class Add a new extra_headers attribute on Revision and use it for computing the revision's id instead of extracting it from the metadata field. Only accept (bytes, bytes) as extra_headers. Add a post init hook to Revision to initialize this new attribute from given metadata, if any, for bw compat. Also amend the revisions_d hypothesis strategy to generate extra_headers. | 06 July 2020, 09:57:55 UTC |
1ff0516 | Antoine Lambert | 03 July 2020, 10:18:38 UTC | identifiers: Rename some functions and types related to SWHIDs When Software Heritage persistent identifiers were introduced, they were not yet abbreviated as SWHIDs. Now that the abbreviation is gaining adoption, rename some functions and types in swh.model.identifiers for consistency: - PersistentId -> SWHID - persistent_identifier -> swhid - parse_persistent_identifier -> parse_swhid Backward compatibility with previous naming is maintained but deprecation warnings are introduced to encourage the use of the new names. Numerous variables in swh.model codebase have also been renamed accordingly. Also rework and improve documentation. | 03 July 2020, 12:11:32 UTC |
8863b5c | Antoine R. Dumont (@ardumont) | 02 July 2020, 09:57:44 UTC | Refactor common loader behavior within from_disk.iter_directory | 02 July 2020, 13:09:50 UTC |
363b165 | Antoine R. Dumont (@ardumont) | 02 July 2020, 08:58:43 UTC | Unify object_type some more within the merkle and from_disk modules | 02 July 2020, 13:03:04 UTC |
40a40f5 | Antoine R. Dumont (@ardumont) | 22 June 2020, 16:15:32 UTC | model.OriginVisit: Drop obsolete fields Related to T2310 | 29 June 2020, 09:08:06 UTC |
e632abe | David Douard | 13 May 2020, 15:03:01 UTC | Tag model entities with their "object_type" this aims at preventing constant usage of isinstance() based dispatch code when writing generic code handling model entities. For example, the "object_type" argument of JournalWriter.write_addition() has become superfluous now that we only pass model entities, etc. This idea comes from olasd's reading of mypy doc: https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions This comes with a refactoring of from_disk.DiskBackedContent to make it *not* inherit from model.Content: object_type being Final, it cannot be overridden. | 24 June 2020, 15:39:02 UTC |
661b7c2 | Antoine R. Dumont (@ardumont) | 24 June 2020, 07:13:01 UTC | OriginVisitStatus: Allow "created" status Related to T2310 | 24 June 2020, 07:16:50 UTC |
636f8c2 | Antoine R. Dumont (@ardumont) | 23 June 2020, 15:29:53 UTC | model.OriginVisit: Make obsolete fields optional Related to T2310 | 23 June 2020, 15:29:53 UTC |
f349bdc | Antoine R. Dumont (@ardumont) | 22 June 2020, 08:14:30 UTC | swh.model.model.OriginVisit: Drop the dateutil.parser.parse use | 22 June 2020, 08:14:30 UTC |
ba0c4e1 | Antoine R. Dumont (@ardumont) | 16 June 2020, 17:10:53 UTC | model.hypothesis_strategies: Make metadata always None on origin_visit This is not used. This is broken storage-wise (origin-visit-add does not deal correctly with it and it so happens there is no test around it). And finally, this will soon go away with T2310. | 16 June 2020, 17:10:53 UTC |
f723eb1 | David Douard | 16 June 2020, 08:04:58 UTC | Fix the model: Revision.message can be None And adapt the revisions_d() strategy accordingly. | 16 June 2020, 08:35:02 UTC |
b70b281 | David Douard | 16 June 2020, 08:03:37 UTC | Fix message generation in hypothesis strategy releases_d() This can be None, according to the model. | 16 June 2020, 08:35:02 UTC |
5c5f34f | David Douard | 16 June 2020, 08:02:17 UTC | Use the optional() strategy instead of one_of(none(), ...) when possible for the sake of consistency. | 16 June 2020, 08:34:54 UTC |
a427e18 | David Douard | 11 June 2020, 15:00:50 UTC | Allow negative_utc to be None in normalize_timestamp(), and thus in TimestampWithTimezone.from_dict(). This is needed to help consume existing (invalid) messages from kafka. Warning: tests added in this revision do not cover the whole normalize_timestamp() function. | 15 June 2020, 07:40:43 UTC |
3d9f694 | David Douard | 25 May 2020, 11:17:00 UTC | Use Tuple instead of List in model declarations. This is a step towards making model objects, which are declared as frozen, truly immutable. This requires attrs_strict >= 0.0.7. | 03 June 2020, 09:32:05 UTC |
340656d | David Douard | 03 June 2020, 09:23:00 UTC | Fix origin_visit hypothesis strategies the visit attribute is expected to be strictly positive. | 03 June 2020, 09:23:00 UTC |
a95646f | David Douard | 29 May 2020, 15:14:31 UTC | Exclude [Skipped]Content.ctime from hash/eq computation this attribute is not an intrinsic property of a content object, so it should not be used when comparing or hashing. | 29 May 2020, 15:14:31 UTC |
29312df | David Douard | 19 May 2020, 14:04:30 UTC | Add support for model object anonymization Simply add a BaseModel.anonymize() method. Default implementation returns None, meaning the object is not anonymizable. For Person, the method returns a Person with a hashed fullname (and unset name and email). For Revision and Release, the method returns an anonymized version of the object, i.e. with instances of Person replaced by anonymized ones. | 20 May 2020, 14:28:01 UTC |
cce3036 | Stefano Zacchiroli | 14 May 2020, 13:47:46 UTC | SWHID spec: fix typos ";;" which made some examples fail | 14 May 2020, 13:47:46 UTC |
091498e | Valentin Lorentz | 05 May 2020, 10:03:39 UTC | Make aware_datetimes() generate only ISO8601-encodable datetimes. | 05 May 2020, 10:03:39 UTC |
9f5d266 | Stefano Zacchiroli | 30 April 2020, 16:58:54 UTC | SWHID spec: full reread Reviewers: rdicosmo Reviewed By: rdicosmo Differential Revision: https://forge.softwareheritage.org/D3108 | 30 April 2020, 17:06:41 UTC |
b80b135 | Stefano Zacchiroli | 29 April 2020, 16:32:31 UTC | setup.py: add documentation link | 29 April 2020, 16:32:31 UTC |
08fd228 | Valentin Lorentz | 29 April 2020, 11:02:22 UTC | hypothesis_strategies: Generate aware datetimes instead of naive ones. Production should only use aware datetimes. | 29 April 2020, 11:02:22 UTC |
0fad886 | Stefano Zacchiroli | 28 April 2020, 14:57:47 UTC | doc: check-in IANA registration template for the "swh" URI scheme Closes T1003 | 29 April 2020, 07:34:30 UTC |
8367eec | Roberto Di Cosmo | 28 April 2020, 18:47:50 UTC | Restructure SWHID documentation in preparation for T2385 - merge grammars into a single one - explain better that SWHIDs are made up of core identifier + qualifiers - separate qualifiers into context and fragment ones - add reference to swh-identify | 28 April 2020, 18:47:50 UTC |
f97d216 | Stefano Zacchiroli | 28 April 2020, 14:04:42 UTC | SWHID spec: bump version to 1.3 and add last modified date | 28 April 2020, 14:04:42 UTC |
d230938 | Stefano Zacchiroli | 28 April 2020, 14:04:19 UTC | SWHID spec: make SWHIDs plural where needed | 28 April 2020, 14:04:19 UTC |
1379385 | Stefano Zacchiroli | 27 April 2020, 13:17:50 UTC | SWHID spec: simplify and generalize escaping requirements | 27 April 2020, 13:17:50 UTC |
3ef4843 | Stefano Zacchiroli | 26 April 2020, 14:44:51 UTC | SWHID spec: add support for IRI Closes T2379 | 26 April 2020, 14:44:51 UTC |
56cf99a | Stefano Zacchiroli | 24 April 2020, 14:56:47 UTC | SWHID: deal with escaping in origin qualifiers | 24 April 2020, 14:56:47 UTC |
3f38808 | Stefano Zacchiroli | 24 April 2020, 08:11:41 UTC | SWHID doc: improve wording of intrinsic parts v. the rest | 24 April 2020, 08:11:45 UTC |
1037e88 | David Douard | 21 April 2020, 12:49:14 UTC | Add a split_content argument to object_dicts() and objects() strategies Make it possible to generate Content and SkippedContent under different object types (namely "content" and "skipped_content"). Default to False to keep backward compat. | 21 April 2020, 12:49:14 UTC |
ebd3807 | David Douard | 21 April 2020, 09:33:32 UTC | Add a blacklist_types argument to object_dicts() and objects() hypothesis strategies so one can choose not to generate some of the object types. Blacklist "origin_visit_status" by default to prevent breaking dependent packages' tests. | 21 April 2020, 12:48:33 UTC |
bfba3bd | Antoine R. Dumont (@ardumont) | 20 April 2020, 14:46:17 UTC | Fix hypothesis strategies alias for origin visit update objects | 20 April 2020, 15:37:56 UTC |
e5227e2 | Antoine R. Dumont (@ardumont) | 20 April 2020, 15:37:50 UTC | setup: Update the minimum required runtime python3 version Related to T2367 | 20 April 2020, 15:37:56 UTC |
d52549f | Stefano Zacchiroli | 17 April 2020, 15:42:16 UTC | CLI: add test for swh identify w/o args and use required=True to check that, as it is the preferred way | 17 April 2020, 15:42:16 UTC |
7b2cc1f | Stefano Zacchiroli | 17 April 2020, 15:25:03 UTC | CLI: require explicit "-" to identify via stdin | 17 April 2020, 15:25:03 UTC |
6ac6cb7 | Stefano Zacchiroli | 17 April 2020, 15:11:38 UTC | SWHID doc: fix minor grammar issue hat tip to @rdicosmo for noticing | 17 April 2020, 15:11:38 UTC |
098f76a | Stefano Zacchiroli | 17 April 2020, 14:42:46 UTC | SWHID doc: fix link in CISE paper reference | 17 April 2020, 14:42:46 UTC |
36f921b | Stefano Zacchiroli | 17 April 2020, 14:23:13 UTC | identifiers.py: reference to SWHIDs using explicit anchors | 17 April 2020, 14:23:13 UTC |
94242ca | Stefano Zacchiroli | 17 April 2020, 14:22:41 UTC | swh identify: embrace SWHID naming in user-facing doc/messages | 17 April 2020, 14:22:41 UTC |
4c78d47 | Stefano Zacchiroli | 17 April 2020, 14:22:11 UTC | PID doc: embrace the SWHID naming | 17 April 2020, 14:22:11 UTC |
0ab482e | Stefano Zacchiroli | 17 April 2020, 14:21:46 UTC | PID doc: add reference to CISE paper | 17 April 2020, 14:21:46 UTC |
2ae347d | Stefano Zacchiroli | 16 April 2020, 14:25:09 UTC | doc: document identify CLI | 16 April 2020, 14:25:14 UTC |
401bc17 | Antoine R. Dumont (@ardumont) | 10 April 2020, 08:43:20 UTC | model: Rename OriginVisitUpdate to OriginVisitStatus This also adapts the hypothesis strategies, using the plural form origin_visit_statuses. That plural form is acceptable because in our context, the statuses are countable. Related to T2310 | 10 April 2020, 08:43:20 UTC |
6f8c66c | Antoine R. Dumont (@ardumont) | 10 April 2020, 08:43:04 UTC | model: Black formatting | 10 April 2020, 08:43:04 UTC |
94da010 | David Douard | 08 April 2020, 20:16:56 UTC | Add a pyproject.toml file to target py37 for black | 08 April 2020, 20:16:56 UTC |
bf3f1ce | David Douard | 08 April 2020, 14:53:06 UTC | Enable black - blackify all the python files, - enable black in pre-commit, - add a black tox environment. | 08 April 2020, 14:53:06 UTC |
5d6883b | Daniele Serafini | 07 April 2020, 15:28:09 UTC | from_disk: path parameter to dir_filter functions | 08 April 2020, 09:31:22 UTC |
c7c1a57 | Antoine R. Dumont (@ardumont) | 01 April 2020, 14:28:50 UTC | docs/data-model: Update visits chapter definition Hinting at the origin_visit_update model Related to T2310 | 02 April 2020, 14:32:02 UTC |
64a7f62 | Antoine Lambert | 02 April 2020, 10:44:45 UTC | model: Make message field optional in Release model A release may have an empty message, for instance those derived from a Mercurial repository. So make that field optional to avoid type validation errors. | 02 April 2020, 12:00:30 UTC |
074c210 | Antoine Lambert | 01 April 2020, 21:43:44 UTC | hypothesis: Fix some issues in snapshots strategy and add tests Fix keyword parameters transmission to snapshots_d strategy. Ensure max_size constraint is respected when fixing snapshot aliases. | 02 April 2020, 09:45:59 UTC |
ca0f6a1 | David Douard | 23 March 2020, 09:31:03 UTC | model: add support for ctime in [Skipped]Content.from_[data,dict]() With support for str representation of date. Mostly for testing purposes. | 01 April 2020, 09:07:24 UTC |
414a655 | David Douard | 23 March 2020, 09:30:00 UTC | model: small code improvement of SkippedContent.from_dict | 01 April 2020, 09:07:24 UTC |
6ce0f71 | David Douard | 23 March 2020, 09:27:52 UTC | model: fix SkippedContent origin to be a str instead of a reference to an Origin entity. | 01 April 2020, 09:07:24 UTC |
f513271 | David Douard | 23 March 2020, 09:32:39 UTC | hypothesis: split hypothesis strategies as a dict + entity instance For each entity model `Model`, provide a `models_d` strategy that produces dicts suitable for use as argument to the `Model.from_dict` factory method, and reimplement the `models` strategy on top of the former. This is needed to help writing low level tests for model entities. | 01 April 2020, 09:07:24 UTC |
10b0699 | David Douard | 12 March 2020, 15:01:55 UTC | model: improve a bit the TimestampWithTimezone model - add a validator for negative_utc (can be True only if offset is 0), - update the timestamps_with_timezone hypothesis strategy, - add low-level tests for it. | 01 April 2020, 08:57:07 UTC |
ac9d4c8 | David Douard | 12 March 2020, 13:27:23 UTC | tests: add low level tests for the Timestamp model entity | 01 April 2020, 08:57:07 UTC |
85ca7d7 | David Douard | 20 March 2020, 11:59:56 UTC | model: use attrs_strict to enforce type validation of model objects This ensures all instantiated model entities have valid types for attributes. Related to T2308. | 01 April 2020, 08:57:07 UTC |
e9a4c75 | Antoine R. Dumont (@ardumont) | 25 March 2020, 17:00:03 UTC | model: Add new OriginVisitUpdate model object + test strategy (pairing with @vlorentz) Related to T2310 | 31 March 2020, 16:01:54 UTC |
accca60 | Roberto Di Cosmo | 30 March 2020, 12:13:59 UTC | Typo | 30 March 2020, 12:13:59 UTC |
b6e92ea | Roberto Di Cosmo | 30 March 2020, 12:11:55 UTC | Further clarifications in the PID extension | 30 March 2020, 12:11:55 UTC |
d14883e | Roberto Di Cosmo | 28 March 2020, 17:22:11 UTC | Clarify ambiguities in PID extensions | 28 March 2020, 17:22:11 UTC |
0767c81 | Roberto Di Cosmo | 28 March 2020, 14:16:04 UTC | Extend SWH PID definition with additional context qualifiers. | 28 March 2020, 14:16:04 UTC |
4a2233c | Antoine Pietri | 23 March 2020, 18:09:47 UTC | identifiers: encode origin URLs in utf-8 | 23 March 2020, 18:09:47 UTC |
97af886 | David Douard | 11 March 2020, 15:10:08 UTC | tests/identifiers: fix 'target', 'directory' and 'parents' object types These are expected to be bytes, not str. | 12 March 2020, 13:28:22 UTC |
56ae59c | David Douard | 11 March 2020, 14:41:49 UTC | test/model: do not test direct instantiation of model objects this does not work in the general case since there is no (recursive) conversion of the objects used to initialize model objects. We can only check when using the from_dict() factory. | 11 March 2020, 14:41:49 UTC |
c746960 | David Douard | 11 March 2020, 14:39:32 UTC | tests/models: use d.copy() instead of dict(d) for better clarity on the code author's intention. | 11 March 2020, 14:39:32 UTC |
f533f62 | David Douard | 11 March 2020, 14:01:23 UTC | model: kill Origin.type attribute it was still here for bw-compat but should not be necessary any more. | 11 March 2020, 14:01:23 UTC |
0a6d7e0 | David Douard | 11 March 2020, 12:15:32 UTC | Extract the dictify() function from BaseModel.to_dict() this function does not need to be a local function of the to_dict namespace. | 11 March 2020, 12:15:32 UTC |
a5a9f57 | Valentin Lorentz | 02 March 2020, 14:57:55 UTC | Add classmethod Person.from_address, to parse from 'name <email>' strings. This will allow deduplicating code across loaders. | 04 March 2020, 10:52:29 UTC |
5ccf8a8 | Nicolas Dandrimont | 02 March 2020, 13:03:58 UTC | Draw contents from a byte string instead of generating arbitrary hashes This generates more realistic contents and avoids spurious HashCollisions when generating a set of objects using these hypothesis strategies, at the cost of slightly worse "boundary checking" (i.e. we won't check contents with a length > 4096 bytes). | 02 March 2020, 15:22:59 UTC |
ded150d | Nicolas Dandrimont | 02 March 2020, 09:35:05 UTC | Add a method to generate Content/SkippedContent from binary data This lets us generate Content objects directly from a bytestring, with the proper set of hashes auto-generated from the contents. | 02 March 2020, 15:22:43 UTC |
cb075eb | Nicolas Dandrimont | 27 February 2020, 17:02:22 UTC | model.hypothesis: use the proper strategy name for building `Person`s | 27 February 2020, 17:03:18 UTC |
a9a42ea | Antoine R. Dumont (@ardumont) | 27 February 2020, 15:27:40 UTC | model.hypothesis: Fix person generation | 27 February 2020, 15:27:40 UTC |
f7f18a3 | Valentin Lorentz | 27 February 2020, 14:12:07 UTC | Make attributes name and email of Person optional. Required by loaders, when they can't parse the fullname. | 27 February 2020, 14:12:07 UTC |
750d147 | Valentin Lorentz | 27 February 2020, 13:33:07 UTC | Add from_datetime and from_iso8601 constructors for TimestampWithTimezone. Will be used by loaders. | 27 February 2020, 14:10:31 UTC |
9cf7a04 | Valentin Lorentz | 24 February 2020, 14:59:14 UTC | Add method MerkleNode.iter_tree, to visit all nodes in the subtree of a node. | 27 February 2020, 13:26:12 UTC |
c0ce38e | Valentin Lorentz | 24 February 2020, 15:00:14 UTC | Take the value of MerkleNode.data into account to compute equality. It just makes more sense that way, e.g. before this change, all leaves would be equal to each other. | 24 February 2020, 15:07:03 UTC |
6da524c | Valentin Lorentz | 20 February 2020, 15:50:23 UTC | Add to_model() method to from_disk.{Content,Directory}, to convert to canonical model objects. They will be used by loaders, so they can deal only with model objects, instead of having to do the same conversion themselves. This removes the `data` and `save_path` arguments of `from_file` and `from_disk`, as data loading is always deferred from now on. To access it, users are now expected to either open the data files themselves, or use `.to_model().with_data()`. | 24 February 2020, 15:06:24 UTC |
ad6a030 | Valentin Lorentz | 21 February 2020, 14:56:23 UTC | Fix tests of special devices. Regular files were created, as the 'mode' argument of os.mknod was missing. However, creating devices requires root, so we can't reasonably do that in tests. Instead, we use /dev/null rather than creating one. And while we're at it, let's also use /dev/zero (which, if not handled properly, will result in an infinite read). | 21 February 2020, 15:03:11 UTC |
4c070f9 | Valentin Lorentz | 20 February 2020, 15:46:56 UTC | Sort from_disk.Directory entries. It should be cheap enough to do it, and it makes tests easier. | 21 February 2020, 12:39:05 UTC |