97af886 | David Douard | 11 March 2020, 15:10:08 UTC | tests/identifiers: fix 'target', 'directory' and 'parents' object types. These are expected to be bytes, not str. | 12 March 2020, 13:28:22 UTC |
56ae59c | David Douard | 11 March 2020, 14:41:49 UTC | test/model: do not test direct instantiation of model objects. This does not work in the general case since there is no (recursive) conversion of objects used for model object initialization. We can only check when using the from_dict() factory. | 11 March 2020, 14:41:49 UTC |
c746960 | David Douard | 11 March 2020, 14:39:32 UTC | tests/models: use d.copy() instead of dict(d) for better clarity on the code author's intention. | 11 March 2020, 14:39:32 UTC |
f533f62 | David Douard | 11 March 2020, 14:01:23 UTC | model: kill Origin.type attribute. It was still here for bw-compat but should not be necessary any more. | 11 March 2020, 14:01:23 UTC |
0a6d7e0 | David Douard | 11 March 2020, 12:15:32 UTC | Extract the dictify() function from BaseModel.to_dict(). This function does not need to be a local function of the to_dict namespace. | 11 March 2020, 12:15:32 UTC |
a5a9f57 | Valentin Lorentz | 02 March 2020, 14:57:55 UTC | Add classmethod Person.from_address, to parse from 'name <email>' strings. This will allow deduplicating code across loaders. | 04 March 2020, 10:52:29 UTC |
5ccf8a8 | Nicolas Dandrimont | 02 March 2020, 13:03:58 UTC | Draw contents from a byte string instead of generating arbitrary hashes. This generates more realistic contents and avoids spurious HashCollisions when generating a set of objects using these hypothesis strategies, at the cost of slightly worse "boundary checking" (i.e. we won't check contents with a length > 4096 bytes). | 02 March 2020, 15:22:59 UTC |
ded150d | Nicolas Dandrimont | 02 March 2020, 09:35:05 UTC | Add a method to generate Content/SkippedContent from binary data. This lets us generate Content objects directly from a bytestring, with the proper set of hashes auto-generated from the contents. | 02 March 2020, 15:22:43 UTC |
cb075eb | Nicolas Dandrimont | 27 February 2020, 17:02:22 UTC | model.hypothesis: use the proper strategy name for building `Person`s | 27 February 2020, 17:03:18 UTC |
a9a42ea | Antoine R. Dumont (@ardumont) | 27 February 2020, 15:27:40 UTC | model.hypothesis: Fix person generation | 27 February 2020, 15:27:40 UTC |
f7f18a3 | Valentin Lorentz | 27 February 2020, 14:12:07 UTC | Make attributes name and email of Person optional. Required by loaders, when they can't parse the fullname. | 27 February 2020, 14:12:07 UTC |
750d147 | Valentin Lorentz | 27 February 2020, 13:33:07 UTC | Add from_datetime and from_iso8601 constructors for TimestampWithTimezone. Will be used by loaders. | 27 February 2020, 14:10:31 UTC |
9cf7a04 | Valentin Lorentz | 24 February 2020, 14:59:14 UTC | Add method MerkleNode.iter_tree, to visit all nodes in the subtree of a node. | 27 February 2020, 13:26:12 UTC |
c0ce38e | Valentin Lorentz | 24 February 2020, 15:00:14 UTC | Take the value of MerkleNode.data into account to compute equality. It just makes more sense that way. E.g., before this change, all leaves would be equal to each other. | 24 February 2020, 15:07:03 UTC |
6da524c | Valentin Lorentz | 20 February 2020, 15:50:23 UTC | Add to_model() method to from_disk.{Content,Directory}, to convert to canonical model objects. They will be used by loaders, so they can deal only with model objects, instead of having to do the same conversion themselves. This removes the `data` and `save_path` arguments of `from_file` and `from_disk`, as data loading is always deferred from now on. To access it, users are now expected to either open the data files themselves, or use `.to_model().with_data()`. | 24 February 2020, 15:06:24 UTC |
ad6a030 | Valentin Lorentz | 21 February 2020, 14:56:23 UTC | Fix tests of special devices. Regular files were created, as the 'mode' argument of os.mknod was missing. However, creating devices requires root, so we can't reasonably do that in tests. Instead of creating a device, we now use /dev/null. And while we're at it, let's also use /dev/zero (which, if not handled properly, will result in an infinite read). | 21 February 2020, 15:03:11 UTC |
4c070f9 | Valentin Lorentz | 20 February 2020, 15:46:56 UTC | Sort from_disk.Directory entries. It should be cheap enough to do it, and it makes tests easier. | 21 February 2020, 12:39:05 UTC |
e109241 | Valentin Lorentz | 21 February 2020, 10:45:53 UTC | Add test for git directory entries order. | 21 February 2020, 12:39:05 UTC |
60c3aa1 | Valentin Lorentz | 20 February 2020, 15:42:48 UTC | Add support for skipping large contents in from_disk. It will be useful to loaders, as they currently load the entire content in memory before deciding to skip it. | 21 February 2020, 12:39:05 UTC |
a5b5818 | Nicolas Dandrimont | 18 February 2020, 11:03:29 UTC | Re-introduce the swh.core dependency in swh.model[cli]. This allows the `swh identify` command to work again. Close T2288. | 18 February 2020, 11:03:29 UTC |
2c1e02b | Valentin Lorentz | 14 February 2020, 17:07:44 UTC | Add method BaseModel.hashes(). Can be useful to deduplicate code in swh-storage. | 14 February 2020, 17:07:44 UTC |
fcfbd4d | Valentin Lorentz | 07 February 2020, 15:52:58 UTC | Make OriginVisit.snapshot optional. It already is in practice. | 07 February 2020, 15:53:03 UTC |
73053a6 | Valentin Lorentz | 07 February 2020, 15:08:19 UTC | Make 'visible' the default status for present Contents. | 07 February 2020, 15:08:19 UTC |
05b89f2 | Valentin Lorentz | 07 February 2020, 15:02:04 UTC | Make content length mandatory. The current postgresql model refuses NULL values. | 07 February 2020, 15:02:04 UTC |
8ebbd21 | Valentin Lorentz | 04 February 2020, 14:49:31 UTC | Split Content class into two classes, for missing and non-missing contents. | 05 February 2020, 10:26:41 UTC |
4b779e1 | Antoine R. Dumont (@ardumont) | 30 January 2020, 15:42:21 UTC | test_model: Simplify and align model checks | 30 January 2020, 15:43:50 UTC |
b54adf7 | Antoine R. Dumont (@ardumont) | 30 January 2020, 14:38:44 UTC | model: Update revision date types to be optional. Related to P589. | 30 January 2020, 15:37:07 UTC |
57a0e08 | David Douard | 29 January 2020, 13:55:15 UTC | cli: add support for reading a file content from stdin in 'swh identify' command. This allows for example to type: curl -s https://archive.softwareheritage.org/browse/content/sha1_git:64582b78792cd6c2d67d35da5a11bb80886a6409/raw/ | swh identify swh:1:cnt:64582b78792cd6c2d67d35da5a11bb80886a6409 - | 29 January 2020, 14:22:37 UTC |
16e4425 | Antoine Lambert | 17 January 2020, 17:25:40 UTC | model: Fix sphinx warning. Related to T2188. | 17 January 2020, 17:25:40 UTC |
7b6f474 | Antoine Lambert | 02 December 2019, 18:56:37 UTC | hypothesis_strategies/snapshots: Explain last post-processing step | 02 December 2019, 18:56:37 UTC |
4e4c4ff | Antoine Lambert | 28 November 2019, 16:01:48 UTC | model: Add automatic object identifier computation support. Add support for automatically computing the identifier in the following object models: Directory, Release, Revision, Snapshot. If the identifier is not provided as a parameter, it will be computed when the model is initialized. | 29 November 2019, 14:51:35 UTC |
da64756 | Antoine Lambert | 28 November 2019, 15:48:44 UTC | identifiers: Fix release_identifier for snapshot target | 29 November 2019, 13:00:53 UTC |
276a528 | David Douard | 21 November 2019, 12:33:54 UTC | Add a pre-commit config file | 21 November 2019, 12:47:13 UTC |
c8ee973 | Nicolas Dandrimont | 21 November 2019, 12:46:11 UTC | Migrate tox.ini to extras = xxx instead of deps = .[testing] | 21 November 2019, 12:46:11 UTC |
3b26bf3 | Nicolas Dandrimont | 21 November 2019, 12:45:48 UTC | De-specify testenv:py3. Allows us to call tests on things other than python3, with the same settings. | 21 November 2019, 12:45:48 UTC |
1ff9e52 | Nicolas Dandrimont | 20 November 2019, 18:42:40 UTC | Include all requirements in MANIFEST.in | 20 November 2019, 18:51:46 UTC |
67e78eb | Stefano Zacchiroli | 20 November 2019, 11:56:50 UTC | PID doc: drop mention of ori PIDs. They will remain for internal use only, so they should not be mentioned in this public spec. This reverts the documentation part of 67fade5f674a57fd8845ad57161a86a2d898d197. | 20 November 2019, 11:56:51 UTC |
0b9c5be | Valentin Lorentz | 30 October 2019, 13:36:25 UTC | Make OriginVisit.origin a string instead of a dict. | 30 October 2019, 13:36:25 UTC |
b064a0b | David Douard | 29 October 2019, 12:47:26 UTC | Add a test data generator module. It currently provides mainly 2 generators: gen_origins() and gen_contents(). | 29 October 2019, 15:43:15 UTC |
7564596 | David Douard | 29 October 2019, 13:33:57 UTC | model: make model entities frozen. We do not really need them to be mutable, and their instances are now hashable, so we can add them to a set() for example. | 29 October 2019, 13:47:13 UTC |
4b79a2b | Nicolas Dandrimont | 23 October 2019, 11:55:18 UTC | model: make to_dict() properly recursive. Instead of relying on attr.asdict recursion, we do recursion ourselves. This simplifies a lot of the inherited to_dict() methods. | 23 October 2019, 12:22:15 UTC |
2e4558c | Stefano Zacchiroli | 20 October 2019, 07:38:54 UTC | test_cli.py: fill in valid snapshot ID | 20 October 2019, 19:23:53 UTC |
8e3ee39 | Stefano Zacchiroli | 20 October 2019, 07:38:07 UTC | test_cli.py: drop unused NoQA marker | 20 October 2019, 19:23:53 UTC |
4a74205 | Stefano Zacchiroli | 08 October 2019, 07:52:58 UTC | swh identify -t snapshot: add support for symbolic refs | 20 October 2019, 19:23:53 UTC |
b2c21d3 | Nicolas Dandrimont | 17 October 2019, 12:42:35 UTC | Don't export origin_visit['origin']['type'] | 18 October 2019, 11:21:43 UTC |
7a9fc39 | Nicolas Dandrimont | 17 October 2019, 12:42:35 UTC | model: Don't export origin['type'] | 18 October 2019, 09:55:40 UTC |
c103a6f | Antoine R. Dumont (@ardumont) | 09 October 2019, 13:16:02 UTC | tox.ini: Fix py3 environment to use packaged tests. Related D2082. | 09 October 2019, 13:16:42 UTC |
131298c | Stefano Zacchiroli | 06 October 2019, 18:10:38 UTC | swh.model: document how origin PIDs are computed | 06 October 2019, 18:10:38 UTC |
298c942 | Stefano Zacchiroli | 04 October 2019, 17:12:57 UTC | CONTRIBUTORS: add @DanSeraf | 04 October 2019, 17:12:57 UTC |
e7cf550 | Stefano Zacchiroli | 04 October 2019, 17:10:35 UTC | Merge branch 'arcpatch-D2058' | 04 October 2019, 17:10:35 UTC |
375832f | Daniele Serafini | 04 October 2019, 17:08:16 UTC | PID: move validation checks to PersistentId constructor ... from test_persistent_identifier. Closes T1986 | 04 October 2019, 17:10:24 UTC |
6e7c3da | Stefano Zacchiroli | 01 October 2019, 15:58:26 UTC | mypy.ini: remove left-over "false positive" comment from dulwich exclude | 01 October 2019, 15:58:26 UTC |
b2d8bbf | Stefano Zacchiroli | 01 October 2019, 15:56:57 UTC | setup.py: move CLI dependencies to a dedicated swh-model[cli] subpackage. It is now possible to install swh-model without dulwich (and Click, FWIW). Users who want to use "swh identify" should "pip install swh-model[cli]". | 01 October 2019, 15:56:57 UTC |
a9af3e7 | Stefano Zacchiroli | 01 October 2019, 14:19:32 UTC | swh identify: add support to compute snapshot PIDs of on-disk git repo | 01 October 2019, 14:19:32 UTC |
febe800 | Stefano Zacchiroli | 28 September 2019, 11:28:12 UTC | tox: anticipate mypy run to just after flake8 | 28 September 2019, 11:28:12 UTC |
340b001 | Stefano Zacchiroli | 27 September 2019, 07:12:30 UTC | __init__.py: switch to documented way of extending path. Makes mypy 0.730 pass cleanly again. | 27 September 2019, 07:12:30 UTC |
1295f45 | Stefano Zacchiroli | 20 September 2019, 13:49:10 UTC | MANIFEST.in: ship py.typed | 20 September 2019, 13:49:10 UTC |
70e5d50 | Stefano Zacchiroli | 20 September 2019, 10:13:58 UTC | identifiers.py: do not inherit from on-the-fly namedtuple | 20 September 2019, 10:13:58 UTC |
54c6642 | Stefano Zacchiroli | 20 September 2019, 10:05:34 UTC | mypy: ignore django-stubs, needed only by hypothesis | 20 September 2019, 10:05:34 UTC |
267ffee | Stefano Zacchiroli | 20 September 2019, 09:28:21 UTC | mypy.ini: remove left-over sample section | 20 September 2019, 09:28:21 UTC |
491dcc5 | Stefano Zacchiroli | 12 September 2019, 12:46:05 UTC | typing: minimal changes to make a no-op mypy run pass | 20 September 2019, 09:13:35 UTC |
d70b486 | Stefano Zacchiroli | 15 September 2019, 08:51:20 UTC | fix indentation and spelling: make "make check" happy | 15 September 2019, 08:51:46 UTC |
e77c94d | Valentin Lorentz | 04 September 2019, 12:36:01 UTC | Fix Revision.from_dict to allow optional fields. | 04 September 2019, 12:36:01 UTC |
e99a5f2 | Valentin Lorentz | 04 September 2019, 12:35:20 UTC | Make Origin type optional. Needed by the replayer in swh-journal. | 04 September 2019, 12:35:20 UTC |
100eb6d | Antoine Pietri | 03 September 2019, 14:41:35 UTC | docs: link pages together from a TOC | 03 September 2019, 14:41:35 UTC |
fd2e6da | Stefano Zacchiroli | 23 August 2019, 16:57:49 UTC | swh identify: add support for origin PIDs | 23 August 2019, 16:57:49 UTC |
880aff9 | Stefano Zacchiroli | 23 August 2019, 16:24:25 UTC | identifiers.py: add constants for 'swh:1' and sanitize namespace | 23 August 2019, 16:34:45 UTC |
9938380 | Valentin Lorentz | 21 August 2019, 16:01:13 UTC | Remove pointless validators. | 23 August 2019, 11:55:34 UTC |
6df68b0 | Valentin Lorentz | 21 August 2019, 15:50:00 UTC | Remove release metadata from serialization if it's None. It kind of matches the current state of the postgresql storage, which does not support it. | 22 August 2019, 12:02:09 UTC |
52b617c | Valentin Lorentz | 21 August 2019, 15:40:01 UTC | Add support for dangling snapshot branches. | 22 August 2019, 12:00:43 UTC |
4e26f7d | Valentin Lorentz | 21 August 2019, 15:46:11 UTC | Add missing fields status/type/snapshot/metadata to OriginVisit. | 21 August 2019, 15:46:11 UTC |
54957d2 | Valentin Lorentz | 21 August 2019, 15:44:41 UTC | Make OriginVisit use datetime for its date. But keep support for deserializing from str, like swh-storage does. | 21 August 2019, 15:44:41 UTC |
01a5d4c | Valentin Lorentz | 19 August 2019, 12:35:41 UTC | Add a get_hash helper method to Content. Code manipulating a Content object may want to access a hash of configurable name; this method allows it to do that without using getattr directly. | 20 August 2019, 09:38:08 UTC |
19634f2 | Valentin Lorentz | 19 August 2019, 12:33:58 UTC | Allow -1 as Content length. It denotes files whose length is unknown. | 19 August 2019, 14:56:10 UTC |
9582985 | Valentin Lorentz | 19 August 2019, 12:33:13 UTC | Add optional 'ctime' field to Content. | 19 August 2019, 12:33:13 UTC |
767ed20 | Valentin Lorentz | 19 August 2019, 12:31:40 UTC | Generated content with status=hidden should have a data field. | 19 August 2019, 12:31:40 UTC |
56eb29f | Valentin Lorentz | 05 August 2019, 08:56:02 UTC | Add a SHA1_SIZE constant for use by other packages. | 05 August 2019, 08:56:02 UTC |
a92af53 | Valentin Lorentz | 18 July 2019, 08:52:36 UTC | Add a 'metadata' field to releases. Loaders use it, but it is ignored by the pg storage. However, as the Cassandra storage uses swh-model to validate its input, it refuses this input from the loaders (and journal). | 18 July 2019, 10:24:41 UTC |
68142a7 | Stefano Zacchiroli | 11 July 2019, 14:30:26 UTC | add code of conduct document | 11 July 2019, 14:30:26 UTC |
67fade5 | Valentin Lorentz | 29 May 2019, 14:15:39 UTC | Add origin persistent identifiers. | 10 July 2019, 13:39:22 UTC |
d8f17f2 | Stefano Zacchiroli | 28 June 2019, 07:42:00 UTC | CONTRIBUTORS: add Ishan Bhanuka | 28 June 2019, 07:42:00 UTC |
dde39f5 | Ishan Bhanuka | 27 June 2019, 04:20:16 UTC | Reformat docstring for max line length | 27 June 2019, 16:43:42 UTC |
1072884 | Ishan Bhanuka | 18 June 2019, 12:18:38 UTC | Add pyblake2 platform-specific dependency. Remove version checking code, pyblake2 is installed by default on python 3.6+ | 18 June 2019, 13:49:42 UTC |
b3250d2 | Valentin Lorentz | 18 June 2019, 11:40:20 UTC | Remove dependency on swh-core. This is a fix to work around pip's inability to correctly solve extra requirements (swh-model depends on swh-core[], but if other packages depend on swh-model and swh-core[http], the 'http' extra does not always get installed). | 18 June 2019, 11:40:56 UTC |
d7ec4a6 | David Douard | 15 May 2019, 13:44:21 UTC | cli: add support for --help on the 'identify' cli tool | 11 June 2019, 08:06:30 UTC |
0815880 | David Douard | 15 May 2019, 13:31:13 UTC | setup: register the 'identify' cli subcommand | 11 June 2019, 08:06:30 UTC |
60c3f7d | David Douard | 15 May 2019, 13:30:06 UTC | cli: the 'objects' argument is in fact mandatory | 11 June 2019, 08:06:30 UTC |
c3a7e4e | Valentin Lorentz | 05 June 2019, 09:28:34 UTC | Prevent Hypothesis from writing the null character in the 'reason' field. pgsql does not support it. | 05 June 2019, 09:28:34 UTC |
b42d35c | Valentin Lorentz | 03 June 2019, 14:29:29 UTC | Prevent generation of empty branch names. | 03 June 2019, 14:29:29 UTC |
6ef1dc1 | Valentin Lorentz | 09 May 2019, 14:29:46 UTC | Explicitly implement from_dict instead of using introspection magic. There is more repetition, but it's easier to read and '%timeit Revision.from_dict(d)' is 5 times faster. | 10 May 2019, 08:25:55 UTC |
fc3d3c1 | Valentin Lorentz | 26 April 2019, 11:33:29 UTC | Prevent from_dict() from changing its input dict. | 26 April 2019, 11:33:29 UTC |
efc7e72 | Valentin Lorentz | 12 April 2019, 13:51:15 UTC | Add a from_dict() method to model classes, that does the inverse of to_dict(). | 16 April 2019, 08:17:44 UTC |
868b8c3 | Nicolas Dandrimont | 12 April 2019, 10:03:09 UTC | Update coverage gitignore | 12 April 2019, 10:03:09 UTC |
fee3a41 | Nicolas Dandrimont | 11 April 2019, 10:03:10 UTC | Make sure timestamps can be represented by Python no matter the timezone | 11 April 2019, 10:03:10 UTC |
54490c9 | Valentin Lorentz | 09 April 2019, 16:30:50 UTC | Limit Content.length to what the pgsql storage supports. | 09 April 2019, 16:30:50 UTC |
f9641d2 | Valentin Lorentz | 08 April 2019, 19:46:28 UTC | Tune the model generation to work with the pgsql storage. | 09 April 2019, 15:06:34 UTC |
6909704 | Valentin Lorentz | 08 April 2019, 13:14:13 UTC | Check recursively that .to_dict() returns a nested dict. | 08 April 2019, 13:14:13 UTC |
9f80661 | Valentin Lorentz | 08 April 2019, 13:13:36 UTC | Remove debug prints. | 08 April 2019, 13:13:36 UTC |
d1b2156 | Valentin Lorentz | 05 April 2019, 17:15:16 UTC | Add a model based on 'attrs' and Hypothesis strategies to generate it. | 08 April 2019, 12:43:12 UTC |
4d40f4d | Valentin Lorentz | 04 April 2019, 18:46:15 UTC | Make snapshot_identifier add the cycle to the exception's arguments when it detects one. | 04 April 2019, 18:46:15 UTC |