https://github.com/SoftwareHeritage/swh-model

Revision Author Date Message Commit Date
db855bf New upstream version 4.3.0 14 January 2022, 14:13:46 UTC
a0f5436 Restore 'offset' and 'negative_utc' arguments and make them optional This allows keeping compatibility with existing users of the TimestampWithTimezone constructor 14 January 2022, 12:44:59 UTC
be894c6 TimestampWithTimezone: Make 'offset_bytes' required and remove 'negative_utc'. 'offset' becomes a property instead of an attribute and constructor argument. This also removes both from the output of `.to_dict()`. This is step 6 of https://forge.softwareheritage.org/T3752 This will break packages that still use the constructor directly, ie. swh-storage and swh-loader-git (and tests of swh-loader-core and swh-loader-svn) 13 January 2022, 11:17:48 UTC
dbf185c Fix TimestampWithTimezone.from_dict() on datetimes before 1970 with non-integer seconds 12 January 2022, 13:41:38 UTC
94b00d6 docs: Add anchors to important sections of persistent-identifiers.rst swh-deposit needs to link to some of them. 11 January 2022, 13:27:05 UTC
a8806fb New upstream version 4.2.0 10 January 2022, 14:59:17 UTC
56460e1 git_objects: Use raw offset_bytes to format dates, and remove format_offset() This is needed when recomputing the manifest of git objects with weird offsets. 07 January 2022, 12:57:52 UTC
a0b08bc New upstream version 4.1.0 22 December 2021, 15:01:39 UTC
c6ec4d8 hashutil: drop all pre-3.6 blake2 workarounds blake2s and blake2b have been provided by the stdlib hashlib since Python 3.6, and we declare 3.7 as minimum Python version supported. 22 December 2021, 14:46:01 UTC
948bb65 New upstream version 4.0.0 22 December 2021, 12:28:53 UTC
9d517c1 hypothesis_strategies: Generate raw_manifest 22 December 2021, 12:20:53 UTC
2abbf73 model: Exclude 'raw_manifest' from dictionaries when it is null 1. Most objects do not need it so it's a waste of space 2. This means we just extend the existing format (some objects will have that key in their dict) instead of changing it (retroactively adding it to all objects) 22 December 2021, 12:20:52 UTC
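The effect of leaving out a null 'raw_manifest' can be sketched as follows. This is a minimal illustration, not swh.model's code: `ToyObject` and its fields are hypothetical stand-ins for a model class.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ToyObject:  # hypothetical stand-in for a model class
    message: bytes
    raw_manifest: Optional[bytes] = None

    def to_dict(self):
        d = asdict(self)
        # Drop the key entirely when it is unset, so dictionaries of
        # ordinary objects keep the shape they had before the attribute existed.
        if d["raw_manifest"] is None:
            del d["raw_manifest"]
        return d


plain = ToyObject(message=b"hello").to_dict()
weird = ToyObject(message=b"hello", raw_manifest=b"commit 5\x00hello").to_dict()
```

Only the object that carries a manifest gets the extra key, which is what makes the change an extension of the existing format rather than a retroactive change.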
f6144a1 model: Add a raw_manifest attribute This will be used to store the original manifest of 'weird' git objects, when we cannot reasonably represent them otherwise. 22 December 2021, 12:20:52 UTC
9e1e14a data-model: Remove mention of modification timestamps We don't store those 21 December 2021, 14:15:18 UTC
e342397 hypothesis_strategies: Generate only consistent directory entry permissions. 21 December 2021, 13:38:47 UTC
9d1d4a3 docs: Update the data model description Revision and release do not generally allow 'arbitrary' metadata; and it was missing ExtIDs and REMD 17 December 2021, 13:56:40 UTC
abb089d Pin mypy and drop type annotations which make mypy unhappy This also drops spurious copyright headers from those files if present. Related to T3812 16 December 2021, 14:56:37 UTC
9b21d2d test_model: Fix compatibility with pytest-xdist Using .now() produces data that differs between xdist processes, as files are imported after forking, and xdist requires consistent data across processes. 15 December 2021, 13:01:20 UTC
3a41bdf New upstream version 3.2.0 15 December 2021, 12:39:35 UTC
59ed64b model: Add a check() method to model objects It calls attr.validate() (which calls the validators), and recomputes the hash of HashableObject instances. A future commit will also make it check the raw_manifest attribute when relevant 08 December 2021, 15:24:44 UTC
5eb6539 model: Deduplicate calls to hashlib. It's just simpler this way 08 December 2021, 15:20:56 UTC
2d0105b model: Add support for None to the type checker For the sake of completeness (a future commit may depend on it). 08 December 2021, 15:17:40 UTC
c322904 Add attribute TimestampWithTimezone.offset_bytes, to store raw Git offsets For now it is filled from 'offset' and 'negative_utc', but it will replace them in a future commit. This is to simplify and add support for more 'weird' offsets we do not currently support. 08 December 2021, 14:46:29 UTC
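One way to picture filling a raw offset from 'offset' and 'negative_utc' is the sketch below. `format_offset_bytes` is a hypothetical helper written for illustration, assuming the offset is given in minutes and the raw form follows Git's `+HHMM` / `-HHMM` convention; it is not the library's implementation.

```python
def format_offset_bytes(offset_minutes, negative_utc=False):
    """Render an offset in minutes as Git-style raw bytes (hypothetical helper)."""
    # A zero offset can still be "negative" in Git ("-0000"), which is why a
    # separate flag is needed when only an integer offset is stored.
    sign = "-" if offset_minutes < 0 or negative_utc else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"{sign}{hours:02d}{minutes:02d}".encode("ascii")


paris_summer = format_offset_bytes(120)
india_negated = format_offset_bytes(-330)
negative_utc = format_offset_bytes(0, negative_utc=True)
```

Storing the bytes directly removes the need for the extra flag and leaves room for offsets that do not fit the integer-plus-flag representation.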
52d6e0a hypothesis_strategies: Ensure to generate valid directory entry name (again) 08 December 2021, 14:46:09 UTC
4803d78 Revert "hypothesis_strategies: Ensure to generate valid directory entry name" This reverts commit c525484e473720ffcde89534d3ba89093bc8b016. 08 December 2021, 14:46:09 UTC
f6e0a28 from_disk: Implement Directory.__contains__ It makes it easy to check whether a path exists under a root directory. 08 December 2021, 13:42:02 UTC
c525484 hypothesis_strategies: Ensure to generate valid directory entry name Since rDMOD8d96dfedee34203a4118e48a6208ee507511590b, directory entry names are validated in DirectoryEntry model and thus must not contain any slash characters. So update directory_entries_d hypothesis strategy to ensure such names are generated to avoid flaky tests. 07 December 2021, 10:34:39 UTC
af9cd2c New upstream version 3.1.0 06 December 2021, 18:51:46 UTC
37364c2 hashutil: Add support for md5 sum Enables computing the md5 sum through the hashutil.MultiHash class. Nevertheless, md5 is not put in the DEFAULT_ALGORITHMS set and must be explicitly requested by client code. Related to T2400 06 December 2021, 18:26:31 UTC
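The "supported but not default" arrangement can be sketched with the standard library alone. The names below (`DEFAULT_ALGORITHMS`, `SUPPORTED_ALGORITHMS`, `multi_hash`) are illustrative assumptions and the default set is a reduced example; this is not the hashutil.MultiHash API.

```python
import hashlib

# Illustrative subset only; an assumption, not the library's actual default set.
DEFAULT_ALGORITHMS = frozenset({"sha1", "sha256"})
SUPPORTED_ALGORITHMS = DEFAULT_ALGORITHMS | {"md5"}


def multi_hash(data, algorithms=DEFAULT_ALGORITHMS):
    """Compute several digests of the same bytes (hypothetical helper)."""
    unknown = set(algorithms) - SUPPORTED_ALGORITHMS
    if unknown:
        raise ValueError(f"unsupported algorithms: {sorted(unknown)}")
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


default_digests = multi_hash(b"")
with_md5 = multi_hash(b"", {"sha1", "md5"})
```

Callers that never ask for md5 see no change in the returned keys; only an explicit request adds it.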
243520d test_hashutil: Port tests from unittest to pytest 06 December 2021, 18:26:31 UTC
8d96dfe model: Validate Directory entry names I don't know any instance of these, but there is no harm in checking them. 01 December 2021, 17:20:47 UTC
2ffe5db Give model and swhid objects a nicer repr() 1. hashes are now repr()ed as `hash_to_bytes("1234...")` instead of `b"\x12\x34..."` 2. SWHID objects are now repr()ed as `CoreSWHID.from_string('swh:1:...:1234...')` instead of `CoreSWHID(scheme='swh', version='1', object_type=..., object_id=b'\x12\x34')` 3. enums are now repr()ed as `MyEnum.NAME` instead of `<MyEnum.NAME: 'value'>` Thanks to these three changes, using repr() on a model object now prints a string that can be pasted directly in a `.py` file to write a new test case. 05 November 2021, 10:28:53 UTC
916627e type_validator: Re-allow subclasses The previous commit replaced attrs-strict's type validator with our own, stricter and faster, validator. However, the strictness can be a burden in other packages; for example, swh-storage tests rely on it to insert dummy data that raises an exception when accessed, and it would be hard to do while using the exact expected type. This commit reverts the strict behavior, but keeps the performance optimization, by always checking with type equality, but in case type equality fails (which would raise an error before this commit), it gives the value a 'second chance', by trying isinstance. This means that, outside tests, isinstance should not be used at all, or very rarely. 01 October 2021, 12:31:41 UTC
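The 'second chance' check described above can be sketched as a two-step test. `matches_type` and `DummyBytes` are hypothetical names used for illustration; the real validator handles many more cases (generic types, unions, and so on).

```python
def matches_type(expected_type, value):
    """Exact type match first, isinstance only as a fallback (hypothetical helper)."""
    # Fast path: a plain identity comparison on the type, which is what
    # succeeds for ordinary model objects.
    if type(value) is expected_type:
        return True
    # Second chance: accept subclasses, e.g. dummy objects used in tests.
    return isinstance(value, expected_type)


class DummyBytes(bytes):
    """A subclass standing in for test-only dummy data."""


exact = matches_type(bytes, b"abc")
subclass = matches_type(bytes, DummyBytes(b"abc"))
wrong = matches_type(bytes, "abc")
```

Because the fallback only runs after the identity check fails, values of the exact expected type never pay for the isinstance call.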
99eb654 New upstream version 3.0.0 28 September 2021, 14:05:17 UTC
734b081 model: Replace attrs-strict with stricter validation This reimplements attrs_strict.type_validator(), using type equality instead of isinstance. This makes my checksum validation script (that mostly just instantiates model objects, computes a checksum, then discards them) run twice as fast. 28 September 2021, 09:51:15 UTC
e30eb7d persistent-identifiers.rst: Update references to manifest formats 24 September 2021, 10:08:47 UTC
7034b16 Add module-level docstrings. 23 September 2021, 18:22:32 UTC
4968b74 Move SWHID-related tests to test_swhids.py For consistency, as the classes are now in swhids.py 23 September 2021, 18:01:09 UTC
d5fd652 git_objects: Fix type annotations to accept the old dicts + raise warnings 23 September 2021, 15:13:38 UTC
f56becc Deprecate identifiers.py 1. Add a warning 2. Move identifier/manifest documentation to git_objects.py 3. Remove all imports of that module. Motivation: * SWHID classes were moved to swhids.py * manifest computation functions were moved to git_objects.py * Only reexports and trivial wrappers of model.py remain 23 September 2021, 14:52:31 UTC
df73e74 test_identifiers.py: Make sha1_git literals more consistent. 23 September 2021, 12:42:26 UTC
510df60 Remove identifier_to_bytes and identifier_to_hex They are not used anywhere. 23 September 2021, 12:42:26 UTC
9e8a547 Move manifest computation functions from identifiers.py to git_objects.py Since they are used by the vault for non-identifier-related stuff, I think it makes sense to move them to a new module. identifiers.py is now an empty shell, as all its features were moved to other modules and it only contains reexports and backward-compat functions. Therefore, it should be considered deprecated from now on. 23 September 2021, 12:42:26 UTC
57ae405 Refactor identifiers & model to make *_git_object() functions work on model classes instead of dicts Since we now use these classes everywhere, computing hashes required using to_dict() just to compute identifiers, which can be a performance bottleneck in code computing many checksums. 23 September 2021, 12:42:10 UTC
6a72f88 test_identifiers.py: Fix/update malformed data dicts A future commit will make identifier computation use the attrs classes, which are strict about what they accept. 23 September 2021, 09:08:36 UTC
9ec6832 Move SWHID classes and functions from identifiers.py to swhids.py identifiers.py initially worked only on bare sha1_git. I chose to add the SWHID classes in that module because of the name, but the SWHID code didn't actually interact with the other functions in the module, so it now feels out of place to me. 23 September 2021, 09:08:36 UTC
0dd33cd Add bazaar as supported revision type We're about to have a Bazaar loader 23 September 2021, 08:56:38 UTC
8a6faa5 New upstream version 2.9.0 16 September 2021, 12:24:47 UTC
b6f5e30 HashableObject: Add type annotation for 'id' attribute This class is a mixin that only works with classes that define this attribute, so it makes sense to declare it. It also allows generic functions (that take a HashableObject parameter) to access it without 'type: ignore'. 16 September 2021, 12:09:15 UTC
e879497 Run Black 16 September 2021, 12:07:41 UTC
4f04c39 New upstream version 2.8.0 27 July 2021, 14:26:09 UTC
a32652f add a CVS revision type for use with the CVS loader 27 July 2021, 12:23:28 UTC
7b42ce6 New upstream version 2.7.0 23 July 2021, 14:53:43 UTC
1545ef7 Add an extid_version field to ExtIDs This allows distinguishing multiple potential versions of the mapping between external objects and their counterparts archived in Software Heritage, for instance when a loader has a backwards-incompatible change that should result in objects being loaded again. The field defaults to zero, in which case it's backwards-compatible with the previous implementation in terms of identifier computation. 23 July 2021, 14:08:30 UTC
18ee672 New upstream version 2.6.4 02 July 2021, 16:11:30 UTC
153c6e8 make deduplication optional when iterating over the merkle tree 02 July 2021, 09:51:55 UTC
cfb3073 hypothesis_strategies: generate non-ASCII IRIs for origin and authority 'urls'. We agreed a while ago they should be IRIs and not just URIs. This will trigger crashes in swh.storage.cassandra, as it currently (wrongly) expects origin urls to be ASCII. 25 June 2021, 15:24:40 UTC
4009b3a hypothesis_strategies: Restrict size and alphabets for metadata_fetchers and raw_extrinsic_metadata * empty fetcher name or version is not accepted by cassandra (and is nonsensical anyway) * ditto for non-ASCII (and any non-printable is nonsensical) * null bytes/chars are accepted by neither postgresql nor cassandra 25 June 2021, 15:08:16 UTC
4e9ec79 New upstream version 2.6.3 25 June 2021, 14:17:33 UTC
90b477e hypothesis_strategies: Generate None metadata instead of {} This is the only value we should use from now on; and the value is ignored by swh-storage anyway. 25 June 2021, 14:13:37 UTC
d4c6fbc New upstream version 2.6.2 25 June 2021, 10:44:32 UTC
baff6ba hypothesis_strategies: Add raw_extrinsic_metadata() strategy It will be used by swh-web. 25 June 2021, 09:22:32 UTC
e4566a6 from_disk: get swhid from Content/Directory objects Closes T3393 21 June 2021, 15:06:09 UTC
abfab0a New upstream version 2.6.1 16 June 2021, 10:03:23 UTC
e09446a encode exclude patterns before extracting regex objects - add typing annotation to avoid such error in the future Fixes T3383 15 June 2021, 16:23:43 UTC
b5e4052 New upstream version 2.6.0 15 June 2021, 14:56:07 UTC
8ec0cf6 Add a TimestampWithTimezone.to_datetime() method 15 June 2021, 09:22:38 UTC
428c170 Fix normalize_timestamp() for datetime < epoch with microsecond>0 For a datetime before the epoch, the timestamp is negative; since it is a float that includes the microseconds, when both conditions hold (< epoch and microsecond > 0) the computed (int) timestamp was off by one. Add dedicated tests for this. 15 June 2021, 09:22:38 UTC
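The off-by-one comes from int() truncating toward zero while the seconds field needs to be rounded down. The sketch below reproduces the effect with the standard library only; `split_timestamp` is a hypothetical helper, not the body of normalize_timestamp().

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_timestamp(dt):
    """Return (seconds, microseconds) with microseconds always non-negative."""
    # Floor division on timedeltas rounds toward negative infinity, so the
    # seconds part stays consistent with a non-negative microseconds part.
    seconds = (dt - EPOCH) // timedelta(seconds=1)
    return seconds, dt.microsecond


half_second_before_epoch = datetime(
    1969, 12, 31, 23, 59, 59, 500000, tzinfo=timezone.utc
)

truncated = int(half_second_before_epoch.timestamp())
floored = split_timestamp(half_second_before_epoch)
```

Here the float timestamp is -0.5: truncation yields 0 seconds, whereas the consistent representation is -1 second plus 500000 microseconds.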
ae50e43 cli: add recursive option 11 June 2021, 14:25:24 UTC
8540c67 mypy: Fix errors with release >= v0.900 09 June 2021, 13:36:37 UTC
e38c659 New upstream version 2.5.1 20 May 2021, 13:40:16 UTC
4808fc2 Fix snapshot entries in swh_model_data test data make sure the snapshot id in OriginVisitStatus refers to existing Snapshot objects. 19 May 2021, 13:21:18 UTC
4b9ef2d New upstream version 2.5.0 11 May 2021, 10:07:46 UTC
96cc355 tox: Disable coverage in py3-minimal If run after py3-full, it overwrites the full coverage info, so most lines are incorrectly reported as uncovered in Phabricator diffs. 11 May 2021, 09:24:43 UTC
8c904dc identifiers: Expose git_object instead of manifest The git_object is what will be actually useful to the vault. It's also easier to test, because test_identifier.py has the entire git_object in its test data. 11 May 2021, 08:59:28 UTC
523ab64 identifiers: Expose manifest computation Before this commit, manifests were only computed internally before hashing, so they were not available to outside modules. This makes testing the module very painful, because identifier functions can only be tested by checking the hash; so test failures did not show mismatches between the computed manifest and the expected one. Additionally, the 'git bare cooker' of the vault is likely to use these as well, as it needs to format git objects in the same format. 11 May 2021, 08:33:36 UTC
7e77c2b New upstream version 2.4.2 06 May 2021, 12:35:41 UTC
31cb72e Blacklist attr 21.1.0 There is a regression that breaks attr.evolve() when updating attributes that contain an attr class; which we use (eg. for Person or TimestampWithTimezone). v21.2.0 is expected to fix the issue, but won't be released immediately: https://github.com/python-attrs/attrs/issues/804#issuecomment-833471190 06 May 2021, 12:18:22 UTC
df036ef docs/persistent-identifiers: Add guidelines for fixing invalid SWHIDs. 30 April 2021, 11:01:02 UTC
fae3159 New upstream version 2.4.1 29 April 2021, 12:23:20 UTC
f7e9d5c tox: Add sphinx environments to check sane doc build This makes it possible to check that the package documentation can be built without producing sphinx warnings. The sphinx environment is designed to be used in continuous integration in order to prevent breaking the documentation build when committing changes. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258 28 April 2021, 12:01:35 UTC
446bd2b Fix swh_model_data hardcoded id values and add a test to keep them correct. 23 April 2021, 15:24:08 UTC
1f6b3b9 swh_model_data: add parents to test revision 15 April 2021, 14:37:01 UTC
8d0352c Fix various Sphinx warnings 15 April 2021, 08:20:39 UTC
c0e9822 New upstream version 2.4.0 13 April 2021, 13:31:21 UTC
74b024f identifiers: Fix parsing of SWHID qualifier value containing '=' According to the SWHID specification, it is not forbidden for a qualifier value to contain a '=' character (for instance in origin URL). So update parsing code to handle that special case. 13 April 2021, 12:02:33 UTC
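The special case amounts to splitting each qualifier on its first '=' only. A minimal sketch, assuming qualifiers are separated by ';' and ignoring percent-encoding and validation of keys; `parse_qualifiers` is a hypothetical helper, not the library's parser.

```python
def parse_qualifiers(text):
    """Split 'key=value;key=value' pairs, keeping any '=' inside the values."""
    qualifiers = {}
    for item in text.split(";"):
        # partition() cuts at the first '=' only, so the remainder of the
        # item, including further '=' characters, is kept as the value.
        key, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"qualifier without '=': {item!r}")
        qualifiers[key] = value
    return qualifiers


parsed = parse_qualifiers("origin=https://example.org/repo?ref=main;path=/src")
```

A naive `item.split("=")` would return three parts for the origin qualifier in this example and fail to unpack into a key and a value.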
15d5bab identifiers: Fix some invalid ValidationError template string formats Some ValidationError exceptions could not be serialized to string due to these format errors. Related to T3234 13 April 2021, 10:43:55 UTC
f2dba17 docs: Ask readers to install swh.model[cli] to fully use swh-identify Otherwise, they will get an error asking them to install Click (or Dulwich if Click installed and they use -t snapshot) 12 April 2021, 11:18:58 UTC
27a05d6 tox: Check swh-identify can run even if Dulwich isn't installed 12 April 2021, 11:18:58 UTC
c62f13f swh-identify: Hide tracebacks if Click or Dulwich is not installed And show nice human-readable errors instead 09 April 2021, 13:14:52 UTC
eeedac7 Remove accidental dependency of 'swh-identify' on swh-core Was added in be8f1a559d8209710a08ca48d93b7f513fa1c42f 08 April 2021, 14:16:42 UTC
9523be0 Model test data: add Release with no author/date Some releases don't have authors and date fields, this case should be checked in the tests. 26 March 2021, 11:05:51 UTC
7f124d8 New upstream version 2.3.0 19 March 2021, 16:17:47 UTC
af5e461 Truncate RawExtrinsicMetadata.discovery_date to a second This truncation is already enshrined at the identifier level. Truncate the object itself as well, to reduce the possibility of multiple different metadata objects with the same identifier. 18 March 2021, 09:57:02 UTC
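Truncating to a second can be illustrated with plain datetime objects. This is a sketch with a hypothetical helper name, showing why two dates that differ only below the second collapse to the same value.

```python
from datetime import datetime, timezone


def truncate_to_second(dt):
    """Drop the sub-second part of a datetime (hypothetical helper)."""
    return dt.replace(microsecond=0)


first = datetime(2021, 3, 18, 9, 57, 2, 123456, tzinfo=timezone.utc)
second = datetime(2021, 3, 18, 9, 57, 2, 654321, tzinfo=timezone.utc)

same_after_truncation = truncate_to_second(first) == truncate_to_second(second)
```

If the identifier only sees whole seconds, applying the same truncation to the stored object keeps objects with equal identifiers equal in this field too.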
377d1df New upstream version 2.2.0 15 March 2021, 09:35:24 UTC
975e989 model: Add a swhid() method to RawExtrinsicMetadata. All other hashable objects but ExtId have one. It will be used by swh-deposit. 12 March 2021, 12:58:37 UTC
da9fc11 New upstream version 2.1.0 11 March 2021, 13:21:39 UTC
71be461 Add an ExtID object This object makes it possible to keep an SWHID <-> external object ID map in the SWH Archive, e.g. to keep track of Mercurial ids so the Mercurial loader can be made more efficient. Related to T2849. 10 March 2021, 09:39:28 UTC
fca3658 Fix MetadataAuthority.from_dict(), which was modifying the dict given as argument. 08 March 2021, 15:15:08 UTC
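This class of bug, and one way to avoid it, can be sketched as follows. `AuthorityType` and `Authority` are simplified, hypothetical stand-ins; the point is only that the conversion happens on a copy rather than on the caller's dictionary.

```python
import enum


class AuthorityType(enum.Enum):  # hypothetical, reduced set of values
    FORGE = "forge"
    REGISTRY = "registry"


class Authority:  # hypothetical stand-in for a model class
    def __init__(self, type, url):
        self.type = type
        self.url = url

    @classmethod
    def from_dict(cls, d):
        # Build a new dict instead of assigning into `d`, so the caller's
        # dictionary still holds the plain string afterwards.
        converted = {**d, "type": AuthorityType(d["type"])}
        return cls(**converted)


original = {"type": "forge", "url": "https://example.org"}
authority = Authority.from_dict(original)
```

After the call, `original` can still be serialized or passed to from_dict() again, which would not hold if the enum value had been written back into it.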
d66a180 New upstream version 2.0.0 05 March 2021, 09:14:33 UTC