https://github.com/SoftwareHeritage/swh-storage

Revision Message Commit Date
bdfe9d9 Updated backport on buster-swh from debian/1.0.0-1_swh1 (unstable-swh) 24 February 2022, 11:27:28 UTC
730eeed Merge tag 'debian/1.0.0-1_swh1' into debian/buster-swh 24 February 2022, 11:27:28 UTC
3612cc6 Updated debian changelog for version 1.0.0 24 February 2022, 11:13:57 UTC
eae87d8 Update upstream source from tag 'debian/upstream/1.0.0' Update to upstream version '1.0.0' with Debian dir 8ae626ab798d36848604eb33d6358be5da4d7885 24 February 2022, 11:13:57 UTC
35e7f0c New upstream version 1.0.0 24 February 2022, 11:13:55 UTC
61acdf3 Prepare v1: bump dependency to swh.core 2 24 February 2022, 11:02:29 UTC
215162b Update for swh.core 2.0.0 - Add expected entry points for swh.core 2 db handling new features: - add a ``swh.storage.get_datastore()`` function - add ``swh.storage.postgresql.storage.Storage.get_current_version()`` method - move sql migration scripts in ``swh/storage/sql/upgrades`` - modify sql initialization scripts to match swh.core 2 (remove dbversion management code). - Update tests to use the new template-based database handling; this should have only minimal impact on test execution performance. 24 February 2022, 10:05:01 UTC
386fb4d Add types-toml to requirements-test.txt 24 February 2022, 10:05:01 UTC
f578377 pre-commit: Bump hooks and add new one to check commit message spelling To install the new hook: $ pre-commit install -t commit-msg 10 February 2022, 16:29:28 UTC
e4c4983 Updated backport on buster-swh from debian/0.43.1-1_swh1 (unstable-swh) 08 February 2022, 09:45:21 UTC
2f90249 Merge tag 'debian/0.43.1-1_swh1' into debian/buster-swh 08 February 2022, 09:45:20 UTC
97244e6 Updated debian changelog for version 0.43.1 08 February 2022, 09:31:33 UTC
631eb17 Update upstream source from tag 'debian/upstream/0.43.1' Update to upstream version '0.43.1' with Debian dir baaa2fcae1f2b0d13d9d811af8f9f21ca52024d8 08 February 2022, 09:31:33 UTC
10d367f New upstream version 0.43.1 08 February 2022, 09:31:31 UTC
8bf07c3 revision_walker: Actually pass ignore_displayname to revision_log I somehow forgot to stage this change in the previous commit. 04 February 2022, 15:46:07 UTC
0f9f54b revision_walker: Add support for ignore_displayname. This is needed by the vault. 04 February 2022, 15:14:59 UTC
75a7f09 Add typing to revision_walker.py and make the state a dataclass 04 February 2022, 15:14:53 UTC
a3a63d8 Require pytest to be <7.0.0 04 February 2022, 14:02:23 UTC
25fd5e2 Updated backport on buster-swh from debian/0.43.0-1_swh1 (unstable-swh) 02 February 2022, 18:10:01 UTC
40dc7c6 Merge tag 'debian/0.43.0-1_swh1' into debian/buster-swh 02 February 2022, 18:10:01 UTC
191992d Updated debian changelog for version 0.43.0 02 February 2022, 17:56:34 UTC
48357a7 Update upstream source from tag 'debian/upstream/0.43.0' Update to upstream version '0.43.0' with Debian dir 2f67565746d5bbcfcedb31d8154c0ed0db035567 02 February 2022, 17:56:33 UTC
ec42d59 New upstream version 0.43.0 02 February 2022, 17:56:31 UTC
4544d7c Introduce a new displayname field for persons in the PostgreSQL storage Extend the APIs for Revisions and Releases to honor the field by default, unless the new `ignore_displayname` argument is set. 01 February 2022, 12:03:52 UTC
97caa93 Make test_release_add_get_arbitrary non-flaky It was made flaky by d4ddd41535d0ce1cd50d51d297e154bf0ab6e649. 01 February 2022, 12:03:52 UTC
f868f3c Mostly use normalized Person objects in tests This opens up the possibility of eventually ignoring the `name` and `email` fields stored in database in favor of parsing them again from the fullname field (and therefore to update our parsing logic without having to affect stored data). 31 January 2022, 20:56:47 UTC
d4ddd41 postgresql: Use Person.from_fullname if name and email are None This allows us to populate sensible name and email values out of the new displayname field, without having to store them. 31 January 2022, 19:12:09 UTC
271a259 Updated backport on buster-swh from debian/0.42.0-1_swh1 (unstable-swh) 25 January 2022, 16:08:11 UTC
8079f65 Merge tag 'debian/0.42.0-1_swh1' into debian/buster-swh 25 January 2022, 16:08:10 UTC
b752a68 Updated debian changelog for version 0.42.0 25 January 2022, 15:54:46 UTC
c2a93a6 Update upstream source from tag 'debian/upstream/0.42.0' Update to upstream version '0.42.0' with Debian dir 2b9b41be406880810778652bc1db36aade0058e2 25 January 2022, 15:54:45 UTC
bedc372 New upstream version 0.42.0 25 January 2022, 15:54:44 UTC
6f02524 Fix directory_add to actually insert the manifest + add directory_get_raw_manifest I don't expect directory_get_raw_manifest to be used, but it is needed for tests, so why not. 25 January 2022, 09:46:27 UTC
5874905 Stop using the deprecated 'TimestampWithTimezone.offset' attribute It will be replaced by what is currently called 'offset_bytes' 21 January 2022, 14:01:31 UTC
2e74138 Remove 'offset' and 'negative_utc' This only keeps 'offset_bytes' to store the timezone, to support swh-model v5.0.0. However, this keeps writing 'offset' and 'negative_utc' to the postgresql database, just in case we need to roll back this change. But they are not read anymore. 21 January 2022, 13:07:02 UTC
c68a4fd postgres: Add indices to keep track of objects with a raw_manifest They should be a rare occurrence, so adding these indices allows us to count and enumerate them without expensive full table scans. 18 January 2022, 10:04:28 UTC
228de33 Fix sphinx error [2022-01-17T16:03:27.448Z] /var/lib/jenkins/workspace/DSTO/tests-on-diff@2/docs/index.rst:25:hardcoded link 'https://archive.softwareheritage.org/api/' could be replaced by an extlink (try using ':swh_web:`api/`' instead) 18 January 2022, 09:16:05 UTC
40a57d4 cassandra: Make content_missing run in linear time instead of quadratic Assuming all contents passed to content_missing() have (at least) a missing algo, the function used to iterate over the size of the arg squared in the worst case (when all contents are found). With this commit, it starts with bucketing them by hash, so it does not need to iterate over *all* found contents for each content passed as arg. 12 January 2022, 10:38:20 UTC
d5f1f0e cassandra: Rewrite content_missing to run queries concurrently. This is twice as fast, according to https://forge.softwareheritage.org/T3577#72791 12 January 2022, 10:37:57 UTC
4a24505 cassandra: Use concurrent queries in *_missing() instead of naive grouping Instead of grouping ids in queries in arbitrary batches (which forces the server node to coordinate with other nodes to complete the query), this sends queries with one id each, directly to the right node. This is the 'concurrent' algorithm from https://forge.softwareheritage.org/T3577#72791 which gives a >=2x speed-up on directories, and a >=8x speed-up on revisions. 06 January 2022, 11:43:09 UTC
4618e7f Updated backport on buster-swh from debian/0.41.1-1_swh1 (unstable-swh) 04 January 2022, 15:08:37 UTC
3ce1ffb Merge tag 'debian/0.41.1-1_swh1' into debian/buster-swh 04 January 2022, 15:08:37 UTC
f63d2e7 Updated debian changelog for version 0.41.1 04 January 2022, 14:55:33 UTC
71abcc3 Update upstream source from tag 'debian/upstream/0.41.1' Update to upstream version '0.41.1' with Debian dir cffa1d419282197d3c69b493f6fc958541b9c38d 04 January 2022, 14:55:32 UTC
c7cbc9a New upstream version 0.41.1 04 January 2022, 14:55:31 UTC
259bf6f Improve documentation of the replay command 04 January 2022, 11:02:44 UTC
1071781 Move the 'error_reporter' config entry in a dedicated 'replayer' section 04 January 2022, 11:02:44 UTC
f4622d7 Updated backport on buster-swh from debian/0.41.0-1_swh1 (unstable-swh) 22 December 2021, 15:36:09 UTC
891de85 Merge tag 'debian/0.41.0-1_swh1' into debian/buster-swh 22 December 2021, 15:36:09 UTC
e8c51f3 Updated debian changelog for version 0.41.0 22 December 2021, 15:22:18 UTC
f99996d Update upstream source from tag 'debian/upstream/0.41.0' Update to upstream version '0.41.0' with Debian dir 4a663880267eda8e75f3f01cab624e5d5e7bbe3b 22 December 2021, 15:22:17 UTC
aa3ce30 New upstream version 0.41.0 22 December 2021, 15:22:15 UTC
f3232e6 Add columns {,committer_}date_offset to rev/rel and raw_manifest to dir/rev/rel 22 December 2021, 12:29:11 UTC
f09a54d Pin mypy and drop type annotations which make mypy unhappy This also drops: - spurious copyright headers to those files if present. - fix a type issue revealed by the new mypy Related to T3812 16 December 2021, 15:18:36 UTC
c40ceb3 Add tests checking round-tripping of dir/rev/rel/snp objects generated by Hypothesis 15 December 2021, 12:11:56 UTC
45687d8 Add test_revision_add_fractional_timezone 15 December 2021, 12:11:51 UTC
fb1b3a0 postgresql: Fix off-by-one error in db_to_date on negative dates Using `int()` on `date.timestamp()` rounded it up (toward zero), but the semantics of `model.Timestamp` is that the actual time is `ts.seconds + ts.microseconds/1000000`, so all negative dates were shifted one second up. In particular, this causes dates from `1969-12-31T23:59:59.000001` to `1969-12-31T23:59:59.999999` (inclusive) to smash into dates from `1970-01-01T00:00:00.000001` to `1970-01-01T00:00:00.999999`, which is how I discovered the issue. 13 December 2021, 11:54:35 UTC
34ca67e postgresql: Add tests for db_to_date. 09 December 2021, 15:23:44 UTC
7cb4128 proxies/retry: Remove no longer needed tenacity workarounds Now that we have packaged tenacity 6.2 for debian buster and use it in production, we can remove the workarounds to support tenacity < 5. 08 December 2021, 11:09:15 UTC
615fb99 test_cassandra: Fix failing tests since swh-model update Directory entries are now checked for name duplicates in swh-model so we must ensure the CrashyEntry class is properly initialized. Closes T3776 07 December 2021, 12:36:43 UTC
96d7d7d Updated backport on buster-swh from debian/0.40.0-2_swh1 (unstable-swh) 17 November 2021, 09:20:41 UTC
4e093d6 Merge tag 'debian/0.40.0-2_swh1' into debian/buster-swh 17 November 2021, 09:20:40 UTC
ba105df d/changelog: version 0.40.0-2~swh1 17 November 2021, 08:46:19 UTC
9af9b26 Update dependencies in d/control 17 November 2021, 08:44:58 UTC
0b1b158 Updated debian changelog for version 0.40.0 16 November 2021, 09:45:39 UTC
1b6c7fc Update upstream source from tag 'debian/upstream/0.40.0' Update to upstream version '0.40.0' with Debian dir 0decf5aa20880a80efe3b342417439b86249cc5f 16 November 2021, 09:45:38 UTC
d8994d5 New upstream version 0.40.0 16 November 2021, 09:45:37 UTC
850a755 Add support for a redis-based reporting for invalid mirrored objects The idea is that we check the BaseModel validity at journal deserialization time so that we still have access to the raw object from kafka for complete reporting (object id plus raw message from kafka). This uses a new ModelObjectDeserializer class that is responsible for deserializing the kafka message (still using kafka_to_value) then immediately creating the BaseModel object from that dict. Its `convert` method is then passed as `value_deserializer` argument of the `JournalClient`. Then, for each deserialized object from kafka, if it's a HashableObject, check its validity by comparing the computed hash with its id. If it's invalid, report the error in logs, and if configured, register the invalid object via the `reporter` callback. In the cli code, a `Redis.set()` is used as such a callback (if configured). So it simply stores invalid objects using the object id as key (typically its swhid), and the raw kafka message value as value. Related to T3693. 09 November 2021, 15:36:34 UTC
04bd15a Refactor fixer.fix_objects() to extract the inner object_fixers dict allowing to use this dict independently of the fix_objects() function. 09 November 2021, 15:36:34 UTC
d655c85 Remove now useless fixers keep the fix_objects() function for bw compat for now. 09 November 2021, 15:36:34 UTC
55eed77 Add a --type option to 'swh storage replay' This allows choosing the replayed object types from the cli. 09 November 2021, 15:36:34 UTC
0262f1c Update extrinsic-metadata-specification.rst to match the current implementation * merged origin and artifact metadata * added metametadata * uses structures instead of dict * removed raw_extrinsic_metadata_get_latest 09 November 2021, 14:47:31 UTC
f6af4b4 Updated backport on buster-swh from debian/0.39.0-1_swh1 (unstable-swh) 29 October 2021, 09:23:51 UTC
0ced1ee Merge tag 'debian/0.39.0-1_swh1' into debian/buster-swh 29 October 2021, 09:23:50 UTC
467aa27 Updated debian changelog for version 0.39.0 29 October 2021, 09:17:36 UTC
8c63421 Update upstream source from tag 'debian/upstream/0.39.0' Update to upstream version '0.39.0' with Debian dir fc10c75ae5be9e40348d539c00fc26603ce1ecd6 29 October 2021, 09:17:35 UTC
3ce052a New upstream version 0.39.0 29 October 2021, 09:17:33 UTC
a5bfe5b interface: Add origin_snapshot_get_all method It efficiently returns the list of unique snapshot identifiers resulting from the visits of an origin. Previously it was required to query all visits of an origin then query all visit statuses for each visit to extract such information. The new method extracts origin snapshot information in a single database query. Related to T3631 28 October 2021, 12:23:14 UTC
49a932c algos/revisions_walker: Handle case of revision without committer date Some revisions in the archive do not have a committer date, so work around it to avoid errors when walking on such revisions when using the class CommitterDateRevisionsWalker. 22 October 2021, 09:32:28 UTC
c02be8e test_revisions_walker: Migrate from unittest to pytest 21 October 2021, 14:50:57 UTC
a986710 cassandra: Fix incomplete check of content existence in object_find_by_sha1_git content_missing_by_sha1_git only checks the index and not the main table. This is incorrect, because contents should not be considered written before an entry is written to the main table, even if an entry exists in one of the indexes. 18 October 2021, 11:10:54 UTC
4e1c0da Updated backport on buster-swh from debian/0.38.0-1_swh1 (unstable-swh) 11 October 2021, 15:04:39 UTC
497ba10 Merge tag 'debian/0.38.0-1_swh1' into debian/buster-swh 11 October 2021, 15:04:38 UTC
b639a49 Updated debian changelog for version 0.38.0 11 October 2021, 14:58:23 UTC
e86eafe Update upstream source from tag 'debian/upstream/0.38.0' Update to upstream version '0.38.0' with Debian dir fb3ad9c54e04b857bee1ba72970de66ed96cae17 11 October 2021, 14:58:22 UTC
8567342 New upstream version 0.38.0 11 October 2021, 14:58:20 UTC
e9fd74d serializers: Prepare rename of 'identifiers_enum' to 'swhids_enum'. This will be done in three steps to avoid any disruption: 1. (this step) add support for decoding the new name, but keep encoding as the old one 2. start encoding as the new name 3. remove support for decoding the old name 11 October 2021, 11:58:23 UTC
ea86a86 Rename imports of swh.model.identifiers to fix deprecation warnings. 11 October 2021, 11:58:23 UTC
3441f68 buffer: add some debug logging for number of objects sent 08 October 2021, 14:53:27 UTC
b604014 buffer: add a threshold for the estimated size of revision and release batches The size of individual revisions and releases is essentially unbounded. This means that, when the buffer storage is used as a way of limiting memory use for an ingestion process, it is still possible to go beyond the expected memory use when adding a batch of revisions or releases with large messages or other metadata. The duration of the database operations for revision_add or release_add is also commensurate to the size of the objects added in a batch, so using the buffer proxy to limit the time individual database operations take was not effective. Adding a threshold on estimated sizes for batches of revision and release objects makes this overuse of memory and of database transaction time much less likely. 08 October 2021, 14:53:27 UTC
7c5b0ec buffer: add a threshold for the number of revision parents in one batch The size of individual revisions is essentially unbounded. This means that, when the buffer storage is used as a way of limiting memory use for an ingestion process, it is still possible to go beyond the expected memory use when adding a batch of revisions with extensive histories. The duration of the database operation for revision_add is also commensurate to the number of revision parents added in a batch, so using the buffer proxy to limit the time individual database operations take was not effective. Adding a threshold on the cumulative number of revision parents per batch makes this overuse of memory and of database transaction time much less likely. 08 October 2021, 13:42:35 UTC
5edc0ba buffer: add a threshold for the number of directory entries in one batch The size of individual directories is essentially unbounded. This means that, when the buffer storage is used as a way of limiting memory use for an ingestion process, it is still possible to go beyond the expected memory use when adding a batch of (very) large directories. The duration of the database operation for directory_add is also commensurate to the number of directory entries added in a batch, so using the buffer proxy to limit the time individual database operations take was not effective. Adding a threshold on the cumulative number of directory entries per batch makes this overuse of memory and of database transaction time much less likely. 08 October 2021, 13:13:05 UTC
abe95b3 filter: add filtering for release_add 08 October 2021, 12:11:32 UTC
c52b7b6 filter: do not call the underlying functions if there's nothing to add 08 October 2021, 10:20:22 UTC
5d5d4c9 buffer: Ensure that we don't send data from empty buffers This was already the case (as grouper called on an empty iterator just returns no batches), but add a test to enforce it. 08 October 2021, 09:45:25 UTC
c7bc213 Updated backport on buster-swh from debian/0.37.1-1_swh1 (unstable-swh) 29 September 2021, 10:19:01 UTC
1044918 Merge tag 'debian/0.37.1-1_swh1' into debian/buster-swh 29 September 2021, 10:19:01 UTC
91091fe Updated debian changelog for version 0.37.1 29 September 2021, 10:12:31 UTC
4ec651f Update upstream source from tag 'debian/upstream/0.37.1' Update to upstream version '0.37.1' with Debian dir 1d9eff5555cc4c457089da74ffcf0997ef88b7c2 29 September 2021, 10:12:30 UTC
4c0ea11 New upstream version 0.37.1 29 September 2021, 10:12:28 UTC
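
Note on fb1b3a0 (db_to_date on negative dates): the rounding problem described in that commit message can be reproduced in a few lines. This is a minimal sketch, not the project's actual code; `split_truncating`, `split_flooring` and `EPOCH` are hypothetical names introduced here for illustration. It assumes, as the commit message states, that the represented time is `seconds + microseconds/1000000`.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper constant, not taken from swh-storage.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_truncating(date):
    """Buggy variant: int() truncates toward zero, shifting pre-epoch dates up."""
    return int(date.timestamp()), date.microsecond


def split_flooring(date):
    """Flooring variant: seconds + microseconds / 1e6 equals the real time."""
    seconds = (date - EPOCH) // timedelta(seconds=1)  # exact floor division
    return seconds, date.microsecond


before = datetime(1969, 12, 31, 23, 59, 59, 500000, tzinfo=timezone.utc)
after = datetime(1970, 1, 1, 0, 0, 0, 500000, tzinfo=timezone.utc)

# Truncation maps both instants to the same (seconds, microseconds) pair.
print(split_truncating(before), split_truncating(after))  # (0, 500000) (0, 500000)
# Flooring keeps them one second apart.
print(split_flooring(before), split_flooring(after))      # (-1, 500000) (0, 500000)
```

With truncation, half a second before the epoch and half a second after it become indistinguishable, which matches the collision the commit message describes. The flooring variant uses timedelta floor division rather than floats, which is one possible way to avoid the problem.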