Revision history - None - origin: https://github.com/SoftwareHeritage/swh-storage

Revision	Author	Date	Message	Commit Date
b752a68	Jenkins for Software Heritage	25 January 2022, 15:54:46 UTC	Updated debian changelog for version 0.42.0	25 January 2022, 15:54:46 UTC
c2a93a6	Jenkins for Software Heritage	25 January 2022, 15:54:45 UTC	Update upstream source from tag 'debian/upstream/0.42.0' Update to upstream version '0.42.0' with Debian dir 2b9b41be406880810778652bc1db36aade0058e2	25 January 2022, 15:54:45 UTC
bedc372	Jenkins for Software Heritage	25 January 2022, 15:54:44 UTC	New upstream version 0.42.0	25 January 2022, 15:54:44 UTC
6f02524	Valentin Lorentz	24 January 2022, 11:52:04 UTC	Fix directory_add to actually insert the manifest + add directory_get_raw_manifest I don't expect directory_get_raw_manifest to be used, but it is needed for tests, so why not.	25 January 2022, 09:46:27 UTC
5874905	Valentin Lorentz	21 January 2022, 12:01:42 UTC	Stop using the deprecated 'TimestampWithTimezone.offset' attribute It will be replaced by what is currently called 'offset_bytes'	21 January 2022, 14:01:31 UTC
2e74138	Valentin Lorentz	12 January 2022, 15:36:24 UTC	Remove 'offset' and 'negative_utc' This only keeps 'offset_bytes' to store the timezone, to support swh-model v5.0.0. However, this keeps writing 'offset' and 'negative_utc' to the postgresql database, just in case we need to roll back this change. But they are not read anymore.	21 January 2022, 13:07:02 UTC
c68a4fd	Valentin Lorentz	17 January 2022, 15:59:37 UTC	postgres: Add indices to keep track of objects with a raw_manifest They should be a rare occurrence, so adding these indices allows us to count and enumerate them without expensive full table scans.	18 January 2022, 10:04:28 UTC
228de33	Valentin Lorentz	18 January 2022, 09:16:01 UTC	Fix sphinx error [2022-01-17T16:03:27.448Z] /var/lib/jenkins/workspace/DSTO/tests-on-diff@2/docs/index.rst:25:hardcoded link 'https://archive.softwareheritage.org/api/' could be replaced by an extlink (try using ':swh_web:`api/`' instead)	18 January 2022, 09:16:05 UTC
40a57d4	Valentin Lorentz	07 January 2022, 12:04:25 UTC	cassandra: Make content_missing run in linear time instead of quadratic Assuming all contents passed to content_missing() have (at least) a missing algo, the function used to iterate over the size of the arg squared in the worst case (when all contents are found). With this commit, it starts with bucketing them by hash, so it does not need to iterate over all found contents for each content passed as arg.	12 January 2022, 10:38:20 UTC
d5f1f0e	Valentin Lorentz	18 October 2021, 11:25:20 UTC	cassandra: Rewrite content_missing to run queries concurrently. This is twice as fast, according to https://forge.softwareheritage.org/T3577#72791	12 January 2022, 10:37:57 UTC
4a24505	Valentin Lorentz	06 January 2022, 11:41:45 UTC	cassandra: Use concurrent queries in *_missing() instead of naive grouping Instead of grouping ids in queries in arbitrary batches (which forces the server node to coordinate with other nodes to complete the query), this sends queries with one id each, directly to the right node. This is the 'concurrent' algorithm from https://forge.softwareheritage.org/T3577#72791 which gives a >=2x speed-up on directories, and a >=8x speed-up on revisions.	06 January 2022, 11:43:09 UTC
f63d2e7	Jenkins for Software Heritage	04 January 2022, 14:55:33 UTC	Updated debian changelog for version 0.41.1	04 January 2022, 14:55:33 UTC
71abcc3	Jenkins for Software Heritage	04 January 2022, 14:55:32 UTC	Update upstream source from tag 'debian/upstream/0.41.1' Update to upstream version '0.41.1' with Debian dir cffa1d419282197d3c69b493f6fc958541b9c38d	04 January 2022, 14:55:32 UTC
c7cbc9a	Jenkins for Software Heritage	04 January 2022, 14:55:31 UTC	New upstream version 0.41.1	04 January 2022, 14:55:31 UTC
259bf6f	David Douard	09 December 2021, 12:11:46 UTC	Improve documentation of the replay command	04 January 2022, 11:02:44 UTC
1071781	David Douard	09 December 2021, 12:05:16 UTC	Move the 'error_reporter' config entry in a dedicated 'replayer' section	04 January 2022, 11:02:44 UTC
e8c51f3	Jenkins for Software Heritage	22 December 2021, 15:22:18 UTC	Updated debian changelog for version 0.41.0	22 December 2021, 15:22:18 UTC
f99996d	Jenkins for Software Heritage	22 December 2021, 15:22:17 UTC	Update upstream source from tag 'debian/upstream/0.41.0' Update to upstream version '0.41.0' with Debian dir 4a663880267eda8e75f3f01cab624e5d5e7bbe3b	22 December 2021, 15:22:17 UTC
aa3ce30	Jenkins for Software Heritage	22 December 2021, 15:22:15 UTC	New upstream version 0.41.0	22 December 2021, 15:22:15 UTC
f3232e6	Valentin Lorentz	15 December 2021, 18:01:04 UTC	Add columns {,committer_}date_offset to rev/rel and raw_manifest to dir/rev/rel	22 December 2021, 12:29:11 UTC
f09a54d	Antoine R. Dumont (@ardumont)	16 December 2021, 15:18:36 UTC	Pin mypy and drop type annotations which makes mypy unhappy This also drops: - spurious copyright headers to those files if present. - fix a type issue revealed by the new mypy Related to T3812	16 December 2021, 15:18:36 UTC
c40ceb3	Valentin Lorentz	09 December 2021, 16:27:14 UTC	Add tests checking round-tripping of dir/rev/rel/snp objects generated by Hypothesis	15 December 2021, 12:11:56 UTC
45687d8	Valentin Lorentz	09 December 2021, 16:04:40 UTC	Add test_revision_add_fractional_timezone	15 December 2021, 12:11:51 UTC
fb1b3a0	Valentin Lorentz	09 December 2021, 15:25:20 UTC	postgresql: Fix one-by-one error in db_to_date on negative dates Using `int()` on `date.timestamp()` rounded it up (toward zero), but the semantics of `model.Timestamp` is that the actual time is `ts.seconds + ts.microseconds/1000000`, so all negative dates were shifted one second up. In particular, this causes dates from `1969-12-31T23:59:59.000001` to `1969-12-31T23:59:59.999999` (inclusive) to smash into dates from `1970-01-01T00:00:00.000001` to `1970-01-01T00:00:00.999999`, which is how I discovered the issue.	13 December 2021, 11:54:35 UTC
34ca67e	Valentin Lorentz	09 December 2021, 15:23:44 UTC	postgresql: Add tests for db_to_date.	09 December 2021, 15:23:44 UTC
7cb4128	Antoine Lambert	08 December 2021, 11:09:04 UTC	proxies/retry: Remove no longer needed tenacity workarounds Now that we have packaged tenacity 6.2 for debian buster and use it in production, we can remove the workarounds to support tenacity < 5.	08 December 2021, 11:09:15 UTC
615fb99	Antoine Lambert	07 December 2021, 12:36:28 UTC	test_cassandra: Fix failing tests since swh-model update Directory entries are now checked for name duplicates in swh-model so we must ensure the CrashyEntry class is properly initialized. Closes T3776	07 December 2021, 12:36:43 UTC
ba105df	David Douard	16 November 2021, 13:40:04 UTC	d/changelog: version 0.40.0-2~swh1	17 November 2021, 08:46:19 UTC
9af9b26	David Douard	16 November 2021, 13:39:25 UTC	Update dependencies in d/control	17 November 2021, 08:44:58 UTC
0b1b158	Jenkins for Software Heritage	16 November 2021, 09:45:39 UTC	Updated debian changelog for version 0.40.0	16 November 2021, 09:45:39 UTC
1b6c7fc	Jenkins for Software Heritage	16 November 2021, 09:45:38 UTC	Update upstream source from tag 'debian/upstream/0.40.0' Update to upstream version '0.40.0' with Debian dir 0decf5aa20880a80efe3b342417439b86249cc5f	16 November 2021, 09:45:38 UTC
d8994d5	Jenkins for Software Heritage	16 November 2021, 09:45:37 UTC	New upstream version 0.40.0	16 November 2021, 09:45:37 UTC
850a755	David Douard	27 October 2021, 15:31:18 UTC	Add support for a redis-based reporting for invalid mirrored objects The idea is that we check the BaseModel validity at journal deserialization time so that we still have access to the raw object from kafka for complete reporting (object id plus raw message from kafka). This uses a new ModelObjectDeserializer class that is responsible for deserializing the kafka message (still using kafka_to_value) then immediately create the BaseModel object from that dict. Its `convert` method is then passed as `value_deserializer` argument of the `JournalClient`. Then, for each deserialized object from kafka, if it's a HashableObject, check its validity by comparing the computed hash with its id. If it's invalid, report the error in logs, and if configured, register the invalid object via the `reporter` callback. In the cli code, a `Redis.set()` is used as such a callback (if configured). So it simply stores invalid objects using the object id as key (typically its swhid), and the raw kafka message value as value. Related to T3693.	09 November 2021, 15:36:34 UTC
04bd15a	David Douard	27 October 2021, 14:56:53 UTC	Refactor fixer.fix_objects() to extract the inner object_fixers dict allowing to use this dict independently of the fix_objects() function.	09 November 2021, 15:36:34 UTC
d655c85	David Douard	27 October 2021, 14:55:03 UTC	Remove now useless fixers keep the fix_objects() function for bw compat for now.	09 November 2021, 15:36:34 UTC
55eed77	David Douard	26 October 2021, 14:38:26 UTC	Add a --type option to 'swh storage replay' allows to choose replayed object types from the cli.	09 November 2021, 15:36:34 UTC
0262f1c	Valentin Lorentz	21 October 2021, 11:09:32 UTC	Update extrinsic-metadata-specification.rst to match the current implementation * merged origin and artifact metadata * added metametadata * uses structures instead of dict * removed raw_extrinsic_metadata_get_latest	09 November 2021, 14:47:31 UTC
467aa27	Jenkins for Software Heritage	29 October 2021, 09:17:36 UTC	Updated debian changelog for version 0.39.0	29 October 2021, 09:17:36 UTC
8c63421	Jenkins for Software Heritage	29 October 2021, 09:17:35 UTC	Update upstream source from tag 'debian/upstream/0.39.0' Update to upstream version '0.39.0' with Debian dir fc10c75ae5be9e40348d539c00fc26603ce1ecd6	29 October 2021, 09:17:35 UTC
3ce052a	Jenkins for Software Heritage	29 October 2021, 09:17:33 UTC	New upstream version 0.39.0	29 October 2021, 09:17:33 UTC
a5bfe5b	Antoine Lambert	27 October 2021, 16:26:20 UTC	interface: Add origin_snapshot_get_all method It enables to return in an efficient way the list of unique snapshot identifiers resulting from the visits of an origin. Previously it was required to query all visits of an origin then query all visit statuses for each visit to extract such information. Introduced method enables to extract origin snapshots information in a single database query. Related to T3631	28 October 2021, 12:23:14 UTC
49a932c	Antoine Lambert	21 October 2021, 15:12:32 UTC	algos/revisions_walker: Handle case of revision without committer date Some revisions in the archive do not have committer date so workaround it to avoid errors when walking on such revisions when using the class CommitterDateRevisionsWalker.	22 October 2021, 09:32:28 UTC
c02be8e	Antoine Lambert	21 October 2021, 14:50:57 UTC	test_revisions_walker: Migrate from unittest to pytest	21 October 2021, 14:50:57 UTC
a986710	Valentin Lorentz	18 October 2021, 11:00:00 UTC	cassandra: Fix incomplete check of content existence in object_find_by_sha1_git content_missing_by_sha1_git only checks the index and not the main table. This is incorrect, because contents should not be considered written before an entry is written to the main table, even if an entry exists in one of the indexes.	18 October 2021, 11:10:54 UTC
b639a49	Jenkins for Software Heritage	11 October 2021, 14:58:23 UTC	Updated debian changelog for version 0.38.0	11 October 2021, 14:58:23 UTC
e86eafe	Jenkins for Software Heritage	11 October 2021, 14:58:22 UTC	Update upstream source from tag 'debian/upstream/0.38.0' Update to upstream version '0.38.0' with Debian dir fb3ad9c54e04b857bee1ba72970de66ed96cae17	11 October 2021, 14:58:22 UTC
8567342	Jenkins for Software Heritage	11 October 2021, 14:58:20 UTC	New upstream version 0.38.0	11 October 2021, 14:58:20 UTC
e9fd74d	Valentin Lorentz	07 October 2021, 09:22:43 UTC	serializers: Prepare rename of 'identifiers_enum' to 'swhids_enum'. This will be done in three steps to avoid any disruption: 1. (this step) add support for decoding the new name, but keep encoding as the old one 2. start encoding as the new name 3. remove support for decoding the old name	11 October 2021, 11:58:23 UTC
ea86a86	Valentin Lorentz	07 October 2021, 09:18:45 UTC	Rename imports of swh.model.identifiers to fix deprecation warnings.	11 October 2021, 11:58:23 UTC
3441f68	Nicolas Dandrimont	08 October 2021, 13:55:29 UTC	buffer: add some debug logging for number of objects sent	08 October 2021, 14:53:27 UTC
b604014	Nicolas Dandrimont	08 October 2021, 13:44:42 UTC	buffer: add a threshold for the estimated size of revision and release batches The size of individual revisions and releases is essentially unbounded. This means that, when the buffer storage is used as a way of limiting memory use for an ingestion process, it is still possible to go beyond the expected memory use when adding a batch of revisions or releases with large messages or other metadata. The duration of the database operations for revision_add or release_add is also commensurate to the size of the objects added in a batch, so using the buffer proxy to limit the time individual database operations takes was not effective. Adding a threshold on estimated sizes for batches of revision and release objects makes this overuse of memory and of database transaction time much less likely.	08 October 2021, 14:53:27 UTC
7c5b0ec	Nicolas Dandrimont	08 October 2021, 13:13:59 UTC	buffer: add a threshold for the number of revision parents in one batch The size of individual revisions is essentially unbounded. This means that, when the buffer storage is used as a way of limiting memory use for an ingestion process, it is still possible to go beyond the expected memory use when adding a batch of revisions with extensive histories. The duration of the database operation for revision_add is also commensurate to the number of revision parents added in a batch, so using the buffer proxy to limit the time individual database operations takes was not effective. Adding a threshold on cumulated number of revision parents per batch makes this overuse of memory and of database transaction time much less likely.	08 October 2021, 13:42:35 UTC
5edc0ba	Nicolas Dandrimont	08 October 2021, 12:23:01 UTC	buffer: add a threshold for the number of directory entries in one batch The size of individual directories is essentially unbounded. This means that, when the buffer storage is used as a way of limiting memory use for an ingestion process, it is still possible to go beyond the expected memory use when adding a batch of (very) large directories. The duration of the database operation for directory_add is also commensurate to the number of directory entries added in a batch, so using the buffer proxy to limit the time individual database operations takes was not effective. Adding a threshold on cumulated number of directory entries per batch makes this overuse of memory and of database transaction time much less likely.	08 October 2021, 13:13:05 UTC
abe95b3	Nicolas Dandrimont	06 October 2021, 16:26:04 UTC	filter: add filtering for release_add	08 October 2021, 12:11:32 UTC
c52b7b6	Nicolas Dandrimont	06 October 2021, 16:25:21 UTC	filter: do not call the underlying functions if there's nothing to add	08 October 2021, 10:20:22 UTC
5d5d4c9	Nicolas Dandrimont	06 October 2021, 16:21:26 UTC	buffer: Ensure that we don't send data from empty buffers This was already the case (as grouper called on an empty iterator just returns no batches), but add a test to enforce it.	08 October 2021, 09:45:25 UTC
91091fe	Jenkins for Software Heritage	29 September 2021, 10:12:31 UTC	Updated debian changelog for version 0.37.1	29 September 2021, 10:12:31 UTC
4ec651f	Jenkins for Software Heritage	29 September 2021, 10:12:30 UTC	Update upstream source from tag 'debian/upstream/0.37.1' Update to upstream version '0.37.1' with Debian dir 1d9eff5555cc4c457089da74ffcf0997ef88b7c2	29 September 2021, 10:12:30 UTC
4c0ea11	Jenkins for Software Heritage	29 September 2021, 10:12:28 UTC	New upstream version 0.37.1	29 September 2021, 10:12:28 UTC
113088a	David Douard	29 September 2021, 08:59:12 UTC	replay: add type annotation for process_replay_objects()	29 September 2021, 09:01:57 UTC
9a3589f	David Douard	28 September 2021, 15:13:26 UTC	replay: fix raw_extrinsic_metadata insertion and type annotation due to missing type annotation of the storage argument of _insert_objects(), we missed a bug in the processing of raw_extrinsic_metadata objects, passing set() as arguments of storage add methods.	29 September 2021, 08:48:13 UTC
21aff2d	David Douard	28 September 2021, 15:04:18 UTC	replay: fix annotation of collision_aware_content_add() now the callable is expected to return a dict.	29 September 2021, 08:48:13 UTC
8ba232c	Valentin Lorentz	28 September 2021, 14:11:55 UTC	Fix support of swh-model 3.0.0	28 September 2021, 15:59:20 UTC
42bad90	Valentin Lorentz	27 September 2021, 13:30:14 UTC	postgresql: Don't raise StorageArgumentException in case of write conflicts	27 September 2021, 13:30:14 UTC
ec548ee	Raphaël Gomès	22 September 2021, 14:18:05 UTC	Add bazaar as supported revision type This has a corresponding change in swh.model	23 September 2021, 08:55:35 UTC
61e9e4a	Valentin Lorentz	15 September 2021, 13:13:09 UTC	cassandra: Make _content_get_from_hashes run concurrently This is used by directory_ls and content_get.	21 September 2021, 16:14:28 UTC
59e63db	Antoine Lambert	21 September 2021, 10:25:25 UTC	postgresql: Fix regression introduced in previous commit Methods snapshot_get and snapshot_get_branches should return None if the snapshot does not exist in the archive. Add missing tests to cover that case.	21 September 2021, 10:25:42 UTC
9465054	Antoine Lambert	16 September 2021, 12:08:33 UTC	postgresql: Fix get_snapshot_branches return value for empty search When searching for branches in an existing snapshot, a PartialBranches object must be returned regardless the number of found branches. None should only be returned when a snapshot does not exist. This fixes an inconsistency between the postgresql and cassandra backends. Related to T3413	16 September 2021, 12:09:36 UTC
3da47c9	Antoine R. Dumont (@ardumont)	16 September 2021, 07:53:26 UTC	d/changelog: Bump new release Related to T3578	16 September 2021, 07:53:26 UTC
0db1105	Antoine R. Dumont (@ardumont)	16 September 2021, 07:40:00 UTC	d/control: Update missing deps Related to T3578	16 September 2021, 07:40:00 UTC
5abd216	Jenkins for Software Heritage	16 September 2021, 06:42:23 UTC	Updated debian changelog for version 0.37.0	16 September 2021, 06:42:23 UTC
9263b69	Jenkins for Software Heritage	16 September 2021, 06:42:21 UTC	Update upstream source from tag 'debian/upstream/0.37.0' Update to upstream version '0.37.0' with Debian dir 304ad56e032c6f722cf702178ece7d62e3dcef12	16 September 2021, 06:42:21 UTC
cb10244	Jenkins for Software Heritage	16 September 2021, 06:42:19 UTC	New upstream version 0.37.0	16 September 2021, 06:42:19 UTC
a9fde72	Antoine R. Dumont (@ardumont)	13 September 2021, 12:25:34 UTC	Allow filtering extids per extid_version/extid_type when reading This impacts both the `extid_get_from_extid` and `extid_get_from_target` endpoints. When extid_version/extid_type are not provided, this keeps the existing behavior of returning all extids matching. Related to T3567	15 September 2021, 16:34:59 UTC
589d20e	Valentin Lorentz	14 September 2021, 09:15:34 UTC	migrate_extrinsic_metadata: Fix missing f-stringification	14 September 2021, 09:15:41 UTC
1c8337f	Valentin Lorentz	10 September 2021, 17:25:51 UTC	migrate_extrinsic_metadata: Fix crash on deposit hal-02355563	10 September 2021, 17:25:51 UTC
3315738	Valentin Lorentz	10 September 2021, 17:25:03 UTC	migrate_extrinsic_metadata: Fix remaining pypi issues All packages now pass	10 September 2021, 17:25:03 UTC
8e94afa	Valentin Lorentz	10 September 2021, 16:17:56 UTC	migrate_extrinsic_metadata: Fix off-by-one error, causing the first_id to be skipped	10 September 2021, 16:17:56 UTC
5facf66	Valentin Lorentz	09 September 2021, 13:45:30 UTC	cassandra: Make directory_ls fetch contents in batch instead of one-by-one This should make it run up to 100 times faster, even on average directories.	09 September 2021, 13:45:30 UTC
0570a42	Valentin Lorentz	09 September 2021, 13:30:09 UTC	content_get: Fetch rows concurrently Instead of fetching them one-by-one, with the very high latency this entails. This is preliminary work to make `directory_ls` less painfully slow.	09 September 2021, 13:43:42 UTC
50fb54f	Valentin Lorentz	09 September 2021, 09:35:22 UTC	directory_entry_add_batch: Remove the temporary prepared statement entirely And fall back to concurrent insertion.	09 September 2021, 09:35:22 UTC
da7e63e	Valentin Lorentz	08 September 2021, 09:56:49 UTC	directory_entry_add_batch: Reduce churn of prepared statements By reusing the 'steady state' main statement (which is quite large) across calls.	08 September 2021, 09:56:49 UTC
fc950de	Valentin Lorentz	26 August 2021, 09:08:15 UTC	cassandra: Add option to select (hopefully) more efficient batch insertion algos This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values: * 'one-per-one' is the default, and preserves the current naive behavior * 'concurrent' and 'batch' are attempts at being more efficient	08 September 2021, 09:54:57 UTC
7dc2863	Valentin Lorentz	06 September 2021, 12:45:40 UTC	migrate_extrinsic_metadata: Add an option to limit the number of revisions This will be used as a second pass on objects that failed with older versions of the script.	06 September 2021, 12:45:40 UTC
834a49d	Valentin Lorentz	03 September 2021, 12:56:15 UTC	test_directory_get_entries_pagination: don't depend on result order	03 September 2021, 12:56:15 UTC
e8aad0f	Valentin Lorentz	27 August 2021, 09:45:18 UTC	cassandra: Remove stat_counters. They were inaccurate and a performance bottleneck. We can/should use swh-counters instead, now.	31 August 2021, 08:41:48 UTC
3ad1bec	Vincent SELLIER	30 August 2021, 14:55:57 UTC	postgresql: Fix a column order mismatch between the query and object builder resulting in OriginVisitStatus trying to put a snapshot id in the metadata field Related to T3539	30 August 2021, 15:39:42 UTC
999ea6b	Vincent SELLIER	30 August 2021, 15:25:59 UTC	cassandra: generate statsd metrics on method calls Related to T3517	30 August 2021, 15:25:59 UTC
47a6919	Valentin Lorentz	27 August 2021, 09:32:03 UTC	Add counting storage proxy It will be used in the Cassandra experiment. Currently we use the built-in counters of the Cassandra backend; but in addition to being inaccurate, they seem to be a bottleneck. This proxy will be a lightweight solution for counting object insertion, without needing to run Kafka on the test cluster.	27 August 2021, 11:31:37 UTC
ffe636f	Jenkins for Software Heritage	24 August 2021, 15:01:32 UTC	Updated debian changelog for version 0.36.0	24 August 2021, 15:01:32 UTC
6924a7e	Jenkins for Software Heritage	24 August 2021, 15:01:30 UTC	Update upstream source from tag 'debian/upstream/0.36.0' Update to upstream version '0.36.0' with Debian dir 179c1ad6d3ce02e0f64d5944d38e3e3d48e86d89	24 August 2021, 15:01:30 UTC
3a224a9	Jenkins for Software Heritage	24 August 2021, 15:01:29 UTC	New upstream version 0.36.0	24 August 2021, 15:01:29 UTC
b110d1b	Nicolas Dandrimont	24 August 2021, 14:38:15 UTC	Add cvs as supported revision_type	24 August 2021, 14:39:03 UTC
8f1cdf6	Valentin Lorentz	20 August 2021, 18:11:51 UTC	Add test for origin_visit_get_latest in presence of mismatched id and date orders It was unclear this actually worked; I had to write this test to realize the code wasn't buggy. Also replaced a conditional that is always False (because Cassandra always returns results in the order of the clustering key) with an assertion, so the code is less confusing.	24 August 2021, 13:14:39 UTC
cf880db	Valentin Lorentz	20 August 2021, 16:12:26 UTC	cassandra: Bump next_visit_id when origin_visit_add is called by a replayer When called by a replayer, the visit.visit field is set; but origin.next_visit_id was never incremented, so on the next loader run, the visit id would be 1 even if there is already a visit with that id.	24 August 2021, 13:14:39 UTC
54b5abf	Valentin Lorentz	20 August 2021, 11:52:17 UTC	cassandra: Make content_missing query in batches Instead of calling content_find() for each object, which needs to make two queries for each. Given the latency of Cassandra queries, this should be a significant speed-up (possibly up to 100 times faster, as this is the value of PARTITION_KEY_RESTRICTION_MAX_SIZE). This also changes the schema, because CQL does not allow doing `IN` queries on compound partition keys.	24 August 2021, 13:14:39 UTC
7113198	Vincent SELLIER	24 August 2021, 11:52:32 UTC	backfill: add extra where clause to use the right index for extid requests Related to T3485	24 August 2021, 11:52:32 UTC
ae70564	Jenkins for Software Heritage	20 August 2021, 10:01:16 UTC	Updated debian changelog for version 0.35.1	20 August 2021, 10:01:16 UTC
ca5ee3d	Jenkins for Software Heritage	20 August 2021, 10:01:15 UTC	Update upstream source from tag 'debian/upstream/0.35.1' Update to upstream version '0.35.1' with Debian dir 313295a88c0c4c4f7c924c7c36bea4c310f2cbca	20 August 2021, 10:01:15 UTC
1c038f0	Jenkins for Software Heritage	20 August 2021, 10:01:14 UTC	New upstream version 0.35.1	20 August 2021, 10:01:14 UTC
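
The message of commit fb1b3a0 in the table above describes a rounding problem: `int()` truncates toward zero, while a timestamp split into seconds and microseconds needs the seconds part rounded down. The following is a minimal sketch of that effect only, not the actual swh-storage code; the function names `split_truncating` and `split_flooring` are hypothetical and chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_SECOND = timedelta(seconds=1)


def split_truncating(dt):
    """Hypothetical buggy split: int() truncates toward zero.

    For dates before the epoch with a fractional part, the seconds
    value ends up one second too high.
    """
    return int(dt.timestamp()), dt.microsecond


def split_flooring(dt):
    """Hypothetical fixed split: floor the seconds part.

    The pair (seconds, microseconds) then satisfies
    actual time == seconds + microseconds / 1_000_000.
    """
    delta = dt - EPOCH
    seconds = delta // ONE_SECOND  # timedelta floor division rounds down
    microseconds = (delta % ONE_SECOND).microseconds
    return seconds, microseconds


# Half a second before the epoch, and half a second after it.
before = datetime(1969, 12, 31, 23, 59, 59, 500000, tzinfo=timezone.utc)
after = datetime(1970, 1, 1, 0, 0, 0, 500000, tzinfo=timezone.utc)

truncated_before = split_truncating(before)
floored_before = split_flooring(before)
floored_after = split_flooring(after)
```

With truncation, the date half a second before the epoch gets the same (seconds, microseconds) pair as the date half a second after it, which is the collision the commit message mentions. With flooring, the two dates stay distinct.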
