a9fde72 | Antoine R. Dumont (@ardumont) | 13 September 2021, 12:25:34 UTC | Allow filtering extids per extid_version/extid_type when reading This impacts both the `extid_get_from_extid` and `extid_get_from_target` endpoints. When extid_version/extid_type are not provided, this keeps the existing behavior of returning all matching extids. Related to T3567 | 15 September 2021, 16:34:59 UTC |
589d20e | Valentin Lorentz | 14 September 2021, 09:15:34 UTC | migrate_extrinsic_metadata: Fix missing f-stringification | 14 September 2021, 09:15:41 UTC |
1c8337f | Valentin Lorentz | 10 September 2021, 17:25:51 UTC | migrate_extrinsic_metadata: Fix crash on deposit hal-02355563 | 10 September 2021, 17:25:51 UTC |
3315738 | Valentin Lorentz | 10 September 2021, 17:25:03 UTC | migrate_extrinsic_metadata: Fix remaining pypi issues All packages now pass | 10 September 2021, 17:25:03 UTC |
8e94afa | Valentin Lorentz | 10 September 2021, 16:17:56 UTC | migrate_extrinsic_metadata: Fix off-by-one error, causing the first_id to be skipped | 10 September 2021, 16:17:56 UTC |
5facf66 | Valentin Lorentz | 09 September 2021, 13:45:30 UTC | cassandra: Make directory_ls fetch contents in batch instead of one-by-one This should make it run up to 100 times faster, even on average directories. | 09 September 2021, 13:45:30 UTC |
0570a42 | Valentin Lorentz | 09 September 2021, 13:30:09 UTC | content_get: Fetch rows concurrently Instead of fetching them one-by-one, with the very high latency this entails. This is preliminary work to make `directory_ls` less painfully slow. | 09 September 2021, 13:43:42 UTC |
50fb54f | Valentin Lorentz | 09 September 2021, 09:35:22 UTC | directory_entry_add_batch: Remove the temporary prepared statement entirely And fall back to concurrent insertion. | 09 September 2021, 09:35:22 UTC |
da7e63e | Valentin Lorentz | 08 September 2021, 09:56:49 UTC | directory_entry_add_batch: Reduce churn of prepared statements By reusing the 'steady state' main statement (which is quite large) across calls. | 08 September 2021, 09:56:49 UTC |
fc950de | Valentin Lorentz | 26 August 2021, 09:08:15 UTC | cassandra: Add option to select (hopefully) more efficient batch insertion algos This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values: * 'one-per-one' is the default, and preserves the current naive behavior * 'concurrent' and 'batch' are attempts at being more efficient | 08 September 2021, 09:54:57 UTC |
7dc2863 | Valentin Lorentz | 06 September 2021, 12:45:40 UTC | migrate_extrinsic_metadata: Add an option to limit the number of revisions This will be used as a second pass on objects that failed with older versions of the script. | 06 September 2021, 12:45:40 UTC |
834a49d | Valentin Lorentz | 03 September 2021, 12:56:15 UTC | test_directory_get_entries_pagination: don't depend on result order | 03 September 2021, 12:56:15 UTC |
e8aad0f | Valentin Lorentz | 27 August 2021, 09:45:18 UTC | cassandra: Remove stat_counters. They were inaccurate and a performance bottleneck. We can/should use swh-counters instead, now. | 31 August 2021, 08:41:48 UTC |
3ad1bec | Vincent SELLIER | 30 August 2021, 14:55:57 UTC | postgresql: Fix a column order mismatch between the query and object builder resulting in OriginVisitStatus trying to put a snapshot id in the metadata field Related to T3539 | 30 August 2021, 15:39:42 UTC |
999ea6b | Vincent SELLIER | 30 August 2021, 15:25:59 UTC | cassandra: generate statsd metrics on method calls Related to T3517 | 30 August 2021, 15:25:59 UTC |
47a6919 | Valentin Lorentz | 27 August 2021, 09:32:03 UTC | Add counting storage proxy It will be used in the Cassandra experiment. Currently we use the built-in counters of the Cassandra backend; but in addition to being inaccurate, they seem to be a bottleneck. This proxy will be a lightweight solution for counting object insertion, without needing to run Kafka on the test cluster. | 27 August 2021, 11:31:37 UTC |
b110d1b | Nicolas Dandrimont | 24 August 2021, 14:38:15 UTC | Add cvs as supported revision_type | 24 August 2021, 14:39:03 UTC |
8f1cdf6 | Valentin Lorentz | 20 August 2021, 18:11:51 UTC | Add test for origin_visit_get_latest in presence of mismatched id and date orders It was unclear this actually worked; I had to write this test to realize the code wasn't buggy. Also replaced a conditional that is always False (because Cassandra always returns results in the order of the clustering key) with an assertion, so the code is less confusing. | 24 August 2021, 13:14:39 UTC |
cf880db | Valentin Lorentz | 20 August 2021, 16:12:26 UTC | cassandra: Bump next_visit_id when origin_visit_add is called by a replayer When called by a replayer, the visit.visit field is set; but origin.next_visit_id was never incremented, so on the next loader run, the visit id would be 1 even if there is already a visit with that id. | 24 August 2021, 13:14:39 UTC |
54b5abf | Valentin Lorentz | 20 August 2021, 11:52:17 UTC | cassandra: Make content_missing query in batches Instead of calling content_find() for each object, which needs to make two queries for each. Given the latency of Cassandra queries, this should be a significant speed-up (possibly up to 100 times faster, as this is the value of PARTITION_KEY_RESTRICTION_MAX_SIZE). This also changes the schema, because CQL does not allow doing `IN` queries on compound partition keys. | 24 August 2021, 13:14:39 UTC |
7113198 | Vincent SELLIER | 24 August 2021, 11:52:32 UTC | backfill: add extra where clause to use the right index for extid requests Related to T3485 | 24 August 2021, 11:52:32 UTC |
9f00eb9 | Valentin Lorentz | 06 August 2021, 12:59:29 UTC | cassandra: Fix crash when using _missing() functions with more than 100 ids with ScyllaDB. | 06 August 2021, 12:59:29 UTC |
912d04e | Antoine R. Dumont (@ardumont) | 27 July 2021, 14:58:02 UTC | sql: Adapt extid.extid_version comment | 27 July 2021, 14:58:02 UTC |
7a38045 | Nicolas Dandrimont | 23 July 2021, 14:13:08 UTC | Implement storage of the ExtID.extid_version field This field allows having multiple versions of the ExtID -> SWHID mapping, for instance when the implementation of a loader changes in a backwards-incompatible way. For now, we don't change the API used to query or store ExtIDs. When querying for the SWHIDs corresponding to a given external object, all versions are returned, and the client is expected to do the filtering. | 23 July 2021, 15:37:12 UTC |
9747aed | Vincent SELLIER | 06 July 2021, 14:54:57 UTC | cassandra: Allow to configure the consistency level to use The default ONE level is used to keep the previous behaviour Related to T3396 | 07 July 2021, 12:26:47 UTC |
f1cac4f | Valentin Lorentz | 28 June 2021, 15:28:15 UTC | postgresql: Add type annotation for 'db' argument This allows mypy to actually type-check calls to db methods. This commit also fixes an issue found by mypy. | 28 June 2021, 15:28:15 UTC |
dd8a590 | Valentin Lorentz | 28 June 2021, 15:21:18 UTC | --amend | 28 June 2021, 15:21:18 UTC |
c5beb49 | Valentin Lorentz | 28 June 2021, 13:30:41 UTC | Add endpoint raw_extrinsic_metadata_get_authorities This will make it easier for users of swh-web to discover metadata on a given SWHID, as you otherwise need to specify an authority to fetch metadata. | 28 June 2021, 13:30:41 UTC |
ec2fac4 | Valentin Lorentz | 25 June 2021, 15:26:53 UTC | cassandra: Add support for non-ASCII origin 'URLs'. We agreed a while ago they are IRIs, and we have some of them in the postgresql database already. | 25 June 2021, 15:26:53 UTC |
47575a6 | Valentin Lorentz | 14 June 2021, 15:06:18 UTC | Add endpoints to access REMD by id This will be used by swh-web to allow downloading them from a non-JSON endpoint. | 15 June 2021, 13:08:23 UTC |
036d227 | Antoine Lambert | 09 June 2021, 12:58:43 UTC | mypy: Fix errors with release >= v0.900 | 09 June 2021, 12:58:43 UTC |
1d880a5 | Valentin Lorentz | 18 May 2021, 13:35:41 UTC | cassandra: Add partial support for ScyllaDB All features work but snapshot_count_branches, because ScyllaDB does not support user-defined aggregates yet. Migration tests hang when run after the regular tests, but I can't figure out why. This should not be an issue for now, as we won't run Scylla tests on the CI. | 21 May 2021, 10:14:43 UTC |
8e3731a | Antoine R. Dumont (@ardumont) | 21 May 2021, 07:32:58 UTC | Finalize the config "local" deprecation in favor of "postgresql" This will remove further deprecation warnings from the tests, especially the ones from other modules depending on the storage's pytest-plugin. This also fixes some edge case configuration for the backfill and the storage rpc backend which would have been broken if we switched to that new name prior to this. Related to b487a21f | 21 May 2021, 07:38:55 UTC |
a92a968 | Valentin Lorentz | 18 May 2021, 13:34:17 UTC | tests: Make test parameters order deterministic, so they don't crash pytest-xdist pytest-xdist expects the parameters to be in the same order in all processes. | 19 May 2021, 08:49:09 UTC |
5a8d605 | Valentin Lorentz | 18 May 2021, 13:33:19 UTC | test_cassandra: Improve error when the process is started but not listening | 19 May 2021, 08:49:01 UTC |
0ed4a97 | David Douard | 18 May 2021, 10:56:15 UTC | Make the TenaciousProxyStorage also handle content_add_metadata | 18 May 2021, 10:56:15 UTC |
53c21d4 | Nicolas Dandrimont | 14 May 2021, 16:31:00 UTC | Add missing schema migration for swh_directory_get_entries | 14 May 2021, 16:31:00 UTC |
f328367 | Valentin Lorentz | 10 May 2021, 19:46:50 UTC | content_get: Add support for queries by sha1_git Before this commit, the only way to get Content objects from their sha1_git was to call content_find for each object. This was obviously neither convenient nor efficient. Using this endpoint to batch calls reduces the runtime of the git-bare vault cooker by 30%. | 11 May 2021, 12:36:30 UTC |
e3cbd5e | Valentin Lorentz | 10 May 2021, 14:12:05 UTC | Add endpoint directory_get_entries, to quickly list a directory's entries It spares a join with the content table, which should hopefully make the vault (and possibly other users) faster when they don't need this join. | 11 May 2021, 10:00:27 UTC |
f140f63 | Valentin Lorentz | 10 May 2021, 12:13:20 UTC | cassandra: Add tests checking directory_add and snapshot_add are atomic. | 11 May 2021, 08:22:23 UTC |
b487a21 | David Douard | 10 May 2021, 12:56:44 UTC | Deprecate the "local" storage cls in favor of "postgresql" | 10 May 2021, 12:56:44 UTC |
9105253 | David Douard | 10 May 2021, 12:55:07 UTC | Move all proxy storages in swh/storage/proxies/ to clean up the swh.storage namespace a bit. | 10 May 2021, 12:55:07 UTC |
7617099 | David Douard | 05 May 2021, 09:43:09 UTC | Make the TenaciousProxyStorage retry when a single object add fails, to give one-object batches a chance to be ingested and reduce the number of objects wrongly reported as non-ingested, e.g. during a replayer session, where this situation can occur. | 07 May 2021, 11:46:00 UTC |
35ae94a | Valentin Lorentz | 06 May 2021, 12:23:09 UTC | Use swh.core 0.14 It renamed db_name to dbname, which is a breaking change. | 06 May 2021, 12:23:09 UTC |
652e3d5 | Valentin Lorentz | 06 May 2021, 09:56:32 UTC | tenacious: Document potential issues about objects being dropped | 06 May 2021, 09:56:32 UTC |
e170fb2 | Valentin Lorentz | 04 May 2021, 14:04:38 UTC | Stop storing authority/fetcher metadata. We still don't have a use for them, and they are causing issues; such as being unable to add an authority/fetcher based only on a REMD object, which is needed by the replayer. | 05 May 2021, 10:54:04 UTC |
77ef651 | David Douard | 04 May 2021, 14:06:02 UTC | Make postgresql's origin_add not raise an error in case of conflict: there is no need for a URL insertion in the origin table to result in a uniqueness error. Conflicting insertions of the same URL in this table may happen with concurrent processes (loading or in a replayer session). | 05 May 2021, 10:18:44 UTC |
ffb38f7 | David Douard | 18 June 2020, 16:41:24 UTC | Add a new TenaciousProxyStorage This proxy storage attempts to add buckets of objects, but in case of failure, it splits the bucket into parts so every valid object in the bucket gets a chance to be inserted. Also provides an error rate-limiting feature. This proxy storage is mainly dedicated to help mirroring an archive using the replayer stack. | 05 May 2021, 09:57:58 UTC |
051b771 | Valentin Lorentz | 23 April 2021, 08:43:27 UTC | cassandra: Add a test of a 'complex' migration, with a PK update | 03 May 2021, 15:40:37 UTC |
f233461 | Valentin Lorentz | 22 April 2021, 18:27:33 UTC | cassandra: Add 'check_missing' option, to allow updating objects as part of a migration. Also write a first test that simulates how a simple migration would go. | 03 May 2021, 15:40:36 UTC |
92d551a | David Douard | 29 April 2021, 10:00:24 UTC | Normalize all Storage.xxx_add() methods to return a summary, except origin_visit_add(), which requires more work to do so. Note that this will change the way 'raw_extrinsic_metadata_add()' reports statsd metrics: the 'method_name' tag will now remain 'raw_extrinsic_metadata_add' instead of a forged '<type_name>_metadata_add'. | 29 April 2021, 10:40:33 UTC |
ff7ecb4 | David Douard | 28 April 2021, 15:15:25 UTC | Properly annotate output of Storage.xxx_add() methods as Dict[str, int] when applicable. | 29 April 2021, 10:03:10 UTC |
98804f9 | David Douard | 28 April 2021, 10:06:19 UTC | Add a fixer for ExtrinsicRawMetadata: the 'type' attribute has been removed in swh.model v1.0.0 in favor of an ExtendedSWHID 'target'. | 28 April 2021, 12:12:22 UTC |
615d719 | Antoine Lambert | 26 April 2021, 16:09:05 UTC | tox: Add sphinx environments to check sane doc build Allows checking that package documentation can be built without producing sphinx warnings. The sphinx environment is designed to be used in continuous integration in order to prevent breaking documentation build when committing changes. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258 | 27 April 2021, 11:57:23 UTC |
2c477ec | David Douard | 23 April 2021, 13:38:14 UTC | Fix storage_data hardcoded id values and add a test to check this stays accurate, so that these objects can pass through the validate proxy storage, for example. | 23 April 2021, 13:45:35 UTC |
eb8c147 | Valentin Lorentz | 22 April 2021, 10:21:22 UTC | cassandra: Deduplicate table names This removes all table names from cassandra/cql.py, and gets them from cassandra/schema.py instead. When possible, this uses existing constants (BaseRow.TABLE), otherwise it uses a function to compute these names. This is needed to support schema migrations, as updating a table's primary key requires creating a new table with a different name. | 22 April 2021, 15:22:18 UTC |
a1fc5fb | Valentin Lorentz | 15 April 2021, 13:54:20 UTC | cassandra: Use prepared statements in extid_index_* All other statements are, and there is no reason for them not to be too | 15 April 2021, 13:56:32 UTC |
3b00e3a | Valentin Lorentz | 13 April 2021, 19:50:50 UTC | Fix various Sphinx warnings | 15 April 2021, 08:19:23 UTC |
b999952 | Antoine Lambert | 14 April 2021, 16:39:49 UTC | sql/Makefile: Also call dropdb prior to createdb when using pifpaf Now that the PGDATABASE value from pifpaf is used, that call is needed; otherwise the overall swh doc build in development mode fails. | 14 April 2021, 16:41:20 UTC |
1bacea5 | Valentin Lorentz | 13 April 2021, 15:12:13 UTC | docs: Fix db-schema.svg generation to use pifpaf-created database This makes 'tox -e sphinx-dev' not rely on the existence of the database on the system. | 13 April 2021, 15:12:13 UTC |
c96942b | KShivendu | 05 April 2021, 11:39:33 UTC | Cassandra: Deduplicate lists passed to *_add endpoints Previously only release_add supported deduplication. This commit aligns other _add endpoints with it | 12 April 2021, 11:27:22 UTC |
933289e | Antoine Lambert | 09 April 2021, 13:07:35 UTC | Remove last references to no longer used SQLAlchemy package | 09 April 2021, 13:07:35 UTC |
50becef | Antoine Lambert | 09 April 2021, 11:37:40 UTC | docs: Fix db-schema.svg inclusion when building full swh documentation The image was correctly included when building standalone swh-storage documentation but was not when building the full swh one. Closes T3227 | 09 April 2021, 11:37:53 UTC |
ccaac11 | Valentin Lorentz | 07 April 2021, 12:20:01 UTC | migrate_extrinsic_metadata: Allow 'atom:title' as alternative to 'title' Some revisions use it instead. | 07 April 2021, 12:20:19 UTC |
39507b2 | David Douard | 02 April 2021, 14:10:28 UTC | Make the replayer drop the Revision.metadata: this attribute is deprecated and on the verge of being replaced by RawExtrinsicMetadata objects, and the kafka journal currently in production contains a few invalid metadata entries that make the replayer unhappy. Closes T3201. | 06 April 2021, 14:31:49 UTC |
84dcbe3 | David Douard | 02 April 2021, 10:56:53 UTC | Merge test_replay's _check_replayed and check_replayed in a single function | 06 April 2021, 14:01:37 UTC |
36a7fd3 | David Douard | 06 April 2021, 13:57:40 UTC | Fix pg Storage.extid_add(): write ExtID objects to the journal and explicitly check for extid objects in the journal in TestStorage. | 06 April 2021, 14:01:01 UTC |
0a270d1 | Valentin Lorentz | 30 March 2021, 15:51:55 UTC | migrate_extrinsic_metadata: Filter out git revisions They can't have any extrinsic metadata, so fetching git revisions wastes a lot of time. | 30 March 2021, 15:51:55 UTC |
3309765 | Valentin Lorentz | 29 March 2021, 14:49:07 UTC | buffer: Add support for 'extid' Will be used by the extid migration script, and loaders can probably use it too. | 30 March 2021, 15:33:00 UTC |
cfb2417 | Valentin Lorentz | 26 March 2021, 15:03:27 UTC | extid: remove unicity on (extid_type, extid) and (target_type, target) It did not make sense for multiple reasons: 1. two extids can point to the same target (eg. extids with type git and git-sha256; or two package managers with different checksums) 2. inserting two objects with the same target or extid in a single call actually wrote both, but would crash when reading 3. inserting extid1 then extid2 would write both to Kafka, but only extid1 would be inserted. When replaying on a new DB, extid2 may be inserted and extid1 ignored Points 2 and 3 are simply fixable bugs, but 1 is an issue by design, and this commit fixes all of them at once. | 26 March 2021, 15:08:13 UTC |
ac6f642 | Valentin Lorentz | 26 March 2021, 14:17:12 UTC | origin_visit_status_add: Fix inconsistent/incorrect errors when type is None and visit is missing. | 26 March 2021, 14:30:43 UTC |
eff2383 | Valentin Lorentz | 05 February 2021, 13:33:49 UTC | raw_extrinsic_metadata: Make (target, authority_id, discovery_date, fetcher_id) non-unique Uniqueness is only based on the id from now on. Also adds the 'id' column to the Cassandra schema (it was already present in postgresql's schema) | 22 March 2021, 11:42:46 UTC |
2d540b0 | Valentin Lorentz | 05 February 2021, 12:56:15 UTC | Add raw_extrinsic_metadata.id column in postgresql. For now, this has absolutely no effect on the API users, as rows are already deduplicated based on a subset of the fields hashed by the id. | 22 March 2021, 08:53:16 UTC |
8dd9f7b | Valentin Lorentz | 15 March 2021, 13:35:03 UTC | Document the existing metadata formats | 15 March 2021, 14:59:07 UTC |
ffc0841 | Valentin Lorentz | 15 March 2021, 11:50:41 UTC | content_add: Write to the objstorage before the DB or Kafka Must add to the objstorage before the DB and journal. Otherwise: 1. in case of a crash the DB may "believe" we have the content, but we didn't have time to write to the objstorage before the crash 2. the objstorage mirroring, which reads from the journal, may attempt to read from the objstorage before we finished writing it This is already done in the postgresql backend unintentionally since 209de5dbaa127dacd114fbbd084f22632982eb77. This commit documents it, makes the cassandra backend behave that way too, and adds a test. | 15 March 2021, 11:55:29 UTC |
b565201 | Antoine Lambert | 05 March 2021, 15:33:29 UTC | storage: Allow to filter out branches by prefix when counting them Add an optional branch_name_exclude_prefix parameter to the snapshot_count_branches method of the Storage interface. It allows filtering out branches whose name starts with a given prefix when counting. The purpose is to get accurate counters in swh-web as pull request branches will be filtered out by default. Related to T2782 | 12 March 2021, 14:23:54 UTC |
93301a1 | Antoine Lambert | 02 March 2021, 13:42:57 UTC | storage: Add branch names filtering support in snapshot_get_branches Add optional branch_name_include_substring parameter to snapshot_get_branches, if provided only branches whose name contains the given substring will be returned. Add optional branch_name_exclude_prefix parameter to snapshot_get_branches, if provided branches whose name starts with the given prefix will not be returned. Purpose of these new features: add a search form in the branches view of swh-web and filter out pull request branches (whose names start with "refs/pull/") by default. Related to T2782 | 12 March 2021, 14:23:28 UTC |
b8e10f0 | David Douard | 09 December 2020, 15:57:44 UTC | Add ExtID query support to the Storage These endpoints allow to add and query the storage for known ExtID from SWHID (typically get original VCS' revision intrinsic identifier from SWHID). The underlying data structure is to be filled typically by loaders using the `extid_add()` endpoint. This only provides the Postgresql implementation. Related to T2849. | 11 March 2021, 13:20:18 UTC |
6a77732 | David Douard | 09 December 2020, 15:54:25 UTC | Add hg revisions to the test data set | 10 March 2021, 15:25:00 UTC |
e83452b | David Douard | 10 March 2021, 15:21:26 UTC | Import TEST_OBJECTS from swh.model instead of swh.journal; the latter has been deprecated for a while now. | 10 March 2021, 15:25:00 UTC |
82ce7bf | David Douard | 10 March 2021, 14:46:29 UTC | Make sure test_backfill does not depend on 2 dict keys being miraculously listed the same. | 10 March 2021, 14:49:48 UTC |
c4fdd6d | Nicolas Dandrimont | 09 March 2021, 16:45:06 UTC | Add support for raw_extrinsic_metadata in the replayer This also checks the basic raw_extrinsic_metadata codepaths in the backfiller tests. | 10 March 2021, 13:07:11 UTC |
53a58fa | Nicolas Dandrimont | 09 March 2021, 16:42:18 UTC | Add basic support for raw_extrinsic_metadata in the backfiller | 10 March 2021, 13:00:05 UTC |
89ae0a1 | Nicolas Dandrimont | 09 March 2021, 16:40:27 UTC | Add simple unit test for the backfill.byte_ranges function | 10 March 2021, 08:34:27 UTC |
0d785d2 | Nicolas Dandrimont | 09 March 2021, 16:43:51 UTC | Add support for reading RawExtrinsicMetadata with raw URL targets We convert the target attribute to a hashed ExtendedSWHID before returning the object. | 10 March 2021, 08:33:53 UTC |
88ff2c2 | Antoine Lambert | 03 March 2021, 15:20:39 UTC | postgresql: Ensure a minimum limit for the snapshot branches query With small limits (< 10), the snapshot branches query can degenerate into using the deduplication index on snapshot_branch (name, target, target_type), and the postgresql planner happily scans several hundred million rows. So ensure a minimum limit value of 10 before executing the query for optimal performance when a small branches_count value is provided to the snapshot_get_branches method of the Storage interface. Related to P966 | 03 March 2021, 16:49:20 UTC |
ce8335d | Valentin Lorentz | 03 March 2021, 07:44:46 UTC | Remove the remaining references to the deprecated SWHID class | 03 March 2021, 16:46:50 UTC |
f46244b | Antoine Lambert | 02 March 2021, 15:38:34 UTC | tests: Drop hypothesis < 6 requirement Ensure tests can be executed using hypothesis >= 6 by suppressing the function_scoped_fixture health check on tests that use a function scope fixture in combination with @given that does not need to be reset between individual hypothesis examples. | 03 March 2021, 10:53:08 UTC |
14739c5 | Valentin Lorentz | 01 March 2021, 16:38:44 UTC | RawExtrinsicMetadata: update to use the API in swh-model 1.0.0 | 01 March 2021, 16:38:44 UTC |
2388748 | Valentin Lorentz | 04 February 2021, 12:59:09 UTC | storage_tests: recompute ids when evolving RawExtrinsicMetadata objects. For now this does nothing as RawExtrinsicMetadata has no 'id' field, but the equality assertions will become errors when the next version of swh.model is released. | 25 February 2021, 15:33:40 UTC |
f3ef6e6 | Antoine Lambert | 11 February 2021, 10:23:58 UTC | storage: Implement visit types filtering in origin_search method Allows filtering searched origins by visit types. Add a new optional visit_types parameter to origin_search method in StorageInterface. Implement visit types filtering in storage backends, an origin will be returned if it has any of the requested visit types. This is clearly not designed to be used in production due to performance issues but rather in testing environments with a small archive dataset. Related to T2869 | 19 February 2021, 10:36:29 UTC |
7b4c124 | Antoine R. Dumont (@ardumont) | 17 February 2021, 09:18:26 UTC | 167: Make the migration script unblocking | 17 February 2021, 09:18:26 UTC |
cc3eb4b | Nicolas Dandrimont | 16 February 2021, 16:02:43 UTC | Switch anonymized replayer test to use pytest parametrization This allows us to only read the kafka topics once instead of twice in the same tests, which is apparently a hard thing to do in a way compatible with both confluent-kafka 1.5 and 1.6. | 16 February 2021, 16:09:03 UTC |
e0e88b2 | Antoine R. Dumont (@ardumont) | 04 February 2021, 16:16:57 UTC | storage: Refactor OriginVisitStatus instantiation | 09 February 2021, 16:01:26 UTC |
d30ca93 | Antoine R. Dumont (@ardumont) | 04 February 2021, 16:11:18 UTC | db: Unify sql joins on origin_visit_status using "USING" | 09 February 2021, 16:01:26 UTC |
046fe57 | Antoine R. Dumont (@ardumont) | 04 February 2021, 16:08:33 UTC | storage.postgresql: Use origin_visit_status.type value as source This stops using the origin_visit.type as fallback values as now, the database has been migrated. So this makes the origin_visit_status.type a not nullable column. This also drops now redundant join instructions on origin_visit table when reading. Related to T2968 | 09 February 2021, 16:01:25 UTC |
51df58e | Antoine Lambert | 09 February 2021, 09:46:37 UTC | test_replay: Fix hang since confluent-kafka 1.6 release Side effect of the following commit in librdkafka 1.6: https://github.com/edenhill/librdkafka/commit/f418e0f721518d71ff533759698b647cb2e89b80 Tests were relying on a buggy behavior of the mocked kafka cluster: two subsequent consumers set up with the same group id should receive a different set of messages, rather than the same set of messages. Also explicitly commit messages once consumed. | 09 February 2021, 14:56:15 UTC |
b038383 | Valentin Lorentz | 04 February 2021, 13:29:38 UTC | postgresql: Fix dbversion() to return the max version instead of a random one. | 08 February 2021, 11:13:03 UTC |
efd8815 | Nicolas Dandrimont | 04 February 2021, 08:57:50 UTC | buffer: ensure objects are flushed in topological order This new integration test checks that, when flushing the buffer storage, the addition functions of the underlying storage backend are called in topological order (content, directory, revision, release then snapshot). This reduces the probability of "data consistency" regressions caused by the use of the buffering storage proxy alone. | 04 February 2021, 18:17:11 UTC |
1526107 | Nicolas Dandrimont | 04 February 2021, 13:24:50 UTC | Return an accurate summary from buffer's flush() method The earlier implementation would only return summary data from keys that existed in the last `_add` backend method run, rather than collating all the results. | 04 February 2021, 18:14:03 UTC |