swh:1:snp:eb70f1f85391e4b077c211bec36af0061c4bf937

Revision Author Date Message Commit Date
77960ca in_memory: fix tie-breaking when two visits have the same date. swh-loader-core's tests depend on this behavior. 28 July 2020, 07:54:55 UTC
119d01e storage*: origin_visit_get_by -> Optional[OriginVisit] Related to T645 27 July 2020, 12:44:16 UTC
2d51be9 Rename object_metadata to raw_extrinsic_metadata. For consistency with the name in swh-model. 27 July 2020, 12:18:49 UTC
57e305e storage*: origin_visit_find_by_date -> Optional[OriginVisit] Related to T645 27 July 2020, 10:46:35 UTC
b31c304 algos.origin: Simplify origin_get_latest_visit_status function 27 July 2020, 09:56:33 UTC
5344a6f storage*: type origin_visit_get_latest endpoint result The endpoint returns an optional OriginVisit object instead of a dict: ``` def origin_visit_get_latest(...) -> Optional[OriginVisit] ``` It also fixes the in-memory storage implementation, which filtered data too early: it only filtered on the latest origin visit status associated with the origin visit, so depending on the filters, the result could have been wrong. This was not much of a problem as there are no longer any direct clients of this API (they use [1] now). [1] swh.storage.algos.origin.origin_get_latest_origin_visit_status function Related to T645 27 July 2020, 09:56:33 UTC
789972f metadata_{authority,fetcher}_add: Fix crash when the iterable argument is empty. 24 July 2020, 07:28:33 UTC
7e94767 storage*: origin_get(Iterable[str]) -> Iterable[Optional[Origin]] This: - drops the legacy behavior (no more input as list of dicts or even one dict). - aligns with other _get endpoints (only 1 iterable of identifiers as input, here the origin urls). - migrates towards returning an iterable of optional origin model objects (again the optional part is alignment with existing get endpoint) Related to T645 23 July 2020, 16:16:38 UTC
d8583eb storage*.origin_visit_get_random: Read model objects Related to T645 23 July 2020, 12:10:07 UTC
b2055f4 pgstorage: Drop unnecessary indirection from reading origin_visit It's a missing left-over from the migration to making the origin-visit immutable. 23 July 2020, 08:28:09 UTC
ccbd2e9 pytest_plugin: Make sample_data an object Note that this: - drops the no longer needed copy done by the StorageData instance (used by sample_data) since it now returns immutable BaseModel objects. - centralizes some left-over tests to use sample_data as well 22 July 2020, 10:41:08 UTC
67a909e pytest_plugin: Rename sample_data_model to sample_data Related to T2494 22 July 2020, 09:37:40 UTC
e005900 pytest_plugin: Drop sample_data in favor of sample_data_model Related to T2494 22 July 2020, 09:30:33 UTC
bbe840e storage_data: Expose snapshots as model objects Related to T2494 21 July 2020, 15:43:35 UTC
d0cf317 storage_data: Expose release as model objects Related to T2494 21 July 2020, 15:32:03 UTC
3be5327 storage_data: Expose origin_visits as model objects Related to T2494 21 July 2020, 15:21:20 UTC
bcc0aee storage_data: Expose origins as model objects Related to T2494 21 July 2020, 15:03:26 UTC
d4cd33c storage_data: Expose revisions as model objects Related to T2494 21 July 2020, 14:24:16 UTC
955b6e2 storage_data: Expose directories as directory model objects Related to T2494 21 July 2020, 13:53:49 UTC
95dbdf7 storage_data: Remove unused fixture data Less to maintain Related to T2494 21 July 2020, 13:53:49 UTC
98a87fe storage_data: Expose contents as content model object Related to T2494 21 July 2020, 13:53:49 UTC
a23b748 pytest_plugin: Drop unnecessary back and forth conversion This is preparatory work to incrementally migrate the sample_data fixture to use model objects directly. Related to T2494 21 July 2020, 13:53:49 UTC
6338ad2 Drop validate proxy The validate proxy was initially a helper to ease the transition from the use of dicts towards model objects in "*_add" production endpoints. It was not removed immediately and grew some behavior it should not have (notably revision conversion so that the comparison within the related tests works). Having finally migrated away from dicts within the tests, we can now drop it [1]. Note that this moves the extra revision conversion behavior from the validate proxy to those related tests. This extra step will also disappear when we finally move the "*_get" endpoints to return model objects as well. Note: - This drops fixture redefinitions in the process (introduced so we could have that validate proxy at the time). - This removes the "validate" keyword from the get_storage function (so it is no longer possible to instantiate one [2]) [1] T2994 [2] which, practically, is the case today, nothing runs on production with it. Related to T2499 21 July 2020, 11:25:24 UTC
e0152b0 157: Fix migration script a posteriori Data has been fixed in production 21 July 2020, 10:35:40 UTC
96b2636 tests: Convert left-over dicts to model objects Related to T2494 21 July 2020, 08:41:22 UTC
42ae56d test_storage: Migrate last storage to use model objects Related to T2494 20 July 2020, 20:23:09 UTC
d4f896e test_storage: test_origin: Use data model object Related to T2494 20 July 2020, 20:21:56 UTC
6bdfd85 test_storage: origin_metadata: Centralize objects within sample_data_model Related to T2494 20 July 2020, 19:44:09 UTC
a6f70c3 test_storage: content_metadata: Centralize objects within sample_data_model Related to T2494 20 July 2020, 19:43:25 UTC
c9e921e test_storage: test_object_find_by_sha1: Use data model object Related to T2494 20 July 2020, 15:59:00 UTC
caa7f79 test_storage: content_find: Use data model object Related to T2494 20 July 2020, 15:55:35 UTC
6453504 test_storage: stat_counters: Use data model object Related to T2494 20 July 2020, 15:55:35 UTC
cf80d3c test_storage: snapshot: Use data model object Related to T2494 20 July 2020, 14:22:35 UTC
4f10171 test_storage: origin_visit/origin_visit_status: Use data model object Related to T2494 20 July 2020, 13:36:05 UTC
1c1bef9 test_storage: revision/release: Drop no longer needed conditionals Related to T2494 20 July 2020, 12:52:37 UTC
cdf6f58 test_storage: origin: Use data model object Related to T2494 20 July 2020, 12:51:22 UTC
bbdd7ed tests: Drop deprecated storage.origin_add_one use This is no longer used anywhere. Related to T2494 20 July 2020, 11:44:08 UTC
4971c25 test_storage: release: Use data model object Related to T2494 20 July 2020, 11:20:59 UTC
a17c412 test_storage: revision: Use data model object Related to T2494 20 July 2020, 11:03:18 UTC
03c6e15 Rename 'deposit' authority type to 'deposit_client'. It makes more sense semantically; as the client is the authority not the deposit server. 20 July 2020, 11:00:22 UTC
87b1070 test_storage: directory: Use data model object Related to T2494 20 July 2020, 09:58:34 UTC
1a2b85f test_storage: Make swh_contents fixture generate content model objects Note that this will be improved upon after the storage migration. We can take a look at the swh.model.tests.generate_testdata.gen_contents and make it generate BaseContent objects directly if it's possible. This was not done here so the impact is limited to storage. Related to T2494 20 July 2020, 09:09:08 UTC
99a28ad tests.generate_data_test: Remove dead code The storage fixtures now use the swh.model.tests.generate_data_test instead Related to T2494 20 July 2020, 09:01:26 UTC
03e17c3 test_storage: skipped_content/content_missing: Use data model object Related to T2494 20 July 2020, 09:01:26 UTC
7131dcb Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict. 20 July 2020, 08:48:35 UTC
997ec1d test.storage: content_add_metadata: Use data model object Related to T2494 17 July 2020, 15:51:08 UTC
8c2d635 test.storage: content_add: Use data model object Related to T2494 17 July 2020, 15:16:20 UTC
2b239f0 test_cassandra: Use data model object Related to T2494 17 July 2020, 14:37:31 UTC
eb2bf8c test_db: Drop redundant test This is already tested through the test_storage scenario 17 July 2020, 12:50:34 UTC
04d25df test_cli: Use snapshot model object within test That commit is not so interesting. But at least we validate the snapshot is correct prior to sending it. Also that removes a bit of duplicated storage configuration. Related to T2494 16 July 2020, 15:52:50 UTC
2d4f727 algos.test_origin: Use data model object and drop validate proxy use Related to T2494 16 July 2020, 15:16:36 UTC
97a0721 algos.test_snapshot: Use model objects from sample_data_model This opens up origin_visits and adds new snapshots to the fixture. So we can reuse those. Related to T2494 16 July 2020, 13:34:23 UTC
b6971b5 pytest_plugin: Ensure fixture instantiates correctly Related to T2484 Should fix [1] [1] https://jenkins.softwareheritage.org/view/Debian%20packages/job/debian/job/packages/job/DLDBASE/job/gbp-buildpackage/154/console 16 July 2020, 13:13:05 UTC
3abf6b3 pytest_plugin: Do not expose the validate proxy storage Only the storage module needs it. This should fix the debian jenkins build [1] [1] https://jenkins.softwareheritage.org/view/Debian%20packages/job/debian/job/packages/job/DLDBASE/job/gbp-buildpackage/153/console 16 July 2020, 12:00:46 UTC
a688e82 test_revision_bw_compat: Use revision model object Related to T2494 16 July 2020, 10:08:33 UTC
21efe2a test_filter: Use model objects in tests and drop validate proxy 16 July 2020, 09:28:21 UTC
2ff4c6f test_buffer: Use model objects in tests and drop validate proxy 16 July 2020, 09:28:21 UTC
df45641 test_retry: Drop validate proxy when we can When we use the sample_data_model (almost all object types except the metadata ones), we can use a storage with no validate proxy. Depends on D3510 16 July 2020, 09:28:21 UTC
14b1648 test_retry: Use sample_data_model fixture to manipulate model objects 16 July 2020, 09:28:20 UTC
df3f46d pytest-plugin: Expose a sample_data_model fixture This is almost the same fixture as sample_data except: - it holds BaseModel object instances - it is not complete, as we cannot yet convert the metadata objects (there is a diff pending which will allow it but right now we cannot). The next commits will use this fixture to allow the switch from dict to model objects. 16 July 2020, 09:28:20 UTC
8bc7944 pytest_plugin: Avoid fixture client to declare optional dependency Prior to this commit, this would make swh_storage_backend_config fixture clients need to declare an optional dependency on swh.journal. Otherwise, it would not work [1]. This commit fixes it by dropping this configuration in the main pytest plugin. It keeps the storage tests testing with that journal_writer collaborator though by declaring an override which still provides it. This fixes the debian build [1] [1] https://jenkins.softwareheritage.org/view/Debian%20packages/job/debian/job/packages/job/DLDBASE/job/gbp-buildpackage/152/console 16 July 2020, 07:33:00 UTC
f5811da Allow cassandra binary path to be configured through env variable The current hard-coded value won't work for other distributions not relying on standard conventions (e.g. nixos...). This keeps the original behavior and only allows diverging from it through the environment variable SWH_CASSANDRA_BIN. This also: - fixes an exception raised when the log path does not exist. - renames the other env variable LOG_CASSANDRA to SWH_CASSANDRA_LOG (for consistency) 15 July 2020, 10:09:55 UTC
1a8924b 158: Make schema and migration converge so the migration works In the end, the order of the revision entry matters whether we select * or not. So the select must match the order defined in the revision_entry type. Otherwise, a type mismatch error occurs [1] [1] psql:sql/upgrades/158.sql:74: ERROR: return type mismatch in function declared to return revision_entry 11 July 2020, 06:52:26 UTC
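
The environment-variable override described in f5811da can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the default path and the helper name `cassandra_bin` are hypothetical, and only the variable name `SWH_CASSANDRA_BIN` comes from the commit message.

```python
import os

# Hypothetical default: the commit message does not state the hard-coded path.
DEFAULT_CASSANDRA_BIN = "/usr/sbin/cassandra"


def cassandra_bin(environ=None):
    """Return the Cassandra binary path, honoring SWH_CASSANDRA_BIN when set."""
    # Accept an explicit mapping so the lookup can be exercised without
    # touching the real process environment.
    env = os.environ if environ is None else environ
    return env.get("SWH_CASSANDRA_BIN", DEFAULT_CASSANDRA_BIN)
```

With no variable set, the default is kept, which matches the "keeps the original behavior" note in the message.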
9219a23 in_memory: Fix snapshot_get_branches regression with target_types When providing target_types parameter, snapshot branches must be sorted when iterating otherwise wrong branches can be returned. 10 July 2020, 14:15:54 UTC
23318c2 setup: Do not expose the pytest-plugin any longer Defining the pytest-plugin through the pytest-plugin [1] makes it loaded by default. This creates loading issues on modules depending on storage but not on the pytest plugin storage exposes. It was explained in the doc and I did not realize [2] Instead, modules depending on the pytest plugin will explicitly declare it in their root conftest [3]: ``` pytest_plugins = [ "swh.storage.pytest_plugin" ] ``` [1] https://docs.pytest.org/en/stable/writing_plugins.html#setuptools-entry-points [2] https://docs.pytest.org/en/stable/writing_plugins.html#plugin-discovery-order-at-tool-startup [3] https://docs.pytest.org/en/stable/writing_plugins.html#requiring-loading-plugins-in-a-test-module-or-conftest-file Related to T2484 10 July 2020, 06:19:39 UTC
124e76d Rework dia -> pdf pipeline for inkscape 1.0 - Use dia directly to convert from .dia to .svg (inkscape would use dia via a plugin anyway) - Add proper runes to detect inkscape >= 1 and use the export options for that. 09 July 2020, 17:38:29 UTC
de38cd1 Remove overhead of to_dict/from_dict in test_snapshot_large. This should make it fast enough not to exceed the deadline. 09 July 2020, 16:02:10 UTC
e415488 in_memory: Fix quadratic run time in snapshot_get_branches. snapshot.branches is now an ImmutableDict, which is backed by a tuple of tuples; so random accesses now take a linear time instead of a constant time. This commit replaces random accesses with a single scan of all the items, and does existence checks in a set instead. 09 July 2020, 15:59:48 UTC
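
The idea behind e415488 (one scan over all items plus set-based existence checks, instead of repeated lookups in a tuple-backed mapping) can be sketched as follows. This is an illustrative sketch, not the in-memory storage code: the function name `pick_branches` and the plain tuple of pairs standing in for `ImmutableDict` are assumptions.

```python
from typing import Dict, Iterable, Tuple

# Stand-in for a mapping backed by a tuple of (name, target) pairs.
Branches = Tuple[Tuple[bytes, str], ...]


def pick_branches(branches: Branches, wanted: Iterable[bytes]) -> Dict[bytes, str]:
    """Select the wanted branches with a single pass over the items."""
    # Membership tests on a set are constant time on average, so the whole
    # selection is linear in len(branches) + len(wanted).
    wanted_set = set(wanted)
    return {name: target for name, target in branches if name in wanted_set}
```

Looking up each wanted name separately in the tuple would cost one linear scan per name, which is where the quadratic behavior comes from.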
c3803ef Fix a typo I introduced in previous revision dict(x if x is not None else None) != dict(x) if x is not None else None... 09 July 2020, 09:56:35 UTC
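
The precedence difference mentioned in c3803ef can be shown with a small example. The function names are hypothetical; the two expressions are the ones quoted in the commit message.

```python
def convert_buggy(x):
    # The conditional is evaluated first, so this is dict(None) when x is
    # None, which raises TypeError.
    return dict(x if x is not None else None)


def convert_fixed(x):
    # dict() is only called when x is not None; None is passed through.
    return dict(x) if x is not None else None
```

Both forms behave the same for non-None input; they differ only when `x` is `None`.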
8bf3794 Convert ImmutableDict to dict before passing it to json.dumps. To work with the new swh-model version, which uses ImmutableDict in model objects. 09 July 2020, 09:31:50 UTC
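
The conversion in 8bf3794 is needed because `json.dumps` only serializes real `dict` instances as JSON objects. A minimal sketch, using the standard library's `types.MappingProxyType` as a stand-in for swh-model's `ImmutableDict` (an assumption made so the example runs without swh-model installed; the helper name `dumps_mapping` is also hypothetical):

```python
import json
from types import MappingProxyType

# A read-only mapping that is not a dict subclass.
metadata = MappingProxyType({"key": "value"})


def dumps_mapping(mapping):
    """Serialize a flat read-only mapping by copying it into a plain dict."""
    return json.dumps(dict(mapping))
```

Passing `metadata` directly to `json.dumps` raises `TypeError`, while `dumps_mapping(metadata)` succeeds. Nested mappings would need the same conversion at each level.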
c21d0e3 Move sharable fixtures out of conftest into a dedicated pytest plugin This will allow loaders to reuse those dedicated fixtures within their code base without having to import the swh.storage.tests.conftest module. Related to T2484 08 July 2020, 09:50:21 UTC
e45ca76 Migrate from vcversioner to setuptools-scm Related to T2105 07 July 2020, 15:42:30 UTC
5ab7023 Extract revision's extra_header as a top level attribute Follows swh.model's evolution for the Revision model class. In Postgresql, store the extra_headers as a bytea[][]. Ensure data present in postgres with extra_headers in the metadata field are properly supported by the pg-backed storage. Get rid of the (now useless) git_headers_to_db() converter function. In Cassandra, store them as frozen<list<list<blob>>>. 07 July 2020, 14:49:14 UTC
8010848 storage: Send metrics from the origin_add endpoint Prior to this commit, since the loaders got migrated to use the main endpoint, no metrics were sent for the origin any longer. This commit fixes it. It also drops the send_metrics call from the deprecated endpoint origin_add_one (which, as an implementation detail, calls the other one). 06 July 2020, 07:45:40 UTC
95fd660 pg-storage: Add missing cur parameter passing Although, this also pulled a refactoring on the insertion query as the default naive approach ended up with issues on cur already being closed [1] [1] Related to P715 Related to D3416 03 July 2020, 15:54:04 UTC
348bc7b storage.db: Drop db.origin_visit_upsert behavior The initial desired behavior was to allow creation of origin-visits if they already had their id set. This is what's needed for the replayer to actually work. But somehow, this left the possibility to update the origin-visit... This commit fixes it by dropping conflicting origin-visits if any. In effect, we can no longer overwrite origin-visits (pg-storage wise). Related to T2310 03 July 2020, 14:42:20 UTC
248c277 Move tests of content_metadata_* next to origin_metadata_* For consistency with the main code. 02 July 2020, 09:04:43 UTC
f2619b6 Rework 157 migration to ease replication setup Past experience showed that altering tables is more stressful than plain creation. As in here. Related to T2306 Related to P707 01 July 2020, 13:41:16 UTC
312127a storage*: Drop intermediary conversion step into OriginVisit This is no longer possible as OriginVisit no longer holds the same information as OriginVisitStatus. This will allow dropping those fields from the model entirely. Related to T2310 30 June 2020, 13:54:01 UTC
953bd29 pg: use 'on conflict do nothing' strategy for duplicate metadata rows. "updates are a problem for postgresql logical replication" 30 June 2020, 13:25:53 UTC
00f97f0 Document that the behavior of adding a duplicate non-intrinsic object is unspecified. 30 June 2020, 13:06:03 UTC
4c2bdad Make the code location of metadata endpoints consistent across backends. 30 June 2020, 12:56:20 UTC
ffe6b92 Add content_metadata_{add,get}. 30 June 2020, 10:31:59 UTC
869679a Add context columns to object_metadata table and object_metadata_{add,get}. Not used/tested yet; will be used when I introduce content_metadata_{get,add}. 30 June 2020, 10:31:59 UTC
27e9426 Generalize origin_metadata to allow support for other object types in the future. 30 June 2020, 10:31:21 UTC
1f0e256 Work around the segmentation faults caused by pytest-coverage + multiprocessing. 30 June 2020, 08:23:25 UTC
dc1878b Make release_add support adding the same object twice in the same call This is an edge case, but the mirror infrastructure is apparently hitting it. We modify the SQL query to be properly idempotent. Also ensure in_memory and cassandra backends behave the same. Note: this revision was mostly written by Nicolas Dandrimont <nicolas@dandrimont.eu>. 29 June 2020, 15:27:21 UTC
10443b8 Iterate over paginated visits in batches to retrieve latest visit/snapshot This should stop the current timeouts on origins with a high number of visits. Related to T2310 26 June 2020, 15:38:22 UTC
182ee49 storage*: Open order parameter to origin-visit-get endpoint This allows clients to search from most recent to oldest visit when calling the endpoint with the "order" parameter set to "desc" (visit id desc). This keeps the existing sorting order, visit id "asc", and makes it explicit. Related to T2310 26 June 2020, 11:22:40 UTC
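
Batched iteration of the kind described in 10443b8 can be sketched as a generator that requests one page at a time. This is an assumption-laden sketch, not the swh-storage implementation: `fetch_page(last_visit, limit)` is a hypothetical callable returning up to `limit` visit ids greater than `last_visit`, in ascending order.

```python
from typing import Callable, Iterator, List, Optional


def iter_visits(
    fetch_page: Callable[[Optional[int], int], List[int]],
    batch_size: int = 10,
) -> Iterator[int]:
    """Yield visit ids page by page until the backend returns an empty page."""
    last_visit: Optional[int] = None
    while True:
        page = fetch_page(last_visit, batch_size)
        if not page:
            return
        yield from page
        # Resume after the last id seen, so each request stays small.
        last_visit = page[-1]
```

Because the generator is lazy, a caller that stops at the first matching visit never triggers the remaining requests.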
f75cd41 tests*: Drop obsolete origin visit fields Related to T2310 26 June 2020, 10:28:06 UTC
8620519 replayer: Drop obsolete fields from origin-visit Otherwise, we won't be able to replay them. Related to T2310 26 June 2020, 07:50:38 UTC
b991e69 test_storage: Add missing tests on origin_visit_get method 25 June 2020, 12:47:11 UTC
89e9dae storage: Give origin-visit index a name to avoid future dev/prod divergence Related to D3342#inline-23217 25 June 2020, 12:37:39 UTC
12d729b Relax checks on journal writes regarding origin-visit* 25 June 2020, 12:35:38 UTC
c6e6f33 replayer: Fix isoformat datetime string for origin-visit We no longer write datetime as strings in the journal. Still, the current journal must have those old values within. Related to D3336 Related to D3345 25 June 2020, 09:19:55 UTC
e5e80ef storage*: Drop obsolete fields from origin_visit Related to T2310 25 June 2020, 08:35:18 UTC
621fc8d Deprecate the origin_add_one() endpoint This endpoint is not really useful since origin_add() can be used instead. Using a single API endpoint would also make the API a bit more consistent (most other endpoints only provide a xxx_add endpoint); having a single endpoint per object_type is enough and makes the whole API simpler. 23 June 2020, 14:07:09 UTC
fb603e1 Make Storage.add_origin() return a summary dict Make it consistent with other add_xxx methods by making it return a summary dict `{"origin:add": int}`. 23 June 2020, 13:58:54 UTC
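
The summary-dict shape from fb603e1 can be illustrated with a toy in-memory version. Only the `{"origin:add": int}` shape comes from the commit message; the signature, the `known_urls` set, and the choice to count only newly added origins are assumptions of this sketch.

```python
from typing import Dict, Iterable, Set


def origin_add(known_urls: Set[str], urls: Iterable[str]) -> Dict[str, int]:
    """Add origin URLs to an in-memory set and return a summary dict."""
    added = 0
    for url in urls:
        # Assumption: origins that are already known are not counted again.
        if url not in known_urls:
            known_urls.add(url)
            added += 1
    return {"origin:add": added}
```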
2d497ff test_origin: Rename appropriately tests So one can trigger tests separately by name tagging. 22 June 2020, 12:39:32 UTC
e9f4554 algos: Improve origin visit get latest visit status algorithm Prior to this commit, this looked up only the latest visit information. It now looks across multiple visits (from the most recent visit to the oldest) until one visit matching the criteria is found. 22 June 2020, 12:39:32 UTC
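
The lookup described in e9f4554 can be sketched as a walk from the most recent visit to the oldest that stops at the first match. This is a hedged illustration, not the `swh.storage.algos.origin` code: the `VisitStatus` class, the dict-of-lists input, and the two filters are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class VisitStatus:
    visit: int
    status: str
    snapshot: Optional[str]


def latest_matching_status(
    statuses_by_visit: Dict[int, List[VisitStatus]],
    allowed_statuses: Optional[Set[str]] = None,
    require_snapshot: bool = False,
) -> Optional[VisitStatus]:
    """Return the newest status matching the filters, or None if none does."""
    # Visits are tried from the highest id to the lowest; within a visit,
    # statuses are assumed to be stored oldest first.
    for visit_id in sorted(statuses_by_visit, reverse=True):
        for status in reversed(statuses_by_visit[visit_id]):
            if allowed_statuses is not None and status.status not in allowed_statuses:
                continue
            if require_snapshot and status.snapshot is None:
                continue
            return status
    return None
```

Checking only the latest visit would return nothing whenever that visit fails the filters, even if an older visit satisfies them.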