32d455b | Valentin Lorentz | 24 January 2020, 17:10:40 UTC | Rename in_memory.Storage to in_memory.InMemoryStorage. For consistency with the other class names. | 29 January 2020, 12:34:38 UTC |
d4fb270 | Valentin Lorentz | 24 January 2020, 16:49:42 UTC | Move Storage documentation and endpoint paths to a new StorageInterface class Documentation was duplicated between the in-mem and postgresql storage, and one of them regularly goes out of date. This deduplicates them both to a new class. This new class is also the one declaring the API paths, as it did not make sense to have this declaration on the postgresql storage. Last but not least, this commit adds a test that checks backend classes have all the functions, and they have exactly the same signature as the interface. This will catch stupid bugs before production, eg. if an argument does not have the same name in all classes. | 29 January 2020, 11:16:55 UTC |
1775edd | Valentin Lorentz | 27 January 2020, 16:37:50 UTC | in_memory: Fix content_get_metadata when there is no 'data' key. | 27 January 2020, 16:37:57 UTC |
0f51e8a | Valentin Lorentz | 24 January 2020, 13:41:08 UTC | Remove cur/db arguments from the in-mem storage. They shouldn't be there; bad copy-pasting. | 24 January 2020, 15:53:27 UTC |
1cd53c1 | Valentin Lorentz | 24 January 2020, 15:46:47 UTC | Implement content_update for the in-mem storage. | 24 January 2020, 15:47:10 UTC |
c8389c2 | Antoine R. Dumont (@ardumont) | 24 January 2020, 13:54:26 UTC | 146: Fix typo | 24 January 2020, 13:54:26 UTC |
2ebcdf3 | Antoine R. Dumont (@ardumont) | 24 January 2020, 10:57:46 UTC | pgstorage: Empty temp tables instead of dropping them Due to our pattern of adding objects [1], vacuum is triggered regularly on pg_catalog.*, having a heavy impact on performance. This commit tries to avoid dropping the temporary tables, emptying them instead (they are still dropped at the end of the session, but less often). This should decrease the bloat on pg_catalog.* tables. [1] - create temporary table - insert data from temporary table to production table with filtering - drop temporary table | 24 January 2020, 11:14:00 UTC |
cc25810 | Daniele Serafini | 22 January 2020, 14:03:41 UTC | assert list doesn't have too many values | 22 January 2020, 14:24:08 UTC |
2ebce62 | Daniele Serafini | 22 January 2020, 13:28:38 UTC | test endpoint: content_missing (sha1_git), snapshot_missing | 22 January 2020, 14:24:08 UTC |
c40d327 | Daniele Serafini | 22 January 2020, 13:27:36 UTC | in memory changes | 22 January 2020, 14:24:08 UTC |
55ebd23 | Daniele Serafini | 22 January 2020, 11:29:07 UTC | storage: Add endpoint to get missing content (by sha1_git) and missing snapshot | 22 January 2020, 14:24:08 UTC |
cfee7b5 | Valentin Lorentz | 22 January 2020, 11:25:24 UTC | Remove redundant config checks in load_and_check_config. 1. There is no reason to force the server to serve only the 'local' backend anymore 2. Missing arguments will error when instantiating the backend class. | 22 January 2020, 11:25:24 UTC |
2454a78 | Antoine Lambert | 17 January 2020, 16:02:35 UTC | docs: Fix sphinx warnings Related to T2188 | 17 January 2020, 16:02:35 UTC |
2dc17cd | Valentin Lorentz | 17 January 2020, 13:49:26 UTC | Remove 'id' and 'object_id' from the output of object_find_by_sha1_git. 'id' is not used anymore, and 'object_id' never was. This commit slightly simplifies existing code, and will allow some deduplication in the upcoming Cassandra backend. | 17 January 2020, 14:10:56 UTC |
b5a5084 | Valentin Lorentz | 16 January 2020, 15:17:44 UTC | Make origin_visit_get_random return None instead of {} if there are no results. This is more consistent with other endpoints. | 17 January 2020, 14:10:36 UTC |
dba9e04 | Valentin Lorentz | 16 January 2020, 13:16:11 UTC | Rewrite test_content_get_partition_empty to not assume partitions are based on sha1. This is not true of the upcoming Cassandra backend. | 17 January 2020, 14:10:11 UTC |
e584655 | Valentin Lorentz | 16 January 2020, 13:13:43 UTC | Remove test_content_*_same_input, which check for behavior we do not want to guarantee. They check that content_add deduplicates with existing content/duplicated input. This is unneeded (the loaders don't send such data), so providing these guarantees unnecessarily complicates swh-storage code, especially the upcoming Cassandra backend. | 17 January 2020, 14:09:38 UTC |
bf77f14 | Antoine R. Dumont (@ardumont) | 17 January 2020, 12:55:36 UTC | storage.retry: Fix objects loading when using generator parameters This will fix related retry error [1] [1] https://sentry.softwareheritage.org/share/issue/ddbbdd3c235b40ca826bf2c820989f14/ Related to cc29708564c35575f569e863f028a480a9905cf4 Related to D2543 | 17 January 2020, 12:55:36 UTC |
cc29708 | Antoine Lambert | 16 January 2020, 16:16:09 UTC | storage: Fix objects loading when using generator parameters Some objects (directories, origins, releases, revisions) will not be added into the storage if they are provided as generator parameters to the *_add methods instead of lists. So make sure generators are transformed into lists before processing the objects. | 16 January 2020, 16:53:22 UTC |
8dcac2b | Antoine R. Dumont (@ardumont) | 14 January 2020, 12:40:56 UTC | retry: Implement content_add_metadata endpoint with retry policy | 14 January 2020, 12:45:11 UTC |
aa588c9 | Antoine R. Dumont (@ardumont) | 14 January 2020, 12:40:39 UTC | retry: Migrate to tenacity Which is a maintained fork of retry | 14 January 2020, 12:40:39 UTC |
4aa4d79 | Antoine R. Dumont (@ardumont) | 11 January 2020, 11:05:57 UTC | test_retry: Improve and align consistently assertion checks | 14 January 2020, 10:41:35 UTC |
2b7d770 | Antoine R. Dumont (@ardumont) | 11 January 2020, 10:56:04 UTC | storage.retry: Implement snapshot_add with retry policy | 14 January 2020, 10:41:35 UTC |
df3f33f | Antoine R. Dumont (@ardumont) | 11 January 2020, 10:47:57 UTC | storage.retry: Implement release_add with retry policy | 14 January 2020, 10:41:35 UTC |
54890f7 | Antoine R. Dumont (@ardumont) | 11 January 2020, 10:44:20 UTC | storage.retry: Implement revision_add with retry policy | 14 January 2020, 10:41:35 UTC |
a8efa95 | Antoine R. Dumont (@ardumont) | 11 January 2020, 10:38:48 UTC | storage.retry: Implement directory_add with retry policy | 14 January 2020, 10:41:34 UTC |
dddb6d9 | Antoine R. Dumont (@ardumont) | 11 January 2020, 10:37:41 UTC | in_memory: Make directory_get_random return None when storage empty | 14 January 2020, 10:41:34 UTC |
2dd578c | Antoine R. Dumont (@ardumont) | 11 January 2020, 10:25:31 UTC | storage.retry: Implement origin_visit_update with retry policy | 14 January 2020, 10:41:34 UTC |
32c460c | Antoine R. Dumont (@ardumont) | 11 January 2020, 10:05:13 UTC | storage.retry: Implement origin_metadata_add endpoint with retry policy | 14 January 2020, 10:41:34 UTC |
3cf7adb | Antoine R. Dumont (@ardumont) | 10 January 2020, 17:10:03 UTC | storage.retry: Implement metadata_provider_add endpoint with retry policy | 14 January 2020, 10:41:34 UTC |
08f2f38 | Antoine R. Dumont (@ardumont) | 10 January 2020, 15:48:11 UTC | storage.retry: Implement tool_add endpoint with retry policy | 14 January 2020, 10:41:34 UTC |
fe6440e | Antoine R. Dumont (@ardumont) | 10 January 2020, 15:37:24 UTC | storage.retry: Implement origin_visit_add endpoint with retry policy | 14 January 2020, 10:41:34 UTC |
351b977 | Antoine R. Dumont (@ardumont) | 10 January 2020, 14:59:52 UTC | storage.retry: Implement origin_add_one endpoint with retry policy | 14 January 2020, 10:41:34 UTC |
024eaea | Antoine R. Dumont (@ardumont) | 13 January 2020, 15:14:50 UTC | content_get_metadata: Change api to return Dict[bytes, List[Dict]] Clients will be able to introspect directly from the result whether a content is known or not. | 14 January 2020, 10:38:20 UTC |
07b6dc3 | Antoine R. Dumont (@ardumont) | 13 January 2020, 14:18:55 UTC | storage.content_get_metadata: Adapt to return nothing if a nonexistent id is passed as input | 13 January 2020, 14:18:55 UTC |
4837f46 | Antoine R. Dumont (@ardumont) | 09 January 2020, 16:17:30 UTC | storage: Add basic proxy storage with retry policy | 10 January 2020, 13:40:01 UTC |
fdf2a3c | Valentin Lorentz | 22 November 2019, 15:09:55 UTC | Add Storage.content_get_partition endpoint, to replace content_get_range. With no guarantees on the order or how partitioning is done, and with the new-style pagination. | 17 December 2019, 12:59:09 UTC |
0f94312 | Valentin Lorentz | 21 November 2019, 13:28:43 UTC | Add endpoint 'origin_list', that will replace 'origin_get_range'. And uses the new pagination scheme, instead of origin ids. | 16 December 2019, 14:17:21 UTC |
295144f | Valentin Lorentz | 13 December 2019, 13:06:12 UTC | Add {content,directory,revision,release,snapshot}_get_random. Will be used to pick random objects to use them in Icinga checks. | 16 December 2019, 12:38:13 UTC |
fe6ac8f | Valentin Lorentz | 12 December 2019, 18:09:01 UTC | Move origin_visit_get_random to the right place in the code and fix its docstring. | 16 December 2019, 12:37:10 UTC |
31b2fc5 | Valentin Lorentz | 12 December 2019, 18:01:38 UTC | Deduplicate server code and move metric handling to storage.py | 16 December 2019, 12:33:31 UTC |
869100b | Valentin Lorentz | 12 December 2019, 17:16:24 UTC | Deduplicate client code. | 12 December 2019, 17:26:14 UTC |
bd2a196 | Antoine R. Dumont (@ardumont) | 10 December 2019, 13:34:11 UTC | storage: Make origin_get_random simpler and faster | 10 December 2019, 13:34:11 UTC |
b440d3a | Antoine R. Dumont (@ardumont) | 09 December 2019, 12:43:53 UTC | storage: Prefer sample query on origin_visit The counter table is faster but may contain holes and be less up-to-date. | 09 December 2019, 12:51:08 UTC |
a2401b5 | Antoine R. Dumont (@ardumont) | 06 December 2019, 11:48:55 UTC | storage: Add endpoint to randomly pick an origin Related to T2120 | 09 December 2019, 12:46:13 UTC |
3855b6e | Antoine R. Dumont (@ardumont) | 09 December 2019, 12:40:57 UTC | tox.ini: Add a py3-dev environment | 09 December 2019, 12:46:13 UTC |
bee73be | Antoine R. Dumont (@ardumont) | 06 December 2019, 10:11:52 UTC | storage.buffer: Buffer release objects as well | 06 December 2019, 10:11:52 UTC |
382e500 | Antoine R. Dumont (@ardumont) | 06 December 2019, 10:11:25 UTC | storage.tests: Unify tests sample data | 06 December 2019, 10:11:25 UTC |
27281e8 | Antoine Lambert | 26 November 2019, 14:21:02 UTC | Makefile.local: Fix test target execution Explicitly pass the tests folder as a parameter when invoking pytest so that the hypothesis profiles are found. | 26 November 2019, 14:22:36 UTC |
2cac339 | Nicolas Dandrimont | 22 November 2019, 17:27:52 UTC | Implement origin lookup by sha1 Close T2045. | 25 November 2019, 14:18:44 UTC |
0fcb8bc | Valentin Lorentz | 22 November 2019, 15:08:37 UTC | Get rid of warnings about the 'args' argument to get_storage. | 22 November 2019, 15:08:37 UTC |
c294b73 | Valentin Lorentz | 22 November 2019, 12:23:54 UTC | Remove/fix wrong comments. | 22 November 2019, 12:23:54 UTC |
a3fd826 | Nicolas Dandrimont | 21 November 2019, 13:10:29 UTC | Migrate tox.ini to extras = xxx instead of deps = .[testing] | 21 November 2019, 13:10:29 UTC |
1594e25 | Nicolas Dandrimont | 21 November 2019, 13:07:08 UTC | Drop unused listener extra | 21 November 2019, 13:07:08 UTC |
df4df8b | Nicolas Dandrimont | 21 November 2019, 13:06:55 UTC | Merge tox test environment configurations | 21 November 2019, 13:06:55 UTC |
06bd050 | Valentin Lorentz | 21 November 2019, 12:38:03 UTC | Deduplicate code of test_origin_get_range. | 21 November 2019, 12:38:03 UTC |
29eb548 | David Douard | 20 November 2019, 10:18:57 UTC | Fix a few typos reported by codespell | 21 November 2019, 12:16:51 UTC |
bc0e81c | David Douard | 20 November 2019, 10:17:40 UTC | pre-commit: explicitly whitelist 'iff' when running codespell | 21 November 2019, 12:16:51 UTC |
1472c8e | David Douard | 20 November 2019, 10:00:57 UTC | fix trailing ws reported by pre-commit | 21 November 2019, 12:16:51 UTC |
264cd33 | David Douard | 20 November 2019, 09:27:42 UTC | Add a pre-commit-hooks.yaml config file | 21 November 2019, 12:16:51 UTC |
a97db93 | David Douard | 20 November 2019, 10:25:51 UTC | Fix swh-storage-add-dir to please mypy, at least. | 21 November 2019, 10:19:43 UTC |
c787808 | David Douard | 20 November 2019, 10:19:26 UTC | Remove utils/(dump|fix)_revisions scripts; these are now deprecated. | 21 November 2019, 10:19:43 UTC |
b337b4a | Valentin Lorentz | 14 November 2019, 13:16:49 UTC | Add 'pipeline' storage "class" for more readable configurations. This would allow writing configurations like: ``` storage: cls: pipeline steps: - cls: filter - cls: buffer - cls: remote url: http://swh-storage:5002/ ``` or ``` storage: cls: filter storage: cls: buffer storage: cls: remote url: http://swh-storage:5002/ ``` instead of: ``` storage: cls: filter args: storage: cls: buffer args: storage: cls: remote args: url: http://swh-storage:5002/ ``` | 19 November 2019, 03:16:39 UTC |
ea9aa47 | David Douard | 18 November 2019, 10:16:18 UTC | and not only for an existing origin visit. This is needed in situations where the snapshot table is not in sync with the origin_visit one; typically occurs on mirrors. Also add tests for with_visit argument. | 18 November 2019, 12:28:45 UTC |
e296dfb | Antoine R. Dumont (@ardumont) | 14 November 2019, 09:26:41 UTC | swh.storage.schemata: Drop schemata from storage As this got migrated back to the swh.lister module | 14 November 2019, 09:26:41 UTC |
bb5d405 | Nicolas Dandrimont | 12 November 2019, 18:55:33 UTC | Add minimal test coverage for swh.storage.schemata | 13 November 2019, 12:08:27 UTC |
d788677 | Nicolas Dandrimont | 12 November 2019, 18:08:47 UTC | Fix bogus NotImplementedError on Area.index_uris This would raise when the iteration terminates even though the uris were generated. | 12 November 2019, 18:08:47 UTC |
d4540ed | Valentin Lorentz | 30 October 2019, 13:35:09 UTC | Make visit['origin'] a string everywhere (instead of a dict). | 30 October 2019, 14:08:29 UTC |
0606791 | Valentin Lorentz | 25 October 2019, 11:03:26 UTC | Stop supporting origin ids in API (except in origin_get_range). | 30 October 2019, 13:13:40 UTC |
4ff544a | David Douard | 18 October 2019, 13:40:01 UTC | tests: delete (now useless) storage_testing.py file | 30 October 2019, 09:25:17 UTC |
2a6bf45 | David Douard | 28 October 2019, 10:57:52 UTC | conftest: make it possible to configure SQL dump files used by SwhDatabaseJanitor so one can use the postgresql_fact fixture factory for other tests than Storage ones (eg. for swh-indexer storage tests). | 30 October 2019, 09:06:57 UTC |
e2402e0 | David Douard | 29 October 2019, 09:10:52 UTC | conftest: do not use hypothesis to generate origins and contents; use the gen_origins and gen_contents helper functions from swh.model for the Storage under test. This is required because 1/ it was a non-conventional use of hypothesis, and 2/ since hypothesis 4.42, tests using hypothesis-generated origins and contents are broken. Also increase the default number of generated origins to 100. | 30 October 2019, 08:58:10 UTC |
35bdea8 | David Douard | 29 October 2019, 15:14:57 UTC | make in_memory Storage compatible with frozen model entities since attr based model entities are now frozen in swh.model 0.0.50 | 29 October 2019, 15:17:12 UTC |
28818ab | Nicolas Dandrimont | 23 October 2019, 08:46:07 UTC | Remove origin['type'] This is now superseded by origin_visit['type'] | 23 October 2019, 11:08:48 UTC |
655d2ae | Nicolas Dandrimont | 23 October 2019, 08:44:42 UTC | Add missing files to MANIFEST.in | 23 October 2019, 11:08:48 UTC |
1ab9d64 | Nicolas Dandrimont | 23 October 2019, 08:44:11 UTC | Move hypothesis strategy definitions to swh/storage/tests/conftest.py | 23 October 2019, 11:08:48 UTC |
f68f4c3 | Nicolas Dandrimont | 17 October 2019, 12:42:35 UTC | Replace unwanted data.originX['type'] in tests | 23 October 2019, 11:08:48 UTC |
578d2c6 | Nicolas Dandrimont | 23 October 2019, 08:48:55 UTC | Use a wildcard to get the list of SQL files | 23 October 2019, 11:08:48 UTC |
484cebb | Antoine R. Dumont (@ardumont) | 14 October 2019, 12:56:20 UTC | schemata: Send only origin url for scheduling debian loader task Related D2135 | 17 October 2019, 15:53:53 UTC |
9984111 | Nicolas Dandrimont | 17 October 2019, 12:23:38 UTC | Remove origin_visit_add fallback for type=None This prepares the removal of the type column from the origin table | 17 October 2019, 13:45:10 UTC |
6d98503 | Nicolas Dandrimont | 17 October 2019, 12:21:20 UTC | Always use explicit visit type in origin_visit_add | 17 October 2019, 13:44:15 UTC |
6781be1 | Nicolas Dandrimont | 17 October 2019, 12:19:30 UTC | Remove useless origin visits from snapshot-only tests | 17 October 2019, 13:44:15 UTC |
d165c09 | Nicolas Dandrimont | 17 October 2019, 11:37:43 UTC | Remove now-useless triggers | 17 October 2019, 13:44:15 UTC |
74a6437 | Antoine R. Dumont (@ardumont) | 17 October 2019, 12:45:19 UTC | requirements-test: Set pytest-postgresql dependencies version | 17 October 2019, 13:25:27 UTC |
fbdad1c | Valentin Lorentz | 17 October 2019, 12:49:39 UTC | Remove fetch_history. It is not used anymore. | 17 October 2019, 12:59:45 UTC |
84bcfb3 | Antoine R. Dumont (@ardumont) | 17 October 2019, 11:38:35 UTC | tests: Move sample_data fixture to swh/storage/tests/conftest.py | 17 October 2019, 11:41:46 UTC |
3bb46f6 | David Douard | 08 October 2019, 13:39:11 UTC | test_storage: kill StorageTestDbFixture class it's not used any more. | 14 October 2019, 12:32:57 UTC |
dbea02c | David Douard | 10 October 2019, 15:30:36 UTC | test_api_client: refactor the code for the new pytest-based infra This uses the latest pytest fixtures added in swh.core to define a swh_storage fixture that sets up an RPC client/server stack to execute tests defined in classes TestStorage and TestStorageGeneratedData (from swh.storage.tests.test_storage). | 14 October 2019, 12:32:57 UTC |
8003db6 | David Douard | 10 October 2019, 15:29:35 UTC | api: add (missing) refresh_stat_counters() endpoint so we do not need special cases in tests, at least. | 14 October 2019, 12:32:57 UTC |
8a9cfeb | David Douard | 14 October 2019, 08:50:46 UTC | tests/algos: rewrite test_snapshot with pytest | 14 October 2019, 12:32:57 UTC |
cadafef | David Douard | 08 October 2019, 14:10:18 UTC | in_memory: fix handling of 'hidden' content in InMemoryStorage and update tests This defines a local swh_storage fixture that uses an InMemoryStorage to execute tests defined in classes TestStorage and TestStorageGeneratedData (from swh.storage.tests.test_storage). Adapt tests for the InMemoryStorage to new storage test infra. | 14 October 2019, 12:32:14 UTC |
8529e7b | David Douard | 08 October 2019, 14:08:58 UTC | test_storage: rename TestStorageCommonProp as TestStorageGeneratedData somehow a bit clearer, maybe. Also adapt test_in_memory.py and test_api_client.py with this rename even if these tests are xfailed for now to prevent ImportError when running the whole test suite. | 14 October 2019, 12:14:08 UTC |
743f915 | David Douard | 08 October 2019, 14:06:58 UTC | test_storage: make test_origin_metadata_get more robust Nothing guarantees that the Storage.origin_metadata_get_by() result is sorted, so don't expect it to be. | 14 October 2019, 12:14:08 UTC |
204b9fa | David Douard | 08 October 2019, 13:40:03 UTC | test_storage: move db-specific methods from main TestStorage to TestPgStorage which is actually TestAlteringSchema, renamed to be a bit more meaningful. | 14 October 2019, 12:14:08 UTC |
9f12202 | David Douard | 14 October 2019, 08:45:13 UTC | conftest: use an in memory obj storage | 14 October 2019, 12:14:08 UTC |
fb70f88 | David Douard | 30 September 2019, 10:14:47 UTC | tests: refactor main storage tests - use pytest instead of unittest.TestCase plumbing - extract data from the TestStorageData into a data `storage_data` module; this module also provides a simple helper `StorageData` class that mimics the original class (access by attributes), - implement a series of pytest fixtures for these storage-specific tests, - get rid of most hypothesis-based tests, - replace usage of the use_url hypothesis boolean strategy by pytest.mark.parametrize fixtures; this removes the need to reset the storage, since tests are truly executed twice (thus with a new swh_storage), - refactor test_db to use pytest-postgresql. Disable (xfail) tests from test_snapshot.py, test_api_client and test_in_memory for now. Fixes/refactorings come with the following revisions. | 14 October 2019, 12:14:08 UTC |
62aff76 | Antoine R. Dumont (@ardumont) | 09 October 2019, 21:01:09 UTC | Remove indirection swh.storage.api.wsgi to start server | 09 October 2019, 21:01:09 UTC |
654a37e | Antoine R. Dumont (@ardumont) | 09 October 2019, 13:28:51 UTC | tox.ini: Fix py3 environment to use packaged tests Related D2082 | 09 October 2019, 13:28:51 UTC |
03d5a2c | Antoine R. Dumont (@ardumont) | 08 October 2019, 12:13:14 UTC | swh.storage.buffer: Add buffering proxy storage implementation Related T1389 | 08 October 2019, 14:40:51 UTC |
c83f1f9 | Antoine R. Dumont (@ardumont) | 08 October 2019, 12:09:23 UTC | swh.storage.filter: Add filtering storage implementation Also add a sample_data fixture to read default test data from. Related T1389 | 08 October 2019, 14:12:49 UTC |