https://github.com/SoftwareHeritage/swh-storage

Revision Message Commit Date
be225df New upstream version 0.0.171 06 February 2020, 14:07:35 UTC
2b029b7 Split 'content_add' method into 'content_add' and 'skipped_content_add'. Respectively to add present content and skipped content. This simplifies the logic of both methods, and is a necessary step to typing / using swh-model objects everywhere, as contents have quite different attributes depending on whether they are present or missing. 06 February 2020, 13:29:31 UTC
93ea487 Increase Cassandra requests timeout to 1 second. 100ms worked fine so far, but we're starting to get some timeouts on the Azure test cluster. Multiplying the timeout by 10 should give us ample room to work with. 04 February 2020, 12:38:32 UTC
a66e16c New upstream version 0.0.170 03 February 2020, 13:23:46 UTC
b315f9d Tune Cassandra test config for lower test latency. 03 February 2020, 12:31:28 UTC
25941d5 Make tests reuse the same keyspace/schema instead of recreating it for each test. This makes tests run 16 times faster than https://forge.softwareheritage.org/D2612 (which is itself 3 times faster than this commit's parent) 03 February 2020, 11:26:24 UTC
eb155ad Add Cassandra backend. 31 January 2020, 15:05:53 UTC
523f2eb New upstream version 0.0.169 30 January 2020, 13:26:21 UTC
cf45ec6 retry: Add retry behavior on pipeline storage with flushing failure. Currently, wrong "hash collisions" are happening a lot on ingestion [1] [2] [3]. The last loading step (flush) is failing on most loaders (git, npm, etc.). This commit adds the retry behavior to the currently deployed pipeline storage, which should decrease the frequency of that error. The remaining hash collisions that do not subside should then be real hash collisions. [1] https://sentry.softwareheritage.org/share/issue/102aace238fe4ba6b49bcc5531f7c2bf/ [2] https://sentry.softwareheritage.org/share/issue/8e8b48a1d94c465b8109e76311ecdbe7/ [3] https://sentry.softwareheritage.org/share/issue/d4f1208b7eec4b43b11e38494ff039cc/ 30 January 2020, 11:22:16 UTC
3e6d2bf New upstream version 0.0.168 30 January 2020, 10:25:29 UTC
1608fcd Allow deprecated endpoints to be missing from a backend class. 29 January 2020, 15:50:10 UTC
68702b5 CONTRIBUTORS: add Daniele Serafini 29 January 2020, 13:24:09 UTC
32d455b Rename in_memory.Storage to in_memory.InMemoryStorage. For consistency with the other class names. 29 January 2020, 12:34:38 UTC
d4fb270 Move Storage documentation and endpoint paths to a new StorageInterface class Documentation was duplicated between the in-mem and postgresql storage, and one of them regularly goes out of date. This deduplicates them both to a new class. This new class is also the one declaring the API paths, as it did not make sense to have this declaration on the postgresql storage. Last but not least, this commit adds a test that checks backend classes have all the functions, and they have exactly the same signature as the interface. This will catch stupid bugs before production, eg. if an argument does not have the same name in all classes. 29 January 2020, 11:16:55 UTC
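The signature check described in d4fb270 can be illustrated with the standard `inspect` module. This is a minimal sketch, not the test from the repository: the two classes and the helper `signature_mismatches` are hypothetical stand-ins, and only the idea (compare every public method of the interface with the backend's method of the same name) comes from the commit message.

```python
import inspect


class StorageInterface:
    """Hypothetical stand-in for the interface class named in the commit."""

    def content_add(self, content):
        ...

    def origin_get(self, origins):
        ...


class InMemoryStorage:
    """Hypothetical backend whose argument name drifted from the interface."""

    def content_add(self, content):
        return {"content:add": len(content)}

    def origin_get(self, origin):  # the interface calls this argument `origins`
        return origin


def signature_mismatches(interface, backend):
    """Return the public interface methods that the backend lacks or
    declares with a different signature."""
    mismatches = []
    for name, member in inspect.getmembers(interface, inspect.isfunction):
        if name.startswith("_"):
            continue
        backend_member = getattr(backend, name, None)
        if backend_member is None:
            mismatches.append(name)
        elif inspect.signature(member) != inspect.signature(backend_member):
            mismatches.append(name)
    return mismatches


print(signature_mismatches(StorageInterface, InMemoryStorage))
```

Here the renamed argument in `origin_get` is reported, which is the kind of bug the commit message says the test is meant to catch before production.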
1775edd in_memory: Fix content_get_metadata when there is no 'data' key. 27 January 2020, 16:37:57 UTC
0f51e8a Remove cur/db arguments from the in-mem storage. They shouldn't be there; bad copy-pasting. 24 January 2020, 15:53:27 UTC
1cd53c1 Implement content_update for the in-mem storage. 24 January 2020, 15:47:10 UTC
e62d6e4 New upstream version 0.0.167 24 January 2020, 14:01:56 UTC
c8389c2 146: Fix typo 24 January 2020, 13:54:26 UTC
2ebcdf3 pgstorage: Empty temp tables instead of dropping them. Due to our pattern of adding objects [1], vacuum is triggered regularly on pg_catalog.*, which has a heavy impact on performance. This commit tries to avoid dropping the temporary tables, emptying them instead (they are still dropped at the end of the session, but less often). This should decrease the bloat on pg_catalog.* tables.
[1]
- create temporary table
- insert data from temporary table to production table with filtering
- drop temporary table
24 January 2020, 11:14:00 UTC
74bb123 New upstream version 0.0.166 24 January 2020, 09:00:10 UTC
cc25810 assert list doesn't have too many values 22 January 2020, 14:24:08 UTC
2ebce62 test endpoint: content_missing (sha1_git), snapshot_missing 22 January 2020, 14:24:08 UTC
c40d327 in memory changes 22 January 2020, 14:24:08 UTC
55ebd23 storage: Add endpoint to get missing content (by sha1_git) and missing snapshot 22 January 2020, 14:24:08 UTC
cfee7b5 Remove redundant config checks in load_and_check_config. 1. There is no reason to force the server to serve only the 'local' backend anymore 2. Missing arguments will error when instantiating the backend class. 22 January 2020, 11:25:24 UTC
2454a78 docs: Fix sphinx warnings Related to T2188 17 January 2020, 16:02:35 UTC
2dc17cd Remove 'id' and 'object_id' from the output of object_find_by_sha1_git. 'id' is not used anymore, and 'object_id' never was. This commit slightly simplifies existing code, and will allow some deduplication in the upcoming Cassandra backend. 17 January 2020, 14:10:56 UTC
b5a5084 Make origin_visit_get_random return None instead of {} if there are no results. This is more consistent with other endpoints. 17 January 2020, 14:10:36 UTC
dba9e04 Rewrite test_content_get_partition_empty to not assume partitions are based on sha1. This is not true of the upcoming Cassandra backend. 17 January 2020, 14:10:11 UTC
e584655 Remove test_content_*_same_input, which check for behavior we do not want to guarantee. They check that content_add deduplicates with existing content/duplicated input. This is unneeded (the loaders don't send such data), so providing these guarantees unnecessarily complicates swh-storage code, especially the upcoming Cassandra backend. 17 January 2020, 14:09:38 UTC
079fa61 New upstream version 0.0.165 17 January 2020, 13:09:38 UTC
bf77f14 storage.retry: Fix objects loading when using generator parameters This will fix related retry error [1] [1] https://sentry.softwareheritage.org/share/issue/ddbbdd3c235b40ca826bf2c820989f14/ Related to cc29708564c35575f569e863f028a480a9905cf4 Related to D2543 17 January 2020, 12:55:36 UTC
4738b2e New upstream version 0.0.164 16 January 2020, 17:05:01 UTC
cc29708 storage: Fix objects loading when using generator parameters Some objects (directories, origins, releases, revisions) will not be added into the storage if they are provided as generator parameters to the *_add methods instead of lists. So ensure to transform generators into lists before processing the objects. 16 January 2020, 16:53:22 UTC
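One plausible way objects can go missing when a generator is passed is that the method iterates over its argument more than once. The sketch below assumes that cause for illustration only; the function names are hypothetical and do not come from swh-storage. It shows why converting the argument to a list first, as cc29708 describes, avoids the problem.

```python
def add_objects_buggy(objects):
    """Iterate twice over the input: once to collect ids, once to store."""
    ids = [obj["id"] for obj in objects]
    stored = [obj for obj in objects]  # a generator is already exhausted here
    return ids, stored


def add_objects_fixed(objects):
    """Materialize the input first, so it can be iterated several times."""
    objects = list(objects)
    ids = [obj["id"] for obj in objects]
    stored = [obj for obj in objects]
    return ids, stored


def make_revisions():
    """Hypothetical input: a generator of three minimal objects."""
    return ({"id": i} for i in range(3))


print(add_objects_buggy(make_revisions()))
print(add_objects_fixed(make_revisions()))
```

With a list as input both functions behave the same; the difference only appears when the caller passes a generator.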
e362b9d New upstream version 0.0.163 14 January 2020, 16:17:43 UTC
8dcac2b retry: Implement content_add_metadata endpoint with retry policy 14 January 2020, 12:45:11 UTC
aa588c9 retry: Migrate to tenacity Which is a maintained fork of retry 14 January 2020, 12:40:39 UTC
4aa4d79 test_retry: Improve and align consistently assertion checks 14 January 2020, 10:41:35 UTC
2b7d770 storage.retry: Implement snapshot_add with retry policy 14 January 2020, 10:41:35 UTC
df3f33f storage.retry: Implement release_add with retry policy 14 January 2020, 10:41:35 UTC
54890f7 storage.retry: Implement revision_add with retry policy 14 January 2020, 10:41:35 UTC
a8efa95 storage.retry: Implement directory_add with retry policy 14 January 2020, 10:41:34 UTC
dddb6d9 in_memory: Make directory_get_random return None when storage empty 14 January 2020, 10:41:34 UTC
2dd578c storage.retry: Implement origin_visit_update with retry policy 14 January 2020, 10:41:34 UTC
32c460c storage.retry: Implement origin_metadata_add endpoint with retry policy 14 January 2020, 10:41:34 UTC
3cf7adb storage.retry: Implement metadata_provider_add endpoint with retry policy 14 January 2020, 10:41:34 UTC
08f2f38 storage.retry: Implement tool_add endpoint with retry policy 14 January 2020, 10:41:34 UTC
fe6440e storage.retry: Implement origin_visit_add endpoint with retry policy 14 January 2020, 10:41:34 UTC
351b977 storage.retry: Implement origin_add_one endpoint with retry policy 14 January 2020, 10:41:34 UTC
024eaea content_get_metadata: Change api to return Dict[bytes, List[Dict]] Clients will be able to introspect directly from the result whether a content is known or not. 14 January 2020, 10:38:20 UTC
07b6dc3 storage.content_get_metadata: Adapt to return nothing if inexistent id is passed as input 13 January 2020, 14:18:55 UTC
4837f46 storage: Add basic proxy storage with retry policy 10 January 2020, 13:40:01 UTC
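The idea of a proxy storage with a retry policy can be sketched with the standard library alone. This is not the swh-storage implementation (later commits in this log use tenacity for that); the decorator, the `FlakyStorage` backend, and the `RetryingProxyStorage` class are all hypothetical names used for illustration.

```python
import functools
import time


def retry(attempts=3, delay=0.0, exceptions=(Exception,)):
    """Hand-rolled retry decorator: call the function up to `attempts` times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # give up after the last attempt
                    time.sleep(delay)
        return wrapper
    return decorator


class FlakyStorage:
    """Hypothetical backend that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def content_add(self, content):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return {"content:add": len(content)}


class RetryingProxyStorage:
    """Hypothetical proxy that forwards content_add to a wrapped storage."""

    def __init__(self, storage):
        self.storage = storage

    @retry(attempts=3, exceptions=(ConnectionError,))
    def content_add(self, content):
        return self.storage.content_add(content)


backend = FlakyStorage(failures=2)
proxy = RetryingProxyStorage(backend)
print(proxy.content_add([{"sha1": b"a"}]), backend.calls)
```

A proxy of this shape keeps the retry policy out of both the loaders and the backend, so it can be added to or removed from a deployment through configuration.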
fdf2a3c Add Storage.content_get_partition endpoint, to replace content_get_range. With no guarantees on the order or how partitioning is done, and with the new-style pagination. 17 December 2019, 12:59:09 UTC
0f94312 Add endpoint 'origin_list', that will replace 'origin_get_range'. And uses the new pagination scheme, instead of origin ids. 16 December 2019, 14:17:21 UTC
33925f7 New upstream version 0.0.162 16 December 2019, 13:41:38 UTC
295144f Add {content,directory,revision,release,snapshot}_get_random. Will be used to pick random objects to use them in Icinga checks. 16 December 2019, 12:38:13 UTC
fe6ac8f Move origin_visit_get_random to the right place in the code and fix its docstring. 16 December 2019, 12:37:10 UTC
31b2fc5 Deduplicate server code and move metric handling to storage.py 16 December 2019, 12:33:31 UTC
869100b Deduplicate client code. 12 December 2019, 17:26:14 UTC
6e74f85 New upstream version 0.0.161 10 December 2019, 14:08:13 UTC
bd2a196 storage: Make origin_get_random simpler and faster 10 December 2019, 13:34:11 UTC
b440d3a storage: Prefer sample query on origin_visit The counter table is faster but may contain holes and be less up-to-date. 09 December 2019, 12:51:08 UTC
a2401b5 storage: Add endpoint to randomly pick an origin Related to T2120 09 December 2019, 12:46:13 UTC
3855b6e tox.ini: Add a py3-dev environment 09 December 2019, 12:46:13 UTC
0599649 New upstream version 0.0.160 06 December 2019, 10:23:42 UTC
bee73be storage.buffer: Buffer release objects as well 06 December 2019, 10:11:52 UTC
382e500 storage.tests: Unify tests sample data 06 December 2019, 10:11:25 UTC
27281e8 Makefile.local: Fix test target execution. Explicitly pass tests folder as parameter when invoking pytest in order for the hypothesis profiles to be found. 26 November 2019, 14:22:36 UTC
2cac339 Implement origin lookup by sha1 Close T2045. 25 November 2019, 14:18:44 UTC
0fcb8bc Get rid of warnings about the 'args' argument to get_storage. 22 November 2019, 15:08:37 UTC
c294b73 Remove/fix wrong comments. 22 November 2019, 12:23:54 UTC
6c99bca New upstream version 0.0.159 22 November 2019, 10:10:29 UTC
a3fd826 Migrate tox.ini to extras = xxx instead of deps = .[testing] 21 November 2019, 13:10:29 UTC
1594e25 Drop unused listener extra 21 November 2019, 13:07:08 UTC
df4df8b Merge tox test environment configurations 21 November 2019, 13:06:55 UTC
06bd050 Deduplicate code of test_origin_get_range. 21 November 2019, 12:38:03 UTC
29eb548 Fix a few typos reported by codespell 21 November 2019, 12:16:51 UTC
bc0e81c pre-commit: explicitly whitelist 'iff' when running codespell 21 November 2019, 12:16:51 UTC
1472c8e fix trailing ws reported by pre-commit 21 November 2019, 12:16:51 UTC
264cd33 Add a pre-commit-hooks.yaml config file 21 November 2019, 12:16:51 UTC
a97db93 Fix swh-storage-add-dir to please mypy, at least. 21 November 2019, 10:19:43 UTC
c787808 Remove utils/(dump|fix)_revisions scripts; these are now deprecated. 21 November 2019, 10:19:43 UTC
b337b4a Add 'pipeline' storage "class" for more readable configurations. This would allow writing configurations like:
```
storage:
  cls: pipeline
  steps:
    - cls: filter
    - cls: buffer
    - cls: remote
      url: http://swh-storage:5002/
```
or
```
storage:
  cls: filter
  storage:
    cls: buffer
    storage:
      cls: remote
      url: http://swh-storage:5002/
```
instead of:
```
storage:
  cls: filter
  args:
    storage:
      cls: buffer
      args:
        storage:
          cls: remote
          args:
            url: http://swh-storage:5002/
```
19 November 2019, 03:16:39 UTC
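The relationship between the flat `steps` list and the nested form in b337b4a can be sketched as a small fold. This is an illustration under assumptions, not the code from the commit: `expand_pipeline` is a hypothetical helper, and it only reproduces the second (nested `storage:`) form shown in the commit message.

```python
def expand_pipeline(steps):
    """Fold a flat list of step configs into the nested form, where each
    step wraps the next one under a 'storage' key."""
    if not steps:
        raise ValueError("a pipeline needs at least one step")
    nested = dict(steps[-1])  # the innermost storage is the last step
    for step in reversed(steps[:-1]):
        nested = {**step, "storage": nested}
    return nested


steps = [
    {"cls": "filter"},
    {"cls": "buffer"},
    {"cls": "remote", "url": "http://swh-storage:5002/"},
]
print(expand_pipeline(steps))
```

The first step ends up outermost, so calls pass through `filter`, then `buffer`, and reach `remote` last, matching the order in which the steps are listed.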
ea9aa47 and not only for an existing origin visit. This is needed in situations where the snapshot table is not in sync with the origin_visit one; typically occurs on mirrors. Also add tests for with_visit argument. 18 November 2019, 12:28:45 UTC
2a46802 New upstream version 0.0.158 14 November 2019, 12:37:16 UTC
e296dfb swh.storage.schemata: Drop schemata from storage As this got migrated back to the swh.lister module 14 November 2019, 09:26:41 UTC
05d648e New upstream version 0.0.157 13 November 2019, 12:27:05 UTC
bb5d405 Add minimal test coverage for swh.storage.schemata 13 November 2019, 12:08:27 UTC
d788677 Fix bogus NotImplementedError on Area.index_uris This would raise when the iteration terminates even though the uris were generated. 12 November 2019, 18:08:47 UTC
67029e6 New upstream version 0.0.156 30 October 2019, 14:29:26 UTC
d4540ed Make visit['origin'] a string everywhere (instead of a dict). 30 October 2019, 14:08:29 UTC
0606791 Stop supporting origin ids in API (except in origin_get_range). 30 October 2019, 13:13:40 UTC
9064867 New upstream version 0.0.155 30 October 2019, 11:18:36 UTC
4ff544a tests: delete (now useless) storage_testing.py file 30 October 2019, 09:25:17 UTC
2a6bf45 conftest: make it possible to configure SQL dump files used by SwhDatabaseJanitor so one can use the postgresql_fact fixture factory for other tests than Storage ones (eg. for swh-indexer storage tests). 30 October 2019, 09:06:57 UTC
e2402e0 conftest: do not use hypothesis to generate origins and contents using gen_origins and gen_contents helper functions from swh.model for the Storage under test. This is required because 1/ it was a non-conventional use of hypothesis, and 2/ since hypothesis 4.42, tests using hypothesis-generated origins and contents are broken. Also increase default origins generated size to 100. 30 October 2019, 08:58:10 UTC
35bdea8 make in_memory Storage compatible with frozen model entities since attr based model entities are now frozen in swh.model 0.0.50 29 October 2019, 15:17:12 UTC
28818ab Remove origin['type'] This is now superseded by origin_visit['type'] 23 October 2019, 11:08:48 UTC
655d2ae Add missing files to MANIFEST.in 23 October 2019, 11:08:48 UTC