https://github.com/SoftwareHeritage/swh-storage

Revision Message Commit Date
273538c Updated backport on buster-swh from debian/1.4.0-1_swh1 (unstable-swh) 03 May 2022, 10:09:00 UTC
46332af Merge tag 'debian/1.4.0-1_swh1' into debian/buster-swh 03 May 2022, 10:08:59 UTC
002897e Updated debian changelog for version 1.4.0 03 May 2022, 09:55:07 UTC
2c4770f Update upstream source from tag 'debian/upstream/1.4.0' Update to upstream version '1.4.0' with Debian dir 73644a39fe481a5df44d85383ea7932ad94b8b0e 03 May 2022, 09:55:06 UTC
d0af35e New upstream version 1.4.0 03 May 2022, 09:55:04 UTC
9562953 Add function storage.algos.directory.directory_get It will be used in swh-storage to fetch a complete directory object, i.e. with the raw_manifest and all branches. 02 May 2022, 10:26:23 UTC
fb55141 client: Migrate to _post method call to stop deprecation warnings 28 April 2022, 10:23:22 UTC
a942679 Bump mypy to v0.942 26 April 2022, 11:06:14 UTC
62cf989 Updated backport on buster-swh from debian/1.3.2-1_swh1 (unstable-swh) 25 April 2022, 12:48:49 UTC
5ad117d Merge tag 'debian/1.3.2-1_swh1' into debian/buster-swh 25 April 2022, 12:48:48 UTC
d4183fb Updated debian changelog for version 1.3.2 25 April 2022, 12:33:52 UTC
9daf7bc Update upstream source from tag 'debian/upstream/1.3.2' Update to upstream version '1.3.2' with Debian dir 4800aced1470880655cac85fcaa18ec010184687 25 April 2022, 12:33:51 UTC
2c45e26 New upstream version 1.3.2 25 April 2022, 12:33:50 UTC
e6e658e pre-commit: Remove codespell commit-msg hook That hook can be frustrating, as it can discard a long commit message if it finds a typo in it, so it is better to remove it. 21 April 2022, 11:39:50 UTC
f136559 Use logger everywhere in tenacious.py 14 April 2022, 10:13:58 UTC
bbb4fc1 retry: re-raise original exception instead of a RetryError This will make the sentry reports more usable. If the exception changes across calls, the earlier exceptions are still logged and available as breadcrumbs. 13 April 2022, 13:37:57 UTC
72c2b91 Updated backport on buster-swh from debian/1.3.1-1_swh1 (unstable-swh) 12 April 2022, 13:02:20 UTC
20cbf17 Merge tag 'debian/1.3.1-1_swh1' into debian/buster-swh 12 April 2022, 13:02:19 UTC
f66d1e5 Updated debian changelog for version 1.3.1 12 April 2022, 12:48:04 UTC
ac7bd15 Update upstream source from tag 'debian/upstream/1.3.1' Update to upstream version '1.3.1' with Debian dir 396826333823bf05c82e05c3784d0cea79d93835 12 April 2022, 12:48:03 UTC
fa5bc0f New upstream version 1.3.1 12 April 2022, 12:48:01 UTC
75aa073 postgresql: ensure origin_visit(_status) queries use index When using an inner join for the single origin value, instead of a subquery, the query fails to use the (origin, visit) indexes and falls back to fetching all the visits (or all the statuses) for the origin and sorting them. This breaks down for origins with a lot of visits, such as the ones that are being used for end to end monitoring. Using a subselect to generate a single origin id value ensures that the queries can use the proper indexes. 12 April 2022, 12:21:50 UTC
035d4c1 origin_get_with_statuses: Rename RPC endpoint path Align RPC endpoint path with method name. 12 April 2022, 10:15:52 UTC
d255fb3 origin_get_with_statuses: Fix case when fetched visits list is empty Ensure that an empty list of results is returned when the fetched visits list is empty, for instance when the provided page_token is greater than or equal to the latest origin visit. Related to T4090 12 April 2022, 10:06:28 UTC
303fa7a Updated backport on buster-swh from debian/1.3.0-1_swh1 (unstable-swh) 11 April 2022, 12:51:42 UTC
1792143 Merge tag 'debian/1.3.0-1_swh1' into debian/buster-swh 11 April 2022, 12:51:41 UTC
034c0d2 Updated debian changelog for version 1.3.0 11 April 2022, 12:37:01 UTC
96b96f0 Update upstream source from tag 'debian/upstream/1.3.0' Update to upstream version '1.3.0' with Debian dir 6ba548a311bba2851a07d5071aa45cd4ed4f4e10 11 April 2022, 12:37:00 UTC
b655efc New upstream version 1.3.0 11 April 2022, 12:36:59 UTC
7c9586a Add .git-blame-ignore-revs file with automatic reformatting commits 08 April 2022, 13:15:36 UTC
b146bb7 python: Reformat code with black 22.3.0 Related to T3922 08 April 2022, 13:15:09 UTC
93aecb6 pre-commit, tox: Bump black from 19.10b0 to 22.3.0 black is considered stable since release 22.1.0 and the version we are currently using is quite outdated and not compatible with click 8.1.0, so it is time to bump it to its latest stable release. Please note that E501 pycodestyle warning related to line length is replaced by B950 one from flake8-bugbear as recommended by black. https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#line-length Related to T3922 08 April 2022, 13:13:51 UTC
567c8e4 interface: Add new method origin_visit_get_with_statuses It retrieves, in an efficient and paginated way, the list of visits and all their statuses for a given origin. Previously, it was required to call origin_visit_status_get on each visit of the origin to get such a list. Related to T4090 07 April 2022, 14:47:06 UTC
812590c requirements-test: Remove pytest pinning to < 7 pytest-postgresql 3.1.3 and pytest-redis 2.4.0 added support for pytest >= 7 so we can now drop the pytest pinning. 06 April 2022, 15:14:53 UTC
c6dc5cd Make postgresql's Storage client options configurable from config Adding a `query_options` member to postgresql's Storage, in conjunction with swh.core >= 2.5, allows setting or overriding SQL client options from the storage configuration file. Default values are set, as they used to be, from the decorator arguments. In addition, one can override these values at run time from the storage configuration file. For example:

.. code-block:: yaml

   storage:
     cls: postgresql
     db: testdb
     objstorage:
       cls: memory
     query_options:
       directory_ls:
         statement_timeout: 180000

will provide a Storage instance for which the timeout value for the `directory_ls` endpoint is 3 min (instead of the default 20 s). 04 April 2022, 14:43:17 UTC
fa6c2df Updated backport on buster-swh from debian/1.2.0-1_swh1 (unstable-swh) 23 March 2022, 16:14:20 UTC
d4472a4 Merge tag 'debian/1.2.0-1_swh1' into debian/buster-swh 23 March 2022, 16:14:20 UTC
16a98b9 Updated debian changelog for version 1.2.0 23 March 2022, 16:00:33 UTC
8e2b1a4 Update upstream source from tag 'debian/upstream/1.2.0' Update to upstream version '1.2.0' with Debian dir 3e9154aa3229b4f3f0f8aa34d6b653a3bd67ae99 23 March 2022, 16:00:32 UTC
5029870 New upstream version 1.2.0 23 March 2022, 16:00:30 UTC
835feb6 Fix tenacious storage tests for swh.model 6 The logic for testing the tenacious storage proxy by interspersing bad objects with good ones triggers when the object list is larger than 3 items. However, the allowed error rate of 1 failure for a window of 3 objects would only work for lists larger than six objects (putting at least 2 good objects between each bad object). swh.model 6 made the directory, revision and release test object lists 3 objects long, triggering the buggy code. Reducing the window size to 2 objects makes the test logic work for lists of 3 or more objects. 23 March 2022, 15:34:50 UTC
1d7df1e Updated debian changelog for version 1.1.0 23 March 2022, 15:01:48 UTC
3c105d7 Update upstream source from tag 'debian/upstream/1.1.0' Update to upstream version '1.1.0' with Debian dir 03abd5d07629cc28195bb55df5963468fc443ebe 23 March 2022, 15:01:47 UTC
6057aa8 New upstream version 1.1.0 23 March 2022, 15:01:45 UTC
6fdaf8a Remove typing workarounds for Revision.author or Revision.committer being None swh-model 6.0.0 adds proper support for them. + fix issue found by mypy 23 March 2022, 10:01:34 UTC
3eff720 Add support for author=None and committer=None committer=None happens on some malformed commits generated by old dgit version; and it is possible for author=None to happen for the same reason. For now, this is not supported by swh-model, so tests temporarily disable attrs checks that swh-model relies on. 23 March 2022, 10:00:22 UTC
92c78ab pytest: Exclude build directory for tests discovery Due to test modules being copied in subdirectories of the build directory by setuptools, it makes pytest fail by raising ImportPathMismatchError exceptions when invoked from root directory of the module. So ignore the build folder to discover tests. 22 March 2022, 10:58:26 UTC
98b41c8 backfill: Add missing raw_manifest to directories This was not covered by tests so far, because swh.model.tests.swh_model_data.TEST_OBJECTS did not contain any object with a raw_manifest. But it will in swh-model > 5.0.0 16 March 2022, 11:24:46 UTC
8b65e42 backfill: Make integer_ranges() work on str args + add typing to RANGE_GENERATORS Without the type annotation, mypy errors with 'Cannot call function of unknown type' when called from a type-checked function. 15 March 2022, 16:19:22 UTC
77f7e6d postgresql: Remove unused listener code from db.py 15 March 2022, 12:36:42 UTC
ccde097 origin_visit_get_latest: Order by visit id instead of date This allows both the postgresql and cassandra backends to make efficient queries by using an index (resp. clustering key) instead of scanning all visits of the given origin then sorting by date. This does not affect the results for the vast majority of cases, as ids are always in increasing chronological order, unless an origin was re-loaded from an old archive. 11 March 2022, 12:43:36 UTC
600e87f origin_visit_get_latest: Materialize subquery on 'origin' table. postgresql's query planner does not understand the origin is unique, so it performs a partial index scan on origin_visit_pkey, which is inefficient on origins with many visits. This commit itself is not enough to make it use the proper index, but provides this necessary change that will be used by a future commit. 11 March 2022, 12:36:07 UTC
b0cdab5 postgresql: Increase timeouts that often fail According to Sentry, in the last 30 days: * directory_entry_get_by_path: 958 events, https://sentry.softwareheritage.org/share/issue/c4c2124953a145b2bd325f6f6b7df5a6/ * revision_get: 841 events, https://sentry.softwareheritage.org/share/issue/55fbe01c6f4d4c9bbf684c7608a62ad9/ * release_get: 14 events, https://sentry.softwareheritage.org/share/issue/37c53354541b4c4eaa1faf4e20a68418/ * origin_visit_find_by_date: 114 events, https://sentry.softwareheritage.org/share/issue/a674c12049a941968a717661a0226559/ * origin_get: 79 events, https://sentry.softwareheritage.org/share/issue/bf21d6bc7b24442eb18643d80d936d27/ ; 67 events, https://sentry.softwareheritage.org/share/issue/010a4b1e085a4e2089ba4897c6de6038/ 11 March 2022, 12:19:28 UTC
4e78014 Remove aiohttp from requirements.txt It's not used by swh.storage. 08 March 2022, 15:29:15 UTC
284a4ab Move metrics handling from backends to RPC server Motivation: replaces code duplication in the backends with a single implementation, to be consistent with the objstorage (which has many more backends). This also fixes the issue of metrics from 'extid_add' being missing when using the postgresql storage. 02 March 2022, 11:32:18 UTC
bdfe9d9 Updated backport on buster-swh from debian/1.0.0-1_swh1 (unstable-swh) 24 February 2022, 11:27:28 UTC
730eeed Merge tag 'debian/1.0.0-1_swh1' into debian/buster-swh 24 February 2022, 11:27:28 UTC
3612cc6 Updated debian changelog for version 1.0.0 24 February 2022, 11:13:57 UTC
eae87d8 Update upstream source from tag 'debian/upstream/1.0.0' Update to upstream version '1.0.0' with Debian dir 8ae626ab798d36848604eb33d6358be5da4d7885 24 February 2022, 11:13:57 UTC
35e7f0c New upstream version 1.0.0 24 February 2022, 11:13:55 UTC
61acdf3 Prepare v1: bump dependency to swh.core 2 24 February 2022, 11:02:29 UTC
215162b Update for swh.core 2.0.0 - Add expected entry points for swh.core 2 db handling new features: - add a ``swh.storage.get_datastore()`` function - add ``swh.storage.postgresql.storage.Storage.get_current_version()`` method - move sql migration scripts in ``swh/storage/sql/upgrades`` - modify sql initialization scripts to match swh.core 2 (remove dbversion management code). - Update tests to use the new template-based database handling; this should have only minimal impact on test execution performance. 24 February 2022, 10:05:01 UTC
386fb4d Add types-toml to requirements-test.txt 24 February 2022, 10:05:01 UTC
f578377 pre-commit: Bump hooks and add new one to check commit message spelling To install the new hook: $ pre-commit install -t commit-msg 10 February 2022, 16:29:28 UTC
e4c4983 Updated backport on buster-swh from debian/0.43.1-1_swh1 (unstable-swh) 08 February 2022, 09:45:21 UTC
2f90249 Merge tag 'debian/0.43.1-1_swh1' into debian/buster-swh 08 February 2022, 09:45:20 UTC
97244e6 Updated debian changelog for version 0.43.1 08 February 2022, 09:31:33 UTC
631eb17 Update upstream source from tag 'debian/upstream/0.43.1' Update to upstream version '0.43.1' with Debian dir baaa2fcae1f2b0d13d9d811af8f9f21ca52024d8 08 February 2022, 09:31:33 UTC
10d367f New upstream version 0.43.1 08 February 2022, 09:31:31 UTC
8bf07c3 revision_walker: Actually pass ignore_displayname to revision_log I somehow forgot to stage this change in the previous commit. 04 February 2022, 15:46:07 UTC
0f9f54b revision_walker: Add support for ignore_displayname. This is needed by the vault. 04 February 2022, 15:14:59 UTC
75a7f09 Add typing to revision_walker.py and make the state a dataclass 04 February 2022, 15:14:53 UTC
a3a63d8 Require pytest to be <7.0.0 04 February 2022, 14:02:23 UTC
25fd5e2 Updated backport on buster-swh from debian/0.43.0-1_swh1 (unstable-swh) 02 February 2022, 18:10:01 UTC
40dc7c6 Merge tag 'debian/0.43.0-1_swh1' into debian/buster-swh 02 February 2022, 18:10:01 UTC
191992d Updated debian changelog for version 0.43.0 02 February 2022, 17:56:34 UTC
48357a7 Update upstream source from tag 'debian/upstream/0.43.0' Update to upstream version '0.43.0' with Debian dir 2f67565746d5bbcfcedb31d8154c0ed0db035567 02 February 2022, 17:56:33 UTC
ec42d59 New upstream version 0.43.0 02 February 2022, 17:56:31 UTC
4544d7c Introduce a new displayname field for persons in the PostgreSQL storage Extend the APIs for Revisions and Releases to honor the field by default, unless the new `ignore_displayname` argument is set. 01 February 2022, 12:03:52 UTC
97caa93 Make test_release_add_get_arbitrary non-flaky It was made flaky by d4ddd41535d0ce1cd50d51d297e154bf0ab6e649. 01 February 2022, 12:03:52 UTC
f868f3c Mostly use normalized Person objects in tests This opens up the possibility of eventually ignoring the `name` and `email` fields stored in database in favor of parsing them again from the fullname field (and therefore to update our parsing logic without having to affect stored data). 31 January 2022, 20:56:47 UTC
d4ddd41 postgresql: Use Person.from_fullname if name and email are None This allows us to populate sensible name and email values out of the new displayname field, without having to store them. 31 January 2022, 19:12:09 UTC
271a259 Updated backport on buster-swh from debian/0.42.0-1_swh1 (unstable-swh) 25 January 2022, 16:08:11 UTC
8079f65 Merge tag 'debian/0.42.0-1_swh1' into debian/buster-swh 25 January 2022, 16:08:10 UTC
b752a68 Updated debian changelog for version 0.42.0 25 January 2022, 15:54:46 UTC
c2a93a6 Update upstream source from tag 'debian/upstream/0.42.0' Update to upstream version '0.42.0' with Debian dir 2b9b41be406880810778652bc1db36aade0058e2 25 January 2022, 15:54:45 UTC
bedc372 New upstream version 0.42.0 25 January 2022, 15:54:44 UTC
6f02524 Fix directory_add to actually insert the manifest + add directory_get_raw_manifest I don't expect directory_get_raw_manifest to be used, but it is needed for tests, so why not. 25 January 2022, 09:46:27 UTC
5874905 Stop using the deprecated 'TimestampWithTimezone.offset' attribute It will be replaced by what is currently called 'offset_bytes' 21 January 2022, 14:01:31 UTC
2e74138 Remove 'offset' and 'negative_utc' This only keeps 'offset_bytes' to store the timezone, to support swh-model v5.0.0. However, this keeps writing 'offset' and 'negative_utc' to the postgresql database, just in case we need to roll back this change. But they are not read anymore. 21 January 2022, 13:07:02 UTC
c68a4fd postgres: Add indices to keep track of objects with a raw_manifest They should be a rare occurrence, so adding these indices allows us to count and enumerate them without expensive full table scans. 18 January 2022, 10:04:28 UTC
228de33 Fix sphinx error [2022-01-17T16:03:27.448Z] /var/lib/jenkins/workspace/DSTO/tests-on-diff@2/docs/index.rst:25:hardcoded link 'https://archive.softwareheritage.org/api/' could be replaced by an extlink (try using ':swh_web:`api/`' instead) 18 January 2022, 09:16:05 UTC
40a57d4 cassandra: Make content_missing run in linear time instead of quadratic Assuming all contents passed to content_missing() have (at least) a missing algo, the function used to iterate over the size of the arg squared in the worst case (when all contents are found). With this commit, it starts with bucketing them by hash, so it does not need to iterate over *all* found contents for each content passed as arg. 12 January 2022, 10:38:20 UTC
d5f1f0e cassandra: Rewrite content_missing to run queries concurrently. This is twice as fast, according to https://forge.softwareheritage.org/T3577#72791 12 January 2022, 10:37:57 UTC
4a24505 cassandra: Use concurrent queries in *_missing() instead of naive grouping Instead of grouping ids in queries in arbitrary batches (which forces the server node to coordinate with other nodes to complete the query), this sends queries with one id each, directly to the right node. This is the 'concurrent' algorithm from https://forge.softwareheritage.org/T3577#72791 which gives a >=2x speed-up on directories, and a >=8x speed-up on revisions. 06 January 2022, 11:43:09 UTC
4618e7f Updated backport on buster-swh from debian/0.41.1-1_swh1 (unstable-swh) 04 January 2022, 15:08:37 UTC
3ce1ffb Merge tag 'debian/0.41.1-1_swh1' into debian/buster-swh 04 January 2022, 15:08:37 UTC
f63d2e7 Updated debian changelog for version 0.41.1 04 January 2022, 14:55:33 UTC
71abcc3 Update upstream source from tag 'debian/upstream/0.41.1' Update to upstream version '0.41.1' with Debian dir cffa1d419282197d3c69b493f6fc958541b9c38d 04 January 2022, 14:55:32 UTC
c7cbc9a New upstream version 0.41.1 04 January 2022, 14:55:31 UTC
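
Commit 40a57d4 above describes making content_missing run in linear rather than quadratic time by first bucketing the found contents by hash. The following is only a minimal sketch of that general idea, not the swh-storage code: the function names, the `key_hash` parameter, and the representation of a content as a plain dict of hash name to digest are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

# Hypothetical representation: a content is a mapping from hash algorithm
# name to digest. This is NOT the swh.model Content class.
Content = Dict[str, bytes]


def _matches(wanted: Content, candidate: Content) -> bool:
    """True if the candidate agrees with every hash given in `wanted`."""
    return all(candidate.get(algo) == digest for algo, digest in wanted.items())


def missing_quadratic(
    contents: List[Content], found: List[Content], key_hash: str = "sha1"
) -> Iterator[bytes]:
    """Naive version: compares each requested content to all found contents."""
    for content in contents:
        if not any(_matches(content, f) for f in found):
            yield content[key_hash]


def missing_bucketed(
    contents: Iterable[Content], found: Iterable[Content], key_hash: str = "sha1"
) -> Iterator[bytes]:
    """Bucketed version: groups found contents by `key_hash` first, so each
    requested content is compared only to candidates sharing that digest."""
    buckets: Dict[bytes, List[Content]] = defaultdict(list)
    for f in found:
        buckets[f[key_hash]].append(f)
    for content in contents:
        candidates = buckets.get(content[key_hash], [])
        if not any(_matches(content, f) for f in candidates):
            yield content[key_hash]
```

Both functions return the same result; the second avoids scanning every found content for every argument, which is where the quadratic cost of the naive version comes from when most contents are found.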