https://github.com/SoftwareHeritage/swh-storage

Revision Author Date Message Commit Date
3f35a0a New upstream version 1.7.2 31 October 2022, 09:01:30 UTC
82ad28b Fix documentation build It was broken by 0e8da810ac962b79dda18d9cfe9f3c7ea4f9de52 because the conditional import prevents sphinx from detecting where the class is imported from. Additionally, this attribute should not be on the interface, because proxies do not have it, but they are expected to have that interface. 23 October 2022, 15:39:56 UTC
0e8da81 tests: only flush() the kafka journal writer once per test instead of flushing it n times per test. Since the call to kafka Producer.flush() takes about 1s, reducing the number of calls to this method significantly reduces the execution time of the tests. This required a small refactoring of the JournalBackfiller class to make the journal writer live outside the scope of run(), so the test can access the journal writer instance and call the flush() method. Requires swh-journal >= 1.2.0 21 October 2022, 15:24:50 UTC
fe0eaee Make the replayer not crash on kafka messages that fail to be converted as model objects For example, there are a few kafka directory messages in the current production kafka cluster whose entries contain the same name twice, preventing the Directory model object from being created at all, which makes the replayer crash. This change makes the replayer able to handle such cases. When the model object creation fails with a ValueError, the error is reported in the (redis) error reporter, but the replaying process continues. Since there is no model object, the error is reported with a crafted error key of the form "{object_type}:{object_id}" if an object id is present in the data structure, or "{object_type}:uuid:{uuid4}" if such an id is not even present. For the record, the standard error key in redis for a model object is its swhid (if any). 21 October 2022, 14:43:16 UTC
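The error key scheme described in fe0eaee can be illustrated with a minimal sketch. The helper name `craft_error_key`, its signature, and the hex encoding of byte ids are assumptions made for illustration; they are not taken from the swh-storage code.

```python
import uuid
from typing import Any, Dict


def craft_error_key(object_type: str, data: Dict[str, Any]) -> str:
    """Build a fallback error key for data that could not become a model object.

    Hypothetical helper: name, signature and id encoding are assumptions.
    """
    object_id = data.get("id")
    if object_id is not None:
        # Assumption: ids arrive as bytes and are hex-encoded for readability.
        if isinstance(object_id, bytes):
            object_id = object_id.hex()
        return f"{object_type}:{object_id}"
    # No id in the data at all: use a random UUID so the key is still unique.
    return f"{object_type}:uuid:{uuid.uuid4()}"
```

With this sketch, a directory message carrying an id yields a key such as `directory:<hex id>`, while a message without one yields `directory:uuid:<random uuid4>`.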
242e37a Add a comment that should have been "kept" from 850a7553b 21 October 2022, 14:43:16 UTC
784f730 Fix typos detected by codespell 19 October 2022, 13:07:21 UTC
086d974 New upstream version 1.7.1 19 October 2022, 12:55:24 UTC
3c08d9f test_retry: Use proper way to mock sleep of retryable storage methods The previous implementation was not mocking sleep of retryable storage methods, as the RetryingProxyStorage sets up the retry features when it is instantiated. So modify the fixture to ensure sleep functions are mocked, and return the mocks in a dict indexed by storage method names. This fixes the debian buster package build for swh-storage. 19 October 2022, 12:10:32 UTC
c1c2dbf pre-commit, tox: Bump pre-commit, codespell, black and flake8 - pre-commit from 4.1.0 to 4.3.0, - codespell from 2.2.1 to 2.2.2, - black from 22.3.0 to 22.10.0 and - flake8 from 4.0.1 to 5.0.4. Also freeze flake8 dependencies. Also change flake8's repo config to github (the gitlab mirror being outdated). 18 October 2022, 16:53:39 UTC
17d9ad2 docs: Add info about CPAN extrinsic metadata format Related to T2833 17 October 2022, 17:03:30 UTC
e8896bd New upstream version 1.7.0 10 October 2022, 12:04:58 UTC
657d31f postgresql: Remove merge join with origin_visit in origin_visit_get_latest I noticed that `origin_visit_get_latest` spends a lot of time doing index scans on `origin_visit_pkey`:

```
swh=> explain analyze SELECT * FROM origin_visit ov INNER JOIN origin o ON o.id = ov.origin INNER JOIN origin_visit_status ovs USING (origin, visit) WHERE ov.origin = (SELECT id FROM origin o WHERE o.url = 'https://pypi.org/project/simpleado/') AND ovs.snapshot is not null AND ovs.status = 'full' ORDER BY ov.visit DESC, ovs.date DESC LIMIT 1;
                                        QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit  (cost=10.14..29.33 rows=1 width=171) (actual time=1432.475..1432.479 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using origin_url_idx on origin o_1  (cost=0.56..8.57 rows=1 width=8) (actual time=0.077..0.079 rows=1 loops=1)
           Index Cond: (url = 'https://pypi.org/project/simpleado/'::text)
   ->  Merge Join  (cost=1.56..2208.37 rows=115 width=171) (actual time=1432.473..1432.476 rows=1 loops=1)
         Merge Cond: (ovs.visit = ov.visit)
         ->  Nested Loop  (cost=1.00..1615.69 rows=93 width=143) (actual time=298.705..298.707 rows=1 loops=1)
               ->  Index Scan Backward using origin_visit_status_pkey on origin_visit_status ovs  (cost=0.57..1606.07 rows=93 width=85) (actual time=298.658..298.658 rows=1 loops=1)
                     Index Cond: (origin = $0)
                     Filter: ((snapshot IS NOT NULL) AND (status = 'full'::origin_visit_state))
                     Rows Removed by Filter: 198
               ->  Materialize  (cost=0.43..8.46 rows=1 width=58) (actual time=0.042..0.043 rows=1 loops=1)
                     ->  Index Scan using origin_pkey on origin o  (cost=0.43..8.45 rows=1 width=58) (actual time=0.038..0.038 rows=1 loops=1)
                           Index Cond: (id = $0)
         ->  Index Scan Backward using origin_visit_pkey on origin_visit ov  (cost=0.56..590.92 rows=150 width=28) (actual time=30.120..1133.650 rows=100 loops=1)
               Index Cond: (origin = $0)
 Planning Time: 0.577 ms
 Execution Time: 1432.532 ms
(18 rows)
```

As far as I understand, this is because we do not have a FK to tell the planner that every row in `origin_visit_status` does have a corresponding row in `origin_visit`, so it checks every row from `origin_visit_status` in this loop. Therefore, I rewrote the query to use a `LEFT JOIN`, so it will spare this check. First, here is the original query:

```
swh=> explain SELECT * FROM origin_visit ov INNER JOIN origin_visit_status ovs USING (origin, visit) WHERE ov.origin = (SELECT id FROM origin o WHERE o.url = 'https://pypi.org/project/simpleado/') AND ovs.snapshot is not null AND ovs.status = 'full' ORDER BY ov.visit DESC, ovs.date DESC LIMIT 1;
                                        QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit  (cost=9.71..28.82 rows=1 width=113)
   InitPlan 1 (returns $0)
     ->  Index Scan using origin_url_idx on origin o  (cost=0.56..8.57 rows=1 width=8)
           Index Cond: (url = 'https://pypi.org/project/simpleado/'::text)
   ->  Merge Join  (cost=1.13..2198.75 rows=115 width=113)
         Merge Cond: (ovs.visit = ov.visit)
         ->  Index Scan Backward using origin_visit_status_pkey on origin_visit_status ovs  (cost=0.57..1606.07 rows=93 width=85)
               Index Cond: (origin = $0)
               Filter: ((snapshot IS NOT NULL) AND (status = 'full'::origin_visit_state))
         ->  Index Scan Backward using origin_visit_pkey on origin_visit ov  (cost=0.56..590.92 rows=150 width=28)
               Index Cond: (origin = $0)
(11 rows)
```

Change columns to filter directly on the "materialized" fields in ovs instead of on those in ov (no actual change yet):

```
swh=> explain SELECT * FROM origin_visit ov INNER JOIN origin_visit_status ovs USING (origin, visit) WHERE ovs.origin = (SELECT id FROM origin o WHERE o.url = 'https://pypi.org/project/simpleado/') AND ovs.snapshot is not null AND ovs.status = 'full' ORDER BY ovs.visit DESC, ovs.date DESC LIMIT 1;
                                        QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit  (cost=9.71..28.82 rows=1 width=113)
   InitPlan 1 (returns $0)
     ->  Index Scan using origin_url_idx on origin o  (cost=0.56..8.57 rows=1 width=8)
           Index Cond: (url = 'https://pypi.org/project/simpleado/'::text)
   ->  Merge Join  (cost=1.13..2198.75 rows=115 width=113)
         Merge Cond: (ovs.visit = ov.visit)
         ->  Index Scan Backward using origin_visit_status_pkey on origin_visit_status ovs  (cost=0.57..1606.07 rows=93 width=85)
               Index Cond: (origin = $0)
               Filter: ((snapshot IS NOT NULL) AND (status = 'full'::origin_visit_state))
         ->  Index Scan Backward using origin_visit_pkey on origin_visit ov  (cost=0.56..590.92 rows=150 width=28)
               Index Cond: (origin = $0)
(11 rows)
```

Then, reorder tables (obviously no change either):

```
swh=> explain SELECT * FROM origin_visit_status ovs INNER JOIN origin_visit ov USING (origin, visit) WHERE ovs.origin = (SELECT id FROM origin o WHERE o.url = 'https://pypi.org/project/simpleado/') AND ovs.snapshot is not null AND ovs.status = 'full' ORDER BY ovs.visit DESC, ovs.date DESC LIMIT 1;
                                        QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit  (cost=9.71..28.82 rows=1 width=113)
   InitPlan 1 (returns $0)
     ->  Index Scan using origin_url_idx on origin o  (cost=0.56..8.57 rows=1 width=8)
           Index Cond: (url = 'https://pypi.org/project/simpleado/'::text)
   ->  Merge Join  (cost=1.13..2198.75 rows=115 width=113)
         Merge Cond: (ovs.visit = ov.visit)
         ->  Index Scan Backward using origin_visit_status_pkey on origin_visit_status ovs  (cost=0.57..1606.07 rows=93 width=85)
               Index Cond: (origin = $0)
               Filter: ((snapshot IS NOT NULL) AND (status = 'full'::origin_visit_state))
         ->  Index Scan Backward using origin_visit_pkey on origin_visit ov  (cost=0.56..590.92 rows=150 width=28)
               Index Cond: (origin = $0)
(11 rows)
```

Finally, replace `INNER JOIN` with `LEFT JOIN`:

```
swh=> explain SELECT * FROM origin_visit_status ovs LEFT JOIN origin_visit ov USING (origin, visit) WHERE ovs.origin = (SELECT id FROM origin o WHERE o.url = 'https://pypi.org/project/simpleado/') AND ovs.snapshot is not null AND ovs.status = 'full' ORDER BY ovs.visit DESC, ovs.date DESC LIMIT 1;
                                        QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit  (cost=9.71..35.47 rows=1 width=113)
   InitPlan 1 (returns $0)
     ->  Index Scan using origin_url_idx on origin o  (cost=0.56..8.57 rows=1 width=8)
           Index Cond: (url = 'https://pypi.org/project/simpleado/'::text)
   ->  Nested Loop Left Join  (cost=1.13..2396.79 rows=93 width=113)
         ->  Index Scan Backward using origin_visit_status_pkey on origin_visit_status ovs  (cost=0.57..1606.07 rows=93 width=85)
               Index Cond: (origin = $0)
               Filter: ((snapshot IS NOT NULL) AND (status = 'full'::origin_visit_state))
         ->  Index Scan using origin_visit_pkey on origin_visit ov  (cost=0.56..8.59 rows=1 width=28)
               Index Cond: ((origin = ovs.origin) AND (origin = $0) AND (visit = ovs.visit))
(10 rows)
```

This would also work with a subquery just to get the value of `ov.date` and removing the actual join to `ov` entirely, but it was more annoying to implement because the function reuses `self.origin_visit_select_cols` as column list. All these EXPLAIN queries were run on staging. 29 September 2022, 11:22:57 UTC
44616af conftest: Replace multiprocessing hack when pytest-cov >= 4 is installed The hack crashes on >= 4 because 'pytest_cov.embed.multiprocessing_start' is not in the hook list anymore. https://pytest-cov.readthedocs.io/en/latest/changelog.html 29 September 2022, 10:53:41 UTC
87d3f0d retry: Do not retry on SystemExit exceptions It prevents process shutdown (unless the user presses Ctrl-C several times in a row) 28 September 2022, 11:27:05 UTC
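The behavior described in 87d3f0d can be sketched with a plain retry loop. This is a stand-alone illustration, not the retry proxy's actual code: `call_with_retries`, its parameters, and the broad `BaseException` catch are assumptions chosen to show why `SystemExit` needs to be special-cased.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(func: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Call func, retrying on failure, except when the process is asked to exit."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except BaseException as exc:
            # SystemExit means shutdown was requested: retrying would block it.
            # On the last attempt, re-raise the original exception unchanged.
            if isinstance(exc, SystemExit) or attempt == attempts:
                raise
            time.sleep(delay)
    raise RuntimeError("attempts must be >= 1")
```

In this sketch, a function raising `SystemExit` is called exactly once, while a function failing with an ordinary exception is called up to `attempts` times.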
7da7067 docs: Update archive stats 27 September 2022, 14:13:51 UTC
26995d4 Handle errors raised by fromisoformat. 27 September 2022, 13:10:33 UTC
1281ee7 Add date-based index to origin_visit This will help queries retrieving origin_visits by date by avoiding having to scan all visits for the origin. 13 September 2022, 14:10:10 UTC
0786ab1 SQL upgrade scripts don't need to bump dbversion anymore 13 September 2022, 13:55:19 UTC
aa735cb docs: Document metadata formats for Gogs and Gitea 31 August 2022, 12:57:58 UTC
d038240 postgresql: Fix SQL query for origin_find_visit_by_date method In that query, the interval alias was set for the visit column instead of the date difference computation, which could lead to the wrong visit being returned due to invalid results ordering. 30 August 2022, 12:36:35 UTC
57e5431 Raise StorageArgumentException on ProgramLimitExceeded So it is not logged on the server side, and so clients do not retry 29 August 2022, 12:24:56 UTC
b5836ba origin_visit_add: Fix crash when adding multiple visits to the same origin simultaneously This works by adding a RW lock on the row of the latest visit, which should block other transactions until the insertion is committed; so other transactions will generate a different (larger) visit id. This commit also slightly rewrites how the max visit id is computed, as we need to actually select a row to lock it, instead of using the `max()` aggregate function. 22 August 2022, 09:22:50 UTC
34ba15b New upstream version 1.6.0 16 August 2022, 12:36:49 UTC
5335244 retry: Add constant 10s wait when retrying transient exceptions They are typically caused by server shutdown and other temporary failures that may take more time than the typical 0-3s delay used by the retry proxy. This should keep noisy exceptions like AdminShutdown out of the Sentry dashboards. 09 August 2022, 13:42:03 UTC
7c7a721 Convert psycopg2 errors to TransientRemoteException instead of RemoteException On the wire, this is done by making the server return a 503 error instead of 500, which the RPC client generated by swh-core interprets to change the exception class. 09 August 2022, 13:38:15 UTC
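The status-code mapping described in 7c7a721 can be sketched as follows. All classes and functions here are local stand-ins defined for illustration (including `DatabaseError`, which replaces a psycopg2 error so the sketch runs without psycopg2); they are not the swh-core or swh-storage implementations.

```python
class RemoteException(Exception):
    """Stand-in for a non-retryable server-side error (hypothetical local class)."""


class TransientRemoteException(RemoteException):
    """Stand-in for a temporary server-side error that clients may retry."""


class DatabaseError(Exception):
    """Stand-in for a psycopg2 error, so the sketch has no dependencies."""


def status_for(exc: Exception) -> int:
    """Server side: pick the HTTP status for an exception raised by a backend."""
    return 503 if isinstance(exc, DatabaseError) else 500


def exception_for(status: int, message: str) -> RemoteException:
    """Client side: turn the HTTP status back into an exception class."""
    if status == 503:
        return TransientRemoteException(message)
    return RemoteException(message)
```

The point of the design is that the exception class never travels on the wire: only the status code does, and the client picks the class from it.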
b4f289c Add anchor extrinsic-metadata-original-artifacts-json It needs to be linked from swh/lister/crates/__init__.py 08 August 2022, 18:42:54 UTC
6dba4b5 New upstream version 1.5.1 05 August 2022, 12:17:20 UTC
280ecc3 cassandra: Fix flakiness of test_directory_add_atomic[concurrent] 05 August 2022, 10:24:10 UTC
68b93a6 Fix crash of test_*_arbitrary when given objects with the same id 05 August 2022, 09:39:42 UTC
9b9eb28 cassandra: Make origin_visit_status_get_random's interval consistent with postgresql The postgresql implementation uses '3 months', which is closer to 13 weeks than to 12 weeks. 05 August 2022, 09:20:33 UTC
56f69e5 Fix flakiness of test_origin_visit_status_get_random_nothing_found start is increased from 13 to 14, because 13 weeks is 91 days, ie. 30+31+30; so it is sometimes smaller than 3 months. This was only hit rarely because the number of visits was small, so this commit also increases the number of visits to make the test more likely to fail if it should actually fail. 05 August 2022, 09:18:46 UTC
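The arithmetic behind 56f69e5 can be checked with the standard library. The helper `three_months_before` is written for this illustration only; it assumes a day of month that exists in every month (here the 5th) and makes no claim about how postgresql evaluates `'3 months'` internally.

```python
from datetime import date, timedelta


def three_months_before(d: date) -> date:
    """Same day of month, three calendar months earlier (valid for days <= 28)."""
    month_index = d.year * 12 + (d.month - 1) - 3
    return date(month_index // 12, month_index % 12 + 1, d.day)


thirteen_weeks = timedelta(weeks=13)
fourteen_weeks = timedelta(weeks=14)

# Length in days of "the last 3 months", measured from the 5th of each month of 2022.
spans = {
    (date(2022, m, 5) - three_months_before(date(2022, m, 5))).days
    for m in range(1, 13)
}
```

Three calendar months can span more than the 91 days covered by 13 weeks (for example 31+30+31 = 92 days), so a visit dated 13 weeks ago can still fall inside the 3-month interval; 14 weeks (98 days) is always outside it.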
ea2f9f4 New upstream version 1.5.0 05 August 2022, 08:44:39 UTC
4825f40 Fix flakiness in test_directory_add_get_arbitrary By ignoring other attributes when raw_manifest is not None; just like we already do in test_revision_add_get_arbitrary and test_release_add_get_arbitrary. 05 August 2022, 08:08:01 UTC
fc89059 Stop logging and sending postgresql timeouts to Sentry They are very noisy, and clients are expected to retry a few times before re-raising the exception on their side. 04 August 2022, 14:02:47 UTC
1e7ede1 Stop using `USE <keyspace>` with prepared statements This caused the following warning:

```
WARNING cassandra.protocol:libevreactor.py:361 Server warning: `USE <keyspace>` with prepared statements is considered to be an anti-pattern due to ambiguity in non-qualified table names. Please consider removing instances of `Session#setKeyspace(<keyspace>)`, `Session#execute("USE <keyspace>")` and `cluster.newSession(<keyspace>)` from your code, and always use fully qualified table names (e.g. <keyspace>.<table>).
```

This also prepends 'test' to the name of keyspaces used in tests, so they are guaranteed to start with a letter (names starting with digits cause syntax errors in most statements). 04 August 2022, 11:53:29 UTC
0aff461 cassandra: Simplify SELECT statement formatting 04 August 2022, 11:51:16 UTC
2205fa6 Add test_directory_add_raw_manifest__different_entries This reproduces what I think is the issue found in https://jenkins.softwareheritage.org/job/debian/job/packages/job/DSTO/job/gbp-buildpackage/423/consoleFull This does not fix the issue as it is a consequence of the design, but documents this problematic behavior. 04 August 2022, 09:55:40 UTC
fad99cc New upstream version 1.4.2 04 August 2022, 08:13:22 UTC
fbe3803 postgresql: Increase some timeouts to get origin visits Even though the missing index to speed up origin visit queries has been added to the replica database, the configured timeouts for origin_visit_get_with_statuses and origin_visit_find_by_date were still too low to avoid query timeouts in production. After performing some tests locally, bumping them to 2000ms makes the timeouts go away. Related to T4386 13 July 2022, 14:44:35 UTC
cfc8679 backfill: Add support for directories with duplicated entries This uses Directory.from_possibly_duplicated_entries() to mangle entry names instead of crashing. 12 July 2022, 12:26:13 UTC
d6db4e4 cli: move an import statement in the cli command 08 July 2022, 11:57:44 UTC
e0825ac do not always auto-create an OriginVisitStatus object in origin_visit_add() when the OriginVisit object given as argument to be inserted already has its visit id set (which is usually the case in a replayer-like session), it makes no sense to auto-add the first OriginVisitStatus object related to this visit; this behavior is expected only when origin_visit_add() is called from a loading session. Adapt tests accordingly -- several tests did depend on the auto-add behavior of the origin_visit_add method for OriginVisit objects whose visit_id is given in the test dataset. 06 July 2022, 15:28:47 UTC
47caf04 Add a Storage.flavor property to the postgresql backend and add tests for 'mirror' and 'read_replica' flavors. 01 July 2022, 13:03:43 UTC
a00650e Update pytest_plugin for swh.core 2.10 01 July 2022, 13:03:41 UTC
5b366ae New upstream version 1.4.1 03 June 2022, 15:25:17 UTC
c19f53f Set current_version attribute to postgresql datastore This also simplifies the db collaborator code, reusing core.db functions to check that the code version and the actual db version match. Related to T4305 03 June 2022, 13:40:56 UTC
e64d64e pytest_plugin: use the stock pytest_postgresql postgresql factory instead of swh-core's postgresql_fact one, since we actually do not use its custom features any more in swh-storage. 31 May 2022, 16:41:53 UTC
a936cfd Add missing __init__.py in proxies/ 31 May 2022, 11:35:36 UTC
cb12394 docs: Describe metadata formats more precisely, and mention github and gitlab's 10 May 2022, 12:05:12 UTC
27d3c8a add strict asyncio_mode in pytest.ini 09 May 2022, 10:14:18 UTC
d0af35e New upstream version 1.4.0 03 May 2022, 09:55:04 UTC
9562953 Add function storage.algos.directory.directory_get It will be used in swh-storage to fetch a complete directory object, ie. with the raw_manifest and all branches. 02 May 2022, 10:26:23 UTC
fb55141 client: Migrate to _post method call to stop deprecation warnings 28 April 2022, 10:23:22 UTC
a942679 Bump mypy to v0.942 26 April 2022, 11:06:14 UTC
2c45e26 New upstream version 1.3.2 25 April 2022, 12:33:50 UTC
e6e658e pre-commit: Remove codespell commit-msg hook That hook can be frustrating, as it can discard a long commit message if it finds a typo in it, so it is better to remove it. 21 April 2022, 11:39:50 UTC
f136559 Use logger everywhere in tenacious.py 14 April 2022, 10:13:58 UTC
bbb4fc1 retry: re-raise original exception instead of a RetryError This will make the sentry reports more usable. If the exception changes across calls, the earlier exceptions are still logged and available as breadcrumbs. 13 April 2022, 13:37:57 UTC
fa5bc0f New upstream version 1.3.1 12 April 2022, 12:48:01 UTC
75aa073 postgresql: ensure origin_visit(_status) queries use index When using an inner join for the single origin value, instead of a subquery, the query fails to use the (origin, visit) indexes and falls back to fetching all the visits (or all the statuses) for the origin and sorting them. This breaks down for origins with a lot of visits, such as the ones that are being used for end to end monitoring. Using a subselect to generate a single origin id value ensures that the queries can use the proper indexes. 12 April 2022, 12:21:50 UTC
035d4c1 origin_get_with_statuses: Rename RPC endpoint path Align RPC endpoint path with method name. 12 April 2022, 10:15:52 UTC
d255fb3 origin_get_with_statuses: Fix case when fetched visits list is empty Ensure an empty list of results is returned when the fetched visits list is empty, for instance when the provided page_token is greater than or equal to the latest origin visit. Related to T4090 12 April 2022, 10:06:28 UTC
b655efc New upstream version 1.3.0 11 April 2022, 12:36:59 UTC
7c9586a Add .git-blame-ignore-revs file with automatic reformatting commits 08 April 2022, 13:15:36 UTC
b146bb7 python: Reformat code with black 22.3.0 Related to T3922 08 April 2022, 13:15:09 UTC
93aecb6 pre-commit, tox: Bump black from 19.10b0 to 22.3.0 black is considered stable since release 22.1.0 and the version we are currently using is quite outdated and not compatible with click 8.1.0, so it is time to bump it to its latest stable release. Please note that E501 pycodestyle warning related to line length is replaced by B950 one from flake8-bugbear as recommended by black. https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#line-length Related to T3922 08 April 2022, 13:13:51 UTC
567c8e4 interface: Add new method origin_visit_get_with_statuses It makes it possible to retrieve, in an efficient and paginated way, the list of visits and all their statuses for a given origin. Previously, it was required to call origin_visit_status_get on each visit of the origin to get such a list. Related to T4090 07 April 2022, 14:47:06 UTC
812590c requirements-test: Remove pytest pinning to < 7 pytest-postgresql 3.1.3 and pytest-redis 2.4.0 added support for pytest >= 7 so we can now drop the pytest pinning. 06 April 2022, 15:14:53 UTC
c6dc5cd Make postgresql's Storage client options configurable from config Adding a `query_options` member to postgresql's Storage, in conjunction with swh.core >= 2.5, allows setting or overwriting SQL client options from the storage configuration file. Default values are set, as they used to be, from the decorator arguments. In addition, one can override these values at run time from the storage configuration file. For example:

.. code-block:: yaml

   storage:
     cls: postgresql
     db: testdb
     objstorage:
       cls: memory
     query_options:
       directory_ls:
         statement_timeout: 180000

will provide a Storage instance for which the timeout value for the `directory_ls` endpoint is 3 min (instead of the default 20s). 04 April 2022, 14:43:17 UTC
5029870 New upstream version 1.2.0 23 March 2022, 16:00:30 UTC
835feb6 Fix tenacious storage tests for swh.model 6 The logic for testing the tenacious storage proxy by interspersing bad objects with good ones triggers when the object list is larger than 3 items. However, the allowed error rate of 1 failure for a window of 3 objects would only work for lists larger than six objects (putting at least 2 good objects between each bad object). swh.model 6 made the directory, revision and release test object lists 3 objects long, triggering the buggy code. Reducing the window size to 2 objects makes the test logic work for lists of 3 or more objects. 23 March 2022, 15:34:50 UTC
6057aa8 New upstream version 1.1.0 23 March 2022, 15:01:45 UTC
6fdaf8a Remove typing workarounds for Revision.author or Revision.committer being None swh-model 6.0.0 adds proper support for them. + fix issue found by mypy 23 March 2022, 10:01:34 UTC
3eff720 Add support for author=None and committer=None committer=None happens on some malformed commits generated by old dgit version; and it is possible for author=None to happen for the same reason. For now, this is not supported by swh-model, so tests temporarily disable attrs checks that swh-model relies on. 23 March 2022, 10:00:22 UTC
92c78ab pytest: Exclude build directory for tests discovery Due to test modules being copied in subdirectories of the build directory by setuptools, it makes pytest fail by raising ImportPathMismatchError exceptions when invoked from root directory of the module. So ignore the build folder to discover tests. 22 March 2022, 10:58:26 UTC
98b41c8 backfill: Add missing raw_manifest to directories This was not covered by tests so far, because swh.model.tests.swh_model_data.TEST_OBJECTS did not contain any object with a raw_manifest. But it will in swh-model > 5.0.0 16 March 2022, 11:24:46 UTC
8b65e42 backfill: Make integer_ranges() work on str args + add typing to RANGE_GENERATORS Without the type annotation, mypy errors with 'Cannot call function of unknown type' when called from a type-checked function. 15 March 2022, 16:19:22 UTC
77f7e6d postgresql: Remove unused listener code from db.py 15 March 2022, 12:36:42 UTC
ccde097 origin_visit_get_latest: Order by visit id instead of date This allows both the postgresql and cassandra backends to make efficient queries by using an index (resp. clustering key) instead of scanning all visits of the given origin then sorting by date. This does not affect the results for the vast majority of cases, as ids are always in increasing chronological order, unless an origin was re-loaded from an old archive. 11 March 2022, 12:43:36 UTC
600e87f origin_visit_get_latest: Materialize subquery on 'origin' table. postgresql's query planner does not understand the origin is unique, so it performs a partial index scan on origin_visit_pkey, which is inefficient on origins with many visits. This commit itself is not enough to make it use the proper index, but provides this necessary change that will be used by a future commit. 11 March 2022, 12:36:07 UTC
b0cdab5 postgresql: Increase timeouts that often fail According to Sentry, in the last 30 days:
* directory_entry_get_by_path: 958 events, https://sentry.softwareheritage.org/share/issue/c4c2124953a145b2bd325f6f6b7df5a6/
* revision_get: 841 events, https://sentry.softwareheritage.org/share/issue/55fbe01c6f4d4c9bbf684c7608a62ad9/
* release_get: 14 events, https://sentry.softwareheritage.org/share/issue/37c53354541b4c4eaa1faf4e20a68418/
* origin_visit_find_by_date: 114 events, https://sentry.softwareheritage.org/share/issue/a674c12049a941968a717661a0226559/
* origin_get: 79 events, https://sentry.softwareheritage.org/share/issue/bf21d6bc7b24442eb18643d80d936d27/ ; 67 events, https://sentry.softwareheritage.org/share/issue/010a4b1e085a4e2089ba4897c6de6038/
11 March 2022, 12:19:28 UTC
4e78014 Remove aiohttp from requirements.txt it's not used by swh.storage. 08 March 2022, 15:29:15 UTC
284a4ab Move metrics handling from backends to RPC server Motivation: replaces code duplication in the backends with a single one, to be consistent with the objstorage (which has many more backends) This also fixes the issue of metrics from 'extid_add' to be missing when using the postgresql storage. 02 March 2022, 11:32:18 UTC
35e7f0c New upstream version 1.0.0 24 February 2022, 11:13:55 UTC
215162b Update for swh.core 2.0.0 - Add expected entry points for swh.core 2 db handling new features: - add a ``swh.storage.get_datastore()`` function - add ``swh.storage.postgresql.storage.Storage.get_current_version()`` method - move sql migration scripts in ``swh/storage/sql/upgrades`` - modify sql initialization scripts to match swh.core 2 (remove dbversion management code). - Update tests to use the new template-based database handling; this should have only minimal impact on test execution performance. 24 February 2022, 10:05:01 UTC
386fb4d Add types-toml to requirements-test.txt 24 February 2022, 10:05:01 UTC
f578377 pre-commit: Bump hooks and add new one to check commit message spelling To install the new hook: $ pre-commit install -t commit-msg 10 February 2022, 16:29:28 UTC
10d367f New upstream version 0.43.1 08 February 2022, 09:31:31 UTC
8bf07c3 revision_walker: Actually pass ignore_displayname to revision_log I somehow forgot to stage this change in the previous commit. 04 February 2022, 15:46:07 UTC
0f9f54b revision_walker: Add support for ignore_displayname. This is needed by the vault. 04 February 2022, 15:14:59 UTC
75a7f09 Add typing to revision_walker.py and make the state a dataclass 04 February 2022, 15:14:53 UTC
a3a63d8 Require pytest to be <7.0.0 04 February 2022, 14:02:23 UTC
ec42d59 New upstream version 0.43.0 02 February 2022, 17:56:31 UTC
4544d7c Introduce a new displayname field for persons in the PostgreSQL storage Extend the APIs for Revisions and Releases to honor the field by default, unless the new `ignore_displayname` argument is set. 01 February 2022, 12:03:52 UTC
97caa93 Make test_release_add_get_arbitrary non-flaky It was made flaky by d4ddd41535d0ce1cd50d51d297e154bf0ab6e649. 01 February 2022, 12:03:52 UTC
f868f3c Mostly use normalized Person objects in tests This opens up the possibility of eventually ignoring the `name` and `email` fields stored in database in favor of parsing them again from the fullname field (and therefore to update our parsing logic without having to affect stored data). 31 January 2022, 20:56:47 UTC
d4ddd41 postgresql: Use Person.from_fullname if name and email are None This allows us to populate sensible name and email values out of the new displayname field, without having to store them. 31 January 2022, 19:12:09 UTC
bedc372 New upstream version 0.42.0 25 January 2022, 15:54:44 UTC
6f02524 Fix directory_add to actually insert the manifest + add directory_get_raw_manifest I don't expect directory_get_raw_manifest to be used, but it is needed for tests, so why not. 25 January 2022, 09:46:27 UTC
5874905 Stop using the deprecated 'TimestampWithTimezone.offset' attribute It will be replaced by what is currently called 'offset_bytes' 21 January 2022, 14:01:31 UTC