https://github.com/SoftwareHeritage/swh-storage

Revision Message Commit Date
e289593 requirements-test.txt: Drop no longer needed pytest-postgresql requirement requirements-swh.txt already declares the swh.core[db] dependency which transitively pulls it. Related to T2746 23 November 2020, 12:07:45 UTC
0065d4d backfill: Reverse flawed logic in SnapshotBranch generation The previous code would nullify all non-null branches, and try to create a SnapshotBranch out of null branches. 13 November 2020, 15:51:29 UTC
f501136 migrate_extrinsic_metadata: don't crash when deb revisions aren't referenced by any snapshot As this happens for about 50 revisions in the archive. 13 November 2020, 14:26:15 UTC
20d3f8e backfill: only flush the journal writer on every batch This module's use of write_addition predated the introduction of reliable writing in swh.journal; since this introduction, the backfiller has been flushing the kafka writer after writing each single object, leading to a 3x measured slowdown on backfilling contents. 13 November 2020, 10:17:31 UTC
248a04b Don't use string expansions in debug logging 12 November 2020, 17:07:14 UTC
3eba73d migrate_extrinsic_metadata: Remove log output when a CRAN origin is missing as this happens quite often and isn't an error. 09 November 2020, 15:32:20 UTC
f3652a9 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. 09 November 2020, 15:32:20 UTC
c0a3d96 migrate_extrinsic_metadata: use the retry proxy Because it makes a lot of get requests and doesn't handle failures, it crashed often. 09 November 2020, 15:32:20 UTC
aded45b Make the retry proxy work on all functions. The metadata migration script kept crashing otherwise. 09 November 2020, 15:32:20 UTC
2e7d489 Set the value_sanitizer argument of get_journal_writer. The next version of swh-journal will remove the default value. 09 November 2020, 15:32:20 UTC
24cdc85 cassandra: Fix content_missing_per_sha1_git implementation 09 November 2020, 13:13:10 UTC
84984a6 algos.snapshot.snapshot_resolve_alias: Don't return the branch list. It complicates the signature and the code, and we don't have any use for it currently. 05 November 2020, 11:08:13 UTC
fa86834 Add test for snapshot_resolve_alias with a missing branch. 05 November 2020, 10:57:08 UTC
1826b2b Simplify algos.snapshot.snapshot_resolve_alias. 1. rename branch_info to last_branch 2. exclude the last_branch from 'branches', so that: a) it never needs to contain a None value, so we don't need a cast b) no need for slicing 05 November 2020, 10:57:08 UTC
943e440 Rename the `id` argument of raw_extrinsic_metadata_get to `target` Consistently with the new name for this attribute in swh.model 0.7.2. 03 November 2020, 14:59:52 UTC
48b6dbe cassandra/in_memory: rename raw_extrinsic_metadata.id to target For consistency with swh.model v0.7.2, to prepare for the addition of an (intrinsic) id field to RawExtrinsicMetadata objects. 03 November 2020, 13:56:22 UTC
4fbf481 PostgreSQL: rename raw_extrinsic_metadata.id to target For consistency with swh.model v0.7.2, to prepare for the addition of an (intrinsic) id field to RawExtrinsicMetadata objects. 03 November 2020, 13:56:16 UTC
8b18155 algos/snapshot: Add function to resolve branch alias to real target Related to T2734 03 November 2020, 11:49:07 UTC
6e3e350 migrate_extrinsic_metadata: Write metadata on directories instead of revisions. To match the new behavior of package loaders. 29 October 2020, 10:14:48 UTC
97d0b05 pre-commit: Fix codespell regexp related error 27 October 2020, 15:43:36 UTC
9645aef Replace RawExtrinsicMetadata `id` attribute with `target`. The old attribute was deprecated in swh.model 0.7.2 27 October 2020, 14:37:13 UTC
5819683 Update swh.storage.validate for swh.model 0.7.2 swh.model.model.ModelObject.compute_hash was changed to a method instead of a staticmethod. 27 October 2020, 13:11:34 UTC
4f35f7f Add black change on swh.storage.backfill 27 October 2020, 12:47:33 UTC
474ee72 --amend 22 October 2020, 20:35:32 UTC
eb3952f migrate_extrinsic_metadata: Make pypi_origin_from_filename fix project names when possible using PyPI's API. 22 October 2020, 14:23:30 UTC
b1a3b80 migrate_extrinsic_metadata: move pypi_origin_from_filename to its own function. Instead of bloating handle_row, which is already way too long. 22 October 2020, 14:22:46 UTC
aeb72c7 migrate_extrinsic_metadata: add support for guix revisions 22 October 2020, 10:32:48 UTC
73dc5e3 migrate_extrinsic_metadata: allow deposits with 'id' missing from their metadata. 22 October 2020, 10:28:18 UTC
c483066 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. It now supports all pypi revisions with an id starting with an hex digit from 0 to 5. 22 October 2020, 10:27:16 UTC
2bfd9fe storage.pytest_plugin: Reuse swh.core.db.db_utils postgresql_fact 22 October 2020, 09:58:03 UTC
d93429f api.server: Add missing coverage on make_app_from_configfile factory This is actually what starts the server, so it sounds more reasonable to test that part. 19 October 2020, 13:08:01 UTC
ca8e6aa api.server: Drop the % in the error message 19 October 2020, 13:08:01 UTC
49d787c storage.api.server: Add type to load_and_check_config then refactor tests This also drops the type parameter from load_and_check_config which is never used. 16 October 2020, 13:57:11 UTC
1a9687f backfill: use get_journal_writer instead of instantiating JournalWriter directly. A future version of swh-journal will add a mandatory argument to JournalWriter, which get_journal_writer sets by default. 12 October 2020, 17:21:09 UTC
b425b5c migrate_extrinsic_metadata: add support for the new deposit metadata formats introduced in late september. * https://forge.softwareheritage.org/D4065 * https://forge.softwareheritage.org/D4105 12 October 2020, 13:07:27 UTC
a11d58a Remove a bunch of deprecated instances of `args` in configurations Notably, `get_objstorage`'s `args` has been deprecated as of swh.objstorage 0.2.2. 09 October 2020, 15:29:10 UTC
a085b7e backfill: use the common `storage` top-level config key This makes the backfiller configuration compatible with all other modules. 08 October 2020, 18:35:49 UTC
dceeb74 backfill: support arbitrary journal writer configuration This allows more settings than the previous hardcoded three, e.g. the `privileged` flag to backfill a journal containing anonymous topics. 08 October 2020, 18:35:49 UTC
a6af589 retry: don't retry on keyboardinterrupt. Otherwise, Ctrl-C is ignored if pressed while sending a request. 02 October 2020, 09:07:09 UTC
889bd87 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. It now supports all pypi revisions with an id starting with 0, 1, or 2. 02 October 2020, 09:07:09 UTC
9ddbb69 migrate_extrinsic_metadata: allow dash in deposit client and collection names. 02 October 2020, 09:07:09 UTC
59e7e68 migrate_extrinsic_metadata: update name of column deposit.swhid_context. It was renamed in 4d72d1be529a568784842f5c0864e862a4b4705c. 02 October 2020, 09:07:09 UTC
07df3f6 migrate_extrinsic_metadata: Add support for the current format of original_artifacts written by the CRAN loader. 02 October 2020, 09:07:09 UTC
bef08d6 Fix object_types default in buffer interface protocol and impls Default argument object_types was not properly declared in StorageInterface and concrete implementations PostgreSQL and Cassandra. Reverted unnecessary fix in storage tests. 30 September 2020, 09:26:12 UTC
40997c0 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. It now supports all pypi revisions with an id starting with 0 or 1. 29 September 2020, 12:36:24 UTC
e37c8f7 Pin black in tox to the same version as .pre-commit-config.yaml 28 September 2020, 13:37:43 UTC
c812c79 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. 26 September 2020, 06:04:52 UTC
0adb8fc Add a regression test for the buffer proxy default settings This is used by swh.loader.core, regressed in v0.15.0 but wasn't caught by local tests. 25 September 2020, 15:14:16 UTC
dd5fb8d Drop vcversioner from requirements We stopped using it months ago. 25 September 2020, 15:14:16 UTC
632e99e Add static check to object_type literals in buffers 25 September 2020, 12:23:30 UTC
a75c5ca Improve typing of the buffering interface - use more generic collection types, so that parametrized types can be made stricter (e.g. str, in the next revision) - remove Optionals that are not needed and provide better defaults 25 September 2020, 12:23:30 UTC
e8f1136 Run isort after the CLI import changes 25 September 2020, 12:19:21 UTC
ac3c537 Update sql paths for the moved SQL files This should fix the currently failing documentation build. 24 September 2020, 18:08:56 UTC
96be9bd Fix default value handling in constructor Use a more simple default value and do not identity check against it. 24 September 2020, 16:33:09 UTC
829118a Add the SQL commands used to set up the logical replication publication 24 September 2020, 11:57:18 UTC
5d3de06 Support different database flavors in the SQL scripts This uses a new database table and some psql conditionals to introduce three different flavors for the swh.storage Postgres database: - the 'default' flavor has all the deduplication features, foreign keys and read indexes - the 'mirror' flavor has all the deduplication features and read indexes; it drops some foreign keys to allow for out of order addition of some object types - the 'read_replica' flavor has the minimal set of indexes to support read queries, and replication using the PostgreSQL logical replication feature Related to T2604. 24 September 2020, 11:57:14 UTC
63426e6 pytest_plugin: Use psql to load SQL files instead of connecting with psycopg2 This avoids running into issues when the SQL files contain psql-specific features like backslash-escapes. 24 September 2020, 11:54:38 UTC
38b1dbf Output a warning when the version of the database is different than expected 24 September 2020, 11:54:38 UTC
e37f639 Improve code quality and doc in BufferedProxyStorage - better names related to the object buffers - extracted parameter dicts from the constructor - used more generic typing in function parameters and more specific in other contexts in order to apply the principle of robustness 23 September 2020, 22:21:54 UTC
c97b23b Adapt cli declaration entrypoint to swh.core 0.3 23 September 2020, 14:13:01 UTC
924621f pytest_plugin: Order the fixture definitions in dependency order 23 September 2020, 10:30:55 UTC
6286e18 pytest_plugin: Change dbname to storage to avoid clash in tests Similar fixtures in other modules already use the same "tests" db. A clash can then happen when the same table name exists in different modules (e.g. dbversion exists in both the scheduler and storage dbs). 23 September 2020, 10:28:52 UTC
8c44a29 pytest_plugin: Reuse swh_storage_postgresql connection string The `swh_storage_postgresql.dsn` string already contains the connection information necessary for the tests to run. 23 September 2020, 10:27:02 UTC
30cdb78 Drop the -swh- part of sql files it does not bring any meaningful info and makes it somewhat inconsistent with the new -superuser- "tag". 22 September 2020, 08:18:07 UTC
915575d Rename 10-swh-init.sql as 10-superuser-init.sql so the db initialization from swh.core (>= 0.3) executes this during the database creation step (i.e. while having a superuser level connection to the database). 22 September 2020, 08:14:15 UTC
67ee86b Warn about skipped_content sneaking the 'content' topics 21 September 2020, 08:43:00 UTC
8de6564 Small fix in the graph replayer to prevent a wrong warning 18 September 2020, 14:30:37 UTC
b0027ab python: Reorder imports with isort Related to T2610 17 September 2020, 16:06:07 UTC
d27a046 pre-commit: Add isort hook and configuration Related to T2610 17 September 2020, 16:06:06 UTC
469c38c Make origin_add() handle multiple occurrences of an origin properly This is needed to prevent a traceback when an origin is present several times in the same origin_add batch, a situation that has been seen in some mirror tests. 17 September 2020, 14:47:27 UTC
37ce2c4 pre-commit: Update flake8 hook configuration flake8 hook has been removed from https://github.com/pre-commit/pre-commit-hooks so now use the one from https://gitlab.com/pycqa/flake8 17 September 2020, 11:57:15 UTC
f008a59 migrate_extrinsic_metadata: improve pypi_project_from_filename to support suffixes after the version number. 16 September 2020, 15:00:42 UTC
8e32ad0 migrate_extrinsic_metadata: guess PyPI origins. This works by guessing the package name from the original_artifact data, then building an origin that would match the package name, then checking if the revision can be reached from it. 16 September 2020, 15:00:37 UTC
0bbcd91 migrate_extrinsic_metadata.test_pypi: use the in-memory storage instead of mocks in a future commit, migrating pypi revisions will become more interactive with the storage, so it's easier to have a real one instead of a mock. 16 September 2020, 15:00:33 UTC
1676478 migrate_extrinsic_metadata.test_debian: use the in-memory storage instead of mocks in tests that need to read in the storage. Using mocks just makes it more complicated, and we decided not to do that a while ago. 16 September 2020, 15:00:29 UTC
723e728 migrate_extrinsic_metadata: fix crash on dangling branch. 16 September 2020, 15:00:23 UTC
89d23cb migrate_extrinsic_metadata: fix crash when a Debian revision is missing. https://forge.softwareheritage.org/T997 16 September 2020, 15:00:19 UTC
2ad5600 migrate_extrinsic_metadata: guess Debian origins. This works by guessing the package name from the original_artifact data, then building origins that would match the package name, then filtering out origins by checking if the revision can be reached from them. 16 September 2020, 15:00:16 UTC
3b781a8 sql: Make the extra_headers not null a constraint Because of the data volume, the basic not null instruction (which is blocking) would otherwise impose downtime on loaders. Related to T2547 14 September 2020, 11:07:31 UTC
ed55e9c Load the "fast" hypothesis profile by default otherwise, running pytest directly (without specifying a --hypothesis-profile option) will use default hypothesis values that are unsuitable for us. 11 September 2020, 10:27:17 UTC
fd6d72f cli: speed up the `swh` cli command startup time by moving import statements into functions, as well as preventing the loading of swh.storage.interface in swh.storage unless needed for type checking. Related to T2575. 11 September 2020, 09:15:58 UTC
d24a1e7 migrate_extrinsic_metadata: retry in case of database errors. They are likely to happen since this script takes a long time to run. 10 September 2020, 07:39:55 UTC
5ec70a6 Add a Python script to migrate extrinsic metadata from revision metadata. 10 September 2020, 07:39:55 UTC
93458a4 Import get_objstorage from swh.objstorage.factory instead of swh.objstorage (deprecated). 09 September 2020, 17:26:00 UTC
90eda98 directory_ls: Don't return None for status/length/sha1/... if the content is known but skipped. 08 September 2020, 11:52:33 UTC
3198e11 Add a test for directory_ls when the contents are missing. 08 September 2020, 11:52:33 UTC
374e01c algos.diff: Add missed revision_get conversion This does the minimum required so the diff module still works and fixes [1]. Will release 0.14.1 asap. [1] https://forge.softwareheritage.org/harbormaster/unit/view/918517/ Related to T645 04 September 2020, 13:37:24 UTC
356eacd Refactor revision_get storage API to return Revision objects The signature becomes revision_get(...) -> List[Optional[Revision]] Related to T645 03 September 2020, 15:05:11 UTC
36d284c cassandra: Discard Content ctime field in content_get_partition That field should only be returned by content_find method. Add tests to check expected behaviors according to methods. 02 September 2020, 13:10:57 UTC
e6fcfb9 storage*: release_get(...) -> List[Optional[Release]] Related to T645 01 September 2020, 12:19:19 UTC
e6c17f6 Make StorageInterface a Protocol. We're already using 'StorageInterface' as type annotation where we expect an object that implements the same methods, but it isn't technically correct since these objects aren't an instance of a subclass. Making StorageInterface a protocol reflects how we already use it, without lying to mypy. 01 September 2020, 08:38:36 UTC
5afd985 Add a validating storage proxy, to check ids before insertion. It can be used to prevent buggy or malicious clients of the storage from adding invalid objects. 28 August 2020, 11:18:53 UTC
4532a4d Add a --check-config option for cli commands This allows specifying on the command line whether to check the configuration and read or write access to the storage at startup time (especially for the `rpc-serve` and `replay` commands). Warning: this option defaults to "read" for the backfill command and "write" for the rpc-serve and replay commands, so now the creation of the Storage instance used in cli commands *will be checked*. Closes T2525. 25 August 2020, 08:50:50 UTC
2a35c0b Remove the deprecated config-path option from `swh storage rpc-serve` command include the validation of the presence of a "storage" config section in the main `storage` click.group, where the config-file is actually parsed. 25 August 2020, 08:50:50 UTC
e8b1b21 Tell pytest not to recurse in dotdirs. pytest wastes a lot of time in .hypothesis and .git; this commit excludes them. Before: $ time pytest --collect-only > /dev/null pytest --collect-only > /dev/null 6.93s user 0.51s system 100% cpu 7.425 total After: $ time pytest --collect-only > /dev/null pytest --collect-only > /dev/null 2.39s user 0.10s system 100% cpu 2.475 total 25 August 2020, 08:01:20 UTC
f3870dc 161: Fix sql upgrade script Related to T2524 25 August 2020, 07:35:31 UTC
cc33dd3 Add support for a new "check_config" config option in get_storage() If "check_config" is present in get_storage()'s kwargs, call the storage.check_config() method and raise an exception if the check fails. The storage.check_config() method is called with the content of the "check_config" config value (so the expected value for this currently is `check_config={"check_write": <bool>}`). 24 August 2020, 14:58:17 UTC
4dd9597 Check for db version mismatch in PgStorage.check_config() 24 August 2020, 14:58:17 UTC
c16ff50 Add a check_dbversion() method to the Db class This method compares the version stored in the database in the `dbversion` table with the currently declared version. This current version is declared as a simple `current_version` attribute on the Db class. **It must be updated jointly in 30-swh-schema.sql** when modifying the db schema. Note that if this is forgotten, the added test `test_dbversion` should fail. Related to T2525. 24 August 2020, 14:58:17 UTC
629d2d1 Fix pytest_plugin's database janitor: do not truncate the dbversion table 24 August 2020, 14:58:17 UTC
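
Commit 248a04b above removes string expansions from debug logging calls. The following is a minimal sketch of why that change matters, not the code from that commit: the logger name, the `Expensive` class, and both helper functions are hypothetical and only illustrate the general pattern with Python's standard `logging` module. When the message is built with `%` before the call, the formatting cost is paid even if debug output is disabled; when the arguments are passed separately, `logging` formats the message only if the record is actually emitted.

```python
import logging

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("example.backfill")


class Expensive:
    """Hypothetical object that counts how often it is rendered to a string."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "<expensive repr>"


def log_eager(obj):
    # String expansion happens here, before logging checks the level.
    logger.debug("processing %s" % obj)


def log_lazy(obj):
    # Arguments are passed through; formatting is deferred until the
    # record is known to be emitted.
    logger.debug("processing %s", obj)
```

With the logger level set above DEBUG, `log_eager` still calls `__str__` once, while `log_lazy` never does.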