https://github.com/SoftwareHeritage/swh-storage

Revision Author Date Message Commit Date
14739c5 RawExtrinsicMetadata: update to use the API in swh-model 1.0.0 01 March 2021, 16:38:44 UTC
2388748 storage_tests: recompute ids when evolving RawExtrinsicMetadata objects. For now this does nothing as RawExtrinsicMetadata has no 'id' field, but the equality assertions will become errors when the next version of swh.model is released. 25 February 2021, 15:33:40 UTC
f3ef6e6 storage: Implement visit types filtering in origin_search method Enable to filter searched origins by visit types. Add a new optional visit_types parameter to origin_search method in StorageInterface. Implement visit types filtering in storage backends, an origin will be returned if it has any of the requested visit types. This is clearly not designed to be used in production due to performance issues but rather in testing environments with small archive dataset. Related to T2869 19 February 2021, 10:36:29 UTC
7b4c124 167: Make the migration script unblocking 17 February 2021, 09:18:26 UTC
cc3eb4b Switch anonymized replayer test to use pytest parametrization This allows us to only read the kafka topics once instead of twice in the same tests, which is apparently a hard thing to do in a way compatible with both confluent-kafka 1.5 and 1.6. 16 February 2021, 16:09:03 UTC
e0e88b2 storage: Refactor OriginVisitStatus instantiation 09 February 2021, 16:01:26 UTC
d30ca93 db: Unify sql joins on origin_visit_status using "USING" 09 February 2021, 16:01:26 UTC
046fe57 storage.postgresql: Use origin_visit_status.type value as source This stops using the origin_visit.type as fallback values as now, the database has been migrated. So this makes the origin_visit_status.type a not nullable column. This also drops now redundant join instructions on origin_visit table when reading. Related to T2968 09 February 2021, 16:01:25 UTC
51df58e test_replay: Fix hang since confluent-kafka 1.6 release Side effect of the following commit in librdkafka 1.6: https://github.com/edenhill/librdkafka/commit/f418e0f721518d71ff533759698b647cb2e89b80 Tests were relying on a buggy behavior of the mocked kafka cluster: two subsequent consumers setup with the same group id should receive a different set of messages, rather than the same set of messages. Also explicitly commit messages once consumed. 09 February 2021, 14:56:15 UTC
b038383 postgresql: Fix dbversion() to return the max version instead of a random one. 08 February 2021, 11:13:03 UTC
efd8815 buffer: ensure objects are flushed in topological order This new integration test checks that, when flushing the buffer storage, the addition functions of the underlying storage backend are called in topological order (content, directory, revision, release then snapshot). This reduces the probability of "data consistency" regressions caused by the use of the buffering storage proxy alone. 04 February 2021, 18:17:11 UTC
1526107 Return an accurate summary from buffer's flush() method The earlier implementation would only return summary data from keys that existed in the last `_add` backend method run, rather than collating all the results. 04 February 2021, 18:14:03 UTC
5b3e6c9 buffer: add support for snapshots This is mostly a consistency addition, considering that most (if not all) loaders will only add a single snapshot. The common pattern of loading objects in topological order (content > directory > revision > release > snapshot), then flushing the storage, is now fully consistent; Without this addition, the snapshot addition would reach the backend storage before all other objects are added, leading to potential inconsistencies if the flush of other object types fails. 04 February 2021, 13:37:12 UTC
18967ed buffer: add type annotations for tests 04 February 2021, 09:19:34 UTC
9a9f234 storage: Make origin_get_latest_visit_status return OriginVisitStatus This returned a Tuple[OriginVisit, OriginVisitStatus]. This was required to have the missing information "type" for visit-status. This is no longer needed as now OriginVisitStatus holds the type information. 01 February 2021, 11:06:35 UTC
626b0bf Change origin_visit_status_get_random interface to return visit_status This returned a Tuple[OriginVisit, OriginVisitStatus] which is no longer needed now that OriginVisitStatus holds the type information. 01 February 2021, 11:06:34 UTC
f6ae8a0 Write introduction to swh-storage. Explains: * when to use swh-web instead * that `get_storage` should always be used to instantiate the storage * `StorageInterface` * model objects * pagination * backends 01 February 2021, 11:03:02 UTC
76de53c Correctly return origin_visit_status.type value everywhere If the type is not present on an origin_visit_status, it should be computed from the origin_visit. There were some methods which only return the origin_visit_status value. It breaks the webapp mangling the type to empty value on the search result page. Related to T3001 28 January 2021, 11:15:11 UTC
e433255 db: Allow new status values not_found, failed to OriginVisitStatus Related to T2961 20 January 2021, 14:36:12 UTC
d04165f Add type to the origin_visit_status topic useful when the type is not yet populated in the database Related to T2966 18 January 2021, 10:49:34 UTC
c24d35f Add persistence of the field OriginVisitStatus.type (!) A new database upgrade is needed (165.sql) for postgresql backend Related to T2964 15 January 2021, 11:38:38 UTC
da55308 Make test_content_add_race fail for the right reason. Since 209de5dbaa127dacd114fbbd084f22632982eb77, it was failing because of: TypeError("content_add() got an unexpected keyword argument 'db'") 15 January 2021, 10:30:36 UTC
0b44b37 Adapt cassandra storage to ignore the new OriginVisitStatus.type field Depends on D4848 Related to T2443 13 January 2021, 10:06:12 UTC
728c3ee Allow to use the JAVA_HOME environment for cassandra tests This allows to enforce a specific version of java to be used. For example, since cassandra seems not to support java 14 yet, this allows to run tests on bullseye: JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/ pytest swh 13 January 2021, 09:56:07 UTC
30945a5 Enforce hypothesis <6 to prevent test breakage hypothesis 6 upgraded a warning into an error: now raises a FailedHealthCheck when using a pytest fixture with a @given generative test set. See https://hypothesis.readthedocs.io/en/latest/healthchecks.html 13 January 2021, 09:42:42 UTC
74e6f58 Make the CREATE_TABLES_QUERIES in cassandra/schema.py an explicit list prevent being fooled by a missing '\n'. 08 January 2021, 13:20:08 UTC
2b35198 Add a cli section in the doc 18 December 2020, 12:41:23 UTC
04ae89f storage.backfill: Allow cli run for origin_visit_status as well 24 November 2020, 17:21:21 UTC
64ee845 conftest: Reference swh.core.db.pytest_plugin As it's exposed through the swh.storage.pytest_plugin itself used by other swh modules, this needs to be declared to avoid other swh module build failures. Related to T2746 24 November 2020, 13:08:12 UTC
e289593 requirements-test.txt: Drop no longer needed pytest-postgresql requirement requirements-swh.txt already declares the swh.core[db] dependency which transitively pulls it. Related to T2746 23 November 2020, 12:07:45 UTC
0065d4d backfill: Reverse flawed logic in SnapshotBranch generation The previous code would nullify all non-null branches, and try to create a SnapshotBranch out of null branches. 13 November 2020, 15:51:29 UTC
f501136 migrate_extrinsic_metadata: don't crash when deb revisions aren't referenced by any snapshot As this happens for about 50 revisions in the archive. 13 November 2020, 14:26:15 UTC
20d3f8e backfill: only flush the journal writer on every batch This module's use of write_addition predated the introduction of reliable writing in swh.journal; Since this introduction, the backfiller has been flushing the kafka writer after writing each single object, leading to a 3x measured slowdown on backfilling contents. 13 November 2020, 10:17:31 UTC
248a04b Don't use string expansions in debug logging 12 November 2020, 17:07:14 UTC
3eba73d migrate_extrinsic_metadata: Remove log output when a CRAN origin is missing as this happens quite often and isn't an error. 09 November 2020, 15:32:20 UTC
f3652a9 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. 09 November 2020, 15:32:20 UTC
c0a3d96 migrate_extrinsic_metadata: use the retry proxy Because it makes a lot of get requests and doesn't handle failures, it crashed often. 09 November 2020, 15:32:20 UTC
aded45b Make the retry proxy work on all functions. The metadata migration script kept crashing otherwise. 09 November 2020, 15:32:20 UTC
2e7d489 Set the value_sanitizer argument of get_journal_writer. The next version of swh-journal will remove the default value. 09 November 2020, 15:32:20 UTC
24cdc85 cassandra: Fix content_missing_per_sha1_git implementation 09 November 2020, 13:13:10 UTC
84984a6 algos.snapshot.snapshot_resolve_alias: Don't return the branch list. It complicates the signature and the code, and we don't have any use for it currently. 05 November 2020, 11:08:13 UTC
fa86834 Add test for snapshot_resolve_alias with a missing branch. 05 November 2020, 10:57:08 UTC
1826b2b Simplify algos.snapshot.snapshot_resolve_alias. 1. rename branch_info to last_branch 2. exclude the last_branch from 'branches', so that: a) it never needs to contain a None value, so we don't need a cast b) no need for slicing 05 November 2020, 10:57:08 UTC
943e440 Rename the `id` argument of raw_extrinsic_metadata_get to `target` Consistently with the new name for this attribute in swh.model 0.7.2. 03 November 2020, 14:59:52 UTC
48b6dbe cassandra/in_memory: rename raw_extrinsic_metadata.id to target For consistency with swh.model v0.7.2, to prepare for the addition of an (intrinsic) id field to RawExtrinsicMetadata objects. 03 November 2020, 13:56:22 UTC
4fbf481 PostgreSQL: rename raw_extrinsic_metadata.id to target For consistency with swh.model v0.7.2, to prepare for the addition of an (intrinsic) id field to RawExtrinsicMetadata objects. 03 November 2020, 13:56:16 UTC
8b18155 algos/snapshot: Add function to resolve branch alias to real target Related to T2734 03 November 2020, 11:49:07 UTC
6e3e350 migrate_extrinsic_metadata: Write metadata on directories instead of revisions. To match the new behavior of package loaders. 29 October 2020, 10:14:48 UTC
97d0b05 pre-commit: Fix codespell regexp related error 27 October 2020, 15:43:36 UTC
9645aef Replace RawExtrinsicMetadata `id` attribute with `target`. The old attribute was deprecated in swh.model 0.7.2 27 October 2020, 14:37:13 UTC
5819683 Update swh.storage.validate for swh.model 0.7.2 swh.model.model.ModelObject.compute_hash was changed to a method instead of a staticmethod. 27 October 2020, 13:11:34 UTC
4f35f7f Add black change on swh.storage.backfill 27 October 2020, 12:47:33 UTC
474ee72 --amend 22 October 2020, 20:35:32 UTC
eb3952f migrate_extrinsic_metadata: Make pypi_origin_from_filename fix project names when possible using PyPI's API. 22 October 2020, 14:23:30 UTC
b1a3b80 migrate_extrinsic_metadata: move pypi_origin_from_filename to its own function. Instead of bloating handle_row, which is already way too long. 22 October 2020, 14:22:46 UTC
aeb72c7 migrate_extrinsic_metadata: add support for guix revisions 22 October 2020, 10:32:48 UTC
73dc5e3 migrate_extrinsic_metadata: allow deposits with 'id' missing from their metadata. 22 October 2020, 10:28:18 UTC
c483066 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. It now supports all pypi revisions with an id starting with an hex digit from 0 to 5. 22 October 2020, 10:27:16 UTC
2bfd9fe storage.pytest_plugin: Reuse swh.core.db.db_utils postgresql_fact 22 October 2020, 09:58:03 UTC
d93429f api.server: Add missing coverage on make_app_from_configfile factory This is actually what starts the server, so it sounds more reasonable to test that part. 19 October 2020, 13:08:01 UTC
ca8e6aa api.server: Drop the % in the error message 19 October 2020, 13:08:01 UTC
49d787c storage.api.server: Add type to load_and_check_config then refactor tests This also drops the type parameter from load_and_check_config which is never used. 16 October 2020, 13:57:11 UTC
1a9687f backfill: use get_journal_writer instead of instantiating JournalWriter directly. A future version of swh-journal will add a mandatory argument to JournalWriter, which get_journal_writer sets by default. 12 October 2020, 17:21:09 UTC
b425b5c migrate_extrinsic_metadata: add support for the new deposit metadata formats introduced in late september. * https://forge.softwareheritage.org/D4065 * https://forge.softwareheritage.org/D4105 12 October 2020, 13:07:27 UTC
a11d58a Remove a bunch of deprecated instances of `args` in configurations Notably, `get_objstorage`'s `args` has been deprecated as of swh.objstorage 0.2.2. 09 October 2020, 15:29:10 UTC
a085b7e backfill: use the common `storage` top-level config key This makes the backfiller configuration compatible with all other modules. 08 October 2020, 18:35:49 UTC
dceeb74 backfill: support arbitrary journal writer configuration This allows more settings than the previous hardcoded three, e.g. the `privileged` flag to backfill a journal containing anonymous topics. 08 October 2020, 18:35:49 UTC
a6af589 retry: don't retry on keyboardinterrupt. Otherwise, Ctrl-C is ignored if pressed while sending a request. 02 October 2020, 09:07:09 UTC
889bd87 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. It now supports all pypi revisions with an id starting with 0, 1, or 2. 02 October 2020, 09:07:09 UTC
9ddbb69 migrate_extrinsic_metadata: allow dash in deposit client and collection names. 02 October 2020, 09:07:09 UTC
59e7e68 migrate_extrinsic_metadata: update name of column deposit.swhid_context. It was renamed in 4d72d1be529a568784842f5c0864e862a4b4705c. 02 October 2020, 09:07:09 UTC
07df3f6 migrate_extrinsic_metadata: Add support for the current format of original_artifacts written by the CRAN loader. 02 October 2020, 09:07:09 UTC
bef08d6 Fix object_types default in buffer interface protocol and impls Default argument object_types was not properly declared in StorageInterface and concrete implementations PostgreSQL and Cassandra. Reverted unnecessary fix in storage tests. 30 September 2020, 09:26:12 UTC
40997c0 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. It now supports all pypi revisions with an id starting with 0 or 1. 29 September 2020, 12:36:24 UTC
e37c8f7 Pin black in tox to the same version as .pre-commit-config.yaml 28 September 2020, 13:37:43 UTC
c812c79 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. 26 September 2020, 06:04:52 UTC
0adb8fc Add a regression test for the buffer proxy default settings This is used by swh.loader.core, regressed in v0.15.0 but wasn't caught by local tests. 25 September 2020, 15:14:16 UTC
dd5fb8d Drop vcversioner from requirements We stopped using it months ago. 25 September 2020, 15:14:16 UTC
632e99e Add static check to object_type literals in buffers 25 September 2020, 12:23:30 UTC
a75c5ca Improve typing of the buffering interface - use more generic collection types, so that parametrized types can be made stricter (e.g. str, in the next revision) - remove Optionals that are not needed and provide better defaults 25 September 2020, 12:23:30 UTC
e8f1136 Run isort after the CLI import changes 25 September 2020, 12:19:21 UTC
ac3c537 Update sql paths for the moved SQL files This should fix the currently failing documentation build. 24 September 2020, 18:08:56 UTC
96be9bd Fix default value handling in constructor Use a more simple default value and do not identity check against it. 24 September 2020, 16:33:09 UTC
829118a Add the SQL commands used to set up the logical replication publication 24 September 2020, 11:57:18 UTC
5d3de06 Support different database flavors in the SQL scripts This uses a new database table and some psql conditionals to introduce three different flavors for the swh.storage Postgres database: - the 'default' flavor has all the deduplication features, foreign keys and read indexes - the 'mirror' flavor has all the deduplication features and read indexes; it drops some foreign keys to allow for out of order addition of some object types - the 'read_replica' flavor has the minimal set of indexes to support read queries, and replication using the PostgreSQL logical replication feature Related to T2604. 24 September 2020, 11:57:14 UTC
63426e6 pytest_plugin: Use psql to load SQL files instead of connecting with psycopg2 This avoids running into issues when the SQL files contain psql-specific features like backslash-escapes. 24 September 2020, 11:54:38 UTC
38b1dbf Output a warning when the version of the database is different than expected 24 September 2020, 11:54:38 UTC
e37f639 Improve code quality and doc in BufferedProxyStorage - better names related to the object buffers - extracted parameter dicts from the constructor - used more generic typing in function parameters and more specific in other contexts in order to apply the principle of robustness 23 September 2020, 22:21:54 UTC
c97b23b Adapt cli declaration entrypoint to swh.core 0.3 23 September 2020, 14:13:01 UTC
924621f pytest_plugin: Order the fixture definitions in dependency order 23 September 2020, 10:30:55 UTC
6286e18 pytest_plugin: Change dbname to storage to avoid clash in tests Other similar fixtures in other modules which use the same "tests" db already. Clash can then happen when table names exists in different modules (e.g. dbversion exist both in scheduler and storage dbs). 23 September 2020, 10:28:52 UTC
8c44a29 pytest_plugin: Reuse swh_storage_postgresql connection string The `swh_storage_postgresql.dsn` string already contains the connection information necessary for the tests to run. 23 September 2020, 10:27:02 UTC
30cdb78 Drop the -swh- part of sql files it does not bring any meaningful info and makes it somewhat inconsistent with the new -superuser- "tag". 22 September 2020, 08:18:07 UTC
915575d Rename 10-swh-init.sql as 10-superuser-init.sql so the db initialization from swh.core (>= 0.3) executes this during the database creation step (i.e. while having a superuser level connection to the database). 22 September 2020, 08:14:15 UTC
67ee86b Warn about skipped_content sneaking the 'content' topics 21 September 2020, 08:43:00 UTC
8de6564 Small fix in the graph replayer to prevent a wrong warning 18 September 2020, 14:30:37 UTC
b0027ab python: Reorder imports with isort Related to T2610 17 September 2020, 16:06:07 UTC
d27a046 pre-commit: Add isort hook and configuration Related to T2610 17 September 2020, 16:06:06 UTC
469c38c Make origin_add() handle multiple occurrences of an origin properly this is needed to prevent some traceback in case an origin is present several times in the same origin_add batch, which situation has been seen in some mirror tests. 17 September 2020, 14:47:27 UTC
37ce2c4 pre-commit: Update flake8 hook configuration flake8 hook has been removed from https://github.com/pre-commit/pre-commit-hooks so now use the one from https://gitlab.com/pycqa/flake8 17 September 2020, 11:57:15 UTC
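
The buffer commits listed above (efd8815, 1526107 and 5b3e6c9) describe two behaviors: flushing buffered objects in topological order (content, directory, revision, release, then snapshot), and returning a summary that collates the results of every backend call rather than only the last one. The following is a minimal sketch of that idea, not the swh-storage implementation: `RecordingBackend`, `BufferingProxy`, `demo`, the generic `add` method and the `"<type>:add"` summary keys are all hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Dict, List

# Flush order taken from the commit messages above.
OBJECT_TYPES = ("content", "directory", "revision", "release", "snapshot")


class RecordingBackend:
    """Hypothetical stand-in backend that records the order of add calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def add(self, object_type: str, objects: List[str]) -> Dict[str, int]:
        self.calls.append(object_type)
        # Each call returns its own small summary dict.
        return {f"{object_type}:add": len(objects)}


class BufferingProxy:
    """Illustrative buffer: holds objects per type until flush() is called."""

    def __init__(self, backend: RecordingBackend) -> None:
        self.backend = backend
        self.buffers: Dict[str, List[str]] = {t: [] for t in OBJECT_TYPES}

    def add(self, object_type: str, objects: List[str]) -> None:
        self.buffers[object_type].extend(objects)

    def flush(self) -> Dict[str, int]:
        summary: Counter = Counter()
        # Iterate in topological order, whatever order objects were buffered in.
        for object_type in OBJECT_TYPES:
            batch = self.buffers[object_type]
            if not batch:
                continue
            # Collate every backend result instead of keeping only the last one.
            summary.update(self.backend.add(object_type, batch))
            self.buffers[object_type] = []
        return dict(summary)


def demo():
    backend = RecordingBackend()
    proxy = BufferingProxy(backend)
    # Objects are buffered out of order on purpose.
    proxy.add("snapshot", ["snp1"])
    proxy.add("revision", ["rev1", "rev2"])
    proxy.add("content", ["c1"])
    return proxy.flush(), backend.calls
```

In this sketch the snapshot is buffered first but reaches the backend last, so a failure while flushing contents or revisions happens before any snapshot is written.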