https://github.com/SoftwareHeritage/swh-storage

Revision | Message | Commit Date
7e25bb8 New upstream version 0.25.0 18 March 2021, 13:02:00 UTC
8dd9f7b Document the existing metadata formats 15 March 2021, 14:59:07 UTC
ffc0841 content_add: Write to the objstorage before the DB or Kafka. Content must be added to the objstorage before the DB and the journal. Otherwise: 1. in case of a crash, the DB may "believe" we have the content even though we did not have time to write it to the objstorage before the crash; 2. objstorage mirroring, which reads from the journal, may attempt to read from the objstorage before we have finished writing to it. The postgresql backend has already (unintentionally) behaved this way since 209de5dbaa127dacd114fbbd084f22632982eb77. This commit documents it, makes the cassandra backend behave the same way, and adds a test. 15 March 2021, 11:55:29 UTC
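The write ordering described above can be sketched as follows. This is a minimal sketch: `RecordingLayer` and this `content_add` are hypothetical stand-ins that only record the order of writes, not the actual swh-storage backends. The point is that every objstorage write completes before anything is advertised in the DB or the journal.

```python
from typing import Dict, List


class RecordingLayer:
    """Hypothetical stand-in for one storage layer (objstorage, DB or journal).

    It only records the order of writes; real backends store much more.
    """

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log
        self.keys: List[bytes] = []

    def add(self, key: bytes) -> None:
        self.keys.append(key)
        self.log.append(self.name)


def content_add(
    contents: Dict[bytes, bytes],
    objstorage: RecordingLayer,
    db: RecordingLayer,
    journal: RecordingLayer,
) -> None:
    # Step 1: persist every blob first (the bytes themselves are not modeled).
    # A crash after this point leaves orphan blobs at worst, never a DB row or
    # a journal message pointing at missing data.
    for sha1 in contents:
        objstorage.add(sha1)
    # Step 2: only then advertise the contents. Journal consumers such as an
    # objstorage mirror can now safely fetch the blobs.
    for sha1 in contents:
        db.add(sha1)
        journal.add(sha1)
```

The relative order of the DB and journal writes is not what matters here. The sketch only fixes the objstorage-first rule.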
b565201 storage: Allow filtering out branches by prefix when counting them. Add an optional branch_name_exclude_prefix parameter to the snapshot_count_branches method of the Storage interface. It filters out branches whose name starts with the given prefix when counting. The purpose is to get accurate counters in swh-web, as pull request branches will be filtered out by default. Related to T2782 12 March 2021, 14:23:54 UTC
93301a1 storage: Add branch name filtering support in snapshot_get_branches. Add an optional branch_name_include_substring parameter to snapshot_get_branches; if provided, only branches whose name contains the given substring are returned. Add an optional branch_name_exclude_prefix parameter to snapshot_get_branches; if provided, branches whose name starts with the given prefix are not returned. Purpose of these new features: add a search form to the branches view of swh-web and filter out pull request branches (whose names start with "refs/pull/") by default. Related to T2782 12 March 2021, 14:23:28 UTC
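The filtering semantics of the two new parameters can be sketched as a plain predicate over branch names. This is an illustration only: `filter_branch_names` is a hypothetical helper, not the backend code, which applies the same conditions inside the database query. Only the parameter names come from the commits. Counting with the exclude prefix (as in snapshot_count_branches) amounts to counting what this predicate keeps.

```python
from typing import List, Optional


def filter_branch_names(
    names: List[bytes],
    branch_name_include_substring: Optional[bytes] = None,
    branch_name_exclude_prefix: Optional[bytes] = None,
) -> List[bytes]:
    """Hypothetical helper mirroring the two optional filters.

    A name is kept if it contains the substring (when one is given) and does
    not start with the excluded prefix (when one is given).
    """
    kept = []
    for name in names:
        if (
            branch_name_include_substring is not None
            and branch_name_include_substring not in name
        ):
            continue
        if branch_name_exclude_prefix is not None and name.startswith(
            branch_name_exclude_prefix
        ):
            continue
        kept.append(name)
    return kept
```

For example, passing `branch_name_exclude_prefix=b"refs/pull/"` drops pull request branches, which is the default the swh-web branches view is meant to use.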
b8e10f0 Add ExtID query support to the Storage. These endpoints allow adding known ExtIDs to the storage and querying them by SWHID (typically to get the original VCS revision's intrinsic identifier from a SWHID). The underlying data structure is typically filled by loaders using the `extid_add()` endpoint. This only provides the PostgreSQL implementation. Related to T2849. 11 March 2021, 13:20:18 UTC
6a77732 Add hg revisions to the test data set 10 March 2021, 15:25:00 UTC
e83452b Import TEST_OBJECTS from swh.model instead of swh.journal; the latter has been deprecated for a while now. 10 March 2021, 15:25:00 UTC
82ce7bf Make sure test_backfill does not depend on 2 dict keys miraculously being listed in the same order. 10 March 2021, 14:49:48 UTC
c4fdd6d Add support for raw_extrinsic_metadata in the replayer This also checks the basic raw_extrinsic_metadata codepaths in the backfiller tests. 10 March 2021, 13:07:11 UTC
53a58fa Add basic support for raw_extrinsic_metadata in the backfiller 10 March 2021, 13:00:05 UTC
89ae0a1 Add simple unit test for the backfill.byte_ranges function 10 March 2021, 08:34:27 UTC
0d785d2 Add support for reading RawExtrinsicMetadata with raw URL targets We convert the target attribute to a hashed ExtendedSWHID before returning the object. 10 March 2021, 08:33:53 UTC
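Hashing a raw URL target into an ExtendedSWHID can be sketched under one assumption: the URL designates an origin, and origin SWHIDs (`swh:1:ori:`) use the SHA-1 of the UTF-8 encoded URL as their object id. `origin_url_to_swhid` is a hypothetical name. The actual conversion in swh-storage goes through swh.model objects rather than string building.

```python
import hashlib


def origin_url_to_swhid(url: str) -> str:
    """Hypothetical helper: build the extended SWHID string of an origin URL.

    Origin SWHIDs use the SHA-1 of the URL (UTF-8 encoded) as object id.
    """
    return "swh:1:ori:" + hashlib.sha1(url.encode("utf-8")).hexdigest()
```

Because the id is a hash, the conversion is one-way: the URL itself cannot be recovered from the returned SWHID.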
b4574cb New upstream version 0.24.1 04 March 2021, 22:39:01 UTC
88ff2c2 postgresql: Ensure a minimum limit for the snapshot branches query. With small limits (< 10), the snapshot branches query can degenerate into using the deduplication index on snapshot_branch (name, target, target_type), and the postgresql planner happily scans several hundred million rows. So ensure a minimum limit value of 10 before executing the query, for optimal performance when a small branches_count value is provided to the snapshot_get_branches method of the Storage interface. Related to P966 03 March 2021, 16:49:20 UTC
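The clamping idea can be sketched as below. This is a minimal sketch: `run_query` is a hypothetical callable standing in for the SQL query. The step that trims results back to the requested count is an assumption about how the caller keeps its contract, not something the commit message states. The constant 10 comes from the commit.

```python
from typing import Callable, List

# Minimum limit taken from the commit message.
MIN_QUERY_LIMIT = 10


def get_branches(
    run_query: Callable[[int], List[bytes]], branches_count: int
) -> List[bytes]:
    # Never send a tiny LIMIT to the database: small limits can steer the
    # planner toward the deduplication index and a huge scan.
    rows = run_query(max(branches_count, MIN_QUERY_LIMIT))
    # Assumed: return only as many branches as the caller asked for.
    return rows[:branches_count]
```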
ce8335d Remove the remaining references to the deprecated SWHID class 03 March 2021, 16:46:50 UTC
f46244b tests: Drop hypothesis < 6 requirement. Ensure tests can be executed using hypothesis >= 6 by suppressing the function_scoped_fixture health check on tests that combine @given with a function-scoped fixture that does not need to be reset between individual hypothesis examples. 03 March 2021, 10:53:08 UTC
fd0efad New upstream version 0.24.0 02 March 2021, 09:11:13 UTC
14739c5 RawExtrinsicMetadata: update to use the API in swh-model 1.0.0 01 March 2021, 16:38:44 UTC
2388748 storage_tests: recompute ids when evolving RawExtrinsicMetadata objects. For now this does nothing as RawExtrinsicMetadata has no 'id' field, but the equality assertions will become errors when the next version of swh.model is released. 25 February 2021, 15:33:40 UTC
f56267f New upstream version 0.23.2 19 February 2021, 10:58:48 UTC
f3ef6e6 storage: Implement visit types filtering in origin_search method. Enables filtering searched origins by visit type. Add a new optional visit_types parameter to the origin_search method in StorageInterface. Implement visit types filtering in storage backends; an origin will be returned if it has any of the requested visit types. This is clearly not designed to be used in production due to performance issues, but rather in testing environments with a small archive dataset. Related to T2869 19 February 2021, 10:36:29 UTC
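The "any of the requested visit types" rule can be sketched as a set intersection test. This is an illustration only: the in-memory mapping `visit_types_by_origin` and the function name are hypothetical, and the real backends compute the visit types of each origin from stored visits.

```python
from typing import Dict, Iterable, List, Optional, Set


def origin_search_by_visit_types(
    visit_types_by_origin: Dict[str, Set[str]],
    visit_types: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical sketch: keep origins having at least one requested type.

    visit_types=None means no filtering.
    """
    if visit_types is None:
        return sorted(visit_types_by_origin)
    wanted = set(visit_types)
    return sorted(
        url for url, types in visit_types_by_origin.items() if types & wanted
    )
```

Checking every origin this way is linear in the archive size, which is consistent with the warning that the feature targets small test datasets.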
7b4c124 167: Make the migration script non-blocking 17 February 2021, 09:18:26 UTC
f7f161d New upstream version 0.23.1 16 February 2021, 16:28:23 UTC
cc3eb4b Switch anonymized replayer test to use pytest parametrization This allows us to only read the kafka topics once instead of twice in the same tests, which is apparently a hard thing to do in a way compatible with both confluent-kafka 1.5 and 1.6. 16 February 2021, 16:09:03 UTC
5c6b53c New upstream version 0.23.0 15 February 2021, 14:39:02 UTC
e0e88b2 storage: Refactor OriginVisitStatus instantiation 09 February 2021, 16:01:26 UTC
d30ca93 db: Unify sql joins on origin_visit_status using "USING" 09 February 2021, 16:01:26 UTC
046fe57 storage.postgresql: Use origin_visit_status.type value as source. This stops using origin_visit.type as a fallback value now that the database has been migrated, and makes origin_visit_status.type a non-nullable column. It also drops the now-redundant joins on the origin_visit table when reading. Related to T2968 09 February 2021, 16:01:25 UTC
51df58e test_replay: Fix hang since confluent-kafka 1.6 release. Side effect of the following commit in librdkafka 1.6: https://github.com/edenhill/librdkafka/commit/f418e0f721518d71ff533759698b647cb2e89b80 The tests were relying on a buggy behavior of the mocked kafka cluster: two subsequent consumers set up with the same group id should receive different sets of messages, rather than the same set of messages. Also explicitly commit messages once consumed. 09 February 2021, 14:56:15 UTC
b038383 postgresql: Fix dbversion() to return the max version instead of a random one. 08 February 2021, 11:13:03 UTC
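The fix can be illustrated with an in-memory SQLite database. This is an illustration only: the real backend is PostgreSQL, and the `dbversion (version, release, description)` schema used in the usage check is an assumption. Without an `ORDER BY`, a query that reads one row from a multi-row table may return any of them. Aggregating with `max()` always returns the latest applied migration.

```python
import sqlite3


def dbversion(conn: sqlite3.Connection) -> int:
    # Reading "the first row" of an unordered table is not deterministic;
    # max() always picks the highest (most recent) schema version.
    (version,) = conn.execute("SELECT max(version) FROM dbversion").fetchone()
    return version
```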
efd8815 buffer: ensure objects are flushed in topological order This new integration test checks that, when flushing the buffer storage, the addition functions of the underlying storage backend are called in topological order (content, directory, revision, release then snapshot). This reduces the probability of "data consistency" regressions caused by the use of the buffering storage proxy alone. 04 February 2021, 18:17:11 UTC
1526107 Return an accurate summary from buffer's flush() method The earlier implementation would only return summary data from keys that existed in the last `_add` backend method run, rather than collating all the results. 04 February 2021, 18:14:03 UTC
5b3e6c9 buffer: add support for snapshots. This is mostly a consistency addition, considering that most (if not all) loaders will only add a single snapshot. The common pattern of loading objects in topological order (content > directory > revision > release > snapshot), then flushing the storage, is now fully consistent; without this addition, the snapshot would reach the backend storage before all other objects are added, leading to potential inconsistencies if the flush of other object types fails. 04 February 2021, 13:37:12 UTC
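The three buffer commits above (topological flush order, snapshot support, and an accurate flush summary) can be combined into one small sketch. `BufferSketch` and the `backend_add` callback are hypothetical, not the actual buffering proxy. The object type order is the one given in the commit messages. The summary is accumulated across all backend calls instead of being taken from the last call only.

```python
from collections import Counter
from typing import Callable, Dict, List

# Topological order from the commit messages.
FLUSH_ORDER = ["content", "directory", "revision", "release", "snapshot"]


class BufferSketch:
    """Hypothetical buffering proxy: queues objects per type, flushes in order."""

    def __init__(self, backend_add: Callable[[str, List[object]], Dict[str, int]]) -> None:
        self.backend_add = backend_add
        self.buffers: Dict[str, List[object]] = {t: [] for t in FLUSH_ORDER}

    def add(self, object_type: str, objects: List[object]) -> None:
        self.buffers[object_type].extend(objects)

    def flush(self) -> Dict[str, int]:
        summary: Counter = Counter()
        # Snapshots go last, so they never reach the backend before the
        # objects they (transitively) reference.
        for object_type in FLUSH_ORDER:
            batch = self.buffers[object_type]
            if not batch:
                continue
            # Collate every call's summary rather than keeping only the last one.
            summary.update(self.backend_add(object_type, batch))
            self.buffers[object_type] = []
        return dict(summary)
```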
18967ed buffer: add type annotations for tests 04 February 2021, 09:19:34 UTC
f1e523e New upstream version 0.22.0 03 February 2021, 11:15:26 UTC
9a9f234 storage: Make origin_get_latest_visit_status return OriginVisitStatus This returned a Tuple[OriginVisit, OriginVisitStatus]. This was required to have the missing information "type" for visit-status. This is no longer needed as now OriginVisitStatus holds the type information. 01 February 2021, 11:06:35 UTC
626b0bf Change origin_visit_status_get_random interface to return visit_status. This returned a Tuple[OriginVisit, OriginVisitStatus], which is no longer needed now that OriginVisitStatus holds the type information. 01 February 2021, 11:06:34 UTC
f6ae8a0 Write introduction to swh-storage. Explains: when to use swh-web instead; that `get_storage` should always be used to instantiate the storage; `StorageInterface`; model objects; pagination; backends. 01 February 2021, 11:03:02 UTC
57d3066 New upstream version 0.21.1 28 January 2021, 13:19:21 UTC
76de53c Correctly return origin_visit_status.type value everywhere. If the type is not present on an origin_visit_status, it should be computed from the origin_visit. Some methods returned only the origin_visit_status value, which broke the webapp by mangling the type to an empty value on the search results page. Related to T3001 28 January 2021, 11:15:11 UTC
47e0a4c New upstream version 0.21.0 20 January 2021, 14:52:18 UTC
e433255 db: Allow new status values not_found and failed for OriginVisitStatus. Related to T2961 20 January 2021, 14:36:12 UTC
45803cf New upstream version 0.20.0 20 January 2021, 09:29:52 UTC
d04165f Add type to the origin_visit_status topic. This is useful when the type is not yet populated in the database. Related to T2966 18 January 2021, 10:49:34 UTC
c24d35f Add persistence of the field OriginVisitStatus.type (!) A new database upgrade (165.sql) is needed for the postgresql backend. Related to T2964 15 January 2021, 11:38:38 UTC
da55308 Make test_content_add_race fail for the right reason. Since 209de5dbaa127dacd114fbbd084f22632982eb77, it was failing because of: TypeError("content_add() got an unexpected keyword argument 'db'") 15 January 2021, 10:30:36 UTC
2204346 New upstream version 0.19.0 14 January 2021, 10:18:30 UTC
0b44b37 Adapt cassandra storage to ignore the new OriginVisitStatus.type field Depends on D4848 Related to T2443 13 January 2021, 10:06:12 UTC
728c3ee Allow using the JAVA_HOME environment variable for cassandra tests. This allows enforcing a specific Java version. For example, since cassandra does not seem to support java 14 yet, this allows running tests on bullseye: JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/ pytest swh 13 January 2021, 09:56:07 UTC
30945a5 Enforce hypothesis <6 to prevent test breakage. hypothesis 6 upgraded a warning into an error: it now raises a FailedHealthCheck when a pytest fixture is used with a @given generative test. See https://hypothesis.readthedocs.io/en/latest/healthchecks.html 13 January 2021, 09:42:42 UTC
74e6f58 Make the CREATE_TABLES_QUERIES in cassandra/schema.py an explicit list, to prevent being fooled by a missing '\n'. 08 January 2021, 13:20:08 UTC
2b35198 Add a cli section in the doc 18 December 2020, 12:41:23 UTC
04ae89f storage.backfill: Allow cli run for origin_visit_status as well 24 November 2020, 17:21:21 UTC
64ee845 conftest: Reference swh.core.db.pytest_plugin. As it's exposed through swh.storage.pytest_plugin, which is itself used by other swh modules, it needs to be declared to avoid build failures in other swh modules. Related to T2746 24 November 2020, 13:08:12 UTC
4c46835 New upstream version 0.18.0 23 November 2020, 13:52:31 UTC
e289593 requirements-test.txt: Drop no longer needed pytest-postgresql requirement requirements-swh.txt already declares the swh.core[db] dependency which transitively pulls it. Related to T2746 23 November 2020, 12:07:45 UTC
0065d4d backfill: Reverse flawed logic in SnapshotBranch generation The previous code would nullify all non-null branches, and try to create a SnapshotBranch out of null branches. 13 November 2020, 15:51:29 UTC
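The corrected logic can be sketched under stated assumptions. Each database row is modeled as a hypothetical `(name, target, target_type)` tuple, and `SnapshotBranchSketch` stands in for swh.model's `SnapshotBranch`. Rows with a null target stay `None` (a dangling branch). Only rows that actually have a target become branch objects, which is the reverse of the buggy behavior described above.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class SnapshotBranchSketch(NamedTuple):
    """Hypothetical stand-in for swh.model's SnapshotBranch."""

    target: bytes
    target_type: str


# Hypothetical row shape: (name, target, target_type); target may be NULL.
Row = Tuple[bytes, Optional[bytes], Optional[str]]


def branches_from_rows(rows: List[Row]) -> Dict[bytes, Optional[SnapshotBranchSketch]]:
    branches: Dict[bytes, Optional[SnapshotBranchSketch]] = {}
    for name, target, target_type in rows:
        if target is None:
            # Dangling branch: keep the name, with no branch object.
            branches[name] = None
        else:
            # Only rows that really have a target become branch objects.
            assert target_type is not None
            branches[name] = SnapshotBranchSketch(target=target, target_type=target_type)
    return branches
```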
f501136 migrate_extrinsic_metadata: don't crash when deb revisions aren't referenced by any snapshot As this happens for about 50 revisions in the archive. 13 November 2020, 14:26:15 UTC
6089094 New upstream version 0.17.2 13 November 2020, 11:05:33 UTC
20d3f8e backfill: only flush the journal writer on every batch. This module's use of write_addition predated the introduction of reliable writing in swh.journal. Since that introduction, the backfiller has been flushing the kafka writer after writing each single object, leading to a 3x measured slowdown on backfilling contents. 13 November 2020, 10:17:31 UTC
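The difference between flushing per object and flushing per batch can be sketched with a hypothetical counting writer. `CountingWriter` and `backfill` here are illustrative names, not the swh-journal or swh-storage API. With batches of `batch_size` objects, the number of flushes drops from one per object to roughly `len(objects) / batch_size`.

```python
from typing import List


class CountingWriter:
    """Hypothetical journal writer that only counts writes and flushes."""

    def __init__(self) -> None:
        self.written: List[object] = []
        self.flushes = 0

    def write(self, obj: object) -> None:
        self.written.append(obj)

    def flush(self) -> None:
        self.flushes += 1


def backfill(objects: List[object], writer: CountingWriter, batch_size: int) -> None:
    for start in range(0, len(objects), batch_size):
        for obj in objects[start:start + batch_size]:
            writer.write(obj)
        # One (expensive, synchronous) flush per batch instead of per object.
        writer.flush()
```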
248a04b Don't use string expansions in debug logging 12 November 2020, 17:07:14 UTC
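The point of this change can be demonstrated with the standard `logging` module. `Expensive` and the two helper functions are illustrative only. An f-string is rendered before `logger.debug` is even called. With `%`-style arguments, the logger discards the record first when DEBUG is disabled, so the arguments are never formatted.

```python
import logging


class Expensive:
    """Counts how many times it gets rendered to a string."""

    renders = 0

    def __str__(self) -> str:
        Expensive.renders += 1
        return "expensive"


logger = logging.getLogger("backfill-sketch")
logger.setLevel(logging.INFO)  # DEBUG records are discarded


def log_eager(obj: object) -> None:
    # The f-string calls str(obj) even though nothing will be logged.
    logger.debug(f"object: {obj}")


def log_lazy(obj: object) -> None:
    # Formatting is deferred; since DEBUG is disabled, it never happens.
    logger.debug("object: %s", obj)
```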
3eba73d migrate_extrinsic_metadata: Remove log output when a CRAN origin is missing as this happens quite often and isn't an error. 09 November 2020, 15:32:20 UTC
f3652a9 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. 09 November 2020, 15:32:20 UTC
c0a3d96 migrate_extrinsic_metadata: use the retry proxy Because it makes a lot of get requests and doesn't handle failures, it crashed often. 09 November 2020, 15:32:20 UTC
aded45b Make the retry proxy work on all functions. The metadata migration script kept crashing otherwise. 09 November 2020, 15:32:20 UTC
2e7d489 Set the value_sanitizer argument of get_journal_writer. The next version of swh-journal will remove the default value. 09 November 2020, 15:32:20 UTC
24cdc85 cassandra: Fix content_missing_per_sha1_git implementation 09 November 2020, 13:13:10 UTC
d8a6720 New upstream version 0.17.1 05 November 2020, 12:56:51 UTC
84984a6 algos.snapshot.snapshot_resolve_alias: Don't return the branch list. It complicates the signature and the code, and we don't have any use for it currently. 05 November 2020, 11:08:13 UTC
fa86834 Add test for snapshot_resolve_alias with a missing branch. 05 November 2020, 10:57:08 UTC
1826b2b Simplify algos.snapshot.snapshot_resolve_alias. 1. rename branch_info to last_branch; 2. exclude last_branch from 'branches', so that: a) it never needs to contain a None value, so no cast is needed; b) no slicing is needed. 05 November 2020, 10:57:08 UTC
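Alias resolution, as described in these commits, can be sketched over a plain dict of branches. This is a simplified illustration, not the real `snapshot_resolve_alias`, which queries the storage. `Branch` is a hypothetical record type, and the cycle check is an addition of this sketch. The resolved branch is tracked separately as `last_branch`, apart from the set of visited names. A missing branch yields `None`, and no branch list is returned.

```python
from typing import Dict, NamedTuple, Optional, Set


class Branch(NamedTuple):
    """Hypothetical branch record; target_type is "alias" or an object type."""

    target: bytes
    target_type: str


def resolve_alias(
    branches: Dict[bytes, Optional[Branch]], name: bytes
) -> Optional[Branch]:
    """Follow alias branches from `name` until reaching a non-alias branch.

    Returns None if a branch along the way is missing or dangling, or if the
    aliases form a cycle.
    """
    seen: Set[bytes] = {name}
    last_branch = branches.get(name)
    while last_branch is not None and last_branch.target_type == "alias":
        if last_branch.target in seen:
            return None  # alias cycle, e.g. A -> B -> A
        seen.add(last_branch.target)
        last_branch = branches.get(last_branch.target)
    return last_branch
```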
57956ce New upstream version 0.17.0 03 November 2020, 17:20:44 UTC
943e440 Rename the `id` argument of raw_extrinsic_metadata_get to `target`, for consistency with the new name for this attribute in swh.model 0.7.2. 03 November 2020, 14:59:52 UTC
48b6dbe cassandra/in_memory: rename raw_extrinsic_metadata.id to target For consistency with swh.model v0.7.2, to prepare for the addition of an (intrinsic) id field to RawExtrinsicMetadata objects. 03 November 2020, 13:56:22 UTC
4fbf481 PostgreSQL: rename raw_extrinsic_metadata.id to target For consistency with swh.model v0.7.2, to prepare for the addition of an (intrinsic) id field to RawExtrinsicMetadata objects. 03 November 2020, 13:56:16 UTC
8b18155 algos/snapshot: Add function to resolve branch alias to real target Related to T2734 03 November 2020, 11:49:07 UTC
6e3e350 migrate_extrinsic_metadata: Write metadata on directories instead of revisions. To match the new behavior of package loaders. 29 October 2020, 10:14:48 UTC
97d0b05 pre-commit: Fix codespell regexp related error 27 October 2020, 15:43:36 UTC
9645aef Replace RawExtrinsicMetadata `id` attribute with `target`. The old attribute was deprecated in swh.model 0.7.2 27 October 2020, 14:37:13 UTC
5819683 Update swh.storage.validate for swh.model 0.7.2 swh.model.model.ModelObject.compute_hash was changed to a method instead of a staticmethod. 27 October 2020, 13:11:34 UTC
4f35f7f Apply black formatting to swh.storage.backfill 27 October 2020, 12:47:33 UTC
474ee72 --amend 22 October 2020, 20:35:32 UTC
eb3952f migrate_extrinsic_metadata: Make pypi_origin_from_filename fix project names when possible using PyPI's API. 22 October 2020, 14:23:30 UTC
b1a3b80 migrate_extrinsic_metadata: move pypi_origin_from_filename to its own function. Instead of bloating handle_row, which is already way too long. 22 October 2020, 14:22:46 UTC
aeb72c7 migrate_extrinsic_metadata: add support for guix revisions 22 October 2020, 10:32:48 UTC
73dc5e3 migrate_extrinsic_metadata: allow deposits with 'id' missing from their metadata. 22 October 2020, 10:28:18 UTC
c483066 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. It now supports all pypi revisions with an id starting with an hex digit from 0 to 5. 22 October 2020, 10:27:16 UTC
2bfd9fe storage.pytest_plugin: Reuse swh.core.db.db_utils postgresql_fact 22 October 2020, 09:58:03 UTC
d93429f api.server: Add missing coverage on make_app_from_configfile factory This is actually what starts the server, so it sounds more reasonable to test that part. 19 October 2020, 13:08:01 UTC
ca8e6aa api.server: Drop the % in the error message 19 October 2020, 13:08:01 UTC
49d787c storage.api.server: Add type to load_and_check_config then refactor tests This also drops the type parameter from load_and_check_config which is never used. 16 October 2020, 13:57:11 UTC
1a9687f backfill: use get_journal_writer instead of instantiating JournalWriter directly. A future version of swh-journal will add a mandatory argument to JournalWriter, which get_journal_writer sets by default. 12 October 2020, 17:21:09 UTC
b425b5c migrate_extrinsic_metadata: add support for the new deposit metadata formats introduced in late september. * https://forge.softwareheritage.org/D4065 * https://forge.softwareheritage.org/D4105 12 October 2020, 13:07:27 UTC
aade84f New upstream version 0.16.0 09 October 2020, 16:33:09 UTC
a11d58a Remove a bunch of deprecated instances of `args` in configurations Notably, `get_objstorage`'s `args` has been deprecated as of swh.objstorage 0.2.2. 09 October 2020, 15:29:10 UTC
a085b7e backfill: use the common `storage` top-level config key This makes the backfiller configuration compatible with all other modules. 08 October 2020, 18:35:49 UTC
dceeb74 backfill: support arbitrary journal writer configuration This allows more settings than the previous hardcoded three, e.g. the `privileged` flag to backfill a journal containing anonymous topics. 08 October 2020, 18:35:49 UTC
a6af589 retry: don't retry on keyboardinterrupt. Otherwise, Ctrl-C is ignored if pressed while sending a request. 02 October 2020, 09:07:09 UTC
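Why `KeyboardInterrupt` escapes a retry loop can be shown with a stdlib-only sketch. `call_with_retry` is a hypothetical helper, not the retry proxy's actual code. `KeyboardInterrupt` derives from `BaseException`, not `Exception`. A loop that only catches `Exception` therefore retries transient failures but lets Ctrl-C propagate immediately.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Hypothetical retry helper: retries on Exception, never on Ctrl-C."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # KeyboardInterrupt is a BaseException, so it is not caught here
            # and interrupts the loop at once instead of being retried.
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```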
889bd87 migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames. It now supports all pypi revisions with an id starting with 0, 1, or 2. 02 October 2020, 09:07:09 UTC