https://github.com/SoftwareHeritage/swh-storage

Revision Author Date Message Commit Date
374e01c algos.diff: Add missed revision_get conversion This does the minimum required so the diff module still works and fixes [1]. Will release 0.14.1 asap. [1] https://forge.softwareheritage.org/harbormaster/unit/view/918517/ Related to T645 04 September 2020, 13:37:24 UTC
356eacd Refactor revision_get storage API to return Revision objects The signature becomes revision_get(...) -> List[Optional[Revision]] Related to T645 03 September 2020, 15:05:11 UTC
36d284c cassandra: Discard Content ctime field in content_get_partition That field should only be returned by content_find method. Add tests to check expected behaviors according to methods. 02 September 2020, 13:10:57 UTC
e6fcfb9 storage*: release_get(...) -> List[Optional[Release]] Related to T645 01 September 2020, 12:19:19 UTC
e6c17f6 Make StorageInterface a Protocol. We're already using 'StorageInterface' as type annotation where we expect an object that implements the same methods, but it isn't technically correct since these objects aren't an instance of a subclass. Making StorageInterface a protocol reflects how we already use it, without lying to mypy. 01 September 2020, 08:38:36 UTC
5afd985 Add a validating storage proxy, to check ids before insertion. It can be used to prevent buggy or malicious clients of the storage from adding invalid objects. 28 August 2020, 11:18:53 UTC
4532a4d Add a --check-config option for cli commands This allows specifying on the command line whether to check the configuration and read or write access to the storage at startup time (especially for the `rpc-serve` and `replay` commands). Warning: this option defaults to "read" for the backfill command and "write" for the rpc-serve and replay commands, so the creation of the Storage instance used in cli commands *will now be checked*. Closes T2525. 25 August 2020, 08:50:50 UTC
2a35c0b Remove the deprecated config-path option from `swh storage rpc-serve` command include the validation of the presence of a "storage" config section in the main `storage` click.group, where the config-file is actually parsed. 25 August 2020, 08:50:50 UTC
e8b1b21 Tell pytest not to recurse in dotdirs. pytest wastes a lot of time in .hypothesis and .git; this commit excludes them. Before: $ time pytest --collect-only > /dev/null pytest --collect-only > /dev/null 6.93s user 0.51s system 100% cpu 7.425 total After: $ time pytest --collect-only > /dev/null pytest --collect-only > /dev/null 2.39s user 0.10s system 100% cpu 2.475 total 25 August 2020, 08:01:20 UTC
f3870dc 161: Fix sql upgrade script Related to T2524 25 August 2020, 07:35:31 UTC
cc33dd3 Add support for a new "check_config" config option in get_storage() If "check_config" is present in get_storage()'s kwargs, call the storage.check_config() method and raise an exception if the check fails. The storage.check_config() method is called with the content of the "check_config" config value (so the expected value for this currently is `check_config={"check_write": <bool>}`). 24 August 2020, 14:58:17 UTC
4dd9597 Check for db version mismatch in PgStorage.check_config() 24 August 2020, 14:58:17 UTC
c16ff50 Add a check_dbversion() method to the Db class This method compares the version stored in the database in the `dbversion` table with the currently declared version. This current version is declared as a simple `current_version` attribute on the Db class. **It must be updated jointly in 30-swh-schema.sql** when modifying the db schema. Note that if this is forgotten, the added test `test_dbversion` should fail. Related to T2525. 24 August 2020, 14:58:17 UTC
629d2d1 Fix pytest_plugin's database janitor: do not truncate the dbversion table 24 August 2020, 14:58:17 UTC
f570f93 algos.snapshot: Add visits_and_snapshots_get_from_revision Its code is moved from snapshot_id_get_from_revision so it's a rather small change; and the revision metadata migration script (bin/migrate-extrinsic-metadata.py) will need it. 24 August 2020, 13:55:00 UTC
d1f19e9 storage/interface: Remove deprecated diff endpoints They are not of interest anymore as swh-web is the only client and now directly uses functions defined in swh.storage.algos.diff. 20 August 2020, 15:15:15 UTC
5390a4c storage_tests: Remove duplicated postgresql-specific tests. They got copied to test_postgresql.py instead of moved. 20 August 2020, 14:04:30 UTC
3ac332e tests: Fix failures after test_storage renaming to storage_tests Also move the round_to_milliseconds function to the storage.utils module. 20 August 2020, 11:43:01 UTC
4073907 Move postgresql-related files to swh/storage/postgresql/ 19 August 2020, 15:17:35 UTC
6a53cb3 pg: Check revision.extra_headers is not null. Fixes a regression in 038a219f84d6b8a4f02b48f9ad3c5d823d097790, as it made the converter expect a list. When adding this column, we made it default to null instead of defaulting to an empty array, so existing records were initialized with null. This commit migrates these nulls to empty arrays, then adds a constraint to enforce it in the future. 19 August 2020, 12:57:26 UTC
ca3ee92 converters: convert extra_headers to an empty list if it is None. 19 August 2020, 12:57:23 UTC
e2b1494 pg: Check date_neg_utc_offset is not null if date is not null. Fixes a regression in 038a219f84d6b8a4f02b48f9ad3c5d823d097790, as it made converters expect a boolean. We stopped writing this kind of null once we started using model objects for insertions, but didn't migrate existing data. This commit migrates these nulls to false, then adds a constraint to enforce it in the future. 19 August 2020, 12:57:21 UTC
7dcd570 converters: convert neg_utc_offset to False if it is None. 19 August 2020, 12:57:18 UTC
e2f0665 backfiller: Add missing 'extra_header' field. This field wasn't backfilled; and it wasn't caught by tests, because revisions in swh.journal.tests.journal_data.TEST_OBJECTS had an empty extra_header field. Starting with swh-journal v0.4.3, one of the revisions in TEST_OBJECTS has a non-empty extra_header field, so it will become a test failure without this commit. 19 August 2020, 12:55:22 UTC
bd92547 backfiller: remove conversion of model objects back to dicts. As a temporary workaround to remain compatible with existing backfiller converters, I made them convert back to dict before they are converted again to model objects. This commit removes this workaround by making converters return model objects. 17 August 2020, 08:45:59 UTC
038a219 converters: Work on model objects instead of dicts on the "not-db" side. This is a change internal to the pg storage, that will be needed to make revision_get, revision_log, and release_get return model objects. 17 August 2020, 08:25:42 UTC
89656c9 test_converters: Fix test data to match actual values. 17 August 2020, 07:52:10 UTC
3a713dd cassandra.cql: Use a dict of statements instead of dynamically building method names in the two methods which need to switch between statements. The method name building was done to shoehorn this statement switching into the existing @_prepared_select_statement. This introduces @_prepared_select_statements (plural), which does this switching properly, using a dictionary. 14 August 2020, 14:35:09 UTC
546d11e cassandra.cql: Make the 'limit' argument of origin_visit_get non-optional. It's not optional in the storage interface, so a None value can't be passed. 14 August 2020, 14:35:09 UTC
1996b49 in_memory: Remove dead code. 14 August 2020, 14:35:09 UTC
291704d in_memory: Remove InMemoryStorage.*metadata_* and implement InMemoryCqlRunner.*metadata_* 14 August 2020, 14:35:09 UTC
da35e56 in_memory: Remove InMemoryStorage.origin_visit_* and implement InMemoryCqlRunner.origin_visit_* 14 August 2020, 14:35:09 UTC
e5f450c cassandra.cql: reorder origin_visit_* and origin_visit_status_* methods to be properly grouped. 14 August 2020, 14:35:09 UTC
249e4af Remove unused arguments of CqlRunner.origin_visit_status_get. 14 August 2020, 14:35:09 UTC
e1eb6cd in_memory: Remove InMemoryStorage.origin_* and implement InMemoryCqlRunner.origin_* 14 August 2020, 14:35:09 UTC
f78c76f in_memory: Remove InMemoryStorage.snapshot_* and implement InMemoryCqlRunner.snapshot_* 14 August 2020, 14:35:08 UTC
6651130 Remove endpoint snapshot_get_by_origin_visit. It's not used anywhere. 14 August 2020, 14:34:55 UTC
1104c53 in_memory: Remove InMemoryStorage.release_* and implement InMemoryCqlRunner.release_* 14 August 2020, 14:34:55 UTC
237c400 in_memory: Remove InMemoryStorage.revision_* and implement InMemoryCqlRunner.revision_* 14 August 2020, 13:16:29 UTC
8e7eed4 in_memory: Remove InMemoryStorage.directory_* and implement InMemoryCqlRunner.directory_* 14 August 2020, 13:16:29 UTC
d5f41f8 in_memory: Remove InMemoryStorage.skipped_content_* and implement InMemoryCqlRunner.skipped_content_* 14 August 2020, 13:16:29 UTC
b3af39a in_memory: Remove InMemoryStorage.content_* and implement InMemoryCqlRunner.content_* 14 August 2020, 13:16:29 UTC
397a645 in_memory: make object_find_by_sha1_git merge results from the CassandraStorage. For now this has no effect. However, in the near future, the CassandraStorage will be in charge of some object types, so we need to merge objects found in the CassandraStorage and those found directly in the InMemoryStorage. 14 August 2020, 13:16:29 UTC
a96c253 in_memory: Add InMemoryCqlRunner, a class that emulates cassandra.cql.CqlRunner without Cassandra. For now it's only used for object counters; but future commits will progressively move the in-memory's storage features to it. 14 August 2020, 13:16:29 UTC
bc47283 Make InMemoryStorage inherit from CassandraStorage. This has no effect for now, other than deduplicating a method and causing a name clash. 14 August 2020, 13:16:29 UTC
2097186 in_memory: Add class Table, which emulates a Cassandra table. It will be used to implement the in-memory storage as a backend for the cassandra storage. 14 August 2020, 13:16:29 UTC
ef06005 cassandra.cql: Fix return type of stat_counters. 14 August 2020, 13:16:29 UTC
1266b6a cassandra.model: Add PARTITION_KEY and CLUSTERING_KEY to the model classes. They will be used by the in-mem implementation of CqlRunner. 14 August 2020, 13:16:29 UTC
3dc69aa cassandra: Make origin_visit_get_latest filter using any status of a visit, instead of just the last. This fixes a mismatch in behavior with the pg and the in-mem storages 14 August 2020, 13:16:29 UTC
006eeec cassandra: Fix wrong algo reported in HashCollision, because of variable shadowing. 14 August 2020, 13:16:29 UTC
da28731 cassandra: Fix content_missing_per_sha1 when its parameter has length != 1. 14 August 2020, 13:16:29 UTC
6675286 cassandra.cql: Explicitly request columns, instead of 'SELECT *'. 'SELECT *' should be avoided in prepared statements, see https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#avoid-preparing-select-queries 11 August 2020, 10:22:15 UTC
0b78b39 cassandra: add a TABLE class attribute to row classes, and use it to deduplicate prepared statements logic. It will also be used in a future commit to generate 'select' prepared statements. 11 August 2020, 09:37:13 UTC
92e7a21 cassandra: Add annotations to make mypy actually type-check calls to CqlRunner. All methods of CqlRunner were decorated, which prevented mypy from doing anything useful. As I finally found a way to type the decorator (using mypy_extensions.NamedArg), I can finally make mypy aware of the methods' types. This commit (as well as all three of the last commits) also fixes issues found by mypy thanks to this. 11 August 2020, 09:19:57 UTC
b11b890 cassandra.storage: remove dead code 11 August 2020, 09:13:55 UTC
f954714 Fix type of snapshot_count_branches. 11 August 2020, 09:13:30 UTC
319de05 cassandra.cql: Use static dataclasses instead of generating namedtuples on the fly. Before this commit, python-cassandra used the default row factory, which creates an anonymous named tuple on each query, which makes it impossible to type CqlRunner properly. This commit replaces the row factory with dict_factory, which creates only dicts, and converts them to well-defined dataclasses. Additionally, this stops leaking python-cassandra internals to cassandra.storage. This also has some great side-effects: * methods of CqlRunner are now consistent with each other (eg. _add_one methods used to be a mix of objects, dictionaries, and taking each value as argument) * it will allow me to deduplicate more code in further commits (I already deduplicated insertion methods to use self._add_one, as it was meant on the initial write of this class) * CqlRunner no longer needs to define lists with column names, they are automatically detected from the dataclasses 10 August 2020, 19:48:37 UTC
7d332f5 algos/snapshot.py: add types. 10 August 2020, 08:59:21 UTC
bdd2f4a pg: Fix crash in snapshot_get when the snapshot does not exist. Reviewers: #reviewers Differential Revision: https://forge.softwareheritage.org/D3743 07 August 2020, 19:46:39 UTC
4918759 Make snapshot_get_branches return a TypedDict containing SnapshotBranch objects. Instead of untyped dictionaries. This makes snapshot_get longer and duplicates its code across backends; but snapshot_get should be removed soon. Related: T645 07 August 2020, 13:51:26 UTC
d9ff391 storage*: Rename and type content_get(List[Sha1]) -> List[Optional[Content]] Related to T645 07 August 2020, 10:24:41 UTC
bfa8f46 storage*: Rename content_get_data(Sha1) -> Optional[bytes] Rename the confusing endpoint `content_get` to `content_get_data`. This now works on one content as it is how it is used today. Related to T645 07 August 2020, 10:21:36 UTC
243e873 Simplify as Content.ctime None is popped out of a to_dict call in recent model since model 0.6.6 (similarly to the data field) Fixes build [1] [1] https://jenkins.softwareheritage.org/job/DSTO/job/tests/1534/console 07 August 2020, 10:21:14 UTC
653b1f9 cassandra.storage: Use next token for pagination instead of computing it The existing implementation computed the next token using the tok (adding 1). It's not good enough as it would skip some contents in case of collision on tok (collisions exist as the tok here is a noncryptographic hash on 32 bits). Related to T2518 06 August 2020, 16:50:27 UTC
be9e958 in_memory: Drop dead code Related to T645 05 August 2020, 18:01:20 UTC
0d72ea2 storage*: content_get_partition(...) -> PagedResult[Content] Related to T645 05 August 2020, 15:01:24 UTC
b48d834 storage*: Drop deprecated content_get_range endpoint Related to T645 05 August 2020, 13:19:59 UTC
4722663 storage*: object_find_by_sha1_git: Type remaining existing endpoints Related to T2517 05 August 2020, 11:06:58 UTC
334a016 storage*: snapshot_get_branches: Type remaining existing endpoints Related to T2517 05 August 2020, 10:47:46 UTC
27c7f07 storage*: snapshot_count_branches: Type remaining existing endpoints Related to T2517 05 August 2020, 10:35:31 UTC
ec620af storage*: snapshot_get_by_origin_visit: Type remaining existing endpoints Related to T2517 05 August 2020, 10:29:24 UTC
7c6c088 storage*: snapshot_get: Type remaining existing endpoints Related to T2517 05 August 2020, 10:23:47 UTC
ec4aed4 storage*: snapshot_missing: Type remaining existing endpoints Related to T2517 05 August 2020, 10:23:11 UTC
7dbd64d storage*: release_get: Type remaining existing endpoints Related to T2517 05 August 2020, 10:23:07 UTC
5f6630a storage*: release_missing: Type remaining existing endpoints Related to T2517 05 August 2020, 10:23:03 UTC
c5d63ad storage*: origin_get_by_sha1: Drop generator from pgstorage And simplify its type Related to T645 05 August 2020, 09:12:56 UTC
760cbf6 storage*: revision_*log: Type remaining existing endpoints Related to T645 05 August 2020, 09:12:56 UTC
38ee525 storage*: revision_get: Type remaining existing endpoints Related to T645 05 August 2020, 06:18:28 UTC
8b6d18e storage*: revision_missing: Type remaining existing endpoints Related to T645 05 August 2020, 06:11:04 UTC
9f214bc storage*: directory_entry_get_by_path: Type remaining existing endpoints Related to T645 04 August 2020, 21:04:26 UTC
f9d0952 storage*: directory_ls: Type remaining existing endpoints Related to T645 04 August 2020, 20:43:23 UTC
fd5fd86 storage*: directory_missing: Type remaining existing endpoints Related to T645 04 August 2020, 20:27:45 UTC
5d13cd7 storage*: skipped_content_missing: Type remaining existing endpoints Related to T645 04 August 2020, 20:17:00 UTC
1a2aa70 storage*: content_missing_per_sha1_git: Type remaining existing endpoints Related to T645 04 August 2020, 19:42:19 UTC
15e4863 storage*: content_missing_per_sha1: Type remaining existing endpoints Related to T645 04 August 2020, 19:41:36 UTC
b62afbb storage*: content_missing: Unify and type remaining existing endpoints This updates the docstrings as well Related to T645 04 August 2020, 16:52:00 UTC
d6f26e4 storage*: content_get_partition: Type remaining existing endpoints Related to T645 04 August 2020, 16:51:59 UTC
8644733 storage*: content_get_range: Type remaining existing endpoints Related to T645 04 August 2020, 16:51:59 UTC
c6da282 storage*: content_get: Type remaining existing endpoints Related to T645 04 August 2020, 16:51:59 UTC
25ebc48 storage*: content_update: Type remaining existing endpoints Related to T645 04 August 2020, 16:51:59 UTC
c32e224 storage*: origin_get_by_sha1: Type remaining existing endpoints Related to T645 04 August 2020, 16:39:48 UTC
26ef015 storage*: check_config: Type remaining existing endpoints Related to T645 04 August 2020, 16:00:30 UTC
a5232b7 tests: Improve coverage on directory_ls endpoints This fixes the current directory listing tests coverage to check down to the contents. This also fixes one inconsistent test data and the tests impacted by this change. 04 August 2020, 11:34:48 UTC
15e8c99 storage*: Type content_find(...) -> List[Content] Related to T645 04 August 2020, 09:22:45 UTC
3c2e5a3 storage*: Type {cnt,dir,rev,rel,snp}_get_random(...) -> Sha1Git Related to T645 03 August 2020, 14:23:44 UTC
aa58e10 storage*: Drop origin-get-range in favor of origin-list Related to T645 03 August 2020, 09:39:58 UTC
87c5ba2 storage*: Do not allow unknown visit status in origin_visit*_get_latest That makes some storage (pg-storage) fail without that filtering. 01 August 2020, 07:07:12 UTC
92f1183 storage*: Add type annotation to origin_count Related to T645 31 July 2020, 12:51:14 UTC
3466e48 Reuse swh.core stream_results function Related to T645 31 July 2020, 12:15:51 UTC
0eb309e Rename argument 'object_type' of raw_extrinsic_metadata_get to 'type'. For consistency with RawExtrinsicMetadata. 31 July 2020, 11:00:59 UTC
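
Commit e6c17f6 above makes StorageInterface a Protocol so that objects with matching methods type-check without subclassing it. The sketch below illustrates that idea only; `StorageLike`, `InMemoryBackend`, `first_release`, and the dict-based release records are hypothetical names invented for this example and are not taken from swh-storage.

```python
from typing import Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class StorageLike(Protocol):
    """Hypothetical, trimmed-down stand-in for a storage interface."""

    def release_get(self, releases: List[bytes]) -> List[Optional[dict]]:
        ...


class InMemoryBackend:
    """Does not inherit from StorageLike; it only has a matching method."""

    def __init__(self) -> None:
        # Made-up sample data: one release keyed by a fake identifier.
        self._releases: Dict[bytes, dict] = {b"\x01": {"name": b"v1.0"}}

    def release_get(self, releases: List[bytes]) -> List[Optional[dict]]:
        # One result per requested id, None when the id is unknown.
        return [self._releases.get(r) for r in releases]


def first_release(storage: StorageLike, release_id: bytes) -> Optional[dict]:
    # A static checker accepts any object structurally compatible with
    # StorageLike here, whether or not it subclasses it.
    return storage.release_get([release_id])[0]
```

With `runtime_checkable`, `isinstance(InMemoryBackend(), StorageLike)` also holds at runtime, although that check only looks at method names, not signatures.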
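
Commit 653b1f9 above notes that computing the next page token as "last token + 1" can skip contents when several rows share a token. The toy model below is one way to picture the problem and a possible remedy; the row data, `page_naive`, `page_next_token`, and `collect` are all assumptions made up for illustration, not the code from cassandra.storage.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[int, str]  # (token, content id); values are made up

# Rows sorted by token; "b" and "c" collide on token 2.
ROWS: List[Row] = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]

Pager = Callable[[int, int], Tuple[List[Row], Optional[int]]]


def page_naive(start: int, limit: int) -> Tuple[List[Row], Optional[int]]:
    """Compute the next token as last token + 1 (can skip colliding rows)."""
    rows = [r for r in ROWS if r[0] >= start][:limit]
    next_tok = rows[-1][0] + 1 if len(rows) == limit else None
    return rows, next_tok


def page_next_token(start: int, limit: int) -> Tuple[List[Row], Optional[int]]:
    """Fetch one extra row and use its token as the next page token."""
    rows = [r for r in ROWS if r[0] >= start][: limit + 1]
    if len(rows) > limit:
        return rows[:limit], rows[limit][0]
    return rows, None


def collect(pager: Pager, limit: int) -> List[str]:
    """Walk all pages and return the content ids seen, in order."""
    ids: List[str] = []
    tok: Optional[int] = 0
    while tok is not None:
        rows, tok = pager(tok, limit)
        ids.extend(cid for _, cid in rows)
    return ids
```

In this model, `collect(page_naive, 2)` never sees "c", while `collect(page_next_token, 2)` sees every id. The second pager may return a row twice when a page boundary falls inside a collision group, so a caller would need to tolerate duplicates.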