swh:1:snp:eb70f1f85391e4b077c211bec36af0061c4bf937

Revision Author Date Message Commit Date
c0440b0 New upstream version 0.15.3 24 September 2020, 18:24:12 UTC
ac3c537 Update sql paths for the moved SQL files This should fix the currently failing documentation build. 24 September 2020, 18:08:56 UTC
20808a4 New upstream version 0.15.2 24 September 2020, 17:28:20 UTC
ee6041c New upstream version 0.15.1 24 September 2020, 16:44:20 UTC
96be9bd Fix default value handling in constructor. Use a simpler default value and do not identity-check against it. 24 September 2020, 16:33:09 UTC
e6808fa New upstream version 0.15.0 24 September 2020, 15:03:56 UTC
829118a Add the SQL commands used to set up the logical replication publication 24 September 2020, 11:57:18 UTC
5d3de06 Support different database flavors in the SQL scripts This uses a new database table and some psql conditionals to introduce three different flavors for the swh.storage Postgres database: - the 'default' flavor has all the deduplication features, foreign keys and read indexes - the 'mirror' flavor has all the deduplication features and read indexes; it drops some foreign keys to allow for out of order addition of some object types - the 'read_replica' flavor has the minimal set of indexes to support read queries, and replication using the PostgreSQL logical replication feature Related to T2604. 24 September 2020, 11:57:14 UTC
63426e6 pytest_plugin: Use psql to load SQL files instead of connecting with psycopg2 This avoids running into issues when the SQL files contain psql-specific features like backslash-escapes. 24 September 2020, 11:54:38 UTC
38b1dbf Output a warning when the version of the database is different than expected 24 September 2020, 11:54:38 UTC
e37f639 Improve code quality and doc in BufferedProxyStorage - better names related to the object buffers - extracted parameter dicts from the constructor - used more generic typing in function parameters and more specific in other contexts in order to apply the principle of robustness 23 September 2020, 22:21:54 UTC
c97b23b Adapt cli declaration entrypoint to swh.core 0.3 23 September 2020, 14:13:01 UTC
924621f pytest_plugin: Order the fixture definitions in dependency order 23 September 2020, 10:30:55 UTC
6286e18 pytest_plugin: Change dbname to storage to avoid clash in tests. Other similar fixtures in other modules already use the same "tests" db. Clashes can then happen when the same table name exists in different modules (e.g. dbversion exists in both the scheduler and storage dbs). 23 September 2020, 10:28:52 UTC
8c44a29 pytest_plugin: Reuse swh_storage_postgresql connection string The `swh_storage_postgresql.dsn` string already contains the connection information necessary for the tests to run. 23 September 2020, 10:27:02 UTC
30cdb78 Drop the -swh- part of sql files: it does not bring any meaningful info and makes it somewhat inconsistent with the new -superuser- "tag". 22 September 2020, 08:18:07 UTC
915575d Rename 10-swh-init.sql as 10-superuser-init.sql so the db initialization from swh.core (>= 0.3) executes this during the database creation step (i.e. while having a superuser level connection to the database). 22 September 2020, 08:14:15 UTC
67ee86b Warn about skipped_content sneaking the 'content' topics 21 September 2020, 08:43:00 UTC
8de6564 Small fix in the graph replayer to prevent a wrong warning 18 September 2020, 14:30:37 UTC
8387ef5 New upstream version 0.14.3 17 September 2020, 16:53:52 UTC
b0027ab python: Reorder imports with isort Related to T2610 17 September 2020, 16:06:07 UTC
d27a046 pre-commit: Add isort hook and configuration Related to T2610 17 September 2020, 16:06:06 UTC
469c38c Make origin_add() handle multiple occurrences of an origin properly. This is needed to prevent a traceback when an origin is present several times in the same origin_add batch, a situation that has been seen in some mirror tests. 17 September 2020, 14:47:27 UTC
37ce2c4 pre-commit: Update flake8 hook configuration flake8 hook has been removed from https://github.com/pre-commit/pre-commit-hooks so now use the one from https://gitlab.com/pycqa/flake8 17 September 2020, 11:57:15 UTC
f008a59 migrate_extrinsic_metadata: improve pypi_project_from_filename to support suffixes after the version number. 16 September 2020, 15:00:42 UTC
8e32ad0 migrate_extrinsic_metadata: guess PyPI origins. This works by guessing the package name from the original_artifact data, then building an origin that would match the package name, then checking if the revision can be reached from it. 16 September 2020, 15:00:37 UTC
0bbcd91 migrate_extrinsic_metadata.test_pypi: use the in-memory storage instead of mocks. In a future commit, migrating pypi revisions will become more interactive with the storage, so it's easier to have a real one instead of a mock. 16 September 2020, 15:00:33 UTC
1676478 migrate_extrinsic_metadata.test_debian: use the in-memory storage instead of mocks in tests that need to read in the storage. Using mocks just makes it more complicated, and we decided not to do that a while ago. 16 September 2020, 15:00:29 UTC
723e728 migrate_extrinsic_metadata: fix crash on dangling branch. 16 September 2020, 15:00:23 UTC
89d23cb migrate_extrinsic_metadata: fix crash when a Debian revision is missing. https://forge.softwareheritage.org/T997 16 September 2020, 15:00:19 UTC
2ad5600 migrate_extrinsic_metadata: guess Debian origins. This works by guessing the package name from the original_artifact data, then building origins that would match the package name, then filtering out origins by checking if the revision can be reached from them. 16 September 2020, 15:00:16 UTC
3b781a8 sql: Make the extra_headers not null a constraint. Due to the data volume, the basic (blocking) not null instruction would otherwise impose downtime on the loaders. Related to T2547 14 September 2020, 11:07:31 UTC
eb961e9 New upstream version 0.14.2 11 September 2020, 13:37:06 UTC
ed55e9c Load the "fast" hypothesis profile by default. Otherwise, running pytest directly (without specifying a --hypothesis-profile option) will use default hypothesis values that are unsuitable for us. 11 September 2020, 10:27:17 UTC
fd6d72f cli: speedup the `swh` cli command startup time by moving import statements in functions, as well as preventing the loading of swh.storage.interface in swh.storage unless needed for type checking. Related to T2575. 11 September 2020, 09:15:58 UTC
d24a1e7 migrate_extrinsic_metadata: retry in case of database errors. They are likely to happen since this script takes a long time to run. 10 September 2020, 07:39:55 UTC
5ec70a6 Add a Python script to migrate extrinsic metadata from revision metadata. 10 September 2020, 07:39:55 UTC
93458a4 Import get_objstorage from swh.objstorage.factory instead of swh.objstorage (deprecated). 09 September 2020, 17:26:00 UTC
90eda98 directory_ls: Don't return None for status/length/sha1/... if the content is known but skipped. 08 September 2020, 11:52:33 UTC
3198e11 Add a test for directory_ls when the contents are missing. 08 September 2020, 11:52:33 UTC
85189d3 New upstream version 0.14.1 04 September 2020, 13:52:12 UTC
374e01c algos.diff: Add missed revision_get conversion. This does the minimum required so the diff module still works and fixes [1]. Will release 0.14.1 asap. [1] https://forge.softwareheritage.org/harbormaster/unit/view/918517/ Related to T645 04 September 2020, 13:37:24 UTC
08312f5 New upstream version 0.14.0 04 September 2020, 10:59:49 UTC
356eacd Refactor revision_get storage API to return Revision objects The signature becomes revision_get(...) -> List[Optional[Revision]] Related to T645 03 September 2020, 15:05:11 UTC
36d284c cassandra: Discard Content ctime field in content_get_partition That field should only be returned by content_find method. Add tests to check expected behaviors according to methods. 02 September 2020, 13:10:57 UTC
3962bb7 New upstream version 0.13.3 01 September 2020, 12:40:24 UTC
e6fcfb9 storage*: release_get(...) -> List[Optional[Release]] Related to T645 01 September 2020, 12:19:19 UTC
e6c17f6 Make StorageInterface a Protocol. We're already using 'StorageInterface' as type annotation where we expect an object that implements the same methods, but it isn't technically correct since these objects aren't an instance of a subclass. Making StorageInterface a protocol reflects how we already use it, without lying to mypy. 01 September 2020, 08:38:36 UTC
5afd985 Add a validating storage proxy, to check ids before insertion. It can be used to prevent buggy or malicious clients of the storage from adding invalid objects. 28 August 2020, 11:18:53 UTC
4532a4d Add a --check-config option for cli commands. This allows specifying on the command line whether to check the configuration and read or write access to the storage at startup time (especially for `rpc-serve` and `replay` commands). Warning: this option defaults to "read" for the backfill command and "write" for rpc-serve and replay commands, so now the creation of the Storage instance used in cli commands *will be checked*. Closes T2525. 25 August 2020, 08:50:50 UTC
2a35c0b Remove the deprecated config-path option from `swh storage rpc-serve` command. Include the validation of the presence of a "storage" config section in the main `storage` click.group, where the config-file is actually parsed. 25 August 2020, 08:50:50 UTC
e8b1b21 Tell pytest not to recurse in dotdirs. pytest wastes a lot of time in .hypothesis and .git; this commit excludes them. Before: $ time pytest --collect-only > /dev/null pytest --collect-only > /dev/null 6.93s user 0.51s system 100% cpu 7.425 total After: $ time pytest --collect-only > /dev/null pytest --collect-only > /dev/null 2.39s user 0.10s system 100% cpu 2.475 total 25 August 2020, 08:01:20 UTC
f3870dc 161: Fix sql upgrade script Related to T2524 25 August 2020, 07:35:31 UTC
cc33dd3 Add support for a new "check_config" config option in get_storage(). If "check_config" is present in get_storage()'s kwargs, call the storage.check_config() method and raise an exception if the check fails. The storage.check_config() method is called with the content of the "check_config" config value (so the expected value for this currently is `check_config={"check_write": <bool>}`). 24 August 2020, 14:58:17 UTC
4dd9597 Check for db version mismatch in PgStorage.check_config() 24 August 2020, 14:58:17 UTC
c16ff50 Add a check_dbversion() method to the Db class. This method compares the version stored in the database in the `dbversion` table with the currently declared version. This current version is declared as a simple `current_version` attribute on the Db class. **It must be updated jointly in 30-swh-schema.sql** when modifying the db schema. Note that if this is forgotten, the added test `test_dbversion` should fail. Related to T2525. 24 August 2020, 14:58:17 UTC
629d2d1 Fix pytest_plugin's database janitor: do not truncate the dbversion table 24 August 2020, 14:58:17 UTC
f570f93 algos.snapshot: Add visits_and_snapshots_get_from_revision Its code is moved from snapshot_id_get_from_revision so it's a rather small change; and the revision metadata migration script (bin/migrate-extrinsic-metadata.py) will need it. 24 August 2020, 13:55:00 UTC
d1f19e9 storage/interface: Remove deprecated diff endpoints They are not of interest anymore as swh-web is the only client and now directly uses functions defined in swh.storage.algos.diff. 20 August 2020, 15:15:15 UTC
5390a4c storage_tests: Remove duplicated postgresql-specific tests. They got copied to test_postgresql.py instead of moved. 20 August 2020, 14:04:30 UTC
3ac332e tests: Fix failures after test_storage renaming to storage_tests Also move the round_to_milliseconds function to the storage.utils module. 20 August 2020, 11:43:01 UTC
3efe8bd New upstream version 0.13.2 20 August 2020, 07:18:47 UTC
4073907 Move postgresql-related files to swh/storage/postgresql/ 19 August 2020, 15:17:35 UTC
6a53cb3 pg: Check revision.extra_headers is not null. Fixes a regression in 038a219f84d6b8a4f02b48f9ad3c5d823d097790, as it made the converter expect a list. When adding this column, we made it default to null instead of defaulting to an empty array, so existing records were initialized with null. This commit migrates these nulls to empty arrays, then adds a constraint to enforce it in the future. 19 August 2020, 12:57:26 UTC
ca3ee92 converters: convert extra_headers to an empty list if it is None. 19 August 2020, 12:57:23 UTC
e2b1494 pg: Make date_neg_utc_offset not null if date is not null. Fixes a regression in 038a219f84d6b8a4f02b48f9ad3c5d823d097790, as it made converters expect a boolean. We stopped writing this kind of null when we started using model objects for insertions, but didn't migrate existing data. This commit migrates these nulls to false, then adds a constraint to enforce it in the future. 19 August 2020, 12:57:21 UTC
7dcd570 converters: convert neg_utc_offset to False if it is None. 19 August 2020, 12:57:18 UTC
e2f0665 backfiller: Add missing 'extra_header' field. This field wasn't backfilled; and it wasn't caught by tests, because revisions in swh.journal.tests.journal_data.TEST_OBJECTS had an empty extra_header field. Starting with swh-journal v0.4.3, one of the revisions in TEST_OBJECTS has a non-empty extra_header field, so it will become a test failure without this commit. 19 August 2020, 12:55:22 UTC
bd92547 backfiller: remove conversion of model objects back to dicts. As a temporary workaround to remain compatible with existing backfiller converters, I made them convert back to dict before they are converted again to model objects. This commit removes this workaround by making converters return model objects. 17 August 2020, 08:45:59 UTC
038a219 converters: Work on model objects instead of dicts on the "not-db" side. This is a change internal to the pg storage, that will be needed to make revision_get, revision_log, and release_get return model objects. 17 August 2020, 08:25:42 UTC
89656c9 test_converters: Fix test data to match actual values. 17 August 2020, 07:52:10 UTC
3a713dd cassandra.cql: Use a dict of statements instead of dynamically building method names in the two methods which need to switch between statements. The method name building was done to shoehorn this statement switching into the existing @_prepared_select_statement. This introduces @_prepared_select_statements (plural), which does this switching properly, using a dictionary. 14 August 2020, 14:35:09 UTC
546d11e cassandra.cql: Make the 'limit' argument of origin_visit_get non-optional. It's not optional in the storage interface, so a None value can't be passed. 14 August 2020, 14:35:09 UTC
1996b49 in_memory: Remove dead code. 14 August 2020, 14:35:09 UTC
291704d in_memory: Remove InMemoryStorage.*metadata_* and implement InMemoryCqlRunner.*metadata_* 14 August 2020, 14:35:09 UTC
da35e56 in_memory: Remove InMemoryStorage.origin_visit_* and implement InMemoryCqlRunner.origin_visit_* 14 August 2020, 14:35:09 UTC
e5f450c cassandra.cql: reorder origin_visit_* and origin_visit_status_* methods to be properly grouped. 14 August 2020, 14:35:09 UTC
249e4af Remove unused arguments of CqlRunner.origin_visit_status_get. 14 August 2020, 14:35:09 UTC
e1eb6cd in_memory: Remove InMemoryStorage.origin_* and implement InMemoryCqlRunner.origin_* 14 August 2020, 14:35:09 UTC
f78c76f in_memory: Remove InMemoryStorage.snapshot_* and implement InMemoryCqlRunner.snapshot_* 14 August 2020, 14:35:08 UTC
6651130 Remove endpoint snapshot_get_by_origin_visit. It's not used anywhere. 14 August 2020, 14:34:55 UTC
1104c53 in_memory: Remove InMemoryStorage.release_* and implement InMemoryCqlRunner.release_* 14 August 2020, 14:34:55 UTC
237c400 in_memory: Remove InMemoryStorage.revision_* and implement InMemoryCqlRunner.revision_* 14 August 2020, 13:16:29 UTC
8e7eed4 in_memory: Remove InMemoryStorage.directory_* and implement InMemoryCqlRunner.directory_* 14 August 2020, 13:16:29 UTC
d5f41f8 in_memory: Remove InMemoryStorage.skipped_content_* and implement InMemoryCqlRunner.skipped_content_* 14 August 2020, 13:16:29 UTC
b3af39a in_memory: Remove InMemoryStorage.content_* and implement InMemoryCqlRunner.content_* 14 August 2020, 13:16:29 UTC
397a645 in_memory: make object_find_by_sha1_git merge results from the CassandraStorage. For now this has no effect. However, in the near future, the CassandraStorage will be in charge of some object types, so we need to merge objects found in the CassandraStorage and those found directly in the InMemoryStorage. 14 August 2020, 13:16:29 UTC
a96c253 in_memory: Add InMemoryCqlRunner, a class that emulates cassandra.cql.CqlRunner without Cassandra. For now it's only used for object counters; but future commits will progressively move the in-memory's storage features to it. 14 August 2020, 13:16:29 UTC
bc47283 Make InMemoryStorage inherit from CassandraStorage. This has no effect for now, other than deduplicating a method and causing a name clash. 14 August 2020, 13:16:29 UTC
2097186 in_memory: Add class Table, which emulates a Cassandra table. It will be used to implement the in-memory storage as a backend for the cassandra storage. 14 August 2020, 13:16:29 UTC
ef06005 cassandra.cql: Fix return type of stat_counters. 14 August 2020, 13:16:29 UTC
1266b6a cassandra.model: Add PARTITION_KEY and CLUSTERING_KEY to the model classes. They will be used by the in-mem implementation of CqlRunner. 14 August 2020, 13:16:29 UTC
3dc69aa cassandra: Make origin_visit_get_latest filter using any status of a visit, instead of just the last. This fixes a mismatch in behavior with the pg and the in-mem storages 14 August 2020, 13:16:29 UTC
006eeec cassandra: Fix wrong algo reported in HashCollision, because of variable shadowing. 14 August 2020, 13:16:29 UTC
da28731 cassandra: Fix content_missing_per_sha1 when its parameter has length != 1. 14 August 2020, 13:16:29 UTC
6675286 cassandra.cql: Explicitly request columns, instead of 'SELECT *'. 'SELECT *' should be avoided in prepared statements, see https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#avoid-preparing-select-queries 11 August 2020, 10:22:15 UTC
0b78b39 cassandra: add a TABLE class attribute to row classes, and use it to deduplicate prepared statements logic. It will also be used in a future commit to generate 'select' prepared statements. 11 August 2020, 09:37:13 UTC
92e7a21 cassandra: Add annotations to make mypy actually type-check calls to CqlRunner. All methods of CqlRunner were decorated, which prevented mypy from doing anything useful. As I finally found a way to type the decorator (using mypy_extensions.NamedArg), I can finally make mypy aware of the methods' types. This commit (as well as all three of the last commits) also fixes issues found by mypy thanks to this. 11 August 2020, 09:19:57 UTC
b11b890 cassandra.storage: remove dead code 11 August 2020, 09:13:55 UTC
f954714 Fix type of snapshot_count_branches. 11 August 2020, 09:13:30 UTC
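
Commit e6c17f6 above makes StorageInterface a Protocol, so that objects implementing the same methods type-check without being instances of a subclass. A minimal sketch of that idea follows. The names `MiniStorageInterface`, `DictStorage` and `count_known` are hypothetical and are not taken from swh.storage; `typing.Protocol` assumes Python 3.8 or later.

```python
from typing import Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class MiniStorageInterface(Protocol):
    """Hypothetical, heavily reduced stand-in for a storage interface."""

    def content_get(self, ids: List[str]) -> List[Optional[bytes]]:
        ...


class DictStorage:
    """Hypothetical backend: it does NOT inherit from the protocol,
    it only provides a method with the same name and signature."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self._data = data

    def content_get(self, ids: List[str]) -> List[Optional[bytes]]:
        # Unknown ids map to None, known ids to their bytes.
        return [self._data.get(i) for i in ids]


def count_known(storage: MiniStorageInterface, ids: List[str]) -> int:
    """Accept any object that structurally matches the protocol."""
    return sum(1 for c in storage.content_get(ids) if c is not None)
```

With this shape, a static checker such as mypy accepts `count_known(DictStorage({...}), [...])` even though `DictStorage` has no inheritance relationship with the protocol class.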