swh:1:snp:eb70f1f85391e4b077c211bec36af0061c4bf937

Revision Message Commit Date
ff8b4ac New upstream version 0.0.188 28 April 2020, 11:52:06 UTC
49109d1 test_retry: Centralize time.sleep setup within a fixture This monkeypatches the internal sleep function used to not wait. This kept the previous behavior. It changes the implementation to monkeypatch though. This also centralizes within a fixture. This avoids repeating setup. The previous implementation cluttered the tests body instruction with internal implementation details. 27 April 2020, 13:10:05 UTC
ecadd53 Remove recently added test_cli.test_rpc_serve* tests. These are unreliable, and werkzeug's test WSGI server does not handle being executed from pytest very well (it executes pytest all over again). 24 April 2020, 13:16:06 UTC
038c30b Adapt journal client loading to swh.journal 0.0.31 24 April 2020, 09:58:02 UTC
4b7ba1f Copy the graph replayer component from swh-journal The CLI command is included as well as `swh storage replay`. Copied test test_replay, in test_cli.py, should be identical. 24 April 2020, 09:58:02 UTC
5fd9b56 Deprecate the `config-path` argument of the `swh storage rpc-serve` command in favor of the standard `--config-file` option of `swh storage`. Attempt to write a couple of tests for the rpc-serve command. 24 April 2020, 09:27:07 UTC
cd32cf4 Normalize the configuration file handling in the `swh storage` CLI command Almost every other swh package handles the loading of the config file from the main click command group of the package. So we make storage behave the same. 23 April 2020, 11:31:25 UTC
698af8c cli: rename the command 'backfiller' as 'backfill' for the sake of consistency. 23 April 2020, 11:31:25 UTC
b2bba45 Copy the backfiller component from swh-journal This component makes more sense in the swh-storage package. 23 April 2020, 11:31:25 UTC
fe56005 Add a wrapper to manage a sorted list. For now this is only used by sorted_sha1, but we'll need it for origin_metadata soon. 23 April 2020, 11:03:28 UTC
bca643a setup: Update the minimum required runtime python3 version Related to T2367 20 April 2020, 15:22:45 UTC
32b3e93 doc: reference SWHID using explicit anchors 17 April 2020, 14:28:18 UTC
caa9759 New upstream version 0.0.187 14 April 2020, 16:23:40 UTC
f66184d storage.interface: Actually define the remote flush operation As all storage are chained together, we need to define it. We really need the actual backends, which should be the last storage chained to do noop when that endpoint is called (and they do). 14 April 2020, 16:05:03 UTC
9a150ac New upstream version 0.0.186 14 April 2020, 15:20:54 UTC
4359874 Drop BWCompatInMemoryJournalWriter (released with swh.journal 0.0.30) 14 April 2020, 14:26:18 UTC
9cf1d67 New upstream version 0.0.185 14 April 2020, 12:22:04 UTC
2cc263d test: update storage tests to (future) swh.journal 0.0.30 which will handle swh.model objects everywhere instead of dicts. Also add a BW compat version of the InMemoryJournalWriter so tests will pass with current version of swh.journal (0.0.29). 14 April 2020, 09:37:31 UTC
e5e5943 storage.filter: Remove internal state 14 April 2020, 08:54:14 UTC
0e5731c New upstream version 0.0.184 10 April 2020, 14:14:17 UTC
ddac3d2 test_retry: Add missing skipped_content_add tests 10 April 2020, 11:53:34 UTC
54b2907 storage*: Add flush endpoints to storage implems (backend, proxy) All storage defines one endpoint even if it's mostly noop. This avoids introspection surprises. Related to D2966 (to be consistent) 10 April 2020, 09:08:05 UTC
d6ecf54 New upstream version 0.0.183 09 April 2020, 10:46:26 UTC
b0b0313 test_filter: Extract the filter storage into a fixture 09 April 2020, 07:37:20 UTC
566c325 storage*: Add `clear_buffers` operation for proxy storages This also adds the endpoint as noop for the main backend implementations. Related to T2352 09 April 2020, 07:37:20 UTC
ed4097c Add a pyproject.toml file to target py37 for black 08 April 2020, 20:11:21 UTC
cd52a03 Enable black - blackify all the python files, - enable black in pre-commit, - add a black tox environment. 08 April 2020, 13:16:34 UTC
0fe4665 Fix Storage.origin_visit_update(); ensure it raises a StorageArgumentException 08 April 2020, 13:16:34 UTC
be954f2 buffer: filter out duplicate objects. 08 April 2020, 11:01:00 UTC
c51139e Make Storage.origin_visit_update() add an OriginVisit model entity in the journal instead of a dict, to comply with next version of swh.journal (which will require swh.model objects). 08 April 2020, 10:03:38 UTC
bf48cfe Make swh/storage/storage.py flake8 compliant 08 April 2020, 10:03:38 UTC
fbb51aa Add a setup.cfg file to configure flake8 for black compatibility 08 April 2020, 10:03:38 UTC
8e8577e Prevent erroneous HashCollisions by using the same ctime for all rows. 'swh_content_add' tries to avoid this issue with a DISTINCT clause on the entire row; but it is useless because 'ctime' cells differ by a few microseconds. This commit ensures all ctime values are exactly the same, so they are filtered out. An alternative would be to change 'swh_content_add' to do: ``` select distinct on (sha1, sha1_git, sha256, blake2s256, length, status) sha1, sha1_git, sha256, blake2s256, length, status, ctime from tmp_content ``` instead of: ``` select distinct sha1, sha1_git, sha256, blake2s256, length, status, ctime from tmp_content ``` but this is more verbose and there's no good reason to call 'now()' for every row. 08 April 2020, 08:30:41 UTC
82b41ba Remove magic CassObject class, use dicts instead. 02 April 2020, 14:40:00 UTC
df3207a Adapt cassandra backend to validating model types This is required to be able to activate type validation in the model (in swh.model.model). It requires replacing the "distorted" usage that was made of model entities to build objects compatible with CqlRunner's object addition logic. Since we cannot create invalid model entities any more in this context, we add a new CassObject type (just a dict with __getattr__=__getitem__) and use it as the object passed to the CqlRunner for entity types that need special care (namely Revision and Release). This should still work with swh.model v0.0.62 (without type validation) as well as the (next) v0.0.63 which will come with type validation. 01 April 2020, 12:19:41 UTC
20baa1b test: convert test_converts.py to pytest style 01 April 2020, 12:19:41 UTC
fa4a043 test: get rid of normalized_xxx in tests This is not needed any more with properly typed test data. 01 April 2020, 12:19:41 UTC
fcca905 test: ensure timestamps in test data are properly typed. According to the model declaration, a timestamp must be a dict with 2 keys, 'seconds' and 'microseconds'. Also add a few more tests for the date_to_db helper function so that the test coverage of the latter remains. 01 April 2020, 12:19:41 UTC
377e6a8 tests: Prepare tests for origin_visit_update objects Related to T2310 01 April 2020, 09:56:05 UTC
9c22156 storage*: Add missing type annotations on origin_visit_get* endpoints 01 April 2020, 09:02:03 UTC
ff0a538 tests: Skip internal origin_visit_update model object generation Beside making the tests fail, they are not helpful right now. This commit will avoid the current master build from breaking. Also, the model bump is a mandatory preparatory work for making the origin visit immutable. Related to T2310 01 April 2020, 08:52:54 UTC
2856004 Ensure visit id is set in origin_visit_upsert before journal writes 31 March 2020, 14:28:25 UTC
acf057e storage*: Unify validation exception capture across storages 31 March 2020, 13:56:35 UTC
81e7575 storage*: Stop duplication and use storage.utils.now function 31 March 2020, 13:43:45 UTC
eb82792 cassandra/cql: Simplify type using Iterator 31 March 2020, 13:02:30 UTC
46fa27e storage*: Add types to origin_visit_get 31 March 2020, 12:58:55 UTC
4bdde50 storage*: Align origin_visit_update interface and implementations This also adds an unused (yet) optional parameter date. It will soon be used in the context of origin_visit_update use. Related to T2310 31 March 2020, 12:42:08 UTC
69862b0 storage: Define a now() function 31 March 2020, 12:35:00 UTC
8e8e3a9 test_retry: Remove unused import 31 March 2020, 12:34:22 UTC
c53433d test_retry: Use datetime instead of string Reuse a date_visit from the sample storage 31 March 2020, 12:24:36 UTC
623a1b7 test: add a small test to check for type validation (using release_add) 27 March 2020, 14:56:01 UTC
90c4112 validate: fix type annotation for origin_visit_add date argument is expected to be a datetime. 27 March 2020, 08:50:12 UTC
1916fd7 validate: ensure StorageArgumentException is always encodable by embedding a string representation of the original Exception as StorageArgumentException args instead of the original exc.args since this can contain any python (possibly non-serializable) object. This is needed e.g. when swh.model has runtime type validation. 27 March 2020, 08:50:12 UTC
c67fe21 writer: fix skipped_content_add type declaration to use SkippedContent instead of plain Content. 27 March 2020, 08:50:12 UTC
19be96f tests: fix types of several test data sets. These are currently accepted by swh.model, but won't be any more as soon as we activate type validation in swh.model. 27 March 2020, 08:50:12 UTC
982023a New upstream version 0.0.182 27 March 2020, 06:13:14 UTC
570dce2 Shut down cassandra connection before closing the fixture down 26 March 2020, 19:43:30 UTC
ce5d2bf storage*: Update origin_visit_update to make status parameter mandatory This actually aligns with the origin_visit model whose status is already mandatory. 26 March 2020, 15:13:45 UTC
40a7569 test: Adapt origin validation test according to latest model changes Origin model no longer allows to have a type. Related to f533f62bbf114cfcc29f7c72307c4dfbe99cf048 26 March 2020, 14:44:24 UTC
0a22e72 Respec discovery_date as a Python datetime instead of an ISO string. For consistency with the rest of the API. 26 March 2020, 12:04:13 UTC
74fd15e origin_visit_add: Add missing db/cur argument to call to origin_get. 26 March 2020, 10:07:58 UTC
c99ec11 New upstream version 0.0.181 25 March 2020, 09:03:41 UTC
fd29fcb storage*: Hex encode content hashes in HashCollision exception Related to T2332#42793 24 March 2020, 17:40:48 UTC
b7477e5 Add format of discovery_date in the metadata specification. It was not specified what the format should be. 24 March 2020, 11:14:38 UTC
92a87ea Store the value of token(partition_key) in skipped_content_by_* table, instead of three hashes. As was done for content_by_*. 23 March 2020, 14:51:13 UTC
a24ab3f Store the value of token(partition_key) in content_by_* table, instead of three hashes. That's a big win in terms of disk space, and shouldn't affect performance negatively. 23 March 2020, 14:16:46 UTC
a72370d New upstream version 0.0.180 18 March 2020, 17:45:34 UTC
456e15a Don't double-count added origins in origin_add origin_add_one already counts origins; this other send_metric would have us count added origins twice. 18 March 2020, 17:10:36 UTC
d99f08b Don't count origins len(url) times when calling origin_add_one I guess the `origins` variable name was carried over from a refactoring, but it doesn't match what db.origin_add actually returns. Overall this variable name made us overcount origins a little. 18 March 2020, 17:08:32 UTC
b259b70 New upstream version 0.0.179 18 March 2020, 15:50:48 UTC
209de5d Serialize objstorage and database writes in content_add Considering that the objstorage is idempotent, and that there's no rollback feature, intermixing both "transactions" has no concrete benefit. This avoids doing database transactions that are longer than needed. 18 March 2020, 14:36:01 UTC
aaa0e54 Don't nest transactions in content_add/skipped_content_add Seems like the cur/db arguments have been missed in one of the various refactorings, creating separate transactions for these function calls. 18 March 2020, 11:04:46 UTC
1dbb732 Don't create a transaction for content_get_partition It just calls out to another function which, itself, creates a transaction. 18 March 2020, 11:04:05 UTC
9b3735b requirements-swh.txt: Use >= instead of == for swh-core version check This fixes installation of swh modules in virtualenv when executing "pip install $(./bin/pip-swh-packages)" in swh-environment. 17 March 2020, 14:34:25 UTC
0e68cbe New upstream version 0.0.178 16 March 2020, 11:59:16 UTC
da98f5f origin_visit_add: Adapt endpoint signature to return OriginVisit Prior to this commit, there was: - no signature in the method - discrepancy between checks on the different backend origin_visit_add endpoint is now typed ``` def origin_visit_add( self, origin_url: str, date: Union[str, datetime.datetime], type: str) -> OriginVisit: ``` This also: - renames appropriately the origin_url parameter (removing 1 FIXME) - align backend implementations' check which were different 13 March 2020, 13:09:42 UTC
0456cce origin_visit_upsert: Use OriginVisit object as input This aligns with other `_add` endpoints. Only the journal depends on this. Related to D2812#67298 12 March 2020, 18:05:26 UTC
aa39be1 storage/writer: refactor JournalWriter.content_add to send model objects to the journal writer, as it already does with other object types (instead of dicts). 10 March 2020, 15:44:07 UTC
a97781d storage/validate: small code formatting 10 March 2020, 15:42:49 UTC
5ab89b0 New upstream version 0.0.177 10 March 2020, 10:48:10 UTC
05a4fca storage: Identify and provide the collision hashes in exception This matches what's done in other storage backends. There is no consistency for now though: storage backends provide, as HashCollision exception parameters, the content information as: - cassandra: (algo: str, hash_id: bytes, Content as cassandra Row) - in_memory: (algo: str, hash_id: bytes, Content as Tuple[str, bytes]) - pgstorage: algo: Optional[str] Opening this diff to discuss how to properly land this. 10 March 2020, 08:27:55 UTC
88fe942 Guarantee the order of results for revision_get and release_get It's a bit silly, but we depend on it for some tests. 09 March 2020, 15:53:34 UTC
7ee3972 Mock calls to time.sleep in retry tests. It makes the tests faster. 06 March 2020, 13:59:21 UTC
6fe9de4 Fix retry tests. mock_memory.has_calls does not exist; so calling it returns a MagicMock. 06 March 2020, 13:58:33 UTC
3b8b718 sql: do not attempt to create the plpgsql lang if already exists This is needed in case the pg user is not super user (but language already exists.) 06 March 2020, 08:29:09 UTC
8e41bcc Update requirement on swh.core for RPCClient method overrides 02 March 2020, 14:18:02 UTC
2b64be8 New upstream version 0.0.176 28 February 2020, 15:21:25 UTC
5222352 Use Content.hashes() instead of Content.to_dict() where it makes sense. .hashes() returns a subset of .to_dict(), so it was accidentally used instead. 27 February 2020, 15:59:52 UTC
d096542 Make the RPC client and objstorage helper fetch Content.data. This is needed when a lazy subclass of Content is used, eg. from swh.model.from_disk. 27 February 2020, 15:58:00 UTC
3996e5d Move ctime out of the validation proxy. It's not the right place to set the ctime (it should be on the server side). 27 February 2020, 15:56:06 UTC
caf51a0 Accept cassandra-driver >= 3.22. The bug that affected us in 3.21 is resolved ( https://datastax-oss.atlassian.net/browse/PYTHON-1205 ), so we can now use v3.22 to get wheels. 27 February 2020, 15:04:08 UTC
79e1f7c New upstream version 0.0.175 20 February 2020, 13:18:32 UTC
b093a5a retry: Add support for tenacity < 5.0 This fixes swh-storage debian package build on buster as python3-tenacity version is 4.12. 20 February 2020, 12:38:48 UTC
a20779f New upstream version 0.0.174 19 February 2020, 15:00:30 UTC
7cf0864 Add support for (de)serializing swh-model in RPC calls. This allows running the validating proxy on the client side instead of the server side. 18 February 2020, 13:22:33 UTC
80befa5 Make storage proxies use swh-model objects instead of dicts. This means that instead of having the validation proxy right before the backend class, it must now be at the beginning of pipelines. 18 February 2020, 12:45:46 UTC
29b0948 Fix FilteringProxy to not drop skipped-contents with a missing sha1_git. Passes them all to the backend instead of silently dropping them all if any of them is not missing. 18 February 2020, 12:45:22 UTC
51b2016 Fix inconsistent behavior of skipped_content_missing across backends. Two fixes: * in-mem ignored None keys * cassandra yielded input dicts as-is instead of a dict with just the hashes 14 February 2020, 16:49:05 UTC
03c2a02 Re-raise StorageArgumentException through API calls. So clients will get a nice exception looking like the original one, instead of generic RemoteApiError. 12 February 2020, 15:51:26 UTC
b668651 New upstream version 0.0.172 12 February 2020, 13:13:45 UTC
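
Commit fe56005 above adds "a wrapper to manage a sorted list"; the log does not show its code. As a rough illustration only, such a wrapper could be built on the standard `bisect` module. The class name `SortedList` and the methods `add` and `iter_from` are hypothetical and not taken from swh-storage.

```python
import bisect


class SortedList:
    """Hypothetical sketch: keep items ordered by a key as they are added."""

    def __init__(self, data=(), key=None):
        # Assumption: an optional key function, identity by default.
        self._key = key or (lambda item: item)
        self._items = sorted(data, key=self._key)
        # Keys are stored separately so bisect can search them directly.
        self._keys = [self._key(item) for item in self._items]

    def add(self, item):
        """Insert item at the position that keeps the list sorted."""
        k = self._key(item)
        idx = bisect.bisect_right(self._keys, k)
        self._keys.insert(idx, k)
        self._items.insert(idx, item)

    def __iter__(self):
        return iter(self._items)

    def iter_from(self, start_key):
        """Iterate over items whose key is greater than or equal to start_key."""
        idx = bisect.bisect_left(self._keys, start_key)
        return iter(self._items[idx:])
```

Keeping a parallel list of keys means each lookup is a binary search, while insertion still costs a list shift; that trade-off is usually acceptable for an in-memory test backend.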
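
Commits 7ee3972 and 49109d1 above replace `time.sleep` in the retry tests so that they do not actually wait. The sketch below illustrates the general idea with `unittest.mock.patch` rather than the pytest monkeypatch fixture those commits describe. The helper `retry_call`, the function `flaky`, and the choice of `ValueError` are hypothetical stand-ins, not the swh-storage retry code.

```python
import time
from unittest.mock import patch


def retry_call(func, attempts=3, delay=1.0):
    """Hypothetical retry helper: call func, sleeping between failed attempts."""
    for attempt in range(attempts):
        try:
            return func()
        except ValueError:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(delay)


def run_without_waiting():
    """Run a flaky call with time.sleep patched out.

    Returns the call result and the number of times sleep was requested.
    """
    failures = iter([True, True, False])  # fail twice, then succeed

    def flaky():
        if next(failures):
            raise ValueError("transient failure")
        return "ok"

    # Inside this block time.sleep is a mock, so the 60 s delay costs nothing.
    with patch("time.sleep") as mock_sleep:
        result = retry_call(flaky, attempts=3, delay=60.0)
    return result, mock_sleep.call_count
```

Because the patch lives in one place, a test body only has to state the behaviour it checks; in a pytest suite the same patch would typically sit in a shared fixture.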