94b6946 | Jenkins for Software Heritage | 30 April 2020, 12:58:56 UTC | Update upstream source from tag 'debian/upstream/0.0.189' Update to upstream version '0.0.189' with Debian dir c15d9ed71a3944e12dd13ce35b89362c2e731d92 | 30 April 2020, 12:58:56 UTC |
b579670 | Jenkins for Software Heritage | 30 April 2020, 12:58:55 UTC | New upstream version 0.0.189 | 30 April 2020, 12:58:55 UTC |
b0b767b | Antoine R. Dumont (@ardumont) | 30 April 2020, 11:20:08 UTC | pg: Write both origin visit updates & status, read from origin_visit This partially reverts commit [1]. That now (still) writes new origin visit status... But, as before [1]: - update origin visit (with same values as origin visit status) - read from origin visit That does not revert the new in-memory (D2937) nor the cassandra (D2939) storage implementations. [1] a720caed6eebbb68a9f9b5be554a52859aa052d6 D2938 Related to D2938 Related to T2310#44043 | 30 April 2020, 11:52:13 UTC |
0e8234f | Antoine R. Dumont (@ardumont) | 29 April 2020, 10:10:17 UTC | pg-storage: Add new created state Related to T2310 | 30 April 2020, 11:43:05 UTC |
2b95dd3 | Stefano Zacchiroli | 29 April 2020, 16:33:40 UTC | setup.py: add documentation link | 29 April 2020, 16:33:40 UTC |
4dc2eb6 | Valentin Lorentz | 29 April 2020, 15:02:50 UTC | metadata spec: Fix title hierarchy | 29 April 2020, 15:02:50 UTC |
707f647 | Valentin Lorentz | 29 April 2020, 11:03:15 UTC | tests: Use aware datetimes instead of naive ones. Production should only use aware datetimes. | 29 April 2020, 11:03:15 UTC |
e3e76c4 | Antoine R. Dumont (@ardumont) | 30 March 2020, 11:08:32 UTC | cassandra: Adapt internal implementations to use origin visit update Related to T2310 | 28 April 2020, 14:46:47 UTC |
a720cae | Antoine R. Dumont (@ardumont) | 26 March 2020, 13:15:17 UTC | pg-storage: Adapt internal implementations to use origin visit update Related to T2310 | 28 April 2020, 14:46:46 UTC |
ead8088 | Antoine R. Dumont (@ardumont) | 25 March 2020, 16:53:48 UTC | in_memory: Adapt internal implementations to use origin visit update (pairing with @vlorentz) Related to T2310 | 28 April 2020, 14:46:43 UTC |
baa127e | Jenkins for Software Heritage | 28 April 2020, 11:52:09 UTC | Updated debian changelog for version 0.0.188 | 28 April 2020, 11:52:09 UTC |
03ad3ba | Jenkins for Software Heritage | 28 April 2020, 11:52:08 UTC | Update upstream source from tag 'debian/upstream/0.0.188' Update to upstream version '0.0.188' with Debian dir 50f1d39b3b3add4e066e19f2a7225c144133f3af | 28 April 2020, 11:52:08 UTC |
ff8b4ac | Jenkins for Software Heritage | 28 April 2020, 11:52:06 UTC | New upstream version 0.0.188 | 28 April 2020, 11:52:06 UTC |
49109d1 | Antoine R. Dumont (@ardumont) | 24 April 2020, 10:22:29 UTC | test_retry: Centralize time.sleep setup within a fixture This monkeypatches the internal sleep function used to not wait. This kept the previous behavior. It changes the implementation to monkeypatch though. This also centralizes within a fixture. This avoids repeating setup. The previous implementation cluttered the tests body instruction with internal implementation details. | 27 April 2020, 13:10:05 UTC |
ecadd53 | David Douard | 24 April 2020, 13:16:06 UTC | Remove recently added test_cli.test_rpc_serve* tests these are unreliable and werkzeug's tests WSGI server does not handle being executed from pytest very well (means it executes pytest all over again). | 24 April 2020, 13:16:06 UTC |
038c30b | David Douard | 22 April 2020, 15:00:45 UTC | Adapt journal client loading to swh.journal 0.0.31 | 24 April 2020, 09:58:02 UTC |
4b7ba1f | David Douard | 09 April 2020, 15:45:55 UTC | Copy the graph replayer component from swh-journal The CLI command is included as well as `swh storage replay`. Copied test test_replay, in test_cli.py, should be identical. | 24 April 2020, 09:58:02 UTC |
5fd9b56 | David Douard | 23 April 2020, 13:27:28 UTC | Deprecate the `config-path` argument of the `swh storage rpc-serve` command in favor of the standard `--config-file` option of `swh storage`. Attempt to write a couple of tests for the rpc-serve command. | 24 April 2020, 09:27:07 UTC |
cd32cf4 | David Douard | 22 April 2020, 14:57:07 UTC | Normalize the configuration file handling in the `swh storage` CLI command Almost every other swh package handles the loading of the config file from the main click command group of the package. So we make storage behave the same. | 23 April 2020, 11:31:25 UTC |
698af8c | David Douard | 10 April 2020, 09:03:21 UTC | cli: rename the command 'backfiller' as 'backfill' for the sake of consistency. | 23 April 2020, 11:31:25 UTC |
b2bba45 | David Douard | 09 April 2020, 14:56:44 UTC | Copy the backfiller component from swh-journal This component makes more sense in the swh-storage package. | 23 April 2020, 11:31:25 UTC |
fe56005 | Valentin Lorentz | 24 March 2020, 16:22:09 UTC | Add a wrapper to manage a sorted list. For now this is only used by sorted_sha1, but we'll need it for origin_metadata soon. | 23 April 2020, 11:03:28 UTC |
bca643a | Antoine R. Dumont (@ardumont) | 20 April 2020, 15:21:56 UTC | setup: Update the minimum required runtime python3 version Related to T2367 | 20 April 2020, 15:22:45 UTC |
32b3e93 | Stefano Zacchiroli | 17 April 2020, 14:28:18 UTC | doc: reference SWHID using explicit anchors | 17 April 2020, 14:28:18 UTC |
b3d2bdd | Jenkins for Software Heritage | 14 April 2020, 16:23:41 UTC | Updated debian changelog for version 0.0.187 | 14 April 2020, 16:23:41 UTC |
c4baf78 | Jenkins for Software Heritage | 14 April 2020, 16:23:41 UTC | Update upstream source from tag 'debian/upstream/0.0.187' Update to upstream version '0.0.187' with Debian dir 4b30ccf98194f4e90f8cf0db487d47e7baadc998 | 14 April 2020, 16:23:41 UTC |
caa9759 | Jenkins for Software Heritage | 14 April 2020, 16:23:40 UTC | New upstream version 0.0.187 | 14 April 2020, 16:23:40 UTC |
f66184d | Antoine R. Dumont (@ardumont) | 14 April 2020, 16:05:03 UTC | storage.interface: Actually define the remote flush operation As all storage are chained together, we need to define it. We really need the actual backends, which should be the last storage chained to do noop when that endpoint is called (and they do). | 14 April 2020, 16:05:03 UTC |
121d986 | Jenkins for Software Heritage | 14 April 2020, 15:20:57 UTC | Updated debian changelog for version 0.0.186 | 14 April 2020, 15:20:57 UTC |
d078445 | Jenkins for Software Heritage | 14 April 2020, 15:20:56 UTC | Update upstream source from tag 'debian/upstream/0.0.186' Update to upstream version '0.0.186' with Debian dir ce528779d1106f75b6d5fecc36e9ff3475778afe | 14 April 2020, 15:20:56 UTC |
9a150ac | Jenkins for Software Heritage | 14 April 2020, 15:20:54 UTC | New upstream version 0.0.186 | 14 April 2020, 15:20:54 UTC |
4359874 | Nicolas Dandrimont | 14 April 2020, 14:26:18 UTC | Drop BWCompatInMemoryJournalWriter (released with swh.journal 0.0.30) | 14 April 2020, 14:26:18 UTC |
3a7456a | Jenkins for Software Heritage | 14 April 2020, 12:22:07 UTC | Updated debian changelog for version 0.0.185 | 14 April 2020, 12:22:07 UTC |
bd6204a | Jenkins for Software Heritage | 14 April 2020, 12:22:06 UTC | Update upstream source from tag 'debian/upstream/0.0.185' Update to upstream version '0.0.185' with Debian dir 9850406eeba0db1a0f770d855a6911e04448c2bc | 14 April 2020, 12:22:06 UTC |
9cf1d67 | Jenkins for Software Heritage | 14 April 2020, 12:22:04 UTC | New upstream version 0.0.185 | 14 April 2020, 12:22:04 UTC |
2cc263d | David Douard | 07 April 2020, 14:17:39 UTC | test: update storage tests to (future) swh.journal 0.0.30 which will handle swh.model objects everywhere instead of dicts. Also add a BW compat version of the InMemoryJournalWriter so tests will pass with current version of swh.journal (0.0.29). | 14 April 2020, 09:37:31 UTC |
e5e5943 | Antoine R. Dumont (@ardumont) | 14 April 2020, 08:54:14 UTC | storage.filter: Remove internal state | 14 April 2020, 08:54:14 UTC |
eefa2e9 | Jenkins for Software Heritage | 10 April 2020, 14:14:20 UTC | Updated debian changelog for version 0.0.184 | 10 April 2020, 14:14:20 UTC |
5458209 | Jenkins for Software Heritage | 10 April 2020, 14:14:19 UTC | Update upstream source from tag 'debian/upstream/0.0.184' Update to upstream version '0.0.184' with Debian dir 762cbea3c94234f7762ade30c913f9f8b5e5e537 | 10 April 2020, 14:14:19 UTC |
0e5731c | Jenkins for Software Heritage | 10 April 2020, 14:14:17 UTC | New upstream version 0.0.184 | 10 April 2020, 14:14:17 UTC |
ddac3d2 | Antoine R. Dumont (@ardumont) | 10 April 2020, 09:40:39 UTC | test_retry: Add missing skipped_content_add tests | 10 April 2020, 11:53:34 UTC |
54b2907 | Antoine R. Dumont (@ardumont) | 08 April 2020, 13:27:08 UTC | storage*: Add flush endpoints to storage implems (backend, proxy) All storage defines one endpoint even if it's mostly noop. This avoids introspection surprises. Related to D2966 (to be consistent) | 10 April 2020, 09:08:05 UTC |
29c3f1b | Jenkins for Software Heritage | 09 April 2020, 10:46:29 UTC | Updated debian changelog for version 0.0.183 | 09 April 2020, 10:46:29 UTC |
ed00daa | Jenkins for Software Heritage | 09 April 2020, 10:46:28 UTC | Update upstream source from tag 'debian/upstream/0.0.183' Update to upstream version '0.0.183' with Debian dir a8bad95849a6771b153327f001634318da0e4b38 | 09 April 2020, 10:46:28 UTC |
d6ecf54 | Jenkins for Software Heritage | 09 April 2020, 10:46:26 UTC | New upstream version 0.0.183 | 09 April 2020, 10:46:26 UTC |
b0b0313 | Antoine R. Dumont (@ardumont) | 08 April 2020, 09:27:30 UTC | test_filter: Extract the filter storage into a fixture | 09 April 2020, 07:37:20 UTC |
566c325 | Antoine R. Dumont (@ardumont) | 07 April 2020, 12:56:46 UTC | storage*: Add `clear_buffers` operation for proxy storages This also adds the endpoint as noop for the main backend implementations. Related to T2352 | 09 April 2020, 07:37:20 UTC |
ed4097c | David Douard | 08 April 2020, 19:42:32 UTC | Add a pyproject.toml file to target py37 for black | 08 April 2020, 20:11:21 UTC |
cd52a03 | David Douard | 08 April 2020, 12:33:42 UTC | Enable black - blackify all the python files, - enable black in pre-commit, - add a black tox environment. | 08 April 2020, 13:16:34 UTC |
0fe4665 | David Douard | 08 April 2020, 13:14:48 UTC | Fix Storage.origin_visit_update(); ensure it raises a StorageArgumentException | 08 April 2020, 13:16:34 UTC |
be954f2 | Valentin Lorentz | 08 April 2020, 10:02:40 UTC | buffer: filter out duplicate objects. | 08 April 2020, 11:01:00 UTC |
c51139e | David Douard | 07 April 2020, 14:09:48 UTC | Make Storage.origin_visit_update() add an OriginVisit model entity in the journal instead of a dict, to comply with next version of swh.journal (which will require swh.model objects). | 08 April 2020, 10:03:38 UTC |
bf48cfe | David Douard | 07 April 2020, 14:09:05 UTC | Make swh/storage/storage.py flake8 compliant | 08 April 2020, 10:03:38 UTC |
fbb51aa | David Douard | 07 April 2020, 14:06:25 UTC | Add a setup.cfg file to configure flake8 for black compatibility | 08 April 2020, 10:03:38 UTC |
8e8577e | Valentin Lorentz | 08 April 2020, 08:30:28 UTC | Prevent erroneous HashCollisions by using the same ctime for all rows. 'swh_content_add' tries to avoid this issue with a DISTINCT clause on the entire row; but it is useless because 'ctime' cells differ by a few microseconds. This commit ensures all ctime values are exactly the same, so they are filtered out. An alternative would be to change 'swh_content_add' to do: ``` select distinct on (sha1, sha1_git, sha256, blake2s256, length, status) sha1, sha1_git, sha256, blake2s256, length, status, ctime from tmp_content ``` instead of: ``` select distinct sha1, sha1_git, sha256, blake2s256, length, status, ctime from tmp_content ``` but this is more verbose and there's no good reason to call 'now()' for every row. | 08 April 2020, 08:30:41 UTC |
82b41ba | Valentin Lorentz | 02 April 2020, 14:40:00 UTC | Remove magic CassObject class, use dicts instead. | 02 April 2020, 14:40:00 UTC |
df3207a | David Douard | 26 March 2020, 16:28:48 UTC | Adapt cassandra backend to validating model types This is required to be able to activate type validation in the model (in swh.model.model). It requires replacing the "distorted" usage that was made of model entities to build objects compatible with CqlRunner's object addition logic. Since we cannot create invalid model entities any more in this context, we add a new CassObject type (just a dict with __getattr__=__getitem__) and use it as the object passed to the CqlRunner for entity types that need special care (namely Revision and Release). This should still work with swh.model v0.0.62 (without type validation) as well as the (next) v0.0.63 which will come with type validation. | 01 April 2020, 12:19:41 UTC |
20baa1b | David Douard | 30 March 2020, 13:34:55 UTC | test: convert test_converts.py to pytest style | 01 April 2020, 12:19:41 UTC |
fa4a043 | David Douard | 25 March 2020, 15:15:46 UTC | test: get rid of normalized_xxx in tests This is not needed any more with properly typed test data. | 01 April 2020, 12:19:41 UTC |
fcca905 | David Douard | 25 March 2020, 15:08:06 UTC | test: ensure timestamps in test data are properly typed. According to the model declaration, a timestamp must be a dict with 2 keys, 'seconds' and 'microseconds'. Also add a few more tests for the date_to_db helper function so that the test coverage of the latter remains. | 01 April 2020, 12:19:41 UTC |
377e6a8 | Antoine R. Dumont (@ardumont) | 01 April 2020, 09:54:33 UTC | tests: Prepare tests for origin_visit_update objects Related to T2310 | 01 April 2020, 09:56:05 UTC |
9c22156 | Antoine R. Dumont (@ardumont) | 27 March 2020, 16:28:56 UTC | storage*: Add missing type annotations on origin_visit_get* endpoints | 01 April 2020, 09:02:03 UTC |
ff0a538 | Antoine R. Dumont (@ardumont) | 01 April 2020, 08:48:29 UTC | tests: Skip internal origin_visit_update model object generation Beside making the tests fail, they are not helpful right now. This commit will avoid the current master build from breaking. Also, the model bump is a mandatory preparatory work for making the origin visit immutable. Related to T2310 | 01 April 2020, 08:52:54 UTC |
2856004 | Antoine R. Dumont (@ardumont) | 31 March 2020, 14:28:25 UTC | Ensure visit id is set in origin_visit_upsert before journal writes | 31 March 2020, 14:28:25 UTC |
acf057e | Antoine R. Dumont (@ardumont) | 31 March 2020, 13:54:30 UTC | storage*: Unify validation exception capture across storages | 31 March 2020, 13:56:35 UTC |
81e7575 | Antoine R. Dumont (@ardumont) | 31 March 2020, 13:43:45 UTC | storage*: Stop duplication and use storage.utils.now function | 31 March 2020, 13:43:45 UTC |
eb82792 | Antoine R. Dumont (@ardumont) | 30 March 2020, 18:30:44 UTC | cassandra/cql: Simplify type using Iterator | 31 March 2020, 13:02:30 UTC |
46fa27e | Antoine R. Dumont (@ardumont) | 27 March 2020, 13:40:36 UTC | storage*: Add types to origin_visit_get | 31 March 2020, 12:58:55 UTC |
4bdde50 | Antoine R. Dumont (@ardumont) | 26 March 2020, 07:57:18 UTC | storage*: Align origin_visit_update interface and implementations This also adds an unused (yet) optional parameter date. It will soon be used in the context of origin_visit_update use. Related to T2310 | 31 March 2020, 12:42:08 UTC |
69862b0 | Antoine R. Dumont (@ardumont) | 27 March 2020, 14:08:35 UTC | storage: Define a now() function | 31 March 2020, 12:35:00 UTC |
8e8e3a9 | Antoine R. Dumont (@ardumont) | 31 March 2020, 12:34:22 UTC | test_retry: Remove unused import | 31 March 2020, 12:34:22 UTC |
c53433d | Antoine R. Dumont (@ardumont) | 31 March 2020, 10:51:26 UTC | test_retry: Use datetime instead of string Reuse a date_visit from the sample storage | 31 March 2020, 12:24:36 UTC |
623a1b7 | David Douard | 23 March 2020, 09:52:40 UTC | test: add a small test to check for type validation (using release_add) | 27 March 2020, 14:56:01 UTC |
90c4112 | David Douard | 20 March 2020, 10:11:07 UTC | validate: fix type annotation for origin_visit_add date argument is expected to be a datetime. | 27 March 2020, 08:50:12 UTC |
1916fd7 | David Douard | 20 March 2020, 10:05:19 UTC | validate: ensure StorageArgumentException is always encodable by embedding a string representation of the original Exception as StorageArgumentException args instead of the original exc.args since this can contain any python (possibly non-serializable) object. This is needed e.g. when swh.model has runtime type validation. | 27 March 2020, 08:50:12 UTC |
c67fe21 | David Douard | 20 March 2020, 10:03:17 UTC | writer: fix skipped_content_add type declaration to use SkippedContent instead of plain Content. | 27 March 2020, 08:50:12 UTC |
19be96f | David Douard | 20 March 2020, 09:48:37 UTC | tests: fix types of several test data sets these are currently accepted by swh.model, but won't be any more as soon as we activate type validation in swh.model. | 27 March 2020, 08:50:12 UTC |
2a44182 | Jenkins for Software Heritage | 27 March 2020, 06:13:17 UTC | Updated debian changelog for version 0.0.182 | 27 March 2020, 06:13:17 UTC |
7d39fc1 | Jenkins for Software Heritage | 27 March 2020, 06:13:16 UTC | Update upstream source from tag 'debian/upstream/0.0.182' Update to upstream version '0.0.182' with Debian dir e10d04af6d3c24db4411f8e97ade9720fceb3e6f | 27 March 2020, 06:13:16 UTC |
982023a | Jenkins for Software Heritage | 27 March 2020, 06:13:14 UTC | New upstream version 0.0.182 | 27 March 2020, 06:13:14 UTC |
3245bd6 | Antoine R. Dumont (@ardumont) | 27 March 2020, 06:04:38 UTC | d/control: Update dependencies | 27 March 2020, 06:04:38 UTC |
570dce2 | Nicolas Dandrimont | 26 March 2020, 19:42:39 UTC | Shut down cassandra connection before closing the fixture down | 26 March 2020, 19:43:30 UTC |
ce5d2bf | Antoine R. Dumont (@ardumont) | 26 March 2020, 13:36:45 UTC | storage*: Update origin_visit_update to make status parameter mandatory This actually aligns with the origin_visit model whose status is already mandatory. | 26 March 2020, 15:13:45 UTC |
40a7569 | Antoine R. Dumont (@ardumont) | 26 March 2020, 07:53:50 UTC | test: Adapt origin validation test according to latest model changes Origin model no longer allows to have a type. Related to f533f62bbf114cfcc29f7c72307c4dfbe99cf048 | 26 March 2020, 14:44:24 UTC |
0a22e72 | Valentin Lorentz | 26 March 2020, 12:04:01 UTC | Respec discovery_date as a Python datetime instead of an ISO string. For consistency with the rest of the API. | 26 March 2020, 12:04:13 UTC |
74fd15e | Valentin Lorentz | 26 March 2020, 09:45:56 UTC | origin_visit_add: Add missing db/cur argument to call to origin_get. | 26 March 2020, 10:07:58 UTC |
89c74c5 | Jenkins for Software Heritage | 25 March 2020, 09:03:43 UTC | Updated debian changelog for version 0.0.181 | 25 March 2020, 09:03:43 UTC |
6753cf9 | Jenkins for Software Heritage | 25 March 2020, 09:03:42 UTC | Update upstream source from tag 'debian/upstream/0.0.181' Update to upstream version '0.0.181' with Debian dir 05cd45067f1391d8d893f83132d8740c263cd505 | 25 March 2020, 09:03:42 UTC |
c99ec11 | Jenkins for Software Heritage | 25 March 2020, 09:03:41 UTC | New upstream version 0.0.181 | 25 March 2020, 09:03:41 UTC |
fd29fcb | Antoine R. Dumont (@ardumont) | 24 March 2020, 12:32:33 UTC | storage*: Hex encode content hashes in HashCollision exception Related to T2332#42793 | 24 March 2020, 17:40:48 UTC |
b7477e5 | Valentin Lorentz | 24 March 2020, 11:14:38 UTC | Add format of discovery_date in the metadata specification. It was not specified what the format should be. | 24 March 2020, 11:14:38 UTC |
92a87ea | Valentin Lorentz | 23 March 2020, 14:50:01 UTC | Store the value of token(partition_key) in skipped_content_by_* table, instead of three hashes. As was done for content_by_*. | 23 March 2020, 14:51:13 UTC |
a24ab3f | Valentin Lorentz | 10 March 2020, 12:51:15 UTC | Store the value of token(partition_key) in content_by_* table, instead of three hashes. That's a big win in terms of disk space, and shouldn't affect performance negatively. | 23 March 2020, 14:16:46 UTC |
0b5647d | Jenkins for Software Heritage | 18 March 2020, 17:45:36 UTC | Updated debian changelog for version 0.0.180 | 18 March 2020, 17:45:36 UTC |
36369d7 | Jenkins for Software Heritage | 18 March 2020, 17:45:35 UTC | Update upstream source from tag 'debian/upstream/0.0.180' Update to upstream version '0.0.180' with Debian dir e9ef7b4e7884a002290ee6dacce6fef26d5aae9b | 18 March 2020, 17:45:35 UTC |
a72370d | Jenkins for Software Heritage | 18 March 2020, 17:45:34 UTC | New upstream version 0.0.180 | 18 March 2020, 17:45:34 UTC |
456e15a | Nicolas Dandrimont | 18 March 2020, 17:10:36 UTC | Don't double-count added origins in origin_add origin_add_one already counts origins; this other send_metric would have us count added origins twice. | 18 March 2020, 17:10:36 UTC |
d99f08b | Nicolas Dandrimont | 18 March 2020, 17:07:53 UTC | Don't count origins len(url) times when calling origin_add_one I guess the `origins` variable name was carried over from a refactoring, but it doesn't match what db.origin_add actually returns. Overall this variable name made us overcount origins a little. | 18 March 2020, 17:08:32 UTC |
16ae048 | Jenkins for Software Heritage | 18 March 2020, 15:50:50 UTC | Updated debian changelog for version 0.0.179 | 18 March 2020, 15:50:50 UTC |
6c2843b | Jenkins for Software Heritage | 18 March 2020, 15:50:50 UTC | Update upstream source from tag 'debian/upstream/0.0.179' Update to upstream version '0.0.179' with Debian dir f2df377756aea261b40e75757d7bef152d6f5b9f | 18 March 2020, 15:50:50 UTC |