swh:1:snp:eb70f1f85391e4b077c211bec36af0061c4bf937

Revision Author Date Message Commit Date
89c74c5 Updated debian changelog for version 0.0.181 25 March 2020, 09:03:43 UTC
6753cf9 Update upstream source from tag 'debian/upstream/0.0.181' Update to upstream version '0.0.181' with Debian dir 05cd45067f1391d8d893f83132d8740c263cd505 25 March 2020, 09:03:42 UTC
c99ec11 New upstream version 0.0.181 25 March 2020, 09:03:41 UTC
fd29fcb storage*: Hex encode content hashes in HashCollision exception Related to T2332#42793 24 March 2020, 17:40:48 UTC
b7477e5 Add format of discovery_date in the metadata specification. It was not specified what the format should be. 24 March 2020, 11:14:38 UTC
92a87ea Store the value of token(partition_key) in skipped_content_by_* table, instead of three hashes. As was done for content_by_*. 23 March 2020, 14:51:13 UTC
a24ab3f Store the value of token(partition_key) in content_by_* table, instead of three hashes. That's a big win in terms of disk space, and shouldn't affect performance negatively. 23 March 2020, 14:16:46 UTC
0b5647d Updated debian changelog for version 0.0.180 18 March 2020, 17:45:36 UTC
36369d7 Update upstream source from tag 'debian/upstream/0.0.180' Update to upstream version '0.0.180' with Debian dir e9ef7b4e7884a002290ee6dacce6fef26d5aae9b 18 March 2020, 17:45:35 UTC
a72370d New upstream version 0.0.180 18 March 2020, 17:45:34 UTC
456e15a Don't double-count added origins in origin_add origin_add_one already counts origins; this other send_metric would have us count added origins twice. 18 March 2020, 17:10:36 UTC
d99f08b Don't count origins len(url) times when calling origin_add_one I guess the `origins` variable name was carried over from a refactoring, but it doesn't match what db.origin_add actually returns. Overall this variable name made us overcount origins a little. 18 March 2020, 17:08:32 UTC
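The overcount described in d99f08b can be sketched as follows. This is only an illustration, assuming the value bound to `origins` was in fact a single URL string; the helper names `count_added_buggy` and `count_added_fixed` are hypothetical and do not come from swh-storage.

```python
def count_added_buggy(origins: str) -> int:
    # `origins` looks like a collection, but it holds one URL string,
    # so len() counts characters instead of origins.
    return len(origins)


def count_added_fixed(origin_url: str) -> int:
    # origin_add_one adds exactly one origin per call.
    return 1


url = "https://example.org/repo.git"  # made-up URL for the illustration
buggy = count_added_buggy(url)
fixed = count_added_fixed(url)
```

With a name that matches the value, the metric is incremented once per added origin rather than once per character of its URL.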
16ae048 Updated debian changelog for version 0.0.179 18 March 2020, 15:50:50 UTC
6c2843b Update upstream source from tag 'debian/upstream/0.0.179' Update to upstream version '0.0.179' with Debian dir f2df377756aea261b40e75757d7bef152d6f5b9f 18 March 2020, 15:50:50 UTC
b259b70 New upstream version 0.0.179 18 March 2020, 15:50:48 UTC
209de5d Serialize objstorage and database writes in content_add Considering that the objstorage is idempotent, and that there's no rollback feature, intermixing both "transactions" has no concrete benefit. This avoids doing database transactions that are longer than needed. 18 March 2020, 14:36:01 UTC
aaa0e54 Don't nest transactions in content_add/skipped_content_add Seems like the cur/db arguments have been missed in one of the various refactorings, creating separate transactions for these function calls. 18 March 2020, 11:04:46 UTC
1dbb732 Don't create a transaction for content_get_partition It just calls out to another function which, itself, creates a transaction. 18 March 2020, 11:04:05 UTC
9b3735b requirements-swh.txt: Use >= instead of == for swh-core version check This fixes installation of swh modules in virtualenv when executing "pip install $(./bin/pip-swh-packages)" in swh-environment. 17 March 2020, 14:34:25 UTC
eb4db6a Updated debian changelog for version 0.0.178 16 March 2020, 11:59:18 UTC
7963e96 Update upstream source from tag 'debian/upstream/0.0.178' Update to upstream version '0.0.178' with Debian dir e3a2678ea2292f15af293de3d2389a6d0c2d0316 16 March 2020, 11:59:18 UTC
0e68cbe New upstream version 0.0.178 16 March 2020, 11:59:16 UTC
da98f5f origin_visit_add: Adapt endpoint signature to return OriginVisit
Prior to this commit, there was:
- no signature in the method
- discrepancy between checks on the different backends
The origin_visit_add endpoint is now typed:
```
def origin_visit_add(
        self, origin_url: str, date: Union[str, datetime.datetime], type: str
) -> OriginVisit:
```
This also:
- renames the origin_url parameter appropriately (removing 1 FIXME)
- aligns the backend implementations' checks, which were different
13 March 2020, 13:09:42 UTC
0456cce origin_visit_upsert: Use OriginVisit object as input This aligns with other `_add` endpoints. Only the journal depends on this. Related to D2812#67298 12 March 2020, 18:05:26 UTC
aa39be1 storage/writer: refactor JournalWriter.content_add to send model objects to the journal writer, as it already does with other object types (instead of dicts). 10 March 2020, 15:44:07 UTC
a97781d storage/validate: small code formatting 10 March 2020, 15:42:49 UTC
0725600 Updated debian changelog for version 0.0.177 10 March 2020, 10:48:12 UTC
b837ce7 Update upstream source from tag 'debian/upstream/0.0.177' Update to upstream version '0.0.177' with Debian dir 9dae82494d0a2b3c9b208b48ce91a71ba320f558 10 March 2020, 10:48:11 UTC
5ab89b0 New upstream version 0.0.177 10 March 2020, 10:48:10 UTC
05a4fca storage: Identify and provide the collision hashes in exception
This matches what's done in other storage backends. There is no consistency for now, though: storage backends provide the content information as HashCollision exception parameters in different shapes:
- cassandra: (algo: str, hash_id: bytes, Content as cassandra Row)
- in_memory: (algo: str, hash_id: bytes, Content as Tuple[str, bytes])
- pgstorage: algo: Optional[str]
Opening this diff to discuss how to properly land this.
10 March 2020, 08:27:55 UTC
88fe942 Guarantee the order of results for revision_get and release_get It's a bit silly, but we depend on it for some tests. 09 March 2020, 15:53:34 UTC
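One way to guarantee result order, as 88fe942 describes for revision_get and release_get, is to index the fetched rows by id and then walk the input ids. The sketch below is not the swh-storage implementation: `get_in_input_order`, `fetch_rows`, the `"id"` key, and the choice of `None` for missing ids are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


def get_in_input_order(
    ids: List[bytes], fetch_rows: Callable[[List[bytes]], Iterable[Row]]
) -> List[Optional[Row]]:
    # The backend may return rows in any order, so index them by id first.
    by_id = {row["id"]: row for row in fetch_rows(ids)}
    # Emit one result per requested id, in request order (None when missing).
    return [by_id.get(obj_id) for obj_id in ids]


def fake_fetch(ids: List[bytes]) -> List[Row]:
    # Hypothetical backend: knows only b"a" and b"c", returns them reversed.
    known = {b"a": {"id": b"a"}, b"c": {"id": b"c"}}
    return [known[i] for i in reversed(ids) if i in known]


ordered = get_in_input_order([b"a", b"b", b"c"], fake_fetch)
```

Tests can then compare the returned list with the input list position by position, without sorting either side.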
7ee3972 Mock calls to time.sleep in retry tests. It makes the tests faster. 06 March 2020, 13:59:21 UTC
6fe9de4 Fix retry tests. mock_memory.has_calls does not exist; so calling it returns a MagicMock. 06 March 2020, 13:58:33 UTC
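The pitfall fixed in 6fe9de4 can be reproduced with the standard library alone. In this sketch, `content_add` and its arguments are placeholders, and `unsafe=True` is passed deliberately: to my understanding, recent Python releases reject assertion-like misspellings such as `has_calls` by default, so the flag is there to show the permissive behaviour the commit message describes.

```python
from unittest.mock import MagicMock, call

# unsafe=True keeps the permissive attribute lookup described in the commit.
mock_memory = MagicMock(unsafe=True)
mock_memory.content_add("x")

# Misspelled check: `has_calls` is auto-created as a child mock, and calling
# it returns yet another MagicMock, whatever arguments are given.
bogus = mock_memory.has_calls([call.content_add("never called")])
bogus_is_mock = isinstance(bogus, MagicMock)
bogus_is_truthy = bool(bogus)

# Real check: assert_has_calls raises when the expected call was not made.
try:
    mock_memory.assert_has_calls([call.content_add("never called")])
    real_check_failed = False
except AssertionError:
    real_check_failed = True
```

Because the misspelled call returns a truthy mock, a test written as `assert mock_memory.has_calls(...)` verifies nothing, whereas `assert_has_calls` fails as intended.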
3b8b718 sql: do not attempt to create the plpgsql lang if it already exists This is needed in case the pg user is not a superuser (but the language already exists). 06 March 2020, 08:29:09 UTC
8e41bcc Update requirement on swh.core for RPCClient method overrides 02 March 2020, 14:18:02 UTC
343fd0e d/changelog: Update build dependencies 02 March 2020, 13:36:58 UTC
f5ae5f7 d/control: Update swh dependencies 02 March 2020, 13:35:03 UTC
f88e1be Updated debian changelog for version 0.0.176 28 February 2020, 15:21:27 UTC
38f7b69 Update upstream source from tag 'debian/upstream/0.0.176' Update to upstream version '0.0.176' with Debian dir a3a4171140665fb62502d24d675e0aa69835b421 28 February 2020, 15:21:26 UTC
2b64be8 New upstream version 0.0.176 28 February 2020, 15:21:25 UTC
5222352 Use Content.hashes() instead of Content.to_dict() where it makes sense. .hashes() returns a subset of .to_dict(), so it was accidentally used instead. 27 February 2020, 15:59:52 UTC
d096542 Make the RPC client and objstorage helper fetch Content.data. This is needed when a lazy subclass of Content is used, eg. from swh.model.from_disk. 27 February 2020, 15:58:00 UTC
3996e5d Move ctime out of the validation proxy. It's not the right place to set the ctime (it should be on the server side). 27 February 2020, 15:56:06 UTC
caf51a0 Accept cassandra-driver >= 3.22. The bug that affected us in 3.21 is resolved ( https://datastax-oss.atlassian.net/browse/PYTHON-1205 ), so we can now use v3.22 to get wheels. 27 February 2020, 15:04:08 UTC
7c7a0f2 Updated debian changelog for version 0.0.175 20 February 2020, 13:18:34 UTC
b072463 Update upstream source from tag 'debian/upstream/0.0.175' Update to upstream version '0.0.175' with Debian dir 2763e145c9d6b32840bf5d673ee30300a243fa20 20 February 2020, 13:18:33 UTC
79e1f7c New upstream version 0.0.175 20 February 2020, 13:18:32 UTC
b093a5a retry: Add support for tenacity < 5.0 This fixes swh-storage debian package build on buster as python3-tenacity version is 4.12. 20 February 2020, 12:38:48 UTC
9f166f3 Updated debian changelog for version 0.0.174 19 February 2020, 15:00:32 UTC
4a3c8f3 Update upstream source from tag 'debian/upstream/0.0.174' Update to upstream version '0.0.174' with Debian dir 8d31e83ca5c9c40500801331887fb0e85a0ad5e2 19 February 2020, 15:00:31 UTC
a20779f New upstream version 0.0.174 19 February 2020, 15:00:30 UTC
7cf0864 Add support for (de)serializing swh-model in RPC calls. This allows running the validating proxy on the client side instead of the server side. 18 February 2020, 13:22:33 UTC
80befa5 Make storage proxies use swh-model objects instead of dicts. This means that instead of having the validation proxy right before the backend class, it must now be at the beginning of pipelines. 18 February 2020, 12:45:46 UTC
29b0948 Fix FilteringProxy to not drop skipped-contents with a missing sha1_git. Passes them all to the backend instead of silently dropping them all if any of them is not missing. 18 February 2020, 12:45:22 UTC
51b2016 Fix inconsistent behavior of skipped_content_missing across backends.
Two fixes:
* in-mem ignored None keys
* cassandra yielded input dicts as-is instead of a dict with just the hashes
14 February 2020, 16:49:05 UTC
03c2a02 Re-raise StorageArgumentException through API calls. So clients will get a nice exception looking like the original one, instead of generic RemoteApiError. 12 February 2020, 15:51:26 UTC
fa4dac4 Updated debian changelog for version 0.0.172 12 February 2020, 13:13:47 UTC
65f1c6e Update upstream source from tag 'debian/upstream/0.0.172' Update to upstream version '0.0.172' with Debian dir 355e8d105e8ea7d1af1c1238c772d8e57697fe84 12 February 2020, 13:13:47 UTC
b668651 New upstream version 0.0.172 12 February 2020, 13:13:45 UTC
652ecf0 storages: Refactor journal operations with a dedicated writer collab Prior to this commit, the code was triplicated across the storage backends. Now all storages use the same collaborator whose concern is writing to the storage. Could be a stepping stone to make that a proxy storage. 11 February 2020, 14:40:36 UTC
adcbf95 Fix RecursionError when storage proxies are deepcopied or unpickled. They both get attributes (eg. __setattr__) before setting any attribute, and don't call the constructor; so self.storage is not set when __getattr__ is called for the first time. 11 February 2020, 13:12:12 UTC
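The failure mode in adcbf95 is a general hazard of delegating `__getattr__`. The sketch below uses hypothetical classes (`Backend`, `NaiveProxy`, `SafeProxy`) and one possible guard; it is not necessarily the fix that swh-storage applied.

```python
import copy


class Backend:
    def ping(self) -> str:
        return "pong"


class NaiveProxy:
    def __init__(self, storage):
        self.storage = storage

    def __getattr__(self, key):
        # Only called when normal lookup fails. If `storage` itself is not
        # set yet, self.storage re-enters __getattr__ without end.
        return getattr(self.storage, key)


class SafeProxy(NaiveProxy):
    def __getattr__(self, key):
        if key == "storage":
            # The constructor has not run (copy/unpickle path): stop here.
            raise AttributeError(key)
        return getattr(self.storage, key)


# copy and pickle create instances without calling __init__, like this:
bare = NaiveProxy.__new__(NaiveProxy)
try:
    bare.ping
    naive_recursed = False
except RecursionError:
    naive_recursed = True

# With the guard, a deep copy succeeds and still delegates to the backend.
clone = copy.deepcopy(SafeProxy(Backend()))
clone_answer = clone.ping()
```

Raising `AttributeError` for the delegate attribute lets `hasattr`-style probes on a half-built instance fail cleanly instead of recursing.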
a2e565d d/control: Update runtime dependency 10 February 2020, 17:43:05 UTC
de7797d tests: Remove print statement 10 February 2020, 16:18:33 UTC
5b3c940 storages: Refactor objstorage operations with a dedicated collaborator Prior to this commit, the code was triplicated across the storage backends. Now all storages use the same collaborator whose concern is writing to the objstorage. 10 February 2020, 12:38:29 UTC
68ff23c Add a validation proxy for _add() methods. It converts input dictionaries into swh-model objects, which validates them and raises an appropriate error. This removes duplicated validation code that is currently present in all three storage backends. It also uses well-defined object types instead of loose dicts, which gives these _add() methods a more strict type. 10 February 2020, 11:03:49 UTC
6383637 Unify exception raised by invalid input to API endpoints. This is a first step toward not pickling exceptions. 07 February 2020, 10:40:23 UTC
5a6b025 Updated debian changelog for version 0.0.171 06 February 2020, 14:07:38 UTC
ae6befc Update upstream source from tag 'debian/upstream/0.0.171' Update to upstream version '0.0.171' with Debian dir 55757ca0c14457bc364a00b1f333b4c38160f5b2 06 February 2020, 14:07:37 UTC
be225df New upstream version 0.0.171 06 February 2020, 14:07:35 UTC
2b029b7 Split 'content_add' method into 'content_add' and 'skipped_content_add'. Respectively to add present content and skipped content. This simplifies the logic of both methods, and is a necessary step to typing / using swh-model objects everywhere, as contents have quite different attributes depending on whether they are present or missing. 06 February 2020, 13:29:31 UTC
93ea487 Increase Cassandra requests timeout to 1 second. 100ms worked fine so far, but we're starting to get some timeouts on the Azure test cluster. Multiplying the timeout by 10 should give us ample room to work with. 04 February 2020, 12:38:32 UTC
4a0a055 d/changelog: Bump new release 03 February 2020, 16:33:29 UTC
1d233df d/control: Update build dependencies Forcing use of openjdk-11-jre as there are issues to make cassandra run with the default dependency (openjdk-8-jre). 03 February 2020, 16:33:29 UTC
a119a7a d/changelog: Update dependencies Build will still fail but we will have more information to debug 03 February 2020, 15:01:20 UTC
41ca2bf d/control: Add build dependencies 03 February 2020, 13:36:08 UTC
bfe4586 Updated debian changelog for version 0.0.170 03 February 2020, 13:23:48 UTC
2812822 Update upstream source from tag 'debian/upstream/0.0.170' Update to upstream version '0.0.170' with Debian dir 08005486c202dfaa978bf8aadb9469768d769e1e 03 February 2020, 13:23:47 UTC
a66e16c New upstream version 0.0.170 03 February 2020, 13:23:46 UTC
b315f9d Tune Cassandra test config for lower test latency. 03 February 2020, 12:31:28 UTC
25941d5 Make tests reuse the same keyspace/schema instead of recreating it for each test. This makes tests run 16 times faster than https://forge.softwareheritage.org/D2612 (which is itself 3 times faster than this commit's parent) 03 February 2020, 11:26:24 UTC
eb155ad Add Cassandra backend. 31 January 2020, 15:05:53 UTC
ba373c9 Updated debian changelog for version 0.0.169 30 January 2020, 13:26:23 UTC
7dc7029 Update upstream source from tag 'debian/upstream/0.0.169' Update to upstream version '0.0.169' with Debian dir aae69d992484644229881c13c543f8489e58a53c 30 January 2020, 13:26:22 UTC
523f2eb New upstream version 0.0.169 30 January 2020, 13:26:21 UTC
cf45ec6 retry: Add retry behavior on pipeline storage with flushing failure
Currently, wrong "hash collisions" are happening a lot on ingestion [1] [2] [3]. The last loading step (flush) is failing on most loaders (git, npm, etc.). This commit adds the retry behavior to the currently deployed pipeline storage, which should decrease the frequency of that error. Any hash collisions that remain afterwards should then be real hash collisions.
[1] https://sentry.softwareheritage.org/share/issue/102aace238fe4ba6b49bcc5531f7c2bf/
[2] https://sentry.softwareheritage.org/share/issue/8e8b48a1d94c465b8109e76311ecdbe7/
[3] https://sentry.softwareheritage.org/share/issue/d4f1208b7eec4b43b11e38494ff039cc/
30 January 2020, 11:22:16 UTC
0dc1626 Updated debian changelog for version 0.0.168 30 January 2020, 10:25:31 UTC
0c6e828 Update upstream source from tag 'debian/upstream/0.0.168' Update to upstream version '0.0.168' with Debian dir 83be097d86cd3d927e896c561f682dfeae685245 30 January 2020, 10:25:30 UTC
3e6d2bf New upstream version 0.0.168 30 January 2020, 10:25:29 UTC
1608fcd Allow deprecated endpoints to be missing from a backend class. 29 January 2020, 15:50:10 UTC
68702b5 CONTRIBUTORS: add Daniele Serafini 29 January 2020, 13:24:09 UTC
32d455b Rename in_memory.Storage to in_memory.InMemoryStorage. For consistency with the other class names. 29 January 2020, 12:34:38 UTC
d4fb270 Move Storage documentation and endpoint paths to a new StorageInterface class Documentation was duplicated between the in-mem and postgresql storage, and one of them regularly goes out of date. This deduplicates them both to a new class. This new class is also the one declaring the API paths, as it did not make sense to have this declaration on the postgresql storage. Last but not least, this commit adds a test that checks backend classes have all the functions, and they have exactly the same signature as the interface. This will catch stupid bugs before production, eg. if an argument does not have the same name in all classes. 29 January 2020, 11:16:55 UTC
1775edd in_memory: Fix content_get_metadata when there is no 'data' key. 27 January 2020, 16:37:57 UTC
0f51e8a Remove cur/db arguments from the in-mem storage. They shouldn't be there; bad copy-pasting. 24 January 2020, 15:53:27 UTC
1cd53c1 Implement content_update for the in-mem storage. 24 January 2020, 15:47:10 UTC
7245ee9 Updated debian changelog for version 0.0.167 24 January 2020, 14:01:58 UTC
e180f2a Update upstream source from tag 'debian/upstream/0.0.167' Update to upstream version '0.0.167' with Debian dir e797a96b300338124b09f8b75bef2996fc8220d9 24 January 2020, 14:01:57 UTC
e62d6e4 New upstream version 0.0.167 24 January 2020, 14:01:56 UTC
c8389c2 146: Fix typo 24 January 2020, 13:54:26 UTC
2ebcdf3 pgstorage: Empty temp tables instead of dropping them
Due to our pattern of adding objects [1], vacuum is triggered regularly on pg_catalog.*, which has a heavy impact on performance. This commit avoids dropping the temporary tables and empties them instead (they are still dropped at the end of the session, but less often). This should decrease the bloat on pg_catalog.* tables.
[1]
- create temporary table
- insert data from temporary table to production table with filtering
- drop temporary table
24 January 2020, 11:14:00 UTC