Revision history - None - origin: https://forge.softwareheritage.org/source/swh-scheduler.git

Revision	Author	Author Date	Message	Commit Date
f092ed3	Jenkins for Software Heritage	28 April 2022, 09:36:24 UTC	Updated debian changelog for version 1.1.1	28 April 2022, 09:36:24 UTC
23ce0d9	Jenkins for Software Heritage	28 April 2022, 09:36:23 UTC	Update upstream source from tag 'debian/upstream/1.1.1' Update to upstream version '1.1.1' with Debian dir 51b9198d0925a58c5f477ee300095bd0c9e8f9b6	28 April 2022, 09:36:23 UTC
d9e982e	Jenkins for Software Heritage	28 April 2022, 09:36:23 UTC	New upstream version 1.1.1	28 April 2022, 09:36:23 UTC
82274c1	Valentin Lorentz	27 April 2022, 13:15:28 UTC	cli/utils: Fix parsing of empty strings	27 April 2022, 13:15:28 UTC
353cf2a	Valentin Lorentz	26 April 2022, 11:05:15 UTC	Bump mypy to v0.942	26 April 2022, 11:05:15 UTC
f642da4	Jenkins for Software Heritage	26 April 2022, 10:35:52 UTC	Updated debian changelog for version 1.1.0	26 April 2022, 10:35:52 UTC
d912c65	Jenkins for Software Heritage	26 April 2022, 10:35:51 UTC	Update upstream source from tag 'debian/upstream/1.1.0' Update to upstream version '1.1.0' with Debian dir 728c35186bf7d46bb2e39efbe69cf3e4981c7311	26 April 2022, 10:35:51 UTC
442fcdb	Jenkins for Software Heritage	26 April 2022, 10:35:50 UTC	New upstream version 1.1.0	26 April 2022, 10:35:50 UTC
0365b85	Valentin Lorentz	21 April 2022, 16:40:55 UTC	Add a 'lister_instance_name' argument to all tasks created from ListedOrigin This will allow loaders to use the right API credentials to fetch extrinsic metadata for the origin from the forge.	26 April 2022, 10:28:37 UTC
42e362d	Valentin Lorentz	21 April 2022, 10:22:03 UTC	Add a 'lister_name' argument to all tasks created from ListedOrigin This will allow loaders to guess the forge type, and use the right API to fetch extrinsic metadata for the origin from the forge.	26 April 2022, 10:28:33 UTC
3687931	David Douard	25 April 2022, 16:14:29 UTC	Update a bit the documentation for the new origin visit scheduler	26 April 2022, 08:38:05 UTC
9483493	Valentin Lorentz	21 April 2022, 09:22:48 UTC	Make create_origin_task_dict a standalone function It feels off as an object method; and I am going to make it use joins in a future commit, so it makes more sense this way.	21 April 2022, 15:15:06 UTC
5e9ee60	Valentin Lorentz	21 April 2022, 09:21:05 UTC	test_utils.py: Convert to pytest-style tests	21 April 2022, 11:47:58 UTC
9627e6d	Antoine Lambert	21 April 2022, 11:39:49 UTC	pre-commit: Remove codespell commit-msg hook That hook can be frustrating as it can discard a long commit message if it finds a typo in it, so it is better to remove it.	21 April 2022, 11:39:49 UTC
a76bb02	David Douard	15 April 2022, 16:08:49 UTC	Make scheduling policy used in schedule_recurrent configurable Add support for a configuration option "scheduling_policy" in the config file loaded by the 'swh scheduler schedule-recurrent' command. This config entry allows specifying the scheduling policies used by the schedule-recurrent tool, instead of having them hardcoded in the source code. A visit type policy config entry should have at least a 'weight' value for each policy. Default values are unchanged. E.g.: scheduling_policy: git: - policy: already_visited_order_by_lag weight: 55 tablesample: 0.5 - policy: never_visited_oldest_update_first weight: 45 tablesample: 0.5 Note: there may not be configuration entries for all visit types, but if a visit type policy is configured, the config entry should be complete (in other words, the merging of the configuration with the default values is only done at first config level).	20 April 2022, 14:34:23 UTC
5302efd	Antoine Lambert	08 April 2022, 13:15:35 UTC	Add .git-blame-ignore-revs file with automatic reformatting commits	08 April 2022, 13:15:35 UTC
3f0843b	Antoine Lambert	08 April 2022, 13:15:09 UTC	python: Reformat code with black 22.3.0 Related to T3922	08 April 2022, 13:15:09 UTC
d9a2512	Antoine Lambert	08 April 2022, 13:13:50 UTC	pre-commit, tox: Bump black from 19.10b0 to 22.3.0 black is considered stable since release 22.1.0 and the version we are currently using is quite outdated and not compatible with click 8.1.0, so it is time to bump it to its latest stable release. Please note that E501 pycodestyle warning related to line length is replaced by B950 one from flake8-bugbear as recommended by black. https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#line-length Related to T3922	08 April 2022, 13:13:50 UTC
bafe03f	Antoine Lambert	06 April 2022, 15:14:52 UTC	requirements-test: Remove pytest pinning to < 7 pytest-postgresql 3.1.3 and pytest-redis 2.4.0 added support for pytest >= 7 so we can now drop the pytest pinning.	06 April 2022, 15:14:52 UTC
78f5579	Antoine Lambert	22 March 2022, 10:58:10 UTC	pytest: Exclude build directory for tests discovery Due to test modules being copied in subdirectories of the build directory by setuptools, it makes pytest fail by raising ImportPathMismatchError exceptions when invoked from root directory of the module. So ignore the build folder to discover tests.	22 March 2022, 10:58:10 UTC
fded717	Jenkins for Software Heritage	24 February 2022, 16:03:55 UTC	Updated debian changelog for version 1.0.0	24 February 2022, 16:03:55 UTC
87e54e3	Jenkins for Software Heritage	24 February 2022, 16:03:54 UTC	Update upstream source from tag 'debian/upstream/1.0.0' Update to upstream version '1.0.0' with Debian dir 7e7d67a960f191f55f41140f0b00c7a1fe6e30fc	24 February 2022, 16:03:54 UTC
a63dbac	Jenkins for Software Heritage	24 February 2022, 16:03:53 UTC	New upstream version 1.0.0	24 February 2022, 16:03:53 UTC
43794aa	David Douard	24 February 2022, 15:52:44 UTC	Prepare v1: bump dependency to swh.core 2 also match dependency on swh.storage with requirements-swh.txt	24 February 2022, 15:52:44 UTC
5cc62be	David Douard	08 February 2022, 13:59:29 UTC	Adapt to swh.core 2.0.0 - add the `get_datastore` function in `swh.scheduler` - add the `get_current_version` method in `SchedulerBackend`, - remove dbversion management from sql init script - update tests accordingly	24 February 2022, 14:51:44 UTC
234e165	Antoine Lambert	10 February 2022, 16:23:34 UTC	pre-commit: Bump hooks and add new one to check commit message spelling To install the new hook: $ pre-commit install -t commit-msg	10 February 2022, 16:23:34 UTC
fddec02	Antoine Lambert	09 February 2022, 13:22:06 UTC	requirements: Remove click version pin Latest versions of celery and flask now support click >= 8.0 so we can remove the version pin.	09 February 2022, 13:22:46 UTC
c46ffad	David Douard	08 February 2022, 16:26:17 UTC	Prefix task types used in tests with 'test-' so that tests do not depend on a lucky guess about what the scheduler db state actually is. DB initialization scripts do create task types for git, hg and svn (used in tests), but these tests depend on the db fixture having already been called once before, so that tables are truncated (especially the task and task_type ones). For example, running a single test involved in task-type creation was failing (e.g. 'pytest swh -k test_create_task_type_idempotence'). This commit makes tests not collide with any existing task or task type that initialization scripts may create. Note that this also means that there is actually no test dealing with the scheduler db state after initialization, which is not great and should be addressed.	08 February 2022, 16:34:10 UTC
9f601f5	Antoine R. Dumont (@ardumont)	07 February 2022, 15:46:47 UTC	requirements-test: Pin pytest to < 7.0.0 Related to T3916	07 February 2022, 15:47:00 UTC
ce11283	Valentin Lorentz	21 January 2022, 10:10:48 UTC	Fix ReST syntax	21 January 2022, 10:14:59 UTC
b5477ea	Antoine R. Dumont (@ardumont)	12 January 2022, 09:58:58 UTC	sql: Clean up task/task_run data model This archives the current task and task_run tables, creating new ones that keep only the necessary tasks (last 2 months' oneshot tasks plus some recurring tasks: lister, indexer, ...). Those filtered tasks are the ones scheduled by the runner and runner priority services. This archiving will make those services faster (the corresponding queries return results faster without the archived data). Related to T3837	12 January 2022, 10:30:36 UTC
3b6e1d4	Jenkins for Software Heritage	06 January 2022, 08:39:47 UTC	Updated debian changelog for version 0.23.0	06 January 2022, 08:39:47 UTC
67e1896	Jenkins for Software Heritage	06 January 2022, 08:39:46 UTC	Update upstream source from tag 'debian/upstream/0.23.0' Update to upstream version '0.23.0' with Debian dir f7e1a8a1f5f6dc07dc335a3ea905631cb4f80385	06 January 2022, 08:39:46 UTC
4c9e164	Jenkins for Software Heritage	06 January 2022, 08:39:45 UTC	New upstream version 0.23.0	06 January 2022, 08:39:45 UTC
5c836d6	Vincent SELLIER	04 January 2022, 23:08:50 UTC	Allow to specify the visit grab parameters per visit type and policy Related to T3827	05 January 2022, 17:18:32 UTC
559f345	Antoine R. Dumont (@ardumont)	16 December 2021, 14:47:56 UTC	Pin mypy and drop type annotations which make mypy unhappy This also drops spurious copyright headers from those files if present. Related to T3812	16 December 2021, 14:47:56 UTC
e051b32	Nicolas Dandrimont	09 December 2021, 13:54:09 UTC	Use a temporary table to update scheduler metrics When using ``insert into <...> select <...>``, PostgreSQL disables parallel querying. Under some circumstances (in our large production database), this makes updating the scheduler metrics take a (very) long time. Parallel querying is allowed for ``create table <...> as select <...>``, and doing so restores the small(er) runtimes for this query (15 minutes instead of multiple hours). To use that, we have to turn the function into plpgsql instead of plain sql.	09 December 2021, 14:16:06 UTC
a8edbdb	Antoine R. Dumont (@ardumont)	07 December 2021, 13:31:34 UTC	Clean up disabled scheduler archival task related services This is dead code now as this has long been stopped and disabled in production. Related to T3777	08 December 2021, 10:12:53 UTC
0086f5a	Jenkins for Software Heritage	08 December 2021, 09:06:02 UTC	Updated debian changelog for version 0.22.0	08 December 2021, 09:06:02 UTC
bced01c	Jenkins for Software Heritage	08 December 2021, 09:05:45 UTC	Update upstream source from tag 'debian/upstream/0.22.0' Update to upstream version '0.22.0' with Debian dir 6ee09dd6732003e781781fea731b4a981ee1d0f1	08 December 2021, 09:05:45 UTC
10d495b	Jenkins for Software Heritage	08 December 2021, 09:05:44 UTC	New upstream version 0.22.0	08 December 2021, 09:05:44 UTC
5de8ba4	Nicolas Dandrimont	07 December 2021, 12:57:51 UTC	Make next_visit_queue_position an integer In visit types with small amounts of origins having no last_update field, we would end up overflowing Python datetimes (which only go up to 31 December 9999) pretty quickly. Making the queue position a 64-bit integer should give us some more leeway. The queue position now defaults to zero instead of an arbitrary point in time. Queue offsets are still commensurate with seconds, but that's mostly to give them some space to be splayed by the fudge factors.	07 December 2021, 16:39:48 UTC
c5e514f	Jenkins for Software Heritage	07 December 2021, 07:45:41 UTC	Updated debian changelog for version 0.21.0	07 December 2021, 07:45:41 UTC
bedb322	Jenkins for Software Heritage	07 December 2021, 07:45:40 UTC	Update upstream source from tag 'debian/upstream/0.21.0' Update to upstream version '0.21.0' with Debian dir 25bb4a20c7da58a06e1f9b407b51f2101d951472	07 December 2021, 07:45:40 UTC
a7851ad	Jenkins for Software Heritage	07 December 2021, 07:45:39 UTC	New upstream version 0.21.0	07 December 2021, 07:45:39 UTC
0a6aac5	Vincent SELLIER	06 December 2021, 15:23:49 UTC	Ensure there are no duplicated origins in the insertion batches When a lister tries to insert duplicate origins in the same batch, the insertion fails because the "on conflict do update" instruction cannot handle duplicates in the same transaction. Related to T3769	06 December 2021, 20:11:40 UTC
377716e	Jenkins for Software Heritage	22 November 2021, 15:14:57 UTC	Updated debian changelog for version 0.20.0	22 November 2021, 15:14:57 UTC
3a9bdba	Jenkins for Software Heritage	22 November 2021, 15:14:56 UTC	Update upstream source from tag 'debian/upstream/0.20.0' Update to upstream version '0.20.0' with Debian dir 648a27f772aa18cbaeb88421ad44cfbf18517068	22 November 2021, 15:14:56 UTC
a42dbf8	Jenkins for Software Heritage	22 November 2021, 15:14:55 UTC	New upstream version 0.20.0	22 November 2021, 15:14:55 UTC
2abb393	Valentin Lorentz	22 November 2021, 12:32:20 UTC	Fix CardinalityViolation in grab_next_visits on duplicate origins grab_next_visits grabs from `listed_origins`, whose primary key is `(lister_id, url, visit_type)` and uses it to upsert in origin_visit_stats, whose primary key is `(url, visit_type)`. This causes the error `ON CONFLICT DO UPDATE command cannot affect row a second time` when the same (origin, type) pair is grabbed twice. This commit deduplicates the (origin, type) pairs before upserting.	22 November 2021, 12:36:24 UTC
00ff02e	Nicolas Dandrimont	29 October 2021, 13:58:31 UTC	recurrent visits: use policy weights instead of ratios The ratios weren't checked for normalization; using relative weights explicitly ensures that the settings won't be misinterpreted.	29 October 2021, 13:58:31 UTC
7f434c3	Nicolas Dandrimont	29 October 2021, 13:44:56 UTC	Improve docs rendering for recurrent visits scheduler	29 October 2021, 13:44:56 UTC
83d25c5	Jenkins for Software Heritage	28 October 2021, 11:15:10 UTC	Updated debian changelog for version 0.19.0	28 October 2021, 11:15:10 UTC
e78e56d	Jenkins for Software Heritage	28 October 2021, 11:15:10 UTC	Update upstream source from tag 'debian/upstream/0.19.0' Update to upstream version '0.19.0' with Debian dir 25276af0f39eed8f7341a27f86fd12f2168c0212	28 October 2021, 11:15:10 UTC
ae3c3c6	Jenkins for Software Heritage	28 October 2021, 11:15:09 UTC	New upstream version 0.19.0	28 October 2021, 11:15:09 UTC
50d7fd7	Nicolas Dandrimont	27 October 2021, 10:09:42 UTC	Add a new cli endpoint to schedule recurrent visits in Celery For each known visit type, we run a loop which: - monitors the size of the relevant celery queue - schedules more visits of the relevant type once the number of available slots goes over a given threshold (currently set to 5% of the max queue size). The scheduling of visits combines multiple scheduling policies, for now using static ratios set in the `POLICY_RATIOS` dict. We emit a warning if the ratio of origins fetched for each policy is skewed with respect to the original request (allowing, for now, manual adjustment of the ratios). The CLI endpoint spawns one thread for each visit type, which all handle connections to RabbitMQ and the scheduler backend separately. For now, we handle exceptions in the visit scheduling threads by (stupidly) respawning the relevant thread directly. We should probably improve this to give up after a specific number of tries. Co-authored-by: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>	28 October 2021, 11:06:56 UTC
0c7ef27	Nicolas Dandrimont	27 October 2021, 13:45:09 UTC	grab_next_visits: avoid time interval calculations in PostgreSQL When the database is in a non-UTC timezone with DST, and a `timestamptz - interval` calculation crosses a DST change, the result of the calculation can be one hour off from the expected value: PostgreSQL will vary the timestamp by the amount of days in the interval, and will keep the same (local) time, which will be offset by an hour because of the DST change. Doing the datetime +- timedelta calculations in Python instead of PostgreSQL avoids this caveat altogether.	27 October 2021, 13:45:09 UTC
ecc0e28	Antoine R. Dumont (@ardumont)	22 October 2021, 08:44:08 UTC	Restrict the click version to avoid a version conflict with celery's Otherwise, in some edge cases, like running in docker, the install fails on conflict. Related to P1205#8092	22 October 2021, 09:21:36 UTC
243a69f	Antoine R. Dumont (@ardumont)	20 October 2021, 08:42:34 UTC	Add docstring to runner and listener modules Related to T3667	20 October 2021, 09:25:38 UTC
5b53196	Antoine R. Dumont (@ardumont)	20 October 2021, 08:44:34 UTC	Drop deprecated listener module It's been deprecated for enough time. Related to T3667	20 October 2021, 09:02:11 UTC
f15c510	Antoine R. Dumont (@ardumont)	20 October 2021, 07:36:27 UTC	scheduler: Deprecate unused main celery runner	20 October 2021, 08:31:28 UTC
628c955	Jenkins for Software Heritage	18 October 2021, 13:18:57 UTC	Updated debian changelog for version 0.18.2	18 October 2021, 13:18:57 UTC
9b989a8	Jenkins for Software Heritage	18 October 2021, 13:18:54 UTC	Update upstream source from tag 'debian/upstream/0.18.2' Update to upstream version '0.18.2' with Debian dir 064dff3f140dad4fe33f0893b48748460076b59a	18 October 2021, 13:18:54 UTC
4241b21	Jenkins for Software Heritage	18 October 2021, 13:18:53 UTC	New upstream version 0.18.2	18 October 2021, 13:18:53 UTC
3aed688	Antoine R. Dumont (@ardumont)	18 October 2021, 12:14:02 UTC	Use swh_storage fixture for cli tests This actually fixes the debian build failure. Related to T3666	18 October 2021, 12:16:38 UTC
ea19f6e	Jenkins for Software Heritage	15 October 2021, 13:53:38 UTC	Updated debian changelog for version 0.18.1	15 October 2021, 13:53:38 UTC
22cdd27	Jenkins for Software Heritage	15 October 2021, 13:53:35 UTC	New upstream version 0.18.1	15 October 2021, 13:53:35 UTC
34b60f4	Jenkins for Software Heritage	15 October 2021, 13:53:35 UTC	Update upstream source from tag 'debian/upstream/0.18.1' Update to upstream version '0.18.1' with Debian dir 33b5f1325f7a752172f38a90c9be255f3bac2402	15 October 2021, 13:53:35 UTC
3aed7bf	Antoine R. Dumont (@ardumont)	15 October 2021, 13:14:12 UTC	Return 0 slots if no more slots are available in the queues This scenario happens with the loader oneshot, for example, which deals with more than one type of origin to ingest in the same queue. The function could therefore compute a negative value, which cannot be used as an SQL LIMIT [1]. This commit fixes that behavior. It also makes explicit in the docstring that the function must return non-negative values. [1] ``` ... psycopg2.errors.InvalidRowCountInLimitClause: LIMIT must not be negative ```	15 October 2021, 13:22:52 UTC
1a70fd6	Jenkins for Software Heritage	02 September 2021, 09:35:32 UTC	Updated debian changelog for version 0.18.0	02 September 2021, 09:35:32 UTC
22baf3f	Jenkins for Software Heritage	02 September 2021, 09:35:31 UTC	Update upstream source from tag 'debian/upstream/0.18.0' Update to upstream version '0.18.0' with Debian dir e98b6b91a4c915d0ad9f6ee1286273ed99ee3b5b	02 September 2021, 09:35:31 UTC
66bf492	Jenkins for Software Heritage	02 September 2021, 09:35:31 UTC	New upstream version 0.18.0	02 September 2021, 09:35:31 UTC
ecc1400	Antoine R. Dumont (@ardumont)	08 June 2021, 15:36:28 UTC	runner: Improve help message on the task types flag.	02 September 2021, 09:15:36 UTC
63fdda0	Antoine R. Dumont (@ardumont)	03 June 2021, 14:03:26 UTC	send-to-celery: Add more options to allow scheduling of edge cases In non-optimal cases, we may want to trigger specific visits (not-yet-enabled origins, origins from a specific lister...). Related to T3350	27 August 2021, 11:26:38 UTC
7cc37fa	Nicolas Dandrimont	01 June 2021, 17:17:16 UTC	Refine scheduling policy for origins with no known last update For origins that have never been visited, and for which we don't have a queue position yet, we want to visit them in the order they've been added.	26 August 2021, 14:49:37 UTC
2efad28	Nicolas Dandrimont	01 June 2021, 18:04:11 UTC	Add a swh scheduler origin send-to-celery subcommand The subcommand bypasses the legacy task-based mechanism to directly send new origin visits to celery	26 August 2021, 14:48:46 UTC
5e8007f	Nicolas Dandrimont	01 June 2021, 13:48:05 UTC	Add table sampling option to grab_next_visits Running common operations on all git origins is pretty intense. Using table sampling gives us the opportunity to at least schedule some jobs in (decently small) time.	26 August 2021, 14:47:52 UTC
cc76a57	Antoine R. Dumont (@ardumont)	26 August 2021, 09:44:14 UTC	journal_client: Only upsert if we have something to upsert	26 August 2021, 09:44:14 UTC
4053937	Jenkins for Software Heritage	26 August 2021, 08:41:41 UTC	Updated debian changelog for version 0.17.1	26 August 2021, 08:41:41 UTC
c36a724	Jenkins for Software Heritage	26 August 2021, 08:41:40 UTC	Update upstream source from tag 'debian/upstream/0.17.1' Update to upstream version '0.17.1' with Debian dir e98b6f8fc3c8547ef7148fd0b9915f432584ab81	26 August 2021, 08:41:40 UTC
d04dbb3	Jenkins for Software Heritage	26 August 2021, 08:41:40 UTC	New upstream version 0.17.1	26 August 2021, 08:41:40 UTC
506f78c	Antoine R. Dumont (@ardumont)	25 August 2021, 16:15:06 UTC	journal_client: Ensure queue position does not overflow Queue positions are dates, and the next_position_offset used to compute the new queue position was not bounded, which could cause overflow errors. This commit adapts the journal client computations to limit next_position_offset to 10. This value was chosen because above that exponent the dates overflow (and we are way in the future already). Related to T3502	26 August 2021, 08:24:11 UTC
28ae1d8	Valentin Lorentz	18 August 2021, 09:20:25 UTC	Replace index-fossology-license-for-range with index-fossology-license-for-partition We changed the task name/interface a while ago	18 August 2021, 09:20:25 UTC
fa762c5	Jenkins for Software Heritage	06 August 2021, 09:11:54 UTC	Updated debian changelog for version 0.17.0	06 August 2021, 09:11:54 UTC
416b2c5	Jenkins for Software Heritage	06 August 2021, 09:11:54 UTC	Update upstream source from tag 'debian/upstream/0.17.0' Update to upstream version '0.17.0' with Debian dir c55959218dbe52af5f47cb3541419dfb69d77945	06 August 2021, 09:11:54 UTC
3c61059	Jenkins for Software Heritage	06 August 2021, 09:11:53 UTC	New upstream version 0.17.0	06 August 2021, 09:11:53 UTC
8281e35	Antoine R. Dumont (@ardumont)	08 July 2021, 09:24:42 UTC	journal_client: Disable origins when too many visit attempts failed This disables origins after 3 failed or not-found visit attempts in a row. It is not definitive though, as it is the lister's responsibility to re-enable origins if they get listed again. Related to T2345	03 August 2021, 11:56:32 UTC
1bcf84d	David Douard	07 July 2021, 14:55:57 UTC	Add a successive_visits counter to origin visit stats This maintains the number of successive visits resulting in the same status. This will help implementing disabling of too many successive failed or not_found visits for a given origin. Related to T2345	03 August 2021, 10:49:45 UTC
4fa29fe	Antoine R. Dumont (@ardumont)	30 July 2021, 13:35:17 UTC	journal_client: Update get_last_status docstring Related to T2345	30 July 2021, 13:35:17 UTC
3b929d0	Antoine R. Dumont (@ardumont)	30 July 2021, 13:23:14 UTC	journal_client: Refactor by inlining the update_position_offset This is no longer required as it's called once. Related to T2345	30 July 2021, 13:23:14 UTC
87e66fa	Nicolas Dandrimont	23 July 2021, 09:48:23 UTC	Only record last_visited and last_successful in origin_visit_stats After using this schema for a while, all queries can be implemented in terms of these two timestamps, instead of the four original last_eventful, last_uneventful, last_failed and last_notfound timestamps. This ends up simplifying the logic within the journal client, as well as that of the grab_next_visits query builder. To make this change work, we also stop considering out of order messages altogether in journal_client. This welcome simplification is an accuracy tradeoff that is explained in the updated documentation of the journal client: .. [1] Ignoring out of order messages makes the initialization of the origin_visit_status table (from a full journal) less deterministic: only the `last_visit`, `last_visit_state` and `last_successful` fields are guaranteed to be exact, the `next_position_offset` field is a best effort estimate (which should converge once the client has run for a while on in-order messages).	23 July 2021, 09:56:32 UTC
3ca0d65	Antoine R. Dumont (@ardumont)	23 July 2021, 07:22:46 UTC	test_journal_client: Unify test assertion like the rest Related to D5917	23 July 2021, 07:22:46 UTC
8cf2238	Antoine R. Dumont (@ardumont)	22 July 2021, 09:42:24 UTC	test: Refactor assert_visit_stats_ok to ignore_fields This simplifies and unifies the test utility function used to compare visit stats.	23 July 2021, 07:18:20 UTC
d58776a	Antoine R. Dumont (@ardumont)	22 July 2021, 10:22:24 UTC	Introduce new scheduling policy to grab origins without last update This is in charge of scheduling origins without last update. This also updates the global queue position so the journal client can initialize correctly the next position per origin and visit type. Related to T2345	22 July 2021, 10:23:44 UTC
825e8cf	Nicolas Dandrimont	22 July 2021, 10:19:42 UTC	grab_next_visits: make the handling of CTEs more modular This allows us to insert extra CTEs if a scheduling policy needs it.	22 July 2021, 10:19:42 UTC
8c4ae9f	Antoine R. Dumont (@ardumont)	29 June 2021, 14:00:01 UTC	journal_client: Compute next position for origin visit For origins without any last_update information [1], the journal client is now also in charge of moving their next position in the queue for rescheduling. Depending on their status, the next position offset and next_visit_queue_position are updated after each visit completes: - if the visit has failed, increase the next visit target by the minimal visit interval (to take into account transient loading issues) - if the visit is successful, and records some changes, decrease the visit interval index by 2 (visit the origin way more often). - if the visit is successful, and records no changes, increase the visit interval index by 1 (visit the origin less often). We then set the next visit target to its current value + the new visit interval multiplied by a random fudge factor (picked in the -/+ 10% range). The fudge factor allows the visits to spread out, avoiding "bursts" of loaded origins e.g. when a number of origins from a single hoster are processed at once. Note that the computations happen for all origins for simplicity and code maintenance, but the result will only be used by a soon-to-be scheduling policy. [1] The lister cannot provide it for some reason.	06 July 2021, 12:35:13 UTC
cb1edf1	Antoine R. Dumont (@ardumont)	23 June 2021, 16:07:59 UTC	Introduce storage for the recurrent visit scheduler queue position	01 July 2021, 08:36:44 UTC
ec6e69f	Antoine R. Dumont (@ardumont)	23 June 2021, 14:42:26 UTC	Start handling of recurrent loading tasks in scheduler This deals first and foremost with the next_position_offset update done by the scheduler journal client.	01 July 2021, 08:36:44 UTC
c486b28	Antoine R. Dumont (@ardumont)	29 June 2021, 12:41:07 UTC	journal_client: Explicit docstring	29 June 2021, 13:16:15 UTC
98f99b9	Antoine R. Dumont (@ardumont)	23 June 2021, 14:39:40 UTC	journal_client: Only check last_* fields for some permutation tests In a future commit, we will add new fields whose values will be permutation dependent.	23 June 2021, 15:02:34 UTC
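
Commits a76bb02 and 00ff02e describe scheduling policies configured with relative weights instead of ratios. The following is a minimal sketch, not the scheduler's actual code, of how such weights could be turned into per-policy visit counts. `split_by_weight` is a hypothetical helper, the policy dicts mirror the example configuration quoted in a76bb02, and the sketch ignores `tablesample`.

```python
from typing import Dict, List


def split_by_weight(policies: List[Dict], num_visits: int) -> Dict[str, int]:
    """Split num_visits among policies in proportion to their relative weights.

    Hypothetical helper: weights only matter relative to each other,
    so they never need to sum to 1 (or to 100).
    """
    total = sum(p["weight"] for p in policies)
    if total <= 0:
        raise ValueError("at least one policy needs a positive weight")
    # Integer floor of each proportional share
    counts = {p["policy"]: num_visits * p["weight"] // total for p in policies}
    # Visits lost to rounding down go to the first positively weighted policies
    leftover = num_visits - sum(counts.values())
    for q in [q for q in policies if q["weight"] > 0][:leftover]:
        counts[q["policy"]] += 1
    return counts


# Policies for the "git" visit type, from the example configuration in a76bb02
git_policies = [
    {"policy": "already_visited_order_by_lag", "weight": 55, "tablesample": 0.5},
    {"policy": "never_visited_oldest_update_first", "weight": 45, "tablesample": 0.5},
]
print(split_by_weight(git_policies, 10))
# -> {'already_visited_order_by_lag': 6, 'never_visited_oldest_update_first': 4}
```

Because the weights are relative, 55/45 and 11/9 give exactly the same split. Handing rounding leftovers to the first policies in configuration order is one arbitrary choice among several.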
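
Commits 0a6aac5 and 2abb393 both work around the same PostgreSQL restriction: an `INSERT ... ON CONFLICT DO UPDATE` statement fails with "command cannot affect row a second time" when two of its input rows target the same key. Below is a minimal sketch of deduplicating rows before the upsert. The key is `(url, visit_type)`, the primary key of `origin_visit_stats` according to 2abb393; for the lister batches of 0a6aac5 the full `listed_origins` key would be used instead. The row shape, the function name and the keep-the-first-row rule are assumptions, not taken from the scheduler code.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape: (lister_id, url, visit_type)
Row = Tuple[int, str, str]


def dedup_by_origin(rows: Iterable[Row]) -> List[Row]:
    """Keep only the first row seen for each (url, visit_type) pair.

    Two rows sharing that key would make a single
    INSERT ... ON CONFLICT DO UPDATE statement hit the same row twice.
    """
    unique: Dict[Tuple[str, str], Row] = {}
    for row in rows:
        _, url, visit_type = row
        unique.setdefault((url, visit_type), row)  # first occurrence wins
    return list(unique.values())  # dicts keep insertion order


rows = [
    (1, "https://example.org/a.git", "git"),
    (2, "https://example.org/a.git", "git"),  # same origin seen by another lister
    (1, "https://example.org/a.git", "hg"),
]
# -> [(1, 'https://example.org/a.git', 'git'), (1, 'https://example.org/a.git', 'hg')]
print(dedup_by_origin(rows))
```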
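
Commits 50d7fd7 and 3aed7bf describe how many new visits the recurrent scheduler may push to a queue. The sketch below, whose function names are hypothetical, clamps the free slot count at zero (a queue shared by several origin types can hold more messages than the maximum for one visit type) and only schedules once the free slots exceed 5% of the maximum queue size.

```python
def available_slots(max_queue_size: int, current_queue_size: int) -> int:
    """Free slots left in a queue, clamped at zero.

    Without the clamp, an over-full shared queue yields a negative number,
    which PostgreSQL rejects as a LIMIT value.
    """
    return max(0, max_queue_size - current_queue_size)


def should_schedule(
    max_queue_size: int, current_queue_size: int, threshold_ratio: float = 0.05
) -> bool:
    """Schedule more visits only once free slots exceed threshold_ratio * max size."""
    free = available_slots(max_queue_size, current_queue_size)
    return free > threshold_ratio * max_queue_size


print(available_slots(10_000, 12_000))  # -> 0
print(should_schedule(10_000, 9_600))  # -> False (400 free, threshold 500)
print(should_schedule(10_000, 9_000))  # -> True (1000 free)
```

Waiting for a minimum number of free slots keeps the scheduler from issuing many tiny batches, each of which costs a scheduler backend query.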
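
Commits 87e66fa, 1bcf84d and 8281e35 together describe how the journal client maintains per-origin visit stats: out-of-order messages are ignored, a `successive_visits` counter tracks runs of the same status, and an origin is disabled after 3 failed or not-found visits in a row. The following minimal sketch models that behavior on an in-memory object. The class, its field subset (including keeping the `enabled` flag next to the stats) and the status strings are hypothetical stand-ins for the `origin_visit_stats` table, not its real schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

DISABLE_AFTER = 3  # successive failed/not_found visits before disabling (8281e35)


@dataclass
class VisitStats:
    """Hypothetical in-memory stand-in for one origin_visit_stats row."""

    last_visited: Optional[datetime] = None
    last_successful: Optional[datetime] = None
    last_visit_status: Optional[str] = None
    successive_visits: int = 0
    enabled: bool = True


def apply_visit_status(stats: VisitStats, date: datetime, status: str) -> VisitStats:
    """Update stats in place from one visit status message and return them.

    status is one of the hypothetical labels "successful", "failed", "not_found".
    """
    if stats.last_visited is not None and date <= stats.last_visited:
        return stats  # out-of-order (or duplicate) message: ignored
    if status == stats.last_visit_status:
        stats.successive_visits += 1
    else:
        stats.successive_visits = 1
    stats.last_visit_status = status
    stats.last_visited = date
    if status == "successful":
        stats.last_successful = date
    if status in ("failed", "not_found") and stats.successive_visits >= DISABLE_AFTER:
        # Not definitive: the lister may enable the origin again later
        stats.enabled = False
    return stats


stats = VisitStats()
for day in (1, 2, 3):
    apply_visit_status(stats, datetime(2021, 7, day), "not_found")
print(stats.successive_visits, stats.enabled)  # -> 3 False
```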
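
Commit 8c4ae9f describes how the next queue position is computed once a visit completes, 506f78c caps `next_position_offset` at 10, and 5de8ba4 turns positions into integers commensurate with seconds. This is a minimal sketch of those rules under stated assumptions. The mapping from interval index to interval (2**index days), the lower bound of 0, the outcome labels, and applying the fudge factor only to successful visits are all hypothetical choices, not the scheduler's implementation.

```python
import random
from typing import Tuple

DAY = 24 * 60 * 60  # positions are integers commensurate with seconds (5de8ba4)
MIN_OFFSET = 0  # assumption: the interval index never goes below 0
MAX_OFFSET = 10  # cap introduced in 506f78c


def interval_seconds(offset: int) -> int:
    """Hypothetical interval for a given index: 2**offset days, in seconds."""
    return DAY * 2**offset


def next_position(
    position: int, offset: int, outcome: str, rng: random.Random
) -> Tuple[int, int]:
    """Return (new_queue_position, new_offset) once a visit completes.

    outcome is one of the hypothetical labels "failed", "eventful"
    (successful, with changes) or "uneventful" (successful, no changes).
    """
    if outcome == "failed":
        # Likely a transient loading issue: retry after the minimal interval
        return position + interval_seconds(MIN_OFFSET), offset
    if outcome == "eventful":
        offset = max(MIN_OFFSET, offset - 2)  # changes found: visit more often
    elif outcome == "uneventful":
        offset = min(MAX_OFFSET, offset + 1)  # nothing new: visit less often
    else:
        raise ValueError(f"unknown outcome: {outcome!r}")
    # Random fudge factor in [0.9, 1.1] spreads visits out and avoids bursts
    fudge = rng.uniform(0.9, 1.1)
    return position + int(interval_seconds(offset) * fudge), offset


rng = random.Random(42)
print(next_position(0, 3, "failed", rng))  # -> (86400, 3)
```

Passing an explicit `random.Random` instance keeps the fudge factor reproducible in tests, and plain integers avoid the `datetime` year-9999 ceiling that 5de8ba4 mentions.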
