1006f0a | Antoine R. Dumont (@ardumont) | 23 June 2021, 13:37:40 UTC | journal_client: Auto-generate the empty object from model fields This will help us when adding new fields to the table. | 23 June 2021, 14:54:34 UTC |
6400cc2 | Antoine R. Dumont (@ardumont) | 23 June 2021, 13:28:55 UTC | backend: Auto-generate origin visit stats upsert query This will help us when adding new fields to the table. | 23 June 2021, 14:54:34 UTC |
3762c34 | Antoine R. Dumont (@ardumont) | 23 June 2021, 14:49:12 UTC | cli/task: Ensure cli output is always in the same order | 23 June 2021, 14:54:34 UTC |
ed81870 | Nicolas Dandrimont | 01 June 2021, 13:47:19 UTC | Add a specific cooldown for notfound origins This allows us to avoid repeating visits on them, until a next pass of the lister can mark them as disabled. | 23 June 2021, 09:13:00 UTC |
651ddcc | Nicolas Dandrimont | 21 June 2021, 15:34:00 UTC | Add a (longer) specific cooldown for failed origin visits | 23 June 2021, 09:13:00 UTC |
ce8608d | Nicolas Dandrimont | 21 June 2021, 15:36:00 UTC | Make the origin visit scheduling cooldown configurable | 23 June 2021, 09:13:00 UTC |
7f51f27 | Antoine Lambert | 22 June 2021, 12:35:50 UTC | interface: Add get_listers method Add new method to scheduler interface returning the full list of listers registered in the database. Related to T3127 | 22 June 2021, 12:36:08 UTC |
9e1b414 | Nicolas Dandrimont | 01 June 2021, 13:43:32 UTC | Drop duplicate docstring from backend | 21 June 2021, 13:46:12 UTC |
c7707b5 | Antoine R. Dumont (@ardumont) | 08 June 2021, 15:36:28 UTC | runner: Separate scheduling tasks with and without priority concern In effect, this will allow running 2 runners: - one for recurring tasks - one for the save code now This should decrease the probability of the save code now scheduling tasks being stuck behind the main scheduler runner. Related to T3367 | 10 June 2021, 12:55:04 UTC |
21c4279 | Antoine R. Dumont (@ardumont) | 10 June 2021, 10:15:22 UTC | Refactor and extract a get_available_slots utility This adds coverage as well. This will be needed for subsidiary diffs. Related to T3367 | 10 June 2021, 10:15:22 UTC |
9d2618d | Antoine R. Dumont (@ardumont) | 09 June 2021, 10:29:38 UTC | Add typing stubs dependencies for mypy>0.900 This also makes missing dependencies explicit | 09 June 2021, 12:13:36 UTC |
9f7ab8f | Antoine Lambert | 25 May 2021, 10:15:04 UTC | pytest_plugin: Explicitly set hostname in broker_url for celery TestApp Since the release of kombu 5.1.0, a warning is now issued when a hostname is not set in the broker_url config value of a celery app. That change makes the test_celery_monitor_ping test fail due to that new unexpected warning. So explicitly add localhost hostname in the broker_url value of the celery TestApp config. | 25 May 2021, 11:43:03 UTC |
fe9d949 | Valentin Lorentz | 06 May 2021, 14:16:50 UTC | Fix flaky test_grab_ready_* tests | 06 May 2021, 14:20:57 UTC |
8a892e2 | Valentin Lorentz | 06 May 2021, 12:24:47 UTC | Use swh.core 0.14 It renamed db_name to dbname, which is a breaking change. | 06 May 2021, 13:49:47 UTC |
bab557e | Nicolas Dandrimont | 26 April 2021, 10:55:00 UTC | Remove row locking from SQL queries This would only be useful if we had multiple runners running concurrently, but that's not the case. | 30 April 2021, 18:13:38 UTC |
feff179 | Antoine Lambert | 26 April 2021, 16:01:59 UTC | tox: Add sphinx environments to check sane doc build Allows checking that the package documentation can be built without producing sphinx warnings. The sphinx environment is designed to be used in continuous integration in order to prevent breaking the documentation build when committing changes. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258 | 26 April 2021, 16:01:59 UTC |
f186910 | Antoine R. Dumont (@ardumont) | 20 April 2021, 13:47:08 UTC | Add default index to task(type, next_run) in schema The staging scheduler runner was slow when fetching tasks due to that missing index. Related to T3271#63831 | 20 April 2021, 13:50:19 UTC |
f33f743 | Valentin Lorentz | 20 April 2021, 09:46:53 UTC | Simplify priority computation in tests + improve exhaustivity We no longer need to deal with ratios, so let's count the objects directly instead. Plus, the existing tests did not check tasks with None priority (because they did not have access to it when ratios were given by the backend), so they do now. | 20 April 2021, 11:01:33 UTC |
f4e6292 | Antoine R. Dumont (@ardumont) | 20 April 2021, 10:18:23 UTC | sql/updates/27: Fix sql upgrade script Related to T3271 | 20 April 2021, 10:18:23 UTC |
befccb9 | Antoine R. Dumont (@ardumont) | 19 April 2021, 10:26:58 UTC | scheduler: Clean up priority/ratio task dead code Since [1], tasks with priority are routed to dedicated queues (see tasks for more details). The tasks with priority to be scheduled have their own dedicated endpoints to be called. [1] Related to T3084 Related to T3271 | 20 April 2021, 09:27:18 UTC |
4e06bcd | Valentin Lorentz | 20 April 2021, 08:59:53 UTC | Parse task_ids before calling set_status_tasks. So errors on the CLI side do not trigger an exception on the server | 20 April 2021, 09:19:52 UTC |
974c0c2 | Antoine R. Dumont (@ardumont) | 15 April 2021, 11:22:11 UTC | tests: Complete checks on message with priority consumption Related to T3084 | 15 April 2021, 12:57:25 UTC |
17052c4 | Antoine R. Dumont (@ardumont) | 14 April 2021, 10:49:47 UTC | Route priority tasks to dedicated save code now queues This splits the calls to read tasks into 2 calls, one for tasks with no priority (standard), another call for tasks with priority. If any tasks with priority are detected, they are routed to dedicated `save_code_now:` prefixed named queues (per task type). Related to T3084 | 15 April 2021, 11:24:13 UTC |
bfc1a87 | Valentin Lorentz | 13 April 2021, 19:51:42 UTC | Fix various Sphinx warnings | 15 April 2021, 08:19:50 UTC |
3e2ae3d | Antoine R. Dumont (@ardumont) | 13 April 2021, 15:16:37 UTC | backend: Open endpoints to peek/grab tasks with any priority The priority notion becomes a blur. Any task with a non-null priority is considered for reading or grabbing. In a future commit, this should allow the runner to evolve to reroute tasks with priority to other queues. Related to T3084 | 13 April 2021, 16:05:29 UTC |
ecab745 | Nicolas Dandrimont | 11 February 2021, 18:16:19 UTC | Make origin_visit_stats_get return results from all pages psycopg2.extras.execute_values executes queries in batches of 100 by default. At the end of execute_values, only the last batch of results is available in the cursor. To fetch all results, one needs to set fetch=True instead of using the cursor. | 11 February 2021, 18:39:29 UTC |
86ada44 | Nicolas Dandrimont | 09 February 2021, 09:36:23 UTC | journal client: Filter out status messages without type This allows us to support reading the journal from the beginning, ignoring messages with the old schema. | 11 February 2021, 18:38:44 UTC |
cdb1775 | Nicolas Dandrimont | 09 February 2021, 09:33:19 UTC | Simplify max_date() The built-in `max` function can take an iterable directly, no need to reimplement it. | 11 February 2021, 18:24:01 UTC |
cf32e37 | Vincent SELLIER | 09 February 2021, 13:56:56 UTC | journal_client: Fix date computations for (un)eventful visits Fix a wrong computation when several messages (>=3) for the same snapshot are received in the wrong order. For example, before the fix, the following occurs: ``` | date | snapshot | | last_ev | last_unev | Snap | | ---- | -------- | --- | -------- | --------- | ---- | | 2022 | S2 | | 2022 | | S2 | | 2020 | S2 | | 2020 | 2022 | S2 | | 2021 | S2 | | **2021** | **2020** | S2 | ``` whereas it should be: ``` | date | snapshot | | last_ev | last_unev | Snap | | ---- | -------- | --- | -------- | --------- | ---- | | 2022 | S2 | | 2022 | | S2 | | 2020 | S2 | | 2020 | 2022 | S2 | | 2021 | S2 | | **2020** | **2022** | S2 | ``` Related to T3000 | 09 February 2021, 17:10:46 UTC |
aa507ac | Antoine R. Dumont (@ardumont) | 05 February 2021, 13:31:26 UTC | journal_client: Deal with failed status message As loaders will start to create failed status messages, deal with them if any. Related to T3030 | 05 February 2021, 14:06:48 UTC |
14feab9 | Nicolas Dandrimont | 03 February 2021, 18:51:30 UTC | celery: acknowledge tasks as soon as they're received With late acknowledgements, RabbitMQ will re-send tasks to clients even if they can't ever complete the task (e.g. when the task gets killed because the machine is out of memory). This problem only increases over time, leading to complete starvation of the ingestion system. Now that we have multiple mechanisms to issue retries of tasks, we can use early acknowledgements for tasks instead, which should mitigate the ongoing starvation, at the expense of having to retry tasks externally. | 03 February 2021, 19:10:26 UTC |
aaffff2 | David Douard | 22 January 2021, 11:17:00 UTC | Simulator: allow exporting results to a csv file | 01 February 2021, 14:37:31 UTC |
9fce3f6 | David Douard | 01 February 2021, 14:36:16 UTC | Add minimal tests for the SimulationReport.format() method | 01 February 2021, 14:37:31 UTC |
aaf7dd6 | David Douard | 22 January 2021, 11:15:47 UTC | Make plottings optional in simulator cli output | 29 January 2021, 15:00:36 UTC |
cf0583b | Nicolas Dandrimont | 21 January 2021, 16:55:43 UTC | simulator: stop validating the scheduling policy in the CLI We already do that in the scheduler backend function | 26 January 2021, 12:33:16 UTC |
ebb5847 | Nicolas Dandrimont | 21 January 2021, 16:55:16 UTC | Run simulator tests on all known scheduling policies | 26 January 2021, 12:33:05 UTC |
1f77521 | Nicolas Dandrimont | 21 January 2021, 16:48:38 UTC | simulator: record visit metrics alongside scheduler metrics This allows us to check the behavior of the archive over time in terms of number of visits. | 26 January 2021, 12:32:54 UTC |
8898394 | Nicolas Dandrimont | 21 January 2021, 16:45:23 UTC | simulator: stop using the database as a cache for origin data This was a significant bottleneck of the simulator. To work around this, we: - Generate snapshot ids consistently in the OriginModel - Cache the origin data locally in the simulator, to compute the eventfulness of visits - Cache the last visit time for all origins to compute the estimated run time of visit tasks. | 26 January 2021, 12:31:57 UTC |
c92ead5 | Nicolas Dandrimont | 21 January 2021, 16:31:43 UTC | grab_next_visits: don't re-schedule visits too fast The earlier implementation would just schedule new visits for origins forever, regardless of whether they were already scheduled or not. | 26 January 2021, 12:20:39 UTC |
2b39cbc | Nicolas Dandrimont | 21 January 2021, 16:29:45 UTC | Allow overriding the timestamp of grab_next_visits This makes the simulator behavior more consistent with reality. | 26 January 2021, 12:20:39 UTC |
7ffbdd1 | Nicolas Dandrimont | 21 January 2021, 16:27:40 UTC | Construct grab_next_visits query arguments incrementally | 26 January 2021, 12:20:39 UTC |
ea068b4 | Valentin Lorentz | 21 January 2021, 13:57:42 UTC | simulator: add simple lister simulation | 26 January 2021, 12:20:39 UTC |
7af98e2 | Valentin Lorentz | 21 January 2021, 13:54:53 UTC | Factor out ListedOrigin generation to use the OriginModel This generates consistent last_update values according to the model and simulated time. | 25 January 2021, 13:39:30 UTC |
2906b4e | Antoine Lambert | 22 January 2021, 10:24:49 UTC | model/ListedOrigin: Set extra_loader_arguments type to Dict[str, Any] Some loaders, for instance the debian one, can have non string arguments so change the extra_loader_arguments type of the ListedOrigin model to something more generic. Related to T2979 | 25 January 2021, 13:10:25 UTC |
3d13cda | Vincent SELLIER | 21 January 2021, 17:49:38 UTC | Solve uneventful/eventful with unordered messages with snapshots Fix the case: m1: date2/snapshot1 m2: date1/snapshot1 which results in: last_eventful = date2 last_uneventful = date2 The upsert was always keeping the most recent date when the eventful/uneventful dates were switched Related to T2978 | 23 January 2021, 18:57:17 UTC |
d528998 | Vincent SELLIER | 21 January 2021, 15:15:26 UTC | Do not consider duplicated messages as uneventful event Avoid copying the eventful date to the uneventful date when a duplicated message (same date/same snapshot) is received, related to T2978 | 23 January 2021, 18:57:17 UTC |
86b2555 | David Douard | 21 January 2021, 10:30:21 UTC | Add a --num-origins option to the fill-test-data cli command | 22 January 2021, 13:10:59 UTC |
abb513c | David Douard | 22 January 2021, 09:47:18 UTC | Simulation: log at info level recorded metrics This allows following what the simulation is doing. | 22 January 2021, 13:08:30 UTC |
b93aa5b | Valentin Lorentz | 21 January 2021, 12:01:53 UTC | Make PaginatedListedOriginList a concretization of PagedResult 1. consistent with swh-storage and swh-indexer-storage 2. we can use swh.core.api.classes.stream_results on scheduler.get_listed_origins. | 21 January 2021, 13:26:39 UTC |
2f47936 | Nicolas Dandrimont | 20 January 2021, 16:29:16 UTC | Add scheduling policy for already visited origins with known last update This policy schedules origins by decreasing order of "visit lag" (that is, origins with the most lag are scheduled first). | 21 January 2021, 12:02:39 UTC |
acad712 | Nicolas Dandrimont | 20 January 2021, 16:25:46 UTC | Add scheduling policy for never visited origins This policy orders never visited origins by increasing date of last update (scheduling the "oldest" never visited origins first). | 21 January 2021, 12:02:39 UTC |
0346020 | Nicolas Dandrimont | 20 January 2021, 16:23:03 UTC | Reorganize grab_next_visits tests to better check sorting behavior - factor out test setup and results checking - properly exercise corner cases of the oldest_scheduled_first policy | 21 January 2021, 12:02:39 UTC |
af37898 | Valentin Lorentz | 21 January 2021, 11:04:55 UTC | Run Black. It wasn't run on d464b4cc1f9ae6a5c5c94a534826eff5cc27f12f. | 21 January 2021, 11:04:55 UTC |
b641ac8 | Nicolas Dandrimont | 20 January 2021, 16:17:17 UTC | Make the grab_next_visits sql query modular This will allow us to easily plug new scheduling policies in that function. | 21 January 2021, 10:32:33 UTC |
9fb0dd6 | Antoine R. Dumont (@ardumont) | 19 January 2021, 17:45:36 UTC | journal_client: Read visit_stats entries by batch out of the loop Related to T2967 | 21 January 2021, 09:53:48 UTC |
d464b4c | Antoine R. Dumont (@ardumont) | 19 January 2021, 17:12:33 UTC | scheduler: Make origin_visit_stats_get read multiple entries Related to T2967 | 21 January 2021, 09:53:46 UTC |
ffe2aed | David Douard | 20 January 2021, 11:32:40 UTC | Simplify journal client tests - sort visits by default (there is a test dedicated to dealing with unsorted messages from the journal), - remove "intermediate checks" in several tests: these do not help much but make the code more difficult to read and maintain, - rename VISIT_STATUSES1 as VISIT_STATUSES_1 to make it less prone to being confused with VISIT_STATUSES (which also exists). | 20 January 2021, 17:02:57 UTC |
c7b740c | David Douard | 20 January 2021, 16:39:37 UTC | Revert "Make sure swh.scheduler.cli.journal is loaded in test_cli_journal.py" This reverts commit b03d978241a67e741e0f62696a0bbca17d768271. It's actually not needed, after all... | 20 January 2021, 17:01:51 UTC |
898820f | Nicolas Dandrimont | 20 January 2021, 11:11:05 UTC | simulator: collect and plot scheduler metrics over time For now, only plot the known_origins and origins_never_visited metrics. | 20 January 2021, 16:37:44 UTC |
9ce68f8 | Valentin Lorentz | 19 January 2021, 17:36:53 UTC | simulator: stop using get_scheduler directly This reuses the scheduler instantiated by the cli instead of hardcoding our own using the PG* variables. | 20 January 2021, 16:37:44 UTC |
88e0b42 | Valentin Lorentz | 19 January 2021, 15:32:27 UTC | simulator: Add documentation. | 20 January 2021, 16:37:44 UTC |
62c6d90 | Valentin Lorentz | 19 January 2021, 15:17:24 UTC | simulator: Make min_batch_size a parameter defined in the setup. | 20 January 2021, 16:37:44 UTC |
9468bb9 | Nicolas Dandrimont | 18 January 2021, 12:51:35 UTC | simulator: add basic tests for fill_test_data and run | 20 January 2021, 16:37:44 UTC |
ead7b34 | Nicolas Dandrimont | 15 January 2021, 15:33:43 UTC | simulator: implement a simulator for the "old" task-based scheduler We extend the Task object with an autogenerated uuid allowing us to track the task lifetime between its creation and the generation of visit statuses, as the task-based scheduler does. | 20 January 2021, 16:37:44 UTC |
aecd27e | Nicolas Dandrimont | 15 January 2021, 15:31:42 UTC | Move the simulator cli to the main cli module | 20 January 2021, 16:37:44 UTC |
05067e3 | Nicolas Dandrimont | 15 January 2021, 14:37:59 UTC | simulator: Replace attrs with dataclasses for consistency | 20 January 2021, 16:37:44 UTC |
24922fe | Nicolas Dandrimont | 15 January 2021, 14:31:41 UTC | simulator: wrap tasks and task events in typechecked objects This allows us to extend these objects without redefining a bunch of type annotations. | 20 January 2021, 16:37:44 UTC |
d5318ae | Nicolas Dandrimont | 15 January 2021, 13:47:33 UTC | simulator: also fill data for the task-based scheduler | 20 January 2021, 16:37:44 UTC |
22ebb7a | Valentin Lorentz | 15 January 2021, 13:41:05 UTC | simulator: Split into smaller files in the same package | 20 January 2021, 16:37:44 UTC |
ad7bfbe | Nicolas Dandrimont | 15 January 2021, 11:50:00 UTC | simulator: Make the run time a CLI argument | 20 January 2021, 16:37:44 UTC |
df34db0 | Nicolas Dandrimont | 15 January 2021, 11:40:16 UTC | simulator: tweak simulation environment constants | 20 January 2021, 16:37:44 UTC |
21ce2c8 | Nicolas Dandrimont | 15 January 2021, 11:37:00 UTC | simulator: generate more origins in fill_data | 20 January 2021, 16:37:44 UTC |
2920419 | Nicolas Dandrimont | 15 January 2021, 11:35:01 UTC | simulator: add typing for Environment.scheduler | 20 January 2021, 16:37:44 UTC |
6433266 | Nicolas Dandrimont | 15 January 2021, 11:00:21 UTC | simulator: add support for a basic SimulationReport For now, this collects the runtime of tasks that have run, and gets printed at the end of the simulation. | 20 January 2021, 16:37:44 UTC |
c474a82 | Nicolas Dandrimont | 15 January 2021, 10:45:23 UTC | simulator: refine origin model to follow an exponential distribution This models origins using a consistent characteristic "time between commits" that follows an exponential distribution between 1 second and 10 years. From this characteristic time, and feedback from the OriginVisitStats, we can generate the expected run time and output status of the next visit of that origin. | 20 January 2021, 16:37:44 UTC |
2459bad | Nicolas Dandrimont | 15 January 2021, 10:43:20 UTC | simulator: Remove some debug statements and lower log level | 20 January 2021, 16:37:44 UTC |
cb12449 | Valentin Lorentz | 14 January 2021, 14:17:11 UTC | simulator: simulate the scheduler journal client | 20 January 2021, 16:37:44 UTC |
20b7f9c | Valentin Lorentz | 14 January 2021, 14:12:38 UTC | simulator: generate OriginVisitStatus objects in modeled visits To be able to generate uneventful visits, we would need to store the last snapshot seen for a given origin. Instead of storing this within the simulator, which would be a concern for large scale simulations, we use the scheduler visit cache directly. | 20 January 2021, 16:37:44 UTC |
39ad47d | Valentin Lorentz | 14 January 2021, 14:09:58 UTC | simulator: Move scheduler into the simulation environment object The scheduler is used by a lot of the simulated actors, it makes sense to share it all the time. | 20 January 2021, 16:37:44 UTC |
31967fa | Valentin Lorentz | 14 January 2021, 14:07:56 UTC | simulator: Use datetimes instead of a floating point simulated time | 20 January 2021, 16:37:44 UTC |
fc3f06b | Nicolas Dandrimont | 13 January 2021, 15:13:01 UTC | Introduce scaffolding for a scheduler simulator This simulator will allow us to compare the behavior of the old and new schedulers, as well as to test the impact of scheduler policies and their parameters on the performance of the Software Heritage archival infrastructure as a whole. | 20 January 2021, 16:37:44 UTC |
7905a6b | Valentin Lorentz | 19 January 2021, 17:39:21 UTC | Add a cli for the scheduler metrics update endpoint | 20 January 2021, 16:35:05 UTC |
c386fdf | David Douard | 20 January 2021, 11:24:10 UTC | Make the max_date() helper function accept *dates as argument so it can be called with more than 2 dates. | 20 January 2021, 11:28:02 UTC |
b03d978 | David Douard | 20 January 2021, 11:14:23 UTC | Make sure swh.scheduler.cli.journal is loaded in test_cli_journal.py needed to make pytest able to run directly (without tox). | 20 January 2021, 11:18:25 UTC |
737d12e | Nicolas Dandrimont | 19 January 2021, 16:48:31 UTC | Introduce a new lister_get endpoint | 20 January 2021, 10:02:21 UTC |
114ed95 | Nicolas Dandrimont | 19 January 2021, 13:23:32 UTC | Implement some basic aggregated metrics on listed origins Metrics are computed and cached database-side by the `update_metrics` function. The `get_metrics` function only retrieves the cached data. The metrics are aggregated for each lister instance and visit type (allowing complete reaggregation by visit type for cross-cutting statistics). The following metrics have been implemented: - number of known origins overall - number of enabled origins (origins seen in the last listing) - number of enabled origins that have never been successfully visited - number of enabled origins with known activity since our last successful visit | 20 January 2021, 09:54:27 UTC |
9852653 | Nicolas Dandrimont | 19 January 2021, 16:56:44 UTC | Import the journal subcommand in the main swh.scheduler cli This issue was masked by tox.ini using pytest with --doctest-modules, which imports all modules during test collection, and therefore executes the side effects of swh.scheduler.cli.journal. | 20 January 2021, 09:35:09 UTC |
f8627a9 | David Douard | 19 January 2021, 13:45:09 UTC | Move the `last_scheduled` ts from ListedOrigin to OriginVisitStatus this timestamp being actually a loading-related value, it makes more sense to keep it in the OriginVisitStatus table. Related to T2444. | 19 January 2021, 16:48:51 UTC |
0a32a31 | David Douard | 19 January 2021, 15:16:30 UTC | Make the journal-client cli subcommand automagically loaded otherwise it won't be advertized as a `swh scheduler` subcommand by default. Also add a short docstring for better --help. | 19 January 2021, 15:18:49 UTC |
5e609d5 | Antoine R. Dumont (@ardumont) | 19 January 2021, 11:04:37 UTC | requirements: Make swh.journal an optional dependency This avoids pulling journal dependencies when modules only need the swh-scheduler dependency. | 19 January 2021, 11:04:37 UTC |
9395aa0 | Antoine R. Dumont (@ardumont) | 18 January 2021, 13:46:57 UTC | scheduler.cli.journal: Add `swh scheduler journal-client` cli This adds the cli entrypoint to actually process origin_visit_status topics and write to the origin_visit_stats db table. Related to T2967 | 19 January 2021, 10:10:41 UTC |
58ca796 | Antoine R. Dumont (@ardumont) | 15 January 2021, 14:49:41 UTC | journal_client: Improve stats detection This adds an integration test which permutes input to ensure out-of-order input yields the same result. This also improves the current algorithm which revealed some hit-and-miss cases: - Initialization of the first visit detection (through the "last_snapshot" absence field, the previous implementation check could fail otherwise). - out of order policy (ignore old event) in case of supposedly "eventful" event was done too early which ignored too many messages (those new test cases failed in some permutations). This is now specifically checked in case of referenced snapshots which led to cases of possibly changing an eventful event into an uneventful one. For example, the case of an anterior eventful event is caught, which means that the current most up-to-date eventful event is actually an uneventful one. ... Related to T2967 | 19 January 2021, 09:17:05 UTC |
d3afd14 | Nicolas Dandrimont | 15 January 2021, 14:10:44 UTC | Use the recorded task end time for the task scheduler feedback loop This allows us to run "time-warping" simulations without interference from the real wall clock time. | 15 January 2021, 16:04:30 UTC |
a5fb291 | Antoine R. Dumont (@ardumont) | 14 January 2021, 17:38:05 UTC | backend: Make origin_visit_stats_upsert a batch api Related to T2967 | 15 January 2021, 13:34:06 UTC |
608aa20 | Antoine R. Dumont (@ardumont) | 13 January 2021, 12:03:28 UTC | Populate origin_visit_stats table out of the origin_visit_status topic The snapshot is used to determine the "eventful/uneventful" nature of the origin visit status. When no snapshot is provided, the visit is considered as failed so the last_failed column is updated. As there is no time-ordering guarantee when reading messages from the topic, the code tries to keep the data as chronologically consistent as possible. Only the most recent information is kept. Related to T2967 | 15 January 2021, 13:34:05 UTC |
ca45d40 | Nicolas Dandrimont | 13 January 2021, 14:31:55 UTC | Filter origins by visit type when scheduling the next visits We have separate task queues and workers for each visit type, so it makes sense to split this endpoint along these lines too, at least for now. | 14 January 2021, 12:53:31 UTC |
59b4cb3 | Nicolas Dandrimont | 13 January 2021, 14:25:56 UTC | Reorganize ListedOrigin fixtures to generate multiple visit_types | 14 January 2021, 12:53:31 UTC |
4f5338f | Nicolas Dandrimont | 12 January 2021, 16:10:39 UTC | Introduce a `swh scheduler origin schedule-next` cli This creates one-shot tasks in the classic scheduler for the next visits to run according to the visit scheduling policy. | 14 January 2021, 12:53:31 UTC |
3dd1d5f | Nicolas Dandrimont | 12 January 2021, 16:28:33 UTC | Rename test task types to names that match real tasks The success of tests using these task types would depend on the test run order, because these task types are (currently) being created by swh/scheduler/sql/50-data.sql, but the table is truncated after the first test completes. | 14 January 2021, 12:53:31 UTC |
5d7b002 | Nicolas Dandrimont | 12 January 2021, 15:16:31 UTC | Introduce a `swh scheduler origin grab-next` cli This returns, as CSV, the next origins to be visited according to the passed scheduling policy. | 14 January 2021, 12:53:31 UTC |