https://forge.softwareheritage.org/source/swh-scheduler.git

Revision Author Date Message Commit Date
20b7f9c simulator: generate OriginVisitStatus objects in modeled visits To be able to generate uneventful visits, we would need to store the last snapshot seen for a given origin. Instead of storing this within the simulator, which would be a concern for large scale simulations, we use the scheduler visit cache directly. 20 January 2021, 16:37:44 UTC
39ad47d simulator: Move scheduler into the simulation environment object The scheduler is used by a lot of the simulated actors, it makes sense to share it all the time. 20 January 2021, 16:37:44 UTC
31967fa simulator: Use datetimes instead of a floating point simulated time 20 January 2021, 16:37:44 UTC
fc3f06b Introduce scaffolding for a scheduler simulator This simulator will allow us to compare the behavior of the old and new schedulers, as well as to test the impact of scheduler policies and their parameters on the performance of the Software Heritage archival infrastructure as a whole. 20 January 2021, 16:37:44 UTC
7905a6b Add a cli for the scheduler metrics update endpoint 20 January 2021, 16:35:05 UTC
c386fdf Make the max_date() helper function accept *dates as argument so it can be called with more than 2 dates. 20 January 2021, 11:28:02 UTC
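A variadic date helper of the kind described in c386fdf could look like the sketch below. This is not the repository's code: skipping `None` entries and raising `ValueError` when no date remains are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def max_date(*dates: Optional[datetime]) -> datetime:
    """Return the latest of the given dates.

    Assumption (not confirmed by the commit message): None entries are
    skipped, and at least one real datetime must be supplied.
    """
    known = [d for d in dates if d is not None]
    if not known:
        raise ValueError("at least one non-None date is required")
    return max(known)


older = datetime(2021, 1, 19, tzinfo=timezone.utc)
newer = datetime(2021, 1, 20, tzinfo=timezone.utc)

# Works with any number of arguments, not just two.
latest = max_date(older, None, newer)
```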
b03d978 Make sure swh.scheduler.cli.journal is loaded in test_cli_journal.py needed to make pytest able to run directly (without tox). 20 January 2021, 11:18:25 UTC
737d12e Introduce a new lister_get endpoint 20 January 2021, 10:02:21 UTC
114ed95 Implement some basic aggregated metrics on listed origins Metrics are computed and cached database-side by the `update_metrics` function. The `get_metrics` function only retrieves the cached data. The metrics are aggregated for each lister instance and visit type (allowing complete reaggregation by visit type for cross-cutting statistics). The following metrics have been implemented: - number of known origins overall - number of enabled origins (origins seen in the last listing) - number of enabled origins that have never been successfully visited - number of enabled origins with known activity since our last successful visit 20 January 2021, 09:54:27 UTC
9852653 Import the journal subcommand in the main swh.scheduler cli This issue was masked by tox.ini using pytest with --doctest-modules, which imports all modules during test collection and therefore executes the side effects of swh.scheduler.cli.journal. 20 January 2021, 09:35:09 UTC
f8627a9 Move the `last_scheduled` ts from ListedOrigin to OriginVisitStatus this timestamp being actually a loading-related value, it makes more sense to keep it in the OriginVisitStatus table. Related to T2444. 19 January 2021, 16:48:51 UTC
0a32a31 Make the journal-client cli subcommand automagically loaded otherwise it won't be advertised as a `swh scheduler` subcommand by default. Also add a short docstring for better --help. 19 January 2021, 15:18:49 UTC
5e609d5 requirements: Make swh.journal an optional dependency This avoids pulling journal dependencies when modules only need the swh-scheduler dependency. 19 January 2021, 11:04:37 UTC
9395aa0 scheduler.cli.journal: Add `swh scheduler journal-client` cli This adds the cli entrypoint to actually process origin_visit_status topics and write to the origin_visit_stats db table. Related to T2967 19 January 2021, 10:10:41 UTC
58ca796 journal_client: Improve stats detection This adds an integration test which permutes the input to ensure that out-of-order messages yield the same result. This also improves the current algorithm, in which the test revealed some hit-and-miss cases: - Initialization of the first visit detection (through the absence of the "last_snapshot" field; the previous implementation's check could fail otherwise). - The out-of-order policy (ignore old events) for supposedly "eventful" events was applied too early, which ignored too many messages (the new test cases failed in some permutations). This is now specifically checked when snapshots are referenced, which can lead to an eventful event being changed into an uneventful one. For example, an earlier eventful event is now caught, which means that the current most up-to-date eventful event is actually an uneventful one. ... Related to T2967 19 January 2021, 09:17:05 UTC
d3afd14 Use the recorded task end time for the task scheduler feedback loop This allows us to run "time-warping" simulations without interference from the real wall clock time. 15 January 2021, 16:04:30 UTC
a5fb291 backend: Make origin_visit_stats_upsert a batch api Related to T2967 15 January 2021, 13:34:06 UTC
608aa20 Populate origin_visit_stats table out of the origin_visit_status topic The snapshot is used to determine the "eventful/uneventful" nature of the origin visit status. When no snapshot is provided, the visit is considered as failed so the last_failed column is updated. As there is no ordering guarantee when reading messages from the topic, the code tries to keep the data as chronologically ordered as possible. Only the most recent information is kept. Related to T2967 15 January 2021, 13:34:05 UTC
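The classification rule stated in 608aa20 can be sketched roughly as follows. The function name `classify_visit` and the string labels are hypothetical, and comparing against the previously recorded snapshot is an assumption inferred from the neighbouring commit messages. The sketch leaves out the out-of-order handling that the real journal client needs.

```python
from typing import Optional


def classify_visit(snapshot: Optional[str], last_snapshot: Optional[str]) -> str:
    """Classify one visit status (hypothetical helper, for illustration only).

    - no snapshot: the visit is considered failed
    - snapshot differs from the last recorded one: eventful
    - snapshot identical to the last recorded one: uneventful
    """
    if snapshot is None:
        return "failed"
    if snapshot != last_snapshot:
        # Also covers the first visit, where last_snapshot is None.
        return "eventful"
    return "uneventful"


first = classify_visit("snap-1", None)
repeat = classify_visit("snap-1", "snap-1")
broken = classify_visit(None, "snap-1")
```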
ca45d40 Filter origins by visit type when scheduling the next visits We have separate task queues and workers for each visit type, so it makes sense to split this endpoint along these lines too, at least for now. 14 January 2021, 12:53:31 UTC
59b4cb3 Reorganize ListedOrigin fixtures to generate multiple visit_types 14 January 2021, 12:53:31 UTC
4f5338f Introduce a `swh scheduler origin schedule-next` cli This creates one-shot tasks in the classic scheduler for the next visits to run according to the visit scheduling policy. 14 January 2021, 12:53:31 UTC
3dd1d5f Rename test task types to names that match real tasks The success of tests using these task types would depend on the test run order, because these task types are (currently) being created by swh/scheduler/sql/50-data.sql, but the table is truncated after the first test completes. 14 January 2021, 12:53:31 UTC
5d7b002 Introduce a `swh scheduler origin grab-next` cli This returns, as CSV, the next origins to be visited according to the passed scheduling policy. 14 January 2021, 12:53:31 UTC
a620033 Add a new origin visit info model object and related backend api Upsert and Read methods Related to T2443 12 January 2021, 13:47:49 UTC
b13cb1f Implement a basic endpoint for getting the next origins to visit The basic policy implemented is a FIFO, to get things going. 11 January 2021, 14:40:17 UTC
619100e Add a cli section to the doc 18 December 2020, 14:57:00 UTC
ebff12b requirement: Adapt celery requirements This adapts the celery requirements to the last version known to keep our builds passing. Version 5.0.3 was just released, and it makes all the swh modules relying on tasks time out. An upstream bug is open [1]. In the meantime, this workaround fixes [2] and most probably the remaining swh builds. [1] https://github.com/celery/celery/issues/6521 [2] https://jenkins.softwareheritage.org/job/DSCH/job/tests/1132/console 07 December 2020, 08:28:00 UTC
3c87075 Replace usage of arrow datetime objects in favor of pure datetime ones Note that the humanize library is now used in the cli pretty printing function (in place of the arrow humanize feature). As a result, displayed output from some cli commands may slightly differ. Closes T2835. 03 December 2020, 09:25:36 UTC
1b390a7 Stop using the deprecated configuration scheme 25 November 2020, 14:56:18 UTC
1f68031 cli.task_type: All task_type clis without a scheduler should raise As the code will plainly fail on calling methods on None instance if not caught early. 25 November 2020, 14:56:18 UTC
9e5b17f conftest: Reference swh.core.db.pytest_plugin As it's exposed through the swh.scheduler.pytest_plugin itself used by other swh modules, this needs to be declared to avoid other swh module build failures. Related to T2746 24 November 2020, 13:10:31 UTC
49ed819 requirements-test.txt: Drop no longer needed pytest-postgresql requirement requirements-swh.txt already declares the swh.core[db] dependency which transitively pulls it. Related to T2746 23 November 2020, 12:11:02 UTC
2f9e8ec scheduler.pytest_plugin: Make scheduler tests faster Reuse the swh.core.db.pytest_plugin factory 22 October 2020, 10:09:06 UTC
6a4455c pytest_plugin: Explicitly name the scheduler test db differently When using tests on modules with different lower level modules (e.g storage, scheduler, ...) this avoids clashes. 19 October 2020, 07:25:04 UTC
13dcadd scheduler: Type and unify get_scheduler factory with other factories Related to T1410 16 October 2020, 16:24:03 UTC
dd33cdc test_server: Simplify exception manipulations 16 October 2020, 11:43:54 UTC
315a2c9 tox.ini: pin black to the pre-commit version (19.10b0) to avoid flip-flops 02 October 2020, 14:24:01 UTC
b7e5358 Drop vcversioner from requirements We stopped using it months ago. 25 September 2020, 15:19:17 UTC
4951a23 Run isort after the CLI import changes 25 September 2020, 12:19:21 UTC
be7a5ae Rename sql files according to swh.core 0.3 25 September 2020, 07:53:53 UTC
5cc573d Adapt cli declaration entrypoint to swh.core 0.3 25 September 2020, 07:48:38 UTC
99e5af8 Move from kombu.five.monotonic to time.monotonic Looks like kombu finally axed python2 support. 24 September 2020, 15:44:00 UTC
7b0d48f python: Reorder imports with isort Related to T2610 17 September 2020, 16:03:39 UTC
8d8b58f pre-commit: Add isort hook and configuration Related to T2610 17 September 2020, 16:03:39 UTC
4bec5c8 pre-commit: Update flake8 hook configuration flake8 hook has been removed from https://github.com/pre-commit/pre-commit-hooks so now use the one from https://gitlab.com/pycqa/flake8 17 September 2020, 16:03:39 UTC
f5c8154 cli: speed up the `swh` cli command startup time by moving import statements in functions and using conditional import of typechecking modules (especially StorageInterface which triggers the loading of 300+ modules). Related to T2575. 10 September 2020, 15:46:08 UTC
b24be0c Tell pytest not to recurse in dotdirs. pytest wastes a lot of time in .hypothesis and .git; this commit excludes them. 25 August 2020, 08:41:38 UTC
6426208 cli.task: Migrate scheduler cli to latest storage change on iter_origins Related to T645 03 August 2020, 10:18:23 UTC
849d063 test_cli: Adapt tests data and drop unsupported "validate" proxy 24 July 2020, 08:22:07 UTC
9f52d95 cli.task: Fix iter_origin returned types Related to T2494 21 July 2020, 08:36:03 UTC
254e24a Do not expose pytest-plugin through setuptools, let modules require it when needed Defining the pytest-plugin through the setuptools entry point [1] makes it loaded by default. This creates loading issues in modules that depend on the scheduler but not on the pytest plugin it exposes, as explained in the doc [2]. Instead, modules depending on the pytest plugin will explicitly declare it in their root conftest [3]: pytest_plugins = [ "swh.scheduler.pytest_plugin" ] [1] https://docs.pytest.org/en/stable/writing_plugins.html#setuptools-entry-points [2] https://docs.pytest.org/en/stable/writing_plugins.html#plugin-discovery-order-at-tool-startup [3] https://docs.pytest.org/en/stable/writing_plugins.html#requiring-loading-plugins-in-a-test-module-or-conftest-file Related to D3475 Related to T2484 10 July 2020, 10:27:42 UTC
ece598c requirements.txt: Remove future dependency This was needed for celery 4.4.4 but that version is not used anymore. 08 July 2020, 16:33:25 UTC
7009c3b Move all celery-related fixtures to the swh.scheduler pytest plugin This allows us to reuse these fixtures in other modules without brittle swh.scheduler.tests.conftest star imports. Unfortunately, we can't really override pytest fixtures from one plugin to another. We therefore reimplement the fixtures provided by celery, inlining the static configuration and renaming them to our names in the process. This also adds a backwards-compatibility import from pytest_plugin to conftest, to allow old users of the conftest fixtures to keep working. 08 July 2020, 15:59:15 UTC
ce63e6a pytest.ini: Drop filterwarnings which never worked 07 July 2020, 10:18:50 UTC
b2cbb9b Move shareable fixtures out of conftest into a dedicated pytest plugin This avoids having to run `from swh.scheduler.tests.conftest import *` in other modules, e.g. swh.lister, to import and use the swh_scheduler pytest fixture. 06 July 2020, 14:42:04 UTC
5b373ce Introduce a get_listed_origins endpoint This paginated endpoint allows retrieving information about the origins recorded by listers. 06 July 2020, 09:51:10 UTC
aefc5c9 Don't recurse into attrs objects when serializing We need to use our serialization hook recursively to make sure that we can deserialize nested data structures. 06 July 2020, 07:48:29 UTC
cc8fa7f Re-introduce the root endpoint for the rpc server 22 June 2020, 10:55:11 UTC
265bc8b The celery-monitor subcommand glob filtering needs celery >= 4.3 22 June 2020, 08:58:09 UTC
8a1724a Add SQL for version 16 of the schema 22 June 2020, 08:26:40 UTC
d107a55 Implement storage of listed origins This new API endpoint allows listers to record the origins they have seen during their current run. Origins are identified by the lister instance, the url of the origin, and the type of loader that should be used to load this origin. The implementation allows listers to just send the list of origins they've seen (with some lightweight extra information), leaving the backend to handle whether to do an insertion or an update to an existing origin. The current implementation doesn't disable origins that have disappeared when doing a full listing run. This step will be done by a separate "origin garbage collection" endpoint, which will peruse the `last_seen` field. 16 June 2020, 08:25:08 UTC
e0fa5c5 Move lister addition in scheduler tests to a pytest fixture This lets us keep the tests a little DRYer. 16 June 2020, 08:24:03 UTC
04894bd Lister.instance_name doesn't need a factory/default value 16 June 2020, 08:22:23 UTC
f520108 Improve support of primary keys This splits primary keys across "automatic" primary keys (handled by the database) and manual primary keys (managed by the user). Use the opportunity to improve/clarify the documentation of field metadata attributes. 16 June 2020, 08:22:12 UTC
1c93e55 Implement basic storage and retrieval of lister information This adds a pair of functions to the backend: - `get_or_create_lister` pulls the record for a given lister from the database - `update_lister` updates the record for a given lister in the database This is one of the basic building blocks for the integration of lister information directly in the scheduler database. Related to T2442. 15 June 2020, 13:41:02 UTC
466ac59 Introduce a SchedulerException base class This allows us to automatically serialize/deserialize exceptions under this base class within our RPC framework. 15 June 2020, 12:53:30 UTC
c509a12 Introduce some scaffolding for an attrs-based BaseSchedulerModel Alongside swh.model.model, this allows us to define data models for the objects the scheduler is working with, and to serialize/deserialize these objects transparently at the RPC layer. This also introduces some mild ORM-like logic so we can keep the actual SQL a little DRYer. 15 June 2020, 10:49:25 UTC
4c0c37b Use the automatic RPC client/server generation 11 June 2020, 09:42:37 UTC
aedd323 Replace swh-worker-control with a swh scheduler celery-monitor subcommand This new subcommand has two commands: - ping: checks whether the given worker instance answers within a given timeout - list-running: lists running tasks on the given worker instance 10 June 2020, 10:15:54 UTC
8411335 Remove double logging setup in cli The logging module is already initialized by the main swh.core cli; This only creates double logging with no advantages whatsoever. 10 June 2020, 09:30:31 UTC
873cdac Handle psycopg2 OperationalError in cli initialization When running the cli with default settings (i.e. pointing to a softwareheritage-scheduler-dev database), and the database doesn't exist, an OperationalError is raised. This shouldn't prevent (some of the) cli subcommands from working, so catch this error and ignore it as one of the scheduler backend setup failure modes. 10 June 2020, 09:28:19 UTC
28c5b8d Replace vcversioner with setuptools-scm 09 June 2020, 13:49:00 UTC
14cd5bb Blacken for python3.7+ 03 June 2020, 15:19:00 UTC
6ac3d56 Drop use of pifpaf and the "db" pytest mark We've been using pytest-postgresql for... a year (4117d5a). 03 June 2020, 10:34:11 UTC
3f42423 Add future dependency, missing from celery 4.4.4 Without future, the tests involving celery hang indefinitely. Upstream issue: https://github.com/celery/celery/issues/6145 03 June 2020, 09:29:58 UTC
92c0869 Celery runner: only schedule tasks when the buffer is less than 80% full The queries to pick up tasks from the scheduler sometimes degenerate when the number of tasks fetched is too low, which hangs the runner for all other tasks. Adding this lower bound helps postgresql use proper optimizations to pull tasks. 19 May 2020, 09:34:52 UTC
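The lower bound described in 92c0869 amounts to skipping a scheduling round unless enough queue slots are free. The sketch below is only an illustration of that idea: `tasks_to_fetch`, its parameters, and the integer-percentage comparison are hypothetical and not taken from the runner's actual code.

```python
def tasks_to_fetch(max_queue_length: int, queued: int, threshold_percent: int = 80) -> int:
    """Return how many tasks to request from the scheduler in this round.

    Hypothetical helper: nothing is fetched while the buffer is at or above
    the threshold, so every query asks for a reasonably large batch.
    """
    if max_queue_length <= 0:
        raise ValueError("max_queue_length must be positive")
    # Integer arithmetic avoids floating point edge cases at the boundary.
    if queued * 100 >= max_queue_length * threshold_percent:
        return 0
    return max_queue_length - queued


nearly_full = tasks_to_fetch(1000, 900)  # buffer is 90% full: skip this round
half_full = tasks_to_fetch(1000, 500)    # buffer is 50% full: fill the free slots
```

With the default threshold, any query that does run requests more than 20% of the queue capacity, which matches the commit's stated goal of avoiding very small fetches.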
b839906 Disable the azure http logger in the celery worker base config This is suboptimal (we should move all of this to a logconfig where we can set this stuff), but this is consistent with how we do things currently. 19 May 2020, 09:14:25 UTC
2ea919c Fix black for py37 19 May 2020, 09:12:26 UTC
3a74069 test_scheduler: Fix pep8 violation This fixes ci build [1] [1] https://jenkins.softwareheritage.org/job/DSCH/job/tests/859/console 12 May 2020, 09:55:09 UTC
2cc8aa0 setup.py: add documentation link 29 April 2020, 16:33:16 UTC
1abff22 setup: Update the minimum required runtime python3 version Related to T2367 20 April 2020, 15:29:49 UTC
551ceac Add a pyproject.toml file to target py37 for black 08 April 2020, 20:16:58 UTC
cc0ef04 Enable black - blackify all the python files, - enable black in pre-commit, - add a black tox environment. 08 April 2020, 14:58:01 UTC
77b2d0b tests: Adapt model according to latest change origin model no longer allows to have type. Related to f533f62bbf114cfcc29f7c72307c4dfbe99cf048 27 March 2020, 06:43:03 UTC
e6c2a86 Implement listener on top of pika instead of celery 23 March 2020, 11:52:06 UTC
68c42fb scheduler.backend_es: Leave index opened when streaming bulk Prior to this commit, we had the proper behavior of closing index when done streaming. Unfortunately, this created too much gc on es nodes down the line. So for now, we remove that behavior. Note that this implies we need another cog that makes a pass once in a while on indices to close. Also, this has been running on production for 2 weeks now and no more gc issues arose since then. 26 February 2020, 09:34:09 UTC
af58466 backend: Make create_task_type idempotent There is no reason to raise an error when a task type has already been created, and this stops leaking psycopg2 IntegrityError exceptions through the scheduler interface. 18 February 2020, 14:17:02 UTC
b92e3fd Use swh-storage validation proxy. Required by swh-storage >= v0.0.172. 12 February 2020, 12:48:52 UTC
73d1e5e cli.task: Change `get_storage` according to latest change 31 January 2020, 08:18:25 UTC
1c923aa test_cli: Fix storage instantiation following api change Using the `swh.storage.get_storage` function instead of calling directly the class name. This actually fixes the master ci build [1] [1] https://jenkins.softwareheritage.org/job/DSCH/job/tests/743/console 31 January 2020, 08:16:20 UTC
f6cc231 sentry: Fix initialization init_sentry call Api wise, the `sentry_dsn` is expected to be passed as first parameter. Which in the scheduler's case is not set yet. Forcing it to None for now. 23 January 2020, 13:21:21 UTC
0712207 Use swh.core.sentry instead of calling sentry_sdk.init directly. This adds support for SWH_MAIN_PACKAGE to initialize sentry_sdk with a release. 10 January 2020, 14:13:07 UTC
b488d69 backend_es: Fix configuration mapping 17 December 2019, 22:23:35 UTC
cc2de16 tests: Try to avoid fixture redefinition Somehow, that messes other tests in the debian build. 17 December 2019, 14:57:33 UTC
73ade78 tests: Avoid fixture clash in different purposes fixture Somehow, that fails in the debian build 17 December 2019, 14:27:50 UTC
e9d8a5f scheduler.backend: Rename appropriately module elasticsearch_memory 17 December 2019, 12:33:43 UTC
2cbfb78 Add tests to in memory elasticsearch implementation 17 December 2019, 12:33:43 UTC
ba5920d backend_es: Add tests around elasticsearch client instantiation 17 December 2019, 12:33:43 UTC
38d17de tests/common: Remove unneeded behavior 17 December 2019, 12:33:43 UTC
ac32b5e backend: Add alternate memory elasticsearch implem to allow testing 17 December 2019, 12:33:43 UTC