5b373ce | Nicolas Dandrimont | 06 July 2020, 07:49:44 UTC | Introduce a get_listed_origins endpoint This paginated endpoint allows retrieving information about the origins recorded by listers. | 06 July 2020, 09:51:10 UTC |
aefc5c9 | Nicolas Dandrimont | 06 July 2020, 07:48:29 UTC | Don't recurse into attrs objects when serializing We need to use our serialization hook recursively to make sure that we can deserialize nested data structures. | 06 July 2020, 07:48:29 UTC |
cc8fa7f | Nicolas Dandrimont | 22 June 2020, 10:46:09 UTC | Re-introduce the root endpoint for the rpc server | 22 June 2020, 10:55:11 UTC |
265bc8b | Nicolas Dandrimont | 22 June 2020, 08:58:09 UTC | The celery-monitor subcommand glob filtering needs celery >= 4.3 | 22 June 2020, 08:58:09 UTC |
8a1724a | Nicolas Dandrimont | 22 June 2020, 08:26:40 UTC | Add SQL for version 16 of the schema | 22 June 2020, 08:26:40 UTC |
d107a55 | Nicolas Dandrimont | 16 June 2020, 08:25:08 UTC | Implement storage of listed origins This new API endpoint allows listers to record the origins they have seen during their current run. Origins are identified by the lister instance, the url of the origin, and the type of loader that should be used to load this origin. The implementation allows listers to just send the list of origins they've seen (with some lightweight extra information), leaving the backend to handle whether to do an insertion or an update to an existing origin. The current implementation doesn't disable origins that have disappeared when doing a full listing run. This step will be done by a separate "origin garbage collection" endpoint, which will peruse the `last_seen` field. | 16 June 2020, 08:25:08 UTC |
e0fa5c5 | Nicolas Dandrimont | 16 June 2020, 08:24:03 UTC | Move lister addition in scheduler tests to a pytest fixture This lets us keep the tests a little DRYer. | 16 June 2020, 08:24:03 UTC |
04894bd | Nicolas Dandrimont | 16 June 2020, 08:22:23 UTC | Lister.instance_name doesn't need a factory/default value | 16 June 2020, 08:22:23 UTC |
f520108 | Nicolas Dandrimont | 16 June 2020, 08:08:59 UTC | Improve support of primary keys This splits primary keys across "automatic" primary keys (handled by the database) and manual primary keys (managed by the user). Use the opportunity to improve/clarify the documentation of field metadata attributes. | 16 June 2020, 08:22:12 UTC |
1c93e55 | Nicolas Dandrimont | 12 June 2020, 10:24:20 UTC | Implement basic storage and retrieval of lister information This adds a pair of functions to the backend: - `get_or_create_lister` pulls the record for a given lister from the database - `update_lister` updates the record for a given lister in the database This is one of the basic building blocks for the integration of lister information directly in the scheduler database. Related to T2442. | 15 June 2020, 13:41:02 UTC |
466ac59 | Nicolas Dandrimont | 15 June 2020, 12:46:28 UTC | Introduce a SchedulerException base class This allows us to automatically serialize/deserialize exceptions under this base class within our RPC framework. | 15 June 2020, 12:53:30 UTC |
c509a12 | Nicolas Dandrimont | 12 June 2020, 09:03:26 UTC | Introduce some scaffolding for an attrs-based BaseSchedulerModel Alongside swh.model.model, this allows us to define data models for the objects the scheduler is working with, and to serialize/deserialize these objects transparently at the RPC layer. This also introduces some mild ORM-like logic so we can keep the actual SQL a little DRYer. | 15 June 2020, 10:49:25 UTC |
4c0c37b | Nicolas Dandrimont | 10 June 2020, 14:09:53 UTC | Use the automatic RPC client/server generation | 11 June 2020, 09:42:37 UTC |
aedd323 | Nicolas Dandrimont | 10 June 2020, 09:31:45 UTC | Replace swh-worker-control with a swh scheduler celery-monitor subcommand This new subcommand has two commands: - ping: checks whether the given worker instance answers within a given timeout - list-running: lists running tasks on the given worker instance | 10 June 2020, 10:15:54 UTC |
8411335 | Nicolas Dandrimont | 10 June 2020, 09:30:31 UTC | Remove double logging setup in cli The logging module is already initialized by the main swh.core cli; This only creates double logging with no advantages whatsoever. | 10 June 2020, 09:30:31 UTC |
873cdac | Nicolas Dandrimont | 10 June 2020, 09:28:19 UTC | Handle psycopg2 OperationalError in cli initialization When running the cli with default settings (i.e. pointing to a softwareheritage-scheduler-dev database), and the database doesn't exist, an OperationalError is raised. This shouldn't prevent (some of the) cli subcommands from working, so catch this error and ignore it as one of the scheduler backend setup failure modes. | 10 June 2020, 09:28:19 UTC |
28c5b8d | Nicolas Dandrimont | 09 June 2020, 13:47:26 UTC | Replace vcversioner with setuptools-scm | 09 June 2020, 13:49:00 UTC |
14cd5bb | Nicolas Dandrimont | 03 June 2020, 15:17:50 UTC | Blacken for python3.7+ | 03 June 2020, 15:19:00 UTC |
6ac3d56 | Nicolas Dandrimont | 03 June 2020, 10:34:11 UTC | Drop use of pifpaf and the "db" pytest mark We've been using pytest-postgresql for... a year (4117d5a). | 03 June 2020, 10:34:11 UTC |
3f42423 | Nicolas Dandrimont | 03 June 2020, 09:29:58 UTC | Add future dependency, missing from celery 4.4.4 Without future, the tests involving celery hang indefinitely. Upstream issue: https://github.com/celery/celery/issues/6145 | 03 June 2020, 09:29:58 UTC |
92c0869 | Nicolas Dandrimont | 19 May 2020, 09:30:13 UTC | Celery runner: only schedule tasks when the buffer is less than 80% full The queries to pick up tasks from the scheduler sometimes degenerate when the number of tasks fetched is too low, which hangs the runner for all other tasks. Adding this lower bound helps postgresql use proper optimizations to pull tasks. | 19 May 2020, 09:34:52 UTC |
b839906 | Nicolas Dandrimont | 19 May 2020, 09:12:55 UTC | Disable the azure http logger in the celery worker base config This is suboptimal (we should move all of this to a logconfig where we can set this stuff), but this is consistent with how we do things currently. | 19 May 2020, 09:14:25 UTC |
2ea919c | Nicolas Dandrimont | 19 May 2020, 09:12:26 UTC | Fix black for py37 | 19 May 2020, 09:12:26 UTC |
3a74069 | Antoine R. Dumont (@ardumont) | 12 May 2020, 09:55:09 UTC | test_scheduler: Fix pep8 violation This fixes ci build [1] [1] https://jenkins.softwareheritage.org/job/DSCH/job/tests/859/console | 12 May 2020, 09:55:09 UTC |
2cc8aa0 | Stefano Zacchiroli | 29 April 2020, 16:33:16 UTC | setup.py: add documentation link | 29 April 2020, 16:33:16 UTC |
1abff22 | Antoine R. Dumont (@ardumont) | 20 April 2020, 15:29:49 UTC | setup: Update the minimum required runtime python3 version Related to T2367 | 20 April 2020, 15:29:49 UTC |
551ceac | David Douard | 08 April 2020, 20:16:58 UTC | Add a pyproject.toml file to target py37 for black | 08 April 2020, 20:16:58 UTC |
cc0ef04 | David Douard | 08 April 2020, 14:58:01 UTC | Enable black - blackify all the python files, - enable black in pre-commit, - add a black tox environment. | 08 April 2020, 14:58:01 UTC |
77b2d0b | Antoine R. Dumont (@ardumont) | 27 March 2020, 06:43:03 UTC | tests: Adapt model according to latest change origin model no longer allows to have type. Related to f533f62bbf114cfcc29f7c72307c4dfbe99cf048 | 27 March 2020, 06:43:03 UTC |
e6c2a86 | Nicolas Dandrimont | 23 March 2020, 09:45:30 UTC | Implement listener on top of pika instead of celery | 23 March 2020, 11:52:06 UTC |
68c42fb | Antoine R. Dumont (@ardumont) | 03 February 2020, 08:20:57 UTC | scheduler.backend_es: Leave index opened when streaming bulk Prior to this commit, we had the proper behavior of closing index when done streaming. Unfortunately, this created too much gc on es nodes down the line. So for now, we remove that behavior. Note that this implies we need another cog that makes a pass once in a while on indices to close. Also, this has been running on production for 2 weeks now and no more gc issues arose since then. | 26 February 2020, 09:34:09 UTC |
af58466 | Antoine Lambert | 17 February 2020, 15:55:20 UTC | backend: Make create_task_type idempotent There is no reason to raise an error when a task type has already been created and it enables to stop leaking psycopg2 IntegrityError exception as part of the scheduler interface. | 18 February 2020, 14:17:02 UTC |
b92e3fd | Valentin Lorentz | 12 February 2020, 12:48:52 UTC | Use swh-storage validation proxy. Required by swh-storage >= v0.0.172. | 12 February 2020, 12:48:52 UTC |
73d1e5e | Antoine R. Dumont (@ardumont) | 31 January 2020, 08:18:25 UTC | cli.task: Change `get_storage` according to latest change | 31 January 2020, 08:18:25 UTC |
1c923aa | Antoine R. Dumont (@ardumont) | 31 January 2020, 08:16:20 UTC | test_cli: Fix storage instantiation following api change Using the `swh.storage.get_storage` function instead of calling directly the class name. This actually fixes the master ci build [1] [1] https://jenkins.softwareheritage.org/job/DSCH/job/tests/743/console | 31 January 2020, 08:16:20 UTC |
f6cc231 | Antoine R. Dumont (@ardumont) | 23 January 2020, 13:21:21 UTC | sentry: Fix initialization init_sentry call Api wise, the `sentry_dsn` is expected to be passed as first parameter. Which in the scheduler's case is not set yet. Forcing it to None for now. | 23 January 2020, 13:21:21 UTC |
0712207 | Valentin Lorentz | 10 January 2020, 14:13:07 UTC | Use swh.core.sentry instead of calling sentry_sdk.init directly. This adds support for SWH_MAIN_PACKAGE to initialize sentry_sdk with a release. | 10 January 2020, 14:13:07 UTC |
b488d69 | Antoine R. Dumont (@ardumont) | 17 December 2019, 22:23:35 UTC | backend_es: Fix configuration mapping | 17 December 2019, 22:23:35 UTC |
cc2de16 | Antoine R. Dumont (@ardumont) | 17 December 2019, 14:57:33 UTC | tests: Try to avoid fixture redefinition Somehow, that messes other tests in the debian build. | 17 December 2019, 14:57:33 UTC |
73ade78 | Antoine R. Dumont (@ardumont) | 17 December 2019, 14:27:15 UTC | tests: Avoid fixture clash in different purposes fixture Somehow, that fails in the debian build | 17 December 2019, 14:27:50 UTC |
e9d8a5f | Antoine R. Dumont (@ardumont) | 17 December 2019, 12:28:42 UTC | scheduler.backend: Rename appropriately module elasticsearch_memory | 17 December 2019, 12:33:43 UTC |
2cbfb78 | Antoine R. Dumont (@ardumont) | 17 December 2019, 11:51:28 UTC | Add tests to in memory elasticsearch implementation | 17 December 2019, 12:33:43 UTC |
ba5920d | Antoine R. Dumont (@ardumont) | 17 December 2019, 11:51:13 UTC | backend_es: Add tests around elasticsearch client instantiation | 17 December 2019, 12:33:43 UTC |
38d17de | Antoine R. Dumont (@ardumont) | 17 December 2019, 11:50:13 UTC | tests/common: Remove unneeded behavior | 17 December 2019, 12:33:43 UTC |
ac32b5e | Antoine R. Dumont (@ardumont) | 17 December 2019, 09:59:19 UTC | backend: Add alternate memory elasticsearch implem to allow testing | 17 December 2019, 12:33:43 UTC |
7b1c2d5 | Antoine R. Dumont (@ardumont) | 17 December 2019, 09:57:31 UTC | scheduler.backend_es: Allow using different elasticsearch clients For the moment, only 1 official es client exists | 17 December 2019, 12:33:43 UTC |
ec207fb | Antoine R. Dumont (@ardumont) | 17 December 2019, 09:51:20 UTC | scheduler.backend: Make the returned result a dict | 17 December 2019, 12:33:42 UTC |
f97bff6 | Antoine R. Dumont (@ardumont) | 17 December 2019, 09:50:27 UTC | cli.task: Make page_token actually a string even from the cli That actually makes it consistent with the api | 17 December 2019, 12:33:42 UTC |
d8859d7 | Antoine R. Dumont (@ardumont) | 16 December 2019, 16:15:42 UTC | backend_es: Add initialization endpoint | 17 December 2019, 12:33:42 UTC |
d5cea20 | Antoine R. Dumont (@ardumont) | 16 December 2019, 16:15:24 UTC | backend_es: Remove unused endpoint | 17 December 2019, 12:33:42 UTC |
18df124 | Antoine R. Dumont (@ardumont) | 16 December 2019, 16:14:54 UTC | cli.tasks: Unify logging instruction | 17 December 2019, 12:33:42 UTC |
c5e189b | Antoine R. Dumont (@ardumont) | 16 December 2019, 16:14:08 UTC | test: Allow status definition during task template generation | 17 December 2019, 12:33:42 UTC |
844f3e0 | Antoine R. Dumont (@ardumont) | 16 December 2019, 10:07:27 UTC | tests.scheduler: Extract common utility function and test it | 17 December 2019, 12:33:42 UTC |
2d56669 | Antoine R. Dumont (@ardumont) | 16 December 2019, 09:07:01 UTC | scheduler.cli.task: Rename appropriately backend variable | 17 December 2019, 12:33:42 UTC |
793c233 | Antoine R. Dumont (@ardumont) | 16 December 2019, 09:06:10 UTC | scheduler.backend_es: Rename backend class appropriately | 17 December 2019, 12:33:42 UTC |
d5bf6b1 | Antoine R. Dumont (@ardumont) | 14 December 2019, 17:44:57 UTC | cli.task: Rename internal method appropriately | 17 December 2019, 12:33:42 UTC |
eb1c3d3 | Antoine R. Dumont (@ardumont) | 14 December 2019, 17:43:21 UTC | backend_es: Use consistent logging instruction | 17 December 2019, 12:33:42 UTC |
b376eb9 | Antoine R. Dumont (@ardumont) | 14 December 2019, 17:42:14 UTC | backend_es: Enclose close instruction within finally | 17 December 2019, 12:33:42 UTC |
f6726e9 | Antoine R. Dumont (@ardumont) | 14 December 2019, 10:10:49 UTC | backend_es: Create index when it does not exist | 17 December 2019, 12:33:41 UTC |
ad54c6b | Antoine R. Dumont (@ardumont) | 14 December 2019, 09:49:59 UTC | backend_es: Open indices prior to indexing method calls | 17 December 2019, 12:33:41 UTC |
305422b | Antoine R. Dumont (@ardumont) | 14 December 2019, 09:49:35 UTC | cli.task: Tasks need to be sorted prior to group by call | 17 December 2019, 12:33:41 UTC |
d603608 | Antoine R. Dumont (@ardumont) | 13 December 2019, 14:35:38 UTC | cli.task: Use the configuration provided by the cli | 17 December 2019, 12:33:41 UTC |
e0dd669 | Valentin Lorentz | 10 December 2019, 15:44:47 UTC | Initialize Sentry on worker startup. | 16 December 2019, 17:55:11 UTC |
f1b3f49 | Valentin Lorentz | 10 December 2019, 15:44:02 UTC | Print a traceback in case a signal callback crashes. Celery silently eats errors happening in these functions. | 16 December 2019, 17:54:54 UTC |
dbd4a2f | Antoine R. Dumont (@ardumont) | 14 December 2019, 17:25:07 UTC | backend: Align paginated endpoint consistently with others | 16 December 2019, 15:34:27 UTC |
3ab0348 | Antoine R. Dumont (@ardumont) | 14 December 2019, 13:25:17 UTC | backend: Filter properly archive within the defined range Prior to this commit, we could list tasks whose started date was null. Now we fallback on the scheduled task which is the next best date we have. | 14 December 2019, 13:25:17 UTC |
080db58 | Antoine R. Dumont (@ardumont) | 13 December 2019, 15:33:48 UTC | test_scheduler: Add some more check on filtering test | 13 December 2019, 15:34:08 UTC |
b8b171d | Antoine R. Dumont (@ardumont) | 13 December 2019, 14:07:28 UTC | backend: Make filter_task_to_archive a paginated endpoint Related to T1931 | 13 December 2019, 14:08:20 UTC |
2b93efb | Antoine R. Dumont (@ardumont) | 13 December 2019, 14:05:55 UTC | tox: Add ipdb dependency on py3-dev env | 13 December 2019, 14:08:10 UTC |
ee162fe | Nicolas Dandrimont | 13 December 2019, 10:29:22 UTC | Use a btree of (task_type, md5(arguments)) to match task arguments The former index on hash(arguments->'args') has lost relevance as about half the tasks (the ones for the loader) have the same value (an empty list) for this field. This index is more universal, faster, and also easier to convince the planner of using. If we want more specific indexes (e.g. on specific keyword arguments) we'll be able to add that separately. | 13 December 2019, 10:32:33 UTC |
0b04220 | David Douard | 12 December 2019, 16:37:39 UTC | Remove the creation of the 'load-deposit' task type it's now managed by swh-loader-core directly. | 12 December 2019, 16:37:39 UTC |
4071d71 | David Douard | 12 December 2019, 11:27:13 UTC | Make --status option of 'swh scheduler task list' a click.Choice | 12 December 2019, 11:27:13 UTC |
a18f562 | David Douard | 04 December 2019, 09:55:25 UTC | celery: add 2 statsd probes for the runner and listener - runner: counting the number of scheduled tasks, - listener: counting the number of processed events. | 04 December 2019, 15:36:19 UTC |
f206076 | David Douard | 04 December 2019, 09:25:37 UTC | celery: make SWHTask send start/end of execution statsd gauges with timestamps Closes T2119. | 04 December 2019, 09:28:26 UTC |
08243bb | David Douard | 04 December 2019, 09:23:15 UTC | tests: fix celery_task's test_multiping kwargs were not passed correctly. Also add a test_ping_with_kw test. | 04 December 2019, 09:25:05 UTC |
8c1e051 | Antoine R. Dumont (@ardumont) | 26 November 2019, 11:29:47 UTC | scheduler.updater: Remove dead code | 26 November 2019, 11:29:47 UTC |
95940a8 | Nicolas Dandrimont | 21 November 2019, 12:57:25 UTC | Migrate tox.ini to extras = xxx instead of deps = .[testing] | 21 November 2019, 12:57:25 UTC |
7c40132 | Nicolas Dandrimont | 21 November 2019, 12:50:03 UTC | Merge tox test environment configurations | 21 November 2019, 12:54:47 UTC |
101a131 | David Douard | 21 November 2019, 12:50:21 UTC | Add a pre-commit config file | 21 November 2019, 12:50:21 UTC |
104fee0 | Nicolas Dandrimont | 21 November 2019, 11:04:54 UTC | Drop version constraint on pytest < 4 | 21 November 2019, 11:04:54 UTC |
56e4a12 | Nicolas Dandrimont | 20 November 2019, 18:56:38 UTC | Include all requirements in MANIFEST.in | 20 November 2019, 18:56:38 UTC |
c973ec0 | Antoine R. Dumont (@ardumont) | 19 November 2019, 14:12:27 UTC | req-swh*: Remove old package loader backend names Related to T1389 T2098 Related to D2306 D2305 D2304 | 19 November 2019, 14:12:27 UTC |
9358572 | Antoine R. Dumont (@ardumont) | 15 November 2019, 15:02:12 UTC | swh.scheduler.cli: Add `swh scheduler task-type register` cli This allows registering of worker's task types to the scheduler through setuptools' mechanism. | 19 November 2019, 11:11:33 UTC |
8ec34fe | Nicolas Dandrimont | 23 October 2019, 08:40:37 UTC | Remove collect_ignore from conftest.py This got solved when we started using the shared_task decorator instead of instantiating our own app. | 23 October 2019, 08:40:42 UTC |
4df2406 | Nicolas Dandrimont | 18 October 2019, 15:54:04 UTC | Use the shared_task decorator instead of binding to a specific celery app | 23 October 2019, 08:32:26 UTC |
ecf38eb | David Douard | 18 October 2019, 14:37:28 UTC | celery/tests: mostly revert e770eb30 to fix celery app initialization in tests This revision did fix tests for the scheduler itself, but broke all other tests of scheduler dependent swh packages. In this fix, we ensure we override the `app` in swh.scheduler.celery_backend.config, since it is used by all celery task declarations (via the @app.task() decorator). | 18 October 2019, 14:37:28 UTC |
787c7a9 | Antoine R. Dumont (@ardumont) | 18 October 2019, 09:49:39 UTC | celery_backend.config: Make JournalHandler import optional swh-core no longer comes with JournalHandler by default. | 18 October 2019, 11:33:25 UTC |
c2a020d | David Douard | 16 October 2019, 08:50:06 UTC | tests: rewrite tests using pytest and the new rpc fixtures from swh.core | 16 October 2019, 11:20:42 UTC |
a7e15bf | David Douard | 16 October 2019, 08:46:11 UTC | add a new get_priority_ratios endpoint to the scheduler this is necessary to make it much easier to write tests so they do not need to execute SQL statements, which makes it possible to run exactly the same tests with the SchedulerBackend as the RemoteScheduler one (see the following revision). | 16 October 2019, 11:20:42 UTC |
c2ccf46 | David Douard | 16 October 2019, 08:43:03 UTC | updater/tests: rewrite updater's tests as pytest functions The way the scheduler_db and updater_db fixtures are built is not very straightforward nor satisfying, but it works. | 16 October 2019, 11:20:42 UTC |
37b909e | David Douard | 16 October 2019, 08:40:28 UTC | conftest: simplify the swh_scheduler() fixture simply use the postgresql.dsn as connection string. | 16 October 2019, 08:42:04 UTC |
e770eb3 | Antoine R. Dumont (@ardumont) | 10 October 2019, 13:14:25 UTC | tests: Explicit registering test tasks step for the swh_app Prior to this commit, the celery "app" import changed. Making the runtime application load prior to the tests "swh_app". In effect, making the tasks not being consumed by workers. This explicitly forces the tests tasks registering to "swh_app". In effect clarifying code and fixing the current tests. Related D2082 Related 8eafc70 | 10 October 2019, 13:26:54 UTC |
349d23e | Antoine R. Dumont (@ardumont) | 10 October 2019, 10:05:38 UTC | scheduler: Use directly the package's server module to start server Related D2109 Related D2110 | 10 October 2019, 10:05:38 UTC |
8eafc70 | Antoine R. Dumont (@ardumont) | 10 October 2019, 09:51:27 UTC | tox.ini: Use tests installed files instead of working directory Related D2082 | 10 October 2019, 09:51:27 UTC |
5955c8d | Antoine R. Dumont (@ardumont) | 02 October 2019, 04:53:21 UTC | celery_backend/config: Fix wrong statement Dict's get method does not take keyword argument. Related D2033#47672 | 02 October 2019, 04:56:13 UTC |
06137f0 | Stefano Zacchiroli | 01 October 2019, 11:08:23 UTC | tox: anticipate mypy run to just after flake8 | 01 October 2019, 11:08:23 UTC |
c78b846 | Stefano Zacchiroli | 27 September 2019, 08:38:31 UTC | init.py: switch to documented way of extending path make mypy 0.730 pass cleanly again | 27 September 2019, 08:38:31 UTC |
1a691b5 | Stefano Zacchiroli | 24 September 2019, 11:55:04 UTC | tox.ini: add mypy section | 24 September 2019, 11:55:04 UTC |
c4fa353 | Stefano Zacchiroli | 24 September 2019, 11:43:34 UTC | typing: minimal changes to make a no-op mypy run pass | 24 September 2019, 11:44:50 UTC |
3cd5697 | Stefano Zacchiroli | 24 September 2019, 11:42:23 UTC | fix typo in docstring and sample file name courtesy of codespell | 24 September 2019, 11:42:23 UTC |