https://forge.softwareheritage.org/source/swh-scheduler.git

Revision Message Commit Date
5b373ce Introduce a get_listed_origins endpoint This paginated endpoint allows retrieving information about the origins recorded by listers. 06 July 2020, 09:51:10 UTC
aefc5c9 Don't recurse into attrs objects when serializing We need to use our serialization hook recursively to make sure that we can deserialize nested data structures. 06 July 2020, 07:48:29 UTC
cc8fa7f Re-introduce the root endpoint for the rpc server 22 June 2020, 10:55:11 UTC
265bc8b The celery-monitor subcommand glob filtering needs celery >= 4.3 22 June 2020, 08:58:09 UTC
8a1724a Add SQL for version 16 of the schema 22 June 2020, 08:26:40 UTC
d107a55 Implement storage of listed origins This new API endpoint allows listers to record the origins they have seen during their current run. Origins are identified by the lister instance, the url of the origin, and the type of loader that should be used to load this origin. The implementation allows listers to just send the list of origins they've seen (with some lightweight extra information), leaving the backend to handle whether to do an insertion or an update to an existing origin. The current implementation doesn't disable origins that have disappeared when doing a full listing run. This step will be done by a separate "origin garbage collection" endpoint, which will peruse the `last_seen` field. 16 June 2020, 08:25:08 UTC
e0fa5c5 Move lister addition in scheduler tests to a pytest fixture This lets us keep the tests a little DRYer. 16 June 2020, 08:24:03 UTC
04894bd Lister.instance_name doesn't need a factory/default value 16 June 2020, 08:22:23 UTC
f520108 Improve support of primary keys This splits primary keys across "automatic" primary keys (handled by the database) and manual primary keys (managed by the user). Use the opportunity to improve/clarify the documentation of field metadata attributes. 16 June 2020, 08:22:12 UTC
1c93e55 Implement basic storage and retrieval of lister information This adds a pair of functions to the backend: - `get_or_create_lister` pulls the record for a given lister from the database - `update_lister` updates the record for a given lister in the database This is one of the basic building blocks for the integration of lister information directly in the scheduler database. Related to T2442. 15 June 2020, 13:41:02 UTC
466ac59 Introduce a SchedulerException base class This allows us to automatically serialize/deserialize exceptions under this base class within our RPC framework. 15 June 2020, 12:53:30 UTC
c509a12 Introduce some scaffolding for an attrs-based BaseSchedulerModel Alongside swh.model.model, this allows us to define data models for the objects the scheduler is working with, and to serialize/deserialize these objects transparently at the RPC layer. This also introduces some mild ORM-like logic so we can keep the actual SQL a little DRYer. 15 June 2020, 10:49:25 UTC
4c0c37b Use the automatic RPC client/server generation 11 June 2020, 09:42:37 UTC
aedd323 Replace swh-worker-control with a swh scheduler celery-monitor subcommand This new subcommand has two commands: - ping: checks whether the given worker instance answers within a given timeout - list-running: lists running tasks on the given worker instance 10 June 2020, 10:15:54 UTC
8411335 Remove double logging setup in cli The logging module is already initialized by the main swh.core cli; this only creates double logging with no advantages whatsoever. 10 June 2020, 09:30:31 UTC
873cdac Handle psycopg2 OperationalError in cli initialization When running the cli with default settings (i.e. pointing to a softwareheritage-scheduler-dev database), and the database doesn't exist, an OperationalError is raised. This shouldn't prevent (some of the) cli subcommands from working, so catch this error and ignore it as one of the scheduler backend setup failure modes. 10 June 2020, 09:28:19 UTC
28c5b8d Replace vcversioner with setuptools-scm 09 June 2020, 13:49:00 UTC
14cd5bb Blacken for python3.7+ 03 June 2020, 15:19:00 UTC
6ac3d56 Drop use of pifpaf and the "db" pytest mark We've been using pytest-postgresql for... a year (4117d5a). 03 June 2020, 10:34:11 UTC
3f42423 Add future dependency, missing from celery 4.4.4 Without future, the tests involving celery hang indefinitely. Upstream issue: https://github.com/celery/celery/issues/6145 03 June 2020, 09:29:58 UTC
92c0869 Celery runner: only schedule tasks when the buffer is less than 80% full The queries to pick up tasks from the scheduler sometimes degenerate when the number of tasks fetched is too low, which hangs the runner for all other tasks. Adding this lower bound helps postgresql use proper optimizations to pull tasks. 19 May 2020, 09:34:52 UTC
b839906 Disable the azure http logger in the celery worker base config This is suboptimal (we should move all of this to a logconfig where we can set this stuff), but this is consistent with how we do things currently. 19 May 2020, 09:14:25 UTC
2ea919c Fix black for py37 19 May 2020, 09:12:26 UTC
3a74069 test_scheduler: Fix pep8 violation This fixes ci build [1] [1] https://jenkins.softwareheritage.org/job/DSCH/job/tests/859/console 12 May 2020, 09:55:09 UTC
2cc8aa0 setup.py: add documentation link 29 April 2020, 16:33:16 UTC
1abff22 setup: Update the minimum required runtime python3 version Related to T2367 20 April 2020, 15:29:49 UTC
551ceac Add a pyproject.toml file to target py37 for black 08 April 2020, 20:16:58 UTC
cc0ef04 Enable black - blackify all the python files, - enable black in pre-commit, - add a black tox environment. 08 April 2020, 14:58:01 UTC
77b2d0b tests: Adapt model according to latest change The origin model no longer allows a type. Related to f533f62bbf114cfcc29f7c72307c4dfbe99cf048 27 March 2020, 06:43:03 UTC
e6c2a86 Implement listener on top of pika instead of celery 23 March 2020, 11:52:06 UTC
68c42fb scheduler.backend_es: Leave index opened when streaming bulk Prior to this commit, we had the proper behavior of closing index when done streaming. Unfortunately, this created too much gc on es nodes down the line. So for now, we remove that behavior. Note that this implies we need another cog that makes a pass once in a while on indices to close. Also, this has been running on production for 2 weeks now and no more gc issues arose since then. 26 February 2020, 09:34:09 UTC
af58466 backend: Make create_task_type idempotent There is no reason to raise an error when a task type has already been created, and making the call idempotent stops leaking the psycopg2 IntegrityError exception as part of the scheduler interface. 18 February 2020, 14:17:02 UTC
b92e3fd Use swh-storage validation proxy. Required by swh-storage >= v0.0.172. 12 February 2020, 12:48:52 UTC
73d1e5e cli.task: Change `get_storage` according to latest change 31 January 2020, 08:18:25 UTC
1c923aa test_cli: Fix storage instantiation following api change Using the `swh.storage.get_storage` function instead of calling directly the class name. This actually fixes the master ci build [1] [1] https://jenkins.softwareheritage.org/job/DSCH/job/tests/743/console 31 January 2020, 08:16:20 UTC
f6cc231 sentry: Fix initialization init_sentry call Api wise, the `sentry_dsn` is expected to be passed as first parameter. Which in the scheduler's case is not set yet. Forcing it to None for now. 23 January 2020, 13:21:21 UTC
0712207 Use swh.core.sentry instead of calling sentry_sdk.init directly. This adds support for SWH_MAIN_PACKAGE to initialize sentry_sdk with a release. 10 January 2020, 14:13:07 UTC
b488d69 backend_es: Fix configuration mapping 17 December 2019, 22:23:35 UTC
cc2de16 tests: Try to avoid fixture redefinition Somehow, that messes other tests in the debian build. 17 December 2019, 14:57:33 UTC
73ade78 tests: Avoid fixture clash in different purposes fixture Somehow, that fails in the debian build 17 December 2019, 14:27:50 UTC
e9d8a5f scheduler.backend: Rename appropriately module elasticsearch_memory 17 December 2019, 12:33:43 UTC
2cbfb78 Add tests to in memory elasticsearch implementation 17 December 2019, 12:33:43 UTC
ba5920d backend_es: Add tests around elasticsearch client instantiation 17 December 2019, 12:33:43 UTC
38d17de tests/common: Remove unneeded behavior 17 December 2019, 12:33:43 UTC
ac32b5e backend: Add alternate memory elasticsearch implem to allow testing 17 December 2019, 12:33:43 UTC
7b1c2d5 scheduler.backend_es: Allow using different elasticsearch clients For the moment, only 1 official es client exists 17 December 2019, 12:33:43 UTC
ec207fb scheduler.backend: Make the returned result a dict 17 December 2019, 12:33:42 UTC
f97bff6 cli.task: Make page_token actually a string even from the cli That actually makes it consistent with the api 17 December 2019, 12:33:42 UTC
d8859d7 backend_es: Add initialization endpoint 17 December 2019, 12:33:42 UTC
d5cea20 backend_es: Remove unused endpoint 17 December 2019, 12:33:42 UTC
18df124 cli.tasks: Unify logging instruction 17 December 2019, 12:33:42 UTC
c5e189b test: Allow status definition during task template generation 17 December 2019, 12:33:42 UTC
844f3e0 tests.scheduler: Extract common utility function and test it 17 December 2019, 12:33:42 UTC
2d56669 scheduler.cli.task: Rename appropriately backend variable 17 December 2019, 12:33:42 UTC
793c233 scheduler.backend_es: Rename backend class appropriately 17 December 2019, 12:33:42 UTC
d5bf6b1 cli.task: Rename internal method appropriately 17 December 2019, 12:33:42 UTC
eb1c3d3 backend_es: Use consistent logging instruction 17 December 2019, 12:33:42 UTC
b376eb9 backend_es: Enclose close instruction within finally 17 December 2019, 12:33:42 UTC
f6726e9 backend_es: Create index when it does not exist 17 December 2019, 12:33:41 UTC
ad54c6b backend_es: Open indices prior to indexing method calls 17 December 2019, 12:33:41 UTC
305422b cli.task: Tasks need to be sorted prior to group by call 17 December 2019, 12:33:41 UTC
d603608 cli.task: Use the configuration provided by the cli 17 December 2019, 12:33:41 UTC
e0dd669 Initialize Sentry on worker startup. 16 December 2019, 17:55:11 UTC
f1b3f49 Print a traceback in case a signal callback crashes. Celery silently eats errors happening in these functions. 16 December 2019, 17:54:54 UTC
dbd4a2f backend: Align paginated endpoint consistently with others 16 December 2019, 15:34:27 UTC
3ab0348 backend: Filter properly archive within the defined range Prior to this commit, we could list tasks whose started date was null. Now we fallback on the scheduled task which is the next best date we have. 14 December 2019, 13:25:17 UTC
080db58 test_scheduler: Add some more check on filtering test 13 December 2019, 15:34:08 UTC
b8b171d backend: Make filter_task_to_archive a paginated endpoint Related to T1931 13 December 2019, 14:08:20 UTC
2b93efb tox: Add ipdb dependency on py3-dev env 13 December 2019, 14:08:10 UTC
ee162fe Use a btree of (task_type, md5(arguments)) to match task arguments The former index on hash(arguments->'args') has lost relevance as about half the tasks (the ones for the loader) have the same value (an empty list) for this field. This index is more universal, faster, and also easier to convince the planner of using. If we want more specific indexes (e.g. on specific keyword arguments) we'll be able to add that separately. 13 December 2019, 10:32:33 UTC
0b04220 Remove the creation of the 'load-deposit' task type it's now managed by swh-loader-core directly. 12 December 2019, 16:37:39 UTC
4071d71 Make --status option of 'swh scheduler task list' a click.Choice 12 December 2019, 11:27:13 UTC
a18f562 celery: add 2 statsd probes for the runner and listener - runner: counting the number of scheduled tasks, - listener: counting the number of processed events. 04 December 2019, 15:36:19 UTC
f206076 celery: make SWHTask send start/end of execution statsd gauges with timestamps Closes T2119. 04 December 2019, 09:28:26 UTC
08243bb tests: fix celery_task's test_multiping kwargs were not passed correctly. Also add a test_ping_with_kw test. 04 December 2019, 09:25:05 UTC
8c1e051 scheduler.updater: Remove dead code 26 November 2019, 11:29:47 UTC
95940a8 Migrate tox.ini to extras = xxx instead of deps = .[testing] 21 November 2019, 12:57:25 UTC
7c40132 Merge tox test environment configurations 21 November 2019, 12:54:47 UTC
101a131 Add a pre-commit config file 21 November 2019, 12:50:21 UTC
104fee0 Drop version constraint on pytest < 4 21 November 2019, 11:04:54 UTC
56e4a12 Include all requirements in MANIFEST.in 20 November 2019, 18:56:38 UTC
c973ec0 req-swh*: Remove old package loader backend names Related to T1389 T2098 Related to D2306 D2305 D2304 19 November 2019, 14:12:27 UTC
9358572 swh.scheduler.cli: Add `swh scheduler task-type register` cli This allows registering of worker's task types to the scheduler through setuptools' mechanism. 19 November 2019, 11:11:33 UTC
8ec34fe Remove collect_ignore from conftest.py This got solved when we started using the shared_task decorator instead of instantiating our own app. 23 October 2019, 08:40:42 UTC
4df2406 Use the shared_task decorator instead of binding to a specific celery app 23 October 2019, 08:32:26 UTC
ecf38eb celery/tests: mostly revert e770eb30 to fix celery app initialization in tests This revision did fix tests for the scheduler itself, but broke all other tests of scheduler dependent swh packages. In this fix, we ensure we override the `app` in swh.scheduler.celery_backend.config, since it is used by all celery task declarations (via the @app.task() decorator). 18 October 2019, 14:37:28 UTC
787c7a9 celery_backend.config: Make JournalHandler import optional swh-core no longer comes with JournalHandler by default. 18 October 2019, 11:33:25 UTC
c2a020d tests: rewrite tests using pytest and the new rpc fixtures from swh.core 16 October 2019, 11:20:42 UTC
a7e15bf add a new get_priority_ratios endpoint to the scheduler this is necessary to make it much easier to write tests so they do not need to execute SQL statements, which makes it possible to run exactly the same tests with the SchedulerBackend as the RemoteScheduler one (see the following revision). 16 October 2019, 11:20:42 UTC
c2ccf46 updater/tests: rewrite updater's tests as pytest functions The way the scheduler_db and updater_db fixtures are built is not very straightforward nor satisfying, but it works. 16 October 2019, 11:20:42 UTC
37b909e conftest: simplify the swh_scheduler() fixture simply use the postgresql.dsn as connection string. 16 October 2019, 08:42:04 UTC
e770eb3 tests: Explicit registering test tasks step for the swh_app Prior to this commit, the celery "app" import changed. Making the runtime application load prior to the tests "swh_app". In effect, making the tasks not being consumed by workers. This explicitly forces the tests tasks registering to "swh_app". In effect clarifying code and fixing the current tests. Related D2082 Related 8eafc70 10 October 2019, 13:26:54 UTC
349d23e scheduler: Use directly the package's server module to start server Related D2109 Related D2110 10 October 2019, 10:05:38 UTC
8eafc70 tox.ini: Use tests installed files instead of working directory Related D2082 10 October 2019, 09:51:27 UTC
5955c8d celery_backend/config: Fix wrong statement Dict's get method does not take keyword argument. Related D2033#47672 02 October 2019, 04:56:13 UTC
06137f0 tox: anticipate mypy run to just after flake8 01 October 2019, 11:08:23 UTC
c78b846 init.py: switch to documented way of extending path make mypy 0.730 pass cleanly again 27 September 2019, 08:38:31 UTC
1a691b5 tox.ini: add mypy section 24 September 2019, 11:55:04 UTC
c4fa353 typing: minimal changes to make a no-op mypy run pass 24 September 2019, 11:44:50 UTC
3cd5697 fix typo in docstring and sample file name courtesy of codespell 24 September 2019, 11:42:23 UTC
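
Illustration for revision 92c0869 (Celery runner: only schedule tasks when the buffer is less than 80% full). The commit message describes the rule but not the code, so the following is only a minimal sketch of how such a lower bound could be expressed. The function name `tasks_to_fetch` and the parameters `queue_length`, `max_queue_length` and `fill_percent` are hypothetical and are not taken from swh-scheduler.

```python
def tasks_to_fetch(queue_length: int, max_queue_length: int, fill_percent: int = 80) -> int:
    """Return how many tasks to request from the scheduler on this pass.

    Hypothetical helper, not the swh-scheduler implementation: it returns 0
    while the buffer is at or above `fill_percent` percent of its capacity,
    so that every fetch that does happen asks for a reasonably large batch.
    """
    if max_queue_length <= 0:
        raise ValueError("max_queue_length must be positive")
    if queue_length < 0:
        raise ValueError("queue_length must not be negative")

    # Integer arithmetic avoids floating-point surprises at the boundary.
    if queue_length * 100 >= max_queue_length * fill_percent:
        return 0  # buffer too full: skip this pass entirely

    # Otherwise fill the remaining free slots in one query.
    return max_queue_length - queue_length


if __name__ == "__main__":
    for current in (0, 500, 799, 800, 950):
        print(current, "->", tasks_to_fetch(current, 1000))
```

With a capacity of 1000 and the default threshold, this sketch never asks for fewer than 201 tasks at a time, which matches the intent stated in the commit message of keeping the number of tasks fetched from getting too low.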