f413d93 | Antoine R. Dumont (@ardumont) | 31 May 2018, 09:42:51 UTC | New upstream version 0.0.29 | 31 May 2018, 09:42:51 UTC |
8bbbe7b | Antoine R. Dumont (@ardumont) | 31 May 2018, 09:37:04 UTC | swh.scheduler.cli: Change archival period to rolling month - 1 week. This will permit a time window of 1 week to check that everything is going smoothly in the scheduler db. Related T1031 Close T986 | 31 May 2018, 09:42:21 UTC |
05c64c0 | Antoine R. Dumont (@ardumont) | 30 May 2018, 18:28:43 UTC | swh.scheduler.updater.writer: Force filter resolution to list | 30 May 2018, 18:29:05 UTC |
c4dd3c8 | Antoine R. Dumont (@ardumont) | 30 May 2018, 18:28:17 UTC | swh.scheduler.cli: Change default archival period to current month | 30 May 2018, 18:28:17 UTC |
d0f7e94 | Antoine R. Dumont (@ardumont) | 30 May 2018, 14:18:11 UTC | swh.scheduler.cli: Improve logging message | 30 May 2018, 14:18:11 UTC |
7e03d80 | Antoine R. Dumont (@ardumont) | 30 May 2018, 14:17:44 UTC | swh.scheduler.updater.backend: Adapt configuration path accordingly | 30 May 2018, 14:17:44 UTC |
1b652ed | Antoine R. Dumont (@ardumont) | 29 May 2018, 12:12:15 UTC | New upstream version 0.0.28 | 29 May 2018, 12:12:15 UTC |
9f41ce3 | Antoine R. Dumont (@ardumont) | 29 May 2018, 12:11:07 UTC | packaging: Remove hypothesis as a runtime dependency | 29 May 2018, 12:11:07 UTC |
8fd3bd7 | Antoine R. Dumont (@ardumont) | 29 May 2018, 10:27:34 UTC | New upstream version 0.0.27 | 29 May 2018, 10:27:34 UTC |
cde2def | Antoine R. Dumont (@ardumont) | 29 May 2018, 10:08:41 UTC | packaging: Remove .hypothesis folder | 29 May 2018, 10:26:22 UTC |
7b45010 | Antoine R. Dumont (@ardumont) | 29 May 2018, 10:00:04 UTC | packaging: Fix tests in packaging | 29 May 2018, 10:26:22 UTC |
a864649 | Antoine R. Dumont (@ardumont) | 29 May 2018, 09:36:32 UTC | packaging: Fix python3-swh.scheduler.updater tests package | 29 May 2018, 10:26:22 UTC |
1bdfaf1 | Antoine R. Dumont (@ardumont) | 29 May 2018, 09:22:40 UTC | d/control: Add new python3-swh.scheduler.updater package | 29 May 2018, 10:26:22 UTC |
5e56731 | Antoine R. Dumont (@ardumont) | 29 May 2018, 09:22:22 UTC | d/control: Fix runtime dependency version | 29 May 2018, 09:22:22 UTC |
58352e2 | Antoine R. Dumont (@ardumont) | 28 May 2018, 14:46:43 UTC | swh.scheduler.updater.writer: Add tests around writer | 28 May 2018, 14:57:35 UTC |
09e028e | Antoine R. Dumont (@ardumont) | 28 May 2018, 14:45:37 UTC | swh.scheduler.updater.backend: Make the reading timestamp optional | 28 May 2018, 14:46:07 UTC |
9def570 | Antoine R. Dumont (@ardumont) | 28 May 2018, 14:45:10 UTC | swh.scheduler.updater.writer: Do not set the limit at this level | 28 May 2018, 14:45:10 UTC |
c288da2 | Antoine R. Dumont (@ardumont) | 28 May 2018, 09:34:05 UTC | swh.scheduler.updater.writer: Write tasks in batch. Also, make the writer stop when there is no more data to write | 28 May 2018, 09:34:05 UTC |
ab844be | Antoine R. Dumont (@ardumont) | 25 May 2018, 13:57:38 UTC | swh-scheduler: Fill in the blanks between priority tasks Related T1035 Related T1031 | 25 May 2018, 15:21:09 UTC |
378d23b | Antoine R. Dumont (@ardumont) | 25 May 2018, 08:33:38 UTC | data/elastic-template.json: Use 1 shard for the swh-tasks index | 25 May 2018, 08:33:38 UTC |
d925b05 | Antoine R. Dumont (@ardumont) | 25 May 2018, 07:40:10 UTC | swh.scheduler.backend: Drop duplicate tasks at creation time Related T1031 Related T1051 | 25 May 2018, 07:41:27 UTC |
3f02cc3 | Antoine R. Dumont (@ardumont) | 24 May 2018, 09:50:46 UTC | swh.scheduler.api.client: Permit to specify the query timeout option Related T1061 | 25 May 2018, 07:41:27 UTC |
2b660ca | Antoine R. Dumont (@ardumont) | 23 May 2018, 13:59:38 UTC | swh.scheduler.backend: Add missing type cast | 23 May 2018, 13:59:38 UTC |
455f6d4 | Antoine R. Dumont (@ardumont) | 23 May 2018, 13:59:21 UTC | swh.scheduler.updater.writer: Remove noqa statement | 23 May 2018, 13:59:21 UTC |
56ce0f0 | Antoine R. Dumont (@ardumont) | 23 May 2018, 12:40:34 UTC | swh.scheduler.celery_backend.runner: Read more oneshot tasks | 23 May 2018, 12:40:34 UTC |
801d8e2 | Antoine R. Dumont (@ardumont) | 23 May 2018, 12:40:11 UTC | swh.scheduler.updater.writer: Sleep in between read/write cycle | 23 May 2018, 12:40:11 UTC |
3b0d71b | Antoine R. Dumont (@ardumont) | 23 May 2018, 12:39:29 UTC | swh.scheduler.updater.writer: Reuse oneshot task dict creation api | 23 May 2018, 12:39:29 UTC |
6980dbb | Antoine R. Dumont (@ardumont) | 23 May 2018, 12:07:00 UTC | swh.scheduler.updater.writer: Add configuration docstring | 23 May 2018, 12:07:35 UTC |
142f77d | Antoine R. Dumont (@ardumont) | 23 May 2018, 08:17:46 UTC | swh.scheduler.updater: Rename appropriately rate to cnt | 23 May 2018, 08:35:44 UTC |
a06c20b | Antoine R. Dumont (@ardumont) | 22 May 2018, 16:17:13 UTC | swh.scheduler.updater.sql: Add first_seen column | 22 May 2018, 16:17:13 UTC |
070e516 | Antoine R. Dumont (@ardumont) | 22 May 2018, 15:09:37 UTC | swh.scheduler.updater.writer: Update variable names appropriately | 22 May 2018, 15:09:37 UTC |
eeb5d77 | Antoine R. Dumont (@ardumont) | 18 May 2018, 16:17:17 UTC | swh.scheduler.updater.writer: Bootstrap the scheduler updater writer | 18 May 2018, 16:17:17 UTC |
33bc608 | Antoine R. Dumont (@ardumont) | 18 May 2018, 11:13:21 UTC | swh.scheduler.updater: Reference the origin's type | 18 May 2018, 11:29:30 UTC |
7b0c316 | Antoine R. Dumont (@ardumont) | 18 May 2018, 09:38:07 UTC | swh.scheduler.updater.ghtorrent: Don't open too many channels | 18 May 2018, 09:39:07 UTC |
c7d17a5 | Antoine R. Dumont (@ardumont) | 18 May 2018, 09:37:47 UTC | swh.scheduler.updater.events: Fix event's __str__ implementation | 18 May 2018, 09:39:07 UTC |
7642339 | Antoine R. Dumont (@ardumont) | 18 May 2018, 09:37:40 UTC | swh.scheduler.updater.consumer: Make the logging actually log | 18 May 2018, 09:39:07 UTC |
8e0a169 | Antoine R. Dumont (@ardumont) | 18 May 2018, 08:43:27 UTC | swh.scheduler.updater.ghtorrent: Explicit interesting event keys | 18 May 2018, 08:45:43 UTC |
8e39b42 | Antoine R. Dumont (@ardumont) | 18 May 2018, 08:43:02 UTC | swh.scheduler.tests.updater: Reuse code in mixin | 18 May 2018, 08:45:43 UTC |
8add41c | Antoine R. Dumont (@ardumont) | 18 May 2018, 08:30:01 UTC | swh.scheduler.updater.consumer: Improve event management | 18 May 2018, 08:30:01 UTC |
2aa60e0 | Antoine R. Dumont (@ardumont) | 18 May 2018, 08:21:10 UTC | swh.scheduler.updater.consumer: Use logging instead of print | 18 May 2018, 08:21:10 UTC |
ff65e77 | Antoine R. Dumont (@ardumont) | 17 May 2018, 14:07:53 UTC | swh.scheduler.updater.ghtorrent: Remove scratch code | 17 May 2018, 14:10:36 UTC |
f2b0500 | Antoine R. Dumont (@ardumont) | 17 May 2018, 13:52:56 UTC | swh.scheduler.updater.consumer: Test consumer interface | 17 May 2018, 14:10:36 UTC |
24a887e | Antoine R. Dumont (@ardumont) | 17 May 2018, 12:38:18 UTC | swh.scheduler.updater.ghtorrent: Test ghtorrent implementation | 17 May 2018, 14:06:04 UTC |
e281091 | Antoine R. Dumont (@ardumont) | 17 May 2018, 12:37:51 UTC | swh.scheduler.updater.ghtorrent: Simplify connection setup | 17 May 2018, 12:37:51 UTC |
86e0daf | Antoine R. Dumont (@ardumont) | 17 May 2018, 11:43:45 UTC | swh.scheduler.updater.ghtorrent: Clarify docstrings | 17 May 2018, 11:43:45 UTC |
07397ae | Antoine R. Dumont (@ardumont) | 17 May 2018, 11:43:30 UTC | swh.scheduler.updater.ghtorrent: Explicit configuration options | 17 May 2018, 11:43:30 UTC |
c454b41 | Antoine R. Dumont (@ardumont) | 17 May 2018, 11:37:28 UTC | swh.scheduler.updater.ghtorrent: Clean up dead code | 17 May 2018, 11:37:28 UTC |
52614b9 | Antoine R. Dumont (@ardumont) | 17 May 2018, 11:32:06 UTC | swh.scheduler.updater.ghtorrent: Move implem inside its own module | 17 May 2018, 11:35:08 UTC |
e2be888 | Antoine R. Dumont (@ardumont) | 17 May 2018, 10:37:46 UTC | swh.scheduler.updater.ghtorrent: Simplify consumer interface | 17 May 2018, 11:26:17 UTC |
0c1b67e | Antoine R. Dumont (@ardumont) | 17 May 2018, 10:24:48 UTC | swh.scheduler.updater.ghtorrent: Improve reading events | 17 May 2018, 10:29:50 UTC |
441aa92 | Antoine R. Dumont (@ardumont) | 16 May 2018, 16:37:56 UTC | swh.scheduler.updater.consumer: Flush memory cache | 16 May 2018, 16:39:39 UTC |
d4dbf0a | Antoine R. Dumont (@ardumont) | 16 May 2018, 16:37:30 UTC | swh.scheduler.updater.consumer: Check for data to send in final step | 16 May 2018, 16:37:30 UTC |
333afe9 | Antoine R. Dumont (@ardumont) | 16 May 2018, 16:26:08 UTC | swh.scheduler.updater: Design the UpdaterConsumer interface | 16 May 2018, 16:26:08 UTC |
f16fa5f | Antoine R. Dumont (@ardumont) | 16 May 2018, 15:38:21 UTC | swh.scheduler.updater: Don't subscribe to create event. Empty repositories are not that interesting; a later push event on one will be much more useful. | 16 May 2018, 15:38:21 UTC |
5bda011 | Antoine R. Dumont (@ardumont) | 16 May 2018, 13:21:24 UTC | swh.scheduler.updater: Actually use the cache_put method with list | 16 May 2018, 13:21:24 UTC |
3a39baa | Antoine R. Dumont (@ardumont) | 16 May 2018, 12:19:50 UTC | updater/ghtorrent: Open ghtorrent consumer as cli script | 16 May 2018, 12:27:34 UTC |
8d5bb5b | Antoine R. Dumont (@ardumont) | 16 May 2018, 11:44:12 UTC | updater: Actually consume ghtorrent event. Also make GHTorrent and FakeGHTorrent publisher converge. Related T1051 | 16 May 2018, 11:44:12 UTC |
b9b3c04 | Antoine R. Dumont (@ardumont) | 16 May 2018, 11:43:18 UTC | tests/updater: Simplify events tests | 16 May 2018, 11:43:18 UTC |
0f28201 | Antoine R. Dumont (@ardumont) | 16 May 2018, 08:18:04 UTC | swh.scheduler.updater: Move updater tests to its own arborescence | 16 May 2018, 11:42:51 UTC |
05cc4c3 | Antoine R. Dumont (@ardumont) | 14 May 2018, 13:17:43 UTC | updater.ghtorrent: Write events to scheduler updater backend | 14 May 2018, 13:19:18 UTC |
1f6b4d8 | Antoine R. Dumont (@ardumont) | 14 May 2018, 13:11:16 UTC | events: Rename event key to type | 14 May 2018, 13:11:16 UTC |
fd37df2 | Antoine R. Dumont (@ardumont) | 14 May 2018, 13:09:56 UTC | updater.scratch: Update tryout code | 14 May 2018, 13:09:56 UTC |
7e88ec2 | Antoine R. Dumont (@ardumont) | 14 May 2018, 09:05:33 UTC | scheduler.api.server: Instantiate scheduler backend once per import Related 18c9dad986a1f6f19d57dd97079dc22ad10b04df | 14 May 2018, 09:05:33 UTC |
97f03a8 | Antoine R. Dumont (@ardumont) | 14 May 2018, 07:00:30 UTC | Fix pep8 violation, remove unused import, fix typo | 14 May 2018, 07:00:30 UTC |
9dd3cd3 | Antoine R. Dumont (@ardumont) | 09 May 2018, 16:06:38 UTC | swh.scheduler.updater.backend: Bootstrap backend api Related T1051 | 09 May 2018, 16:06:38 UTC |
464d759 | Antoine R. Dumont (@ardumont) | 09 May 2018, 09:11:40 UTC | Reference tryout code to assert we can work with ghtorrent. We cannot so far. Related T1051 | 09 May 2018, 09:13:17 UTC |
ccdc134 | Antoine R. Dumont (@ardumont) | 09 May 2018, 08:31:24 UTC | swh.scheduler.updater: Add SWHEvent class and tests around it Related T1051 | 09 May 2018, 08:33:47 UTC |
5de5eb9 | Antoine R. Dumont (@ardumont) | 09 May 2018, 08:29:28 UTC | scheduler.updater: Add publish/subscribe fake ghtorrent class. As issues remain with ghtorrent's infra [1], I'm using a fake random event generator for now. [1] https://github.com/ghtorrent/ghtorrent.org/issues/397#issuecomment-387052462 Related T1051 | 09 May 2018, 08:33:47 UTC |
65e4b21 | Antoine R. Dumont (@ardumont) | 02 May 2018, 15:26:34 UTC | swh.scheduler.backend: Permit to create tasks with priority Related T1035 | 03 May 2018, 10:15:03 UTC |
5e92ae1 | Antoine R. Dumont (@ardumont) | 27 April 2018, 17:29:34 UTC | swh.scheduler: Schedule tasks with/without priority Related T1035 | 03 May 2018, 10:15:03 UTC |
9bf1a79 | Nicolas Dandrimont | 02 May 2018, 11:15:53 UTC | Don't override scheduler configuration by default | 02 May 2018, 11:15:53 UTC |
bd3da9b | Antoine R. Dumont (@ardumont) | 26 April 2018, 15:34:07 UTC | New upstream version 0.0.26 | 26 April 2018, 15:34:07 UTC |
ffd2dda | Antoine R. Dumont (@ardumont) | 26 April 2018, 15:33:52 UTC | d/rules: Fix package build | 26 April 2018, 15:33:52 UTC |
950a7d5 | Antoine R. Dumont (@ardumont) | 25 April 2018, 14:10:26 UTC | swh.scheduler.tests: Test remote scheduler api as well Related T1036 | 25 April 2018, 16:37:43 UTC |
6ef0a88 | Antoine R. Dumont (@ardumont) | 25 April 2018, 11:03:44 UTC | swh.scheduler: Add tests around removing archivable tasks Related T986 Related T1034 | 25 April 2018, 16:36:34 UTC |
7afd050 | Antoine R. Dumont (@ardumont) | 25 April 2018, 09:57:17 UTC | swh.scheduler: Add tests around filtering archivable tasks Related T986 Related T1034 | 25 April 2018, 16:36:34 UTC |
5ccfa8b | Antoine R. Dumont (@ardumont) | 25 April 2018, 09:02:02 UTC | swh-scheduler-schema: Fix unneeded drop instructions. These should not have been committed in that file, only in the migration part. | 25 April 2018, 16:36:34 UTC |
b7490ee | Antoine R. Dumont (@ardumont) | 24 April 2018, 14:48:05 UTC | swh.scheduler.cli: Improve docstring | 25 April 2018, 16:36:34 UTC |
8c3910a | Antoine R. Dumont (@ardumont) | 24 April 2018, 14:26:13 UTC | swh.scheduler.cli: Permit to specify the backend to use in cli Related T1034 | 25 April 2018, 16:36:34 UTC |
9e13fd2 | Antoine R. Dumont (@ardumont) | 24 April 2018, 13:40:33 UTC | swh.scheduler.api: Bootstrap scheduler's remote api Related T1034 | 25 April 2018, 16:36:34 UTC |
5349689 | Antoine R. Dumont (@ardumont) | 24 April 2018, 12:08:28 UTC | swh.scheduler: Use `get_scheduler` api to instantiate a scheduler Related T1034 | 24 April 2018, 14:55:43 UTC |
ef4eb14 | Antoine R. Dumont (@ardumont) | 24 April 2018, 12:07:48 UTC | swh.scheduler.backend: Fix docstring | 24 April 2018, 12:07:48 UTC |
7d95da2 | Antoine R. Dumont (@ardumont) | 18 April 2018, 10:34:43 UTC | New upstream version 0.0.25 | 18 April 2018, 10:34:43 UTC |
8124229 | Antoine R. Dumont (@ardumont) | 18 April 2018, 09:33:45 UTC | swh.scheduler.cli.archive: Index arguments.kwargs as text Related T1023 | 18 April 2018, 09:33:45 UTC |
b67f570 | Antoine R. Dumont (@ardumont) | 13 April 2018, 12:55:32 UTC | New upstream version 0.0.24 | 13 April 2018, 12:55:32 UTC |
f4587a3 | Antoine R. Dumont (@ardumont) | 13 April 2018, 09:45:32 UTC | data/template: Do not index the arguments field (it's in _source). As this field is arbitrarily large depending on the task at hand, this triggers a limit (index.mapping.total_fields.limit of 1000) [1]. We do not really need this in the index as the data will still be in the _source. [1] https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html#mapping-limit-settings Related T1023 | 13 April 2018, 09:59:37 UTC |
915db34 | Antoine R. Dumont (@ardumont) | 13 April 2018, 09:10:30 UTC | data/README: Add a small readme to explain es install step | 13 April 2018, 09:59:36 UTC |
547eb89 | Antoine R. Dumont (@ardumont) | 13 April 2018, 09:07:51 UTC | swh.scheduler.cli: Add a bulk index flag to separate read from index. Try to reduce the number of connection timeout errors. There are 3 solutions: (1) increase the default timeout of 10 per request; (2) do the same on a server configuration basis (no); (3) reduce the amount of data to bulk index, which is the chosen solution. Related T1023 | 13 April 2018, 09:59:27 UTC |
532a340 | Antoine R. Dumont (@ardumont) | 10 April 2018, 15:43:07 UTC | New upstream version 0.0.23 | 10 April 2018, 15:43:07 UTC |
e972b6a | Antoine R. Dumont (@ardumont) | 10 April 2018, 14:21:57 UTC | swh.scheduler.cli.archive: Simplify task and task_run ids extraction Related T986 | 10 April 2018, 15:43:03 UTC |
6c11eb6 | Antoine R. Dumont (@ardumont) | 10 April 2018, 14:06:38 UTC | swh.sched.cli.archive: Improve logging Related T986 | 10 April 2018, 15:43:03 UTC |
1da2d71 | Antoine R. Dumont (@ardumont) | 10 April 2018, 13:53:05 UTC | swh.scheduler.cli.archive: Delete only completely indexed tasks. Prior to this commit, we could remove tasks even though their associated task_run had not yet been indexed. Related T986 | 10 April 2018, 15:42:57 UTC |
04ccc2d | Antoine R. Dumont (@ardumont) | 10 April 2018, 12:44:15 UTC | swh.sched.cli.archive: Use interval period to filter archival tasks Related T986 | 10 April 2018, 14:00:12 UTC |
962fd8b | Antoine R. Dumont (@ardumont) | 10 April 2018, 12:42:35 UTC | swh.scheduler.backend_es: Return operation failure instead of raising. Prior to this commit, an error would raise and stop all indexation. As the code already expects a tuple (operation-status, item), we instead leverage this and continue working on indexation. Any item whose `operation-status` is False (meaning failure to index, whatever the reason) is not indexed. Another run would then pick up the leftovers and index them. Related T986 | 10 April 2018, 13:55:38 UTC |
a04fb85 | Antoine R. Dumont (@ardumont) | 09 April 2018, 14:09:16 UTC | New upstream version 0.0.22 | 09 April 2018, 14:09:16 UTC |
9b9b88c | Antoine R. Dumont (@ardumont) | 09 April 2018, 13:59:16 UTC | d/control: Update to recent python3-elasticsearch client | 09 April 2018, 14:09:12 UTC |
9e5bf35 | Antoine R. Dumont (@ardumont) | 30 March 2018, 13:02:55 UTC | New upstream version 0.0.21 | 30 March 2018, 13:02:55 UTC |
ffd00cb | Antoine R. Dumont (@ardumont) | 30 March 2018, 12:54:26 UTC | swh.scheduler.backend_es: Fix config base filename variable name | 30 March 2018, 12:55:25 UTC |
ad0f0e2 | Antoine R. Dumont (@ardumont) | 30 March 2018, 10:22:28 UTC | data/elastic-template.json: Use elasticsearch's default conf. Defaults to 5 shards and 1 replica | 30 March 2018, 12:54:54 UTC |
6cc6cb7 | Antoine R. Dumont (@ardumont) | 30 March 2018, 09:44:18 UTC | New upstream version 0.0.20 | 30 March 2018, 09:44:18 UTC |