https://forge.softwareheritage.org/source/snippets.git

Revision Message Commit Date
b1a81b1 Merge branch 'generated-differential-D6957-source' into 'generated-differential-D6957-target' Add recover_corrupt_objects.py See merge request swh/devel/snippets!13 06 January 2023, 22:22:55 UTC
219ca5b recover_corrupt_objects.py: Double-check insertion 24 January 2022, 12:47:04 UTC
7a7d0b3 Add recover_corrupt_objects.py 20 January 2022, 09:04:49 UTC
412f0c6 Add backup-weird-dirs-ok-in-kafka.py 17 January 2022, 14:19:14 UTC
39ccc5c analyze_consistency_failures: More heuristics 17 January 2022, 14:17:38 UTC
6fef04f analyze_consistency_failures: Cache corrupt objects to save time 17 January 2022, 14:15:54 UTC
6a2068d add swh-log-oneline.py; displays a one-line summary for each revision in history 01 December 2021, 15:09:18 UTC
b64f95c add swh-diff.py; displays SWH revision changes as a unified diff 01 December 2021, 15:07:46 UTC
6d92628 grid5000/cassandra: improve vault deployment - fix storage deployment affinity - add a storage-vault using the staging r/o storage - add the scheduler-runner-with-priority - add a retry filter on the vault and cooker to improve reliability when the remote objstorage fails Related to T3683 03 November 2021, 15:39:03 UTC
5411afb grid5000/cassandra: deploy the vault components - Stop explicitly naming the server in the affinity definitions, switch to node labels - deploy the components for the vault Related to T3683 22 October 2021, 08:31:25 UTC
ec226b0 grid5000/cassandra: commit some pending changes - use ops in grafana storage dashboard - fix loaders configuration - declare all cassandra nodes in the storage configuration - add storage access logs to trace request durations Related to T3357 19 October 2021, 23:58:26 UTC
0d2c1eb grid5000/cassandra: generate statistics and graphics Add 2 scripts to extract statistics and graph the results from the loaders logs Related to T3577 19 October 2021, 23:58:17 UTC
f204896 analyze_consistency_failures.py: get rid of origin liveliness detection It's a false negative for about a hundred revisions, and isn't a very useful optimization in the end. 15 October 2021, 11:58:31 UTC
671c5d4 analyze_consistency_failures.py: add support for directories 15 October 2021, 11:11:18 UTC
2dbd41a analyze_consistency_failures.py: add support for a fallback swh-graph instance it's more up to date, but partial. 13 October 2021, 15:01:51 UTC
04b5bcd analyze_consistency_failures.py: minor tweaks 13 October 2021, 15:01:21 UTC
e1d8090 analyze_consistency_failures.py: various performance improvements when all origins are already cloned, this is a 5 to 10% improvement in throughput. Content: 1. freeze the GC before forking 2. remove some very costly but rather useless heuristics 3. reorder heuristics to run the cheapest ones first 11 October 2021, 08:41:25 UTC
357946d Autoscale workers depending on number of messages in queues Also dedicate loaders per queue Related to T3592 08 October 2021, 14:01:39 UTC
e2d6fc8 config-map: Change pool policy to prefork and drop --events Related to T3592 08 October 2021, 14:01:39 UTC
498edfa analyze_consistency_failures.py: misc fixes 08 October 2021, 10:35:39 UTC
1e7a1a1 analyze_consistency_failures.py: Avoid cloning/fetching the same origins over and over 08 October 2021, 10:35:07 UTC
9732757 analyze_consistency_failures.py: Clone all linux forks in the same repo (saves a lot of space) 07 October 2021, 13:08:11 UTC
2bbf8e6 analyze_consistency_failures.py: Add support for releases + mitigate FD leaks 07 October 2021, 13:08:11 UTC
541a8e0 analyze_consistency_failures.py: Fix some remaining bugs in revision handling 07 October 2021, 13:08:11 UTC
0b2c9ff grid5000/cassandra: increase objstorage capacity - increase replica count to 4 (tmpfs volumes need to be prepared on the servers before) - deploy them on the loader nodes for more parallelism Related to T3357 06 October 2021, 00:45:10 UTC
2a7a2ef grid5000/cassandra: improve git loader benchmark stability - upgrade rabbitmq to 3.7 (same version as in production) - Configure the loader to not send the task result - remove the limits on the storage server, not needed as there is enough capacity on the server and there is no autoscaling for the moment - Deploy the kubernetes dashboard to help monitor the pods Related to T3577 06 October 2021, 00:21:03 UTC
315248b analyze_consistency_failures.py: Cache swh-graph output, it's too slow 05 October 2021, 09:55:41 UTC
279cb4a analyze_consistency_failures.py: Add more edge cases, normalize digest keys, some better error handling 05 October 2021, 09:55:41 UTC
5448697 analyze_consistency_failures.py: Add more heuristics + write out fixed/recovered objects. 01 October 2021, 15:06:10 UTC
7cc495e grid5000/cassandra: kubernetes configuration for massive parallel loader test Related to T3577 01 October 2021, 14:37:14 UTC
e866727 analyze_consistency_failures.py: Make multiprocess + handle more edge cases 30 September 2021, 17:39:25 UTC
9a1e273 Add autoscale configuration 30 September 2021, 14:49:53 UTC
9f7568f vlorentz/analyze_consistency_failures.py: Initial commit 29 September 2021, 18:35:08 UTC
80a5ca1 check_consistency.py: Fix Kafka message corruption issues 29 September 2021, 18:35:08 UTC
4c59e23 check_consistency.py: Fix deprecation warnings on swh-model >= 3.0.0 29 September 2021, 18:35:08 UTC
bec9e45 Try some autoscaling based on metrics usage It's a priori only related to the container so not exactly our goal. Let's try it nonetheless. 29 September 2021, 15:48:19 UTC
151bf47 Make the number of replicas configurable 29 September 2021, 15:48:19 UTC
1a00f11 config-map: Reuse service name to match vhost 29 September 2021, 15:48:19 UTC
e6f0510 helm: Try celery's solo policy instead of prefork Related to T3592 29 September 2021, 15:48:19 UTC
6b321e6 helm: Mount container's /tmp as tmpfs mount Related to T3592 29 September 2021, 15:48:19 UTC
3aa92c4 helm: Install parametrized deployment for loaders Dry run install check:
```
helm install --dry-run stuff ./worker
NAME: stuff
LAST DEPLOYED: Thu Sep 23 17:01:19 2021
NAMESPACE: default
STATUS: pending-install
REVISION: 1
TEST SUITE: None
HOOKS:
MANIFEST:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: loaders
data:
  config.yml: |
    storage:
      cls: pipeline
      steps:
      - cls: buffer
        min_batch_size:
          content: 10000
          content_bytes: 104857600
          directory: 10000
          revision: 10000
      - cls: filter
      - cls: retry
      - cls: remote
        url: http://storage:5002/
    celery:
      task_broker: amqp://guest:guest@amqp//
      task_queues:
      - swh.loader.git.tasks.UpdateGitRepository
  entrypoint.sh: |
    #!/bin/bash
    set -e
    echo Starting the swh Celery worker
    exec python -m celery \
        --app=swh.scheduler.celery_backend.config.app \
        worker \
        --pool=prefork --events \
        --concurrency=${CONCURRENCY} \
        --max-tasks-per-child=${MAX_TASKS_PER_CHILD} \
        -Ofair --loglevel=${LOGLEVEL} \
        --hostname "${HOSTNAME}"
---
apiVersion: v1
kind: Service
metadata:
  name: storage
spec:
  type: ExternalName
  externalName: swh-storage
---
apiVersion: v1
kind: Service
metadata:
  name: amqp
spec:
  type: ExternalName
  externalName: amqp
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loaders
  labels:
    app: loaders
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loaders
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
  template:
    metadata:
      labels:
        app: loaders
    spec:
      containers:
      - name: loaders
        image: swh-loaders:latest
        imagePullPolicy: Always
        command:
        - /entrypoint.sh
        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "4000Mi"
            cpu: "1200m"
        lifecycle:
          preStop:
            exec:
              command: ["kill", "1"]
        env:
        - name: CONCURRENCY
          value: "1"
        - name: MAX_TASKS_PER_CHILD
          value: "5"
        - name: LOGLEVEL
          value: "INFO"
        - name: SWH_CONFIG_FILENAME
          value: /etc/softwareheritage/config.yml
        volumeMounts:
        - name: config
          mountPath: /etc/softwareheritage/config.yml
          subPath: config.yml
          readOnly: true
        - name: config
          mountPath: /entrypoint.sh
          subPath: entrypoint.sh
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: loaders
          defaultMode: 0777
$ helm install --dry-run stuff ./worker
...
```
Related to T3592 29 September 2021, 15:48:19 UTC
35bcef6 helm: Install parametrized services Dry run install check:
```
$ helm install --dry-run stuff ./worker
NAME: stuff
LAST DEPLOYED: Thu Sep 23 16:55:25 2021
NAMESPACE: default
STATUS: pending-install
REVISION: 1
TEST SUITE: None
HOOKS:
MANIFEST:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: loaders
data:
  config.yml: |
    storage:
      cls: pipeline
      steps:
      - cls: buffer
        min_batch_size:
          content: 10000
          content_bytes: 104857600
          directory: 10000
          revision: 10000
      - cls: filter
      - cls: retry
      - cls: remote
        url: http://storage:5002/
    celery:
      task_broker: amqp://guest:guest@amqp//
      task_queues:
      - swh.loader.git.tasks.UpdateGitRepository
  entrypoint.sh: |
    #!/bin/bash
    set -e
    echo Starting the swh Celery worker
    exec python -m celery \
        --app=swh.scheduler.celery_backend.config.app \
        worker \
        --pool=prefork --events \
        --concurrency=${CONCURRENCY} \
        --max-tasks-per-child=${MAX_TASKS_PER_CHILD} \
        -Ofair --loglevel=${LOGLEVEL} \
        --hostname "${HOSTNAME}"
---
apiVersion: v1
kind: Service
metadata:
  name: storage
spec:
  type: ExternalName
  externalName: swh-storage
---
apiVersion: v1
kind: Service
metadata:
  name: amqp
spec:
  type: ExternalName
  externalName: amqp
---
...
```
Related to T3592 29 September 2021, 15:48:19 UTC
ca46647 Declare templatized config-map Dry run install check:
```
$ helm install --dry-run stuff ./worker
NAME: stuff
LAST DEPLOYED: Thu Sep 23 16:45:40 2021
NAMESPACE: default
STATUS: pending-install
REVISION: 1
TEST SUITE: None
HOOKS:
MANIFEST:
---
...
```
Related to T3592 29 September 2021, 15:48:19 UTC
3a2cef8 Drop unneeded files for now 29 September 2021, 15:48:18 UTC
ea9cff9 Bootstrap helm chart ``` $ helm create worker ``` Related to T3592 29 September 2021, 15:48:18 UTC
d8af145 check_consistency.py: Add support for releases 28 September 2021, 09:18:24 UTC
4dbfdc9 vault_repro.py: update to latest CLI 22 September 2021, 09:52:49 UTC
c9deabf add check_consistency.py 22 September 2021, 09:52:24 UTC
41c771d grid5000/cassandra: add statsd metrics dashboards and change the unit to ops Related to T3357 17 September 2021, 11:57:08 UTC
7ec3d29 grid5000/cassandra: allow testing scheduler db persistence locally Related to T3577 17 September 2021, 11:55:18 UTC
d4b4d44 grid5000/cassandra: allow to instantiate a best_effort node without an exclusion list Related to T3357 17 September 2021, 11:53:21 UTC
118ba1a grid5000/cassandra: declare a server type with only a big zfs dataset The goal is to prepare the persistence of the scheduler database across the g5k reservations Related to T3577 15 September 2021, 16:18:56 UTC
5002184 Compute the last swhid of an origin Related to T3192 08 September 2021, 14:02:30 UTC
30b06cc grid5000/cassandra: fix statsd configuration of gunicorn services Related to T3357 01 September 2021, 08:32:54 UTC
4e34b32 grid5000/cassandra: add the reservation date on the environment config file Related to T3357 30 August 2021, 10:49:45 UTC
0594610 grid5000/cassandra: Add a missing filter on the cluster on the cassandra dashboards Related to T3465 30 August 2021, 10:48:27 UTC
7a754f6 grid5000/cassandra: Add more nodes on the second datacenter Related to T3465 27 August 2021, 16:15:24 UTC
237c76e grid5000/cassandra: fix zfs configuration when only one dataset is used Related to T3465 27 August 2021, 12:25:37 UTC
79906e5 grid5000/cassandra: fix besteffort nodes deployment Related to T3357 27 August 2021, 12:24:54 UTC
4a4eaea grid5000/cassandra: Adapt the script to support a multidc deployment Related to T3465 26 August 2021, 10:46:01 UTC
f4c8abe grid5000/cassandra: add a script to refresh the besteffort node list Related to T3357 18 August 2021, 08:30:23 UTC
db9574d grid5000/cassandra: declare the best effort nodes only when they are fully installed Related to T3357 18 August 2021, 08:30:22 UTC
19515af grid5000/cassandra: count best_effort jobs in waiting/launching state Related to T3357 18 August 2021, 08:30:22 UTC
a31433b grid5000/cassandra: replay extid topic Related to T3357 17 August 2021, 15:26:35 UTC
35813e5 grid5000/cassandra: adapt the number of replayers reduce the number of replayers for the topics completely consumed Related to T3357 17 August 2021, 15:02:35 UTC
33178f4 grid5000/cassandra: adapt number of consumers Reduce the number of consumers for the topics already replayed Related to T3357 16 August 2021, 10:23:50 UTC
41e8ee2 grid5000/cassandra: increase message size limit to allow revision replaying Related to T3357 16 August 2021, 10:22:33 UTC
a98c1f6 grid5000/cassandra: improve cassandra monitoring To diagnose IO issues on one node: - Add disk space on system dashboard - Add the sst table count per node and per node and table on the cassandra dashboard Related to T3357 13 August 2021, 10:39:33 UTC
f1b4f18 vault_repro.py: Better formatting 13 August 2021, 07:54:42 UTC
1385b62 vault_repro.py: Fix git-clone command 13 August 2021, 07:54:42 UTC
3913245 vault_repro.py: Add support for packed refs 13 August 2021, 07:54:42 UTC
c0a25df kibana_fetch_logs: Update and make it work again It's wrongly named as it really just queries elasticsearch but it's fine for now. Related to T3468 06 August 2021, 15:21:19 UTC
f306dfd sentry/sentry: Fix indentation 06 August 2021, 15:21:19 UTC
326a9e3 grid5000/cassandra: save the temporary scylla configuration Related to T3357 06 August 2021, 15:17:46 UTC
78b96c8 grid5000/cassandra: add dashboards dedicated to scylla generated from https://github.com/scylladb/scylla-monitoring/ with the command `./generate-dashboards.sh -v 4.4` Related to T3357 06 August 2021, 14:35:35 UTC
52b4eea grid5000/cassandra: add a dashboard to monitor the concurrent client connections Related to T3357 06 August 2021, 14:35:35 UTC
26f5d65 Improve correctness of sourceforge-ls Summary: Before this patch, these useful tools did not ignore duplicated projects in the `/projects/` namespace, and did not account for non-`/p/` namespaces or subprojects, as rare as they might both be. Reviewers: #reviewers, ardumont Reviewed By: #reviewers, ardumont Maniphest Tasks: T735 Differential Revision: https://forge.softwareheritage.org/D5294 05 August 2021, 15:20:47 UTC
34ceae3 Adapt schedule script to allow bitbucket mercurial ingestion Related to T3455 05 August 2021, 10:27:57 UTC
a96df2f grid5000/cassandra: add a script to clean up the zfs pools to avoid side effects when the directory mapping is changed Related to T3357 03 August 2021, 12:28:27 UTC
c56684a grid5000/cassandra: quick and dirty script to manage best effort nodes Related to T3357 03 August 2021, 12:26:20 UTC
1e05681 grid5000/cassandra: extend the prometheus data retention Related to T3357 02 August 2021, 21:18:05 UTC
7801b4d grid5000/cassandra: remove a pre-configured debian repo with an expired key It breaks the apt upgrade performed by ansible Related to T3357 02 August 2021, 21:14:18 UTC
7aab9f0 Add vault_repro.py, a script to test round-tripping between the git loader and the vault 02 August 2021, 08:09:12 UTC
32a4440 summer_planning.py: don't mark a week as 'partial' when vacation starts on saturday 15 July 2021, 09:25:19 UTC
69f9de3 summer_planning.py: fix work-week computation (it's 5 days, not 6 or 1...) 15 July 2021, 09:23:37 UTC
3de7e11 summer_planning.py: fix false positives (like 'remote work' events) 15 July 2021, 09:19:40 UTC
0687a41 summer_planning.py: clarify what the dates are 15 July 2021, 08:53:31 UTC
3e67400 summer_planning.py: fix off-by-one error on week ends 15 July 2021, 08:47:48 UTC
97fbeaf Add summer_planning.py to visualize the summer planning 15 July 2021, 08:44:57 UTC
466a651 Increase the max commit log segment size to allow the import of big revisions 09 July 2021, 14:10:35 UTC
f4ac822 grid5000/cassandra: avoid oversize mutations on revisions and add a graph on the cassandra dashboard to monitor the behavior Related to T3357 08 July 2021, 11:57:06 UTC
5a50b5d grid5000/cassandra: Restart the monitoring containers and configure the service to run on a vms Related to T3357 07 July 2021, 17:31:16 UTC
bb9854f grid5000/cassandra: small updates on grafana dashboards Related to T3357 07 July 2021, 17:31:03 UTC
a397ec1 grid5000/cassandra: Use a more secure consistency level Related to T3357 06 July 2021, 17:47:39 UTC
68cf74d grid5000/cassandra: open unauthenticated access to cassandra Related to T3357 06 July 2021, 10:37:50 UTC
ae799c5 grid5000/cassandra: restore the initial number of replayers per object type Related to T3357 06 July 2021, 10:16:35 UTC
63cb436 grid5000/cassandra: Add a statistics dashboard It will be used to directly compute the result of a benchmark Related to T3357 06 July 2021, 10:03:12 UTC
8b61753 grid5000/cassandra: increase commit log size Related to T3357 06 July 2021, 10:02:22 UTC
d74a14b grid5000/cassandra: add annotations on graphs Related to T3357 05 July 2021, 13:26:18 UTC
80d3f83 grid5000/cassandra: add read/write statistics on the system dashboard Related to T3357 05 July 2021, 12:55:07 UTC
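
Commit e1d8090 lists "freeze the GC before forking" among its performance improvements. The script itself is not shown in this log, so the following is only a minimal sketch of that general technique, assuming a POSIX system and Python 3.7 or later. The names `fork_workers`, `jobs` and `handle` are hypothetical and do not come from analyze_consistency_failures.py.

```python
import gc
import os


def fork_workers(jobs, handle):
    """Run handle(job) in one forked child per job; return the exit codes."""
    gc.disable()  # no collection while the parent prepares to fork
    gc.freeze()   # move all tracked objects to the permanent generation
    pids = []
    try:
        for job in jobs:
            pid = os.fork()
            if pid == 0:
                # Child: collections now ignore the frozen objects, so the
                # memory pages shared with the parent are less likely to be
                # touched by the collector and copied.
                gc.enable()
                code = 1
                try:
                    handle(job)
                    code = 0
                finally:
                    # Always leave the child here, even if handle() raised.
                    os._exit(code)
            pids.append(pid)
    finally:
        # Parent only: restore normal collection (assumes the GC was enabled).
        gc.unfreeze()
        gc.enable()
    return [os.WEXITSTATUS(os.waitpid(pid, 0)[1]) for pid in pids]
```

Calling `fork_workers(origins, process_origin)` with a hypothetical `process_origin` function would run one child per origin. Disabling the collector, freezing right before `fork()` and re-enabling it in the child follows the order suggested in the Python `gc` module documentation.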