05f46d6 | Vincent SELLIER | 29 September 2020, 13:41:55 UTC | Improve facts import - Add an initialization script - Import the puppet environment - Add a flag to record that the server was created by an initial import | 29 September 2020, 13:41:55 UTC |
f14410a | Vincent SELLIER | 25 September 2020, 16:08:50 UTC | netbox - import interfaces and ip addresses | 25 September 2020, 16:08:50 UTC |
8cc97ea | Vincent SELLIER | 24 September 2020, 17:24:14 UTC | initial commit of the netbox importer | 24 September 2020, 17:24:14 UTC |
c29d2cd | Valentin Lorentz | 02 September 2020, 09:21:02 UTC | reproduce-tarball: catch failure of 'disarchive save' earlier. | 02 September 2020, 09:21:02 UTC |
59eecb9 | Valentin Lorentz | 02 September 2020, 09:20:35 UTC | reproduce-tarball: black | 02 September 2020, 09:20:35 UTC |
5412c4b | Valentin Lorentz | 02 September 2020, 09:19:54 UTC | reproduce-tarball: add --catch-errors, to continue running if there is an error. This is because disarchive may fail to save some tarballs, and I want to get stats on that. | 02 September 2020, 09:19:54 UTC |
68288cb | Valentin Lorentz | 02 September 2020, 09:19:03 UTC | reproduce-tarball: add --pattern, to expand a pattern of files internally. With the archive I work with, the list of files is too large for the shell, so it errors if I let zsh expand the list. | 02 September 2020, 09:19:03 UTC |
f26d205 | Valentin Lorentz | 28 August 2020, 20:10:59 UTC | reproduce-tarball: add support for disarchive, as an alternative to pristine-tar. | 28 August 2020, 20:10:59 UTC |
22fe8e9 | Valentin Lorentz | 28 August 2020, 19:43:20 UTC | reproduce-tarball: fixes to work with the current versions of swh-loader-core/swh-model/swh-storage. | 28 August 2020, 19:43:20 UTC |
8dc4b03 | Antoine R. Dumont (@ardumont) | 08 August 2020, 18:14:34 UTC | group_by_extension: Sort output in descending order. Also fall back to keying on basename when no extension is detected (for crates, for example...) | 08 August 2020, 18:14:34 UTC |
8175175 | Antoine R. Dumont (@ardumont) | 08 August 2020, 18:08:01 UTC | Add group_by_extension helper utility | 08 August 2020, 18:08:01 UTC |
e9a26ad | Antoine R. Dumont (@ardumont) | 08 August 2020, 18:06:27 UTC | ardumont/schedule_csv_partition.py: Related to scheduling of new indexer tasks | 08 August 2020, 18:06:27 UTC |
1558892 | Antoine R. Dumont (@ardumont) | 06 August 2020, 15:16:24 UTC | Adapt and rename cli to generate partition tasks | 06 August 2020, 15:21:00 UTC |
e7bb94e | Valentin Lorentz | 06 August 2020, 12:03:58 UTC | reproduce-tarball: fail early if the file is not an archive. | 06 August 2020, 13:49:06 UTC |
d49f000 | Valentin Lorentz | 06 August 2020, 12:02:54 UTC | reproduce-tarball: Use --lax-guess option of pristine-zip. | 06 August 2020, 12:03:37 UTC |
0800b67 | Valentin Lorentz | 06 August 2020, 09:54:22 UTC | reproduce-tarball: Show stats on the size of the delta. | 06 August 2020, 12:03:23 UTC |
d3760dc | Valentin Lorentz | 05 August 2020, 18:02:46 UTC | reproduce-tarball: add support for pristine-zip | 05 August 2020, 18:02:46 UTC |
e59e04e | Valentin Lorentz | 05 August 2020, 18:01:48 UTC | reproduce-tarball: don't count loading failures as irreproducibility | 05 August 2020, 18:02:26 UTC |
4ef44fc | Valentin Lorentz | 05 August 2020, 18:00:58 UTC | reproduce-tarball: add 'checkout' command, useful for debugging | 05 August 2020, 18:00:58 UTC |
a29d80c | Valentin Lorentz | 05 August 2020, 18:00:31 UTC | reproduce-tarball: add --fail-early option | 05 August 2020, 18:00:31 UTC |
22bc8aa | Valentin Lorentz | 04 August 2020, 10:54:16 UTC | reproduce-tarball: fix typo | 04 August 2020, 10:54:16 UTC |
02aaf68 | Valentin Lorentz | 04 August 2020, 10:09:41 UTC | reproduce-tarball: improve readability | 04 August 2020, 10:09:41 UTC |
5be0a79 | Valentin Lorentz | 04 August 2020, 10:06:04 UTC | reproduce-tarball: get rid of tarball headers, fully use pristine-tar. Now it works properly on all tarballs I tried. | 04 August 2020, 10:06:04 UTC |
3a8baa2 | Valentin Lorentz | 04 August 2020, 09:08:44 UTC | reproduce-tarball: attempt at reading and writing tarballs directly, without tarfile. | 04 August 2020, 09:08:44 UTC |
0b7af74 | Valentin Lorentz | 03 August 2020, 17:06:25 UTC | reproduce-tarball: attempt at using pristine-gz. | 03 August 2020, 17:06:25 UTC |
b8c348a | Valentin Lorentz | 03 August 2020, 16:18:54 UTC | reproduce-tarball: don't store files, it was the loader's job. | 03 August 2020, 16:18:54 UTC |
f46bc14 | Valentin Lorentz | 03 August 2020, 16:18:35 UTC | reproduce-tarball: pass magic number in metadata. | 03 August 2020, 16:18:35 UTC |
dfb9edd | Valentin Lorentz | 03 August 2020, 15:00:13 UTC | reproduce-tarball: add warning about sparse files | 03 August 2020, 15:00:13 UTC |
6addfa2 | Valentin Lorentz | 03 August 2020, 14:49:46 UTC | reproduce-tarball: fix --diffoscope | 03 August 2020, 14:49:46 UTC |
74fe8a9 | Valentin Lorentz | 03 August 2020, 14:39:13 UTC | reproduce-tarball: add 'many' subcommand | 03 August 2020, 14:39:13 UTC |
9a52fe1 | Valentin Lorentz | 03 August 2020, 14:37:03 UTC | reproduce-tarball: Switch to click | 03 August 2020, 14:38:59 UTC |
10ade7a | Valentin Lorentz | 03 August 2020, 14:30:19 UTC | reproduce-tarball: Move file comparison to its own function. | 03 August 2020, 14:30:19 UTC |
d80ab8c | Valentin Lorentz | 03 August 2020, 14:28:07 UTC | reproduce-tarball: Always compare before running diffoscope, to get a useful display even if it's missing. | 03 August 2020, 14:28:07 UTC |
e407bd8 | Valentin Lorentz | 03 August 2020, 14:25:32 UTC | reproduce-tarball: Make the target file arg optional, run diffoscope if it's not given. | 03 August 2020, 14:25:32 UTC |
2326c10 | Valentin Lorentz | 03 August 2020, 14:21:10 UTC | reproduce-tarball: add BufPreservingTarInfo, which doesn't decode and reencode the buffer. | 03 August 2020, 14:21:10 UTC |
89548c7 | Valentin Lorentz | 03 August 2020, 14:17:36 UTC | reproduce-tarball: break on success. | 03 August 2020, 14:17:36 UTC |
a4d4ab3 | Valentin Lorentz | 03 August 2020, 13:47:14 UTC | Start writing reproduce-tarball.py. Naive implementation: it can almost reproduce a tarball, but not all bytes match. | 03 August 2020, 13:47:14 UTC |
b8c98ac | Antoine R. Dumont (@ardumont) | 14 April 2020, 08:40:16 UTC | swh-team: Update shebang | 14 April 2020, 08:40:16 UTC |
37434e3 | Antoine R. Dumont (@ardumont) | 03 April 2020, 16:57:22 UTC | analyse_hash_collision: Deal with new exception format. Related to T2332 | 03 April 2020, 17:01:13 UTC |
f3f84aa | Antoine R. Dumont (@ardumont) | 03 April 2020, 13:07:04 UTC | analyse_hash_collision: Format | 03 April 2020, 13:07:04 UTC |
883a999 | Antoine R. Dumont (@ardumont) | 03 April 2020, 13:06:58 UTC | Ignore archives/ folder | 03 April 2020, 13:06:58 UTC |
170a3d9 | Antoine R. Dumont (@ardumont) | 24 March 2020, 13:56:12 UTC | Allow comparison of ctime/date reported by sentry. Related to T2332#42793 | 24 March 2020, 13:56:12 UTC |
2ec39dd | Antoine R. Dumont (@ardumont) | 24 March 2020, 11:48:10 UTC | sentry: Distinguish between real and falsy collisions | 24 March 2020, 11:48:10 UTC |
369c999 | Antoine R. Dumont (@ardumont) | 23 March 2020, 09:06:49 UTC | sentry/analyse_hash_collision: Type contents + add a diff entry. This makes it easier to see the difference and to distinguish real collisions from "falsy" ones: - real: in the "difference" entry, at least 2 algos are in collision (most likely 3 out of 4) - "falsy": in the "difference" entry, only 1 algo is in collision, and the diff is an extra "25" which inserts itself somewhere in the hash | 23 March 2020, 09:06:49 UTC |
ba2a9b7 | Antoine R. Dumont (@ardumont) | 20 March 2020, 14:54:59 UTC | analyse_hash_collision: Add sample cli use | 20 March 2020, 14:54:59 UTC |
24905d9 | Antoine R. Dumont (@ardumont) | 20 March 2020, 14:51:13 UTC | sentry: Add copyright header | 20 March 2020, 14:53:27 UTC |
ff2230a | Antoine R. Dumont (@ardumont) | 20 March 2020, 13:46:08 UTC | sentry: Adds deps on storage | 20 March 2020, 14:53:27 UTC |
fc80c03 | Antoine R. Dumont (@ardumont) | 19 March 2020, 16:20:43 UTC | Analyse hash collisions sample | 20 March 2020, 14:53:27 UTC |
d3b2061 | Antoine R. Dumont (@ardumont) | 19 March 2020, 14:30:24 UTC | sentry: Resolve the pagination when needed | 19 March 2020, 14:30:24 UTC |
4eec410 | Antoine R. Dumont (@ardumont) | 19 March 2020, 14:10:33 UTC | sentry: Add events per issue listing | 19 March 2020, 14:10:33 UTC |
1742303 | Antoine R. Dumont (@ardumont) | 19 March 2020, 13:16:40 UTC | sentry: Add detailed issue cli entrypoint | 19 March 2020, 13:47:57 UTC |
2b305c0 | Antoine R. Dumont (@ardumont) | 19 March 2020, 13:10:29 UTC | sentry: Reduce the output on issue listing output | 19 March 2020, 13:10:29 UTC |
efa4688 | Antoine R. Dumont (@ardumont) | 19 March 2020, 12:51:31 UTC | sentry: Add a basic readme | 19 March 2020, 12:51:31 UTC |
21db7db | Antoine R. Dumont (@ardumont) | 19 March 2020, 12:45:16 UTC | sentry: Add list issues entry point | 19 March 2020, 12:45:16 UTC |
48fcb3f | Antoine R. Dumont (@ardumont) | 19 March 2020, 12:45:06 UTC | snippets/sentry: Update dependencies | 19 March 2020, 12:45:06 UTC |
d696012 | Antoine R. Dumont (@ardumont) | 19 March 2020, 09:51:38 UTC | sentry: Make the cli use subcommand and add docstring | 19 March 2020, 09:53:34 UTC |
64a7d27 | Antoine R. Dumont (@ardumont) | 19 March 2020, 09:22:22 UTC | Add basic sentry script to start listing issues from the cli | 19 March 2020, 09:30:03 UTC |
acf8045 | Valentin Lorentz | 03 March 2020, 13:50:39 UTC | Add script and instructions to generate the objstorage replay exclusion file. | 03 March 2020, 13:50:39 UTC |
60fd51e | Valentin Lorentz | 08 January 2020, 15:39:16 UTC | Log only every 60 seconds. | 08 January 2020, 15:39:16 UTC |
c96448d | Valentin Lorentz | 08 January 2020, 14:44:00 UTC | Update for new schema, with revision parents in their own table. | 08 January 2020, 15:17:28 UTC |
b491acd | Valentin Lorentz | 08 January 2020, 14:39:37 UTC | Connect to azure cluster | 08 January 2020, 14:39:44 UTC |
4e59433 | Valentin Lorentz | 08 January 2020, 14:39:13 UTC | Adapt for new signature of get_storage. | 08 January 2020, 14:39:13 UTC |
effc91b | Valentin Lorentz | 08 November 2019, 12:07:45 UTC | Export all tables and compress outputs. | 08 November 2019, 12:07:45 UTC |
5e3c13c | Valentin Lorentz | 08 November 2019, 12:07:26 UTC | Fetch larger pages. Better perfs | 08 November 2019, 12:07:26 UTC |
851dc76 | Valentin Lorentz | 07 November 2019, 17:10:22 UTC | Add support for all tables. | 07 November 2019, 17:10:22 UTC |
f8f92c2 | Valentin Lorentz | 07 November 2019, 15:47:39 UTC | Make cassandra_stream_graph.py parallelizable. | 07 November 2019, 15:47:39 UTC |
669dcde | Valentin Lorentz | 07 November 2019, 14:33:40 UTC | Add cassandra_stream_graph.py to export the graph from Cassandra to CSV | 07 November 2019, 14:33:40 UTC |
383a899 | Stefano Zacchiroli | 29 October 2019, 16:32:46 UTC | git2blobs: add support for tarring (+ deleting) object storage. This is important to avoid running out of inodes on FSs that haven't been tuned for Software Heritage-like workloads. As part of this, switch CLI parsing to click, as we now have more complicated usage patterns. | 29 October 2019, 16:32:46 UTC |
0fc3a54 | Stefano Zacchiroli | 29 October 2019, 16:31:37 UTC | git2blobs: avoid re-writing objects over and over again. This optimization is only relevant when the same target object storage dir is used to process different repositories, but still. | 29 October 2019, 16:31:37 UTC |
51714bb | Stefano Zacchiroli | 29 October 2019, 16:02:35 UTC | add GPL as default license, as one was missing. Individual snippets can override the license in their headers. | 29 October 2019, 16:02:35 UTC |
a4bb8b7 | Stefano Zacchiroli | 27 October 2019, 17:08:03 UTC | git ignore mypy cache dir | 27 October 2019, 17:08:03 UTC |
923c8f6 | Stefano Zacchiroli | 27 October 2019, 17:07:09 UTC | git2filenames: new script to extract <sha1, filename> blob maps from git | 27 October 2019, 17:07:09 UTC |
e96958c | Stefano Zacchiroli | 27 October 2019, 17:07:01 UTC | git2blobs.py: add docstring | 27 October 2019, 17:07:01 UTC |
c059c9b | Stefano Zacchiroli | 27 October 2019, 13:26:03 UTC | git2blobs: new script to extract blobs from git repos | 27 October 2019, 13:26:03 UTC |
7e370dd | Antoine R. Dumont (@ardumont) | 05 October 2019, 16:55:16 UTC | gnu.analysis: Bootstrap some code to analyze the gnu tree dataset. This is WIP. Related to D2076 | 07 October 2019, 12:42:11 UTC |
b6980b1 | Antoine R. Dumont (@ardumont) | 03 October 2019, 19:37:24 UTC | cran/analysis: Fix conditionals. Related to T2026#37637 | 03 October 2019, 19:37:24 UTC |
0c3f015 | Antoine R. Dumont (@ardumont) | 03 October 2019, 19:29:40 UTC | cran/analysis: Add author/maintainer pattern analysis. Related to T2026#37637 | 03 October 2019, 19:30:30 UTC |
47dae37 | Antoine R. Dumont (@ardumont) | 03 October 2019, 17:54:22 UTC | ardumont/cran/analysis: Add some tools to analyse cran dataset. Related to T2026#37637 | 03 October 2019, 17:54:22 UTC |
9c03bef | Antoine R. Dumont (@ardumont) | 30 September 2019, 17:16:59 UTC | ardumont/kibana_fetch_logs: Improve fetching logs routine | 30 September 2019, 17:16:59 UTC |
c75e96e | Antoine R. Dumont (@ardumont) | 30 September 2019, 14:12:09 UTC | Add snippet to allow synchronous listing of a github user's repos. Use case: ``` for r in $(python3 bin/list_repos.py --user SocialGouv); do python3 -m swh.loader.git.loader --origin-url $r; done ``` | 30 September 2019, 14:12:09 UTC |
2e3efce | Stefano Zacchiroli | 27 September 2019, 11:16:07 UTC | graph export: better handle snp/rel pointing to various object types | 27 September 2019, 11:16:07 UTC |
526bd01 | Stefano Zacchiroli | 26 August 2019, 09:24:00 UTC | SQL graph export: actually use a fifo, rather than a regular file (oops) | 26 August 2019, 09:24:00 UTC |
3e4f5ff | Stefano Zacchiroli | 26 August 2019, 08:33:36 UTC | SQL graph export: remove ORDER BY to be more memory savvy. This change makes the export of a swh-storage containing the Linux kernel doable on an ordinary laptop. | 26 August 2019, 08:33:36 UTC |
4e612e8 | Stefano Zacchiroli | 25 August 2019, 16:20:09 UTC | SQL snippet: add export of the archive as a graph for development use only | 25 August 2019, 16:20:09 UTC |
f7c08d3 | Antoine Pietri | 22 July 2019, 15:51:15 UTC | seirl: parquet upload: fix script and schema | 22 July 2019, 15:51:26 UTC |
0e8cf90 | Stefano Zacchiroli | 11 July 2019, 14:29:30 UTC | add code of conduct document | 11 July 2019, 14:29:30 UTC |
d524262 | Stefano Zacchiroli | 04 July 2019, 12:41:55 UTC | check-contributors: add missing team members | 04 July 2019, 12:41:55 UTC |
1268eca | Stefano Zacchiroli | 04 July 2019, 12:29:29 UTC | check-contributors: make the CLI more useful. In particular, it now supports both running on the current git working directory (the default) and specifying a list of git working directories to analyze one after the other. | 04 July 2019, 12:29:29 UTC |
288055f | Stefano Zacchiroli | 28 June 2019, 08:15:43 UTC | check-contributors: new script to check CONTRIBUTORS completeness | 28 June 2019, 08:15:43 UTC |
e51ca4c | François Tigeot | 19 April 2019, 13:32:52 UTC | Grafanalib dashboards: Add disk statistics | 19 April 2019, 13:32:52 UTC |
cde445d | François Tigeot | 15 April 2019, 09:17:21 UTC | Grafanalib dashboards: Add swap and network traffic graphs | 15 April 2019, 09:17:55 UTC |
ce7403f | François Tigeot | 10 April 2019, 14:26:21 UTC | Snippets: add Grafanalib dashboards | 10 April 2019, 14:26:21 UTC |
789a301 | Stefano Zacchiroli | 25 March 2019, 09:28:32 UTC | swh-monthly-report: filter on committer date to avoid stopping prematurely on out-of-order commits | 28 March 2019, 21:01:54 UTC |
22f4ce1 | Stefano Zacchiroli | 22 March 2019, 14:59:34 UTC | swh-monthly-report: helper script to draft monthly activity team reports | 28 March 2019, 21:01:54 UTC |
b1f0b7b | Stefano Zacchiroli | 28 March 2019, 20:58:17 UTC | swh-weekly-report: preserve iterators and port to current Phabricator | 28 March 2019, 20:58:17 UTC |
9e6ad5f | Stefano Zacchiroli | 25 March 2019, 09:27:54 UTC | swh-weekly-report: filter on committer date to avoid stopping prematurely on out-of-order commits | 25 March 2019, 09:27:54 UTC |
b5db881 | Stefano Zacchiroli | 22 March 2019, 14:56:18 UTC | swhphab.py: include status when printing task summaries | 22 March 2019, 14:56:18 UTC |
bd77aa2 | Stefano Zacchiroli | 22 March 2019, 14:55:55 UTC | swhphab.py: do not crash when printing summary of repo-less diffs | 22 March 2019, 14:55:55 UTC |
f37fb0a | Stefano Zacchiroli | 22 March 2019, 09:33:34 UTC | swh-weekly-report: further refactoring/clean-up against swhphab.py | 22 March 2019, 09:33:34 UTC |
98373da | Stefano Zacchiroli | 22 March 2019, 09:22:14 UTC | swh-weekly-report: split generic code to swhphab.py | 22 March 2019, 09:22:14 UTC |