swh:1:snp:4c7110a28ae1b1b8c7e6881001ec9dfd75200920

Revision Message Commit Date
cf77331 Upgrade counters architecture to handle the historical data management 26 March 2021, 11:21:53 UTC
7d2c47c migrate_users_to_keycloak: Slightly adapt default fields. Related to T3166 23 March 2021, 14:10:53 UTC
5162e8e keycloak: Install users to keycloak instance Related to T2858 18 March 2021, 17:21:37 UTC
194a686 Add counters schemas sources 16 February 2021, 16:31:25 UTC
7718a0c Support unreadable messages. The error should be avoidable by increasing the max message length 05 February 2021, 08:20:24 UTC
931603f add a script to convert es indexes fields 05 February 2021, 08:20:24 UTC
f75b2e7 sql/blob-size-stats: scripts to compute basic stats about archive blob sizes 30 January 2021, 18:42:32 UTC
0fe3238 counters: batch redis calls 27 January 2021, 14:37:55 UTC
e107614 counters: add local counter to follow the message count 27 January 2021, 14:37:42 UTC
c5aa5b9 POC for counters with redis hyperloglog 25 January 2021, 11:42:31 UTC
3827da1 sf-ls-projects: drop obsolete commented out code 22 October 2020, 12:10:10 UTC
422bfaa add prototype SourceForge lister Related T735 22 October 2020, 11:12:20 UTC
05f46d6 Improve facts import: add an initialization script; import the puppet environment; add a flag recording that the server was created by an initial import 29 September 2020, 13:41:55 UTC
f14410a netbox - import interfaces and ip addresses 25 September 2020, 16:08:50 UTC
8cc97ea initial commit of the netbox importer 24 September 2020, 17:24:14 UTC
c29d2cd reproduce-tarball: catch failure of 'disarchive save' earlier. 02 September 2020, 09:21:02 UTC
59eecb9 reproduce-tarball: black 02 September 2020, 09:20:35 UTC
5412c4b reproduce-tarball: add --catch-errors, to continue running after an error, because disarchive may fail to save some tarballs and I want to get stats on that. 02 September 2020, 09:19:54 UTC
68288cb reproduce-tarball: add --pattern, to expand a pattern of files internally. With the archive I work with, the list of files is too large for the shell, so it errors if I let zsh expand the list. 02 September 2020, 09:19:03 UTC
f26d205 reproduce-tarball: add support for disarchive, as an alternative to pristine-tar. 28 August 2020, 20:10:59 UTC
22fe8e9 reproduce-tarball: fixes to work with the current versions of swh-loader-core/swh-model/swh-storage. 28 August 2020, 19:43:20 UTC
8dc4b03 group_by_extension: Sort output in descending order. Also falls back to keying on basename when no extension is detected (for crates, for example) 08 August 2020, 18:14:34 UTC
8175175 Add group_by_extension helper utility 08 August 2020, 18:08:01 UTC
e9a26ad ardumont/schedule_csv_partition.py Related to scheduling of new indexer tasks 08 August 2020, 18:06:27 UTC
1558892 Adapt and rename cli to generate partition tasks 06 August 2020, 15:21:00 UTC
e7bb94e reproduce-tarball: fail early if the file is not an archive. 06 August 2020, 13:49:06 UTC
d49f000 reproduce-tarball: Use --lax-guess option of pristine-zip. 06 August 2020, 12:03:37 UTC
0800b67 reproduce-tarball: Show stats on the size of the delta. 06 August 2020, 12:03:23 UTC
d3760dc reproduce-tarball: add support for pristine-zip 05 August 2020, 18:02:46 UTC
e59e04e reproduce-tarball: don't count loading failures as irreproducibility 05 August 2020, 18:02:26 UTC
4ef44fc reproduce-tarball: add 'checkout' command, useful for debugging 05 August 2020, 18:00:58 UTC
a29d80c reproduce-tarball: add --fail-early option 05 August 2020, 18:00:31 UTC
22bc8aa reproduce-tarball: fix typo 04 August 2020, 10:54:16 UTC
02aaf68 reproduce-tarball: improve readability 04 August 2020, 10:09:41 UTC
5be0a79 reproduce-tarball: get rid of tarball headers, fully use pristine-tar. Now it works properly on all tarballs I tried. 04 August 2020, 10:06:04 UTC
3a8baa2 reproduce-tarball: attempt at reading and writing tarballs directly, without tarfile. 04 August 2020, 09:08:44 UTC
0b7af74 reproduce-tarball: attempt at using pristine-gz. 03 August 2020, 17:06:25 UTC
b8c348a reproduce-tarball: don't store files; that was the loader's job. 03 August 2020, 16:18:54 UTC
f46bc14 reproduce-tarball: pass magic number in metadata. 03 August 2020, 16:18:35 UTC
dfb9edd reproduce-tarball: add warning about sparse files 03 August 2020, 15:00:13 UTC
6addfa2 reproduce-tarball: fix --diffoscope 03 August 2020, 14:49:46 UTC
74fe8a9 reproduce-tarball: add 'many' subcommand 03 August 2020, 14:39:13 UTC
9a52fe1 reproduce-tarball: Switch to click 03 August 2020, 14:38:59 UTC
10ade7a reproduce-tarball: Move file comparison to its own function. 03 August 2020, 14:30:19 UTC
d80ab8c reproduce-tarball: Always compare before running diffoscope, to get a useful display even if diffoscope is missing. 03 August 2020, 14:28:07 UTC
e407bd8 reproduce-tarball: Make the target file arg optional, run diffoscope if it's not given. 03 August 2020, 14:25:32 UTC
2326c10 reproduce-tarball: add BufPreservingTarInfo, which doesn't decode and re-encode the buffer. 03 August 2020, 14:21:10 UTC
89548c7 reproduce-tarball: break on success. 03 August 2020, 14:17:36 UTC
a4d4ab3 Start writing reproduce-tarball.py Naive implementation, can almost reproduce a tarball, but not all bytes match. 03 August 2020, 13:47:14 UTC
b8c98ac swh-team: Update shebang 14 April 2020, 08:40:16 UTC
37434e3 analyse_hash_collision: Deal with new exception format Related to T2332 03 April 2020, 17:01:13 UTC
f3f84aa analyse_hash_collision: Format 03 April 2020, 13:07:04 UTC
883a999 Ignore archives/ folder 03 April 2020, 13:06:58 UTC
170a3d9 Allow ctime/date reported by sentry comparison Related to T2332#42793 24 March 2020, 13:56:12 UTC
2ec39dd sentry: Distinguish between real and falsy collisions 24 March 2020, 11:48:10 UTC
369c999 sentry/analyse_hash_collision: Type contents + add a diff entry. This makes it easier to see the difference and to distinguish real collisions from "falsy" ones: real: in the "difference" entry, at least 2 algorithms collide (most likely 3 out of 4); "falsy": in the "difference" entry, only 1 algorithm collides, and the diff is an extra "25" inserted somewhere in the hash 23 March 2020, 09:06:49 UTC
ba2a9b7 analyse_hash_collision: Add sample cli use 20 March 2020, 14:54:59 UTC
24905d9 sentry: Add copyright header 20 March 2020, 14:53:27 UTC
ff2230a sentry: Add dependency on storage 20 March 2020, 14:53:27 UTC
fc80c03 Analyse hash collisions sample 20 March 2020, 14:53:27 UTC
d3b2061 sentry: Resolve the pagination when needed 19 March 2020, 14:30:24 UTC
4eec410 sentry: Add events per issue listing 19 March 2020, 14:10:33 UTC
1742303 sentry: Add detailed issue cli entrypoint 19 March 2020, 13:47:57 UTC
2b305c0 sentry: Reduce the output on issue listing output 19 March 2020, 13:10:29 UTC
efa4688 sentry: Add a basic readme 19 March 2020, 12:51:31 UTC
21db7db sentry: Add list issues entry point 19 March 2020, 12:45:16 UTC
48fcb3f snippets/sentry: Update dependencies 19 March 2020, 12:45:06 UTC
d696012 sentry: Make the cli use subcommand and add docstring 19 March 2020, 09:53:34 UTC
64a7d27 Add basic sentry script to start listing issues from the cli 19 March 2020, 09:30:03 UTC
acf8045 Add script and instructions to generate the objstorage replay exclusion file. 03 March 2020, 13:50:39 UTC
60fd51e Log only every 60 seconds. 08 January 2020, 15:39:16 UTC
c96448d Update for new schema, with revision parents in their own table. 08 January 2020, 15:17:28 UTC
b491acd Connect to azure cluster 08 January 2020, 14:39:44 UTC
4e59433 Adapt for new signature of get_storage. 08 January 2020, 14:39:13 UTC
effc91b Export all tables and compress outputs. 08 November 2019, 12:07:45 UTC
5e3c13c Fetch larger pages for better performance. 08 November 2019, 12:07:26 UTC
851dc76 Add support for all tables. 07 November 2019, 17:10:22 UTC
f8f92c2 Make cassandra_stream_graph.py parallelizable. 07 November 2019, 15:47:39 UTC
669dcde Add cassandra_stream_graph.py to export the graph from Cassandra to CSV 07 November 2019, 14:33:40 UTC
383a899 git2blobs: add support for tarring (+ deleting) object storage. This is important to avoid running out of inodes on file systems that haven't been tuned for Software Heritage-like workloads. As part of this, switch CLI parsing to click, as we now have more complicated usage patterns. 29 October 2019, 16:32:46 UTC
0fc3a54 git2blobs: avoid rewriting objects over and over again. This optimization is only relevant when the same target object storage dir is used to process different repositories, but still 29 October 2019, 16:31:37 UTC
51714bb add GPL as default license, as one was missing. Individual snippets can override the license in their headers 29 October 2019, 16:02:35 UTC
a4bb8b7 git ignore mypy cache dir 27 October 2019, 17:08:03 UTC
923c8f6 git2filenames: new script to extract <sha1, filename> blob maps from git 27 October 2019, 17:07:09 UTC
e96958c git2blobs.py: add docstring 27 October 2019, 17:07:01 UTC
c059c9b git2blobs: new script to extract blobs from git repos 27 October 2019, 13:26:03 UTC
7e370dd gnu.analysis: Bootstrap some code to analyze the gnu tree dataset. This is WIP. Related to D2076 07 October 2019, 12:42:11 UTC
b6980b1 cran/analysis: Fix conditionals Related T2026#37637 03 October 2019, 19:37:24 UTC
0c3f015 cran/analysis: Add author/maintainer pattern analysis Related T2026#37637 03 October 2019, 19:30:30 UTC
47dae37 ardumont/cran/analysis: Add some tools to analyse cran dataset Related T2026#37637 03 October 2019, 17:54:22 UTC
9c03bef ardumont/kibana_fetch_logs: Improve fetching logs routine 30 September 2019, 17:16:59 UTC
c75e96e Add snippet to allow synchronous listing of a GitHub user's repos. Use case: ``` for r in $(python3 bin/list_repos.py --user SocialGouv); do python3 -m swh.loader.git.loader --origin-url $r; done ``` 30 September 2019, 14:12:09 UTC
2e3efce graph export: better handle snp/rel pointing to various object types 27 September 2019, 11:16:07 UTC
526bd01 SQL graph export: actually use a fifo, rather than a regular file (oops) 26 August 2019, 09:24:00 UTC
3e4f5ff SQL graph export: remove ORDER BY to be more memory savvy. This change makes exporting a swh-storage containing the linux kernel doable on an ordinary laptop 26 August 2019, 08:33:36 UTC
4e612e8 SQL snippet: add export of the archive as a graph for development use only 25 August 2019, 16:20:09 UTC
f7c08d3 seirl: parquet upload: fix script and schema 22 July 2019, 15:51:26 UTC
0e8cf90 add code of conduct document 11 July 2019, 14:29:30 UTC
d524262 check-contributors: add missing team members 04 July 2019, 12:41:55 UTC
1268eca check-contributors: make the CLI more useful. In particular, it now supports both running on the current git working directory (the default) and specifying a list of git working directories to analyze one after the other 04 July 2019, 12:29:29 UTC