https://github.com/pierrepo/article-SWH-bioinfo-fr
14 January 2025, 09:51:44 UTC

SWHIDs of the browsed objects:

- content: swh:1:cnt:0031068d463b5f00daff561ee453c633ab065eca
- directory: swh:1:dir:5cb6bb7ee827f722ac263c83caff357a87f2c8a8
- revision: swh:1:rev:f85d0db6f228b21fc0f5eb3a874bdfdcb0a2dac9
- snapshot: swh:1:snp:27ec900903020d311214f368f4ea2102c080fda2

Tip revision: f85d0db6f228b21fc0f5eb3a874bdfdcb0a2dac9 authored by Pierre Poulain on 14 January 2025, 09:50:27 UTC
Add manuscript v1
README.md
# Preparing an article on Software Heritage for Bioinfo-fr

This repository contains the resources used to prepare a blog post about [Software Heritage](https://www.softwareheritage.org/) for the [Bioinfo-fr](https://bioinfo-fr.net/) website.


## Resources

Source video: [Tuto@Mate#64 Pierre Poulain présente Git et l'archive Software Heritage](https://www.youtube.com/watch?v=GjVrZbU0PB0) ("Pierre Poulain presents Git and the Software Heritage archive")

The [Groq Whisper API](https://console.groq.com/docs/speech-text) accepts audio files in mp3, mp4, wav, etc., up to a maximum size of 25 MB.

The models that support French are `whisper-large-v3-turbo` and `whisper-large-v3`. The latter is slightly slower but also makes fewer errors.

Groq also recommends reducing the file to mono at 16,000 Hz:

```bash
ffmpeg \
  -i <your file> \
  -ar 16000 \
  -ac 1 \
  -map 0:a: \
  <output file name>
```
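The effect of this recommendation can be estimated with a quick calculation. The sketch below computes the size of uncompressed 16-bit PCM audio (as in a WAV file) for the 1195-second segment kept later in this README (1:32:50 to 1:52:45). It assumes 2 bytes per sample, a 44.1 kHz stereo source and no container overhead; the function name `pcm_bytes` is hypothetical.

```bash
# Back-of-the-envelope size of uncompressed 16-bit PCM audio (e.g. WAV).
# Assumptions: 2 bytes per sample, container headers ignored.
pcm_bytes() {
  local rate=$1 channels=$2 seconds=$3
  echo $(( rate * channels * 2 * seconds ))
}

seconds=1195   # 1:32:50 -> 1:52:45, the segment kept in this README
echo "44.1 kHz stereo: $(pcm_bytes 44100 2 "$seconds") bytes"   # 210798000
echo "16 kHz mono:     $(pcm_bytes 16000 1 "$seconds") bytes"   # 38240000
```

Downsampling divides the raw data rate by about 5.5, but even at 16 kHz mono, 1195 s of raw PCM (about 38 MB) would exceed the 25 MB limit, so encoding to mp3 as in the steps below still matters.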


## Preparing the audio file

Install [Pixi](https://pixi.sh/) if needed.

Download the video's audio track:

```bash
pixi run yt-dlp -f 140 -o audio_full.m4a https://www.youtube.com/watch?v=GjVrZbU0PB0
```

Cut out the relevant part, from 1:32:50 to 1:52:45:

```bash
pixi run ffmpeg \
  -i audio_full.m4a \
  -ss 01:32:50 -to 01:52:45 \
  -c:a libmp3lame \
  audio.mp3
```
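The size limit can also be turned around: given the duration of the cut, what average bitrate keeps the file under 25 MB? A minimal sketch, assuming 25 MB means 25,000,000 bytes; `to_seconds` and `max_bitrate` are hypothetical helper names.

```bash
# Convert an hh:mm:ss timestamp to seconds.
# 10# forces base 10, so fields such as "08" are not parsed as octal.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Highest average bitrate (bit/s) keeping a file of the given duration
# under a size limit. Assumption: 25 MB is read as 25,000,000 bytes.
max_bitrate() {
  local seconds=$1 limit_bytes=${2:-25000000}
  echo $(( limit_bytes * 8 / seconds ))
}

duration=$(( $(to_seconds 01:52:45) - $(to_seconds 01:32:50) ))
echo "duration: ${duration} s"                       # 1195 s
echo "max bitrate: $(max_bitrate "$duration") bit/s" # 167364 bit/s
```

With a ceiling of roughly 167 kbit/s, even the 19M `audio.mp3` listed further down fits; the mono 16 kHz conversion simply adds a comfortable margin.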

Convert to mono at 16,000 Hz:

```bash
pixi run ffmpeg \
  -i audio.mp3 \
  -ar 16000 \
  -ac 1 \
  -map 0:a: \
  audio_clean.mp3
```

Check that the final audio file is smaller than 25 MB:

```bash
$ ls -lh audio*
-rw-rw-r-- 1 pierre pierre 3,5M déc.  30 11:34 audio_clean.mp3
-rw-rw-r-- 1 pierre pierre 122M oct.  11 17:03 audio_full.m4a
-rw-rw-r-- 1 pierre pierre  19M déc.  30 11:33 audio.mp3
```
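The `ls` check above can also be scripted, so that an oversized file stops the process before any upload. A sketch, assuming 25 MB means 25,000,000 bytes; `check_upload_size` is a hypothetical name.

```bash
# Check that a file is strictly smaller than the upload limit.
# Assumption: "25 MB" is read as 25,000,000 bytes, the stricter of the
# two usual readings (25 × 1024 × 1024 would allow slightly more).
check_upload_size() {
  local file=$1 limit=${2:-25000000} size
  if [[ ! -f $file ]]; then
    echo "no such file: $file" >&2
    return 2
  fi
  # wc -c works with both GNU and BSD tools; tr strips the padding BSD wc adds.
  size=$(wc -c < "$file" | tr -d '[:space:]')
  if (( size < limit )); then
    echo "OK: $file is $size bytes"
  else
    echo "too large: $file is $size bytes (limit: $limit)" >&2
    return 1
  fi
}

# Usage: check_upload_size audio_clean.mp3
```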


## Transcription

Export your [Groq API key](https://console.groq.com/keys):

```bash
export GROQ_API_KEY=gsk_...
```
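`export` matters here: assuming `transcript.py` reads the key from the environment, it runs as a child process and only sees exported variables. A small guard, offered as a sketch (the name `require_groq_key` is hypothetical), catches a missing or unexported key before the transcription starts.

```bash
# Fail early if GROQ_API_KEY is not visible to child processes.
# printenv is an external command, so, like `python transcript.py`,
# it only sees variables that were exported.
require_groq_key() {
  if [[ -z "$(printenv GROQ_API_KEY)" ]]; then
    echo "GROQ_API_KEY is not set or not exported" >&2
    return 1
  fi
}

# Usage: require_groq_key && pixi run python transcript.py > audio_text.txt
```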

Run the transcription:

```bash
pixi run python transcript.py > audio_text.txt
```
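`transcript.py` is not reproduced in this README. As an illustration only, a shell alternative could call the transcription endpoint directly with `curl`. This sketch assumes Groq's OpenAI-compatible endpoint `https://api.groq.com/openai/v1/audio/transcriptions` and its `file`, `model`, `language` and `response_format` form fields; it is not the repository's script, and the function name `transcribe` is hypothetical. It defaults to `whisper-large-v3`, the slower but more accurate of the two models mentioned above.

```bash
# Hypothetical curl alternative to transcript.py (not the repository's code).
# Assumes Groq's OpenAI-compatible transcription endpoint and form fields.
transcribe() {
  local audio=$1 model=${2:-whisper-large-v3}
  if [[ ! -f $audio ]]; then
    echo "no such file: $audio" >&2
    return 2
  fi
  if [[ -z ${GROQ_API_KEY:-} ]]; then
    echo "GROQ_API_KEY is not set" >&2
    return 1
  fi
  # --fail makes curl exit with a non-zero status on HTTP error responses.
  curl -sS --fail \
    -H "Authorization: Bearer ${GROQ_API_KEY}" \
    -F "file=@${audio}" \
    -F "model=${model}" \
    -F "language=fr" \
    -F "response_format=text" \
    https://api.groq.com/openai/v1/audio/transcriptions
}

# Usage: transcribe audio_clean.mp3 > audio_text.txt
```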


## Preparing the blog post

ChatGPT 4o prompt:

> Organize the following text about Software Heritage as a blog post aimed at bioinformaticians. The article must be structured and factual. Don't embellish, but make readers want to archive their code in Software Heritage:
> [contents of audio_text.txt]
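The `[contents of audio_text.txt]` placeholder means the transcript is pasted right after the instructions. A small sketch that assembles the complete prompt into a single file, ready to paste into ChatGPT; the function name `build_prompt` and the output name `prompt.txt` are hypothetical.

```bash
# Hypothetical helper: print the prompt instructions followed by the transcript.
build_prompt() {
  local transcript=$1
  [[ -f $transcript ]] || { echo "no such file: $transcript" >&2; return 1; }
  cat <<'EOF'
Organize the following text about Software Heritage as a blog post aimed at bioinformaticians. The article must be structured and factual. Don't embellish, but make readers want to archive their code in Software Heritage:
EOF
  cat "$transcript"
}

# Usage: build_prompt audio_text.txt > prompt.txt
```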

- [ChatGPT's response](2024-12-30_chatgpt_text.md)
- [Version 1, after revision and proofreading](2025-01-14_v1.md)
