Revision history - HEAD - origin: https://github.com/alumae/kaldi-offline-transcriber

Revision	Author	Date	Message	Commit Date
0aec0b9	Tanel Alumäe	04 August 2022, 08:39:43 UTC	added --do-language-id option	04 August 2022, 08:39:43 UTC
928adff	Tanel Alumäe	29 June 2022, 09:43:19 UTC	Skip very short segments	29 June 2022, 09:43:19 UTC
adfa8de	Tanel Alumäe	27 May 2022, 12:06:00 UTC	Merge branch 'master' of github.com:alumae/kaldi-offline-transcriber	27 May 2022, 12:06:00 UTC
0ab0929	Tanel Alumäe	27 May 2022, 12:05:49 UTC	Avoid 0-length segments	27 May 2022, 12:05:49 UTC
9e0142b	Tanel Alumäe	26 May 2022, 06:21:46 UTC	Update README.md	26 May 2022, 06:21:46 UTC
8815439	Tanel Alumäe	26 May 2022, 06:21:16 UTC	Update README.md	26 May 2022, 06:21:16 UTC
9028b47	Tanel Alumäe	25 May 2022, 11:39:16 UTC	Removed SAD	25 May 2022, 11:39:16 UTC
1c2068f	Tanel Alumäe	24 May 2022, 13:19:08 UTC	Now uses Silero VAD for speech activity detection	24 May 2022, 13:19:08 UTC
51bfdb3	Tanel Alumäe	15 June 2021, 10:50:12 UTC	Latest updates	15 June 2021, 10:50:12 UTC
c88870b	Tanel Alumäe	14 June 2021, 07:10:31 UTC	Now works with recent changes	14 June 2021, 07:10:31 UTC
dbe5ad4	Tanel Alumäe	11 June 2021, 21:15:11 UTC	Some fixes	11 June 2021, 21:15:11 UTC
7dee822	Tanel Alumäe	11 June 2021, 14:29:23 UTC	Updating to newer versions, adding language ID	11 June 2021, 14:29:23 UTC
2ddfbb0	Tanel Alumäe	04 June 2021, 15:54:41 UTC	Bug fix	04 June 2021, 15:54:41 UTC
caadc22	Tanel Alumäe	22 February 2021, 13:30:07 UTC	Docker image now inherits from kaldi's Docker image	22 February 2021, 13:30:07 UTC
d3b8372	Tanel Alumäe	28 January 2021, 12:47:05 UTC	Update README.md Update name of the models file	28 January 2021, 12:47:05 UTC
14398f8	Tanel Alumäe	05 June 2019, 15:28:03 UTC	small fixes to speaker ID server stuff	05 June 2019, 15:28:03 UTC
9b74727	Tanel Alumäe	05 June 2019, 13:39:43 UTC	Merge branch 'master' of /home/tanel/devel/kaldi-offline-transcriber	05 June 2019, 13:39:43 UTC
82036ff	Tanel Alumäe	05 June 2019, 13:39:39 UTC	reverted some changes	05 June 2019, 13:39:39 UTC
2059fe1	Tanel Alumäe	05 June 2019, 13:38:37 UTC	Small changes related to words-to-numbers	05 June 2019, 13:38:37 UTC
918806d	Tanel Alumäe	05 June 2019, 13:28:58 UTC	Merge branch 'master' of /home/tanel/devel/kaldi-offline-transcriber	05 June 2019, 13:28:58 UTC
19dea83	Tanel Alumäe	05 June 2019, 13:19:47 UTC	Can now use external speaker ID server	05 June 2019, 13:19:47 UTC
e2dbf72	Tanel Alumäe	05 June 2019, 06:50:56 UTC	Riigikogu-specific changes	05 June 2019, 06:50:56 UTC
d64a8e7	Tanel Alumäe	08 January 2019, 14:11:05 UTC	Added Sync elements after each Turn beginning to make TSAB happy	08 January 2019, 14:11:05 UTC
dc16cd0	Tanel Alumäe	29 November 2018, 11:06:58 UTC	Added words-to-numbers conversion to appropriate places	29 November 2018, 11:06:58 UTC
a4b93d3	Tanel Alumäe	27 November 2018, 10:20:20 UTC	Program that converts words to numbers, using Pynini	27 November 2018, 10:20:20 UTC
6e84dbf	Tanel Alumäe	27 November 2018, 10:19:37 UTC	Program that postprocesses JSON-formatted transcript using an external program	27 November 2018, 10:19:37 UTC
79f0d38	Tanel Alumäe	22 November 2018, 15:33:13 UTC	added confidence scores to CTM and JSON result files	22 November 2018, 15:33:13 UTC
b980ff6	Tanel Alumäe	22 November 2018, 15:32:03 UTC	script for converting words to numbers using FST	22 November 2018, 15:32:03 UTC
ad6ff46	Tanel Alumäe	14 November 2018, 14:43:41 UTC	Avoid small gaps between turns	14 November 2018, 14:43:41 UTC
b1d10b8	Tanel Alumäe	14 November 2018, 14:11:35 UTC	Merge branch 'master' of github.com:alumae/kaldi-offline-transcriber	14 November 2018, 14:11:35 UTC
b03a80a	Tanel Alumäe	14 November 2018, 14:10:58 UTC	Fixed a Transcriber file format issue	14 November 2018, 14:10:58 UTC
142f0aa	Tanel Alumäe	01 November 2018, 15:14:47 UTC	Update README.md	01 November 2018, 15:14:47 UTC
97152d5	Tanel Alumäe	31 October 2018, 16:25:30 UTC	minor fixes	31 October 2018, 16:25:30 UTC
b67c07b	Tanel Alumäe	31 October 2018, 14:59:35 UTC	Some minor fixes	31 October 2018, 14:59:35 UTC
b656610	Tanel Alumäe	31 October 2018, 14:58:01 UTC	fix typo	31 October 2018, 14:58:01 UTC
f7dad1a	Tanel Alumäe	31 October 2018, 11:28:12 UTC	Refactored and introduced a new JSON format that holds all information about word and segment timings	31 October 2018, 11:28:12 UTC
fe0ab0f	Tanel Alumäe	31 October 2018, 11:04:16 UTC	Refactored and introduced a new JSON format that holds all information about word and segment timings	31 October 2018, 11:04:16 UTC
623d5c9	Tanel Alumäe	28 October 2018, 12:47:57 UTC	Update README.md	28 October 2018, 12:47:57 UTC
bcec748	Tanel Alumäe	21 October 2018, 18:27:20 UTC	Merge branch 'master' of github.com:alumae/kaldi-offline-transcriber	21 October 2018, 18:27:20 UTC
df8d5e8	Tanel Alumäe	21 October 2018, 18:26:43 UTC	much faster compounding now	21 October 2018, 18:26:43 UTC
39a19df	Tanel Alumäe	21 October 2018, 18:26:21 UTC	clean up temp file in src-audio	21 October 2018, 18:26:21 UTC
4962ba9	Tanel Alumäe	09 October 2018, 12:00:31 UTC	Update README.md	09 October 2018, 12:00:31 UTC
4b02ef5	Tanel Alumäe	09 October 2018, 11:55:44 UTC	Update README.md Changed public image URL.	09 October 2018, 11:55:44 UTC
80256af	Tanel Alumäe	04 October 2018, 07:15:32 UTC	delete wav from src-audio when cleaning up	04 October 2018, 07:15:32 UTC
c2c99b9	Tanel Alumäe	12 September 2018, 18:30:23 UTC	Updated SID models	12 September 2018, 18:30:23 UTC
19541b7	Tanel Alumäe	31 August 2018, 08:58:59 UTC	Added Dockerfile for pre-building Estonian system	31 August 2018, 08:58:59 UTC
09bc2a8	Tanel Alumäe	31 August 2018, 08:54:49 UTC	Added Dockerfile for pre-building Estonian system	31 August 2018, 08:54:49 UTC
9be252a	Tanel Alumäe	31 August 2018, 08:23:57 UTC	Added Dockerfile for pre-building Estonian system	31 August 2018, 08:23:57 UTC
61b16fd	Tanel Alumäe	31 August 2018, 08:22:32 UTC	Added Dockerfile for pre-building Estonian system	31 August 2018, 08:22:32 UTC
a532032	Tanel Alumäe	24 August 2018, 13:08:43 UTC	cleanup after creating decoding graph	24 August 2018, 13:08:43 UTC
7d511f1	Tanel Alumäe	24 August 2018, 09:04:38 UTC	cleanup after init	24 August 2018, 09:04:38 UTC
64b71d0	Tanel Alumäe	23 August 2018, 05:36:34 UTC	Some minor improvements	23 August 2018, 05:36:34 UTC
b2a457d	Tanel Alumäe	21 August 2018, 13:52:45 UTC	Speaker ID system now uses Kaldi's native i-vector scoring	21 August 2018, 13:52:45 UTC
c8fb6fe	Tanel Alumäe	09 August 2018, 12:52:10 UTC	Titlecase after questions	09 August 2018, 12:52:10 UTC
924aca4	Tanel Alumäe	09 August 2018, 11:01:01 UTC	removed redundant option	09 August 2018, 11:01:01 UTC
25a5116	Tanel Alumäe	08 August 2018, 11:42:44 UTC	doc updates	08 August 2018, 11:42:44 UTC
8a71397	Tanel Alumäe	08 August 2018, 11:34:19 UTC	doc updates	08 August 2018, 11:34:19 UTC
3f3f731	Tanel Alumäe	08 August 2018, 10:33:23 UTC	Updates related to segments file	08 August 2018, 10:33:23 UTC
11bfbd9	Tanel Alumäe	08 August 2018, 08:56:53 UTC	Refactored -- now segments file is used, instead of splitting the wav physically into pieces	08 August 2018, 08:56:53 UTC
194e124	Tanel Alumäe	07 August 2018, 11:20:06 UTC	Fixes related to init	07 August 2018, 11:20:06 UTC
d5982b1	Tanel Alumäe	07 August 2018, 10:32:54 UTC	Now uses LM with special unk handling, and <unk> words can be reconstructed from pronunciation	07 August 2018, 10:32:54 UTC
5995e98	Tanel Alumäe	28 June 2017, 08:27:39 UTC	Use ffmpeg for decoding mp4 files	28 June 2017, 08:27:39 UTC
2cc26c6	Tanel Alumäe	15 June 2017, 06:58:07 UTC	Added support for SubRip subtitle files (.srt)	15 June 2017, 06:58:07 UTC
b4395b6	Tanel Alumäe	30 May 2017, 11:47:10 UTC	Now uses DNN-based speaker ID, trained in a weakly unsupervised manner. Requires Keras	30 May 2017, 11:47:10 UTC
59f935b	Tanel Alumäe	29 May 2017, 13:38:30 UTC	Now uses DNN-based speaker ID, trained in a weakly unsupervised manner. Requires Keras	29 May 2017, 13:38:30 UTC
8e5c513	Tanel Alumäe	29 May 2017, 13:37:00 UTC	mistakenly committed	29 May 2017, 13:37:00 UTC
81f77c6	Tanel Alumäe	29 May 2017, 13:15:27 UTC	Now uses DNN-based speaker ID, trained in a weakly unsupervised manner. Requires Keras	29 May 2017, 13:15:27 UTC
8050527	Tanel Alumäe	02 May 2017, 12:21:39 UTC	Replaced pyfst in compounder.py with OpenFst's native extension	02 May 2017, 12:21:39 UTC
c9cdfb6	Tanel Alumäe	02 May 2017, 12:20:29 UTC	Replaced pyfst in compounder.py with OpenFst's native extension	02 May 2017, 12:20:29 UTC
50cf17b	Tanel Alumäe	26 April 2017, 12:55:54 UTC	Fixes for chain model CTM generation, too long subtitles in SBV file, and out-of-memory errors for diarization of long audio files	26 April 2017, 12:55:54 UTC
c745dce	Tanel Alumäe	20 February 2017, 14:36:07 UTC	Fixes titlecasing bug	20 February 2017, 14:36:07 UTC
a36fa2d	Tanel Alumäe	16 February 2017, 08:13:54 UTC	Fixes for some cases when transcribing is restarted	16 February 2017, 08:13:54 UTC
323fa1a	Tanel Alumäe	13 February 2017, 12:34:52 UTC	Compatibility with user-set LD_LIBRARY_PATH	13 February 2017, 12:34:52 UTC
87ff3c1	Tanel Alumäe	13 February 2017, 12:19:12 UTC	Migrated to chain models, made python3 compatible	13 February 2017, 12:19:12 UTC
708bcfa	Tanel Alumäe	13 February 2017, 12:14:35 UTC	Compatibility with user-set LD_LIBRARY_PATH	13 February 2017, 12:14:35 UTC
e12288a	Tanel Alumäe	13 February 2017, 11:59:12 UTC	Migrated to chain models, made python3 compatible	13 February 2017, 11:59:12 UTC
17d5ec5	Tanel Alumäe	13 February 2017, 10:21:06 UTC	Migrated to chain models, made python3 compatible	13 February 2017, 10:21:06 UTC
066a5db	Tanel Alumäe	13 February 2017, 10:19:38 UTC	Migrated to chain models, made python3 compatible	13 February 2017, 10:19:38 UTC
86e3c32	Tanel Alumäe	30 December 2015, 01:06:21 UTC	Updated models	30 December 2015, 01:06:21 UTC
b3e0552	Tanel Alumäe	30 December 2015, 01:03:48 UTC	Updated models	30 December 2015, 01:03:48 UTC
780eae1	Tanel Alumäe	29 December 2015, 22:11:24 UTC	Compatibility fix	29 December 2015, 22:11:24 UTC
a90f62d	Tanel Alumäe	04 December 2015, 14:34:08 UTC	Cosmetic fixes	04 December 2015, 14:34:08 UTC
eb00e3b	Tanel Alumäe	25 November 2015, 17:01:06 UTC	mp3 files are converted now using ffmpeg which is more robust to exotic formats	25 November 2015, 17:01:06 UTC
bb7a3b4	Tanel Alumäe	08 September 2015, 15:20:47 UTC	Fix for an issue that caused speaker ID to be executed when it was turned off	08 September 2015, 15:20:47 UTC
6730923	Tanel Alumäe	14 May 2015, 11:28:27 UTC	About memory reqs	14 May 2015, 11:28:27 UTC
2bede6f	Tanel Alumäe	14 May 2015, 10:10:09 UTC	fix in .init	14 May 2015, 10:10:09 UTC
cb389ed	Tanel Alumäe	14 May 2015, 10:03:14 UTC	Cosmetic fixes	14 May 2015, 10:03:14 UTC
343405f	Tanel Alumäe	14 May 2015, 09:32:24 UTC	Removed option to decode with old-style nnet2 models, made decoding with online nnet2 models multithreaded	14 May 2015, 09:32:24 UTC
14f9ddf	Tanel Alumäe	16 March 2015, 12:59:10 UTC	bug fix	16 March 2015, 12:59:10 UTC
48ef75c	Tanel Alumäe	11 March 2015, 14:03:41 UTC	Updated LM	11 March 2015, 14:03:41 UTC
5300497	Tanel Alumäe	10 March 2015, 15:45:35 UTC	Clarified that the system is currently for Estonian	10 March 2015, 15:45:35 UTC
2feec82	Tanel Alumäe	06 March 2015, 09:01:59 UTC	small utf-8 related fix	06 March 2015, 09:01:59 UTC
babf185	Tanel Alumäe	05 March 2015, 10:52:18 UTC	titlecases words after .	05 March 2015, 10:52:18 UTC
9c4b7f2	Tanel Alumäe	03 March 2015, 18:59:42 UTC	small bug fix	03 March 2015, 18:59:42 UTC
594b21a	Tanel Alumäe	03 March 2015, 13:35:28 UTC	txt files now have (optional) punctuation	03 March 2015, 13:35:28 UTC
0b3f213	Tanel Alumäe	03 March 2015, 13:25:59 UTC	small bug fix	03 March 2015, 13:25:59 UTC
722dc37	Tanel Alumäe	03 March 2015, 13:10:25 UTC	bug fix	03 March 2015, 13:10:25 UTC
56db315	Tanel Alumäe	03 March 2015, 11:38:33 UTC	Integrated punctuation insertion module	03 March 2015, 11:38:33 UTC
425267f	Tanel Alumäe	29 December 2014, 14:47:55 UTC	Added optional support for automatic punctuation (NB! uses SRILM currently -- not for commercial use)	29 December 2014, 14:47:55 UTC
dc2107e	Tanel Alumäe	29 December 2014, 14:36:32 UTC	Added optional support for automatic punctuation (NB! uses SRILM currently -- not for commercial use)	29 December 2014, 14:36:32 UTC