ac681ac | Gábor Recski | 18 January 2016, 08:52:12 UTC | Update README.md | 18 January 2016, 08:52:12 UTC |
e170f4f | Gabor Recski | 22 December 2014, 13:53:08 UTC | adding empty directory "models" (using a gitignore that ignores everything except itself) new file: models/.gitignore | 22 December 2014, 13:53:08 UTC |
103a7da | Gabor Recski | 17 September 2014, 07:29:05 UTC | Merge branch 'input-dir' | 17 September 2014, 07:29:05 UTC |
481439a | Gabor Recski | 17 September 2014, 07:28:22 UTC | scripts and configs new file: configs/hunchunk_eng.cfg new file: configs/hunchunk_w_lemma.cfg new file: scripts/mark_errors.py new file: scripts/print_bracketing.py | 17 September 2014, 07:28:22 UTC |
e630998 | Gábor Recski | 28 April 2014, 12:28:03 UTC | Update README.md | 28 April 2014, 12:28:03 UTC |
bd6b551 | Gábor Recski | 28 April 2014, 11:27:37 UTC | Update README.md | 28 April 2014, 11:27:37 UTC |
77d6cd9 | Gábor Recski | 28 April 2014, 11:23:04 UTC | Update README.md | 28 April 2014, 11:23:04 UTC |
78f6e86 | Gábor Recski | 28 April 2014, 11:21:44 UTC | Update README.md | 28 April 2014, 11:21:44 UTC |
a922f1d | Gábor Recski | 28 April 2014, 11:17:50 UTC | Merge pull request #6 from gabor-recski/patch-1 Update README | 28 April 2014, 11:17:50 UTC |
f593d62 | Gabor Recski | 28 April 2014, 11:15:29 UTC | Update README.md | 28 April 2014, 11:15:29 UTC |
6cdca39 | Gabor Recski | 28 April 2014, 11:14:50 UTC | Rename README to README.md | 28 April 2014, 11:14:50 UTC |
0b324df | Gabor Recski | 28 April 2014, 11:13:24 UTC | Update README | 28 April 2014, 11:13:24 UTC |
b261fee | Attila Zséder | 04 March 2014, 09:56:11 UTC | Merge pull request #5 from recski/input-dir added feature that allows to process all files in a directory while loading the models only once | 04 March 2014, 09:56:11 UTC |
afd2066 | Gabor Recski | 03 March 2014, 18:58:55 UTC | added feature that allows to process all files in a directory while loading the models only once also made the three source files conform strictly to PEP8 (via flake8) modified: huntag.py modified: tagger.py modified: tools.py | 03 March 2014, 18:58:55 UTC |
0bc252d | Gabor Recski | 08 August 2013, 09:42:29 UTC | the previous commit had a bug, this fixes it modified: tools.py | 08 August 2013, 09:42:29 UTC |
c67a974 | Gabor Recski | 02 August 2013, 08:37:49 UTC | added support for comment lines in input. TSV files passed to huntag may now have lines starting with """ before each sentence. These will be disregarded when training and included in the output when tagging. modified: huntag.py modified: tagger.py modified: tools.py modified: trainer.py | 02 August 2013, 08:37:49 UTC |
f592070 | Gabor Recski | 12 June 2013, 14:43:54 UTC | added description of how to patch liblinear | 12 June 2013, 14:43:54 UTC |
17f485d | Gabor Recski | 12 June 2013, 14:38:00 UTC | patch for liblinear python bindings to enable ctypes (necessary, see README) new file: liblinear.patch | 12 June 2013, 14:39:38 UTC |
b8cf0b3 | Gabor Recski | 27 February 2013, 09:39:29 UTC | bugfix: if tagging is invoked with -i option, huntag will no longer look for a config file modified: huntag.py | 27 February 2013, 09:39:29 UTC |
c1787ac | Gabor Recski | 21 February 2013, 15:27:40 UTC | trainer.getEvents didn't write tags to outFeatFiles (making them useless). Now it does. modified: trainer.py | 21 February 2013, 15:27:40 UTC |
ec7611c | Gabor Recski | 19 February 2013, 10:48:34 UTC | new file: conll2bie1.py | 19 February 2013, 10:48:34 UTC |
44d68c9 | Gabor Recski | 13 February 2013, 17:25:00 UTC | the -i option is now available when tagging, i.e. features can be supplied directly to the tagger (output is then a single column of tags) For this purpose, the outFeatFile created before training contains empty lines between sentences. modified: huntag.py modified: tagger.py modified: trainer.py | 13 February 2013, 17:25:00 UTC |
50843b5 | Gabor Recski | 23 January 2013, 18:12:36 UTC | huntag now allows training from a feature file (e.g. one saved using the -f command). To ensure identical output on each run, the feature list of each token is sorted before the conversion to integers. modified: huntag.py modified: trainer.py | 23 January 2013, 18:12:36 UTC |
c1c902b | Gabor Recski | 12 October 2012, 09:45:18 UTC | got rid of all import * expressions | 12 October 2012, 09:45:18 UTC |
2d42d46 | Gabor Recski | 10 October 2012, 08:17:58 UTC | some leftover tabs changed to spaces | 10 October 2012, 08:17:58 UTC |
c879666 | Gabor Recski | 03 October 2012, 15:16:36 UTC | feature_select is Kata's set of tools which I slightly modified and use to get top feature weights The code needs review | 03 October 2012, 15:16:36 UTC |
522a35c | Gabor Recski | 12 September 2012, 12:54:42 UTC | trainer.writeFeats is now compatible with the new way we store contexts small change to the way output is printed, this makes the langtools wrapper work, we don't know why | 12 September 2012, 12:54:42 UTC |
2eba8a7 | Gabor Recski | 30 August 2012, 12:58:21 UTC | deleted by accident, needed for command-line use of eval.py | 30 August 2012, 12:58:21 UTC |
d136e90 | pajkossy | 30 August 2012, 12:56:00 UTC | fixed bug | 30 August 2012, 12:56:00 UTC |
a2b5b3b | Gabor Recski | 30 August 2012, 12:42:15 UTC | an ugly hack to make evalInput compatible with sentenceIterator output | 30 August 2012, 12:42:15 UTC |
48df104 | Gabor Recski | 30 August 2012, 10:46:01 UTC | forgot to remove memory profiling... | 30 August 2012, 10:46:01 UTC |
a3564f1 | Gabor Recski | 30 August 2012, 09:58:21 UTC | trainer.context now contains tuples of c_int-s, to save memory. This assumes a customized version of liblinear.py | 30 August 2012, 09:58:21 UTC |
0effcf4 | Gabor Recski | 28 August 2012, 15:30:07 UTC | small ugly function lets users run basic eval.py functionality from external program | 28 August 2012, 15:30:07 UTC |
185b21c | Gabor Recski | 22 August 2012, 09:54:19 UTC | Tagger.init now expects two BookKeeper instances as the value of the options labelCounter and featCounter, which are created in main_tag Tagger.tag doesn't print anymore, it's a generator that yields tagged sentences. huntag.tag uses tools.writeSentence to print | 22 August 2012, 09:54:19 UTC |
0f92402 | Gabor Recski | 21 August 2012, 15:58:46 UTC | Trainers and taggers now expect their options in a dictionary. huntag.py gets this from the optionsParser Values() object using vars() | 21 August 2012, 15:58:46 UTC |
592d619 | Gabor Recski | 21 August 2012, 15:16:12 UTC | Trainer.init now has a separate argument for the usedFeats list, which must be an iterable (instead of a file path). Accordingly, if a usedFeats file is specified, it is now converted into an iterable before being passed to Trainer. | 21 August 2012, 15:16:12 UTC |
007b5f3 | Gabor Recski | 21 August 2012, 11:32:06 UTC | basic scripts for handling corpora | 21 August 2012, 11:32:06 UTC |
f374fbf | Gabor Recski | 15 August 2012, 10:04:11 UTC | global cutoff implemented, invoked by -o N cutoff will first run on the feature bookkeeper, which handles counting. the particular list of training events (trainer.contexts) is only then reduced. | 15 August 2012, 10:04:11 UTC |
5140ff8 | Gabor Recski | 01 August 2012, 08:52:26 UTC | split a couple of functions in two to improve readability | 01 August 2012, 08:52:26 UTC |
786ef0f | Gabor Recski | 30 July 2012, 10:01:32 UTC | The -u option now lets the user specify a file containing a list of features that the feature set will be limited to. This limitation will NOT automatically carry on to the tagging phase, the file will have to be specified again in order to limit the features passed to the model at the time of prediction. | 30 July 2012, 10:01:32 UTC |
9282622 | Gabor Recski | 26 July 2012, 13:57:26 UTC | minor bugfix | 26 July 2012, 13:57:26 UTC |
080653d | Gabor Recski | 26 July 2012, 09:41:22 UTC | Liblinearutil is now expected to be in the PYTHONPATH Maxent is no longer imported | 26 July 2012, 09:41:22 UTC |
abcebbb | Gabor Recski | 25 July 2012, 15:00:01 UTC | config file for Hungarian NER-tagging | 25 July 2012, 15:00:01 UTC |
39e527b | Gabor Recski | 25 July 2012, 13:34:21 UTC | README updated some minor changes in trainer.py TODO - find a neat way to install liblinear! | 25 July 2012, 13:34:21 UTC |
5e934cc | Gabor Recski | 23 July 2012, 16:53:12 UTC | New branch: liblinear Modified everything to work with liblinear instead of maxent Now works end-to-end, but hasn't been tested at all. | 23 July 2012, 16:53:12 UTC |
4883f56 | Gabor Recski | 23 July 2012, 16:25:11 UTC | And now the TABS are changed back to spaces... | 23 July 2012, 16:25:11 UTC |
2d6f1b8 | Gabor Recski | 23 July 2012, 16:22:11 UTC | fixed some random indentation errors (don't want to know how it worked till now...) | 23 July 2012, 16:22:11 UTC |
2a95fcf | Gabor Recski | 23 July 2012, 14:42:58 UTC | this was forgotten and is needed by eval.py | 23 July 2012, 14:42:58 UTC |
df77068 | Gabor Recski | 12 July 2012, 15:44:36 UTC | README extended to contain information about the config file (based on Daniel's description) | 12 July 2012, 15:44:36 UTC |
43acef8 | Gabor Recski | 09 July 2012, 17:46:15 UTC | added LGPL license fixed the bug that prevented huntag from reading feature-specific radius values added the getKrPos feature | 09 July 2012, 17:46:15 UTC |
bd6e967 | Gabor Recski | 18 June 2012, 15:51:03 UTC | config files added bug concerning sentence features fixed | 18 June 2012, 15:51:03 UTC |
b41060f | Gabor Recski | 18 June 2012, 09:57:52 UTC | Lexicons for Hungarian NER-tagging | 18 June 2012, 10:11:57 UTC |
e05eef7 | Gabor Recski | 18 June 2012, 08:44:07 UTC | added usage info | 18 June 2012, 08:44:07 UTC |
d6a7f7d | Gabor Recski | 14 June 2012, 08:56:14 UTC | huntag now stops with an error if language model weight is greater than or equal to 1 the Lexicon class does not print a message upon initialization anymore | 14 June 2012, 08:56:14 UTC |
ab62435 | Gabor Recski | 13 June 2012, 11:54:04 UTC | language model weight now implemented (but needs testing) | 13 June 2012, 11:54:04 UTC |
5401314 | Gabor Recski | 13 June 2012, 10:06:38 UTC | two-space indentation converted to four spaces | 13 June 2012, 10:06:38 UTC |
889c5a2 | Gabor Recski | 12 June 2012, 18:19:21 UTC | renaming modules according to PEP8 deleting old and unnecessary files | 12 June 2012, 18:19:21 UTC |
b00183b | Gabor Recski | 12 June 2012, 15:49:12 UTC | cleaning up | 12 June 2012, 15:49:12 UTC |
68180dd | Gabor Recski | 12 June 2012, 15:39:31 UTC | Merge branch 'master' of https://github.com/recski/HunTag | 12 June 2012, 15:39:31 UTC |
9b92065 | recski | 12 June 2012, 15:25:11 UTC | Initial commit | 12 June 2012, 15:25:11 UTC |
8168629 | Gabor Recski | 12 June 2012, 14:29:11 UTC | fixed bug, seems to work fine now thorough testing is still encouraged | 12 June 2012, 14:29:11 UTC |
9e5934a | Gabor Recski | 11 June 2012, 15:18:55 UTC | huntag main_tag function and viterbi implemented, runs without errors, but gives false results. Needs debugging, do not use! | 11 June 2012, 15:18:55 UTC |
f3f7dfe | Gabor Recski | 25 May 2012, 15:21:32 UTC | Bigram model almost completely rewritten. bigram-train task now works only `tag' task left to implement | 25 May 2012, 15:21:32 UTC |
82e88c4 | Gabor Recski | 30 January 2012, 12:50:27 UTC | forgot this | 30 January 2012, 12:50:27 UTC |
929d82e | Gabor Recski | 30 January 2012, 12:49:17 UTC | fixed problem with krpatt features The maxent-train mode of huntag.py should now be fully functional | 30 January 2012, 12:49:17 UTC |
43af6c8 | Gabor Recski | 30 January 2012, 10:34:49 UTC | fixed krPieces function to handle CAS features properly | 30 January 2012, 10:34:49 UTC |
47df392 | Gabor Recski | 30 January 2012, 10:24:30 UTC | - fixed bug in Trainer.addEvents - user may specify a file to write features to when training - gaussian penalty has a default value (0), no. of iterations doesn't TODO: something prevents krpatt (sentence type) features from getting added | 30 January 2012, 10:24:30 UTC |
c65efd8 | Gabor Recski | 27 January 2012, 14:52:53 UTC | class for maxent training (not tested and not final) | 27 January 2012, 14:52:53 UTC |
2c870ac | Gabor Recski | 27 January 2012, 13:55:42 UTC | some minor fixes | 27 January 2012, 13:55:42 UTC |
0d0b60e | Gabor Recski | 27 January 2012, 13:32:08 UTC | - outline of huntag.py (but key functions still empty) - command-line interface for huntag.py - config reader (cleaner, slightly modified version of old getOptions function in featurize.py) | 27 January 2012, 13:32:08 UTC |
83ca1c4 | Gabor Recski | 26 January 2012, 13:49:47 UTC | files from the old version which will not be rewritten for the time being | 26 January 2012, 13:49:47 UTC |
6a01d57 | Attila Zseder | 19 January 2012, 09:05:20 UTC | email test 2 | 19 January 2012, 09:05:20 UTC |
997b07d | Attila Zseder | 18 January 2012, 19:35:05 UTC | emailtest | 18 January 2012, 19:35:05 UTC |
219d6ae | Attila Zseder | 17 January 2012, 09:50:37 UTC | test commit numero uno | 17 January 2012, 09:50:37 UTC |
48d2a7d | Adrienn Szabo | 16 January 2012, 14:36:38 UTC | First commit | 16 January 2012, 14:36:38 UTC |