https://github.com/recski/HunTag

Revision Message Commit Date
ac681ac Update README.md 18 January 2016, 08:52:12 UTC
e170f4f adding empty directory "models" (using a gitignore that ignores everything except itself) new file: models/.gitignore 22 December 2014, 13:53:08 UTC
103a7da Merge branch 'input-dir' 17 September 2014, 07:29:05 UTC
481439a scripts and configs new file: configs/hunchunk_eng.cfg new file: configs/hunchunk_w_lemma.cfg new file: scripts/mark_errors.py new file: scripts/print_bracketing.py 17 September 2014, 07:28:22 UTC
e630998 Update README.md 28 April 2014, 12:28:03 UTC
bd6b551 Update README.md 28 April 2014, 11:27:37 UTC
77d6cd9 Update README.md 28 April 2014, 11:23:04 UTC
78f6e86 Update README.md 28 April 2014, 11:21:44 UTC
a922f1d Merge pull request #6 from gabor-recski/patch-1 Update README 28 April 2014, 11:17:50 UTC
f593d62 Update README.md 28 April 2014, 11:15:29 UTC
6cdca39 Rename README to README.md 28 April 2014, 11:14:50 UTC
0b324df Update README 28 April 2014, 11:13:24 UTC
b261fee Merge pull request #5 from recski/input-dir added feature that allows processing all files in a directory while loading the models only once 04 March 2014, 09:56:11 UTC
afd2066 added feature that allows processing all files in a directory while loading the models only once; also made the three source files conform strictly to PEP8 (via flake8) modified: huntag.py modified: tagger.py modified: tools.py 03 March 2014, 18:58:55 UTC
0bc252d the previous commit had a bug, this fixes it modified: tools.py 08 August 2013, 09:42:29 UTC
c67a974 added support for comment lines in input. TSV files passed to huntag may now have lines starting with """ before each sentence. These will be disregarded when training and included in the output when tagging. modified: huntag.py modified: tagger.py modified: tools.py modified: trainer.py 02 August 2013, 08:37:49 UTC
f592070 added description of how to patch liblinear 12 June 2013, 14:43:54 UTC
17f485d patch for liblinear python bindings to enable ctypes (necessary, see README) new file: liblinear.patch 12 June 2013, 14:39:38 UTC
b8cf0b3 bugfix: if tagging is invoked with -i option, huntag will no longer look for a config file modified: huntag.py 27 February 2013, 09:39:29 UTC
c1787ac trainer.getEvents didn't write tags to outFeatFiles (making them useless). Now it does. modified: trainer.py 21 February 2013, 15:27:40 UTC
ec7611c new file: conll2bie1.py 19 February 2013, 10:48:34 UTC
44d68c9 the -i option is now available when tagging, i.e. features can be supplied directly to the tagger (output is then a single column of tags). For this purpose, the outFeatFile created before training contains empty lines between sentences. modified: huntag.py modified: tagger.py modified: trainer.py 13 February 2013, 17:25:00 UTC
50843b5 huntag now allows training from a feature file (e.g. one saved using the -f command). To ensure identical output on each run, the feature list of each token is sorted before the conversion to integers. modified: huntag.py modified: trainer.py 23 January 2013, 18:12:36 UTC
c1c902b got rid of all import * expressions 12 October 2012, 09:45:18 UTC
2d42d46 some leftover tabs changed to spaces 10 October 2012, 08:17:58 UTC
c879666 feature_select is Kata's set of tools which I slightly modified and use to get top feature weights. The code needs review. 03 October 2012, 15:16:36 UTC
522a35c trainer.writeFeats is now compatible with the new way we store contexts; small change to the way output is printed, this makes the langtools wrapper work, we don't know why 12 September 2012, 12:54:42 UTC
2eba8a7 deleted by accident, needed for command-line use of eval.py 30 August 2012, 12:58:21 UTC
d136e90 fixed bug 30 August 2012, 12:56:00 UTC
a2b5b3b an ugly hack to make evalInput compatible with sentenceIterator output 30 August 2012, 12:42:15 UTC
48df104 forgot to remove memory profiling... 30 August 2012, 10:46:01 UTC
a3564f1 trainer.context now contains tuples of c_int-s, to save memory. This assumes a customized version of liblinear.py 30 August 2012, 09:58:21 UTC
0effcf4 small ugly function lets users run basic eval.py functionality from external program 28 August 2012, 15:30:07 UTC
185b21c Tagger.init now expects two BookKeeper instances as the value of the options labelCounter and featCounter, which are created in main_tag. Tagger.tag doesn't print anymore, it's a generator that yields tagged sentences. huntag.tag uses tools.writeSentence to print. 22 August 2012, 09:54:19 UTC
0f92402 Trainers and taggers now expect their options in a dictionary. huntag.py gets this from the optionsParser Values() object using vars() 21 August 2012, 15:58:46 UTC
592d619 Trainer.init now has a separate argument for the usedFeats list, which must be an iterable (instead of a file path). Accordingly, if a usedFeats file is specified, it is now converted into an iterable before being passed to Trainer. 21 August 2012, 15:16:12 UTC
007b5f3 basic scripts for handling corpora 21 August 2012, 11:32:06 UTC
f374fbf global cutoff implemented, invoked by -o N. The cutoff will first run on the feature bookkeeper, which handles counting. The particular list of training events (trainer.contexts) is only then reduced. 15 August 2012, 10:04:11 UTC
5140ff8 split a couple of functions in two to improve readability 01 August 2012, 08:52:26 UTC
786ef0f The -u option now lets the user specify a file containing a list of features that the feature set will be limited to. This limitation will NOT automatically carry on to the tagging phase, the file will have to be specified again in order to limit the features passed to the model at the time of prediction. 30 July 2012, 10:01:32 UTC
9282622 minor bugfix 26 July 2012, 13:57:26 UTC
080653d Liblinearutil is now expected to be in the PYTHONPATH. Maxent is no longer imported. 26 July 2012, 09:41:22 UTC
abcebbb config file for Hungarian NER-tagging 25 July 2012, 15:00:01 UTC
39e527b README updated; some minor changes in trainer.py. TODO - find a neat way to install liblinear! 25 July 2012, 13:34:21 UTC
5e934cc New branch: liblinear. Modified everything to work with liblinear instead of maxent. Now works end-to-end, but hasn't been tested at all. 23 July 2012, 16:53:12 UTC
4883f56 And now the TABS are changed back to spaces... 23 July 2012, 16:25:11 UTC
2d6f1b8 fixed some random indentation errors (don't want to know how it worked till now...) 23 July 2012, 16:22:11 UTC
2a95fcf this was forgotten and is needed by eval.py 23 July 2012, 14:42:58 UTC
df77068 README extended to contain information about the config file (based on Daniel's description) 12 July 2012, 15:44:36 UTC
43acef8 added LGPL license; fixed the bug that prevented huntag from reading feature-specific radius values; added the getKrPos feature 09 July 2012, 17:46:15 UTC
bd6e967 config files added; bug concerning sentence features fixed 18 June 2012, 15:51:03 UTC
b41060f Lexicons for Hungarian NER-tagging 18 June 2012, 10:11:57 UTC
e05eef7 added usage info 18 June 2012, 08:44:07 UTC
d6a7f7d huntag now stops with an error if language model weight is greater than or equal to 1; the Lexicon class does not print a message upon initialization anymore 14 June 2012, 08:56:14 UTC
ab62435 language model weight now implemented (but needs testing) 13 June 2012, 11:54:04 UTC
5401314 two space indentations converted to four spaces 13 June 2012, 10:06:38 UTC
889c5a2 renaming modules according to PEP8 deleting old and unnecessary files 12 June 2012, 18:19:21 UTC
b00183b cleaning up 12 June 2012, 15:49:12 UTC
68180dd Merge branch 'master' of https://github.com/recski/HunTag 12 June 2012, 15:39:31 UTC
9b92065 Initial commit 12 June 2012, 15:25:11 UTC
8168629 fixed bug, seems to work fine now; thorough testing is still encouraged 12 June 2012, 14:29:11 UTC
9e5934a huntag main_tag function and viterbi implemented, runs without errors, but gives false results. Needs debugging, do not use! 11 June 2012, 15:18:55 UTC
f3f7dfe Bigram model almost completely rewritten. bigram-train task now works; only `tag' task left to implement 25 May 2012, 15:21:32 UTC
82e88c4 forgot this 30 January 2012, 12:50:27 UTC
929d82e fixed problem with krpatt features. The maxent-train mode of huntag.py should now be fully functional 30 January 2012, 12:49:17 UTC
43af6c8 fixed krPieces function to handle CAS features properly 30 January 2012, 10:34:49 UTC
47df392 - fixed bug in Trainer.addEvents - user may specify a file to write features to when training - gaussian penalty has a default value (0), no. of iterations doesn't TODO: something prevents krpatt (sentence type) features from getting added 30 January 2012, 10:24:30 UTC
c65efd8 class for maxent training (not tested and not final) 27 January 2012, 14:52:53 UTC
2c870ac some minor fixes 27 January 2012, 13:55:42 UTC
0d0b60e - outline of huntag.py (but key functions still empty) - command-line interface for huntag.py - config reader (cleaner, slightly modified version of old getOptions function in featurize.py) 27 January 2012, 13:32:08 UTC
83ca1c4 files from the old version which will not be rewritten for the time being 26 January 2012, 13:49:47 UTC
6a01d57 email test 2 19 January 2012, 09:05:20 UTC
997b07d emailtest 18 January 2012, 19:35:05 UTC
219d6ae test commit numero uno 17 January 2012, 09:50:37 UTC
48d2a7d First commit 16 January 2012, 14:36:38 UTC
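
Note on f374fbf (global cutoff): the commit message describes a two-step reduction, where features are counted first and the list of training events is reduced only afterwards. The following is a minimal sketch of that idea, not the code in the repository. The function name apply_cutoff, the plain list-of-lists representation of contexts, and the "keep features seen at least N times" threshold are all assumptions made for illustration; the actual semantics of -o N are not stated in the log.

```python
from collections import Counter


def apply_cutoff(contexts, cutoff):
    """Hypothetical helper: drop features seen fewer than `cutoff` times.

    `contexts` is assumed to be a list of feature lists, one per training
    event. The ">= cutoff" threshold is an assumption, not confirmed by
    the commit log.
    """
    # Step 1: count every feature across all training events
    # (the role the log assigns to the feature bookkeeper).
    counts = Counter(feat for context in contexts for feat in context)
    kept = {feat for feat, n in counts.items() if n >= cutoff}

    # Step 2: only then reduce the list of training events.
    reduced = [[feat for feat in context if feat in kept]
               for context in contexts]
    return kept, reduced


contexts = [
    ["form=dog", "pos=NN"],
    ["form=cat", "pos=NN"],
    ["form=dog", "pos=VB"],
]
kept, reduced = apply_cutoff(contexts, 2)
```

Counting before filtering means the set of surviving features is fixed once, so every training event is reduced against the same feature inventory.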