https://github.com/vsiivola/variKN
Tip revision: c525568ed62fddfb0351946efe049c4e9ead9ddf authored by Vesa Siivola on 16 January 2014, 08:27:59 UTC
Implement leave-one-out estimates for the discounts. If optimization corpus is not set, use these estimates. Also, initialize numerical search for the parameters with these values. In the latter case, preliminary tests seem to indicate that better accuracy is reached than with the original heuristic search start point.
arpa2arpa.cc
// This program converts the extended (interpolated) ARPA format written by
// some tools into a standard backoff ARPA language model.
#include <cstdio>   // fprintf, stderr
#include <string>   // std::string

#include "conf.hh"
#include "io.hh"
#include "TreeGram.hh"
//#include "HashGram.hh"

int main (int argc, char **argv) {
  conf::Config config;
  config("Usage:  arpa2arpa nonstandard_in standard_out\nConverts interpolated arpa to backoff arpa.\n")
    ('t',"tabs","","","Use tabs instead of space between the fields when writing the ARPA file.");
  config.parse(argc,argv,2);

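  // Separate the ARPA fields with tabs if the "tabs" option ('t') was given,
  // otherwise with single spaces.
  std::string field_separator = config["tabs"].specified ? "\t" : " ";

  io::Stream::verbose = true;
  io::Stream in(config.arguments[0], "r");   // nonstandard input model
  io::Stream out(config.arguments[1], "w");  // standard output model

  TreeGram ng;
  //HashGram_t<int> ng;

  // Load the whole model into memory, then release the input stream.
  fprintf(stderr, "Reading\n");
  ng.read(in.file);
  in.close();

  // Write the model back out using the chosen field separator.
  fprintf(stderr, "Writing\n");
  ng.write(out.file, false, field_separator);
  out.close();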
}
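
The usage string says the tool converts an interpolated ARPA model to a backoff one. As a rough illustration of what that conversion means in general, the sketch below folds an interpolated bigram estimate into a single backoff-style probability, P(w|h) = alpha(w|h) + gamma(h) * P(w), with gamma(h) kept as the backoff weight. This is a minimal sketch under stated assumptions: `ToyBigramModel`, `make_toy_model`, and `folded_bigram_prob` are hypothetical names invented for the example, the numbers are made up, and nothing here is taken from `TreeGram`'s actual implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical toy model for illustration only; not variKN's TreeGram.
struct ToyBigramModel {
  std::map<std::string, double> unigram_prob;   // P(w)
  std::map<std::string, double> interp_weight;  // gamma(h)
  // alpha(w|h): the discounted bigram part, keyed by (history, word)
  std::map<std::pair<std::string, std::string>, double> discounted;
};

// Build a tiny model with made-up numbers.
ToyBigramModel make_toy_model() {
  ToyBigramModel m;
  m.unigram_prob["b"] = 0.25;
  m.interp_weight["a"] = 0.4;
  m.discounted[std::make_pair(std::string("a"), std::string("b"))] = 0.5;
  return m;
}

// Fold the lower-order contribution into the bigram entry:
//   P(w|h) = alpha(w|h) + gamma(h) * P(w)
// In a backoff-style model this folded value is what would be stored for a
// seen bigram, while gamma(h) remains the backoff weight for unseen ones.
double folded_bigram_prob(const ToyBigramModel &m,
                          const std::string &h, const std::string &w) {
  const double alpha = m.discounted.at(std::make_pair(h, w));
  const double gamma = m.interp_weight.at(h);
  assert(gamma >= 0.0);  // an interpolation weight cannot be negative
  return alpha + gamma * m.unigram_prob.at(w);
}
```

Calling `folded_bigram_prob(make_toy_model(), "a", "b")` combines the discounted bigram part with the weighted unigram probability. An ARPA file stores base-10 logarithms, so a writer would take `std::log10` of that value before printing it.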