Raw File

libutils.a now has a new feature, with parameter lists. Here follows the
doc for parameter lists.

The idea is that every cado tool (so far polyselect/*, sieve/makefb, and
sieve/sieve, for an example) can have its working configuration either
completely within a file, or completely on the command line.

The old way still works unchanged. There are now just many more ways to
call programs, like:

polyselect/polyselect 71641520761751435455133616475667090434063332228247871795429

or with a config file:
polyselect/kleinjung << EOF
M=5e20
degree=5
l=8
n=3835722565249558322197842586047190634357074639921908543369929770877061053472916307274359341359614012333625966961056935597194003176666977
pb=256
EOF

The syntax of config files that can be read is the union of the
ggnfs/cado format, and the params file for the now obsolete cadofactor.pl
script:

file=(<comment-line> | <non-comment>)*
comment-line = <empty-line>|<white>#.*
non-comment = <white><key><white><separator><white><value><white>(#.*)?
white=<whitespace>*
key=<alphabetic>[<alphanumeric>_]*
separator=(:= | : | =)
value=<non-space>+

I see several benefits.
- for very quick tests, putting everything on the cmdline is handy.
- for regression tests, being able to put everything in a file is handy
- this deletes some code (but adds more). The cost per extra parameter is
  lower.


The processing goes through two steps:
- build a dictionary of (key, value) pair (as character strings).
- parse some values to set variables in the program.

###################################################################
# Quick documentation for the impatient.

Common accepted syntaxes for options are --foo blah, -foo blah, or
foo=blah. On a first pass, keys are recognized as strings, and parsing
comes later on once lexing has completed.

Some terminology first.

``switches'' is the term I use for option which trigger something by their
mere presence, not requiring any other ``value'' input. The archetypal
example is for example --verbose.

``aliases'' is a means of indicating that a given option may be
referenced in more than one manner.

Effort has been put towards making it possible and easy to source a
config file hosting a (possibly large) number of options.



To use params.[ch], your main() function must follow the following
pattern.

    param_list pl;

    param_list_init(pl);

    argv++,argc--;
    /* switches, if any. See below */
    /* aliases, if any. See below */

    for( ; argc ; ) {
        if (param_list_update_cmdline(pl, &argc, &argv)) { continue; }
        /* Do perhaps some other things on the argument that haven't
         * been eaten at all. Like check whether it is a valid file to
         * source in order to get more options. See
         * param_list_read_stream and param_list_read_file for that. */
        fprintf (stderr, "Unknown option: %s\n", argv[0]);
        usage();
    }

    /* Now parse the values corresponding to options, and map their
     * meaning somewhere */
    const char * tmp;

    /* param_list_lookup_string gives a const char * pointer which will
     * live only for as long as the param_list structure isn't freed. The
     * return value is NULL if no such key exists */

    if ((tmp = param_list_lookup_string(pl, "subdir")) != NULL) {
        fprintf(stderr, "--subdir is no longer supported."
                " prepend --out with a path instead.\n");
        exit(1);
    }

    /* ... */

    /* It is possible to strdup() it of course */
    working_filename = strdup(param_list_lookup_string(pl, "out"));
    if (working_filename == NULL) {
        fprintf(stderr, "Required argument --out is missing\n");
        exit(1);
    }
    /* ... */


    /* all parsing functions return 1 when the key was found, 0 if not.
     * If no key was found, no assignment is performed and the default
     * value set beforehand remains */
    param_list_parse_int(pl, "hslices", &split[0]);
    param_list_parse_int(pl, "vslices", &split[1]);

    int split[2] = {1,1};
    param_list_parse_intxint(pl, "nbuckets", split);

    /* Good practice mandates that unused parameters trigger a warning,
     * if not an error. Only unused parameters from command line are
     * considered for issuance of a warning. Unused parameters found in
     * config files are considered normal and trigger nothing.
     */
    if (param_list_warn_unused(pl)) {
        usage();
    }
    param_list_clear(pl);


It is also possible to configure switches. Configure switches goes toghether
with specifying a pointer to the switch value, which must have type int.

    param_list_configure_switch(pl, "--legacy", &legacy);
    param_list_configure_switch(pl, "--remove-input", &remove_input);
    param_list_configure_switch(pl, "--pad", &pad_to_square);

Aliases:
    param_list_configure_alias(pl, "--pad", "--square");
    // --> means that the switch --pad may be aliased as --square

    param_list_configure_alias(pl, "out", "output-name");
    // means that --out, out=, and so on, may be aliased as
    // --output-name, etc.


###################################################################
# Slightly longer doc.

Roadmap for use:

** BUILDING THE DICTIONARY **

start with:

    param_list pl;
    param_list_init(pl);

    argv++, argc--;
    for( ; argc ; ) {

then we're parsing arguments in argv,argc as we always have been. The
only difference (for some programs) is that now argv is shifted so that
the ``current'' argument is at argv[0]

the param_list routines are good at:
- parsing arguments of the form --degree 42 , or -d 42 , or degree=42.
- parsing config files.

So far, they are _not_ made for:
- booleans (-skip-something)
- switches (-verbose)
- things not foreseen.

Therefore, before giving your argv,argc to the param_list routines, you
have the occasion to parse your command line as you always have been.
Just do so only for the strict necessary, since for those arguments, you
won't get the benefit of allowing information to come from config files:

    if (strcmp(argv[0], "-v") == 0) { verbose++; argv++,argc--; continue; }

You may call the param_list routines to recognize certain specific
arguments on the command line, like:

    if (param_list_update_cmdline(pl, "degree", &argc, &argv)) { continue; }

This will recognize --degree 42, as well as degree=42, on the command
line. Upon success, 1 is returnes, and argv, argc advance by the right
number of positions.

Or you could accept just everything that ``resembles'' a parameter.
Formally, a parameter matches
    (-<key> <value> | --<key> <value> | <key>=<value>)
    where <key>=<alphabetic>[<alphanumeric>_]*
To match in such a wildcard way (which is the easy way to go, really):

    if (param_list_update_cmdline(pl, NULL, &argc, &argv)) { continue; }

That's mostly it.  If we haven't hit a 'continue' statement within the
loop already, then we might have some unparsed garbage on the command
line (1). Or just something special (2). Or we might be very polite and
allow the user to specify the name of an input parameter file, freeform,
to be parsed right now (3). Or we've got some very peculiar way of
receiving config arguments, so we'll catch them now (4).

To illustrate these cases, here are some snippets. For case (2):

      if (strspn(argv[0], "0123456789") == strlen(argv[0])) {
          param_list_add_key(pl, "n", argv[0], PARAMETER_FROM_CMDLINE);
          argv++,argc--;
          continue;
      }

The call to param_list_add_key above is almost an innard, and having it
in the API is just an ugly convenience measure. In particular, notice
that it's your job at this point to position argv,argc correctly for the
next round.

After such a test, case (3) can be handled so:

      if ((f = fopen(argv[0], "r")) != NULL) {
          param_list_read_stream(pl, f);
          fclose(f);
          argv++,argc--;
          continue;
      }

[begin digression: another way of accepting input files can be when parsing
arguments (before calling param_list_update_cmdline):

      if (argc >= 2 && strcmp (argv[0], "-poly") == 0) {
          param_list_read_file(pl, argv[1]);
          argv++,argc--;
          argv++,argc--;
          continue;
      }

Notice that param_list_read_file fails when the file does not exist,
while the former option above allows for the file not to exist.
end digression]

End of story, case (1) would go as such:

      fprintf(stderr, "Unhandled parameter %s\n", argv[0]);
      usage();

The ``peculiar cases'' (case (4)) are to be handled in a way which
depends exactly on how parameters are expected. Look into handling of
amin amax bmin bmax within sieve/sieve.c for an example.

Extra: parameters may have aliases. For the moment, aliases are better
recognized before calling param_list_update_cmdline, by special calls to
param_list_update_cmdline_alias (see polyselect/polyselect.c). But this
may change.

** PARSING VALUES **

All the param_list_parse_* routines do parsing, and set variables. They
return 1 on success, 0 on failure.

It is possible to update the dictionary again depending on the result of
a parsing operation. For example, polyselect reads stdin only if it needs
it for knowing n:

  int have_n = param_list_parse_mpz(pl, "n", poly->n);

  if (!have_n) {
      if (verbose) {
          fprintf(stderr, "Reading n from stdin\n");
      }
      param_list_read_stream(pl, stdin);
      have_n = param_list_parse_mpz(pl, "n", poly->n);
  }

The param_list_parse_* functions should be easily understood.

Eventually, it is possible to make sure that all command-line parameters
were used, and warn otherwise:

  if (param_list_warn_unused(pl)) {
      usage();
  }

And clearing the parameter list can be done as soon as it's no longer
needed:

  param_list_clear(pl);

Before that, notice that cado_poly_read has been superseded by
cado_poly_set_plist, to be used so:

    cado_poly_init(cpoly);
    if (!cado_poly_set_plist (cpoly, pl)) {
        exit (EXIT_FAILURE);
    }


###################################################################

There is (yet) some more documentation text in params.h

back to top