# Running and Benchmarking Halide Generators

## Overview

`RunGen` is a simple(ish) wrapper that allows an arbitrary Generator to be built into a single executable that can be run directly from bash, without needing to wrap it in your own custom `main()` driver. It also implements rudimentary benchmarking and memory-usage functionality.

If you use the standard CMake rules for Generators, you get RunGen functionality automatically. (If you use Make, you might need to add an extra rule or two to your Makefile; all the examples in `apps/` already have these rules.)

For every `halide_library` (or `halide_library_from_generator`) rule, there is an implicit `name.rungen` rule that generates an executable that wraps the Generator library:

```
# In addition to defining a static library named "local_laplacian", this rule
# also implicitly defines an executable target named "local_laplacian.rungen"
halide_library(
    local_laplacian
    SRCS local_laplacian_generator.cc
)
```

You can build and run this like any other executable:

```
$ make bin/local_laplacian.rungen && ./bin/local_laplacian.rungen
Usage: local_laplacian.rungen argument=value [argument=value... ] [flags]
...typical "usage" text...
```

To be useful, you need to pass in values for the Generator's inputs (and locations for the output(s)) on the command line, of course.
You can use the `--describe` flag to see the names and expected types:

```
# ('make bin/local_laplacian.rungen && ' prefix omitted henceforth for clarity)
$ ./bin/local_laplacian.rungen --describe
Filter name: "local_laplacian"
Input "input" is of type Buffer with 3 dimensions
Input "levels" is of type int32
Input "alpha" is of type float32
Input "beta" is of type float32
Output "local_laplacian" is of type Buffer with 3 dimensions
```

Warning: Outputs may have `$X` (where `X` is a small integer) appended to their names in some cases (or, in the case of Generators that don't explicitly declare outputs via `Output<>`, an autogenerated name of the form `fX`). If this happens, don't forget to escape the `$` with a backslash as necessary. These are both bugs we intend to fix; see https://github.com/halide/Halide/issues/2194

As a convenience, there is also an implicit target that builds and runs, named simply "NAME.run":

```
# This is equivalent to "make bin/local_laplacian.rungen && ./bin/local_laplacian.rungen"
$ make bin/local_laplacian.run
Usage: local_laplacian.rungen argument=value [argument=value... ] [flags]

# To pass arguments to local_laplacian.rungen, set the RUNARGS var:
$ make bin/local_laplacian.run RUNARGS=--describe
Filter name: "local_laplacian"
Input "input" is of type Buffer with 3 dimensions
Input "levels" is of type int32
Input "alpha" is of type float32
Input "beta" is of type float32
Output "local_laplacian" is of type Buffer with 3 dimensions
```

Inputs are specified as `name=value` pairs, in any order. Scalar inputs are specified in the typical text form, while buffer inputs (and outputs) are specified via paths to image files. RunGen currently can read/write image files in any format supported by `halide_image_io.h`; at this time, that means .png, .jpg, .ppm, .pgm, and .tmp formats. (We plan to add .tiff and .mat (level 5) in the future.)

```
$ ./bin/local_laplacian.rungen input=../images/rgb_small16.png levels=8 alpha=1 beta=1 output=/tmp/out.png
$ display /tmp/out.png
```

You can also specify any scalar input as `default` or `estimate`, which will use the default value specified for the input, or the value specified by `set_estimate` for that input, respectively. (If the relevant value isn't set for that input, a runtime error occurs.)

```
$ ./bin/local_laplacian.rungen input=../images/rgb_small16.png levels=8 alpha=estimate beta=default output=/tmp/out.png
$ display /tmp/out.png
```

If you specify an input or output file format that doesn't match the required type/dimensions for an argument (e.g., using an 8-bit PNG for an input that expects a different element type, or a grayscale image for a 3-dimensional input), RunGen will try to coerce the inputs to something sensible; that said, it's hard to always get this right, so warnings are **always** issued whenever an input or output is modified in any way.

```
# This filter expects a 16-bit RGB image as input, but we're giving it an 8-bit grayscale image:
$ ./bin/local_laplacian.rungen input=../images/gray.png levels=8 alpha=1 beta=1 output=/tmp/out.png
Warning: Image for Input "input" has 2 dimensions, but this argument requires at least 3 dimensions: adding dummy dimensions of extent 1.
Warning: Image loaded for argument "input" is type uint8 but this argument expects type uint16; data loss may have occurred.
```

By default, we try to guess a suitable size for the output image(s), based mainly on the size of the input images (if any); you can also specify explicit output extents. (Note that output extents are subject to constraints already imposed by the particular Generator's logic, so arbitrary values for `--output_extents` may produce runtime errors.)

```
# Constrain output extents to 100x200x3
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=../images/rgb_small16.png levels=8 alpha=1 beta=1 output=/tmp/out.png
```

Sometimes you don't care what the particular element values for an input are (e.g. for benchmarking), and you just want an image of a particular size; in that case, you can use the `zero:[]` pseudo-file; it infers the _type_ from the Generator, and initializes every element to zero:

```
# Input is a 3-dimensional image with extent 123, 456, and 3
# (blurring an image of all zeroes isn't very interesting, of course)
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:[123,456,3] levels=8 alpha=1 beta=1 output=/tmp/out.png
```

You can also specify arbitrary (nonzero) constants:

```
# Input is a 3-dimensional image with extent 123, 456, and 3,
# filled with a constant value of 42
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=constant:42:[123,456,3] levels=8 alpha=1 beta=1 output=/tmp/out.png
```

Similarly, you can create identity images, in which only the diagonal elements are 1 (and the rest are 0), by using `identity:[]`. Diagonal elements are defined as those whose first two coordinates are equal.

There's also a `random:SEED:[]` pseudo-file, which fills the image with uniform noise based on a specific random-number seed:

```
# Input is a 3-dimensional image with extent 123, 456, and 3
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=random:42:[123,456,3] levels=8 alpha=1 beta=1 output=/tmp/out.png
```

Instead of specifying an explicit set of extents for a pseudo-input, you can use the string `auto`, which will run a bounds query to choose a legal set of extents for that input given the known output extents. (This is only useful when used in conjunction with the `--output_extents` flag.)

```
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:auto levels=8 alpha=1 beta=1 output=/tmp/out.png
```

You can also specify `estimate` for the extents, which will use the estimate values provided, typically (but not necessarily) for auto_schedule. (If there aren't estimates for all of the buffer's dimensions, a runtime error occurs.)

```
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:estimate levels=8 alpha=1 beta=1 output=/tmp/out.png
```

You can combine the two and specify `estimate_then_auto` for the extents, which will attempt to use the estimate values; if a given input buffer has no estimates, it will fall back to the bounds-query result for that input:

```
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:estimate_then_auto levels=8 alpha=1 beta=1 output=/tmp/out.png
```

Similarly, you can use `estimate` for `--output_extents`, which will use the estimate values for each output. (If there aren't estimates for all of the outputs, a runtime error occurs.)

```
$ ./bin/local_laplacian.rungen --output_extents=estimate input=zero:auto levels=8 alpha=1 beta=1 output=/tmp/out.png
```

If you don't want to explicitly specify all (or any!) of the input values, you can use the `--default_input_buffers` and `--default_input_scalars` flags, which provide wildcards for any omitted inputs:

```
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] --default_input_buffers=random:0:auto --default_input_scalars=estimate output=/tmp/out.png
```

In this case, all input buffers will be sized according to a bounds query and filled with random values (using seed 0); all input scalars will be initialized to their declared default values. (If they have no declared default value, a zero of the appropriate type will be used.)

Note: `--default_input_buffers` can produce surprising sizes!
For instance, any input that uses `BoundaryConditions::repeat_edge` to wrap itself can legally be set to almost any size, so you may legitimately get an input with extent=1 in all dimensions; whether this is useful to you or not depends on the code. It's highly recommended you do testing with the `--verbose` flag (which will log the calculated sizes) to reality-check that you are getting what you expect, especially for benchmarking.

A common case (especially for benchmarking) is to use estimates for all inputs and outputs; for this, you can specify `--estimate_all`, which is just a shortcut for `--default_input_buffers=estimate_then_auto --default_input_scalars=estimate --output_extents=estimate`.

## Benchmarking

To run a benchmark, use the `--benchmarks=all` flag:

```
$ ./bin/local_laplacian.rungen --benchmarks=all input=zero:[1920,1080,3] levels=8 alpha=1 beta=1 --output_extents=[100,200,3]
Benchmark for local_laplacian produces best case of 0.0494629 sec/iter, over 3 blocks of 10 iterations.
Best output throughput is 39.9802 mpix/sec.
```

You can use `--default_input_buffers` and `--default_input_scalars` here as well:

```
$ ./bin/local_laplacian.rungen --benchmarks=all --default_input_buffers --default_input_scalars --output_extents=estimate
Benchmark for local_laplacian produces best case of 0.0494629 sec/iter, over 3 blocks of 10 iterations.
Best output throughput is 39.9802 mpix/sec.
```

Note: `halide_benchmark.h` is known to be inaccurate for GPU filters; see https://github.com/halide/Halide/issues/2278

## Measuring Memory Usage

To track memory usage, use the `--track_memory` flag, which measures the high-water mark of CPU memory usage.

```
$ ./bin/local_laplacian.rungen --track_memory input=zero:[1920,1080,3] levels=8 alpha=1 beta=1 --output_extents=[100,200,3]
Maximum Halide memory: 82688420 bytes for output of 1.97754 mpix.
```

Warning: `--track_memory` may degrade performance; don't combine it with `--benchmarks` or expect meaningful timing measurements when using it.

## Using RunGen in Make

To add support for RunGen to your Makefile, you need to add rules something like this (see `apps/support/Makefile.inc` for an example):

```
HALIDE_DISTRIB ?= /path/to/halide/distrib/folder

$(BIN)/RunGenMain.o: $(HALIDE_DISTRIB)/tools/RunGenMain.cpp
	@mkdir -p $(@D)
	@$(CXX) -c $< $(CXXFLAGS) $(LIBPNG_CXX_FLAGS) $(LIBJPEG_CXX_FLAGS) -I$(BIN) -o $@

.PRECIOUS: $(BIN)/%.rungen
$(BIN)/%.rungen: $(BIN)/%.a $(BIN)/%.registration.cpp $(BIN)/RunGenMain.o
	$(CXX) $(CXXFLAGS) $^ -o $@ $(LIBPNG_LIBS) $(LIBJPEG_LIBS) $(LDFLAGS)

RUNARGS ?=

$(BIN)/%.run: $(BIN)/%.rungen
	@$(CURDIR)/$< $(RUNARGS)
```

Note that the `%.registration.cpp` file is created by running a generator and specifying `registration` in the comma-separated list of files to emit; these are also generated by default if `-e` is not used on the generator command line.

## Known Issues & Caveats

- If your Generator uses `define_extern()`, you must have all link-time dependencies declared properly via `FILTER_DEPS`; otherwise, you'll fail to link.
- The code does its best to detect when inputs or outputs need to be chunky/interleaved (rather than planar), but in unusual cases it might guess wrong; if your Generator uses buffers with unusual stride setups, RunGen might fail at runtime. (If this happens, please file a bug!)
- The code for deducing good output sizes is rudimentary and needs improvement; it will sometimes make bad decisions that prevent the filter from executing. (If this happens, please file a bug!)
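
## Example: Separate Timing and Memory Runs

The benchmarking and memory-tracking advice above (keep `--track_memory` out of timing runs, and use `--verbose` to check the calculated sizes) can be sketched as a small bash helper. This is a minimal sketch, not part of Halide: the function name `rungen_cmd`, the `RUNGEN` variable, and the chosen image sizes are hypothetical, and the scalar values are simply the ones used in the `local_laplacian` examples earlier. It only prints the command lines so you can inspect them before running anything; output extents are left for RunGen to guess from the input size.

```shell
#!/usr/bin/env bash
# Sketch only: `rungen_cmd` and `RUNGEN` are hypothetical names, not part of Halide.
# The script prints (rather than runs) RunGen command lines so they can be inspected first.
set -euo pipefail

# Path to the RunGen executable; defaults to the example used in this document.
RUNGEN="${RUNGEN:-./bin/local_laplacian.rungen}"

# rungen_cmd MODE WIDTH HEIGHT
#   MODE is "bench" (--benchmarks=all) or "mem" (--track_memory).
#   The two flags are never combined, since --track_memory may degrade performance.
rungen_cmd() {
    local mode="$1" w="$2" h="$3" flag
    case "$mode" in
        bench) flag="--benchmarks=all" ;;
        mem)   flag="--track_memory" ;;
        *)     echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # --verbose logs the calculated sizes, which helps confirm what is being measured.
    printf '%s %s --verbose input=zero:[%s,%s,3] levels=8 alpha=1 beta=1\n' \
        "$RUNGEN" "$flag" "$w" "$h"
}

# Print one timing command and one memory command per input size.
for size in "640 480" "1920 1080"; do
    read -r w h <<< "$size"
    rungen_cmd bench "$w" "$h"
    rungen_cmd mem "$w" "$h"
done
```

To actually execute a printed line, paste it into your shell from the directory where the `.rungen` binary was built. If the Generator cannot produce an output of the guessed size, add an explicit `--output_extents` value as described in the Overview.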