https://github.com/halide/Halide
Tip revision: b2cf3e1c74f579b60f7154688f0bb8defe2ca38b authored by Steven Johnson on 14 June 2018, 16:09:28 UTC
Merge branch 'master' into extern_host_alloc
README_rungen.md
# Running and Benchmarking Halide Generators

## Overview

`RunGen` is a simple(ish) wrapper that allows an arbitrary Generator to be built
into a single executable that can be run directly from bash, without needing to
wrap it in your own custom main() driver. It also implements rudimentary
benchmarking and memory-usage tracking.

If you use the standard CMake (or Bazel) rules for Generators, you get RunGen
functionality automatically. (If you use Make, you might need to add an extra rule
or two to your Makefile; all the examples in `apps/` already have these rules.)

For every `halide_library` (or `halide_library_from_generator`) rule,
there is an implicit `name.rungen` rule that generates an executable that wraps
the Generator library:

```
# In addition to defining a static library named "local_laplacian", this rule 
# also implicitly defines an executable target named "local_laplacian.rungen"
halide_library(
    local_laplacian
    SRCS local_laplacian_generator.cc
)
```

You can build and run this like any other executable:

```
$ make local_laplacian.rungen && ./bin/local_laplacian.rungen
Usage: local_laplacian.rungen argument=value [argument=value... ] [flags]
...typical "usage" text...
```

To do anything useful, you need to pass values for the Generator's inputs (and
locations for its outputs) on the command line. You can use the
`--describe` flag to see the names and expected types:

```
# ('make local_laplacian.rungen && ' prefix omitted henceforth for clarity)
$ ./bin/local_laplacian.rungen --describe
Filter name: "local_laplacian"
  Input "input" is of type Buffer<uint16> with 3 dimensions
  Input "levels" is of type int32
  Input "alpha" is of type float32
  Input "beta" is of type float32
  Output "local_laplacian" is of type Buffer<uint16> with 3 dimensions
```

Warning: Outputs may have `$X` (where `X` is a small integer) appended to their
names in some cases (or, in the case of Generators that don't explicitly declare
outputs via `Output<>`, an autogenerated name of the form `fX`). If this
happens, don't forget to escape the `$` with a backslash as necessary. These are
both bugs we intend to fix; see https://github.com/halide/Halide/issues/2194
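
If a script assembles the command line, one way to avoid escaping by hand is to let a quoting helper do it. The sketch below uses Python's standard `shlex.quote`; the output name `local_laplacian$1` is a hypothetical example of the suffixed form described above, not a name this Generator is known to produce.

```python
import shlex

# Hypothetical output name carrying a "$1" suffix.
argument = "local_laplacian$1=/tmp/out.png"

# shlex.quote wraps the string in single quotes, so the shell
# passes the "$" through literally instead of expanding "$1".
quoted = shlex.quote(argument)
print("./bin/local_laplacian.rungen ... " + quoted)
```

When the executable is launched with an argument list rather than through a shell (for example `subprocess.run([...])` without `shell=True`), no quoting is needed at all.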

As a convenience, there is also an implicit target that builds and runs, named
simply `NAME.run`:

```
# This is equivalent to "make local_laplacian.rungen && ./bin/local_laplacian.rungen"
$ make local_laplacian.run
Usage: local_laplacian.rungen argument=value [argument=value... ] [flags]

# To pass arguments to local_laplacian.rungen, set the RUNARGS var:
$ make local_laplacian.run RUNARGS=--describe
Filter name: "local_laplacian"
  Input "input" is of type Buffer<uint16> with 3 dimensions
  Input "levels" is of type int32
  Input "alpha" is of type float32
  Input "beta" is of type float32
  Output "local_laplacian" is of type Buffer<uint16> with 3 dimensions
```

Inputs are specified as `name=value` pairs, in any order. Scalar inputs are
specified in the typical text form, while buffer inputs (and outputs) are
specified via paths to image files. RunGen can currently read and write image
files in any format supported by `halide_image_io.h`; at this time, that means
.png, .jpg, .ppm, .pgm, and .tmp formats. (We plan to add .tiff and .mat
(level 5) in the future.)

```
$ ./bin/local_laplacian.rungen input=../apps/images/rgb_small16.png levels=8 alpha=1 beta=1 local_laplacian=/tmp/out.png
$ display /tmp/out.png
```
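
To make the `name=value` convention concrete, here is a minimal Python sketch of how such a command line could be split into arguments and flags. It is an illustration only, not RunGen's actual parser; the function name `split_rungen_args` is made up for this example.

```python
def split_rungen_args(argv):
    """Split RunGen-style tokens into (arguments, flags) dictionaries."""
    arguments = {}
    flags = {}
    for token in argv:
        if token.startswith("--"):
            # Flags look like --describe or --output_extents=[100,200,3].
            name, sep, value = token[2:].partition("=")
            flags[name] = value if sep else True
        else:
            # Everything else must be name=value; order does not matter.
            name, sep, value = token.partition("=")
            if not sep:
                raise ValueError(f"expected name=value, got {token!r}")
            arguments[name] = value
    return arguments, flags


arguments, flags = split_rungen_args([
    "input=../apps/images/rgb_small16.png",
    "levels=8",
    "alpha=1",
    "beta=1",
    "local_laplacian=/tmp/out.png",
])
print(arguments["levels"], arguments["local_laplacian"])
```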

If you specify an input or output file format that doesn't match the required
type/dimensions for an argument (e.g., using an 8-bit PNG for an Input<float>,
or a grayscale image for a 3-dimensional input), RunGen will try to coerce the
inputs to something sensible; that said, it's hard to always get this right, so
warnings are **always** issued whenever an input or output is modified in any
way.

```
# This filter expects a 16-bit RGB image as input, but we're giving it an 8-bit grayscale image:
$ ./bin/local_laplacian.rungen input=../apps/images/gray.png levels=8 alpha=1 beta=1 local_laplacian=/tmp/out.png
Warning: Image for Input "input" has 2 dimensions, but this argument requires at least 3 dimensions: adding dummy dimensions of extent 1.
Warning: Image loaded for argument "input" is type uint8 but this argument expects type uint16; data loss may have occurred.
```
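
The first warning above can be pictured with a small sketch. It assumes the dummy dimensions are appended after the existing ones, which the warning text does not actually state; the helper name and the 640x480 image size are hypothetical.

```python
def pad_to_dimensions(extents, required_dims):
    """Append extent-1 dimensions until there are at least required_dims."""
    padded = list(extents)
    while len(padded) < required_dims:
        padded.append(1)  # assumption: dummy dimensions go at the end
    return padded


gray_extents = [640, 480]  # hypothetical 2-dimensional grayscale image
print(pad_to_dimensions(gray_extents, 3))
```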

By default, we try to guess a suitable size for the output image(s), based mainly
on the size of the input images (if any); you can also specify explicit output
extents. (Note that output_extents are subject to constraints already imposed by
the particular Generator's logic, so arbitrary values for --output_extents may
produce runtime errors.)

```
# Constrain output extents to 100x200x3
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=../apps/images/rgb_small16.png levels=8 alpha=1 beta=1 local_laplacian=/tmp/out.png
```

Sometimes you don't care what the particular element values for an input are
(e.g. for benchmarking), and you just want an image of a particular size; in
that case, you can use the `zero:[]` pseudo-file; it infers the *type* from the
Generator, and inits every element to zero:

```
# Input is a 3-dimensional image with extent 123, 456, and 3
# (blurring an image of all zeroes isn't very interesting, of course)
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:[123,456,3] levels=8 alpha=1 beta=1 local_laplacian=/tmp/out.png
```
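
When sweeping over many sizes from a script, it can help to generate the `zero:[...]` and `--output_extents=[...]` strings rather than typing them. The following is a minimal sketch; the helper names are invented for this example, and only the flag and pseudo-file syntax shown above is assumed.

```python
def extents_literal(extents):
    """Format extents as the bracketed list used on the command line."""
    return "[" + ",".join(str(int(e)) for e in extents) + "]"


def zero_input(extents):
    """Build the zero-filled pseudo-file name for the given extents."""
    return "zero:" + extents_literal(extents)


def rungen_command(exe, arguments, output_extents=None):
    """Assemble an argument list: flags first, then name=value pairs."""
    argv = [exe]
    if output_extents is not None:
        argv.append("--output_extents=" + extents_literal(output_extents))
    argv.extend(f"{name}={value}" for name, value in arguments.items())
    return argv


command = rungen_command(
    "./bin/local_laplacian.rungen",
    {
        "input": zero_input([123, 456, 3]),
        "levels": 8,
        "alpha": 1,
        "beta": 1,
        "local_laplacian": "/tmp/out.png",
    },
    output_extents=[100, 200, 3],
)
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command)` directly, which avoids any shell interpretation of the square brackets.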

## Benchmarking

To run a benchmark, use the `--benchmark` flag:

```
# When you specify the --benchmark flag, outputs become optional.
$ ./bin/local_laplacian.rungen --benchmark input=zero:[1920,1080,3] levels=8 alpha=1 beta=1 
Benchmark for local_laplacian produces best case of 0.0494629 sec/iter, over 3 blocks of 10 iterations.
Best output throughput is 39.9802 mpix/sec.
```
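
The two numbers in this sample output are related by simple arithmetic. The sketch below reproduces them under an assumption inferred from the figures themselves (not confirmed against the RunGen source): a "mpix" is counted as width times height divided by 2^20, with the channel dimension not included.

```python
width, height = 1920, 1080
sec_per_iter = 0.0494629  # best case reported in the sample output

# Assumption: one "mpix" is 1024 * 1024 pixels of width x height.
mpix = (width * height) / (1024 * 1024)
throughput = mpix / sec_per_iter

print(f"{mpix:.5f} mpix, {throughput:.2f} mpix/sec")
```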

Note: this uses Halide's `halide_benchmark.h` to measure the execution time,
which runs several consecutive sample sets (default=3) of multiple iterations
(default=10) each, then chooses the best average time. You can use the
`--benchmark_samples` and `--benchmark_iterations` flags to override these defaults.
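
The measurement scheme described in this note can be sketched in a few lines of Python. This is an illustration of the idea (best average over several sample sets), not the code in `halide_benchmark.h`; the function name and the injectable `clock` parameter are assumptions made so the sketch is easy to test.

```python
import time


def best_average_time(fn, samples=3, iterations=10, clock=time.perf_counter):
    """Return the lowest per-iteration average over `samples` sample sets."""
    best = float("inf")
    for _ in range(samples):
        start = clock()
        for _ in range(iterations):
            fn()
        elapsed = clock() - start
        # Average within the sample set, then keep the best set.
        best = min(best, elapsed / iterations)
    return best


seconds = best_average_time(lambda: sum(range(100_000)))
print(f"best case: {seconds:.6f} sec/iter")
```

Taking the minimum of the averages, rather than the overall mean, reduces the influence of one-off disturbances such as a cold cache or a scheduler hiccup in a single sample set.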

Note: `halide_benchmark.h` is known to be inaccurate for GPU filters; see https://github.com/halide/Halide/issues/2278

## Measuring Memory Usage

To track memory usage, use the `--track_memory` flag, which measures the
high-water-mark of CPU memory usage.

```
# When you specify the --track_memory flag, outputs become optional.
$ ./bin/local_laplacian.rungen --track_memory input=zero:[1920,1080,3] levels=8 alpha=1 beta=1 
Maximum Halide memory: 82688420 bytes for output of 1.97754 mpix.
```

Warning: `--track_memory` may degrade performance; don't combine it with
`--benchmark` or expect meaningful timing measurements when using it.
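
A high-water-mark is simply the largest total that was live at any one moment, which can be lower than the sum of all allocations. The sketch below illustrates the bookkeeping; it is not RunGen's implementation, and the class and method names are hypothetical.

```python
class HighWaterMark:
    """Track current and peak bytes across malloc/free events."""

    def __init__(self):
        self.current = 0
        self.peak = 0
        self._live = {}  # handle -> size of each live allocation

    def on_malloc(self, handle, size):
        self._live[handle] = size
        self.current += size
        self.peak = max(self.peak, self.current)

    def on_free(self, handle):
        self.current -= self._live.pop(handle)


tracker = HighWaterMark()
tracker.on_malloc("a", 100)
tracker.on_malloc("b", 50)   # 150 bytes live: the peak so far
tracker.on_free("a")         # back down to 50 bytes
tracker.on_malloc("c", 20)   # 70 bytes live, the peak stays at 150
print(tracker.peak)
```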

## Using RunGen in Make

To add RunGen support to your Makefile, add rules like the following (see
`apps/support/Makefile.inc` for an example):

```
HALIDE_DISTRIB ?= /path/to/halide/distrib/folder

# Note: recipe lines must begin with a tab character, not spaces.
$(BIN)/RunGen.o: $(HALIDE_DISTRIB)/tools/RunGen.cpp
	@mkdir -p $(@D)
	@$(CXX) -c $< $(CXXFLAGS) $(LIBPNG_CXX_FLAGS) $(LIBJPEG_CXX_FLAGS) -I$(BIN) -o $@

.PRECIOUS: $(BIN)/%.rungen
$(BIN)/%.rungen: $(BIN)/%.a $(BIN)/RunGen.o $(HALIDE_DISTRIB)/tools/RunGenStubs.cpp
	$(CXX) $(CXXFLAGS) -DHL_RUNGEN_FILTER_HEADER=\"$*.h\" $^ -o $@ $(LIBPNG_LIBS) $(LIBJPEG_LIBS) $(LDFLAGS)

RUNARGS ?=

$(BIN)/%.run: $(BIN)/%.rungen
	@$(CURDIR)/$< $(RUNARGS)
```

## Known Issues & Caveats

-   If your Generator uses `define_extern()`, you must have all link-time
    dependencies declared properly via `FILTER_DEPS`; otherwise, you'll fail to
    link.
-   The code does its best to detect when inputs or outputs need to be
    chunky/interleaved (rather than planar), but in unusual cases it might guess
    wrong; if your Generator uses buffers with unusual stride setups, RunGen
    might fail at runtime. (If this happens, please file a bug!)
-   The code for deducing good output sizes is rudimentary and needs
    improvement; it will sometimes make bad decisions that prevent the
    filter from executing. (If this happens, please file a bug!)
