https://github.com/stan-dev/stan
Revision b2f65cfa8279f88b6f20f443bb541f06377b3b67 authored by Bob Carpenter on 30 August 2012, 22:57:59 UTC, committed by Bob Carpenter on 30 August 2012, 22:57:59 UTC
1 parent af91f3d
Raw File
Tip revision: b2f65cfa8279f88b6f20f443bb541f06377b3b67 authored by Bob Carpenter on 30 August 2012, 22:57:59 UTC
updated web page to version 1.0.0
Tip revision: b2f65cf
TO-DO.txt
STAN V1.0
======================================================================


CODE CLEANUP
======================================================================

* use "sample of size 1000" rather than "1000 samples", etc.

* const declarations wherever possible

* more care with size_t, etc.

* API doc for doxygen

* privatize API parts not needed

* stan::math and stan::agrad configurable error handling
  - how to configure std::log(), etc. for error handling?
    need our own versions?

* remove unnecessary dependencies/includes

* convert for loops with size bounds to appropriate iterators

* reduce code duplication in
  - gm generator



RStan
======================================================================

* group dimensions of multivariate params somehow
  -- perhaps share scale?

* print for each parameter, with default and config in plot(stan.fit).
  -- log ESS,
  -- log R-hat, and
  -- stacked per-chain medians + intervals
     ++ combined chains
  -- overlaid density estimates 
     ++ combined chains
  -- stacked histograms (sparklines-like)
  ?? autocorrelation plot
  -- traceplot
  -- MCMC std error
  -- posterior sd

* control number of digits in output
  -- low default to make Andrew happy
  -- have it depend on se scale?

* order of plots should match declaration order

* include greyed out warmup in traceplot

* color highlight bad R-hat values (red / orange ?)
  -- need enough contrast

* do variable-at-a-time traceplot
  -- traceplot(fit) -- array of traceplots for all vals (truncated)
  -- traceplot(fit,"a") -- array of traceplots for a (truncated)
  -- traceplot(fit,"a[5,3]") -- traceplot for a[5,3]

* break out into own project with its own license (GPL)
  - still include on Stan page, just flag license, or
  - break out into own project

* use Rcpp to call C from R and vice-versa

* Example call:

   fit <- rstan(model_file,inits, ....);

   + compile model
   + read dump data OR from R
   + read dump inits OR from R
   + run model
   + load results to file OR R

* access data from samples object in C++

* Run pre-built analyses on fit:
  + attach.stan(fit)
  + print(fit)
  + traceplot(fit)
  + plot.stan(fit)
  + dump.stan(fit)
  + read.stan(file_path)

* Get data from fit 
  - structured
  - chains optionally merged w. permutation
  - individual chains



EXTENSIONS
======================================================================

Sampling
----------------------------------------------------------------------

* catch +inf, -inf values of unconstrained params and let
  users know they have an improper posterior

* allow cross-chain sharing of adaptation parameters
  to recover from a subset of chains going off the rails during
  warmup

* set mass matrix from the parameters

* discrete samplers (Matt)
  - Gibbs for small outcome sets
  - slice for larger outcome sets
    + only makes sense for ordinals/counts
  - what about multinomial with constraints
    + need at least two varying params
  - adaptive Metropolis style?

* more general built-in initializations
  -  normal or student-t w. params
  -  or even set uniform boundaries
  -  e.g., uniform(-1,1), normal(0,1), t(0,2.5)

* stop NUTS from running away without coming back
  - limit number of steps
    + command-line param
    + instrumenting NUTS loops (adaptation, leapfrog)
  - limit time
    + start recording time at beginning, check loops inside
  - bomb with report if exceed these steps

* print more info during adaptation
  - adapted step size


Modeling Language
----------------------------------------------------------------------

* make any C++ reserved word a reserved word so it doesn't
  mess up C++ compilation

* constants for max and min integer values

* convert all int values to long to match doc!

* add asserts to the language itself, or something to throw exceptions
  - even a general logger with debug levels?

* add printf statement for debugging purposes
  - really format or just list params?

* need to pass in randomizer to repeat generated quantities
  - where to do this? 
    + model ctor defeats thread safety
    + pass into generated quantities eval is safest

* explicit var types, useful for single expression for types
  e.g. vector(5)[2] x;
  vs.  vector(5) x[2];


* write model text into output in a comment in output
  - optional?

* make sure no aliasing issues for Eigen,
  - e.g.  A <- A * B;  A <- A';
  - shouldn't be problem with our matrix lib's returns

* make sure assignments only allowed to var in blocks
  (documented in manual?)

* triangular matrix type
  - how to implement in Eigen?
    + triangular view?
  - upper and lower
  - strict (zero diagonal) and non-strict

* diagonal matrix type (?)

* symmetric matrix type 
 - how to implement in Eigen?  
 - what ops to allow? 
   + sets are symmetric
   + but this'll get products wrong as
     everything but diagonal will get set twice

* Cholesky factor type

* point on hypersphere type 
  - like simplex, but 2-norm is 1
  - trickier for generating last point as only sign
    is undetermined

* special positive-definiteness-preserving operations
  - basic data type (or trait)
  - operations that preserve well-formedness, perhaps
    by fudging boundary conditions

* add a parameter initialization block
  - fill in values not given randomly or report error?
  - needs randomization

* syntactic sugar sampling notation
  - blocks where sampling not allowed (all but model)
  - generated quantities, parameter init block:
      beta[n] ~ normal(0,10);
    means 
      beta[n] <- random_normal(0,10);
  - right-hand side must all be defined here

* matrix power operations
  -  matrix power:    A^n
  - matrix elementwise power:  A.^x

* Allow normal_log<true> and normal_log<false> to be accessed
  directly

* Allow normal_log(y|0,1) as parser synonym for normal_log(y,0,1)

* have sampling categorical variables raise an error on parse
  if it doesn't already

* warnings
  - a <- foo(theta) if LHS contains variable: overwrite var
  - a ~ foo(theta)  if LHS is complex expression: need Jacobian

* integer subtypes
  - integer(-inf,inf), count(1,inf), natural(0,inf)
  - boolean(0,1), categorical(1,K), ordinal(1,K),

* double subtypes
  - probability(0,1), correlation(-1,1)
  - positive(0,inf), negative(-inf,0)

* matrix slice and assign
  - index m
  - range-indexes :, a:, :b, a:b
  - e.g., matrix(M,N) m; vector(M) v;  m[,1] = v;

* conditional expressions
  - binary operators on int or double
    + bool op(int,int);
    + bool op(double,double); // allows promotion
    + usual comparisons:  ==,  !=,  <=,  <,  >=,  >
  - operators on conditions
    (!cond),  (cond && cond),  (cond || cond),  (cond)

* conditionals
  - conditional expression
  - ternary op: cond ? x : y
  - if (cond) statement else if (cond) statement ... else statement;
  - while (cond) statement;
  - repeat statement until cond;
  - for (statement; condition; statement) statement;

* for-each
  - foreach (variable : expression) statement;
  
* multi-returns and array literals
  - (a,b,c) = foo();  // function returns array
  - (m,n,p) = (1,2,3);   // (1,2,3) is array literal

* multiple declarations
  - e.g., double a, b, c[N];

* declare and assign
  - e.g., double x = 5.0;
    double b[5];
    double a[5] <- b; // odd, eh?
    simplex(5)[2] c <- ... // even odder

* extensions
  - declare file paths to #include
  - declare namespace usings
  - declare function sigs for parser

* ragged arrays
  - ready to use with vec<...<vec<x>...> pattern
  - not clear how to declare (follow C?)
  -- example on p. 8 of 
     http://www.jstatsoft.org/v36/c01/paper/
     involving ragged param arays for graded response model

* subroutines
  - user-defined functions in StanGM
  - user-defined densities (how much to require?)

* compound op-assigns in StanGM
  - +=, *=, etc.

* check each parameter shows up on LHS of ~
  - true check is run-time based on data, etc.

* use normal(y|mu,sigma) as shorthand for 
  normal_log(y,mu,sigma).



Stanc Compiler
----------------------------------------------------------------------

* generate templated log_prob function so that you don't
  need to evaluate auto-dif when just evaluating function

* Stan safe mode to compile BUGS/JAGS-compliant models
  - allows more error checking and optimization

* More precompilation into object files.
  -- Pros:  faster compiles
  -- Cons:  it makes the code more difficult to manage
            and less flexible by requiring redundancy 
            and preinstation of templates
* More targeted imports.  
  -- Pros:  faster compiles because less code to read
  -- Cons:  may defeat precompiled headers


Command-line
----------------------------------------------------------------------

* need way to allow stanc to turn on STAN_NO_DEBUG

* write analyze command to read in output and do analysis a la print in RStan

* allow multiple chains to be run from single command
  -- allow this to optionally call analyze on results


* allow Stan to read data in multiple formats (particulary CSV)
  -- and allow multiple files to be specified for single run

* allow configuration of target reject rate in step-size adaptation

* control significant digits in output through command line

* restart from where command left off a la R2jags
  - this will require writing dump output or a converter
    for the CSV output to dump output
  - easiest to save unconstrained params
  - also need to save adaptation states, etc.
  - need to save seed and type of randomizer

* auto convergence monitoring -- just keep doubling (or other ratio)
  - doubling reduces total calc to at most double the end
  - easy to monitor averages and variances online using accumulators

* add chain ID to output as first column

* ? add sample ID within chain as output 
  - should be reconstructible

* track rejection rate 
  - coda just uses equality of samples, which is probably good enough
    for us, too

* multi-thread or at least multi-process
  - need to merge results back together


C++ API
----------------------------------------------------------------------

* dump format handle:
  - integers with scientific notation
  - NaN, Inf, -Inf  (written just like that)
  - length one arrays read in like scalars

* turn off stan::math::check_range() in math/matrix.hpp
  if STAN_NO_DEBUG or NDEBUG flag is present

* memcpy faster than copy ctor and operator= for var?

* remove dependence of prob functions like normal_log on agrad, 
  Eigen matrices, etc.
  -- does this even matter any more?

* better way of declaring polymorphic vectorized functions
  -- { Eigen vector, Eigen row vector, Eigen matrix, std vector}^N
  -- ideally just with normal_log(reals,reals,reals)
  -- push through manual, too

* unfold logistic and linear regression auto-difs

* allow user-defined extensions

* add scalar subtraction to match scalar addition of vectors
  (to distribute over values)

* make sure still possible to use header-only ad

* as much partial eval for gradients as possible
  - matrices
  - probability functions

* cut out last derivative step in leapfrog if not nec.
  - only saves if very small number of leapfrog steps

* threading
  - rewrite memory arena as thread-local object
  - merging and monitoring several threads/processes

* could multi-thread agrad by copying all code; use var<1>, var<2>,
  etc., with var typedefed to var<1>.  Each would then have its
  own memory pool to read from.  How to specify by client?  Could
  generate templated code

* figure out way to have vari elements that are NOT on the stack
  - useful for numeric_limits<var>
  - stack_vari ==> vari ==> chainable
    local_vari ==^
    (stack_vari has operator new; local_vari uses dtor)

* convert memory for vari to singleton & allow thread locality
  for thread safety

* make sure std::less<T*> and std::swap<T*> 
 does right thing for our types (like var and vari), as
 these are used in "algorithms".

* use our arena-based alloc as STL allocator for auto-dif
  - already doing this for dot products
  - http://www.cplusplus.com/reference/std/memory/allocator/
  - what do we need to do for dtor to work?

* Watanabe's WAIC: Widely applicable information criterion

* Ben's implicit function idea -- iterative solver,
 analytic derivative

* refactor finite diff derivative calcs to functor
      template <class F, typename S>
      bool finite_diffs(const vector<S>& x,
                        const F& f,
                        vector<S>& df_dx);

* Speed test pass-by-const-ref and pass-by-val for stan::agrad::var

* Convergence monitoring of higher moments



Special Functions, Probability Functions and Distributions
----------------------------------------------------------------------
* break out body of error handling to get isnan if/else inlined

* handle covariance or correlation matrix of an AR1 process. 
If rho is the auto-regressive parameter, the entries of the 
correlation matrix are powers of rho. And the precision matrix 
is tridiagonal and known analytically. So, 
it would be inefficient to plow that through 
multivariate_normal_log().

* beta distro with mean plus "precision" params
  - beta_mean_log(theta|phi,kappa)
    = beta_log(theta|phi*kappa, (1-phi)*kappa)
  - Jacobian from:
        alpha <- phi * k
        beta  <- (1 - phi) * k
    determinant = k  (det [[a,b],[c,d]] = ad - bc)

* pow special cases
  - inv(x) = pow(x,-1) = 1/x
  - inv_square(x) = pow(x,-2)
  - inv_sqrt(x) = pow(x,-1/2)

* signum function
  int signum(real); 

* signed sqrt
  signed_sqrt(x) = -sqrt(-x) if x < 0
                 = sqrt(x)   if x > 0

* log logits
  - log_logit(u) = log(logit(u)) 
                 = -log(1 + exp(-u))
                 = -log1p_exp(-u)
  - log_1mlogit(u) = log(1 - logit(u))
                   = log(logit(-u))
                   = -log1p_exp(u)

* lower triangular times self-transposed

* add general mixture model
  - modeling language side
    simplex(K) lambda;
    vector(K) mu;
    vector(K) sigma;
    y ~ mixture(lambda,     // mixing
                (mu,sigma), // param vecs
                "normal");  // distro
 - C++ impl
    std::vector<double> log_ps;
    for (k in 1:K)
      log_ps.push_back(log lambda[k] + normal_log(y,mu[k],sigma[k]));
    return log_sum_exp(log_ps);

* generalize vector ops to conforming matrices 
  -- I (Bob) don't like this as it confuses return types

* alternative distribution parameterizations
  - e.g., precision for (multi) normal
  - inverse_sqrt_gamma(...)

* random generation functions
  - rnorm(),...

* special functions
  - inv(x) = 1/x
  - inv_square(x) = 1/(x * x)
  - dot_self(vector v);  dot_self(row_vector v);  
  - log_sum_exp_ratio(vector<double> log_ratios, 
                      vector<double> log_probs);
  - rank(v,s): number of components of v less than v[s]
  - ranked(v,s): s-th smallest component of v
  - interp_ln(e,v1,v2) from BUGS (whatever for?)

* dot_self(x) == x' * x operation

* wrap_multiply(u,v) = u' * v * u

* for square matrices,
  d/dx det(x' * A * x) = 2 det(x' * A * x) inv(x)'
  d/dx det(A) = det(A) inv(A)'

* More matrix functions from Eigen:
  - singular vectors in addition to values
    + nice to have (U,S,V) = svd(A) syntax for this
  - norm()
  - squaredNorm()
  - lpNorm<1>
  - lpNorm<Eigen::Infinity>

* z-score function
  - normalize sample to zero sample mean, unit sample variance

* cumulative distribution functions
  - to add:
     + calculation of cdf
     + two functions: 1) with templated policy, 2) with no policy,
        calling templated function with stan::math::default_policy()
     + TODO: test derivatives?
     + dox: add doxygen.
     + add to Stan Modeling Language Reference Manual function list
  - uni/cont/inv_chi_square
  - uni/cont/inv_gamma
  - uni/cont/logisitic
  - uni/cont/pareto
  - uni/cont/scaled_inv_chi_square
  - uni/cont/student_t
  - uni/cont/uniform
  - uni/disc/bernoulli
  - uni/disc/beta_binomial
  - uni/disc/binomial
  - uni/disc/hypergeometric
  - uni/disc/neg_binomial
  - uni/disc/ordered_logisitic
  - uni/disc/poisson
  - multi/cont/dirichlet
  - multi/cont/inv_wishart
  - multi/cont/lkj_corr
  - multi/cont/lkj_cov
  - multi/cont/multi_normal
  - multi/cont/multi_student_t
  - multi/cont/wishart
  - multi/disc/categorical
  - multi/disc/multinomial


Error Handling
----------------------------------------------------------------------

* some kind of error handling for std lib functions
  like log(double).  


Testing
----------------------------------------------------------------------

* build unit tests from the BUGS models that just check
  the gradients are right (this'll also cover parsing and
  being able to fire up the models)

* auto testing of sampler
  - Cook/Gelman/Rubin-style interval tests
  - needs effective sample size calc to bound intervals
  - needs simulated parameters to test estimators
  - push ESS "measurement error" through the interval tests

* more matrix tests (for all ops w. gradients)

* benchmarks
  - agrad vs. RAD/Sacado, CppAD, ... ?
  - stan vs. MCMCpack, R, BUGS, JAGS, OpenBUGS, ...

* profile running models w. various operations

* complete math robustness and limits review
  - want domain boundaries to be right (usually
    -inf, 0 and inf, etc.)

* more diagnostics during warmup of where epsilon is going

* speed measurement
  - to converge
  - time per effective sample once converged
  - time to 4 chains at 25 ESS/chain, rHat < 1.1 for all
    parameters theta[k] and theta[k]*theta[k]
  - discarding first half of samples too many if fast to converge




Models
----------------------------------------------------------------------

* unit K-hyperball complement for stationary AR(K) parameters
  - what should the prior be?

* user model library
  - something we don't have to get too involved vetting
  - ideally things hard to do with other systems
  - or illustrate the unusual parts of our modeling language

* add base class for models
 

New Stuff
----------------------------------------------------------------------

* override operator*, etc. in Eigen for efficiency?
  - ideally push expressions for gradients through along with vals

* "non-parametric" models
  - approximate a la BUGS vol 3
  - merge reversible jump and HMC somehow?

* built-ins to facilitate cross-validation
  - especially on hierarchical models

* allow references in Stan language to make it easy to
  set eigenvalues + eigenvectors in one call

* tuple types for multiple returns (U,S,V) <- svd(A)
  -- different than array literals as may have different typed components

* introduce transforms into the Stan language that have the same
  effect as the declarations, e.g.
    parameters {  real x; }
    transformed_parameters { real(0,1) y; y <- lub(y,0,1,lp__); }

* use plain old optimization to get close to a mode

* forward mode auto-dif
  - last revision 548: 
  - src/stan/agrad/ad.hpp

* optimization
  - functor-based concept design with templates 
  - easily pluggable with log prob in model for MAP 

* compiler for R's linear model notation

* interface to Python

* interface to MATLAB

* Riemannian manifold sampling
 - Girolami's method with auto-diff Hessians

* add Gibbs sampling

* add adaptive slice sampling
  -- will it fit into adaptive_sampler in mcmc?
  -- goal to minimize number of function evals
     ++ step out: linear, with log step-in
     ++ doubling: exp, log step-in (but reject & more evals)

* adapt low-order approximation to an actual mass matrix

* add ensemble/swarm sampler (affine invariant)
  -- how to calculate ESS given correlations?
  -- look up proper detailed balance or convergence proof
  -- use slice or their distro

* multiple imputation
  - when to run it?
  - how to model it?

* mapped distributions
  - foo(x|theta) ==> foo(xs|theta)
  - C style: x*, int size
  - iterator concept: (x.start(),x.end())
  - array concept: (x[n], x.size())

* monitor/sampler interface
  - sampler, filter, aggregator, buffer
  - online means and variances
  - rHat, and split rHat
  - run in second process as we go?

* whole new auto-dif.  stack consists of entries per expression
  containing
  - adjoint for expression
  - for each argument
    + partial 
    + ptr to subexpression  
  - less space for constants, equal for unary, more for binary+
  - no virtual ops

* code gen auto-dif
  - just for Stan GM models

* lazy init for var
  - if vari* is accessed and empty, return new var(0) equiv.
  - can't afford test to see if it's defined

* elim stan/maths/matrix lib by adding implicit ctor
  for Matrix<var,...> based on Matrix<double,...>

* some notion of momentum for discrete variables a la HMC
  for continuous variables (at least accelerate metropolis-hastings)
  some densities generalize 

* blocking updates on parameters
  - only update subset of parameters
  - specify blocks in program?
  - allow Gibbs on some variables
  - split HMC into blocks (aka batches)
  - in the limit do conditional with HMC!

* blocking updates on data
  - subsample data and update as before
  - works best if data is large and redundant

* stepwise debugger for Stan
  - print adaptation params
  - print gradients
  - just a mode on command to print more out per sample?
  - ideally link runtime errors back to lines of Stan code

* allow random generation in model
  - are derivatives just same as basic, so that
      y[n] <- random_normal(mu,sigma);
    gets same grad as:
      y[n] ~ normal(mu,sigma);
    ?






MODELS
======================================================================

* Online book with models based on BUGS:
  http://stat-athens.aueb.gr/~jbn/winbugs_book/

* deal with autocorrelation model params where require
  all roots of 1 - SUM_i rho[i]**i to be within unit
  complex circle

* show how to code undirected graphical models in Stan


BUGS Models
----------------------------------------------------------------------
All of the vol1 -- vol3 models are implemented except
for the following, categorized by issue.

* Sampler too slow
  - vol2/schools/schools.stan.0
  - vol3/fire/fire.stan.0

* Sampler hangs
  - vol3/funshapes/hsquare.stan.0
  - vol3/funshapes/ring.stan.0
  - vol3/funshapes/squaremc.stan.0

* Zero Poisson, 0 ~ Poisson(0 * parameters) hangs the following:
  Isn't this one fixed?
  - vol1/leuk/leuk.stan.0
  - vol1/leukfr/leukfr.stan.0

* Cyclic DAGs
  Can probably do these in Stan (not supported in JAGS).
  - vol2/ice/pineapple_ice.stan.0
  - vol2/ice/vanilla_ice.stan.0
  - vol3/jama/jama.stan.0

* Discrete parameters
  - vol2/asia/asia.stan.0
  - vol2/biopsies/biopsies.stan.0
  - vol2/stagnant/stagnant.stan.0
  - vol2/t_df/estdof2.stan.0

* Binary parameters
  Other versions of model working integrating out discretes.
  - vol2/cervix/cervix.stan.0 
  - vol2/hearts/hearts.stan.0

* Works, but data structure difficult to deal with
  - vol1/bones/bones.stan.0



MODELING LANGUAGE MANUAL
======================================================================

* include section for BUGS user migration
  -- Andrew's e-mail contents
  -- integer arithmetic gotcha

* drop Lucida console font section (?) or link Bigelow and Holmes

* step-by-step install
  - mac, linux, windows
  - stan, c++, make

* break out sampler chapter in doc to describe multiple samplers

* performance tips
  - compute once and re-use if possible
  - pull out constants from loops
  - pull out terms from looped probabilities

* add example calculating log likelihood for DIC
  - how to specify likelihood terms in model?

* add example on individual log likelihoods for WAIC
  - how to specify individual likelihood terms in model?

   generated quantities {
    for (i in 1:I) 
      for (j in 1:J) 
        ll_y[i,j] <- normal_log(y[i,j]...)

* C++ model class

* Index

* Description of reverse-mode auto-dif or too C++-like?



C++ API MANUAL
======================================================================

* Enough doc to use agrad w. matrices and probability functions
  - basic description of reverse-mode AD

* Goes along with cleaning up API doc and privatizing


Speed Comparisons (vs. JAGS)
======================================================================

* Run JAGS compiled with g++ -O3

* Look at models where Stan performs well
  e.g.  logistic or linear regression w. Cauchy and Normal prior
  - varying model shapes (step up exponentially)
    + number of data items
    + number of parameters
    + correlation of parameters (multi-variate Cauchy or Normal)
    + interaction of params (? need this with correlation ?)
    + conjugate vs. non-conjugate priors
      (well and misspecified -- more stats than speed issue)

* Single threaded

* Run 4 chains (each of same length) of each system (Stan and JAGS)
  - diffuse initializations (hand generate for each)
  - until convergence by split R-hat (all params under 1.05)
  - total (across all chains) ESS is > 100 (and < 150 or something?)
    (e.g., 25/25/25/25, or 15/35/20/30)
  - report time in effective samples/second for min ESS parameter

* Report multiple runs of each (10? 100?)
  - histograms/densities or mean/sd of multiple runs
  - how much will it vary by seed

* Throw away first half of each run?
  - to start, until split convergence/sample speeds

* Ideally, measure:
  - time to converge to equilibrium
  - ESS rate at equilibrium
back to top