Raw File
\title{Utility functions for selection and mutation in genetic algorithms}
  These functions implement specific forms of mutation and fitness
  that can be used in genetic algorithms for feature selection.
simpleMutate(allele, context)
selectionMutate(allele, context)
selectionFitness(arow, context)
    In the \code{simpleMutate} function, \code{allele} is a binary
    vector filled with 0's and 1's.  In the \code{selectionMutate}
    function, \code{allele} is an integer (which is silently ignored;
    see Details). 
    A vector of integer indices identifying the rows (features) to be
    selected from the \code{context$dataset} matrix.
    A list or data frame containing auxiliary information that is needed
    to resolve references from the mutation or fitness code.  In both 
    \code{selectionMutate} and \code{selectionFitness}, \code{context}
    must contain a \code{dataset} component that is either a matrix or a
    data frame.  In \code{selectionFitness}, the \code{context} must
    also include a grouping factor (with two levels) called \code{gps}.
  These functions represent 'callbacks'. They can be used in the
  function \code{\link{GenAlg}}, which creates objects. They will then
  be called repeatedly (for each individual in the population) each time
  the genetic algorithm is updated to the next generation.

  The \code{simpleMutate} function assumes that chromosomes are binary
  vectors, so alleles simply take on the value 0 or 1. A mutation of an
  allele, therefore, flips its state between those two possibilities.

  The \code{selectionMutate} and \code{selectionFitness} functions, by
  contrast, are specialized to perform feature selection assuming a
  fixed number K of features, with a goal of learning how to
  distinguish between two different groups of samples. We assume that
  the underlying data consists of a data frame (or matrix), with the
  rows representing features (such as genes) and the columns
  representing samples. In addition, there must be a grouping vector
  (or factor) that assigns all of the sample columns to one of two
  possible groups. These data are collected into a list,
  \code{context}, containing a \code{dataset} matrix and a \code{gps}
  factor. An individual member of the population of potential
  solutions is encoded as a length K vector of indices into the rows
  of the \code{dataset}. An individual \code{allele}, therefore, is a
  single index identifying a row of the \code{dataset}. When mutating
  it, we assume that it can be changed into any other possible allele;
  i.e., any other row number. To compute the fitness, we use the
  Mahalanobis distance between the centers of the two groups defined by
  the \code{gps} factor.
  Both \code{selectionMutate} and \code{simpleMutate} return an integer
  value; in the simpler case, the value is guaranteed to be a 0 or 1.
  The \code{selectionFitness} function returns a real number.
  Kevin R. Coombes \email{krc@silicovore.com},
  P. Roebuck \email{proebuck@mdanderson.org}
# generate some fake data
nFeatures <- 1000
nSamples <- 50
fakeData <- matrix(rnorm(nFeatures*nSamples), nrow=nFeatures, ncol=nSamples)
fakeGroups <- sample(c(0,1), nSamples, replace=TRUE)
myContext <- list(dataset=fakeData, gps=fakeGroups)

# initialize population
n.individuals <- 200
n.features <- 9
y <- matrix(0, n.individuals, n.features)
for (i in 1:n.individuals) {
  y[i,] <- sample(1:nrow(fakeData), n.features)

# set up the genetic algorithm
my.ga <- GenAlg(y, selectionFitness, selectionMutate, myContext, 0.001, 0.75)

# advance one generation
my.ga <- newGeneration(my.ga)


back to top