genalg-tools.Rd
\name{GenAlg-tools}
\alias{GenAlg-tools}
\alias{simpleMutate}
\alias{selectionFitness}
\alias{selectionMutate}
\title{Utility functions for selection and mutation in genetic algorithms}
\description{
These functions implement specific forms of mutation and fitness
that can be used in genetic algorithms for feature selection.
}
\usage{
simpleMutate(allele, context)
selectionMutate(allele, context)
selectionFitness(arow, context)
}
\arguments{
\item{allele}{
In the \code{simpleMutate} function, \code{allele} is a binary
vector filled with 0's and 1's. In the \code{selectionMutate}
function, \code{allele} is an integer (which is silently ignored;
see Details).
}
\item{arow}{
A vector of integer indices identifying the rows (features) to be
selected from the \code{context$dataset} matrix.
}
\item{context}{
A list or data frame containing auxiliary information that is needed
to resolve references from the mutation or fitness code. In both
\code{selectionMutate} and \code{selectionFitness}, \code{context}
must contain a \code{dataset} component that is either a matrix or a
data frame. In \code{selectionFitness}, the \code{context} must
also include a grouping factor (with two levels) called \code{gps}.
}
}
\details{
These functions represent 'callbacks'. They can be used in the
function \code{\link{GenAlg}}, which creates objects. They will then
be called repeatedly (for each individual in the population) each time
the genetic algorithm is updated to the next generation.
The \code{simpleMutate} function assumes that chromosomes are binary
vectors, so alleles simply take on the value 0 or 1. A mutation of an
allele, therefore, flips its state between those two possibilities.
The \code{selectionMutate} and \code{selectionFitness} functions, by
contrast, are specialized to perform feature selection assuming a
fixed number K of features, with a goal of learning how to
distinguish between two different groups of samples. We assume that
the underlying data consists of a data frame (or matrix), with the
rows representing features (such as genes) and the columns
representing samples. In addition, there must be a grouping vector
(or factor) that assigns all of the sample columns to one of two
possible groups. These data are collected into a list,
\code{context}, containing a \code{dataset} matrix and a \code{gps}
factor. An individual member of the population of potential
solutions is encoded as a length K vector of indices into the rows
of the \code{dataset}. An individual \code{allele}, therefore, is a
single index identifying a row of the \code{dataset}. When mutating
it, we assume that it can be changed into any other possible allele;
i.e., any other row number. To compute the fitness, we use the
Mahalanobis distance between the centers of the two groups defined by
the \code{gps} factor.
}
\value{
Both \code{selectionMutate} and \code{simpleMutate} return an integer
value; in the simpler case, the value is guaranteed to be a 0 or 1.
The \code{selectionFitness} function returns a real number.
}
\author{
Kevin R. Coombes \email{krc@silicovore.com},
P. Roebuck \email{proebuck@mdanderson.org}
}
\seealso{
\code{\link{GenAlg}},
\code{\link{GenAlg-class}},
\code{\link{maha}}.
}
\examples{
# generate some fake data
nFeatures <- 1000
nSamples <- 50
fakeData <- matrix(rnorm(nFeatures*nSamples), nrow=nFeatures, ncol=nSamples)
fakeGroups <- sample(c(0,1), nSamples, replace=TRUE)
myContext <- list(dataset=fakeData, gps=fakeGroups)
# initialize population
n.individuals <- 200
n.features <- 9
y <- matrix(0, n.individuals, n.features)
for (i in 1:n.individuals) {
y[i,] <- sample(1:nrow(fakeData), n.features)
}
# set up the genetic algorithm
my.ga <- GenAlg(y, selectionFitness, selectionMutate, myContext, 0.001, 0.75)
# advance one generation
my.ga <- newGeneration(my.ga)
}
\keyword{optimize}