% genalg.Rd
\name{GenAlg}
\alias{GenAlg}
\alias{newGeneration}
\alias{popDiversity}
\title{
A generic Genetic Algorithm for feature selection
}
\description{
These functions allow you to initialize (\code{GenAlg}) and iterate
(\code{newGeneration}) a genetic algorithm to perform feature
selection for binary class prediction in the context of gene
expression microarrays or other high-throughput technologies.
}
\usage{
GenAlg(data, fitfun, mutfun, context, pm=0.001, pc=0.5, gen=1)
newGeneration(ga)
popDiversity(ga)
}
\arguments{
\item{data}{
The initial population of potential solutions, in the form of a data
matrix with one individual per row.}
\item{fitfun}{
A function to compute the fitness of an individual solution. Must take
two input arguments: a vector of indices into rows of the population
matrix, and a \code{context} list within which any other items required
by the function can be resolved. Must return a real number; higher values
indicate better fitness, with the maximum fitness occurring at the optimal
solution to the underlying numerical problem.}
\item{mutfun}{
A function to mutate individual alleles in the population. Must take two
arguments: the starting allele and a \code{context} list as in the
fitness function.}
\item{context}{
A list of additional data required to perform mutation or to compute
fitness. This list is passed along as the second argument when
\code{fitfun} and \code{mutfun} are called.}
\item{pm}{
A real value between \code{0} and \code{1}, representing the probability
that an individual allele will be mutated.}
\item{pc}{
A real value between \code{0} and \code{1}, representing the probability
that crossover will occur during reproduction.}
\item{gen}{
An integer identifying the current generation.}
\item{ga}{
An object of class \code{GenAlg}.}
}
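\section{Sketch of the callback contract}{
The following is a minimal sketch, not the package's own implementation, of
functions that would satisfy the \code{fitfun} and \code{mutfun} contracts
described under Arguments. The names \code{toyFitness} and \code{toyMutate}
are hypothetical, as is the assumption that the context list carries a
\code{dataset} matrix (features in rows) and a 0/1 group vector \code{gps}.
\preformatted{
# Hypothetical fitness function: the mean absolute difference in group
# means over the selected features. Larger values mean better fitness.
# 'arow' is assumed to be one individual, i.e. a vector of row indices
# into context$dataset.
toyFitness <- function(arow, context) {
  sub <- context$dataset[arow, , drop = FALSE]
  is1 <- context$gps == 1
  m1 <- rowMeans(sub[, is1, drop = FALSE])
  m0 <- rowMeans(sub[, !is1, drop = FALSE])
  mean(abs(m1 - m0))
}

# Hypothetical mutation function: ignore the starting allele and draw a
# replacement feature index uniformly from the rows of the dataset.
toyMutate <- function(allele, context) {
  sample(nrow(context$dataset), 1)
}
}
Both functions receive the same \code{context} list as their second
argument, so anything they need beyond the individual or allele itself has
to be stored there.
}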
\value{
Both the \code{GenAlg} generator and the \code{newGeneration} function
return a \code{\link{GenAlg-class}} object. The \code{popDiversity} function
returns a real number representing the average diversity of the population.
Here, the diversity of a pair of individuals is the number of alleles
(selected features) that differ between them.
}
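\section{Sketch of the diversity measure}{
As a hedged illustration of the definition given under Value, the code
below averages, over all pairs of individuals, the number of alleles found
in the first individual but not in the second. This way of counting
differences is an assumption; \code{popDiversity} itself may count them
differently. The name \code{toyDiversity} is hypothetical.
\preformatted{
# Mean pairwise diversity of a population matrix with one individual
# per row, each row holding the indices of the selected features.
toyDiversity <- function(pop) {
  pairs <- combn(nrow(pop), 2)           # all unordered pairs of rows
  d <- apply(pairs, 2, function(ij) {
    # alleles present in the first individual but absent from the second
    length(setdiff(pop[ij[1], ], pop[ij[2], ]))
  })
  mean(d)
}

pop <- rbind(c(1, 2, 3), c(1, 2, 4), c(5, 6, 7))
toyDiversity(pop)
}
A value near zero would indicate that the individuals select nearly
identical feature sets; a value near the number of alleles per individual
would indicate almost no overlap.
}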
\author{
Kevin R. Coombes \email{krc@silicovore.com},
P. Roebuck \email{proebuck@mdanderson.org}
}
\seealso{
\code{\link{GenAlg-class}}
}
\examples{
# generate some fake data
nFeatures <- 1000
nSamples <- 50
fakeData <- matrix(rnorm(nFeatures*nSamples), nrow=nFeatures, ncol=nSamples)
fakeGroups <- sample(c(0,1), nSamples, replace=TRUE)
myContext <- list(dataset=fakeData, gps=fakeGroups)

# initialize population
n.individuals <- 200
n.features <- 9
y <- matrix(0, n.individuals, n.features)
for (i in seq_len(n.individuals)) {
  # each individual is a random set of distinct feature (row) indices
  y[i,] <- sample(1:nrow(fakeData), n.features)
}

# set up the genetic algorithm
my.ga <- GenAlg(y, selectionFitness, selectionMutate, myContext,
                pm=0.001, pc=0.75)