% $Id: sic2004.Rd,v 1.11 2006-02-10 19:03:27 edzer Exp $
\name{sic2004}
\alias{sic2004}
\alias{sic.train}
\alias{sic.pred}
\alias{sic.grid}
\alias{sic.test}
\alias{sic.val}
\title{ Spatial Interpolation Comparison 2004 data set: Natural Ambient Radioactivity }
\description{
The text below is adapted from the Data subsection of
\url{http://www.ai-geostats.org/events/sic2004/index.htm}.

The variable used in the SIC 2004 exercise is natural ambient
radioactivity measured in Germany. The data, provided kindly by the
German Federal Office for Radiation Protection (BfS), are gamma dose rates
reported by means of the national automatic monitoring network (IMIS).

For SIC2004, a rectangular area was used to select 1008 monitoring
stations (out of a total of around 2000). For these 1008 stations, 11
days of measurements were randomly selected from the last 12 months, and
the average daily dose rate was calculated for each day. This yielded 11
data sets.

Prior information (sic.train): 10 data sets of 200 points each were
prepared. All 10 share the same monitoring station locations, which were
selected at random (see Figure 1 in the examples below). The data sets
differ only in their measured (Z) values, since each set corresponds to
one day of measurements made during the last 14 months. No information
is provided on the dates of measurement. These 10 data sets (10 days of
measurements) can be used as prior information to tune the parameters
of the mapping algorithms. No other information is provided about these
sets, although participants are of course free to gather more
information about the variable from the literature and other sources.

The 200 monitoring stations above were randomly taken from a larger set
of 1008 stations. The locations of the remaining 808 monitoring stations
are given in sic.pred. Participants in SIC2004 had to estimate the
values of the variable at these 808 locations.

The SIC2004 data (sic.val, variable dayx):
the exercise consists of using 200 measurements made on an 11th day (the
actual data of the exercise) to estimate the values observed at the
remaining 808 locations (hence the question marks used as symbols in the
map shown in Figure 3). These measurements were provided for only two
weeks (15 September to 1 October 2004) on a web page restricted to the
participants. The true values observed at these 808 locations were
released only at the end of the exercise, to allow participants to write
their manuscripts (sic.test, variables dayx and joker).

In addition, a joker data set was released (sic.val, variable joker),
which contains an anomaly. The anomaly was generated by a simulation
model and does not represent measured levels.

}

\format{
  Each data frame contains a subset of the following columns:
  \describe{
   \item{record}{unique integer number of the monitoring station, as
   assigned by the SIC2004 organizers}
   \item{x}{x-coordinate of the monitoring station, in meters}
   \item{y}{y-coordinate of the monitoring station, in meters}
   \item{day01}{mean gamma dose rate measured over 24 hours on day 01, in
   nanosieverts per hour (nSv/h)}
   \item{day02}{same, for day 02}
   \item{day03}{...}
   \item{day04}{...}
   \item{day05}{...}
   \item{day06}{...}
   \item{day07}{...}
   \item{day08}{...}
   \item{day09}{...}
   \item{day10}{...}
   \item{dayx}{mean gamma dose rate observed on the 11th day (the data of the exercise)}
   \item{joker}{ the joker data set, containing an anomaly not present
   in the training data}
  }
}
\note{
The data set sic.grid provides a set of points on a regular grid (almost
10000 points) covering the area, which is convenient as a set of target
locations for interpolation; see the function \code{makegrid} in package sp.

The coordinates have been projected around a point located in the
south-west of Germany. Hence a few coordinates have negative values, as
can be seen in the figures produced by the examples below.
}

\usage{
data(sic2004)
}
\author{ 
Data: the German Federal Office for Radiation Protection (BfS),
\url{http://www.bfs.de/}, data provided by Gregoire Dubois, R compilation
by Edzer J. Pebesma.  }

\references{ 
\url{http://www.ai-geostats.org/},
\url{http://www.ai-geostats.org/resources/sic2004_data.htm},
\url{http://www.ai-geostats.org/events/sic2004/index.htm}
}

\keyword{datasets}
\examples{
data(sic2004) 
# FIGURE 1. Locations of the 200 monitoring stations for the 11 data sets. 
# The values taken by the variable are known.
plot(y~x,sic.train,pch=1,col="red", asp=1)
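
# Additional illustration (not part of the original SIC2004 material):
# a quick look at the 10 training days, assuming sic.train holds the
# columns day01 ... day10 listed under Format. The per-day mean and
# standard deviation (nSv/h) are the kind of prior information that can
# be used to tune the parameters of an interpolation method.
train.days <- paste0("day", formatC(1:10, width = 2, flag = "0"))
train.stats <- sapply(sic.train[train.days],
    function(z) c(mean = mean(z, na.rm = TRUE), sd = sd(z, na.rm = TRUE)))
round(train.stats, 1)  # one column per training day, rows "mean" and "sd"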

# FIGURE 2. Locations of the 808 remaining monitoring stations at which 
# the values of the variable must be estimated.
plot(y~x,sic.pred,pch="?", asp=1, cex=.8)
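
# Additional illustration (not part of the original SIC2004 material):
# sic.grid, assumed here to hold columns x and y, covers the study area
# and can serve as the set of target locations for a map. Because the
# coordinates are projected around a point in south-west Germany, a few
# station coordinates are negative; the count below checks this.
plot(y~x, sic.grid, pch=".", col="grey", asp=1)
points(y~x, sic.pred, pch="?", cex=.8)
stations <- rbind(sic.train[c("x", "y")], sic.pred[c("x", "y")])
colSums(stations < 0)  # number of negative x and y coordinates among the 1008 stations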

# FIGURE 3. Locations of the 1008 monitoring stations (exhaustive data sets). 
# Red circles mark the stations whose values are used to estimate the values at the question marks
plot(y~x,sic.train,pch=1,col="red", asp=1)
points(y~x, sic.pred, pch="?", cex=.8)
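
# Additional illustration (not part of the original SIC2004 material):
# a minimal baseline for the exercise itself, using inverse distance
# weighting (one simple choice, not a method prescribed by SIC2004).
# It assumes gstat's idw() with a formula for the coordinates, and that
# sic.val and sic.test both hold columns x, y and dayx. The 200 dayx
# values in sic.val are interpolated to the 808 locations in sic.test,
# and the error is computed against the true values released afterwards.
library(gstat)
idw.pred <- idw(dayx~1, ~x+y, sic.val, sic.test, idp = 2)
rmse <- sqrt(mean((idw.pred$var1.pred - sic.test$dayx)^2))
rmse  # root mean squared prediction error, in nSv/h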

}