This discussion document defines the gametic entropy of a population. It is a precise interpretation of a phenomenon occurring at the intersection of population genetics and information theory. This interpretation challenges minor statements in two journal articles. Gametic entropy is the application of Shannon entropy to the genetic information carried by gametes in the propagation of a population. This document serves as a reference to facilitate discussion. The practical utility of gametic entropy is not covered in this document. A working knowledge of Shannon entropy and genetics is needed to follow the entire document.
An early application of Shannon entropy
Eight years after Lewontin's article, BDH Latter wrote:
"The Shannon information measure used by Lewontin (1972) ...
is extremely difficult to interpret genetically."
This claim concerns how Shannon entropy can be interpreted in the context of genetics. Shannon entropy, however, was originally established in the context of communication systems, not genetics.
A mirror claim can be made about possible interpretations of genetic information in the context of communication systems. Indeed, CT Bergstrom & M Rosvall have claimed that genetic information lacks an obvious interpretation in the context of communication theory:
"Geneticists are not so fortunate. For them, the analogy to
communication theory is less obvious. Efforts to make this analogy
explicit seem forced at best, ..."
Below we specify an interpretation and refer to it as the gametic entropy of a population. We propose that it is both
an easy genetic interpretation of Shannon entropy, and
a natural example of communication of genetic information.
This interpretation challenges the statements in the two mentioned articles. It should be emphasized that the statements are peripheral and not core claims of their respective articles (which are excellent).
The gametic entropy of a population is the amount of information transmitted by a gamete in the propagation of a population.
In the specific application of the 1972 Lewontin article
The amount of information transmitted by a gamete,
Before getting into the biology, we review the core concepts upon which Shannon entropy is based. We follow by observing where these concepts appear in the propagation of a population. Only sexually reproducing populations are discussed initially. Clonal populations will be discussed later as a degenerate special case.
Shannon entropy is the key measurement of information theory
Roughly speaking, entropy is the best possible score in a game of
In population genetics, a key process of interest is the propagation of a population. In this process, what are the messages, senders and receivers?
The propagation of a population requires new members to be born. For sexually reproducing species, a new birth requires genetic information to be passed down from a mother and a father in the population. Each half of this genetic information is stored and transmitted by a gamete. It is at the unit of a gamete that we clearly see a complete message, sent by each of the two parents to new offspring, the receiver.
For the next possible birth in a population, there is a probability distribution over the possible genetic messages carried by each gamete. The Shannon entropy of this distribution of possible messages is the gametic entropy of the population. One can roughly think of gametic entropy as the number of yes/no questions offspring need to ask per parent to find out which alleles they are to inherit: "Hey parent, do I get an Rh+ or an Rh- allele from you?"
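As a minimal sketch of the definition above, the following Python function computes the Shannon entropy, in bits, of a distribution of possible gametic messages. The function name `shannon_entropy` and the Rh allele frequencies are illustrative assumptions, not values taken from any dataset or from the articles discussed.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with probability 0 are skipped; they contribute nothing.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical single-locus messages with two alleles, Rh+ and Rh-.
rh_equal = shannon_entropy([0.5, 0.5])   # both alleles equally likely
rh_skewed = shannon_entropy([0.9, 0.1])  # assumed unequal frequencies
rh_fixed = shannon_entropy([1.0])        # only one allele in the population
```

With equally frequent alleles the offspring needs exactly one yes/no question (1 bit). With the assumed 0.9/0.1 split the entropy drops below 1 bit, and with a fixed allele it is zero because there is nothing left to ask.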
When looking only at the entropy of autosomal genetic information, the distinction between maternal and paternal does not matter, but when considering the information in sex chromosomes, the gametic entropy can be specifically maternal or paternal. The unqualified gametic entropy is a per-gamete entropy and thus the average of the maternal and paternal gametic entropies, since every birth requires one of each kind of gametic message.
We consider a toy example of a sexually reproducing unicorn
population with biallelic chromosomes. Imagine a population of
unicorns with 10 autosomal chromosomes, each of which consists, along
its entire length, of one of only two equally frequent haplotypes. In
contrast, the X and Y chromosomes of these unicorns are completely
fixed, with only one X haplotype and only one Y haplotype in the
population. Each autosomal chromosome contributes 1 bit, so, assuming
the chromosomes are inherited independently of one another, the
gametic entropy across the 10 autosomal chromosomes is exactly 10
bits. The maternal gametic entropy for the sex chromosome is zero
because there is no uncertainty about which sex chromosome (or its
haplotype) is transmitted. The total maternal gametic entropy is thus
10 bits. In contrast, the paternal gametic entropy for the sex
chromosome is 1 bit, since there is an equal chance of an X or a Y
chromosome being communicated (and no additional uncertainty
regarding each sex chromosome's respective haplotype). Thus the total
paternal gametic entropy is 11 bits. The overall gametic entropy of
this unicorn population is the average of the maternal and paternal
values, (10 + 11) / 2 = 10.5 bits.
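The arithmetic of the unicorn example can be checked with a short Python sketch. It assumes that the 10 autosomal chromosomes are inherited independently, so that their entropies add, and it takes the overall value as the average of the maternal and paternal entropies, as described earlier. The variable names are illustrative only.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Autosomes: 10 chromosomes, each with two equally frequent haplotypes.
# Assumption: chromosomes are inherited independently, so entropies add.
autosomal_bits = 10 * shannon_entropy([0.5, 0.5])

# Sex chromosome: eggs always carry the single X haplotype, while
# sperm carry the X or the Y haplotype with equal probability.
maternal_sex_bits = shannon_entropy([1.0])
paternal_sex_bits = shannon_entropy([0.5, 0.5])

maternal_bits = autosomal_bits + maternal_sex_bits
paternal_bits = autosomal_bits + paternal_sex_bits

# Every birth needs one egg and one sperm, so the per-gamete entropy
# is the average of the maternal and paternal entropies.
overall_bits = (maternal_bits + paternal_bits) / 2

print(maternal_bits, paternal_bits, overall_bits)
```

Running the sketch gives 10 bits for the maternal gametic entropy, 11 bits for the paternal gametic entropy, and 10.5 bits overall.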
One of the key insights from information (communication) theory is that fundamental properties of information, such as entropy, exist regardless of the forms of storage or transmission.
At current levels of technology, DNA is the only medium of transmission of interest for the genetic information carried by gametes. Nonetheless, a thought experiment about hypothetical but almost plausible transmission media helps show that the "pure" information transmitted is independent of the medium of transmission.
Reflecting on DNA sequencing technology, maternal spindle
transfer in "three-parent" IVF
We now can propose an answer to the following question posed by CT Bergstrom & M Rosvall:
"Is there a clean mapping from informational processes in
biology onto the telegraph schema?"
The practical utility of such telegraphy, if any, is hard to imagine today. But as an entertaining science-fiction thought experiment, we can imagine the utility of interplanetary transmission of the genetic information in donor gametes for IVF. Rather than physically transporting gametes between planets, which could take months to years, the pure data can be transmitted in minutes or hours. All of the facts about transmission rates in telegraphy apply in a hypothetical interplanetary IVF system. For a given population of equally likely possible donors, the maternal (paternal) gametic entropy is the best possible rate, measured in bits, for transmitting the genetic information of an egg (sperm) donor.
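To make the rate claim concrete, here is a hypothetical encoding for the toy unicorn population described earlier. It is only a sketch: the functions `encode_sperm` and `decode_sperm`, the 0/1 labels for the two haplotypes of each autosome, and the bit layout are all assumptions made for illustration. Because every sperm message in that toy population is equally likely, a fixed-length code of 11 bits matches the paternal gametic entropy exactly.

```python
AUTOSOMES = 10  # number of autosomal chromosomes in the toy example

def encode_sperm(haplotypes, sex_chromosome):
    """Pack a sperm's genetic message into a string of 11 bits.

    haplotypes: 10 values, each 0 or 1, saying which of the two
        haplotypes is carried for each autosomal chromosome.
    sex_chromosome: "X" or "Y".
    """
    if len(haplotypes) != AUTOSOMES or any(h not in (0, 1) for h in haplotypes):
        raise ValueError("expected 10 haplotype choices, each 0 or 1")
    if sex_chromosome not in ("X", "Y"):
        raise ValueError("sex chromosome must be 'X' or 'Y'")
    autosomal_bits = "".join(str(h) for h in haplotypes)
    sex_bit = "1" if sex_chromosome == "Y" else "0"
    return autosomal_bits + sex_bit

def decode_sperm(message):
    """Recover the haplotype choices and sex chromosome from 11 bits."""
    haplotypes = [int(bit) for bit in message[:AUTOSOMES]]
    sex_chromosome = "Y" if message[AUTOSOMES] == "1" else "X"
    return haplotypes, sex_chromosome

# Round trip: the receiver recovers the message without any DNA.
original = ([0, 1, 1, 0, 0, 1, 0, 1, 1, 0], "Y")
message = encode_sperm(*original)
recovered = decode_sperm(message)
```

An egg in the same toy population would need only the first 10 bits, since its sex chromosome carries no uncertainty. In a population where messages are not equally likely, a fixed-length code like this one would generally use more bits than the entropy, and a variable-length code would be needed to approach it.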
In the case of asexually reproducing clonal populations, there is no distinction between the genetic information in a gamete vs a parent. The "gametic message" is simply the entire genome of a parent, which is sent in its entirety to its single-parent offspring. The gametic entropy of a clonal population is the same as the entropy of the distribution of distinct genomes in the population.
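For the clonal case, a minimal Python sketch is given below. It assumes that every member of the population is equally likely to be the parent of the next birth, so that the message distribution equals the frequency distribution of distinct genomes. The function name `clonal_gametic_entropy` and the genome labels are hypothetical stand-ins for whole genomes.

```python
import math
from collections import Counter

def clonal_gametic_entropy(genomes):
    """Entropy, in bits, of the distribution of distinct genomes.

    genomes: one hashable label per population member, where equal
        labels mean identical genomes. Assumes each member is equally
        likely to be the parent of the next birth.
    """
    counts = Counter(genomes)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical population of four clonal individuals, three genomes.
population = ["A", "A", "B", "C"]
entropy_bits = clonal_gametic_entropy(population)

# A population made of a single clone carries no uncertainty.
single_clone_bits = clonal_gametic_entropy(["A", "A", "A", "A"])
```

In this made-up population the genome frequencies are 1/2, 1/4 and 1/4, which gives 1.5 bits, while the single-clone population gives zero.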
This document challenges claims in two previous journal articles regarding two respective questions:
Is the gametic entropy of a population an extremely difficult genetic interpretation of Shannon entropy?
Is the gametic entropy of a population a forced analogy in communication theory for geneticists?
ECE thanks Steven Orzack and John Novembre for fruitful discussions about the Lewontin 1972 paper.