The gametic entropy of a population

E. Castedo Ellerman (ORCID 0000-0002-5014-4809), castedo@castedo.com

27 January 2022

© 2022 Ellerman et al. This document is distributed under a Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).

This discussion document defines the gametic entropy of a population. It is a precise interpretation of a phenomenon occurring at the intersection of population genetics and information theory. This interpretation challenges minor statements in two journal articles. Gametic entropy is the application of Shannon entropy to the genetic information carried by gametes in the propagation of a population. This document serves as a reference to facilitate discussion. The practical utility of gametic entropy is not covered here. Familiarity with Shannon entropy and genetics is required to follow the entire document.

Background

An early application of Shannon entropy 1 in population genetics is the highly cited article "The Apportionment of Human Diversity" by Lewontin in 1972 2. Shannon entropy is a measure of information, expressed in units of bits (base 2 logarithm of probability). The idea of genetic information being stored and quantified in bits appears in the context of evolutionary genetics as early as 1961 3.

Eight years after Lewontin's article, BDH Latter wrote:

"The Shannon information measure used by Lewontin (1972) ... is extremely difficult to interpret genetically." 4

This claim concerns how Shannon entropy can be interpreted in the context of genetics. Shannon entropy, however, was originally established in a different context, that of communication systems.

A mirror claim can be made about possible interpretations of genetic information in the context of communication systems. Indeed, CT Bergstrom & M Rosvall have claimed that genetic information lacks an obvious interpretation in the context of communication theory:

"Geneticists are not so fortunate. For them, the analogy to communication theory is less obvious. Efforts to make this analogy explicit seem forced at best, ..." 5

Below we specify an interpretation and refer to it as the gametic entropy of a population. With an understanding of information theory, we propose this interpretation is both

an easy genetic interpretation of Shannon entropy, and

a natural example of communication of genetic information.

This interpretation challenges the statements in the two mentioned articles. It should be emphasized that the statements are peripheral and not core claims of their respective articles (which are excellent).

The interpretation

The gametic entropy of a population is:

The amount of information transmitted by a gamete in the propagation of a population.

In the specific application of the 1972 Lewontin article 2, the interpretation is:

The amount of information transmitted by a gamete, at a random locus, in the propagation of a population.

Before getting into the biology, we review the core concepts upon which Shannon entropy is based. We follow by observing where these concepts appear in the propagation of a population. Only sexually reproducing populations are discussed initially. Clonal populations will be discussed later as a degenerate special case.

Messages, senders, and receivers

Shannon entropy is the key measurement of information theory 1. Entropy measures the degree of uncertainty about which message a sender communicates to a receiver. It also measures a real, tangible minimum: the average capacity required of any channel or storage used to communicate those messages. This minimum applies to all channels and storage, no matter the physical mechanism.

Roughly speaking, entropy is the best possible score in a game of 20 questions: the minimum number of yes/no questions, on average, needed to determine a not-yet-known message. One bit is the quantity of information gained (uncertainty reduced) by one yes/no question about two equally likely possibilities.
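For reference, the standard definition behind this description can be written out as follows. The symbols $X$, $p_i$, and $n$ are notation introduced here for illustration and do not appear elsewhere in this document.

```latex
H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i
```

Here $p_i$ is the probability of the $i$-th possible message. With two equally likely messages, $p_1 = p_2 = 1/2$ and $H(X) = 1$ bit. The "20 questions" description is approximate: the minimum average number of yes/no questions needed to identify a single message lies between $H(X)$ and $H(X) + 1$.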

In population genetics, a key process of interest is the propagation of a population. In this process, what are the messages, senders and receivers?

Gametic messages

The propagation of a population requires new members to be born. For sexually reproducing species, a new birth requires genetic information to be passed down from a mother and a father in the population. Each half of this genetic information is stored and transmitted by a gamete. It is at the unit of a gamete that we clearly see a complete message, sent by each of the two parents to new offspring, the receiver.

For a next possible birth in a population, there is a probabilistic distribution of possible genetic messages carried by each gamete. The Shannon entropy of the distribution of possible messages is the gametic entropy of the population. One can roughly think of gametic entropy as the number of yes/no questions offspring need to ask per parent to find out what alleles they are to inherit: "Hey parent, do I get an Rh+ or an Rh- allele from you?"
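As a minimal sketch of this calculation at a single locus, the following Python snippet computes the entropy of an allele distribution. The allele frequencies are hypothetical values chosen only for illustration, and the names `entropy_bits` and `rh_frequencies` are likewise assumptions of this sketch.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Outcomes with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical allele frequencies at one locus (illustrative values only).
rh_frequencies = {"Rh+": 0.6, "Rh-": 0.4}

single_locus_entropy = entropy_bits(rh_frequencies.values())
print(f"{single_locus_entropy:.3f} bits")  # prints "0.971 bits"
```

Because the two alleles are not equally frequent in this hypothetical case, the answer to the offspring's question carries slightly less than one bit.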

When only looking at the entropy of autosomal genetic information, the distinction between maternal and paternal does not matter, but when considering the information in sex chromosomes, the gametic entropy can be specifically maternal or paternal. The unqualified gametic entropy is a per gamete entropy and thus the average of both maternal and paternal gametic entropies, since every birth requires one of each kind of gametic message.

Toy illustration

We consider a toy example of a sexually reproducing unicorn population with biallelic chromosomes. Imagine a population of unicorns with 10 autosomal chromosomes, where the entire length of each chromosome consists of only one of two equally frequent haplotypes. In contrast, the X and Y chromosomes of these unicorns are completely fixed, with only one X haplotype and only one Y haplotype in the population. The gametic entropy across the 10 autosomal chromosomes is exactly 10 bits: one bit per chromosome, assuming the chromosomes are inherited independently of one another. The maternal gametic entropy for the sex chromosome is zero because there is no uncertainty about which sex chromosome (or its haplotype) is transmitted. The total maternal gametic entropy is thus 10 bits. In contrast, the paternal gametic entropy for the sex chromosome is 1 bit, since there is an equal chance of an X or a Y chromosome being communicated (and no additional uncertainty regarding each sex chromosome's respective haplotype). Thus the total paternal gametic entropy is 11 bits. The overall gametic entropy of this unicorn population is 10.5 bits, the average of the parent-specific entropies.
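The arithmetic of the unicorn example can be checked with a short Python sketch. It assumes that the haplotypes on different chromosomes are inherited independently, so that per-chromosome entropies simply add. All variable names are illustrative.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Each of the 10 autosomes carries one of two equally frequent haplotypes.
# Assumption: chromosomes are independent, so their entropies add.
autosomal = 10 * entropy_bits([0.5, 0.5])

# An egg always carries the single fixed X haplotype: no uncertainty.
maternal_sex = entropy_bits([1.0])

# A sperm carries X or Y with equal probability, each with one fixed haplotype.
paternal_sex = entropy_bits([0.5, 0.5])

maternal = autosomal + maternal_sex
paternal = autosomal + paternal_sex

# The unqualified gametic entropy is the per-gamete average of the two.
overall = (maternal + paternal) / 2

print(maternal, paternal, overall)  # prints "10.0 11.0 10.5"
```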

Decoupling medium of transmission

One of the key insights from information (communication) theory is that fundamental properties of information, such as entropy, exist regardless of the forms of storage or transmission.

At current levels of technology, DNA is the only medium of transmission of interest for the genetic information carried by gametes. Nonetheless, a thought experiment of hypothetical but almost plausible transmission mediums helps elucidate the independence of the "pure" information transmitted from the medium of transmission.

A thought experiment

Reflecting on DNA sequencing technology, maternal spindle transfer in "three-parent" IVF 6 7 and artificial synthesis of an entire (bacterial) genome 8, it is not hard to imagine a possibility of completely dematerialized transmission of the genetic information carried by gametes. This transmission can be over any of the channels described by information theory, including telegraph schemes.

We now can propose an answer to the following question posed by CT Bergstrom & M Rosvall:

"Is there a clean mapping from informational processes in biology onto the telegraph schema?" 9

The practical utility of such telegraphy, if any, is hard to imagine today. But as an entertaining thought experiment in science fiction, we can imagine the utility of interplanetary transmission of the genetic information in donor gametes for IVF. Rather than physically transporting gametes between planets, which could take months to years, the pure data can be transmitted in minutes or hours. All of the facts about transmission rates in telegraphy apply in a hypothetical interplanetary IVF system. For a given population of equally likely possible donors, the maternal (paternal) gametic entropy is the best possible rate, measured in bits, for transmitting the genetic information of an egg (sperm) donor.
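To make the idea of a "best possible rate" concrete, the sketch below builds a Huffman code, a standard optimal prefix code, for a small hypothetical distribution of gametic messages and compares its average codeword length with the entropy. The four-message distribution and all names are assumptions made for illustration; a real gametic message distribution would be vastly larger.

```python
import heapq
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def huffman_code_lengths(probabilities):
    """Return the Huffman codeword length, in bits, for each message."""
    # Heap entries: (subtree probability, unique tie-breaker, message indices).
    heap = [(p, i, [i]) for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    lengths = [0] * len(probabilities)
    tie_breaker = len(probabilities)
    while len(heap) > 1:
        p1, _, members1 = heapq.heappop(heap)
        p2, _, members2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every codeword inside them.
        for i in members1 + members2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tie_breaker, members1 + members2))
        tie_breaker += 1
    return lengths


# Hypothetical distribution over four distinct gametic messages.
message_probabilities = [0.5, 0.25, 0.125, 0.125]

lengths = huffman_code_lengths(message_probabilities)
average_length = sum(p * n for p, n in zip(message_probabilities, lengths))

print(entropy_bits(message_probabilities), average_length)  # prints "1.75 1.75"
```

In this hypothetical case every probability is a power of one half, so the code reaches the entropy exactly. In general, the average length of any uniquely decodable binary code is at least the entropy, which is the sense in which entropy is a lower bound on the transmission cost.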

Degenerate case of clonal populations

In the case of asexually reproducing clonal populations, there is no distinction between the genetic information in a gamete vs a parent. The "gametic message" is simply the entire genome of a parent, which is sent in its entirety to its single-parent offspring. The gametic entropy of a clonal population is the same as the entropy of the distribution of distinct genomes in the population.
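A minimal sketch of the clonal case follows. It assumes a hypothetical population in which each individual is labeled by its genome and is equally likely to be the parent of the next birth. The labels and counts are illustrative only.

```python
from collections import Counter
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical clonal population: one genome label per individual.
population = ["A", "A", "A", "A", "B", "B", "C", "C"]

# Assumption: every individual is equally likely to parent the next birth,
# so genome probabilities equal genome frequencies in the population.
counts = Counter(population)
total = len(population)
clonal_entropy = entropy_bits(count / total for count in counts.values())

print(clonal_entropy)  # prints "1.5"
```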

Concluding questions

This document challenges claims in two previous journal articles regarding two respective questions:

Is the gametic entropy of a population an extremely difficult genetic interpretation of Shannon entropy?

Is the gametic entropy of a population a forced analogy in communication theory for geneticists?

Acknowledgements

ECE thanks Steven Orzack and John Novembre for fruitful discussions about the Lewontin 1972 paper.

References

Shannon, Claude Elwood; Weaver, Warren. The mathematical theory of communication. Univ. of Illinois Press, Urbana, 1998. ISBN 978-0-252-72548-7, 978-0-252-72546-3.

Lewontin, R. C. The Apportionment of Human Diversity. In: Dobzhansky, Theodosius; Hecht, Max K.; Steere, William C. (eds.), Evolutionary Biology. Springer US, New York, NY, 1972, pp. 381-398. doi:10.1007/978-1-4684-9063-3_14. http://link.springer.com/10.1007/978-1-4684-9063-3_14

Bergstrom, Carl T.; Rosvall, Martin. The transmission sense of information. Biology & Philosophy 26(2), 159-176, 2011. doi:10.1007/s10539-009-9180-z. http://link.springer.com/10.1007/s10539-009-9180-z

Bergstrom, Carl T.; Rosvall, Martin. Response to commentaries on “The Transmission Sense of Information”. Biology & Philosophy 26(2), 195-200, 2011. doi:10.1007/s10539-011-9257-3. http://link.springer.com/10.1007/s10539-011-9257-3

Kimura, Motoo. Natural selection as the process of accumulating genetic information in adaptive evolution. Genetical Research 2(1), 127-140, 1961. doi:10.1017/S0016672300000616. https://www.cambridge.org/core/product/identifier/S0016672300000616/type/journal_article

Latter, B. D. H. Genetic Differences Within and Between Populations of the Major Human Subgroups. The American Naturalist 116(2), 220-237, 1980. doi:10.1086/283624. https://www.journals.uchicago.edu/doi/10.1086/283624

Amato, Paula; Tachibana, Masahito; Sparman, Michelle; Mitalipov, Shoukhrat. Three-parent in vitro fertilization: Gene replacement for the prevention of inherited mitochondrial diseases. Fertility and Sterility 101(1), 31-35, 2014. doi:10.1016/j.fertnstert.2013.11.030. https://linkinghub.elsevier.com/retrieve/pii/S0015028213032901

Zhang, John; Liu, Hui; Luo, Shiyu; Lu, Zhuo; Chávez-Badiola, Alejandro; Liu, Zitao; Yang, Mingxue; Merhi, Zaher; Silber, Sherman J.; Munné, Santiago; Konstantinidis, Michalis; Wells, Dagan; Tang, Jian J; Huang, Taosheng. Live birth derived from oocyte spindle transfer to prevent mitochondrial disease. Reproductive BioMedicine Online 34(4), 361-368, 2017. doi:10.1016/j.rbmo.2017.01.013. https://linkinghub.elsevier.com/retrieve/pii/S147264831730041X

Gibson, Daniel G.; Glass, John I.; Lartigue, Carole; Noskov, Vladimir N.; Chuang, Ray-Yuan; Algire, Mikkel A.; Benders, Gwynedd A.; Montague, Michael G.; Ma, Li; Moodie, Monzia M.; Merryman, Chuck; Vashee, Sanjay; Krishnakumar, Radha; Assad-Garcia, Nacyra; Andrews-Pfannkoch, Cynthia; Denisova, Evgeniya A.; Young, Lei; Qi, Zhi-Qing; Segall-Shapiro, Thomas H.; Calvey, Christopher H.; Parmar, Prashanth P.; Hutchison, Clyde A.; Smith, Hamilton O.; Venter, J. Craig. Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome. Science 329(5987), 52-56, 2010. doi:10.1126/science.1190719. https://www.science.org/doi/10.1126/science.1190719