F-Statistics

Authors

Anders Gonçalves da Silva, Department of Ecology, Evolution and Environmental Biology, Columbia University, New York, NY, USA.

Introduction

In evolution, we are interested in the statistical outcome, over several generations, of the action of evolutionary forces on allelic and genotypic frequencies (Wright 1931). The four evolutionary forces (mutation, migration, genetic drift, and natural selection) act on genetic variation by changing the frequencies of alleles within populations and the distribution of alleles among populations. For example, genetic drift randomly removes alleles from populations; it therefore changes the frequency of a neutral allele at random and promotes differentiation in allele frequencies among populations. Migration among populations, on the other hand, also changes allele frequencies within populations, but it promotes genetic uniformity in allele frequencies among populations. The impact of each force on a population depends on the species' demography, natural history, and the level of connectivity among its populations.
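The effect of drift on differentiation can be illustrated with a small simulation. The sketch below assumes the idealized Wright-Fisher model (isolated diploid populations of constant size, no mutation, migration, or selection); the function name and parameter values are hypothetical and chosen only for illustration. Every population starts at the same allele frequency, and the variance in allele frequency among populations is tracked over time.

```python
import random

def simulate_drift(n_pops, pop_size, p0, generations, seed=1):
    """Wright-Fisher drift in isolated diploid populations of constant size.

    No mutation, migration, or selection is modelled. Returns the variance in
    allele frequency among populations after each generation.
    """
    rng = random.Random(seed)
    n_genes = 2 * pop_size          # gene copies per diploid population
    freqs = [p0] * n_pops           # every population starts at the same frequency
    variances = []
    for _ in range(generations):
        # Each new generation is a binomial sample of 2N gene copies
        # drawn at the previous generation's allele frequency.
        freqs = [sum(rng.random() < p for _ in range(n_genes)) / n_genes
                 for p in freqs]
        mean = sum(freqs) / n_pops
        variances.append(sum((p - mean) ** 2 for p in freqs) / n_pops)
    return variances

variances = simulate_drift(n_pops=200, pop_size=25, p0=0.5, generations=50)
print(f"generation 1: {variances[0]:.4f}, generation 50: {variances[-1]:.4f}")
```

Under this model the expected among-population variance after t generations is p0(1 − p0)[1 − (1 − 1/(2N))^t], about 0.005 after one generation and about 0.16 after fifty for these settings, so the simulated variance should grow steadily as populations drift apart.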

In essence, evolution acts by redistributing genetic variation among different population partitions, a pattern called population genetic structure. For example, for a diploid species the simplest hierarchical organization, from the least to the most inclusive level, is (1) alleles within individuals, (2) alleles among individuals within a population, and (3) alleles among populations. Although this is the simplest, the hierarchy can be extended with more levels to better reflect a species' biology and geographic distribution. For instance, a familial level could be added for a species that lives in social groups, and a regional level could be added for a species with a large geographic distribution.

Each force acts differently, producing different expectations for the distribution of genetic variation at each hierarchical level. Therefore, by analyzing how genetic variation is distributed, one can infer which evolutionary forces are acting on the observed set of populations. The partitioning of genetic variation lends itself to an analysis of variance approach, in which we ask how much of the total genetic variation within a random sample of populations is found at each hierarchical level (Weir 1996).
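To make the analysis of variance concrete, here is a minimal sketch assuming the usual 0/1 indicator coding: each gene copy is scored 1 if it carries a focal allele and 0 otherwise, and the total sum of squares is split across the three levels of the simplest diploid hierarchy. The data layout, the function name `nested_anova`, and the example sample are hypothetical.

```python
def nested_anova(sample):
    """Partition the total sum of squares of 0/1 allele indicators.

    sample: list of populations; each population is a list of diploid
    individuals; each individual is a pair of indicators (1 = a copy of the
    focal allele, 0 = any other allele).
    Returns {level: (sum_of_squares, degrees_of_freedom)} for genes within
    individuals ("G"), individuals within populations ("I"), and
    populations ("P").
    """
    all_genes = [x for pop in sample for ind in pop for x in ind]
    grand_mean = sum(all_genes) / len(all_genes)
    ss_g = ss_i = ss_p = 0.0
    for pop in sample:
        pop_mean = sum(x for ind in pop for x in ind) / (2 * len(pop))
        # Among populations: each population mean counts once per gene copy.
        ss_p += 2 * len(pop) * (pop_mean - grand_mean) ** 2
        for ind in pop:
            ind_mean = sum(ind) / 2
            # Among individuals within a population: weight 2 gene copies each.
            ss_i += 2 * (ind_mean - pop_mean) ** 2
            # Within individuals: nonzero only for heterozygotes.
            ss_g += sum((x - ind_mean) ** 2 for x in ind)
    n_ind = sum(len(pop) for pop in sample)
    r = len(sample)
    return {"G": (ss_g, n_ind), "I": (ss_i, n_ind - r), "P": (ss_p, r - 1)}

# Two hypothetical populations of four individuals each.
sample = [
    [(1, 1), (1, 0), (0, 0), (1, 1)],
    [(0, 0), (0, 1), (0, 0), (1, 0)],
]
table = nested_anova(sample)
mean_squares = {level: ss / df for level, (ss, df) in table.items()}
```

The three sums of squares add up to the total sum of squares of the indicators around their grand mean, and dividing each by its degrees of freedom gives the mean squares for alleles, individuals, and populations (here 0.1875, 0.3125, and 0.5625).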

Here, we implement the calculations of f, θ, and F following Weir (1996, Chapter 5, pp. 176–179). These indices are ratios of the variance components for alleles within individuals, individuals within populations, and populations. Biologically, f estimates the amount of inbreeding within populations, θ measures the amount of differentiation among populations, and F measures overall inbreeding in the set of observed populations.

We assume that the species is diploid and that the sample is a random sample of populations. We also assume that all populations were, at some point in time, a single reference population, which implies that all sampled individuals share a common ancestor. This allows us to treat means and variances as referring to repeated samples from within and across populations (Weir 1996).

Estimating each index requires estimating the variance components, each of which is a function of the mean squares MSG, MSI, and MSP for alleles, individuals, and populations, respectively:

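$$\sigma^2_G = \mathrm{MSG}$$
$$\sigma^2_I = \frac{1}{2}\left(\mathrm{MSI} - \mathrm{MSG}\right)$$
$$\sigma^2_P = \frac{1}{2n_c}\left(\mathrm{MSP} - \mathrm{MSI}\right)$$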

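where $n_c$ is a weighted average of the number of individuals sampled per population. For $r$ populations with $n_i$ individuals in population $i$, $n_c = \frac{1}{r-1}\left(\sum_i n_i - \frac{\sum_i n_i^2}{\sum_i n_i}\right)$, which equals the common sample size when all populations are the same size. Each index is then estimated using the following equations: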
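The step from mean squares to variance components is short enough to sketch directly. The function names below are hypothetical, and the expression for n_c is the one commonly given for unbalanced samples (Weir 1996); the mean squares in the usage lines are made-up values for a single allele.

```python
def n_c(sizes):
    """Weighted average number of individuals per population:
    n_c = (N - sum(n_i**2) / N) / (r - 1), where N = sum(n_i), r = len(sizes)."""
    total = sum(sizes)
    return (total - sum(n * n for n in sizes) / total) / (len(sizes) - 1)

def variance_components(msg, msi, msp, sizes):
    """Return (sigma2_P, sigma2_I, sigma2_G) from the three mean squares."""
    sigma2_g = msg                              # within individuals
    sigma2_i = (msi - msg) / 2                  # among individuals within populations
    sigma2_p = (msp - msi) / (2 * n_c(sizes))   # among populations
    return sigma2_p, sigma2_i, sigma2_g

# Hypothetical mean squares for one allele; two populations of 4 individuals.
print(n_c([4, 4]))                                          # → 4.0
print(variance_components(0.1875, 0.3125, 0.5625, [4, 4]))  # → (0.03125, 0.0625, 0.1875)
```

With unequal sizes such as [10, 20, 30], n_c is 55/3 ≈ 18.3, slightly below the arithmetic mean of 20. Because these are moment estimates, a component can come out negative when the true variation at that level is small.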

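$$F = \frac{\sigma^2_P + \sigma^2_I}{\sigma^2_P + \sigma^2_I + \sigma^2_G}$$
$$\theta = \frac{\sigma^2_P}{\sigma^2_P + \sigma^2_I + \sigma^2_G}$$
$$f = \frac{F - \theta}{1 - \theta}$$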
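Substituting the first two equations into the third shows that f = σ²I / (σ²I + σ²G): f is the share of within-population variation that lies among individuals rather than within them. The hypothetical function below computes the three indices for one allele and checks that identity on made-up component values.

```python
def f_statistics(sigma2_p, sigma2_i, sigma2_g):
    """Return (F, theta, f) for one allele from its variance components."""
    total = sigma2_p + sigma2_i + sigma2_g
    F = (sigma2_p + sigma2_i) / total
    theta = sigma2_p / total
    f = (F - theta) / (1 - theta)
    return F, theta, f

F, theta, f = f_statistics(0.03125, 0.0625, 0.1875)   # hypothetical components
# f also equals sigma2_I / (sigma2_I + sigma2_G), the within-population ratio.
assert abs(f - 0.0625 / (0.0625 + 0.1875)) < 1e-12
# F ≈ 1/3, theta ≈ 1/9, f ≈ 0.25
```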

The calculations so far have been presented for one allele at one locus. In most cases, however, data from multiple loci, with multiple alleles per locus, will be available. Estimates over alleles and loci are obtained by summing the numerators and the denominators separately before taking their ratio (Weir 1996). Therefore, over alleles u = 1, 2, …, U at locus l (and dropping the l and u subscripts on the σ² terms for simplicity):

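$$F_l = \frac{\sum_u \left(\sigma^2_P + \sigma^2_I\right)}{\sum_u \left(\sigma^2_P + \sigma^2_I + \sigma^2_G\right)}$$
$$\theta_l = \frac{\sum_u \sigma^2_P}{\sum_u \left(\sigma^2_P + \sigma^2_I + \sigma^2_G\right)}$$
$$f_l = \frac{F_l - \theta_l}{1 - \theta_l}$$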
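The per-locus combination is a ratio of sums, not an average of per-allele ratios. A minimal sketch, assuming the variance components for each allele have already been estimated; the function name and the three-allele component values are hypothetical.

```python
def locus_f_statistics(allele_components):
    """Combine alleles u = 1..U at one locus by summing numerators and
    denominators separately before dividing.

    allele_components: one (sigma2_P, sigma2_I, sigma2_G) tuple per allele.
    Returns (F_l, theta_l, f_l).
    """
    num_F = sum(p + i for p, i, _ in allele_components)
    num_theta = sum(p for p, _, _ in allele_components)
    denom = sum(p + i + g for p, i, g in allele_components)
    F_l = num_F / denom
    theta_l = num_theta / denom
    return F_l, theta_l, (F_l - theta_l) / (1 - theta_l)

# Hypothetical three-allele locus.
components = [(0.02, 0.01, 0.20), (0.01, 0.00, 0.15), (0.00, 0.01, 0.10)]
F_l, theta_l, f_l = locus_f_statistics(components)   # F_l ≈ 0.10, theta_l ≈ 0.06
```

For a locus with only two alleles the two tuples are identical (one allele's indicator is one minus the other's), so the combined values equal the single-allele values.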

And, over all loci:

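$$F = \frac{\sum_l \sum_u \left(\sigma^2_P + \sigma^2_I\right)}{\sum_l \sum_u \left(\sigma^2_P + \sigma^2_I + \sigma^2_G\right)}$$
$$\theta = \frac{\sum_l \sum_u \sigma^2_P}{\sum_l \sum_u \left(\sigma^2_P + \sigma^2_I + \sigma^2_G\right)}$$
$$f = \frac{F - \theta}{1 - \theta}$$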
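Summing over loci before dividing means that loci with more total variance carry more weight in the multilocus estimate. The hypothetical sketch below accumulates numerators and denominators across all alleles at all loci; the two made-up loci show that the result differs from a simple mean of the per-locus θ values.

```python
def multilocus_f_statistics(loci):
    """loci: list of loci, each a list of (sigma2_P, sigma2_I, sigma2_G)
    tuples, one per allele. Sums numerators and denominators over all
    alleles at all loci before dividing. Returns (F, theta, f)."""
    num_F = num_theta = denom = 0.0
    for alleles in loci:
        for p, i, g in alleles:
            num_F += p + i
            num_theta += p
            denom += p + i + g
    F = num_F / denom
    theta = num_theta / denom
    return F, theta, (F - theta) / (1 - theta)

# Two hypothetical biallelic loci with very different total variance.
loci = [
    [(0.1, 0.0, 0.4), (0.1, 0.0, 0.4)],   # per-locus theta = 0.2
    [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)],   # per-locus theta = 0.0
]
F, theta, f = multilocus_f_statistics(loci)
# theta = 0.2 / 3.0 ≈ 0.067, not the simple mean (0.2 + 0.0) / 2 = 0.1
```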

References

Weir, B. S. (1996) Genetic Data Analysis II: Methods for Discrete Population Genetic Data. Sinauer Associates, Sunderland, MA.

Wright, S. (1931) Evolution in Mendelian populations. Genetics 16: 97–159.
