Mankind Quarterly 21 (1980) 135-149
Inbreeding and genetic distance between hierarchically structured populations
measured by surname frequencies
Volkmar Weiss
Surnames can be considered as alleles of one locus, and surname distribution and evolution can be analyzed by the theory of neutral mutations in finite populations (Yasuda et al. 1974). The analysis of human evolution is complicated by the fact that in developed countries such populations have been structured geographically and socially for many thousands of years. Bunimovič (1975) demonstrated, if only theoretically, that in a hierarchical population natural selection is more efficient than in a non-subdivided one and the rate of the evolutionary process is higher. He defines: ‘A population is hierarchical, if it can be divided into a certain amount of subpopulations in such a way that the set may naturally break into classes (levels).’
One may
describe the genetic structure of a human population in terms of the inbreeding
within its subpopulations and the extent of the sharing of genes among them.
Thus the general pattern of the population structure may be represented by the
distribution of coefficients of genetic distance (of coefficients of kinship or
inbreeding) within each subpopulation, and thus between all possible pairs of
subpopulations, tracing their respective changes in historical time. Crow and
Mange (1965) explained how, under certain assumptions, inbreeding coefficients
can be estimated from the frequency of marriages between persons of the same
surname (isonymy).
By suggesting and applying additional methods, the author extended the application of ‘surname genetics’ to the measurement of genetic distance and historical changes in a series of papers (Weiss 1973, 1974, 1976, 1977). Because all these papers were written (and reprinted) in German, the publication of an up-to-date English summary of these findings seems justified.
Methods
Crow and
Mange (1965) showed that the rate of isonymous marriage is indicative of the
same amount of inbreeding in the population regardless of the degree of
consanguinity of the marriage. Sibs always have the same name, whereas first
cousins have the same name only if they are related through their fathers, that is, ¼ of
the time. The same principle holds for most of the common marriage relationships. Thus a
fraction I of isonymous marriages is indicative of a population inbreeding
coefficient
$$f = \frac{I}{4} \tag{1}$$
If we assume that all isonymy is a true reflection of common ancestry, we can separate random from nonrandom contributions to the inbreeding coefficient f. If we neglect nonrandom contributions (very small in actual populations) the formula simplifies to
$$f = \frac{1}{4} \sum_{j=1}^{n} p_j^2 \tag{2}$$
where $p_j$ is the proportion of the population with the jth surname and n the number of surnames.
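As a minimal sketch of formula (2), assuming the surnames are available as a plain list with one entry per person (e.g. copied from a directory), the estimate can be computed as follows. The function name `inbreeding_from_surnames` and the ten-person sample are illustrative assumptions and not data from the source.

```python
from collections import Counter

def inbreeding_from_surnames(surnames):
    """Formula (2): sum of squared surname proportions, divided by 4."""
    counts = Counter(surnames)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty surname list")
    return sum((c / total) ** 2 for c in counts.values()) / 4

# Hypothetical sample of 10 persons with 3 surnames (illustration only).
sample = ["Seidel"] * 5 + ["Röder"] * 3 + ["Schneider"] * 2
f = inbreeding_from_surnames(sample)
F = 100 * f  # the tables in the source report F = 100f
print(f"f = {f:.4f}, F = {F:.2f}")
```

Counting with `Counter` keeps the sketch independent of how the list is ordered, so the same function could be fed a directory, a census extract, or one generation of a pedigree.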
Instead
of using marriage data, this simplified formula enables us to sample from
directories or census data. Consequently in pedigrees f can be estimated not
only by counting the actual number of isonymous marriages but better by simply
calculating the probability of isonymy within a given generation of ancestors
(Weiss 1973). A number of workers have used measures to compare different populations a and b using gene frequency data. But since surname frequencies are usually estimates made from the observed frequencies of the male population, it seems appropriate to compare genotype frequencies rather than gene frequencies of populations. To do this, Hedrick (1971) proposed a measure $H_{ab}$, which he called the probability of genotypic identity,
$$H_{ab} = \frac{\sum_{j=1}^{n} p_{ja}\, p_{jb}}{\tfrac{1}{2}\left(\sum_{j=1}^{n} p_{ja}^2 + \sum_{j=1}^{n} p_{jb}^2\right)} \tag{3}$$
where $p_{ja}$ is the frequency of the jth surname in population a, $p_{jb}$ its frequency in population b, and n is the number of surnames.
Nei (1972) proposed
$$N_{ab} = \frac{\sum_{j=1}^{m} p_{ja}\, p_{jb}}{\sqrt{\left(\sum_{j=1}^{m} p_{ja}^2\right)\left(\sum_{j=1}^{m} p_{jb}^2\right)}} \tag{4}$$
where $p_{ja}$ is the frequency of the jth allele in population a and m is the number of alleles.
Nei (1972) defines the genetic distance for his model as $-\log_e N_{ab}$. For $H_{ab}$ the distance is the complement of the similarity value, $1 - H_{ab}$.
As Hedrick (1975) demonstrated, the differences between the two formulas are very small and the numerical results correlate at about 0.97. Both formulas, (3) and (4), can therefore be recommended; compare also Michod and Hamilton (1980).
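To make the comparison of formulas (3) and (4) concrete, here is a minimal sketch that computes both identity measures and their associated distances. It assumes each population is given as a mapping from surname to relative frequency; the function names and the two frequency tables are hypothetical and serve only as an illustration.

```python
import math

def _cross_and_squares(pa, pb):
    """Shared terms: sum of p_ja * p_jb, and the two sums of squares."""
    cross = sum(p * pb.get(name, 0.0) for name, p in pa.items())
    sq_a = sum(p * p for p in pa.values())
    sq_b = sum(p * p for p in pb.values())
    return cross, sq_a, sq_b

def hedrick_identity(pa, pb):
    """Formula (3): arithmetic mean of the sums of squares in the denominator."""
    cross, sq_a, sq_b = _cross_and_squares(pa, pb)
    return cross / (0.5 * (sq_a + sq_b))

def nei_identity(pa, pb):
    """Formula (4): geometric mean of the sums of squares in the denominator."""
    cross, sq_a, sq_b = _cross_and_squares(pa, pb)
    return cross / math.sqrt(sq_a * sq_b)

# Hypothetical surname frequencies for two populations (illustration only).
pop_a = {"Seidel": 0.5, "Schädlich": 0.3, "Schneider": 0.2}
pop_b = {"Schneider": 0.6, "Wunderlich": 0.3, "Seidel": 0.1}

H = hedrick_identity(pop_a, pop_b)
N = nei_identity(pop_a, pop_b)
print(f"H = {H:.3f}, distance 1 - H = {1 - H:.3f}")
print(f"N = {N:.3f}, distance -ln N = {-math.log(N):.3f}")
```

Because the geometric mean never exceeds the arithmetic mean, N is always at least as large as H, and the two coincide when both populations have the same sum of squared frequencies.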
Because $\sum_{j=1}^{n} p_{ja}^2 = 4 f_a$, $\sum_{j=1}^{n} p_{jb}^2 = 4 f_b$, and $F = 100f$, Hedrick’s formula (3) actually reduces to
$$H_{ab} = \frac{100 \sum_{j=1}^{n} p_{ja}\, p_{jb}}{2\,(F_a + F_b)} \tag{5}$$
Two populations with the same surnames in the same frequencies always reach a value $H_{ab}$ of genetic identity of 1, two populations with completely different surnames a value of 0. Hedrick’s coefficient of genetic identity determines the extent to which two populations share genes by reason of common descent.
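The reduction from formula (3) to formula (5) can be checked numerically. The sketch below assumes surname frequencies are given as dictionaries; the helper names and the sample frequencies are hypothetical, and F is derived from formula (2) as F = 100f.

```python
def hedrick_formula_3(pa, pb):
    """Formula (3), written directly in terms of surname frequencies."""
    cross = sum(p * pb.get(name, 0.0) for name, p in pa.items())
    sq_a = sum(p * p for p in pa.values())
    sq_b = sum(p * p for p in pb.values())
    return cross / (0.5 * (sq_a + sq_b))

def hedrick_formula_5(pa, pb):
    """Formula (5), written in terms of the inbreeding coefficients F_a and F_b."""
    cross = sum(p * pb.get(name, 0.0) for name, p in pa.items())
    F_a = 100 * sum(p * p for p in pa.values()) / 4  # F = 100f, f from formula (2)
    F_b = 100 * sum(p * p for p in pb.values()) / 4
    return 100 * cross / (2 * (F_a + F_b))

# Hypothetical frequencies (illustration only).
pop_a = {"Seidel": 0.5, "Schädlich": 0.3, "Schneider": 0.2}
pop_b = {"Schneider": 0.6, "Wunderlich": 0.3, "Seidel": 0.1}

h3 = hedrick_formula_3(pop_a, pop_b)
h5 = hedrick_formula_5(pop_a, pop_b)
print(f"formula (3): {h3:.4f}, formula (5): {h5:.4f}")
```

The practical point of formula (5) is that once F has been tabulated for each population, only the cross-product sum has to be computed for every pair.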
Samples and results
From parish registers (published or unpublished material; Weiss 1974) of nine German villages the number of isonymous marriages was counted (see Table 1). As expected, the inbreeding coefficient F tends to increase from generation to generation as a result of fluctuations in family size (Holgate 1971) and the extinction of surnames. But the increase in migration and the larger marital distances during the last hundred years led to a decrease of F in some villages. At present the two opposing effects cannot be separated quantitatively.
Table 1 (from Weiss 1974) Inbreeding coefficient F = 100f in nine German parishes

| Parish | 1548-99 | 1600-49 | 1650-99 | 1700-49 | 1750-99 | 1800-49 | 1850-99 | 1900-45 |
|---|---|---|---|---|---|---|---|---|
| Öschelbronn/Württemberg | - | 0.27 | 0.29 | 0.40 | 0.45 | 1.17 | 0.73 | 0.47 |
| Wethen/Waldeck | - | - | 0.28 | 0.64 | 0.47 | 0.35 | 0.45 | 0.21 |
| Ringsheim/Baden | - | - | 0.00 | 0.08 | 0.37 | 0.29 | 0.58 | 0.73 |
| Schweigern/Baden | - | - | - | 0.34 | 0.34 | 0.67 | 0.22 | 0.20 |
| Lauf/Baden | - | - | - | 0.14 | 0.22 | 0.57 | 0.62 | 0.70 |
| Storbeck/Brandenburg | - | - | - | 0.00 | 0.54 | 0.93 | 0.39 | 0.66 |
| Grafenhausen/Baden | - | - | - | 0.00 | 0.11 | 0.46 | 0.53 | 0.27 |
| Rabenstein/Sachsen | 0.24 | 0.40 | 0.23 | 0.41 | - | - | - | - |
| Oberhermersdorf/Sachsen (now Adelsberg) | - | 0.28 | 1.22 | 0.74 | 0.64 | 0.71 | 0.41 | - |
| Weighted mean F | - | 0.33 | 0.27 | 0.27 | 0.35 | 0.61 | 0.49 | 0.47 |
| Total number of marriages | 315 | 531 | 728 | 2596 | 2812 | 3730 | 4646 | 3313 |
| Of these isonymous | 3 | 7 | 8 | 29 | 39 | 91 | 91 | 62 |

- = no data
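The last two rows of Table 1 allow a quick check of formula (1). The sketch below pools the marriage counts per period and converts the share of isonymous marriages into F = 100f. That the weighted-mean row was obtained by this kind of pooling is an assumption; the pooled values agree with that row to within about 0.01.

```python
# Marriage counts per period, copied from the last two rows of Table 1.
periods = ["1548-99", "1600-49", "1650-99", "1700-49",
           "1750-99", "1800-49", "1850-99", "1900-45"]
marriages = [315, 531, 728, 2596, 2812, 3730, 4646, 3313]
isonymous = [3, 7, 8, 29, 39, 91, 91, 62]

def F_from_isonymy(n_isonymous, n_marriages):
    """F = 100f with f = I / 4 (formula 1); I is the share of isonymous marriages."""
    return 100 * (n_isonymous / n_marriages) / 4

pooled_F = {
    period: F_from_isonymy(i, m)
    for period, i, m in zip(periods, isonymous, marriages)
}

for period, F in pooled_F.items():
    print(f"{period}: F = {F:.2f}")
```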
In order to calculate Hedrick’s coefficient (5) of genetic distance, seven villages and two towns were selected. They all lie on a straight line extending from the southwest to the northeast of the Vogtland in Saxony, Germany.
As the
calculation of F and H, including all surnames, is very cumbersome, analysis
was restricted to the most frequent surnames, amounting to one third of the
respective population. The contribution of the rarer surnames to F and H is
very small, indeed. Because of the polyphyletic origin of some common surnames,
the limitation to the most frequent surnames seems justified and a welcome
correction of F, for which an estimate from the total surname frequencies
provides an upper bound.
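The restriction to the most frequent surnames can be sketched as a simple cumulative selection. The code below assumes a plain list of surnames and picks names in order of frequency until they cover the requested share of the population. The function name, the one-third default, and the sample are illustrative assumptions; the source does not state how ties or the exact cut-off were handled.

```python
from collections import Counter

def top_surnames_covering(surnames, share=1 / 3):
    """Return the most frequent surnames that together cover at least `share`."""
    counts = Counter(surnames)
    total = sum(counts.values())
    selected, covered = [], 0
    for name, count in counts.most_common():
        if covered >= share * total:
            break
        selected.append(name)
        covered += count
    return selected, covered / total

# Hypothetical directory of 10 persons (illustration only).
sample = ["Röder", "Röder", "Seidel", "Seidel", "Schneider",
          "Wunderlich", "Schädlich", "Meyer", "Wolf", "Lenk"]
names, coverage = top_surnames_covering(sample)
print(names, f"cover {coverage:.0%} of the sample")
```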
Lasker’s (1977) ‘coefficient of relationship’ can give no clear picture, especially for populations of very different size. E.g., for the places in Table 2 its value for the relationship between Oelsnitz and Tirpersdorf is 0.00013 (against H = 0.31) and between Oelsnitz and Wiedersberg 0.0010 (against H = 0.19). The magnitude of Lasker’s coefficient thus has no direct meaning, whereas Hedrick’s coefficient gives the proportion of genes shared by descent.
Table 2 (from Weiss 1974) Genetic distance H between nine places of the Vogtland (Saxony) about 1920

| Air distance (km) | Town or village | Number of inhabitants | Ro | Wer | Rü | Auer | Reu | Tir | Oels | Bob | Wie |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | Rothenkirchen | 1910 | 0.34 | .54 | .26 | .31 | .29 | .06 | .12 | .05 | .00 |
| 3.0 | Wernesgrün | 1290 | | 0.20 | .33 | .40 | .18 | .02 | .14 | .09 | .05 |
| 6.0 | Rützengrün | 700 | | | 0.50 | .29 | .24 | .11 | .10 | .00 | .00 |
| 8.2 | Auerbach | 18000 | | | | 0.06 | .34 | .19 | .59 | .20 | .05 |
| 10.8 | Reumtengrün | 1410 | | | | | 0.35 | .20 | .15 | .05 | .00 |
| 21.2 | Tirpersdorf | 1020 | | | | | | 0.24 | .31 | .08 | .09 |
| 27.4 | Oelsnitz/V. | 33500 | | | | | | | 0.07 | .41 | .19 |
| 36.0 | Bobenneukirchen | 790 | | | | | | | | 0.27 | .14 |
| 40.6 | Wiedersberg | 170 | | | | | | | | | 0.67 |

In the diagonal: F = 100f
From Table 2 we read the following results:
1. The inbreeding coefficient tends to decrease as the number of inhabitants of a place increases.
But there are exceptions to the rule: e.g. Silberstrasse/Kreis Zwickau has 525 inhabitants, but an inbreeding coefficient of only F = 0.17. By contrast, Gottesberg with 500 inhabitants has F = 1.51 and Brunn with 1060 inhabitants F = 1.01. Even Ellefeld with 6500 inhabitants has an F of 0.19. In Gottesberg half of the population shares only three surnames. Silberstrasse, however, is a settlement of industrial workers with a mobile population. In the other villages only a few families have multiplied, on the basis of local trade and industrial development, without immigration from outside. Such local population growth seems to be the prerequisite of extremely high inbreeding. As a consequence, true human isolates can have lower inbreeding coefficients. In the Catholic isolate in Upper Lusatia (Geserick and Weiss 1971) F is only 0.23.
2. Genetic identity H decreases with geographical distance.
From the point of view of marital distances this result seems trivial. But for the first time the Hedrick coefficient provides a quantitative geographical picture of genetic identity by descent. It covers the roughly 400 to 600 years since surnames became fixed in Central Europe; between villages in the investigated area the coefficient tends to zero at a distance of about 40 km.
The environs of each central place have their own peculiarities. Around Auerbach the surnames Seidel and Schädlich are common; around Oelsnitz, Schneider and Wunderlich are the most common. Some surnames that are very frequent in one village may be completely unknown beyond the nearest neighborhood. For example, one third of all inhabitants of Gottesberg bore the surname Röder in 1912, whereas in Jägersgrün, only 3 km away, the name is unknown. In Brunn, a village with 1060 inhabitants adjacent to the town of Auerbach, three surnames account for one third of the population.
3. Central places (towns) are genetically more similar to each other than either is to its neighboring villages. Human populations are hierarchically structured in space and townspeople prefer to marry townspeople.
In order to delimit the area of the respective central places, a cluster analysis was performed. To the first cluster belong Wernesgrün, Rothenkirchen, Auerbach, Rützengrün, and Reumtengrün, to the second Bobenneukirchen, Oelsnitz, and Tirpersdorf, and to a third Wiedersberg alone. This result was theoretically expected. Historically Wiedersberg belongs to the environs of Hof, a town at a distance of 8 km, whereas the distance to Oelsnitz is 13 km.
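Results 2 and 3 can be explored directly from the off-diagonal values of Table 2. The following sketch flattens the upper triangle into pairwise records, averages H between villages within distance classes, and compares the town-town value with the town-village values. The distance classes are arbitrary assumptions chosen for illustration, the abbreviations follow the table header, and this is not the cluster analysis used in the source.

```python
# Positions along the line (km) and off-diagonal H values, copied from Table 2.
places = ["Ro", "Wer", "Rü", "Auer", "Reu", "Tir", "Oels", "Bob", "Wie"]
km = [0.0, 3.0, 6.0, 8.2, 10.8, 21.2, 27.4, 36.0, 40.6]
upper = {
    "Ro":   [.54, .26, .31, .29, .06, .12, .05, .00],
    "Wer":  [.33, .40, .18, .02, .14, .09, .05],
    "Rü":   [.29, .24, .11, .10, .00, .00],
    "Auer": [.34, .19, .59, .20, .05],
    "Reu":  [.20, .15, .05, .00],
    "Tir":  [.31, .08, .09],
    "Oels": [.41, .19],
    "Bob":  [.14],
}
towns = {"Auer", "Oels"}  # the two central places

# Flatten to (place_a, place_b, distance_km, H) records.
pairs = []
for i, a in enumerate(places[:-1]):
    for offset, h in enumerate(upper[a], start=1):
        j = i + offset
        pairs.append((a, places[j], round(km[j] - km[i], 1), h))

def mean_H(records):
    return sum(r[3] for r in records) / len(records)

# Result 2: mean H between villages only, in assumed distance classes.
village_pairs = [r for r in pairs if r[0] not in towns and r[1] not in towns]
for lo, hi in [(0, 10), (10, 20), (20, 30), (30, 45)]:
    in_class = [r for r in village_pairs if lo <= r[2] < hi]
    if in_class:
        print(f"{lo}-{hi} km: n = {len(in_class)}, mean H = {mean_H(in_class):.2f}")

# Result 3: town-town identity versus town-village identity.
town_town = [r for r in pairs if r[0] in towns and r[1] in towns]
town_village = [r for r in pairs if (r[0] in towns) != (r[1] in towns)]
print(f"town-town H = {mean_H(town_town):.2f}")
print(f"largest town-village H = {max(r[3] for r in town_village):.2f}")
```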
Human mainstream populations are structured not only geographically, but also socially. To measure the social distance, we sampled only the inhabitants whose surname begins with the initial G. (Thus the inbreeding coefficient of this sample is not equal to the F of the whole population.) On the basis of data given in directories, we distinguished between owners and professionals on the one side and workers on the other. Thus the analysis was restricted to the upper and lower strata; the middle stratum has not been taken into account.
Table 3 (from Weiss 1974) Genetic distance H between social strata in the towns of Auerbach and Oelsnitz/V. and their rural surroundings in 1920

| Town or surrounding | Social stratum | Auerbach-town upper | Auerbach-town lower | Auerbach-surrounding upper | Auerbach-surrounding lower | Oelsnitz-town upper | Oelsnitz-town lower | Oelsnitz-surrounding upper | Oelsnitz-surrounding lower |
|---|---|---|---|---|---|---|---|---|---|
| Auerbach-town | upper | 1.04 | .66 | .40 | .58 | .16 | .25 | .08 | .06 |
| Auerbach-town | lower | | 0.90 | .43 | .39 | .22 | .45 | .11 | .13 |
| Auerbach-surrounding | upper | | | 2.68 | .49 | .16 | .18 | .06 | .00 |
| Auerbach-surrounding | lower | | | | 3.40 | .13 | .10 | .09 | .06 |
| Oelsnitz-town | upper | | | | | 0.83 | .50 | .47 | .27 |
| Oelsnitz-town | lower | | | | | | 1.06 | .54 | .48 |
| Oelsnitz-surrounding | upper | | | | | | | 2.78 | .50 |
| Oelsnitz-surrounding | lower | | | | | | | | 2.60 |

In the diagonal: F = 100f (inbreeding coefficient), comprising all surnames starting with the initial G
From
Table 3 we read:
1. The lower strata of the two towns are genetically more similar than the upper.
The
upper stratum has larger marital distances. If we had analyzed more towns at a
larger distance, the upper strata would become more similar than the lower.
2. The inbreeding coefficients of the lower and upper strata are very similar.
No
doubt, the upper stratum has the larger marital distances, but because of its
small number its inbreeding coefficient is as high as that of the lower
stratum.
Table 4 (from Weiss 1974) Genetic distance H between and within the populations of two villages (Wernesgrün and Rothenkirchen) from the 16th century to 1912

| Village | Time | Number of inhabitants | We 16th | We 1728 | We 1912 | Ro 16th | Ro 1728 | Ro 1912 |
|---|---|---|---|---|---|---|---|---|
| Wernesgrün | 16th | 120 | 2.96 | .64 | .20 | .18 | .16 | .32 |
| Wernesgrün | 1728 | 370 | | 1.31 | .34 | .20 | .17 | .42 |
| Wernesgrün | 1912 | 1290 | | | 0.32 | .14 | .21 | .45 |
| Rothenkirchen | 16th | 150 | | | | 2.59 | .17 | .13 |
| Rothenkirchen | 1728 | 440 | | | | | 0.82 | .37 |
| Rothenkirchen | 1912 | 1910 | | | | | | 0.45 |

In the diagonal: F = 100f (inbreeding coefficient)
Table 4 shows the historical development of genetic distance between and within two villages. Over about 350 years only one fifth (Wernesgrün) and one eighth (Rothenkirchen), respectively, of the genes remain identical by descent. (The sources for the population of 1728 and of the 16th century were tax lists. F in the 16th century may be too high, because hired hands were not taxed and their surnames are therefore unknown.)
A
combination of population structure analysis in space, time, and in the social
dimension and their respective changes is possible. But quantifying historical
sources is very time-consuming. For this purpose the exploitation of already
written or printed genealogies seems more appropriate.
From an available list of ancestors (Weiss 1976) we selected 100 probands, born between 1650 and 1880 in 53 villages and 3 small towns of Saxony (Vogtland and Erzgebirge), most of them (n = 57) in the 18th century. This sample is not representative, but should be a good cross-section of the population. For each proband we used the following data: surname, place and year of birth, birth places of the two parents, and surnames of the 8 great-grandparents.
If we
use generations with 16, 32 or more ancestors, the amount of information
increases, but also the time to compute probabilities of isonymy and genetic
distance. However, this approach seems especially appropriate, if we compare
genetic identity of famous persons or groups of such persons or ancestors of
certain professions or trades.
For the 56 birth places geographical distance was determined as the air distance from church to church. In calculating $H_{ab}$ between all possible pairs of probands from the surnames of their 8 ancestors, we have to drop all pairs in which one proband is a direct ancestor or descendant of the other.
The mean inbreeding coefficient of the 100 probands was f = 0.010. Where the birth places of the parents are identical (n = 48), f = 0.015; where they differ (n = 52; mean air distance 5.7 km), f = 0.006. As expected, peasant probands (n = 34) had a higher level of inbreeding (f = 0.012) than non-peasants (f = 0.009).
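The subgroup values quoted above can be checked against the overall mean with a simple weighted average. This is only an arithmetic sketch; the number of non-peasant probands (66) is an assumption derived as 100 minus the 34 peasants.

```python
def weighted_mean(groups):
    """groups: list of (n, f) pairs; returns the n-weighted mean of f."""
    total_n = sum(n for n, _ in groups)
    return sum(n * f for n, f in groups) / total_n

# (n, f) as quoted in the text for parental birth places.
by_birth_place = [(48, 0.015), (52, 0.006)]
# Peasants versus non-peasants; n = 66 is assumed as 100 - 34.
by_occupation = [(34, 0.012), (66, 0.009)]

print(f"by birth place: f = {weighted_mean(by_birth_place):.3f}")
print(f"by occupation:  f = {weighted_mean(by_occupation):.3f}")
```

Both splits return the overall mean of f = 0.010 after rounding to three decimals, so the subgroup figures are mutually consistent.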
Table 5 (from Weiss 1976) Geographical distance and genetic distance H of probands from Saxony (16th – 19th century), by mean air distance of parental birth places

| | - 2.5 km | | - 5 km | - 10 km | - 20 km | - 30 km | | - 60 km |
|---|---|---|---|---|---|---|---|---|
| Total population | .098 | | .048 | .025 | .019 | .006 | | ‘ |
| Urban population | | .037* | | .016 | .014 | | .008* | |
| Peasants | .103 | | .052 | .021 | .019 | ‘ | | ‘ |

‘ n too small
* mean of the two neighboring entries
The leptokurtic curve resulting from Table 5 is in accordance with results obtained by other methods (migration matrices, blood groups, morphological similarity, marriage distances). It has to be understood primarily as a function of the marriage behavior of the population. The decrease of H with geographical distance also depends upon the density of population, which rose in the investigated area and period from about 40 to 100 per km².
Two
probands of rural origin (their fathers peasants) share at a distance of 2.5 km
between the birthplaces of their parents 10 per cent of their genes by descent,
at 5 km about 5 per cent. Indirectly we conclude from Table 5 that the
non-farming rural population (traders and craftsmen especially) marry at a
distance of mostly 5 to 20 km.
The next
logical step of “surname genetics” is the calculation of matrices, which show
not only the historical change of genetic distance, but also the determined
direction and amount of gene flow (e.g., from the villages to the central place
and the different social strata).
The
processes of biological and cultural evolution both result in the divergence of
populations descended from the same ancestral group. It is right to consider
divergence in gene frequencies, estimated by surname frequencies, as a
representative indicator of biological evolution. For cultural evolution, the
most basic representative seems to be language differentiation. But a lot of
other variables seem also to be relevant and worth investigation. Inference
about relationships between local and social dialects should be based on shared
lexical items, grammatical rules, and phonological differences (Spielman et al.
1974).
Growing
linguistic similarity between two neighbouring populations can be the result of
natural increase and active immigration from one population into the other, of
increased marriage frequencies in both directions, or the result of cultural
(political) predominance of one population, assimilating (‘infecting’) the
other culturally but not necessarily also genetically, or the combined effect
of the causes mentioned. The extent of linguistic differentiation and its
correspondence to genetic differentiation of hierarchically structured
populations can be studied by means of calculating difference matrices, genetic
distance H minus linguistic distance H (Weiss 1977), and their respective
change in time. In such studies some linguistic expertise (including knowledge
about changes in the spelling of surnames) is indispensable (e.g. the
relationship between Müller, Muller and Mueller must be considered).
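One way to handle spelling variants before computing surname frequencies is an explicit lookup table that maps each variant to a canonical form. The sketch below assumes such a table has been compiled by hand; the entries in `VARIANTS` and the function names are hypothetical. An automatic rule (for example rewriting every "ue" as "ü") is deliberately avoided, since it would also alter unrelated names such as Bauer.

```python
from collections import Counter

# Hypothetical variant table; a real one would need linguistic expertise.
VARIANTS = {
    "Muller": "Müller",
    "Mueller": "Müller",
}

def canonical(surname):
    """Map a spelling variant to its canonical form; unknown names pass through."""
    return VARIANTS.get(surname, surname)

def surname_frequencies(surnames):
    """Relative frequencies after merging spelling variants."""
    counts = Counter(canonical(s) for s in surnames)
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}

# Illustration only: three spellings of one name plus an unrelated name.
sample = ["Müller", "Mueller", "Muller", "Bauer"]
freqs = surname_frequencies(sample)
print(freqs)
```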
It seems impossible to study relative fitness by measuring changes in surname frequencies, because in human populations migration is always selective (with respect to the upper stratum, but also to some other occupations). Considering sampling error, the differences in relative fitness are smaller than the differences in selective immigration or emigration of any geographically bounded population.
A way out of the dilemma seems to be to count the total number of collaterals (and possibly descendants) of probands, weighted by their respective degree of genetic relationship to the proband. For example, suppose a group A of probands (with a determined social status or behavior, in order to study kin selection) has at a fixed time 100 uncles and aunts, 264 first cousins, 30 sibs, and 76 nephews and nieces. Then
$$A = \frac{100}{4} + \frac{264}{8} + \frac{30}{2} + \frac{76}{4} = 92$$
Group B of probands has at the same time 96 uncles and aunts, 312 first cousins, 40 sibs, and 102 nephews and nieces:
$$B = \frac{96}{4} + \frac{312}{8} + \frac{40}{2} + \frac{102}{4} = 108.5$$
The relative Darwinian fitness of A to B is, consequently, 92 : 108.5 or 0.85 : 1.00.
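The worked example can be written as a short calculation. The sketch below uses the weights from the example (1/2 for sibs, 1/4 for uncles, aunts, nephews and nieces, 1/8 for first cousins); the dictionary keys and function name are illustrative choices, not terms from the source.

```python
# Weights: degree of genetic relationship to the proband, as in the example.
WEIGHTS = {
    "uncles_aunts": 1 / 4,
    "first_cousins": 1 / 8,
    "sibs": 1 / 2,
    "nephews_nieces": 1 / 4,
}

def weighted_relatives(counts):
    """Sum of relatives, each weighted by relationship to the proband."""
    return sum(WEIGHTS[kind] * n for kind, n in counts.items())

# Counts for groups A and B as given in the text.
group_a = {"uncles_aunts": 100, "first_cousins": 264, "sibs": 30, "nephews_nieces": 76}
group_b = {"uncles_aunts": 96, "first_cousins": 312, "sibs": 40, "nephews_nieces": 102}

A = weighted_relatives(group_a)
B = weighted_relatives(group_b)
print(f"A = {A}, B = {B}, relative fitness A:B = {A / B:.2f} : 1.00")
```

Further classes of relatives (for instance descendants) could be added by extending `WEIGHTS`, provided each class receives its own relationship weight.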
Before the demographic transition (that is, before the industrial revolution and the secular decline of fertility), differences in relative fitness between the social strata seem to have been greater than generally imagined. A study (based on the method of family reconstitution) of four villages of an early industrialized region in Saxony (Weiss 1981) showed that from 1550 to 1800 peasants and owners had 6.8 children in completed first marriages and proletarians (nearly half of the population) only 4.8. In families in which not only the husband but also the father-in-law was a peasant or the owner of a mill, firm or inn, the mean number of children rose even to 7.6 (population mean 5.8). Of the children of proletarians (unskilled workers and hands of all kinds) two-thirds died before marriage and only 1.6 married. In the well-to-do families half of the children died and 3.4 married. Heckh (1952) had found very similar results, with even larger empirical material, in nine villages of southwest Germany (from 1650 to 1799). The main causes are differential mortality during famine, later marriage of the hands, poorer health of both husband and wife as a consequence of undernourishment, stillbirths, and fatal accidents in the working life of poor men.
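The figures for children who married follow from the family sizes and the shares dying before marriage. The sketch below only reproduces that arithmetic from the numbers quoted in the text; the dictionary layout and key names are assumptions made for illustration.

```python
# Children per completed first marriage and share dying before marriage,
# as quoted in the text (Weiss 1981).
strata = {
    "peasants and owners": {"children": 6.8, "share_dying": 1 / 2},
    "proletarians": {"children": 4.8, "share_dying": 2 / 3},
}

# Children surviving to marry = children born * share not dying before marriage.
married = {
    name: s["children"] * (1 - s["share_dying"])
    for name, s in strata.items()
}

for name, value in married.items():
    print(f"{name}: {value:.1f} children married")
```

On these figures the gap between the strata is wider for children who married (3.4 against 1.6) than for children born (6.8 against 4.8).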
If we study rates of genetic drift and evolution by comparing populations of different
hierarchical structure (e.g., by simulation studies of surname frequencies), we have also to take into account such selective differentials.
References
Bunimovič, L. A. (1975) Ob odnoj charakternoj modeli ierarchičeskoj struktury populjacij čeloveka. Genetika (Moskva) 10: 134-143.
Crow, J. F., and A. P. Mange (1965) Measurement of inbreeding from the frequency of marriages between persons of the same surname. Eugenics Quarterly 12: 199-203.
Geserick, G. und V. Weiss (1971) Zur Populationsgenetik der Sorben – Bestimmung der Serumgruppen Hp, Gc, Tf und Pt. Ethnographisch-Archäologische Zeitschrift 12: 481-486.
Heckh, G. (1952) Unterschiedliche Fortpflanzung ländlicher Sozialgruppen aus Südwestdeutschland seit dem 17. Jahrhundert. Homo 3: 169-175. – see http://www.v-weiss.de/publ4-kinderzahlen.html
Hedrick, P. W. (1971) A new approach to measuring genetic similarity. Evolution 25: 276-280.
Hedrick, P. W. (1975) Genetic similarity and distance: comments and comparisons. Evolution 29: 362-366.
Lasker, G. W. (1977) A coefficient of relationship by isonymy: a method for estimating the genetic relationship between populations. Human Biology 49: 489-493.
Michod, R. E., and W. D. Hamilton (1980) Coefficients of relatedness in sociobiology. Nature 288: 694-697.
Nei, M. (1972) Genetic distance between populations. American Naturalist 106: 283-292.
Spielman, R. S., Migliazza, E. C., and J. V. Neel (1974) Regional linguistic and genetic differences among Yanomama Indians. The comparison of linguistic and biological differentiation sheds light on both. Science 184: 637-644.
Weiss, V. (1973) Eine neue Methode zur Schätzung des Inzuchtkoeffizienten aus den Familiennamenhäufigkeiten der Vorfahren. Biologische Rundschau 11: 314-315. – see http://www.v-weiss.de/publ4-inzucht.html
Weiss, V. (1974) Die Verwendung von Familiennamenhäufigkeiten zur Schätzung der genetischen Verwandtschaft. Ein Beitrag zur Populationsgenetik des Vogtlandes. Ethnographisch-Archäologische Zeitschrift 15: 433-451 (Reprinted: Mitteilungen der Deutschen Gesellschaft für Bevölkerungswissenschaft 55 (1978), Beilage 1-16).
Weiss, V. (1976) Geographische Distanz und genetische Identität von Personen, geschätzt mittels Familiennamenhäufigkeiten der Vorfahren (Erzgebirge, Vogtland – 16.-19. Jahrhundert). Mitteilungen der Sektion Anthropologie der DDR 32/33: 107-115. (Reprinted: Mitteilungen der Deutschen Gesellschaft für Bevölkerungswissenschaft 56 (1979) 107-115; and: Genealogie 29 (1980) 182-186).
Weiss, V. (1977) Familiennamenhäufigkeiten in Vergangenheit und Gegenwart als Ausgangspunkt für interdisziplinäre Forschungen von Linguisten, Historikern, Soziologen, Geographen und Humangenetikern. Namenkundliche Informationen 31: 27-32. – see http://www.v-weiss.de/familiennamen.html
Weiss, V. (1981) Zur Bevölkerungsgeschichte des Erzgebirges unter frühkapitalistischen Bedingungen vom 16. bis 18. Jahrhundert (Mittweida, Markersbach, Unterscheibe und Schwarzbach). Sächsische Heimatblätter 27: 28-30.
Yasuda, N., Cavalli-Sforza, L. L., Skolnick, M., and A. Moroni (1974) The evolution of surnames: an analysis of their distribution and extinction. Theoretical Population Biology 5: 123-142.