Mankind Quarterly 21 (1980) 135-149

 

Inbreeding and genetic distance between hierarchically structured populations measured by surname frequencies

 

Volkmar Weiss


 

Surnames can be considered as alleles of one locus, and surname distribution and evolution can be analyzed by the theory of neutral mutations in finite populations (Yasuda et al. 1974). The analysis of human evolution is complicated by the fact that in developed countries such populations have been structured geographically and socially for many thousands of years. Bunimovič (1975) demonstrated, if only theoretically, that in a hierarchical population natural selection is more efficient than in a non-subdivided one and the rate of the evolutionary process is higher. He defines: ‘A population is hierarchical, if it can be divided into a certain amount of subpopulations in such a way that the set may naturally break into classes (levels).’

One may describe the genetic structure of a human population in terms of the inbreeding within its subpopulations and the extent of the sharing of genes among them. Thus the general pattern of the population structure may be represented by the distribution of coefficients of genetic distance (of coefficients of kinship or inbreeding) within each subpopulation, and thus between all possible pairs of subpopulations, tracing their respective changes in historical time. Crow and Mange (1965) explained how, under certain assumptions, inbreeding coefficients can be estimated from the frequency of marriages between persons of the same surname (isonymy).

By suggesting and applying additional methods in a series of papers (Weiss 1973, 1974, 1976, 1977), the author extended the application of ‘surname genetics’ to the measurement of genetic distance and historical changes. Because all these papers were written (and reprinted) in German, the publication of an up-to-date English summary of these findings seems justified.

 

Methods

Crow and Mange (1965) showed that the rate of isonymous marriage is indicative of the same amount of inbreeding in the population regardless of the degree of consanguinity of the marriage. Sibs always have the same name, whereas first cousins have the same name only if they are related through their fathers, that is, ¼ of the time. The same principle holds for most of the common marriage relationships. Thus a fraction I of isonymous marriages is indicative of a population inbreeding coefficient

$$ f = \frac{I}{4} \tag{1} $$

If we assume that all isonymy is a true reflection of common ancestry,  we can separate random from nonrandom contributions to the inbreeding coefficient f. If we neglect nonrandom contributions (very small in actual populations) the formula simplifies to

$$ f = \frac{1}{4} \sum_{j=1}^{n} p_j^{2} \tag{2} $$

where $p_j$ is the proportion of the population bearing the jth surname and n is the number of surnames.
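As a minimal sketch of formula (2), the following Python fragment computes f from a list of surnames such as might be sampled from a directory. The function name `random_isonymy_f` and the sample names and counts are illustrative assumptions, not data from the studies cited.

```python
from collections import Counter


def random_isonymy_f(surnames):
    """Estimate f = (1/4) * sum_j p_j**2 (formula 2) from a list of surnames."""
    counts = Counter(surnames)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one surname")
    # p_j is the share of the sample bearing surname j
    return sum((c / total) ** 2 for c in counts.values()) / 4


# Hypothetical directory sample (invented counts, for illustration only)
sample = ["Seidel"] * 5 + ["Schneider"] * 3 + ["Wunderlich"] * 2

f = random_isonymy_f(sample)
F = 100 * f  # the tables of this paper report F = 100f
print(f"f = {f:.3f}, F = {F:.1f}")
```

With the shares 0.5, 0.3 and 0.2 the sum of squares is 0.38, so f = 0.095. Real populations carry many more surnames and therefore give far smaller values.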

This simplified formula enables us to sample from directories or census data instead of using marriage data. Consequently, in pedigrees f can be estimated not only by counting the actual number of isonymous marriages but, better, by simply calculating the probability of isonymy within a given generation of ancestors (Weiss 1973). A number of workers have used gene frequency data to compare different populations a and b. But since surname frequencies are usually estimated from the observed frequencies in the male population, it seems appropriate to compare genotype frequencies rather than gene frequencies of populations. For this purpose Hedrick (1971) proposed a measure $H_{ab}$, which he called the probability of genotypic identity:

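$$ H_{ab} = \frac{\sum_{j=1}^{n} p_{ja}\, p_{jb}}{\tfrac{1}{2}\left(\sum_{j=1}^{n} p_{ja}^{2} + \sum_{j=1}^{n} p_{jb}^{2}\right)} \tag{3} $$

where $p_{ja}$ is the frequency of the jth surname in population a, $p_{jb}$ its frequency in population b, and n is the number of surnames.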

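Nei (1972) proposed

$$ N_{ab} = \frac{\sum_{j=1}^{m} p_{ja}\, p_{jb}}{\sqrt{\left(\sum_{j=1}^{m} p_{ja}^{2}\right)\left(\sum_{j=1}^{m} p_{jb}^{2}\right)}} \tag{4} $$

where $p_{ja}$ is the frequency of the jth allele in population a, $p_{jb}$ its frequency in population b, and m is the number of alleles.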

Nei (1972) defines the genetic distance for his model as $-\log_e N_{ab}$. For $H_{ab}$ the distance is the complement of the similarity value, $1 - H_{ab}$.

As Hedrick (1975) demonstrated, the differences between the two formulas are very small and the numerical results correlate at about .97. Both formulas, (3) and (4), can therefore be recommended; compare also Michod and Hamilton (1980).

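Because $\sum_{j} p_{ja}^{2} = 4 f_a$ and $\sum_{j} p_{jb}^{2} = 4 f_b$, and F = 100f, the denominator of (3) equals $2 (f_a + f_b) = 2 (F_a + F_b) / 100$. Hedrick’s formula (3) therefore reduces to

$$ H_{ab} = \frac{100 \sum_{j=1}^{n} p_{ja}\, p_{jb}}{2\,(F_a + F_b)} \tag{5} $$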

Two populations with the same surnames in the same frequencies always reach a value of genetic identity $H_{ab}$ of 1; two populations with completely different surnames reach a value of 0. Hedrick’s coefficient of genetic identity measures the extent to which two populations share genes by reason of common descent.
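Formulas (3) and (4) can be sketched in Python as follows. This is a minimal illustration, not the procedure used for the tables below; the function names and the surname counts of the two villages are assumptions made for the example.

```python
import math


def frequencies(counts):
    """Convert surname counts into relative frequencies p_j."""
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}


def _sums(pa, pb):
    # Numerator shared by (3) and (4), plus the two sums of squares
    shared = sum(p * pb.get(name, 0.0) for name, p in pa.items())
    square_a = sum(p * p for p in pa.values())
    square_b = sum(p * p for p in pb.values())
    return shared, square_a, square_b


def hedrick_identity(pa, pb):
    """H_ab of formula (3): arithmetic mean of the sums of squares below."""
    shared, square_a, square_b = _sums(pa, pb)
    return shared / (0.5 * (square_a + square_b))


def nei_identity(pa, pb):
    """N_ab of formula (4): geometric mean of the sums of squares below."""
    shared, square_a, square_b = _sums(pa, pb)
    return shared / math.sqrt(square_a * square_b)


# Hypothetical surname counts for two villages (invented for illustration)
village_a = frequencies({"Seidel": 6, "Schädlich": 3, "Röder": 1})
village_b = frequencies({"Seidel": 2, "Schneider": 5, "Wunderlich": 3})

h_ab = hedrick_identity(village_a, village_b)
n_ab = nei_identity(village_a, village_b)
print(f"H_ab = {h_ab:.3f}, N_ab = {n_ab:.3f}, Nei distance = {-math.log(n_ab):.3f}")
```

Because the arithmetic mean of two positive numbers is never smaller than their geometric mean, $H_{ab}$ never exceeds $N_{ab}$, and the two coincide when both populations have the same sum of squared frequencies.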

 

Samples and results

From parish registers (published or unpublished material; Weiss 1974) of nine German villages the number of isonymous marriages was counted (see Table 1). As expected, the inbreeding coefficient F tends to increase from generation to generation as a result of fluctuations in family size (Holgate 1971) and the extinction of surnames. But the increase in migration and the larger marital distances during the last hundred years have led to a decrease of F in some villages. At present the two opposing effects cannot be separated quantitatively.

 

Table 1 (from Weiss 1974)

Inbreeding coefficient F = 100f in nine German parishes

Parish                                    1548-99  1600-49  1650-99  1700-49  1750-99  1800-49  1850-99  1900-45
Öschelbronn/Württemberg                         -     0.27     0.29     0.40     0.45     1.17     0.73     0.47
Wethen/Waldeck                                  -     0.28     0.64     0.47     0.35     0.45     0.21
Ringsheim/Baden                                 -        -     0.00     0.08     0.37     0.29     0.58     0.73
Schweigern/Baden                                -        -        -     0.34     0.34     0.67     0.22     0.20
Lauf/Baden                                      -        -        -     0.14     0.22     0.57     0.62     0.70
Storbeck/Brandenburg                            -        -        -     0.00     0.54     0.93     0.39     0.66
Grafenhausen/Baden                              -        -        -     0.00     0.11     0.46     0.53     0.27
Rabenstein/Sachsen                           0.24     0.40     0.23     0.41        -        -        -        -
Oberhermersdorf/Sachsen (now Adelsberg)         -     0.28     1.22     0.74     0.64     0.71     0.41        -
weighted mean F                                 -     0.33     0.27     0.27     0.35     0.61     0.49     0.47
total number of marriages                     315      531      728     2596     2812     3730     4646     3313
of these, isonymous                             3        7        8       29       39       91       91       62

- = no data
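The two bottom rows of Table 1 allow a rough check of formula (1). The sketch below pools all parishes within each period and computes F = 100 I/4 from the total and the isonymous marriage counts. The variable and function names are illustrative assumptions, and because the weighting behind the printed means is not stated here, the pooled values need not coincide exactly with them.

```python
periods = ["1548-99", "1600-49", "1650-99", "1700-49",
           "1750-99", "1800-49", "1850-99", "1900-45"]
marriages = [315, 531, 728, 2596, 2812, 3730, 4646, 3313]
isonymous = [3, 7, 8, 29, 39, 91, 91, 62]
# Weighted mean F as printed in Table 1 (None where the table shows "-")
printed_mean_F = [None, 0.33, 0.27, 0.27, 0.35, 0.61, 0.49, 0.47]


def pooled_F(iso, total):
    """F = 100 * f with f = I / 4 (formula 1); I is the isonymous share."""
    return 100 * (iso / total) / 4


computed = [pooled_F(i, n) for i, n in zip(isonymous, marriages)]

for period, value, printed in zip(periods, computed, printed_mean_F):
    shown = "-" if printed is None else f"{printed:.2f}"
    print(f"{period}: pooled F = {value:.2f}, printed weighted mean = {shown}")
```

For the periods from 1600 onward the pooled values stay within about 0.01 of the printed weighted means.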

 

In order to calculate Hedrick’s coefficient (5) as a measure of genetic distance, seven villages and two towns were selected. They all lie on a straight line extending from the southwest to the northeast of the Vogtland in Saxony, Germany.

As the calculation of F and H, including all surnames, is very cumbersome, analysis was restricted to the most frequent surnames, amounting to one third of the respective population. The contribution of the rarer surnames to F and H is very small, indeed. Because of the polyphyletic origin of some common surnames, the limitation to the most frequent surnames seems justified and a welcome correction of F, for which an estimate from the total surname frequencies provides an upper bound.

Lasker’s (1977) ‘coefficient of relationship’ cannot give a clear picture, especially for populations of very different size. For example, for the places of Table 2 its value for the relationship between Oelsnitz and Tirpersdorf is 0.00013 (against H = 0.31) and between Oelsnitz and Wiedersberg 0.0010 (against H = 0.19). The size of Lasker’s coefficient thus has no direct meaning, unlike Hedrick’s coefficient, which can be read as the percentage of genes shared by descent.

 

Table 2 (from Weiss 1974)

Genetic distance H between nine places of the Vogtland (Saxony) about 1920

Air distance (km)  Town or village   Inhabitants    Ro   Wer    Rü  Auer   Reu   Tir  Oels   Bob   Wie
              0.0  Rothenkirchen            1910  0.34   .54   .26   .31   .29   .06   .12   .05   .00
              3.0  Wernesgrün               1290        0.20   .33   .40   .18   .02   .14   .09   .05
              6.0  Rützengrün                700              0.50   .29   .24   .11   .10   .00   .00
              8.2  Auerbach                18000                    0.06   .34   .19   .59   .20   .05
             10.8  Reumtengrün              1410                          0.35   .20   .15   .05   .00
             21.2  Tirpersdorf              1020                                0.24   .31   .08   .09
             27.4  Oelsnitz/V.             33500                                      0.07   .41   .19
             36.0  Bobenneukirchen           790                                            0.27   .14
             40.6  Wiedersberg               170                                                  0.67

In the diagonal: F = 100f

 

From Table 2 we read the following results:

1. The size of the inbreeding coefficient is inversely proportional to the number of inhabitants of a place.

But there are exceptions to the rule: e.g. Silberstrasse/Kreis Zwickau has 525 inhabitants, but an inbreeding coefficient of only F = 0.17. By contrast, Gottesberg with 500 inhabitants has F = 1.51, and Brunn with 1060 inhabitants F = 1.01. Even Ellefeld with 6500 inhabitants has an F of 0.19. In Gottesberg half of the population has only three surnames. Silberstrasse, however, is a settlement of industrial workers with a mobile population. In the other villages only a few families have multiplied, on the basis of local trade and industrial development, without immigration from the outside. Such local population growth seems to be the prerequisite of extremely high inbreeding. As a consequence, true human isolates can have lower inbreeding coefficients. In the Catholic isolate of Upper Lusatia (Geserick and Weiss 1971) F is only 0.23.

2. Genetic identity H decreases with geographical distance.

This result seems, from the point of view of marital distances, trivial. But for the first time the Hedrick-coefficient provides a quantitative geographical picture of genetic identity by descent. Covering about 400 to 600 years since surnames became fixed in Central Europe, in the investigated area between villages the coefficient tends to zero at a distance of about 40 km.

The environs of each central place have their own peculiarities. Around Auerbach the surnames Seidel and Schädlich are common. Around Oelsnitz, Schneider and Wunderlich are the most common. Some surnames that are very frequent in one village may be completely unknown beyond the nearest neighborhood. For example, one third of all inhabitants of Gottesberg bore the surname Röder in 1912. In Jägersgrün, at a distance of only 3 km, the name is unknown. In Brunn, a village with 1060 inhabitants adjacent to the town of Auerbach, three surnames account for one third of the population.

3. Central places (towns) are genetically more similar to each other than either of them is to its neighboring villages. Human populations are hierarchically structured in space, and townspeople prefer to marry townspeople.

In order to delimit the area of the respective central places, a cluster analysis was performed. To the first cluster belong Wernesgrün, Rothenkirchen, Auerbach, Rützengrün, and Reumtengrün, to the second Bobenneukirchen, Oelsnitz, and Tirpersdorf, and to a third Wiedersberg alone. This result was theoretically expected. Historically Wiedersberg belongs to the environs of Hof, a town at a distance of 8 km, whereas the distance to Oelsnitz is 13 km.
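Result 2 can be illustrated with the village values of Table 2. The sketch below is an illustration only, not the procedure of the original study. It assumes that, because the places lie on a straight line, the distance between two villages can be approximated by the difference of their entries in the air-distance column, and it groups the village pairs into three arbitrarily chosen distance classes. The two towns are left out, and all names in the code are assumptions.

```python
# Villages of Table 2 in table order, with their air-distance entries (km)
villages = ["Rothenkirchen", "Wernesgrün", "Rützengrün", "Reumtengrün",
            "Tirpersdorf", "Bobenneukirchen", "Wiedersberg"]
km = [0.0, 3.0, 6.0, 10.8, 21.2, 36.0, 40.6]

# Upper triangle of H from Table 2, restricted to the seven villages;
# row i holds H between village i and each later village
upper = [
    [0.54, 0.26, 0.29, 0.06, 0.05, 0.00],
    [0.33, 0.18, 0.02, 0.09, 0.05],
    [0.24, 0.11, 0.00, 0.00],
    [0.20, 0.05, 0.00],
    [0.08, 0.09],
    [0.14],
]

# Distance classes chosen arbitrarily for this illustration
classes = {"under 10 km": [], "10 to 30 km": [], "30 km and more": []}

for i, row in enumerate(upper):
    for offset, h in enumerate(row, start=1):
        j = i + offset
        distance = abs(km[j] - km[i])  # assumption: places lie on one line
        if distance < 10:
            classes["under 10 km"].append(h)
        elif distance < 30:
            classes["10 to 30 km"].append(h)
        else:
            classes["30 km and more"].append(h)

mean_h = {label: sum(values) / len(values) for label, values in classes.items()}

for label, value in mean_h.items():
    print(f"{label}: {len(classes[label])} pairs, mean H = {value:.2f}")
```

Under these assumptions the mean H falls from class to class as the distance grows, in line with the statement that between villages the coefficient tends to zero at about 40 km.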

Human mainstream populations are structured not only geographically, but also socially. To measure social distance, we sampled only the inhabitants whose surname begins with the initial G. (Thus the inbreeding coefficient of this sample is not equal to the F of the whole population.) We distinguished, on the basis of data given in directories, between owners and professionals on the one side and workers on the other. Thus the analysis was restricted to the upper and lower strata; the middle stratum has not been taken into account.

 

Table 3 (from Weiss 1974)

Genetic distance H between social strata in the towns of Auerbach and Oelsnitz/V. and their rural surroundings in 1920

                               Auerbach-       Auerbach-       Oelsnitz-       Oelsnitz-
                               town            surrounding     town            surrounding
Town or surrounding   Stratum  upper   lower   upper   lower   upper   lower   upper   lower
Auerbach-town         upper    1.04    .66     .40     .58     .16     .25     .08     .06
Auerbach-town         lower            0.90    .43     .39     .22     .45     .11     .13
Auerbach-surrounding  upper                    2.68    .49     .16     .18     .06     .00
Auerbach-surrounding  lower                            3.40    .13     .10     .09     .06
Oelsnitz-town         upper                                    0.83    .50     .47     .27
Oelsnitz-town         lower                                            1.06    .54     .48
Oelsnitz-surrounding  upper                                                    2.78    .50
Oelsnitz-surrounding  lower                                                            2.60

In the diagonal: F = 100f (inbreeding coefficient), comprising all surnames starting with the initial G

 

From Table 3 we read:

1. The lower strata of the two towns are genetically more similar than the upper.

The upper stratum has larger marital distances. If we had analyzed more towns at a larger distance, the upper strata would become more similar than the lower.

2. The inbreeding coefficients of the lower and upper strata are very similar.

No doubt, the upper stratum has the larger marital distances, but because of its small number its inbreeding coefficient is as high as that of the lower stratum.

 

Table 4 (from Weiss 1974)

Genetic distance H between and within the populations of two villages (Wernesgrün and Rothenkirchen) from the 16th century to 1912

Village        Time  Inhabitants  We 16th  We 1728  We 1912  Ro 16th  Ro 1728  Ro 1912
Wernesgrün     16th  120          2.96     .64      .20      .18      .16      .32
Wernesgrün     1728  370                   1.31     .34      .20      .17      .42
Wernesgrün     1912  1290                           0.32     .14      .21      .45
Rothenkirchen  16th  150                                     2.59     .17      .13
Rothenkirchen  1728  440                                              0.82     .37
Rothenkirchen  1912  1910                                                      0.45

In the diagonal: F = 100f (inbreeding coefficient)

 

Table 4 shows the historical development of genetic distance between and within two villages. Over about 350 years only one fifth and one eighth, respectively, of the genes remain identical by descent. (The sources for the populations of 1728 and of the 16th century were tax lists. F in the 16th century may be too high, because hands were not taxed and their surnames are therefore unknown.)

A combination of population structure analysis in space, time, and in the social dimension and their respective changes is possible. But quantifying historical sources is very time-consuming. For this purpose the exploitation of already written or printed genealogies seems more appropriate.

From an available list of ancestors (Weiss 1976) we selected 100 probands, born between 1650 and 1880 in 53 villages and 3 small towns of Saxony (Vogtland and Erzgebirge), most of them (n = 57) in the 18th century. This sample is not representative, but should be a good cross-section of the population. For each proband we used the following data: surname, place and year of birth, birth places of the two parents, and surnames of the 8 great-grandparents.

If we use generations with 16, 32 or more ancestors, the amount of information increases, but also the time to compute probabilities of isonymy and genetic distance. However, this approach seems especially appropriate, if we compare genetic identity of famous persons or groups of such persons or ancestors of certain professions or trades.

For the 56 birth places geographical distance was determined as the air distance from church to church. In calculating $H_{ab}$ between all possible pairs of probands from the surnames of their 8 ancestors, we have to drop all pairs in which one proband is a direct ascendant or descendant of the other.

The mean inbreeding coefficient of the 100 probands was f = 0.010. Where the birth places of the parents were identical (n = 48), f = 0.015; where they were different (n = 52; mean air distance 5.7 km), f = 0.006. As expected, peasant probands (n = 34) had a higher level of inbreeding (f = 0.012) than non-peasants (f = 0.009).

 

Table 5 (from Weiss 1976)

Geographical distance and genetic distance H of probands from Saxony (16th – 19th century)

Mean air distance of
parental birth places   - 2.5 km   - 5 km   - 10 km   - 20 km   - 30 km   - 60 km
Total population          .098      .048     .025      .019      .006
Urban population              .037*          .016      .014          .008*
Peasants                  .103      .052     .021      .019

Blank cells: n too small
* mean of the two neighboring entries

 

The leptokurtic curve resulting from Table 5 is in accordance with results obtained by other methods (migration matrices, blood groups, morphological similarity, marriage distances). It has to be understood primarily as a function of the marriage behavior of the population. The decrease of genetic identity with geographical distance depends also upon the density of population, which rose in the investigated area and time from about 40 to 100 per km².

Two probands of rural origin (their fathers peasants) share at a distance of 2.5 km between the birthplaces of their parents 10 per cent of their genes by descent, at 5 km about 5 per cent. Indirectly we conclude from Table 5 that the non-farming rural population (traders and craftsmen especially) marry at a distance of mostly 5 to 20 km.

The next logical step of “surname genetics” is the calculation of matrices, which show not only the historical change of genetic distance, but also the determined direction and amount of gene flow (e.g., from the villages to the central place and the different social strata).

The processes of biological and cultural evolution both result in the divergence of populations descended from the same ancestral group. It is right to consider divergence in gene frequencies, estimated by surname frequencies, as a representative indicator of biological evolution. For cultural evolution, the most basic representative seems to be language differentiation. But a lot of other variables seem also to be relevant and worth investigation. Inference about relationships between local and social dialects should be based on shared lexical items, grammatical rules, and phonological differences (Spielman et al. 1974).

Growing linguistic similarity between two neighbouring populations can be the result of natural increase and active immigration from one population into the other, of increased marriage frequencies in both directions, or the result of cultural (political) predominance of one population, assimilating (‘infecting’) the other culturally but not necessarily also genetically, or the combined effect of the causes mentioned. The extent of linguistic differentiation and its correspondence to genetic differentiation of hierarchically structured populations can be studied by means of calculating difference matrices, genetic distance H minus linguistic distance H (Weiss 1977), and their respective change in time. In such studies some linguistic expertise (including knowledge about changes in the spelling of surnames) is indispensable (e.g. the relationship between Müller, Muller and Mueller must be considered).

It seems impossible to study relative fitness by measuring changes in surname frequencies, because in human populations migration is always selective (with respect to the upper stratum, but also to some other occupations). Considering sampling error, the differences in relative fitness are smaller than the differences in selective immigration or emigration of any geographically bounded population.

A way out of the dilemma seems to be to count the total number of collaterals (and eventually descendants) of probands, weighted by their respective degree of genetic relationship to the proband. For example, if a group A of probands (of a defined social status or behavior, in order to study kin selection) has at a fixed time 100 uncles and aunts, 264 first cousins, 30 sibs, and 76 nephews and nieces, then

                             A = 100/4 + 264/8 + 30/2 + 76/4 = 92

Group B of probands has at the same time: 96 uncles and aunts, 312 first cousins, 40 sibs, and 102 nephews and nieces.

                            B= 96/4 + 312/8 + 40/2 + 102/4 = 108.5

Relative Darwinian fitness of A to B is, consequently, 92 : 108.5 or 0.85: 1.00.
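The worked example above can be restated as a short Python sketch. The weights are simply the divisors used in the two sums (1/2 for sibs, 1/4 for uncles, aunts, nephews and nieces, 1/8 for first cousins); the dictionary keys and the function name are illustrative assumptions.

```python
from fractions import Fraction

# Weights taken from the divisors of the worked example
WEIGHTS = {
    "uncles_aunts": Fraction(1, 4),
    "first_cousins": Fraction(1, 8),
    "sibs": Fraction(1, 2),
    "nephews_nieces": Fraction(1, 4),
}


def weighted_kin(counts):
    """Sum the relatives of a proband group, each weighted by relationship."""
    return sum(WEIGHTS[kind] * number for kind, number in counts.items())


group_a = {"uncles_aunts": 100, "first_cousins": 264, "sibs": 30, "nephews_nieces": 76}
group_b = {"uncles_aunts": 96, "first_cousins": 312, "sibs": 40, "nephews_nieces": 102}

a = weighted_kin(group_a)
b = weighted_kin(group_b)
relative_fitness = a / b

print(f"A = {float(a)}, B = {float(b)}, A : B = {float(relative_fitness):.2f} : 1.00")
```

Exact fractions are used so that the totals 92 and 108.5 are reproduced without rounding error.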

Before the demographic transition (that is, before the industrial revolution and the secular decline of fertility), differences in relative fitness between the social strata seem to have been greater than generally imagined. A study (based on the method of family reconstitution) of four villages of an early industrialized region in Saxony (Weiss 1981) showed that from 1550 to 1800 peasants and owners had 6.8 children in completed first marriages and proletarians (nearly half of the population) only 4.8. In families in which not only the husband but also the father-in-law was a peasant or owner of a mill, firm or inn, the mean number of children rose even to 7.6 (population mean 5.8). Two-thirds of the children of proletarians (unskilled workers and hands of all kinds) died before marriage and only 1.6 married. In the well-to-do families half of the children died and 3.4 married. Heckh (1952) had found very similar results, with even larger empirical material, in nine villages of southwest Germany (from 1650 to 1799). The main causes are differential mortality during famine, later marriage of the hands, poorer health of both husband and wife as a consequence of undernourishment, stillbirths, and fatal accidents in the working life of poor men.

If we study rates of genetic drift and evolution by comparing populations of different hierarchical structure (e.g., by simulation studies of surname frequencies), we have also to take into account such selective differentials.

 

References

Bunimovič, L. A. (1975) Ob odnoj charakternoj modeli ierarchičeskoj struktury populjacij čeloveka. Genetika (Moskva) 10:   134-143.

Crow, J. F., and A. P. Mange (1965) Measurement of inbreeding from the frequency of marriages between persons of the same surname. Eugenics Quarterly 12: 199-203.

Geserick, G., and V. Weiss (1971) Zur Populationsgenetik der Sorben – Bestimmung der Serumgruppen Hp, Gc, Tf und Pt. Ethnographisch-Archäologische Zeitschrift 12: 481-486.

Heckh, G. (1952) Unterschiedliche Fortpflanzung ländlicher Sozialgruppen aus Südwestdeutschland seit dem 17. Jahrhundert. Homo 3: 169-175. – see http://www.v-weiss.de/publ4-kinderzahlen.html

Hedrick, P. W. (1971) A new approach to measuring genetic similarity. Evolution 25: 276-280.

Hedrick, P. W. (1975) Genetic similarity and distance: comments and comparisons. Evolution 29: 362-366.

Lasker, G. W. (1977) A coefficient of relationship by isonymy: a method for estimating the genetic relationship between populations. Human Biology 49: 489-493.

Michod, R. E., and W. D. Hamilton (1980) Coefficients of relatedness in sociobiology. Nature 288: 694-697.

Nei, M. (1972) Genetic distance between populations. American Naturalist 106: 283-292.

Spielman, R. S., Migliazza, E. C., and J. V. Neel (1974) Regional linguistic and genetic differences among Yanomama indians. The comparison of linguistic and biological differentiation sheds light on both. Science 184: 637-644.

Weiss, V. (1973) Eine neue Methode zur Schätzung des Inzuchtkoeffizienten aus den Familiennamenhäufigkeiten der Vorfahren. Biologische Rundschau 11: 314-315. – see http://www.v-weiss.de/publ4-inzucht.html

Weiss, V. (1974) Die Verwendung von Familiennamenhäufigkeiten zur Schätzung der genetischen Verwandtschaft. Ein Beitrag zur Populationsgenetik des Vogtlandes. Ethnographisch-Archäologische Zeitschrift 15: 433-451 (Reprinted: Mitteilungen der Deutschen Gesellschaft für Bevölkerungswissenschaft 55 (1978), Beilage 1-16).

Weiss, V. (1976) Geographische Distanz und genetische Identität von Personen, geschätzt mittels Familiennamenhäufigkeiten der Vorfahren (Erzgebirge, Vogtland – 16.-19. Jahrhundert). Mitteilungen der Sektion Anthropologie der DDR 32/33: 107-115. (Reprinted: Mitteilungen der Deutschen Gesellschaft für Bevölkerungswissenschaft 56 (1979) 107-115; and: Genealogie 29 (1980) 182-186).

Weiss, V. (1977) Familiennamenhäufigkeiten in Vergangenheit und Gegenwart als Ausgangspunkt für interdisziplinäre Forschungen von Linguisten, Historikern, Soziologen, Geographen und Humangenetikern. Namenkundliche Informationen 31: 27-32. – see http://www.v-weiss.de/familiennamen.html

Weiss, V. (1981) Zur Bevölkerungsgeschichte des Erzgebirges unter frühkapitalistischen Bedingungen vom 16. bis 18. Jahrhundert (Mittweida, Markersbach, Unterscheibe und Schwarzbach). Sächsische Heimatblätter 27: 28-30. 

Yasuda, N., Cavalli-Sforza, L. L., Skolnick, M., and A. Moroni (1974) The evolution of surnames: an analysis of their distribution and extinction. Theoretical Population Biology 5: 123-142.
