Wednesday, March 2, 2011

Supervised ADMIXTURE 1.1 analysis results

ADMIXTURE 1.1 offers a new running mode in which some individuals are "fixed" as belonging to a particular population (100%), and the ancestral proportions of the remaining ones are estimated.
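
A minimal sketch of how the inputs for such a run might be prepared. As far as I understand the ADMIXTURE manual, supervised mode reads a `.pop` file with the same prefix as the `.bed` file, containing one line per individual in `.fam` order: a population label for a fixed reference individual, or `-` for an individual whose ancestry is to be estimated. The sample IDs, labels, and the `data` prefix below are hypothetical and used only for illustration.

```python
def pop_file_lines(sample_ids, fixed_labels):
    """Return one .pop line per sample, in the given (i.e. .fam) order.

    Fixed reference individuals get their population label; everyone
    else gets "-" so that their ancestry proportions are estimated.
    """
    return [fixed_labels.get(sample_id, "-") for sample_id in sample_ids]


# Hypothetical sample IDs, listed in the same order as the .fam file.
samples = ["ITA_01", "ITA_02", "ASH_01", "ASH_02", "DOD006"]

# Hypothetical labels for the individuals fixed at 100% of one population.
fixed = {
    "ITA_01": "North_Italian",
    "ITA_02": "North_Italian",
    "ASH_01": "Ashkenazi",
    "ASH_02": "Ashkenazi",
}

lines = pop_file_lines(samples, fixed)
print("\n".join(lines))

# The lines would be saved as data.pop next to data.bed, and the run
# started with something like (assumed prefix, K = 2 populations):
#   admixture --supervised data.bed 2
```

The number of populations K passed on the command line should match the number of distinct labels in the `.pop` file.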

Naturally, I wanted to see how well this works in practice, so I went through the ancestry thread to find some test cases.

DOD006 reports half North Italian and half Ashkenazi Jewish ancestry. Using 5 Dodecad Project North Italians and 25 Dodecad Project Ashkenazi Jews, I estimate his/her ancestry as 24.9% North Italian and 75.1% Ashkenazi Jewish. Substituting the HGDP North Italian sample (from Bergamo) for the Dodecad one, I obtain values of 27.1% N.I. and 72.9% AJ. Based on these results, I would wager that the North Italian ancestor was half Jewish, or otherwise atypical for that population.

DOD073 reports half German and half Irish ancestry. Using 17 Dodecad Project Irish and 11 Dodecad Project Germans as references, I estimate his/her ancestry as 55.9% Irish and 44.1% German. This seems reasonable, given the limitations of the algorithm and the relative closeness of the two populations.

DOD188 reports half Sicilian, half Polish ancestry. Using 6 Poles and 20 South Italians/Sicilians from the Dodecad Project, I estimate his/her ancestry as 40.6% Polish and 59.4% Sicilian. Is this slightly worse result due to the algorithm's limitations, or, as I suspect, to the smaller Polish sample?

DOD014 is a very interesting case, reporting half Greek, half South Italian/Sicilian ancestry. Given the close relationship between these two populations, I did not know what to expect, and the result of 30.6% Greek and 69.4% South Italian/Sicilian probably indicates the difficulty of obtaining accurate estimates for admixture between related populations.

DOD245 suggests an approximate breakdown of 50% W African, 25% Ashkenazi, and 25% N European/English with some Native American. Using HGDP Yoruba, Dodecad Ashkenazi, and 17 Dodecad British, I estimate 50.5% W African, 24.4% Ashkenazi, and 25.1% British, which seems right on the money, thanks, perhaps, to the large reference samples from well-differentiated populations.

I revisited Joe Pickrell, whose ancestry I do not know fully except for the following:
  • 1 Ashkenazi great grandparent
  • 1 Italian grandparent
Guessing that the remainder is British-like white American, and the Italian part is from the south, I analyzed the sample using Dodecad British, South Italians/Sicilians, Ashkenazi and came up with 56.8%, 34%, and 9.2% respectively:
  • the Ashkenazi component is close to the expected 12.5%, given the randomness of three generations between a great-grandparent and his descendant
  • the Italian is more than the expected 25%, and this could be explained in many different ways, e.g., part-Italian descent of the non-Italian ancestors, or descent from some non-British, non-Italian white Europeans.
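
The expected values quoted in the bullet points above follow from halving the contribution at each generation. A small sketch of that arithmetic, compared against the estimates reported above; the function name is my own, and the 62.5% remainder is simply what is left after the two known ancestors, under the guess that the rest is British-like:

```python
from fractions import Fraction


def expected_fraction(generations_back):
    """Expected autosomal share from one ancestor n generations back.

    This is only the expectation; recombination makes the realized
    share vary around it, more so for more distant ancestors.
    """
    return Fraction(1, 2) ** generations_back


ashkenazi = expected_fraction(3)    # one great-grandparent -> 1/8
italian = expected_fraction(2)      # one grandparent -> 1/4
remainder = 1 - ashkenazi - italian  # everything else -> 5/8

expected = {"British": remainder, "Italian": italian, "Ashkenazi": ashkenazi}
estimated = {"British": 56.8, "Italian": 34.0, "Ashkenazi": 9.2}

for population, share in expected.items():
    print(f"{population}: expected {float(share) * 100:.1f}%, "
          f"estimated {estimated[population]:.1f}%")
```

The same halving gives 25% for the single Mexican grandparent discussed further down.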

Zack finds his Dodecad results (DOD128) to be compatible with a quarter Egyptian ancestry, and finds his South Asian ancestry to be more similar to Punjabis (although he has no data for Punjabis). Using Pakistani Punjabis from Xing et al. (2010) and Egyptians from Behar et al. (2010) as references requires me to drop the number of markers to ~38k, but the result of the supervised ADMIXTURE analysis is 77.4% Punjabi and 22.6% Egyptian, which seems compatible with what he expected.

Finally, another difficult case is DOD329, who is 3/4 Norwegian and 1/4 Swedish with a little "forest Finn". Judging from the K=10 results for this sample (only 0.4% East Asian), I don't think there is much "forest Finn" in his/her genome. Using 7 Dodecad Swedes and 6 Dodecad Norwegians as references, I obtain 46.8% Swedish and 53.2% Norwegian, which is again appropriately "off" given both the small reference samples and the close relatedness of these two populations.

Concordance between self-reported and genomic ancestry

Consider DOD375 of Spanish origin (from Valencia). I ran supervised ADMIXTURE analysis using Behar et al. (2010) Spaniards and 25 HapMap Mexicans as references. Not surprisingly, this individual turns out to be 100% Spanish using this test.

Now, consider the individual who prompted my recent plea for accurate self-reporting of ancestry. I had hard evidence that this individual, who also claimed full Spanish ancestry, was in fact part Mexican. Nonetheless, I decided to make the case airtight by performing exactly the type of test described in the previous paragraph. The result: 76% Spanish and 24% Mexican, in agreement with a single Mexican grandparent.

Conclusion

This type of analysis does seem to work best when good-sized samples of the ancestral populations are available, and these populations are well-differentiated genetically.
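
One rough way to see this is to tabulate, for the two- and three-way test cases above, the largest gap in percentage points between the self-reported and the estimated proportions. This is only a sketch using the figures quoted in this post. The self-reports are taken at face value, and the minor components mentioned in passing ("some Native American", "a little forest Finn") are ignored, which is an assumption on my part.

```python
# (self-reported %, estimated %) per sample, as quoted in the post.
cases = {
    "DOD006": ({"North Italian": 50.0, "Ashkenazi": 50.0},
               {"North Italian": 24.9, "Ashkenazi": 75.1}),
    "DOD073": ({"Irish": 50.0, "German": 50.0},
               {"Irish": 55.9, "German": 44.1}),
    "DOD188": ({"Polish": 50.0, "Sicilian": 50.0},
               {"Polish": 40.6, "Sicilian": 59.4}),
    "DOD014": ({"Greek": 50.0, "South Italian/Sicilian": 50.0},
               {"Greek": 30.6, "South Italian/Sicilian": 69.4}),
    "DOD245": ({"W African": 50.0, "Ashkenazi": 25.0, "British": 25.0},
               {"W African": 50.5, "Ashkenazi": 24.4, "British": 25.1}),
    "DOD329": ({"Norwegian": 75.0, "Swedish": 25.0},
               {"Norwegian": 53.2, "Swedish": 46.8}),
}


def max_abs_deviation(reported, estimated):
    """Largest per-population gap, in percentage points."""
    return max(abs(estimated[pop] - reported[pop]) for pop in reported)


deviations = {
    sample: round(max_abs_deviation(reported, estimated), 1)
    for sample, (reported, estimated) in cases.items()
}

# Print from best to worst agreement.
for sample, deviation in sorted(deviations.items(), key=lambda item: item[1]):
    print(f"{sample}: {deviation:.1f} points")
```

The best agreement is for DOD245, whose source populations are well-differentiated, while the largest gaps are for pairs of closely related populations or small reference samples.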

From an anthropological viewpoint, it could be useful for populations with well-known admixture histories, such as those of the New World or parts of Central Asia.

It could also be useful as a confirmatory tool to compare self-reported vs. genomic ancestry.

10 comments:

  1. I (DOD380) am an admixed outlier who would be interested in seeing my results in future Admixture 1.1 runs, if conducted. My ancestry details are now in the ancestry thread.

  2. WRT DOD188 (Sicilian/Polish), I suspect that some of the overrepresentation of the Sicilian element may be due to distant/minor Ashkenazi ancestry in many Poles - which in this case might get subsumed into the Sicilian cluster.

  3. Very interesting. I have been thinking of running the supervised analysis too.

  4. Is it possible that the person who appears to have misrepresented their partial Mexican ancestry could have been the result of a "non paternity" event? i.e. they and their family think they are 100% Spanish, but grandma had an affair with a Mexican milkman? I suppose this is more likely if you can tell if the Mexican genes came from a male.

  5. Possible in general, not in this case.

  6. @pconroy Are you Slavic or Germanic Polish? I also cluster Ashkenazi (34% on the Euro-DNA calc), but the family ancestry traces back to the 1600s, migrating to Silesia from the Dutch/German borderlands. I wonder whether both Slavic and Germanic Poles match with Ashkenazim.

  7. But a Mexican should say he is Mexican, not Spanish. It's very rare for a Mexican to be 100% Spanish, unless he is a first-generation Mexican.

  8. @Justin,
    That's not me, but my father-in-law, and his Sicilian is from near Palermo, while his Polish is from Poznan (aka Posen) and the name is not Slavic/Polish, but German/Jewish sounding.

    Although he has no known Jewish ancestry, about 1/3 of his distant matches are Ashkenazi.

  9. DOD006 is my mother and to the best of family lore, there is no known AJ ancestry on her father's side in Italy.

    Based on ancestry finder results and relative finder results, it is possible that there was previously unknown AJ ancestry on her father's side, though no paper trail evidence yet exists.

    This is an interesting discovery, definitely.
    Thanks.

  10. I'd love to participate in a test like this, but being 65% Euro and 35% Native American (Mexican), I don't know if it would be possible. I got those ratios from 23andMe.
