Stephan Schiffels

Population Genetics – Computational Methods - Human History

Behind the paper: Paleo-Eskimo ancestry in North America

Posted on June 5, 2019 Categories: blog

This post appeared originally on the Nature Ecology & Evolution Community blog.


In October 2015, when I had just started my new job as Group Leader at the newly founded Max Planck Institute for the Science of Human History in Jena, Germany, we were visited by a young researcher, Pavel Flegontov, from Ostrava University in the Czech Republic. Pavel is a computational biologist who had become interested in human population genetics and prehistory a few years earlier. At the time I was about to publish our paper on Anglo-Saxon ancient DNA from England, for which we had developed a new methodology to investigate fine-scale population structure based on rare genetic variation. Pavel wanted to apply this methodology to an open genetic question in the literature about Athabaskans, an ethnic group from North America. David Reich, a geneticist from Harvard University, had shown in a 2012 paper that Athabaskans have a distinct ancestry component that is not shared with many other Native Americans and that could be related to Paleo-Eskimos.

Paleo-Eskimos were known from archaeology as the first people to inhabit the Arctic regions of North America and Greenland, from about 4,500 until 700 years ago. A 4,000-year-old individual from the Paleo-Eskimo Saqqaq culture in Greenland was the first ancient human whose genome was completely sequenced.
©Illustration by Kerttu Majander, Design by Michelle O’Reilly

About two weeks after Pavel’s visit to Jena, I got an email from him saying “There is a signal!”. Indeed, what we found was that the pattern of rare mutations that Athabaskans share with Siberian and American populations matched that of Saqqaq (an ancient Paleo-Eskimo genome), but not that of Inuit. I was excited, but also somewhat concerned, since I had previously contributed to another paper, published earlier in 2015, in which we argued that Athabaskans are related to Inuit, not Paleo-Eskimos. But the signal persisted, even when we applied other methods based on data other than shared rare variation.

When we released a preprint about our work in 2016, it was met with overwhelming enthusiasm from colleagues in the field, but also with criticism from some of my former coauthors on the 2015 paper (one colleague called our work “unscholarly” and “manipulative”, but apologised later). Some of the feedback was very useful, though. In particular, one flaw in the original analysis was brought to our attention. This kind of feedback is exactly what preprints are so useful for!

Another consequence of our preprint was that it caught the interest of David Reich, who offered us ancient genomic data from the Aleutian Islands and from Alaska that had been generated in his lab and that he felt would fit well into our study. We happily agreed to co-analyse it within our paper, and at this point decided, despite positive reviews from the first journal, to resubmit in the fall of 2017 to a different journal with higher visibility in the field. This almost never happens: normally we submit somewhere, get rejected, and try a lower-tier journal next. Here it was the opposite direction: up!

Attu Island, Aleutian Islands, Alaska. ©Photo credit: Jason Rogers
The excavation of the Middle Dorset individual from the Buchanan site on southeastern Victoria Island, Nunavut, Central Canadian Arctic. ©Photo credit: T. Max Friesen

In the meantime, this project took on a life of its own. Because David is uncannily well connected (and, I’m pretty sure, never sleeps), over the following months he assembled more than 37 additional ancient genomes via multiple project partners from relevant sites in Canada, Chukotka and Central Siberia, all of which were contributed to our project. In total, since our first submission and our first preprint, the number of novel ancient genomes described in our paper climbed from zero to 48! As a consequence, modelling became a lot harder. Although the reviews were again fairly positive, we spent an unusually long time reworking our model to encompass all the additional data.

In early 2018, Pavel hit a breakthrough that made the last piece of the puzzle fall into place: bi-directional gene flow between the ancestors of Inuit and Yup’ik and the ancestors of Chukotkan people. When we introduced this feature into the model, we could finally explain why there was Native American ancestry present west of the Bering Strait, and why Inuit are more closely related to people from Chukotka than expected under the previous model. This breakthrough ultimately led to the final model that is now in the paper:

From Figure 2 of our published paper. In our final model, Paleo-Eskimos are involved in the founder population of Athabaskans, as well as Eskimo-Aleut speaking groups, although the latter mixed more recently with Chukotko-Kamchatkan ancestors.

For me personally, the project sure was a learning experience: as the paper grew from just 6 to 35 authors, my role changed from being part of a small team of equals to managing a large author team with a wide range of different perspectives and roles. The project was also a lesson in patience: since its first submission to Nature, our paper has spent 263 days in peer review, 241 days in revision, and 100 days in production between acceptance and today.

I couldn’t be more thrilled to finally see it published!