This is a presentation I gave at the 11th Annual Family Tree DNA Conference on the 15th of November 2015, entitled "Combining SNPs, STRs and Genealogy to Build a Surname Origins Tree". A lot of this information will also appear in various posts on our blog for the Gleeson DNA project, and you'll see the address for that below. It also complements an earlier presentation I gave in Dublin at Genetic Genealogy Ireland 2015, entitled "Building a Family Tree with SNPs, STRs and Named People", and you can find that by simply googling YouTube and Genetic Genealogy Ireland.

Now, the Dublin presentation explores some of the more basic concepts that we are discussing in this presentation and goes into them in a lot more detail, so if you're unfamiliar with some of the more basic principles behind this type of work, it would be useful to look at the earlier video as well. The concepts it explores include the different types of DNA marker on the Y chromosome (the STR markers and the SNP markers), the modal haplotype, and the reasons for grouping people together. It also explores markers of potential relatedness, both genetic and traditional genealogical markers, and it looks at convergence and the fact that this may still be a problem even at 111 markers. So that's the earlier presentation in Dublin.

Here is an example of a combined mutation and family history tree. This is purely for illustrative purposes, but it combines people with DNA markers. We have named people here, and in Irish research a lot of the records run out around about 1810 to 1840 or so. At that point in time we get a very big brick wall, which is quite difficult to break through. Some people break through it because they have quite good personal family records, like the family Bible, or they may have been gentry and have a longer line than other people in the tree. But for some people we can only hope to replace those ancestors by, perhaps, DNA markers, and here we see STR markers highlighted in
yellow and SNP markers highlighted in pink. We can see that there's a branching pattern within the tree, with the various branching points. So the big question is: is it possible to build this type of tree, combining named people and DNA markers, using the DNA markers when the named people run out, but also using the DNA markers to identify the likely branching points in the tree? This will in turn help the modern-day descendants to decide how closely related they are to each other. So that's the concept behind it.

In this presentation I'm going to talk about a variety of different topics. We'll start off, first of all, building a tree with STR markers. Then we'll look at how to build a tree with SNP markers, and then at combining the two. We'll also look at dating the branching points in the tree, and then at what happens when we combine the STRs and the SNPs with known genealogy. Finally, we'll take a look at opportunities for the years ahead. So here is the presentation.

Let's start off with building a tree with STRs. Here are the Gleeson DNA project results, and this is hosted on World Families Network, and a great big thank you to Terry Barton for the wonderful work that he does for WFN. I work with both the Family Tree DNA format and the WFN format. I particularly like the WFN format because it has the haplogroup modal haplotype up here, the haplogroup subgroup R1b1, and that's very useful because it gives you a very nice, distinctive colored pattern for Lineage 1, which are the English Gleasons, and Lineage 2, which are the Irish Gleesons. Down here you see that the two distinctive patterns are very, very different.

But if we just look at the first 12 markers in Lineage 2 of this Gleeson project, and here they are, you can see the modal haplotype for this particular lineage, and about four people (numbers one, three, four and five) are an exact match with the modal haplotype. Now, if we move down a little bit, we see that there's a mutation
here at marker 385b, and we can imagine that that is an offshoot from the modal haplotype there. And then we move down a little bit further.
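
The first step described above, finding the modal haplotype and then reading off each member's mutations from it, can be sketched in a few lines. This is only a minimal illustration: the member labels, the three markers and all the values are hypothetical, and the function names are assumptions rather than anything used in the project.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return the most common value at each marker across all members."""
    markers = next(iter(haplotypes.values())).keys()
    return {
        m: Counter(h[m] for h in haplotypes.values()).most_common(1)[0][0]
        for m in markers
    }


def differences_from_modal(haplotypes, modal):
    """Map each member to {marker: (modal value, member value)} where they differ."""
    return {
        member: {m: (modal[m], v) for m, v in h.items() if v != modal[m]}
        for member, h in haplotypes.items()
    }


# Hypothetical three-marker results, invented purely for illustration.
members = {
    "member_1": {"DYS393": 13, "DYS390": 24, "DYS385b": 14},
    "member_3": {"DYS393": 13, "DYS390": 24, "DYS385b": 14},
    "member_6": {"DYS393": 13, "DYS390": 24, "DYS385b": 15},
}

modal = modal_haplotype(members)
diffs = differences_from_modal(members, modal)
```

Members with an empty entry in `diffs` match the modal exactly; a member with a single difference, like the hypothetical `member_6`, would be drawn as an offshoot from the modal.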

What we see here, actually, is that there are two more mutations, which can be an offshoot from the modal haplotype, represented by this branch here. If we move down a little bit further, then this would be an offshoot here, and moving down a little bit further to these three mutations, this would be a sub-branch of the previous branch. So in this way we've generated a mutation history tree based on 12 markers.

Now, we can do the same for 37 markers. Here are the 37-marker results, and we get some new branches at 37 markers, indicated by these pink lines here. But the big question for me is that suddenly we start getting these parallel mutations appearing. Here's one for CDYa, there's a parallel mutation; on CDYb there's another parallel mutation; in 464c, on both sides of the tree; and there are three parallel mutations on 456. There's also a parallel mutation in 464b. So there are lots of parallel mutations. Would you expect to see that many parallel mutations in this tree that we're drawing just by hand? Is the resolution of the tree enough to define the branching pattern adequately? And is this the best-fit model: have I taken the fewest number of steps to arrive at this particular mutation history tree?

To help answer that question we have to turn to Fluxus. Now, how many people here have heard of Fluxus? OK. It's basically phylogenetic software, and this is the type of output from Fluxus. When I first looked at this cladogram I thought, hold on, that looks familiar, is that the constellation Ursa Major? And then I thought, oh my goodness, no, these are the assembly instructions for Swedish furniture. (Yeah, I cleared that joke beforehand. OK.) It is a real hodgepodge, and so it's difficult to interpret exactly what's going on here. But the green lines show alternative pathways, so it means that there isn't just one best-fit model for the data, there are actually several. And it can be helpful to check it against the
hand-drawn tree. It shows the maximum parsimony version, which is the least number of steps to fit the data. But it's cumbersome and fiddly, it's easy to make mistakes, it's difficult to interpret, and it is time-consuming. It was Ralph Taylor, who is an expert on Fluxus software, who very, very kindly drew all the Fluxus diagrams that you're going to see in this presentation, so thank you, Ralph Taylor, for that. It's also difficult to visualize this diagram as a family tree, the type of family tree diagram that we're used to seeing, and it gives all the markers equal weight and ignores the differing mutation rates that you have in STR markers.

However, there are several best-fit models, and there are at least eight that could be drawn from this diagram. Because of that, the tree is kind of free-floating: there's no single most likely option, and there's really not enough information at 37 markers to adequately define the branching pattern of the tree. Parallel mutations still persist. So, for example, there's 390 up there, and it's also down here; here's CDYb, and CDYa and b. So there are parallel mutations still present in this particular model. Back mutations are also possible, but they're going to be hidden, because they'll have gone forward 600 years ago and back 300 years ago, and we'll have lost it because it went down from 17 to 16 and then back up from 16 to 17. That's lost within this tree, and it's not clear which mutation came before which.

So how does it compare with the hand-drawn tree? There's the hand-drawn tree up above, and this is how the Fluxus tree has changed. This branch here moves over here, because they both have a similar mutation in 456, and what Fluxus has done is imagine that the 456 mutation occurred before the 389 mutation, and therefore branch number 6 is now a sub-branch of branch number 2. We put that back into the diagram, move the other branches over to accommodate it, and that is the Fluxus tree based on
37 markers. It's actually relatively close to what we've done in our hand-drawn tree, so Fluxus is very useful for comparing the two together. But of course everything else in the tree remains exactly the same, and all of these other parallel mutations persist.
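
Spotting parallel mutations like the ones discussed above is easy to mechanise once each branch of a draft tree has its mutations written down. The sketch below is an assumption-laden illustration, not the project's method: the branch names, markers and values are hypothetical, and it assumes the listed branches are separate lines (not parent and child), so the same change on two of them must have arisen independently.

```python
from collections import defaultdict


def find_parallel_mutations(branch_mutations):
    """Return each identical change that is recorded on more than one branch.

    branch_mutations maps a branch name to a list of (marker, old, new) tuples.
    """
    branches_by_change = defaultdict(set)
    for branch, mutations in branch_mutations.items():
        for marker, old, new in mutations:
            branches_by_change[(marker, old, new)].add(branch)
    return {
        change: sorted(branches)
        for change, branches in branches_by_change.items()
        if len(branches) > 1
    }


# Hypothetical draft tree: three independent branches and their mutations.
draft_tree = {
    "branch_2": [("DYS456", 16, 15)],
    "branch_6": [("DYS456", 16, 15), ("DYS390", 24, 25)],
    "branch_9": [("CDYa", 36, 37)],
}

parallel = find_parallel_mutations(draft_tree)
```

A long list coming back from a check like this is a hint that the drawn tree may not be the fewest-steps version, which is the question the talk puts to Fluxus.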

So the parallel mutations are still there at 37 markers. To get better resolution and definition of the branching pattern, we have to move up to 111 markers, and this is what the cladogram looks like now. We've also added in four additional members, because they joined while we were busy trying to figure this out. The big question here is: how do you translate this 111-marker diagram into a diagram that looks like a family history tree? There is some essential technology that you need for this: a strong cup of coffee, a packet of biscuits and your favorite chocolate bar. This was taken in the morning; the shot of the evening has a half-empty bottle of whiskey and a couple of Valium. You also need a pen and paper, so it is actually quite nice to get back to the basic technology of pen and paper.

But here is how the Fluxus tree with 111 markers looks compared to the previous one at 37. The additional branches are in green. You can see that there are a lot more markers involved and a lot more branches. Some major changes: what was branch number seven has now moved into an entirely different place. Its closest match was branch number eight, but the closest branch now is branch number six, so branch number seven has been repositioned. The other thing that has been repositioned is branch number eight, which used to be closest to branch number seven but is now actually closest to branches four, five and twelve.

Now, there are also parallel mutations still: there are four on CDYa, three on CDYb, two on 714, two on 461 and three on 390. But the parallel mutation that was on 456 has now disappeared, as has the parallel mutation on 464b. So you see how 111 markers have really changed the configuration of the tree in a major way. The other thing that has appeared is a back mutation. This is the first back mutation we have seen: there are lots of parallel mutations, but this is
the first back one that has become evident, and that is a change in this marker here, 464b. It was 17, it has now come back to 16, and it even goes forward again, but in a downwards direction, to 15. So that's our first back mutation.

But the problem with this 111-marker cladogram is that there's still no weighting. The mutation rates vary by a factor of 400, and that is not taken into account in this particular diagram. So, working with James Irvine and Ralph Taylor: James developed an algorithm for weighting the markers and came up with this very, very elegant equation, and that's the equation there. Then Ralph put this into the Fluxus software, made a few adjustments, and we came out with a weighted Fluxus diagram. Now, the big difference here: the green bars on the previous diagram indicated that there were still several pathways that could be taken. That has disappeared completely. We actually now have a diagram that shows a single common path. There are no alternative pathways; this is one single best-fit model, and the torso has disappeared from the previous diagram.

When we look at how it changes the tree (this is the previous one, this is the new one), it only changes certain branches in the upper reaches of the tree. The bottom branches of the tree are still relatively intact; it's just where the more distant generations attach to the modal haplotype that has changed. Now, during this process we also noticed that we had a transcription error, so even at this advanced stage we still had a transcription error. That was corrected and two branches switched around, but it was OK thereafter.

Now, that was fine, but I asked myself some other questions, because some of the markers behaved unusually. So, for example, marker 389 is tested in two parts, and the mutation in part one is also counted in part two, so we just used part two and we corrected for that in these Fluxus diagrams. But I also had a problem with these parallel mutations, which seem to happen in two
markers more commonly than others, and those were 464 and CDY. These are the multi-copy markers, and there is a problem with them.
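
The problem with multi-copy markers comes down to reporting order: the copies are listed in ascending order of value, not by their physical position on the chromosome. The sketch below illustrates the effect using hypothetical function names; the idea that we could know the true physical order of the four copies is itself an assumption made only for the demonstration.

```python
COPY_LABELS = "abcd"


def reported_values(values_in_physical_order):
    """Values as they appear in a results table: sorted ascending, physical order lost."""
    return sorted(values_in_physical_order)


def apparent_changes(before, after):
    """Copy labels (a-d) that appear to differ when two reported results are compared."""
    pairs = zip(reported_values(before), reported_values(after))
    return [COPY_LABELS[i] for i, (x, y) in enumerate(pairs) if x != y]


# Assumed true state: the fourth physical copy drops from 16 to 15.
true_before = [16, 16, 16, 16]
true_after = [16, 16, 16, 15]

as_reported = reported_values(true_after)
looks_like = apparent_changes(true_before, true_after)
```

Here the change really happened in the fourth copy, but the reported values make it look like a change in the copy labelled "a".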

The problem with these multi-copy markers is that the mutations are not reported in the correct order. If you had 16-16-16-16 for 464a, b, c, d, that would be fine: you'd know that each of those marker values was 16. But supposing the last one goes from 16 down to 15, the way it will be reported is not 16-16-16-15 but 15-16-16-16, which will make you think that the mutation is in 464a when in fact it's in 464d. Now, with marker 385a and b, the Kittler test helps to define the relative positions of 385a and b. But can you do the Kittler test on 464? I don't know. If you know the answer, please come up and tell me. I did a lot of searching and I couldn't find the answer to that question. It certainly would be useful, but it raises questions about the applicability of 464 in the tree.

For the multi-copy marker 464a, b, c, d there are two types, C and G (cytosine and guanine), and the 464X test can help distinguish between them. If you couple that with the fact that you've got four options for 464 and you don't know which order they are in, you could have at least eight possibilities for 464. That marker and CDY are both fast-mutating markers, so the risk of back mutations is much higher with these, and that raises questions about how likely they are to screw up the tree.

So what we did was take our weighted diagram and convert it into another diagram by removing CDY and 464, and this is what happened to our tree. There's the previous tree, this is what happened with removing them, and I'm going to toggle between the two. You can see that branch number 12, this one down here, disappeared completely into branch 4, and there were also some changes to the upper echelons of the tree. There were still parallel mutations in some of the markers, and we also had a back mutation now in marker 390. So the question at that stage is: which is more accurate, the tree with CDY and 464, or the tree without, or some
version in between? I really thought about this question: how likely is it that 464 and CDY will screw things up? So if you imagine the Gleeson surname: it's Gaelic, it's been around for about a thousand years, so that's about 33.3 generations, giving thirty years per generation. How many mutations would you expect to see in a thousand years, in thirty-three generations? Well, the mutation rate for CDY is 0.035 per generation. That's 1.176 per person over a thousand years, and that's about sixteen mutations for 14 people in Lineage 2. But the observed number is not sixteen: it's actually four for CDYa and three for CDYb. So that would imply that 12 of the 16 mutations that should be there are hidden, and one can imagine that there were six mutations forward and six mutations back, and that is why they are hidden. But that would mean that the predictions based on CDY could potentially be incorrect 78% of the time. So should we be including these markers in the tree? A similar calculation for 464 suggested that the predictions based on 464 could be incorrect 62% of the time.

Thinking about it, then, I thought: well, it's going to be less of a problem in those branches related within the last two hundred to three hundred years, because there's less time to mutate back and less chance of back mutations, and therefore in these lower branches the markers are going to be more useful for branch-defining purposes. It's going to be more of a problem with those branches more distantly related, maybe six hundred or a thousand years ago: more time to mutate back, a higher chance of back mutations, less useful for branch defining. So in the end I decided to keep the markers in, keep the data in, in the knowledge that the tree is likely to be less than 100 percent correct and that I need to be especially wary of these mutations in the more distant reaches of the tree.

So, to recap, we've gone from a Y-12 hand-drawn tree to a Y-37
hand-drawn tree, to a Fluxus tree based on 37 markers, then 111 markers, 111 markers weighted, and 111 markers weighted without CDY and 464. I've plumped for the 111 markers weighted version of the tree, but I know that it's likely to be not completely correct.
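
The back-of-envelope CDY estimate used earlier to decide whether to keep the fast markers can be reproduced as below. This is a rough sketch that follows the talk's own simplification of treating each of the 14 lines as a separate 1,000-year line; the function name is hypothetical, and the inputs are the figures quoted in the talk.

```python
def expected_mutations(rate_per_generation, years, years_per_generation, n_people):
    """Return (generations, expected mutations per line, expected total across lines)."""
    generations = years / years_per_generation
    per_line = rate_per_generation * generations
    return generations, per_line, per_line * n_people


# Figures quoted in the talk: CDY rate 0.035, 1,000 years, 30 years per generation, 14 people.
generations, per_line, total = expected_mutations(0.035, 1000, 30, 14)

# Observed count quoted for CDYa, compared with the rounded expectation.
observed_cdya = 4
apparently_hidden = round(total) - observed_cdya
```

With these inputs the sketch gives about 33.3 generations, roughly 1.17 expected mutations per line and about 16 in total, so around 12 would be unaccounted for against the 4 observed on CDYa, in line with the reasoning in the talk.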

So there are certain caveats and limitations. There are missing data, because we're looking at a 111-marker tree that includes people who've only tested to 67 markers, only 37 and only 25. So Fluxus fills in the blanks, and the question is: is its best guess valid? That will always be an issue. Also, there are no adequate mutation rates for many of the markers, especially those between 68 and 111. The tree is still not anchored adequately, because we still have several versions (a version 3, a version 4, with or without CDY) even at 111 markers. Will this reduce as more people test, or as more people upgrade to 111 markers? And are there still hidden back mutations in the tree that are going to screw things up?

Also, the tree may be skewed by recent mutations, say within the last five to six generations, because not everybody has triangulated on their most distant known ancestor. Ideally this is what they should do: they should test at least two known distant cousins from each family branch in order to characterize the haplotype of each MDKA, and that helps eliminate recent mutations which might cloud interpretation of the tree. The problem also is that it's costly: it's $339 for one 111-marker test, and it will be $678 for two.

The other problem that we need to keep in mind is the risk of convergence. In my Dublin talk, which you can see on YouTube, there is a wonderful example of a match at 3 out of 111, with a time to most recent common ancestor of about 1600. There is paper-trail genealogy and SNP testing that actually confirms it goes back to 1200: the common ancestor was 1200, not 1600, so the TMRCA estimates were out by 400 years. So we do need to be very careful about convergence. It happens, but not that frequently, thankfully, especially at those higher marker levels.

So that's the first topic, building a tree with STRs. Let's look at building a tree with SNPs. There has been a whole SNP tsunami over these last couple of years. Here are some of the companies
that have been offering SNPs, and the big question here is: is fine-scale SNP testing the best method of determining branching patterns within a genetic family? And how do we do it as cheaply and as efficiently as possible, because our project members will not want to dish out an infinite amount of cash? There are two presentations that will be going up on the Genetic Genealogy Ireland YouTube channel. I encourage you to look at them, because they are excellent presentations from John Cleary and James Irvine, who will be following me, and they really helped to generate a lot of thinking for my presentation, so thank you to both of them.

Working with SNPs presents various opportunities and challenges when you come to this type of exercise of developing a mutation history tree. In declaring SNPs, there are many false positives and many false negatives, because a lot are missing, and there is constant change. You've got known SNPs, novel SNPs, shared SNPs, private SNPs. You don't get a name, you just get a location, when they first arrive. The SNP naming process is unregulated, and the same SNP can have several different names. Trying to make the results user-friendly is also a challenge, and we need more tools, more tools, more tools. Lots of help is available, however, and that is the good news, and you can get independent verification and interpretation of the results.

But here is a quick summary of the problems encountered with declaring a genuine SNP. There's a problem with detection: there might be no coverage at a certain area of the Y chromosome, so you get a false negative, where a SNP is present on the Y but remains undetected. Secondly, there's a low number of calls, because there's poor coverage of a particular area, and that can give you a false negative, because the SNP might be present but fails to meet the threshold criteria for detection or for calling. Recognition can be a problem, because again the detection filter or threshold is too strict or set too
high, and that can give you a false negative: the SNP is present in the data, but it's missed by the analysis because the threshold is set too high. It might be detectable by manual analysis of possible SNPs in BAM files, and James will talk about that. With people like James and Alex Williamson doing this type of BAM file reading, it is way beyond my ken and way beyond my interest. I don't really want to get involved in reading BAM files; please don't make me. Localization is a problem too.
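
The detection and recognition problems described above both come down to how a calling threshold treats the reads at one position. The sketch below is purely illustrative: the function, the category names and the threshold values are all hypothetical and do not represent the criteria used by any testing company or analyst.

```python
def classify_call(derived_reads, total_reads, min_reads=10, min_fraction=0.9):
    """Classify one position under an assumed, illustrative calling rule."""
    if total_reads == 0:
        return "no coverage"      # SNP could be present but is undetectable
    if total_reads < min_reads:
        return "low coverage"     # too few reads to meet the threshold
    if derived_reads / total_reads >= min_fraction:
        return "called"
    return "not called"


# The same two reads, both showing the derived base, under two different thresholds.
strict = classify_call(derived_reads=2, total_reads=2)
lenient = classify_call(derived_reads=2, total_reads=2, min_reads=2)
```

Under the stricter setting the position is left uncalled, while the more lenient one calls it, which is one way two analyses of the same data can disagree about whether a SNP is there.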

With localization, it can be difficult to place a particular SNP on the Y if it is near the centromere, in a palindromic region, or in the middle of an STR or some other type of repetitive region, and that can give you both false positives and false negatives. The SNP may be genuine, but its position cannot be known for sure, or it may vary from person to person or population to population. Then instability: there are unstable SNPs which occur frequently, or as an unpredictable mutation that occurs in different subgroups, and that can create false positives or false negatives. The SNP may or may not be genuine. And then you've got the problem of indels as well, usually dels rather than ins, deletions rather than insertions. They're not SNPs, and they may or may not be genuine as well. So those are the kinds of problems, which means that you always have to ask yourself: is the SNP really present, or is the SNP really absent? Just because it is detected doesn't mean it is there, and just because it's not detected doesn't mean it isn't there. So it creates quite a lot of confusion.

The other big challenge with SNPs is the fluid categorization. SNPs can be known SNPs ("oh yeah, we've got your results, we know about those SNPs; yeah, L21, we've known about that for some time"), or they can be new SNPs, newly discovered, never seen before. The trouble is that a lot of new SNPs, once they have been discovered, will move into the known SNPs category. So when does that happen? After about six months, or three months? So there's going to be a movement of novel or new SNPs into the known SNPs: "oh, that's two weeks old, yeah, that's known, it's not new anymore." The other thing is that the new SNPs can either be shared with other people or not shared; they can be totally unique, just to you. But then somebody else comes along and tests, and they have that SNP, and suddenly you're not the only one that has
the only copy of the Mona Lisa. Somebody else has a copy of the Mona Lisa as well, and it has gone out of your private collection, and now it's no longer a private SNP: it's actually shared by more than one person.

So these are my dad's Big Y results. If I click on the number of shared novel SNPs, 75, I get this lovely presentation of 59 novel SNPs. They don't have any names, they only have positions, and my short-term memory doesn't go beyond seven, so eight digits are impossible for me to remember. It's very unwieldy. Then if you click on the little tabs here you'll get your private SNPs. I have eleven private SNPs here, and this is compared to this person, and this person has nine or ten private SNPs compared to me. But again, there are no names, just positions.

Family Tree DNA presents the results, and then as project administrators we've looked at the results and found maybe an extra two that Family Tree DNA didn't consider to be SNPs because they didn't fall above the threshold. Then the haplogroup administrators found three other possible SNPs, and then Alex Williamson found another two, and then Nigel McCarthy in the Munster project found another one, and then we sent it to YFull and they came back with another four. So there are a lot of different people producing different results. And this is what one of our project members did: she put it into a spreadsheet for us and came up with these colored SNPs here, and then she was able to generate a little tree on that basis. The Z255 project admins have their own little tree there; those are the Gleeson-specific SNPs on John Murphy's color-coded spreadsheet. And James Kane as well: he also has an experimental tree, and we feature there as well. So a lot of different people produce different interpretations of the data. Alex Williamson is one of the gurus out there, and hasn't he done a wonderful amount of work for the community,
and we feature in the bottom right of Alex's tree, in the part dealing with Z255. It's very interesting to see the progression of this particular segment of the tree, because we've had six of our 14 members of Lineage 2 test on the Big Y. Here's what happened. In January 2015 (this is all this year) there were two Gleesons, and while the second one was being analyzed, Gleeson and Carroll were lumped together. But in April the analysis was complete, and now Gleeson and Carroll were separated, and there was an extra Gleeson now in the group, which actually split it into two. But we are only dealing with locations and not with

names. If we go on to the next slide, we see that those previous locations (let's go back) have actually turned into names, so we now have names for this block of mutations here. We have also split what was Gleeson here, having private SNPs: because more people have joined, three SNPs have moved up from the private area into the shared area of the tree. We can see the same thing happening for Carroll. On this slide you see he has no SNPs below the BY2853 panel, but on the next slide there's a whole bunch of ten or twelve SNPs that have appeared, because some other Carroll has tested, and his Mona Lisa is no longer the only Mona Lisa on the block: it's actually shared by this other Carroll. That happens all the time: the SNPs that are trapped in private collections move up into the shared area of the tree as more people test. So it's very interesting to see how that particular section of the tree has changed over a period of 10 months.

Alex also has this wonderful feature where, if you click on a name or a marker, it brings up further analysis. Here is what happens when we click on the A5629 mutation, and he's comparing these six members here. The gray is no coverage, the pink is marginal coverage, and my simplistic interpretation of his plus, one star, two star, three star is: definite SNP, probable SNP, possible SNP, unlikely SNP. It's a very crude way of interpreting it, but it allows comparison with Family Tree DNA's threshold and also with YFull's threshold. Comparing it to what we see here in his diagram, there should actually be three, but I can only count two in his diagram. Then we have A5629, A5630, A5627 and A5631. You can see them here, and you can see that they're shared across all six members, and that is represented in the diagram here. And also A5628, which occurs here, is missing only in this Gleeson member here, and it's only shared by five people, and you
can see that, one, two, three, four, five, there. So Alex has done a great job of generating this wonderful tree, but what's missing from the tree is the private SNPs of each of these individuals. If we click on any of these individuals, say this group of three here, the first Gleeson has one definite SNP and two possible ones, the second person has two definite and one improbable, and the third has five definite and a few possible and probable. So it's quite interesting that Alex is able to generate this list of possible and probable private SNPs that are in these private collections. We have about 19 SNPs altogether, 8 of which are definite, so that's about three private SNPs per member, and I'm going to come back to that.

We also sent our Big Y results to YFull, and they came up with this set of best-quality novel SNPs for this particular individual, who is my dad, and 11 ambiguous SNPs, where the coverage was just two reads but it was the same base on each of those two calls. That's all fine, but how does it compare with Alex's? Let's go back to Alex's ones. Here's Alex's analysis of my dad's results, and we have position 17915565: yep, that's up there. The next one, starting 181: yep, that's there as well. And the same for that one, yes, that's fine, and that one there, great. OK, so Alex has identified all the definite SNPs, the best-quality SNPs, same as YFull. But what about the other ones? Well, he didn't identify that, that, that, that, that or that, and that was YFull's list. So Alex didn't identify the YFull ones, and YFull didn't identify Alex's ones. So who's right and who's wrong? These are just different approaches, and there's a lot of inconsistency in declaring a genuine SNP. Are they really SNPs? There are different thresholds and filters. A lot of SNPs are trapped in these private collections and will be liberated as more people test, and the SNPs become not private anymore; they move up into
the shared area of the tree. But they will run out. How many people need to test before you lose those three private SNPs per person, and when will it happen? We're still dealing with just locations, there are no names, and these will need to be translated into SNP names in time.
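
The way private SNPs move into the shared part of the tree as more people test can be pictured with simple sets. This is a minimal sketch only: the tester names and position labels are hypothetical placeholders, not real results, and the function name is an assumption.

```python
from collections import Counter


def split_private_shared(results):
    """Split SNP positions into those seen in more than one tester and those seen in one."""
    counts = Counter(pos for snps in results.values() for pos in snps)
    shared = {pos for pos, n in counts.items() if n > 1}
    private = {tester: set(snps) - shared for tester, snps in results.items()}
    return shared, private


# Hypothetical results: two testers to begin with.
results = {
    "tester_A": {"pos_1", "pos_2", "pos_3"},
    "tester_B": {"pos_1"},
}
shared_before, private_before = split_private_shared(results)

# A third tester joins and carries one of tester_A's previously private SNPs.
results["tester_C"] = {"pos_1", "pos_2"}
shared_after, private_after = split_private_shared(results)
```

Each new tester can only shrink the private lists of the others, never enlarge them, which matches the point above that the private SNPs will eventually run out.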

In order to do that, how do you find out if a location actually has a name? Well, you can go to YBrowse. There are probably other utilities as well, and there's a variety of places you can go, and you are being pulled left, right and center. So it's kind of different strokes for different folks, but I like to say different SNPs from different lips. Who is right, or more accurately, who has estimated correctly? Because these are all estimations. The end result is that a SNP could be definite, probable, possible or unlikely, and it is subject to change.

And Sanger sequencing: here is the first of some bold predictions I am going to make. Despite next-generation sequencing and the Big Y and similar products, Sanger sequencing, the traditional chip-based technology, will still be required. Chip-based SNP testing will still be needed to confirm or refute discoveries made by next-generation sequencing, particularly in relation to probable, possible and unlikely SNPs, and we will need to have multiple deep clade panels created, not just for subclades, not just for surnames, but for genetic clusters within each and every surname project. OK, that's encouraging. I wasn't sure how bold that really was.

So, combining STRs and SNPs. Let's look at how we get on when we combine the SNPs with our previous STR tree. Just to orientate you: the yellow marker is branch 14, these two people here are indicated by this red marker, and the purple group is indicated by branches 2, 7 and 6, and you can see them circled there. Are the SNP results consistent? Well, on the face of it that looks pretty good; it looks pretty consistent. You can imagine that the BY2853 group is placed somewhere above the Gleeson most recent common ancestor, because it's shared not just with Gleesons but with Carrolls. Also, the A5629 block is somewhere around the Gleeson MRCA: some of it might be a little bit before, some of it might be a little bit afterwards. You can see
that this Glisson branch splits off, and that’s where it splits off there, and that’s consistent too. But then the A5628 mutation occurs after that break. To account for that, you’d have to imagine that Branch 9, this red arrow down here, actually connects that way. But then we’ve added a 456 mutation that isn’t in Branch 9, and in order to account for that we have to do a back mutation from 15 to 16 to make it consistent with the STR data. That was the only change that was made to our STR tree, but there are several things we need to tidy up. So let’s tidy it up. There we go.

OK, so it’s tidied up, and the mutation sequence is: the BY2853 branch here; then the A5629 block plus the indel that was identified by Nigel McCarthy; then we have a split into G68, which is down here, the Glisson branch; and then the A5628 mutation, with a split into the Y16880 branch and ultimately the A660 branch of Branch 9. These are three brothers, incidentally, two of whom have done the Big Y test.

So what do we see from there? Well, the SNPs are actually further up the tree than the STRs. They tell us nothing about the branches on the left. And we’re only using definite SNPs here; we’re not using probable, possible or unlikely SNPs. And we still have all those private SNPs trapped in the private collections.

So the next thing I did was look at Nigel McCarthy’s approach. Nigel pioneered, or was the first to use as far as I’m aware, combining SNPs and STRs. If we look at this section of his tree, there are a few differences in his approach to mine. The first is that there isn’t any BY2852 block (those are the pink ones), because they’re in an area of poor coverage, and Nigel decides, no, they’re below my detection threshold, I’m not including them in my tree. Which is fair enough; it’s a judgment call. Secondly, he’s discovered this extra marker, and
that’s something that nobody else has picked up. Again, the pink SNPs are omitted from this particular branch. He has included the private SNPs this time, so we’ve got one there, two there and five over here. And he’s including 67-marker data and also 111-marker data; anything less than 67 is excluded, so he’s not looking at 37 markers or 25 markers. But the biggest difference, and I think this is really important, was his starting point. His starting point was the modal haplotype for the Z255 haplogroup subgroup, and mine was just the modal haplotype for my surname project. I think his approach is better, so all the work I’ve done I’m going to rewrite slightly so it’s consistent with what Nigel has done. But that’s for another day, and another bottle of whisky and another Valium.

So, moving on to dating the branching points in the tree, and I’m almost finished here. Ian McDonald has written a wonderful report, and he does a great explanation of TMRCA calculations; check it out. Similarly, there is a TMRCA Case Studies project on Family Tree DNA; check it out. Up till now we know there are branches that come off our modal haplotype line, but which came first, and can we place them in the correct order? So here is our current tree.

YFull: let’s look at SNPs first. YFull now offers an analysis of TMRCA estimates for SNPs, and you can see it has a 95% confidence interval up here. If you just hover over that, it produces this 95% confidence interval, and that’s actually minus 62, plus 50, so it’s more or less the estimate plus or minus 50%. It’s still quite a broad range, but that’s what YFull have been able to do. They also include the calculation formula: if you click on the info, it brings up how they calculate it, and for those of you interested in SNP TMRCA estimates it’s very useful. If we put this into the tree, it gives us an estimate of about 750 years for the MRCA up here, 325 years for this particular mutation here, and 50 years for this particular mutation here, for this group of three brothers, two of whom are tested on Big Y. So that’s a reasonable estimate.

If we look at STRs, then, you’ll be familiar with the TiP report on your GAP pages, and this is a comparison between
G21 and G57, two members in my group. What I like to do is to have, say, 90% or 95% confidence intervals with a median or mean in the middle. So the mean is roughly the 50% point, and the 90% limits would be 5% on one end and 95% on the other. Those limits mean that I’ve got a 90% probability that the actual TMRCA lies within them. It also means that I’ve got a 10% chance, 5% at either end, that it doesn’t, so one in ten times I’m going to be wrong. You have to remember that probability is only probability; it can be right and it can be wrong. It gives you a 90% chance of the true value being included within those limits and a 10% chance of it being excluded.

So if I look for the 5% level in this list of probabilities, it comes before one, so I’m putting it down as zero. If I look for the 50% probability, it’s around about generation number three, so I’m going to say three. And if I look for the 95% limit, it’s around number ten. I put those into a probability matrix. This was just looking at Y-37 markers, but I’ve done it for 12, 25, 37, 67 and 111. You’ll notice here the genetic distance is 1 out of 12, 1 out of 25, 1 out of 37, 2 out of 67 and 6 out of 111, meaning that the 50% estimate was 17 generations at 12 markers; it dropped to 7 at 25 and 3 at 37, but it went back up to 4 at 67, and now it’s 8 at 111. That really helps to illustrate that marker tests can be very misleading. You may look very close at 37 and then find you’re actually much more distant at 111, and vice versa: you can look very distant at 37 and then find you’re actually very close at 111. So it’s worthwhile upgrading people to 111 markers, if possible, for that reason.

But look at the ranges. Because I’ve converted these 5% and 95% numbers into percentages, we get plus 186%, plus 200%, plus 175%, plus 88%. These are very wide ranges, and they’re also more skewed towards the distant
generations rather than the recent generations. So it’s not a normal distribution, it’s a skewed distribution, and that’s important to remember: ranges are wide and skewed toward distant generations. 111 markers gives the best estimate with the smallest upper range, so 111 is just minus 52, plus 88. So increasing the number of markers you test brings in the arms of your distribution, but even at 111 markers the upper limit is still almost double the mid value: 15 is almost twice the mid value of 8.

So what I did then was TMRCA triangulation. This is where I put all of these figures into a matrix, and I’ve done it for 50%, 5% and 95%. This is the 50% sheet, and I’ve created a matrix of all of these numbers I’ve generated. The idea here is to triangulate, because normally the TiP report just says A versus B, and that’s all you ever get, A versus B. But some of these people are in sub-branches where there may be four people joining up to a most recent common ancestor, and I can compare A versus B versus C versus D, B versus C versus D, and C versus D, and by that process I can triangulate on the time to most recent common ancestor. So, for example, here is 3. This one is 3. This one there has values of 6, 4 and 6, depending on who you compare. This one over here, there are two 3s, because I’m comparing the member there with the two brothers over there. This one is 8, 3 and 11. This one has a comparison with about nine people in it. This one has a comparison with just one person, 12, and so on. And I can compare these with the SNP estimates: if I convert these SNP TMRCAs to generations, just for comparison, this one here goes to 11, this one here goes down to 2, and the one at the top goes to 25. So that’s reasonably consistent. Then I can average them all out (there’s the averaging process), and it gives you an idea of the overall time to most recent common ancestor. By triangulating and using several pieces of data, I’m probably getting a more accurate representation.

But will additional STR markers help refine the TMRCA estimates? This is from YFull, because YFull doesn’t just reanalyze your Big Y results, it also gives you 495 STR marker values. But 5% of them are different from Family
Tree DNA, some are missing, and some are not detected by the next-generation sequencing, so there is a question mark over the reliability of these YFull STR markers. But I’m also able to find that I have 35 mutations out of 495 between the first two members, then 24 between the next two, and 9 between the next two. This I can slot into the McDonald TMRCA calculator online (there’s the link), and that will allow me to put in the number of non-matching markers and also fix the mutation rate. There is the cumulative probability graph that is generated. I’m looking for the 5%, 50% and 95% levels, and I can put those into my previous chart. You can see that with 24 mutations out of 495 markers the 50% mark is 9, which is very similar to 111, but the range is 6 to 12. The range is reduced even further, so I’ve pulled in those arms of the distribution curve even further, to plus or minus one-third. So that is a much more accurate estimate than what we’ve seen with the previous ones, and based on that I’m able to change some of these estimates within the diagram. That’s what we end up with there: these TMRCA estimates based on the YFull 495 STRs.

Now, there’s only one inconsistency. The prediction here is for a branching point nine and a half generations ago, but then if you follow it down, the next one is 13 and the one below that is 12. That’s because this particular estimate is based on a 37-marker value, the one here is based on 25, and this one here is only 37 markers. So again, lower marker values will probably give you less accurate placement of your TMRCA estimates. And the other thing, of course, is that the people who have the closest association can go back to the genealogy and start collaborating with each other.

Let’s combine these STRs, SNPs and genealogy. This is what you get when you add in your genealogy to each of these
people. You can’t read it here, but you’ll get the idea that each of these yellow blocks represents an ancestor. The problem is some people haven’t supplied their ancestry, so it comes back to the basics: we need to encourage our project members to supply us with their pedigree.

The other one that’s of interest is this Glisson one over here. Now, this is a very early branch, and Glisson, according to the surname dictionaries, is an ancient form of the name Gleeson. There are no Glissons in the 1901 census in Ireland, there are no Glissons in the 1911 census, and there is one Glisson left in the Irish phonebook today. We are looking at an ancient form of this particular name, and if you look down at the bottom here, there are only four Glissons in the whole 1850 Griffith’s Valuation.

So what I also encourage people to do is to put in their Most Distant Known Ancestor profile. You can see the various things there. What’s particularly useful is the nickname, the residence and the order of the children’s births, because the Catholic naming convention will give you a clue to the grandparents’ first names: the first son is named for the father’s father, the second son for the mother’s father, and so on.

But I’m going to finish off now, because I’m a little over time, even though we started late. How close are we to this combined mutation and family history tree? Well, we have something that’s approaching it, but we’re not there yet, and we may not get there with the current technology. So is it possible? Well, it’s possible to do something similar, but I don’t think we’re at the stage yet where we will be able to substitute ancestors for DNA markers, or DNA markers for ancestors.

Let’s look at some of the opportunities, though, for the years ahead, and the lessons we’ve learned today. Transcription errors are easy: you need to triple-check and automate your process as much as possible. More tools, more tools, more tools. STRs: well, there are lots of parallel mutations. Where are the back mutations? They’re all hidden, and they’re going to be a problem. 111 markers is best to define the branching pattern. Placement of CDY and 464 is likely to be incorrect, especially in upstream generations. Most project members have not tested other male
cousins to triangulate on their MDKA, and ideally they should do so. Convergence may be a problem; there was the example of 3 out of 111. We need more people to test, and we need more people to upgrade to 111 markers. YFull analysis liberates 495 STRs, but can you be 100% sure that they are trustworthy? I don’t know.

Regarding SNPs: it’s difficult to declare a genuine SNP. Different SNPs from different lips: definite, probable, possible, unlikely. There are likely to be lots of false negatives and false positives. There are no names, just locations, and the locations are too long for me to remember. Naming is unregulated. Many SNPs are kept in private collections. And the current next-generation sequencing is discovery, not confirmatory; further testing with other NGS, or possibly traditional Sanger sequencing, will be needed to confirm a lot of the potential SNP discoveries we have made so far.

Regarding combining STRs and SNPs: adding SNPs changed the upper reaches of the tree, and the SNPs are all still located relatively upstream; STRs offer the better definition downstream. Start with the modal of your haplogroup subgroup rather than the modal of your surname genetic cluster.

Regarding TMRCA estimates: SNP-based estimates work best for the distant branching points, and are very useful for haplogroup projects but less useful for surname projects. STR-based estimates have wide ranges and are skewed toward distant generations. Even at 111 markers the upper range is still double the mid value, and that’s quite a range. Even 495 markers have a wide range, because plus or minus one-third is still quite a lot.

In terms of combining STRs, SNPs and genealogy: we need to overlay documentary data on DNA. Some pedigrees are not supplied or are incomplete. We need to add markers of potential relatedness, and that’s why the MDKA profile was set up by me. And we probably need to take a one-name study approach. But having said all that, the early draft mutation
history tree serves as a very useful basis for further development. It will evolve over time as more people test and upgrade, it will facilitate collaboration between project members, and it will also help attract new members to your project if you have something like this up on your project blog.

I’m going to end now with my Vision 2020: where will we be in five years’ time? And I’m going to give you some bold predictions. Who wants to hear some bold predictions? OK, let’s start off with the first one. What would happen if everyone upgraded to 111 markers? Well, I think you’d get better definition of the branching pattern, and you’d get more precise TMRCA estimates with a narrower range. If everyone did the Big Y, the SNPs would only really be good for the upstream branches of your family tree, say up to about 1500 AD, but we will run out of private SNPs at some point in time. What would happen if everyone tested on a surname-specific panel? Well, again, it would elucidate the branching pattern, maybe up to about 1500, but much later than that, I’m not sure. Supposing everyone did whole genome sequencing, what would happen then? Well, would that be any better than the Big Y? I mean, we’re there already, aren’t we, with the Big Y, with whole genome sequencing, even though we can only sequence 48% of the chromosome. Would more advanced technology give us better coverage, better read lengths? Would it make a difference? Would it actually produce SNPs that were ancestrally informative? And what would happen with the probable, possible and unlikely SNPs?

Well, here are my bold predictions, and this is to help stimulate discussion, because I came here to learn. If you agree with these bold predictions, I want you to come up to me and buy me a pint and say why you agree with my predictions. And if you don’t agree with my predictions, I want you to come up to me and I’ll buy you a pint, and you can tell me why you disagree with my predictions.

But here’s the first one. What is the most useful for surname projects, more SNPs or more STRs? I say more STRs, because we will run out of private SNPs. And it’s not 111 markers versus 50,000; it’s actually 500 STR markers versus about 40, because the average number of SNPs shared by any two people in this room is about 40, not 50,000, but the average number of markers shared between any two people from an STR point of view is about 500. We all have the same 500 STR markers. So the first bold prediction: in 2020, Family Tree DNA will offer 500 STRs for $199. Not only that, they will be in ten panels, and you can order any panel in any order.

Second bold prediction: how do we best generate a surname-specific SNP panel? Well, how
many discovery Big Y tests are needed to liberate sufficient private SNPs to adequately define the surname panel? I would say roughly about five to ten. I have no idea; I’m just throwing it out there, and I might be wrong. But in the absence of anything else, I’m throwing it out there as a prediction. That would mean, for my Gleeson Lineage 2, where we’ve tested six, we need another couple of Big Y tests among that group to liberate enough private SNPs to define our specific surname panel. So by 2020, Family Tree DNA will offer over 4,000 surname-specific SNP panels for $100 each. That’s slightly wrong, because it shouldn’t be surname-specific; it’s actually lineage-specific within surnames.

The last thing is, in 2020 this is what the GAP pages will look like. They’re actually very similar to what we currently have; you know, if it ain’t broke, don’t fix it, right? But if you look down this list, you can see it says here “Generate MHT”, generate a mutation history tree. If you click on that button, it brings up a drop-down menu and asks you: Lineage 1, 2, 3 or 4? If you click on Lineage 2, it brings up the Lineage 2 mutation history tree.

You guys are absolutely fantastic, and I mean that sincerely, because this is a huge collaborative effort. If it wasn’t for Bennett and Max, none of us would be here today. So, Bennett, Max: without you there is no us, so thank you very much for that. These were the other people who have helped: Janine Cloud, an absolute stalwart; the entire Family Tree DNA team; Judi Class, my co-admin; Lisa Little, project member; James Irvine, Ralph Taylor, John Cleary and all the rest of the people there, the entire genetic genealogy community. So I just want to say thank you very much for your kind attention.

Thank you very much, Dr. Gleeson. Because we ran over a few minutes, and because there’s some similarity and overlap between this presentation and the next presentation, we’re going to hold questions for lunch.
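
Editorial sketch: the step described earlier in the talk, reading the 5%, 50% and 95% points off a cumulative probability list and then expressing the range as a percentage of the mid value, could be illustrated roughly as follows. The cumulative probabilities are invented for illustration (they are not the G21 versus G57 TiP figures), and the function names are hypothetical. The talk rounds a 5% point that falls before generation 1 down to zero; this sketch simply reports the first generation at which each level is reached.

```python
from typing import Sequence, Tuple


def generation_at(cumulative: Sequence[float], level: float) -> int:
    """Return the first generation (1-based) whose cumulative probability
    of containing the MRCA reaches `level`."""
    for generation, probability in enumerate(cumulative, start=1):
        if probability >= level:
            return generation
    raise ValueError("level not reached within the listed generations")


def relative_range(low: float, mid: float, high: float) -> Tuple[float, float]:
    """Express the lower and upper limits as percentages of the mid value."""
    return (100.0 * (low - mid) / mid, 100.0 * (high - mid) / mid)


# Hypothetical cumulative probabilities for generations 1..12 (made-up values).
example = [0.12, 0.33, 0.52, 0.66, 0.76, 0.83, 0.88, 0.915, 0.94, 0.955, 0.97, 0.98]

low, mid, high = (generation_at(example, p) for p in (0.05, 0.50, 0.95))
print(low, mid, high)  # 1 3 10

# Figures quoted in the talk for 495 markers: mid value 9, range 6 to 12.
print(relative_range(6, 9, 12))  # roughly minus one-third, plus one-third
```

With the 111-marker figures from the talk (mid value 8, upper limit 15), `relative_range` gives an upper range of +87.5%, which the talk rounds to plus 88.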

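Editorial sketch: the triangulation idea from the talk, averaging every pairwise 50% TMRCA estimate that spans the same branching point instead of relying on a single A-versus-B comparison, might look something like this. The member labels and generation values are hypothetical, and plain averaging is an assumption about what the averaging process on the slide does.

```python
from itertools import product
from statistics import mean
from typing import Dict, FrozenSet, Iterable


def branch_point_estimate(
    pairwise: Dict[FrozenSet[str], float],
    left: Iterable[str],
    right: Iterable[str],
) -> float:
    """Average the pairwise TMRCA estimates (in generations) for every pair
    made of one member from each side of a branching point."""
    values = [pairwise[frozenset((a, b))] for a, b in product(left, right)]
    return mean(values)


# Hypothetical 50% estimates, in generations, between project members.
pairwise = {
    frozenset(("A", "C")): 8,
    frozenset(("A", "D")): 3,
    frozenset(("B", "C")): 11,
    frozenset(("B", "D")): 6,
}

# Members A and B sit on one side of the branching point, C and D on the other.
estimate = branch_point_estimate(pairwise, ["A", "B"], ["C", "D"])
print(estimate)  # 7
```

The same function could be run on the 5% and 95% sheets to carry a range alongside the mid value, though the talk does not say whether that was done.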