Time Calculations

The time calculations presented on the migration history page are very rough estimates.  The starting
point is the actual number of marker mutations seen within families in recent centuries.  These
findings are then used to estimate how much earlier a common ancestor lived when additional
mutations are found.  However, as one goes farther back in time, markers begin to pick up mutations
that return them to an original value, which hides two mutations.  So one must make adjustments to
counter this situation as one finds more and more mutations.

One cannot provide an accurate estimate of how many mutations might become hidden in deep
ancestry.  There is, however, a reference point: an Alaskan skeleton scientifically dated to about
10,000 yrs old, which testing shows belongs to haplogroup Q.  Haplogroups G and Q are both
offshoots of haplogroup F, and one finds that G and Q persons have about 67 mutations at
67 markers when compared to F persons.  Presuming that haplogroup Q developed in Asia before the
migration, and that the migration is thought to have occurred 12,000 yrs ago based on archeological
information, perhaps a minimum of 136 mutations would be expected rather than the usual 67.  So about
half of the mutations may be hidden when getting back to a shared common ancestor 12,000 yrs ago.
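
As a rough check of the arithmetic in the paragraph above, here is a minimal Python sketch.  The two
figures (67 observed, 136 expected) come from the text; the variable names are only illustrative.

```python
# Figures taken from the paragraph above.
observed_mutations = 67    # G or Q persons compared to F persons at 67 markers
expected_mutations = 136   # suggested minimum if no mutations were hidden

# Mutations that would have to be hidden to explain the gap.
hidden_mutations = expected_mutations - observed_mutations
hidden_fraction = hidden_mutations / expected_mutations

print(f"hidden mutations: {hidden_mutations}")
print(f"hidden fraction:  {hidden_fraction:.2f}")
```

The hidden fraction comes out at roughly one half, which is the "about half" mentioned above.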

My methodology is certainly not based on a rigid scientific formula.  There are reasons, however,
for using a rough estimate methodology.

One of the formal ways of calculating time relationships based on mutations makes use of the specific
mutation rate calculated at each marker. 

The mutation rates calculated for Family Tree DNA's markers 38 to 67, available at
http://freepages.genealogy.rootsweb.ancestry.com/~geneticgenealogy/MR.htm
appear to be too slow when compared with the listings of G persons available to me.  Rates for the
first 37 markers are also listed there.

The first 37 markers have a cumulative mutation rate of .18208 and markers 38 to 67 have .05276.

Thus the second section has a cumulative mutation rate a little under a third of that of the first
section.  Yet the available G samples show a higher proportion of their mutations in markers 38 to 67
than these rates would predict.

So it is problematic to use a formula based on the mutation rate at each marker.
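
The comparison of the two cumulative rates can be worked through in a few lines of Python.  This is
only a sketch of the arithmetic; the two rates are the ones quoted above, and the "share of total"
figure is my own way of expressing what proportion of mutations the rates would predict for
markers 38 to 67.

```python
# Cumulative mutation rates quoted in the text.
rate_markers_1_to_37 = 0.18208
rate_markers_38_to_67 = 0.05276

# How the second panel compares to the first.
ratio = rate_markers_38_to_67 / rate_markers_1_to_37

# Proportion of all 67-marker mutations the rates predict for markers 38 to 67.
share_of_total = rate_markers_38_to_67 / (rate_markers_1_to_37 + rate_markers_38_to_67)

print(f"ratio of second panel to first: {ratio:.3f}")
print(f"predicted share in markers 38-67: {share_of_total:.3f}")
```

If a set of samples shows a noticeably larger share of its mutations in markers 38 to 67 than the
predicted share, the listed rates for those markers look too slow for that set.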

Men experience mutations at the various markers within a range of possibilities.  Mutations can
unpredictably result in either a decrease or an increase in value.  So pairs of men sharing the same
ancestor will show differing numbers of mutation differences when their samples are compared.
Some collections of mutations will move a man unusually far away from another man.  And by
chance a collection of mutations may move a man unusually close to another man.  The
excesses seen in an individual can be overcome by looking at all the men within a specific
group and comparing them to another group.  The excesses will become the far ends of the
range of number of mutations when comparing the groups.  It is the midpoint of the range that
is most likely closest to the actual number of mutations that should have occurred in the time
period since a shared ancestor lived.  When comparing persons descended from the same ancestor
who lived 3000 yrs ago, for example, it is quite typical to find some men showing 20 values different
at 67 markers, others with 30 values different and others with 40 values different.  The most
likely number of mutations to have occurred in that time period would be about 30, the
midpoint.
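
The midpoint idea can be sketched in Python as follows.  The three counts are the example values
from the paragraph above, and the function name is hypothetical.

```python
def midpoint_of_range(difference_counts):
    """Return the midpoint between the smallest and largest mutation-difference counts."""
    return (min(difference_counts) + max(difference_counts)) / 2

# Example values from the text: differences at 67 markers among men
# descended from the same ancestor of about 3000 yrs ago.
differences = [20, 30, 40]
midpoint = midpoint_of_range(differences)
print(midpoint)
```

Note that this uses the middle of the range rather than the average, so it depends only on the two
extreme values in the group.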

In a 2009 study of the genetics of the Jewish priesthood using 17 markers, the calculated
time to most recent common ancestor was about 3000 yrs with a variation of approximately
plus or minus 1000 yrs.  The variation range increases the farther back in time one goes.

Most published research that provides time calculations based on differences in marker results has
used fewer than 10 markers.  These small marker sets are particularly prone to distorted results because
a multi-step mutation in just one marker will suggest significantly more time separation than exists.  This
single-marker problem is greatly diluted if 67 markers are available.  The problem of small marker
sets makes time calculations in most published studies a bit suspect to me.  For this reason, the
67-marker set from Family Tree DNA is the most desirable set to use.  In earlier attempts, the use
of only 37 markers did not produce information as accurate, because men can sometimes have unusual
numbers of mutations in either direction in markers 38 to 67.

My rough estimate method using 67-marker comparisons:
(a) For about the last 1000 yrs, it makes use of known times to common ancestors and does not
correct for hidden mutations, which are still uncommon over that period.
(b) When 16 mutation differences are reached, 2 mutations are added to the total.
Then additional mutations are added in increasing quantities until one reaches 67 mutations.
These 67 mutations represent the average number of mutation differences in comparing G
and F haplogroup men.  It was assumed in a very conservative manner that these 67
mutations occurred prior to the arrival of the first haplogroup Q person in the Americas.
There is a dated Q skeleton from over 10000 yrs ago, and the Q and G haplogroups are both
descended from a haplogroup F person.
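
The general shape of such a correction can be illustrated with a short Python sketch.  Only the two
end points come from this page: 2 mutations added at 16 differences, and 67 observed differences
corresponding to a minimum of about 136 expected.  The straight-line growth between those points, and
the function name, are assumptions made purely for illustration and are not the actual schedule used.

```python
def corrected_mutations(observed):
    """Hypothetical hidden-mutation correction for a 67-marker comparison.

    ASSUMPTION: the number of added mutations grows in a straight line between
    the two end points mentioned in the text. The real schedule is not given.
    """
    start_observed, start_added = 16, 2     # from the text
    end_observed, end_total = 67, 136       # from the text
    if observed < 0 or observed > end_observed:
        raise ValueError("sketch only covers 0 to 67 observed differences")
    if observed < start_observed:
        return float(observed)              # no correction in recent times
    end_added = end_total - end_observed    # 69 mutations added at 67 observed
    progress = (observed - start_observed) / (end_observed - start_observed)
    return observed + start_added + (end_added - start_added) * progress

for observed in (10, 16, 40, 67):
    print(observed, round(corrected_mutations(observed), 1))
```

Any other curve passing through the same end points would fit the description equally well, so the
intermediate values here should not be read as estimates.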

Not only do the added mutations accelerate due to increasing numbers of hidden mutations, but
the range of possibilities for the actual year likewise grows due to the increased uncertainty.  
Any dates calculated are merely the midpoints of a range.

Here are some methods used for time calculations based on number of mutations.  Each purports
to provide a correct date for when the common ancestor lived, and I have some problems with
each of them:

The methodology used in most studies in journals makes reference to the research
produced by a Russian scholar and co-authors:
http://mbe.oxfordjournals.org/cgi/content/full/18/12/2141
http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1181912&blobtype=pdf
http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=40439&blobtype=pdf
He explains in two of the studies the need for several hundred marker results in order to establish
reliable branch lengths in a phylogenetic network.  And, in fact, the authors seem to use
for their time calculations either BATWING, a program that uses Bayesian analysis, or the AMOVA
program.

This is Tim Janzen's time to most recent common ancestor calculator using a variance calculation.
http://www.timjanzen.com/dna.html [this requires the paid version of Excel on recent computers]
with some more recent comments
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2009-07/1247384275
This program is based on these formulas:
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2008-02/1203007581

Marko Heinila has posted the initial results of his time studies of haplogroup G to the DNA-Forums
site.  One has to register there in order to view postings.

He says he uses as his formula m * g = 2*m1 = 2*m2, where m1 and m2 are the up
and down step parameters in the Wikipedia description of the Skellam distribution
http://en.wikipedia.org/wiki/Skellam_distribution
After this, one has p(d) = exp(-m*g) * I_d(m*g),
which is the probability of a net change of d steps happening with mutation rate m per generation
and a time span of g generations.  Here I_d is the modified Bessel function of the first kind.  The
assumption is that the probabilities of up and down mutations are equal and constant per unit of time.
He cites Bruce Walsh's 2001 journal article on DNA time estimates as another basis of his work.
In the calculations, there is a log likelihood contribution of log(p(d)) from each marker in each network
link.  Constraints on the g parameters are used to enforce increasing time on paths from the leaves to
the root (the root has to be picked separately).  Given a net change of d, one could also calculate the
distributions of up and down mutations (which follow Poisson distributions) with the constraint that d
is the difference of the two Poisson variables.
He also calculates a 10,000-year convergence time for the available haplogroup G samples, which is not
the same as the age of haplogroup G.
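
The formula p(d) = exp(-m*g) * I_d(m*g) can be tried out with a short Python sketch.  This is not his
program, only an illustration of the formula as quoted.  The Bessel function is computed from its
standard power series, and the mutation rate, generation count and list of net changes in the usage
lines are made-up values.

```python
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind, I_order(x), from its power series."""
    order = abs(order)  # I_d and I_(-d) are equal for whole-number d
    return sum(
        (x / 2) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
        for k in range(terms)
    )

def net_change_probability(d, m, g):
    """p(d) = exp(-m*g) * I_d(m*g): probability of a net change of d steps."""
    mu = m * g
    return math.exp(-mu) * bessel_i(d, mu)

def log_likelihood(net_changes, m, g):
    """Sum of log(p(d)) over the markers of one link, all sharing rate m."""
    return sum(math.log(net_change_probability(d, m, g)) for d in net_changes)

# Hypothetical values: rate per marker per generation, and generations elapsed.
m, g = 0.002, 100
print(net_change_probability(0, m, g))       # no net change at a marker
print(net_change_probability(1, m, g))       # net change of one step up
print(log_likelihood([0, 0, 1, -1, 0], m, g))
```

A single rate m is used for every marker here to keep the sketch short; using a separate rate for
each marker would only change the argument passed for each term of the sum.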

In July 2009 there were quite a number of postings at the general DNA site at Rootsweb
concerning confidence intervals used in these calculations:
This one from Ken Nordtvedt is a good starting point
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2009-07/1247680242
then
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2009-07/1247070327

The listed mutation rates for the markers used at the YHRD database
http://www.yhrd.org/Research/Loci
and the 2nd study by the Russian author above both summarize
various studies of mutation rates.

Anatole Klyosov, a chemist, proposed a logarithmic methodology in his self-published
article in the first edition of the Proceedings of the Russian Academy of DNA Genealogy
(in English) in 2008, which apparently can be downloaded for free
http://www.lulu.com/content/paperback-book/proceedings-of-the-russian-academy-of-dna-genealogy-volume-1-no-1/2677603
He calculates that the common ancestor of the G2c Jewish group lived 575 yrs ago, based on
25 years to a generation, and that this G2c group's common ancestor with ancient G lived
about 12,400 yrs before present.  He also indicates that, according to his method, the common
ancestor of G1 and G2 persons lived about 8800 yrs before present.

The article dealing with obtaining valid DNA from an Alaskan skeleton 10000 yrs old and using such
DNA info to date the origins of haplogroups is found at (not available for free)
http://www.ncbi.nlm.nih.gov/pubmed/17243155?ordinalpos=3&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum