Time Calculations
The time calculations presented on the migration history page are very rough estimates. They use the actual number of marker mutations seen within families in recent centuries as the starting point. These findings are then used to determine how much earlier a common ancestor lived when additional shared mutations are found. However, as one goes farther back in time, markers begin to pick up mutations back to an original value, and each such back-mutation hides two mutations. So one must make adjustments to counter this effect as one finds more and more mutations.
One cannot provide an accurate estimate of how many mutations might become hidden in deep ancestry. But a rough check is possible. An Alaskan skeleton scientifically dated to about 10,000 yrs old is known by testing to belong to haplogroup Q, and haplogroups G and Q are both offshoots of haplogroup F. G and Q persons have about 67 mutations at 67 markers when compared to F persons. Presuming that haplogroup Q developed in Asia before the migration, and that the migration is thought to have taken place 12,000 yrs ago based on archeological information, perhaps a minimum of 136 mutations would be expected rather than the usual 67. So about half of the mutations may be hidden when getting back to a shared common ancestor 12,000 yrs ago.
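The arithmetic behind that "about half" figure can be laid out as a minimal sketch. The variable names are illustrative only; the two counts are the ones quoted above, and nothing else is assumed.

```python
# Counts quoted in the text: mutations actually seen at 67 markers,
# and the rough minimum expected for a separation of about 12,000 yrs.
observed_mutations = 67
expected_mutations = 136

# Mutations that would have to be hidden by back-mutations.
hidden_mutations = expected_mutations - observed_mutations
hidden_fraction = hidden_mutations / expected_mutations

print(f"hidden: {hidden_mutations} of {expected_mutations} ({hidden_fraction:.0%})")
```

The hidden share comes out just over one half, which is where the "about half" statement comes from.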
My methodology is certainly not based on a rigid scientific formula. There are reasons, however, for using a rough-estimate methodology.
One of the formal ways of calculating time relationships based on mutations makes use of the specific mutation rate calculated at each marker. The available mutation rates calculated for Family Tree DNA's markers 38 to 67 at
http://freepages.genealogy.rootsweb.ancestry.com/~geneticgenealogy/MR.htm
are apparently too slow for the listings of G persons available to me. Rates for the first 37 markers are also listed there. The first 37 markers have a cumulative mutation rate of .18208 and markers 38 to 67 have .05276. Thus the second section has a cumulative mutation rate about a third of that of the first section. Yet the available G samples show a higher percentage of their mutations than this in markers 38 to 67. So it is problematic to use a formula based on mutation rates at each marker.
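As a rough sketch of this comparison, the two cumulative rates quoted above can be turned into a ratio and into the share of mutations one would expect in markers 38 to 67. The assumption here (mine, not taken from the rate listing) is that mutations fall across the two sections in proportion to their cumulative rates. The helper observed_share and the counts passed to it are hypothetical.

```python
# Cumulative mutation rates quoted in the text.
rate_markers_1_37 = 0.18208
rate_markers_38_67 = 0.05276

# How the second section compares with the first ("about a third").
ratio_second_to_first = rate_markers_38_67 / rate_markers_1_37

# Assumed: mutations are spread in proportion to the cumulative rates,
# so this is the expected share of all mutations in markers 38 to 67.
expected_share_38_67 = rate_markers_38_67 / (rate_markers_1_37 + rate_markers_38_67)


def observed_share(differences_1_37, differences_38_67):
    """Fraction of a pair's mutation differences found in markers 38 to 67."""
    total = differences_1_37 + differences_38_67
    return differences_38_67 / total if total else 0.0


print(round(ratio_second_to_first, 2), round(expected_share_38_67, 2))
# Hypothetical pair: 20 differences in markers 1-37 and 10 in markers 38-67.
print(observed_share(20, 10) > expected_share_38_67)
```

A sample pair whose observed share sits well above the expected share is the kind of mismatch described above.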
Men experience mutations at the various markers within a range of possibilities. Mutations can result unpredictably in either a decrease or an increase in value. So men sharing the same ancestor will show differing numbers of mutations when any two samples are compared. By chance, some collections of mutations will move a man unusually far away from another man, and others will move a man unusually close to another man. The excesses seen in an individual can be overcome by looking at all the men within a specific group and comparing them to another group. The excesses will become the far ends of the range of number of mutations when comparing the groups. It is the midpoint of the range that is most likely closest to the actual number of mutations that should have occurred in the time period since a shared ancestor lived. When comparing persons descended from the same ancestor who lived 3000 yrs ago, for example, it is quite typical to find some men showing 20 values different at 67 markers, others with 30 values different, and others with 40 values different. The more likely number of mutations that should have occurred in that time period would be about 30, the midpoint.
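The midpoint idea can be sketched in a few lines. This is only an illustration under stated assumptions: a difference is counted as one marker showing a different value (not the size of the step), the function names are hypothetical, and the short haplotypes are made up.

```python
from itertools import product


def count_differences(haplotype_a, haplotype_b):
    """Number of markers at which two haplotypes show different values."""
    return sum(1 for a, b in zip(haplotype_a, haplotype_b) if a != b)


def midpoint_of_range(group_a, group_b):
    """Midpoint between the smallest and largest pairwise difference counts."""
    counts = [count_differences(a, b) for a, b in product(group_a, group_b)]
    return (min(counts) + max(counts)) / 2


# Made-up 6-marker haplotypes, for illustration only.
group_a = [(13, 24, 14, 11, 12, 14), (13, 23, 14, 11, 12, 14)]
group_b = [(14, 22, 15, 10, 12, 14), (14, 22, 15, 11, 12, 14)]

print(midpoint_of_range(group_a, group_b))
```

With the 20, 30 and 40 differences of the example in the text, the same rule gives (20 + 40) / 2 = 30.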
Using 17 markers, a 2009 study of the genetics of the Jewish priesthood calculated a time to most recent common ancestor of about 3000 yrs, with a variation of approximately plus or minus 1000 yrs. The range of variation increases the farther back in time one goes.
Most published research that provides time calculations based on differences in marker results has used fewer than 10 markers. These small marker sets are particularly prone to distorted results because a multi-step mutation in just one marker will suggest significantly more time separation than exists. This single-marker problem is vastly diluted if 67 markers are available. The problem of small marker sets makes time calculations in most published studies a bit suspect to me. For this reason, the 67-marker set from Family Tree DNA is the most desirable set to use. In earlier attempts, the use of 37 markers did not produce as accurate information, as men can sometimes have unusual numbers of mutations in either direction in markers 38 to 67.
My rough estimate method using 67-marker comparisons:
(a) For about the last 1000 yrs, the method makes use of known times to common ancestors and does not correct for hidden mutations, which are as yet uncommon.
(b) When 16 mutation differences are reached, 2 mutations are added to the total. Then additional mutations are added in increasing quantities until one reaches 67 mutations.
These 67 mutations represent the average number of mutation differences in comparing G and F haplogroup men. It was assumed in a very conservative manner that these 67 mutations occurred prior to the arrival of the first haplogroup Q person in the Americas. There is a dated Q skeleton from over 10,000 yrs ago, and the Q and G haplogroups are both descended from a haplogroup F person.
Not only do the added mutations accelerate due to increasing numbers of hidden mutations, but the range of possibilities for the actual year likewise grows due to the increased uncertainty.
Any dates calculated are merely the midpoints of a range.
Here are some methods used for time calculations based on number of mutations. I have some problems with each of them, since each purports to provide a correct date for when the common ancestor lived:
The methodology used in most studies in journals makes reference to the research produced by a Russian scholar and co-authors:
http://mbe.oxfordjournals.org/cgi/content/full/18/12/2141
http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1181912&blobtype=pdf
http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=40439&blobtype=pdf
He explains in two of the studies the need for several hundred marker results in order to establish reliable branch lengths in a phylogenetic network. And, in fact, the authors seem to use for their time calculations BATWING, a program that uses Bayesian analysis, or the AMOVA program.
This is Tim Janzen's time to most recent common ancestor calculator, which uses a variance calculation:
http://www.timjanzen.com/dna.html [this requires the paid version of Excel on recent computers]
with some more recent comments at
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2009-07/1247384275
This program is based on these formulas:
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2008-02/1203007581
Marko Heinila has posted the initial results of his time studies of haplogroup G to the DNA-Forums site. One has to register there in order to view postings.
He says he uses as his formula m * g = 2*m1 = 2*m2, where m1 and m2 are the up and down step parameters in the Wikipedia description of the Skellam distribution
http://en.wikipedia.org/wiki/Skellam_distribution
After this, one has p(d) = exp(-m*g) I_d(m*g), where I_d is the modified Bessel function of the first kind. This is the probability of a net change of d steps happening with mutation rate m per generation and a time span of g generations. The assumption is that the probabilities of up and down mutations are equal and constant per unit of time.
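The formula p(d) = exp(-m*g) I_d(m*g) can be tried out with a short sketch. This is not Heinila's own code. The function names are hypothetical, the Bessel function is evaluated with a plain power series (adequate for the small values of m*g used here), and the rate of 0.002 per generation and the 120 generations are illustrative numbers only.

```python
import math


def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind, I_order(x), by power series."""
    n = abs(order)  # I_{-n}(x) equals I_n(x) for whole-number orders
    return sum(
        (x / 2) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
        for k in range(terms)
    )


def net_change_probability(d, m, g):
    """p(d) = exp(-m*g) * I_d(m*g): chance of a net change of d steps."""
    x = m * g
    return math.exp(-x) * bessel_i(d, x)


# Illustrative values only: one marker, 120 generations.
m, g = 0.002, 120
for d in range(-2, 3):
    print(d, round(net_change_probability(d, m, g), 4))
```

Because up and down steps are assumed equally likely, the probabilities for +d and -d are the same, and the probabilities over all d add up to 1.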
He cites Bruce Walsh's 2001 journal article on DNA time estimates as another basis of his work.
In the calculations, there is a log likelihood contribution of log(p(d)) from each marker in each network link. Constraints on the g parameters are used to enforce increasing time on paths from the leaves to the root (the root has to be picked separately). Given a net change of d, one could also calculate the distributions of up and down mutations (which follow Poisson distributions) with the constraint that d is the difference of the two Poisson variables.
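A heavily simplified sketch of the log likelihood idea follows, for a single network link only. It sums log(p(d)) over the markers of the link and picks the number of generations g that gives the largest total. It leaves out the network and the constraints on g along paths to the root, the simple search over whole-number g values is my own choice rather than anything stated in the postings, and the net changes and rates in the example are made up.

```python
import math


def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind, I_order(x), by power series."""
    n = abs(order)
    return sum(
        (x / 2) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
        for k in range(terms)
    )


def log_likelihood(net_changes, rates, g):
    """Sum of log(p(d)) over markers, with p(d) = exp(-m*g) * I_d(m*g)."""
    return sum(
        math.log(math.exp(-m * g) * bessel_i(d, m * g))
        for d, m in zip(net_changes, rates)
    )


def most_likely_generations(net_changes, rates, max_generations=2000):
    """Whole number of generations (1..max_generations) with the highest log likelihood."""
    return max(
        range(1, max_generations + 1),
        key=lambda g: log_likelihood(net_changes, rates, g),
    )


# Made-up link: net step changes at four markers, all with rate 0.002.
net_changes = [1, 0, 0, -1]
rates = [0.002] * 4
print(most_likely_generations(net_changes, rates))
```

More or larger net changes on a link push the most likely g upward, and a link with no changes at all gives the smallest g allowed by the search.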
He also calculates a 10,000-year convergence time for the available haplo G samples, which is not the same as the age of haplogroup G.
In July 2009 there were quite a number of postings at the general DNA site at Rootsweb concerning confidence intervals used in these calculations. This one from Ken Nordtvedt is a good starting point
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2009-07/1247680242
then
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2009-07/1247070327
These are the listed mutation rates for the markers used at the YHRD database:
http://www.yhrd.org/Research/Loci
The 2nd study by the Russian author above also summarizes various studies of mutation rates.
Anatole Klyosov, a chemist, proposed a logarithmic methodology in his self-published article in the first edition of the Proceedings of the Russian Academy of DNA Genealogy (in English) in 2008, which apparently can be downloaded for free
http://www.lulu.com/content/paperback-book/proceedings-of-the-russian-academy-of-dna-genealogy-volume-1-no-1/2677603
He calculates that the common ancestor of the G2c Jewish group lived 575 yrs ago, based on 25 years to a generation, and that this G2c group's common ancestor with ancient G lived about 12,400 yrs before present. He also indicates, according to his method, that the common ancestor of G1 and G2 persons lived about 8800 yrs before present.
The article dealing with obtaining valid DNA from an Alaskan skeleton 10,000 yrs old and using such DNA info to date the origins of haplogroups is found at (not available for free)
http://www.ncbi.nlm.nih.gov/pubmed/17243155?ordinalpos=3&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum