Notes

The main purpose of this database is to provide an up-to-date online tool for accessing full-length human mtDNA sequences. The current version of the database consists only of static pages, one for each complete sequence, with links to parent and child nodes. Future versions are planned to include about 200,000 published HVS sequences, which will require an SQL server. Although our database includes information about all known ancestral nodes, from mtEve down to minor subclades, it is aimed primarily at the "leaf" portion of the human mtDNA phylogeny, depicting the connections between closely related individual sequences. We recommend phylotree.org (van Oven 2009) as a more user-friendly tool for viewing the top-level branches of the mtDNA tree, including all universally recognized haplogroups and motifs as well as thousands of newly proposed ones. In general, we do not distinguish between haplogroup names that existed before the phylotree.org project started and names first introduced by phylotree.org, since most people working in the field use phylotree.org as the universal reference.

All changes in sequences are specified relative to the CRS. By default, the range covers the entire human mtDNA molecule (positions 1-16569). If the base letter is omitted, the mutation is a transition. For transversions and insertions, the changed bases are shown explicitly.
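The notation rule above can be sketched as a small parser. This is an illustration only: the exact token grammar is an assumption. It takes "16223" as a transition, "16182C" as a transversion, "315.1C" as an insertion after position 315, and a trailing "d" (for example "523d") as a deletion. The function name `parse_mutation` and the `crs_base` argument are hypothetical. Because a transition token omits the derived base, the caller must supply the CRS base at that position.

```python
import re

# Transition partners: A<->G (purines), C<->T (pyrimidines).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}

# Hypothetical token grammar: position, optional ".k" insertion index,
# then an optional base string or "d" for a deletion.
TOKEN = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT]+|d)?$")


def parse_mutation(token, crs_base=None):
    """Classify one difference from the CRS.

    crs_base is the CRS base at that position; it is only needed when the
    base letter is omitted (i.e. the mutation is a transition)."""
    m = TOKEN.match(token)
    if m is None:
        raise ValueError(f"unrecognised token: {token!r}")
    pos, ins_index, bases = int(m.group(1)), m.group(2), m.group(3)
    if not 1 <= pos <= 16569:
        raise ValueError(f"position {pos} outside 1-16569")
    if ins_index is not None:
        # Insertions always list the inserted bases explicitly.
        if bases is None or bases == "d":
            raise ValueError(f"insertion without inserted bases: {token!r}")
        return pos, "insertion", bases
    if bases == "d":
        return pos, "deletion", None
    if bases is None:
        # Base letter omitted: derive the transition partner of the CRS base.
        if crs_base is None:
            raise ValueError("CRS base required to resolve a transition")
        return pos, "transition", TRANSITION[crs_base.upper()]
    if len(bases) != 1:
        raise ValueError(f"substitution must name one base: {token!r}")
    # An explicit single base on a substitution denotes a transversion.
    return pos, "transversion", bases
```

For example, `parse_mutation("16223", "C")` returns `(16223, "transition", "T")`, while `parse_mutation("16182C")` needs no CRS base because the new base is written out.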

The differences from the CRS used to build the tree are shown in the field "Sequence". The field "Ignored" contains differences that were not taken into account because they are phylogenetically insignificant. Artificially added differences (e.g. those resulting from resolved ambiguous states) are repeated in the field "Added".

Artificial taxa (i.e. ancestral nodes) can be easily recognized by their frequency (always 0), by the presence of descendant nodes, and by the absence of "Ignored" and "Added" differences.
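The field rules described above can be expressed as two small checks. This is a minimal sketch: the record type `Taxon` and its attribute names are hypothetical stand-ins for the page fields "Sequence", "Ignored" and "Added", not the database's actual schema. Differences are represented as plain strings such as "16189".

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    name: str
    frequency: int
    sequence: set = field(default_factory=set)  # differences from the CRS used for the tree
    ignored: set = field(default_factory=set)   # differences left out as phylogenetically insignificant
    added: set = field(default_factory=set)     # artificially added differences
    children: list = field(default_factory=list)


def is_artificial(t):
    """Ancestral node: frequency 0, has descendants, no Ignored or Added differences."""
    return t.frequency == 0 and bool(t.children) and not t.ignored and not t.added


def added_is_consistent(t):
    """Added differences are repeated from Sequence, so they must be a subset of it."""
    return t.added <= t.sequence
```

Testing all three conditions rather than frequency alone avoids misclassifying a sampled node whose frequency field happens to be 0.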

The field "Differences from the parent taxon" contains phylogenetically reconstructed mutations relative to the parent node. Changes back towards the CRS state are marked with "(!)", and changes from one non-CRS state to another non-CRS state are marked with "(!!)". All mutations except those in the control region and intergenic regions are followed by a short annotation in brackets: "t" indicates a change in a tRNA gene, "r" a change in an rRNA gene, and "s" and "ns" denote synonymous and non-synonymous mutations in protein-coding genes. Where two genes overlap, more than one letter may be given for the corresponding positions. Indels that cause frame shifts are marked "f", while other insertions and deletions are marked "i" and "d", respectively.
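The marking conventions above can be sketched as follows, assuming the CRS, parent and child bases at a position are already known. The names `direction_marker`, `describe` and `ANNOTATIONS` are hypothetical. Overlapping genes are modelled as a list of codes, because the notes do not specify how multiple letters are separated on the page.

```python
# Meaning of the bracketed annotation codes listed in the notes.
ANNOTATIONS = {
    "t": "tRNA",
    "r": "rRNA",
    "s": "synonymous (protein gene)",
    "ns": "non-synonymous (protein gene)",
    "f": "frame-shifting indel",
    "i": "insertion (no frame shift)",
    "d": "deletion (no frame shift)",
}


def direction_marker(crs_base, parent_base, child_base):
    """Return the marker for a change from parent_base to child_base."""
    if parent_base == child_base:
        raise ValueError("no change between parent and child")
    if child_base == crs_base:
        return "(!)"   # change back towards the CRS state
    if parent_base != crs_base:
        return "(!!)"  # one non-CRS state to another non-CRS state
    return ""          # ordinary change away from the CRS state


def describe(codes):
    """Positions in overlapping genes may carry more than one code, e.g. ["ns", "s"]."""
    return [ANNOTATIONS[c] for c in codes]
```

For instance, with CRS base A, a change from G in the parent to A in the child yields "(!)", while a change from G to C yields "(!!)".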

Age calculations are based on the rho statistic (Saillard 2000). For mutations across the entire molecule we use the rate from Soares 2009. For synonymous substitutions we use an improved value from Loogvali 2009, which builds on the results of the former. Ages for the whole coding region are calculated according to Mishmar 2003. Each SE value here is simply the product of the corresponding Sigma and Rate values, so the uncertainty of the rate estimate itself is not taken into account. Please note that these ages should be treated as raw estimates, because rho-based dating usually requires careful post-processing to correct flaws resulting from incomplete source data.
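The rho and sigma values can be illustrated with a short sketch. For a node with n sampled descendants, the usual formulas are rho = (sum of n_i * l_i) / n and sigma^2 = (sum of n_i^2 * l_i) / n^2, where l_i is the number of mutations on branch i and n_i is the number of samples below that branch. The age and SE are then rho and sigma multiplied by a rate expressed in years per mutation, matching the SE rule above. The `Node` structure, its fields and the rate used in the example are hypothetical; the example rate is not the Soares 2009 or Loogvali 2009 value. Dating by synonymous or coding-region mutations would change only which mutations are counted in `mutations` and which rate is passed in.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    mutations: int = 0  # mutations counted on the branch leading to this node
    count: int = 0      # sampled sequences identical to this node (0 for ancestral nodes)
    children: list = field(default_factory=list)


def branch_sums(node):
    """Return (n, sum of n_i*l_i, sum of n_i**2*l_i) over all branches below node."""
    n, s1, s2 = node.count, 0, 0
    for child in node.children:
        cn, c1, c2 = branch_sums(child)
        n += cn
        s1 += c1 + cn * child.mutations
        s2 += c2 + cn * cn * child.mutations
    return n, s1, s2


def rho_age(node, years_per_mutation):
    """Return (rho, sigma, age, SE); SE is sigma times the rate, ignoring rate error."""
    n, s1, s2 = branch_sums(node)
    if n == 0:
        raise ValueError("no sampled sequences below this node")
    rho = s1 / n
    sigma = math.sqrt(s2) / n
    return rho, sigma, rho * years_per_mutation, sigma * years_per_mutation


# Three samples at distances 2, 2 and 4 from the root.
root = Node(children=[Node(2, 1), Node(1, 0, [Node(1, 1), Node(3, 1)])])
rho, sigma, age, se = rho_age(root, 3000)  # 3000 years per mutation: arbitrary example
```

Here rho is 8/3, the mean root-to-sample distance, and sigma^2 = (1*2 + 4*1 + 1*1 + 1*3) / 9 = 10/9.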

To construct the mtDNA tree we used a maximum-parsimony (MP) algorithm, first adding sequences that represent different deep branches one by one; only after this "skeleton" tree was built was each minor branch "populated" with the available complete sequences. In most cases we accept the topology and haplogroup motifs suggested by phylotree.org, sometimes intervening slightly in the normal MP workflow to prevent it from generating optimal trees that violate a topology already inferred from data other than complete sequences. However, our tree differs from that of M. van Oven and M. Kayser in some minor details, and we believe this is due more to differences in our source data than to differences in principles. Since Build 3, all cases where the final topology or inferred nucleotide states differ from those suggested by our machine algorithm are tracked and reported in the field "Remark".
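The skeleton-then-populate order can be illustrated with a greedy stepwise-addition sketch. This is a strong simplification, not the MP procedure actually used here. Each sequence is a set of differences from the CRS and is attached as a direct child of the existing node with the smallest symmetric difference, which equals the number of forward or back mutations the new branch would need. It never splits existing branches or re-optimizes the tree, and it does not apply the manual topology constraints mentioned above. All function names and sequence labels are hypothetical.

```python
def closest_node(sequence, nodes):
    """Return (name, cost) of the existing node needing the fewest changes.

    cost = size of the symmetric difference, i.e. the number of mutations
    (forward or back) on a new branch attached directly to that node.
    Ties are broken by node name so the result is deterministic."""
    name = min(nodes, key=lambda k: (len(nodes[k] ^ sequence), k))
    return name, len(nodes[name] ^ sequence)


def build_tree(skeleton, minor):
    """Add deep-branch representatives first, then populate minor branches."""
    nodes = {"CRS": frozenset()}  # root: the CRS itself, with no differences
    parents = {"CRS": None}
    for name, muts in list(skeleton) + list(minor):
        muts = frozenset(muts)
        parent, _ = closest_node(muts, nodes)
        nodes[name] = muts
        parents[name] = parent
    return parents


skeleton = [("A1", {"73", "263"})]
minor = [("s1", {"73", "263", "16189"}), ("s2", {"73", "263", "16189", "16519"})]
parents = build_tree(skeleton, minor)
```

Because minor sequences are added only after the skeleton exists, each one attaches to the deep branch it is closest to rather than shaping the deep topology itself.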