Welcome to the web home of the Stoltzfus research group at The Institute Formerly Known as CARB.
The state of the art in re-usable trees
We recently completed an analysis of current practices for archiving trees and associated data, current practices for re-using trees, and barriers to re-use experienced by users (Stoltzfus, O'Meara, Whitacre, Mounce, Kumar, Rosauer & Vos). The results of this analysis have helped us to think strategically about how technology and standards can be used to facilitate data re-use, so as to promote integrative and synthetic science.
The project started at the 2010 TDWG meeting in Woods Hole, where the Phylogenetic Standards interest group held a workshop. Dan Rosauer, Jamie Whitacre, Torsten Eriksson and I wanted to assess the current state of the art in publishing trees that can be linked into that big world of data out there. Over the next year, this project gathered collaborators and morphed into a larger analysis of sharing and re-use of phylogenetic trees and associated data. Probably the most interesting thing we did was to get a more systematic sense of what is going on "in the wild" by examining several samples of randomly or arbitrarily chosen phylogeny-related papers (ones that match the term "phylogen*"). We discovered that producers of phylogenies rarely make their results easily accessible by archiving them. Most trees remain on someone's hard drive, apparently. In spite of some interest in a MIAPA (minimum information about a phylogenetic analysis) standard, there are currently no community standards to guide users as to what kinds of data and metadata to include in order to facilitate data re-use. Users who are interested in re-using published trees face many barriers due to the difficulty of discovering, accessing, decoding, interpreting and evaluating phylogenetic results.
Nevertheless, despite the generally dismal state of things, we found a lot of room for optimism. While the overall rate of archiving is low, various types of information are being made available. While most studies do not rely on re-used data (other than sequences from GenBank), a large minority of studies re-use alignments or trees. We actually found five different studies that use the APG (Angiosperm Phylogeny Group) tree for plants via Phylomatic, which provides grafting and pruning operations so that users can make a custom tree for the set of species they wish to analyze. And of course, there are some high-profile cases of re-use, the most extensive of which is probably TimeTree, which synthesizes information from nearly 1000 publications, together with a "tree of life" (I'm putting that in quotation marks so as not to offend purists, because the tree is actually the NCBI taxonomy hierarchy), to provide users with estimated dates of divergence. TimeTree gets tens of thousands of queries per month.
Our overall impression is that, due to recent developments in policies, software, infrastructure, and community organizing, evolutionary informatics is poised for a great leap forward, if a broader community of stakeholders can get involved.
