Computational and Analytical Molecular Evolution Lab at CARB: Nexplorer
Alignment slice visualized by nexplot (figure not reproduced here).

NEXUS file (pruned from the original for display purposes):
#NEXUS
BEGIN TAXA;
DIMENSIONS ntax=26;
TAXLABELS O_volvulus_AAB64227.1 O_volvulus_AAB64226.1 C_elegans_AAF39759.1 C_elegans_AAA83577.1
S_cerevisiae_CAA89634.1 C_albicans_AAC12872.1 S_pombe_CAB57444.1 N_crassa_AAA63780.1 M_musculus_AAA40121.1
C_capitata_AAA57249.1 D_virilis_CAA32060.1 D_erecta_AAF23595.1 D_orena_AAF23594.1 D_teissieri_AAF23599.1
D_yakuba_AAF23598.1 D_melanogaster_AAF50095.1 D_mauritiana_AAF23597.1 D_sechellia_AAF23596.1
D_simulans_CAA33720.1 Z_mays_AAB49913.1 O_sativa_AAC14464.1 O_sativa_AAC14465.1 A_thaliana_AAF99769.1
P_tremuloides_AAD01605.1 A_thaliana_BAB09468.1 A_thaliana_AAD29823.2;
END;
BEGIN CHARACTERS;
DIMENSIONS ntax=26 nchar=30;
FORMAT datatype=protein gap=- missing=?;
CHARLABELS 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112
113 114 115 116 117 118 119 120;
MATRIX
M_musculus_AAA40121.1 QGTIHFEQKASGE--PVVLSGQITGLTE-G
C_capitata_AAA57249.1 KGTVHFEQQDAKS--PVLVTGEVNGLAK-G
N_crassa_AAA63780.1 KGTVIFEQESESA--PTTITYDISGNDPNA
--stuff deleted here--
D_simulans_CAA33720.1 KGTVFFEQESSGT--PVKVSGEVCGLAK-G
S_cerevisiae_CAA89634.1 SGVVKFEQASESE--PTTVSYEIAGNSPNA
S_pombe_CAB57444.1 SGVVTFEQVDQNS--QVSVIVDLVGNDANA;
END;
BEGIN ASSUMPTIONS;
WTSET MySoapWeights (VECTOR) = 1 1 1 1 1 1 1 1 0.83 0.8 0.8 0.8 0.8 0.8 0.71 0.71 1 1 1 1 1 1 1 1
1 1 1 1 1 1;
END;
BEGIN TREES;
TREE "Cu-Zn Superoxide Dismutase" = (((((O_volvulus_AAB64227.1:0.31741,O_volvulus_AAB64226.1:0.13498):
0.20268[1],(C_elegans_AAF39759.1:0.14579,C_elegans_AAA83577.1:0.27311):0.2533[1]):0.12655[0.98],
((S_cerevisiae_CAA89634.1:0.28255,C_albicans_AAC12872.1:0.25631):0.08358[0.91],(S_pombe_CAB57444.1:
0.3159,N_crassa_AAA63780.1:0.1635):0.11954[0.97]):0.17514[1]):0.08988[0.77],(M_musculus_AAA40121.1:
0.49149,(C_capitata_AAA57249.1:0.18945,(D_virilis_CAA32060.1:0.11453,(((D_erecta_AAF23595.1:0.00661,
D_orena_AAF23594.1:0.00769):0.00497[0.92],(D_teissieri_AAF23599.1:0.004,D_yakuba_AAF23598.1:0.01012):
0.0073[0.87]):0.01271[0.88],(((D_melanogaster_AAF50095.1:0.00836,D_mauritiana_AAF23597.1:0.00552):
0.00203[0.28],D_sechellia_AAF23596.1:0.01103):0.00398[0.7],D_simulans_CAA33720.1:0.00595):0.00739[0.75]):
0.11795[1]):0.11754[1]):0.12932[1]):0.10326[1]):0.0712[0.9],(((((Z_mays_AAB49913.1:0.05142,
O_sativa_AAC14464.1:0.09031):0.02799[0.98],O_sativa_AAC14465.1:0.06915):0.05245[0.99],
(A_thaliana_AAF99769.1:0.17064,P_tremuloides_AAD01605.1:0.1075):0.08023[1]):0.08596[1],
A_thaliana_BAB09468.1:0.46052):0.06401[0.75],A_thaliana_AAD29823.2:0.42442):0.14252[0.94]);
END;
Left, phylogeny of a subset of Cu-Zn superoxide dismutases. Right, slice of the protein sequence alignment. Above right, alignment column numbers with a histogram of reliability scores. Nextool was used to extract this subset of proteins ("OTUs") and columns ("characters") from a larger file. Various features of the plot produced automatically by nexplot are customizable. The PDF version of the figure has right- (rather than left-) justified OTU names, a wider tree, and more space between the lines.
Nexplot and nextool can be used together to create customized, publication-quality views of character data in a phylogenetic context. Nexplot has a variety of settings, which are described in doc/nexplot.html (the perldocs). Importantly, nexplot writes PostScript, a vector format, so the graphic elements can be scaled without loss of resolution. PostScript figures can be converted into other graphics formats such as JPEG or GIF if necessary.
The Bio::NEXUS library can be installed from CPAN:
perl -MCPAN -e 'install Bio::NEXUS'
| Status | Nexplot | Nextool & Bio::NEXUS (API) | nexplorer (server) |
|---|---|---|---|
| Tested | | | |
| Untested | | | |
| Planned | | | |
Though it is not widely appreciated, there already exists a sophisticated methodological framework for comparative analysis, developed over the past 40 years by systematists and evolutionary biologists, in which differences are interpreted according to probabilistic models of evolutionary divergence on a branching tree. The basic methods and concepts of comparative evolutionary biology, originally developed for morphological characters, can be applied directly to any kind of character (discrete or continuous, so long as it fits the character state data model).
In the ongoing quest to improve the accuracy and reliability of functional inferences, it is inevitable that the bioinformatics/genomics community will come to rely on these more sophisticated methods. This transition will require automatable tools for phylogenetic analysis and character reconstruction (which already exist to a large degree), portable and flexible formats for data exchange, infrastructure to facilitate integration, and better education about how to integrate probabilistic evolutionary reasoning into genome interpretation.
The NEXUS file format of Maddison, Swofford & Maddison, 1997 (Systematic Biology 46:590-621) was developed to facilitate the communication and storage of data for comparative analysis. We see it as a first step in developing a widely useful standard. We use a slightly modified version of NEXUS called "SPANDEX" as the exchange format for our own System for Phyloinformatic ANalysis (SPAN).
The NEXUS file format for comparative data
The NEXUS format conveys data organized according to the character state data model, in which the features of operational taxonomic units (OTUs) (e.g., species, individuals, genes, or genomes) are observable states of underlying homologous characters. For instance, in a protein sequence alignment, proteins are the OTUs, alignment columns are characters, and amino acids (or gaps) are states. In evolutionary analysis, differences are typically treated as the result of state transitions that take place on the branches of a tree, so a NEXUS file also provides a means to represent a tree, in the standard Newick (a.k.a. New Hampshire) format.
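The character state data model can be sketched in a few lines of Python. This is only an illustration, not part of Bio::NEXUS: the three rows and the column labels 91 to 120 are copied from the example file above, while the helper names `states_of` and `slice_matrix` are hypothetical.

```python
# Three rows copied from the MATRIX of the example file: OTU name -> states.
matrix = {
    "M_musculus_AAA40121.1": "QGTIHFEQKASGE--PVVLSGQITGLTE-G",
    "C_capitata_AAA57249.1": "KGTVHFEQQDAKS--PVLVTGEVNGLAK-G",
    "N_crassa_AAA63780.1": "KGTVIFEQESESA--PTTITYDISGNDPNA",
}

# CHARLABELS 91..120 from the same file: one label per alignment column.
char_labels = [str(n) for n in range(91, 121)]


def states_of(label):
    """Return {OTU: state} for the character (column) with this label."""
    col = char_labels.index(label)
    return {otu: row[col] for otu, row in matrix.items()}


def slice_matrix(otus, labels):
    """Keep only the named OTUs and characters (hypothetical helper)."""
    cols = [char_labels.index(label) for label in labels]
    return {otu: "".join(matrix[otu][c] for c in cols) for otu in otus}


# Character "91" has state Q in the mouse protein and K in the other two.
print(states_of("91"))
# A two-OTU, three-character slice, similar in spirit to a nextool subset.
print(slice_matrix(["M_musculus_AAA40121.1", "N_crassa_AAA63780.1"],
                   ["91", "92", "93"]))
```

Each OTU is a row, each character is a column, and a state is the single symbol where they meet; a gap (`-`) is treated as just another state.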
The syntactic structure of a NEXUS file is as follows:
#NEXUS
begin < blockname >;
< command > < argument > [additional argument];
[ < another command with args >; ]
end;
[ < another block with commands > ]
Each of the pre-defined types of public blocks may appear only once.
The TAXA block is the only necessary block. There are some
restrictions on the ordering of blocks, and on the ordering of
commands within a block.
Application-specific "private" blocks are also possible. NEXUS
keywords are not case-sensitive. We put names of BLOCKS in upper case here for
mnemonic purposes.
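As a rough illustration of the block and command structure described above, the following Python sketch splits NEXUS text into blocks and semicolon-terminated commands. It is not the Bio::NEXUS parser; the function name `parse_nexus` is hypothetical, and the sketch assumes that comments are not nested and that no quoted token contains `;` or `[`. It also discards all bracketed text, so it would drop bracketed values embedded in a tree string.

```python
import re


def parse_nexus(text):
    """Split NEXUS text into {BLOCKNAME: [(COMMAND, argument_string), ...]}.

    Simplifying assumptions: comments are not nested, and no quoted token
    contains ';' or '['.
    """
    text = re.sub(r"\[[^\]]*\]", " ", text).lstrip()  # drop [comments]
    if text[:6].upper() != "#NEXUS":
        raise ValueError("missing #NEXUS header")
    blocks = {}
    current = None
    for statement in text[6:].split(";"):
        words = statement.split()
        if not words:
            continue
        keyword = words[0].upper()  # keywords are not case-sensitive
        if keyword == "BEGIN":
            if len(words) != 2:
                raise ValueError(f"malformed begin: {statement.strip()!r}")
            current = words[1].upper()
            blocks[current] = []
        elif keyword == "END":
            current = None
        elif current is None:
            raise ValueError(f"command outside a block: {statement.strip()!r}")
        else:
            blocks[current].append((keyword, " ".join(words[1:])))
    return blocks


# The OTU and set names below are illustrative only.
sample = """#NEXUS
BEGIN TAXA;
DIMENSIONS ntax=3;
TAXLABELS fish dog cat;
END;
begin sets; [a comment]
taxset pets = dog cat;
end;
"""
blocks = parse_nexus(sample)
for name, commands in blocks.items():
    print(name, commands)
```

Block names and command keywords are upper-cased on the way in, which reflects the rule that NEXUS keywords are not case-sensitive; arguments are left untouched.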
| Name | Description |
|---|---|
| TAXA | specifies OTUs in data set |
| CHARACTERS | specifies characters |
| SETS | assigns names to sets of characters or OTUs |
| ASSUMPTIONS | specifies assumptions for an analysis |
| CODONS | specifies codons and their genetic codes |

| Name | Block | Description |
|---|---|---|
| CharLabels | CHARACTERS | label for a character (column) |
| StateLabels | CHARACTERS | label for a state (the type of an instance of a character) |
| CharStateLabels | CHARACTERS | combined label for a character and its states |
| CharSet | SETS | give a name to some set of chars |
| TaxSet | SETS | give a name to some set of OTUs |
| GeneticCode | CODONS | specify a genetic code |
| CodeSet | CODONS | associate a code with a CharSet or TaxSet |
| Tree | TREES | specify a Newick tree |
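The Tree command carries a Newick string. As a minimal sketch (a regular-expression shortcut, not a full Newick parser and not the Bio::NEXUS implementation), the Python below pulls leaf names and branch lengths out of such a string. It assumes unquoted names and a branch length on every leaf; the tree is a fragment copied from the example file above, and the name `leaf_branch_lengths` is hypothetical.

```python
import re

# A leaf is a name that directly follows "(" or "," and is followed by
# ":<length>". Internal nodes follow ")" instead, so they are skipped.
LEAF = re.compile(r"[(,]\s*([^\s(),:;\[\]]+)\s*:\s*([0-9.eE+-]+)")


def leaf_branch_lengths(newick):
    """Return {leaf_name: branch_length} for a Newick string."""
    return {name: float(length) for name, length in LEAF.findall(newick)}


# Fragment of the tree in the example file; the bracketed values on
# internal branches are ignored by this sketch.
newick = (
    "((O_volvulus_AAB64227.1:0.31741,O_volvulus_AAB64226.1:0.13498):0.20268[1],"
    "(C_elegans_AAF39759.1:0.14579,C_elegans_AAA83577.1:0.27311):0.2533[1]);"
)
lengths = leaf_branch_lengths(newick)
for name, length in lengths.items():
    print(name, length)
```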
This set of files should be highly useful to anyone else developing
software for NEXUS. However, please do not assume that it is
exhaustive.
Extensions to NEXUS
The NEXUS format, as envisioned by Maddison, Swofford & Maddison, 1997, is quite flexible. For instance, it is possible to define an application-specific private block containing commands to be read by one application but not by others. However, there are two modifications to the public blocks that are implemented in our current library:
The context is that we color OTUs, characters, and nodes/branches according to the taxonomic assignments of OTUs. To do this, we invented a private SPAN block that associates each OTU with a taxonomic division. Nexplot then colors the OTUs according to a scheme hard-coded in SpanBlock.pm.
There are two problems with this approach. First, it does not take advantage of the existing NEXUS mechanism for defining named sets in the SETS block. Second, it makes the coloring a private matter, known only to the SPAN block and the Perl module that reads it.
Let's start by considering only OTUs. Here is a generic method:
begin SETS; [NEXUS SETS block used to define sets]
otuset animals = fish dog cat mouse;
otuset plants = corn geranium;
otuset fungi = yeast;
end;
begin DISPLAY;
color animals red;
color plants green;
color fungi blue;
end;

The syntax for the color command would be:

color < otuset > [ scope ] < color_choice >;
color_choice = < named_color > | (rgb) < rgb_vec >
scope = [ all | names | data | tree ]

where the otuset must be named in SETS; the color is either named or given as Red-Green-Blue values; and the scope defaults to "all", with "names" = color OTU display names, "tree" = propagate up from these OTUs by consensus, "data" = color data rows for these OTUs.
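To make the proposed syntax concrete, here is a minimal Python sketch of how a reader might interpret one color command. It is an illustration of the proposal only, not existing nexplot or Bio::NEXUS behavior. The function name `parse_color` is hypothetical, and the sketch assumes that an rgb_vec is three whitespace-separated numbers, which the proposal does not specify.

```python
SCOPES = {"all", "names", "data", "tree"}


def parse_color(command, otusets):
    """Parse one proposed 'color' command into (otuset, scope, color).

    Assumption: an rgb_vec is three whitespace-separated numbers.
    """
    words = command.strip().rstrip(";").split()
    if len(words) < 3 or words[0].lower() != "color":
        raise ValueError(f"not a color command: {command!r}")
    otuset, rest = words[1], words[2:]
    if otuset not in otusets:
        raise ValueError(f"otuset {otuset!r} is not defined in SETS")
    scope = "all"  # default scope
    if len(rest) > 1 and rest[0].lower() in SCOPES:
        scope, rest = rest[0].lower(), rest[1:]
    if rest[0].lower() == "(rgb)":
        if len(rest) != 4:
            raise ValueError("(rgb) must be followed by three values")
        color = tuple(float(value) for value in rest[1:])
    elif len(rest) == 1:
        color = rest[0].lower()  # a named color
    else:
        raise ValueError(f"cannot interpret color choice: {rest!r}")
    return otuset, scope, color


# Sets taken from the SETS block in the example above.
otusets = {
    "animals": ["fish", "dog", "cat", "mouse"],
    "plants": ["corn", "geranium"],
    "fungi": ["yeast"],
}
print(parse_color("color animals red;", otusets))
print(parse_color("color plants names (rgb) 0 128 0;", otusets))
```

Because the otuset is looked up in the sets defined by the SETS block, the coloring no longer depends on a private mapping inside a single module.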