NEXUS test set

The definitive version of this test file set is available at the NEXUS projects page on our web home. This HTML page is included with the test file set. If you are using a local copy, the version information is in your local file named VERSION.

To do the round-trip read test, you must install our NEXUS library. The test uses a simple script called test-rw-roundtrip.pl that carries out three operations:

The file is read, written out, and re-read. The object read from the original file is then compared to the object read from the written-out file. The files cannot be compared with UNIX diff because whitespace is irrelevant and the data include sets whose members have no specific order. Instead, the NEXUS object comparison uses block-specific methods to determine whether two blocks of the same type have the same content. So far, I have seen only one case where this method of testing fails to uncover an error (an error in reading branch lengths written in scientific notation in a tree).

The plot test requires visual examination of the output to see that it makes sense. This is aided by using discretized branch lengths and other tricks that make it easier to determine whether the content is being displayed correctly.
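To illustrate why a textual diff is not enough for the round-trip read test, here is a minimal Python sketch. It is not the library's comparison code: the function parse_taxset and the sample TAXSET commands are hypothetical, and the parser handles only a toy one-line form of the command. It shows two strings that differ as text but describe the same set.

```python
import re

def parse_taxset(command):
    """Parse a toy 'TAXSET name = a b c;' command into (name, frozenset of members).

    Hypothetical helper for illustration only; real NEXUS sets allow ranges,
    quoted names and comments, none of which are handled here.
    """
    match = re.fullmatch(
        r"\s*taxset\s+(\S+)\s*=\s*(.*?)\s*;\s*",
        command,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        raise ValueError(f"not a taxset command: {command!r}")
    name, members = match.groups()
    # A frozenset ignores member order; split() ignores the amount of whitespace.
    return name, frozenset(members.split())

original = "TAXSET ingroup = A B C;"
rewritten = "taxset   ingroup =\n    C A B ;"

print(original == rewritten)                              # textual comparison
print(parse_taxset(original) == parse_taxset(rewritten))  # content comparison
```

The textual comparison reports a difference, while the content comparison treats the two commands as equal, which is the behavior a block-specific comparison needs.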

Some notes

NEXUS files with imaginary character data and a taxa block

These files present various minor challenges such as comments, quoted strings, multiple trees, etc. They are all based on files with TAXA, CHARACTERS, and TREES blocks. Nearly all of them are derived from the basic-bush or basic-ladder file.
read | plot | Test | Description
pass | pass | basic-bush | symmetric bifurcating tree, all branch lengths = 1
pass | pass | basic-rake | basal polytomy, all branch lengths = 1
pass | pass | basic-ladder | maximally asymmetric tree, branch lengths = time
pass | pass | long-names | very long names in various places (OTU, char, tree labels)
fail | ND | quoted-strings1 | OTU name "OTU C", charlabel "Char 3", tree name "the ladder tree"
fail | ND | quoted-strings2 | OTU name 'OTU C', charlabel 'Char 3', tree name 'the ladder tree'
pass | pass | radical-whitespace | ridiculous but legal use of whitespace
fail | ND | top-level-comment | comment at the top level
pass | pass | intrablock-comment | comments within every block
pass | pass | multiline-intrablock-comment | multi-line comments within every block
pass | pass | char-matrix-spaces | char data are in blocks separated by spaces
pass | pass | char-interleave | char data are in interleaved format
pass | pass | trees-translate | bush tree with translated names
pass | pass | trees-tree-bush | basic bush
pass | pass | trees-tree-ladder | basic ladder
pass | pass | trees-tree-rake | basic rake
pass | pass | trees-tree-bush-cladogram | bush, no branch lengths
pass | pass | trees-tree-ladder-cladogram | ladder, no branch lengths
pass | pass | trees-tree-rake-cladogram | rake, no branch lengths
pass | pass | trees-tree-bush-quoted-string-name1 | name of tree is "bush quoted string name1"
pass | pass | trees-tree-bush-quoted-string-name2 | name of tree is 'bush quoted string name2'
pass | pass | trees-tree-bush-branchlength-zero | branch to OTU C with zero length
no! | ND | trees-tree-bush-branchlength-scientific | sci format for length of branches to OTU B (2e+01), C (9e-01), F (9E-01), G (2E+01)
pass | pass | trees-tree-bush-branchlength-negative | branch to CD ancestor with length = -0.25
pass | pass | trees-tree-bush-inode-labels | internal node labels
pass | pass | trees-tree-bush-inode-labels-partial | only some internal node labels
fail | ND | trees-tree-bush-inode-labels-quoted1 | double quotation marks around inode labels
fail | ND | trees-tree-bush-inode-labels-quoted2 | single quotation marks around inode labels
pass | pass | trees-tree-basal-trifurcation | basal trifurcation
pass | note | trees-tree-bush-extended-root-branch | bush with branch above root node
pass | note | trees-tree-multiple | trees "bush", "ladder" and "rake" in one trees block
warn | ND | trees-tree-multiple-challenges | one tree with each challenge above (see the tree list below)
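Several of the failing cases above involve comments and quoted strings. As a rough illustration of what a reader has to cope with, the Python sketch below removes bracketed comments (which NEXUS allows to nest) while leaving single-quoted tokens alone, including the doubled single quote that stands for a literal quote. The function strip_comments and the sample text are hypothetical; this is not how the library tokenizes files, and double-quoted strings are deliberately not handled.

```python
def strip_comments(text):
    """Remove [bracketed] comments, leaving 'single-quoted' tokens untouched."""
    out = []
    depth = 0         # current nesting depth of [ ... ] comments
    in_quote = False  # True while inside a 'single-quoted' token
    i = 0
    while i < len(text):
        ch = text[i]
        if depth == 0 and ch == "'":
            if in_quote and text[i + 1 : i + 2] == "'":
                # A doubled quote inside a quoted token is a literal quote.
                out.append("''")
                i += 2
                continue
            in_quote = not in_quote
            out.append(ch)
        elif in_quote:
            out.append(ch)   # brackets inside quotes are ordinary characters
        elif ch == "[":
            depth += 1       # open a (possibly nested) comment
        elif ch == "]" and depth > 0:
            depth -= 1       # close the innermost comment
        elif depth == 0:
            out.append(ch)   # ordinary text outside any comment
        i += 1
    return "".join(out)

sample = "[top-level comment]\nTREE 'the [ladder] tree' = (A,B); [note [nested]]"
print(repr(strip_comments(sample)))
```

The bracket inside the quoted tree name survives, while the top-level and nested comments are dropped.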

the tree list

Trees are shown exactly as in the "tree" command in a NEXUS TREES block.
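The trees-tree-bush-branchlength-scientific case is worth a small illustration. The Python sketch below extracts branch lengths from a tree string with two regular expressions: one that accepts signs and exponents, and a deliberately naive one that stops at the first non-digit. The patterns, the function branch_lengths and the sample tree are assumptions made for this example, not the library's parser. One plausible reason the round-trip test can miss such an error is that a consistent misreading would produce the same wrong value on both reads.

```python
import re

# Optional label, a colon, then a number that may be signed and may carry an exponent.
BRANCH = re.compile(
    r"([A-Za-z_][\w.]*)?:([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
)

# A deliberately naive pattern that stops at the first non-digit, for contrast.
NAIVE = re.compile(r"([A-Za-z_][\w.]*)?:(\d+\.?\d*)")

def branch_lengths(newick, pattern=BRANCH):
    """Return (label, length) pairs; label is None for unlabeled internal branches."""
    return [(label or None, float(length)) for label, length in pattern.findall(newick)]

tree = "((A:1,B:2e+01):1,(C:9E-01,D:-0.25):0);"
print(branch_lengths(tree))         # exponents and signs are honored
print(branch_lengths(tree, NAIVE))  # B is misread and D is skipped
```

With the naive pattern, the branch to B is read as 2 rather than 20, and the negative branch to D is not found at all.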

NEXUS files of dna or protein data with a taxa block

These files contain TAXA, CHARACTERS, and TREES blocks, and may contain an ASSUMPTIONS block with character weights.
read | plot | Test | Description
pass | pass | SPAN_Family1nlw | SPAN export file, family 1, nucleotides, weights in ASSUMPTIONS block. Internal node labels.
pass | pass | SPAN_Family2alw | SPAN export file, family 2, amino acids, weights in ASSUMPTIONS block. Internal node labels.
pass | pass | SPAN_Family4nl | SPAN export file, family 4, nucleotides. Internal node labels.
pass | pass | SPAN_Family5alw | SPAN export file, family 5, amino acids, weights in ASSUMPTIONS block. Internal node labels.
pass | pass | SPAN_Family7n | SPAN export file, family 7, nucleotides. Internal node labels.
pass | pass | SPAN_Family8a | SPAN export file, family 8, amino acids. Internal node labels.
pass | pass | FaresWolfeCCT | CCT chaperonin amino acids, tree; normalized by nextool; data from Mario Fares (Fares & Wolfe, 2003, Mol Biol Evol. 20: 1588-97)

NEXUS files of dna or protein data with NO taxa block

OTUs are defined implicitly in the data block instead.
read | plot | Test | Description
fail? | ND | UnaSmithHIV-both | HIV sequences. fusion of NEXUS data and trees blocks from separate files supplied by Una Smith. No branch lengths.
fail | ND | barns-combined | fusion of two files from Chuck Delwiche. No TAXA block; tree & data from separate files, names may not match
pass | pass | Bird_Ovomucoids | MacClade example file. No TAXA block. Trees without branch lengths.
pass | pass | Human_mt_DNA | MacClade example file. No TAXA block. Large data set. Trees without branch lengths. Multiple trees.
pass | pass | Kingdoms_DNA | MacClade example file. No TAXA block. Trees without branch lengths.
pass | pass | Marsupial_Wolf | MacClade example file. No TAXA block. Trees without branch lengths.
pass | pass | Primate_mtDNA | MacClade example file. No TAXA block. Trees without branch lengths. Multiple trees.
fail | ND | Omland-Orioles | nucleotide data from Kevin Omland. No TAXA, no TREES blocks. Nucleotides are written in sets of three, with spaces between each codon. Names may not match within data block.
fail | ND | Omland-Ravens | nucleotide data from Kevin Omland. No TAXA, no TREES blocks. Nucleotides are written in sets of three, with spaces between each codon.
pass | pass | Treebase-liverwort-rbcl | Treebase export. rbcL sequence data with tree from analysis of liverwort phylogeny by Lewis, et al., 1997. No TAXA block. No branch lengths.
pass | pass | Treebase-horsetails-dna | dna sequence data with tree. No TAXA block. No branch lengths.
pass | pass | Treebase-chlamy-dna | dna sequence data with tree. No TAXA block. No branch lengths.
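For files with no TAXA block, the OTU names have to be collected from the rows of the data matrix, and in the Omland files the sequence on each row is broken into codons by spaces. The Python sketch below shows one way to do both under simple assumptions: each row starts with an unquoted name that contains no whitespace, and the function read_matrix and the sample rows are hypothetical. It is not the library's reader.

```python
def read_matrix(matrix_text):
    """Map each OTU name to its sequence, joining space-separated chunks.

    Rows that repeat a name (as in interleaved data) are appended to the
    sequence already collected for that name. Quoted names are not handled.
    """
    rows = {}
    for line in matrix_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines between interleaved sections
        name, chunks = fields[0], fields[1:]
        rows[name] = rows.get(name, "") + "".join(chunks)
    return rows

# Hypothetical matrix rows with codons separated by spaces.
matrix = """
Oriole_A  ATG GCC TAA
Oriole_B  ATG GCT TAA
"""
rows = read_matrix(matrix)
print(list(rows))        # OTU names, in order of first appearance
print(rows["Oriole_A"])  # sequence with the codon spacing removed
```

The order of the dictionary keys gives the implicit OTU list that a TAXA block would otherwise have supplied.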

NEXUS files of non-dna non-protein data with a taxa block

read | plot | Test | Description
pass | pass | SPAN_Family3il | SPAN export file, family 3, intron presence/absence data. Internal node labels.
pass | pass | SPAN_Family6iw | SPAN export file, family 6, intron presence/absence data, weights in ASSUMPTIONS block. Internal node labels.
pass | pass | SPAN_Family9iw | SPAN export file, family 9, intron presence/absence data, weights in ASSUMPTIONS block. Internal node labels.
pass | pass | SPAN_Family10i | SPAN export file, family 10, intron presence/absence data. Internal node labels.

NEXUS files of non-dna non-protein data with NO taxa block

OTUs are defined implicitly in the data block instead
read | plot | Test | Description
fail | ND | Treebase-horsetails-classic | classic character data with tree. No TAXA block. No branch lengths.

NEXUS errors: non-conformant files with common errors

this section not done, to include common errors:

Data input (format conversion): alignments and trees in other formats

Sets of alignments with matching trees to be converted and plotted. In practice, both the tree and the alignment are converted to separate NEXUS files, which can be plotted by nexplot, so that we can test four steps: convert alignment or tree; plot alignment or tree; merge alignment and tree into one file; plot alignment with tree.
convert | plot | merge | plot | Test | Description
ND | ND | ND | ND | CuZnSOD.aln | Cu-Zn SOD amino acid sequence alignment (ClustalW output)
ND | ND | | | CuZnSOD.dnd | ClustalW guide tree (ClustalW output)
ND | ND | ND | ND | TPI.aln | TPI amino acid sequence alignment (ClustalW output)
ND | ND | | | TPI.dnd | ClustalW guide tree (ClustalW output)
ND | ND | ND | ND | MnFeSOD-rename.phy | Mn-Fe SOD amino acid sequence alignment (ClustalW output). Names were simplified prior to running ClustalW to avoid problems caused by the limit on name lengths in Phylip files.
ND | ND | | | MnFeSOD-rename.dnd | ClustalW guide tree (ClustalW output)
(NA) | ND | ND | ND | UnaSmithHIV-data.nex | HIV sequences alignment
(NA) | ND | | | UnaSmithHIV-tree.nex | tree for HIV sequences, no branch lengths
ND | ND | ND | ND | SSU_Mito_rep.gb | from RDBII; representative mito SSU sequences in GenBank format
ND | ND | | | SSU_Mito_rep.newick | from RDBII; tree with representative mito SSU sequences; labels in tree have embedded data

Data input: tabular data with tree in adjacency-table format

not done: data in the form of text table, tree in the form of adjacency table (e.g., like microarray data)