r5 - 12 Apr 2007 - 19:14:49 - HilmarLapp

Aaron's Bio::CDAT implementation considerations

(not a complete list! topics for discussion!)

1) composited objects should be BioPerl-ready, if not BioPerl-native (though, as long as CDAT can itself consume and produce BioPerl-ready objects, this stricture need not apply)

2) granularity choice of the (potentially probabilistic) CharStateMatrix object is a concern for memory use and parsing/object-construction speed (lazy evaluation? binary encoding? Flyweight pattern?)

3) cardinality of composited datatypes: one or many CharStateMatrices per CDAT? one or many Trees per CDAT (or per CharStateMatrix)? Arguments exist for both ... CDS/protein "duality" case, but any others? If cardinality > 1 in any dimension, does the API become too complicated? Rutger argues that a CDAT is the intermediate in a many-to-many relationship between Matrices and Trees ... another idea for consideration: CDATs may themselves be composites of related (sub-)CDATs ...

4) Bio::CDAT construction/IO: de novo, flatfile (Bio::NEXPL), relational (GSK::PhyloDB::CDAT); choices here affect answers to #2 above, and vice versa

5) should components implement Bio::AnnotableI? Bio::LocatableI? Bio::RangeI? Bio::LocatableSeq?

6) do CDAT objects manage mutation of underlying components? via an Observer pattern?

Email discussions

Subject: character state matrix api

Date: Wed, 12 Jul 2006 15:01:55 -0700, Rutger Vos <rvosa@sfu.ca>

Date: Wed, 12 Jul 2006 15:01:55 -0700 From: Rutger Vos <rvosa@sfu.ca> Subject: character state matrix api

Hi all,

the following is a sketch of an api for character state matrices. It inherits from Bio::Matrix::MatrixI, adds data type functionalities and nexus tokens, matrix operations as in Bio::Matrix::GenericMatrix and implements a "Bio::CDAT::ContainedObjectI" interface (only methods: get_cdat/set_cdat). All feedback welcome!

#####################################################################

New methods in Bio::Phylo::Matrices::CharMatrixI (future name: Bio::CDAT::CharMatrixI or Bio::Matrix::CharMatrixI). These methods are accessors - i.e. read only - that map onto the respective nexus tokens by the same name. The idea is that this will be easy to remember, and handy for insertion, for example, in Template Toolkit templates for nexus/html/xml writing in an MVC context:

  • datatype -- dna|rna|protein|standard|continuous|restriction
  • symbols -- array ref of single character symbols
  • missing -- the missing data symbol, usually '?' or 'N'
  • gap -- the gap symbol, usually '-'
  • ntax -- number of rows in matrix
  • nchar -- number of columns in matrix
  • charstatelabels -- column labels
  • matrix -- raw matrix as two-dimensional array
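A minimal sketch of what such read-only, nexus-token accessors might look like. The class name My::CharMatrix, the constructor arguments and the internal layout are all assumptions made for illustration; nothing here is the actual Bio::Phylo or Bio::CDAT implementation.

```perl
use strict;
use warnings;

{
    # Hypothetical class: the name and the internal layout are assumptions.
    package My::CharMatrix;

    sub new {
        my ( $class, %args ) = @_;
        my $self = {
            datatype        => $args{datatype},
            symbols         => $args{symbols} || [],
            missing         => $args{missing},
            gap             => $args{gap},
            charstatelabels => $args{charstatelabels} || [],
            rows            => $args{rows} || [],    # list of [ $name, \@states ]
        };
        return bless $self, $class;
    }

    # Read-only accessors named after the nexus tokens.
    sub datatype        { $_[0]{datatype} }
    sub symbols         { [ @{ $_[0]{symbols} } ] }
    sub missing         { $_[0]{missing} }
    sub gap             { $_[0]{gap} }
    sub charstatelabels { [ @{ $_[0]{charstatelabels} } ] }
    sub ntax            { scalar @{ $_[0]{rows} } }

    sub nchar {
        my $self  = shift;
        my $first = $self->{rows}[0] or return 0;
        return scalar @{ $first->[1] };
    }

    # Raw matrix as a two-dimensional array (copied, so callers cannot mutate it).
    sub matrix { [ map { [ @{ $_->[1] } ] } @{ $_[0]{rows} } ] }
}

my $matrix = My::CharMatrix->new(
    datatype => 'dna',
    symbols  => [qw(A C G T)],
    missing  => '?',
    gap      => '-',
    rows     => [
        [ 'taxon_a', [qw(A C G T)] ],
        [ 'taxon_b', [qw(A C - ?)] ],
    ],
);

printf "ntax=%d nchar=%d datatype=%s\n",
    $matrix->ntax, $matrix->nchar, $matrix->datatype;
```

Because there are no setters, the object can only be populated through the constructor, which is one way to read "accessors - i.e. read only".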

New methods for data integrity:

  • set_charstate_lookup -- set character state lookup hash
  • get_charstate_lookup -- get character state lookup hash

Methods inherited from Bio::CDAT::ContainedObjectI. The idea is that internally, the $cdat->add_matrix($matrix) method could check whether $matrix->isa('Bio::CDAT::ContainedObjectI').

  • get_cdat -- get the cdat container
  • set_cdat -- set the cdat container

Methods inherited from Bio::Matrix::MatrixI. These methods are accessors for generic matrices/tables:

  • matrix_id -- primary key
  • matrix_name -- string, nexus token legal, i.e. single quoted with spaces or underline-separated
  • get_entry($rowname,$columnname) -- get single cell identified by $rowname and $columnname
  • get_column($col) -- get column identified by $col
  • get_row($row) -- get row identified by $row
  • get_diagonal -- PROBABLY IRRELEVANT/IMPRACTICAL
  • column_num_for_name($name) -- get internal index of column identified by $name
  • row_num_for_name($name) -- get internal index of row identified by $name
  • num_rows -- SAME AS "ntax"
  • num_columns -- SAME AS "nchar"
  • row_names -- character sequence names
  • column_names -- SAME AS "charstatelabels"

Methods like (but probably not inherited from) Bio::Matrix::GenericMatrix. These methods are mutators for generic matrices/tables:

  • add_row($row) -- adds row $row to matrix
  • remove_row($row) -- removes row $row from matrix
  • add_column($col) -- adds column $col to matrix
  • remove_column($col) -- removes column $col from matrix

Date: Thu, 13 Jul 2006 08:33:08 -0400, aaron.j.mackey@gsk.com

Date: Thu, 13 Jul 2006 08:33:08 -0400 From: aaron.j.mackey@gsk.com Subject: Re: character state matrix api In-reply-to: <44B57153.1060600@sfu.ca>

Rutger, I'm excited to see someone laying out an API, but it would help me (at least) to see your overall vision for class hierarchy and relationships (perhaps in UML or something more lightweight), and then start discussing API. In my comments/questions below, I'll try to infer your underlying "design" from your external API description, but I may have it wrong, so forgive me.

> New methods in Bio::Phylo::Matrices::CharMatrixI (future name:
> Bio::CDAT::CharMatrixI or Bio::Matrix::CharMatrixI). These methods are
> accessors - i.e. read only - that map onto the respective nexus tokens
> by the same name.

why read only? some of these I could imagine being "mutators" (e.g. $cdat->missing("?") would cause all N's to convert to ?'s if $cdat->missing() eq "N"). Also, if it's going to be read-only, then I would prefer to see "get_*" accessor names, with explicitly missing "set_*" mutator methods.

> * datatype -- dna|rna|protein|standard|continuous|restriction

the "original" CDAT relational data model allowed mixed datatypes (e.g. a binary intron presence/absence state could be embedded in the CDS sequence at the position at which the intron may occur); I realize that the representation of this in Nexus flat file format must be via separate matrix objects, but does that necessarily limit CDAT matrices?

> * symbols -- array ref of single character symbols

or maybe "alphabet"? related to the above, the symbols/alphabet structure may differ per column if mixed datatypes are allowed ... so this could be perhaps the "default" alphabet of the matrix, though a given column in the matrix may utilize a different alphabet (or none at all, if continuous)

> * missing -- the missing data symbol, usually '?' or 'N'
> * gap -- the gap symbol, usually '-'
> * ntax -- number of rows in matrix

err, num_otus? or num_rows?

> * nchar -- number of columns in matrix

num_chars? num_cols? num_columns?

in general the prefix "n" is not very meaningful; one of the most loved (and hated) aspects of BioPerl is the various method aliases that have arisen for just these differences in style and expectation.

> * charstatelabels -- column labels

I'm not sure what this is; is this related to charset's? what if a column belongs to more than one charset?

> * matrix -- raw matrix as two-dimensional array

> New methods for data integrity:
>
> * set_charstate_lookup -- set character state lookup hash
> * get_charstate_lookup -- get character state lookup hash

I don't know what a "character state lookup hash" is (well, I can guess, but probably won't be entirely correct); what did you have in mind?

> Methods inherited from Bio::CDAT::ContainedObjectI. The idea is that
> internally, the $cdat->add_matrix($matrix) method could check whether
> $matrix->isa('Bio::CDAT::ContainedObjectI').
>
> * get_cdat -- get the cdat container
> * set_cdat -- set the cdat container

this seems like part of an inside-out design (which isn't necessarily bad, I just want to make sure I understand the design); so instead of a Bio::CDAT having matrices (or trees, or whatever else a CDAT contains), you want the ability to "back reference" the CDAT object directly from the matrix? Do you want to do this via soft-references (i.e. the Bio::CDAT truly contains the matrices, and the matrices have a back-reference for convenience), or does the Bio::CDAT truly not itself know what the associated matrices are (not good, I'd think).

Also, doesn't this mean that I can't easily instantiate a plain old Bio::Align::AlignI matrix and call $cdat->add_matrix($alignment) without declaring Bio::Align::AlignI ISA Bio::CDAT::ContainedObjectI? This seems unnecessarily prohibitive; I'd rather rebless the matrix into a new derived subclass that contains the get_cdat/set_cdat methods, if those backreferencing methods are so important to have.

> Methods inherited from Bio::Matrix::MatrixI. These methods are accessors
> for generic matrices/tables:
>
> * matrix_id -- primary key
> * matrix_name -- string, nexus token legal, i.e. single quoted with
> spaces or underline-separated
> * get_entry($rowname,$columnname) -- get single cell identified by
> $rowname and $columnname

for a CDAT, I'd like to see get_entry expanded to consider the notion of charsets (i.e. give me the entry that corresponds to a particular position in a charset, not the entire matrix).

> * get_column($col) -- get column identified by $col

ditto comment as above re: charsets

> * get_row($row) -- get row identified by $row
> * get_diagonal -- PROBABLY IRRELEVANT/IMPRACTICAL

agreed.

> * column_num_for_name($name) -- get internal index of column identified
> by $name
> * row_num_for_name($name) -- get internal index of row identified by
> $name
> * num_rows -- SAME AS "ntax"
> * num_columns -- SAME AS "nchar"

OK, so these are your stylistic aliases ;-)

> * row_names -- character sequence names
> * column_names -- SAME AS "charstatelabels"

> Methods like (but probably not inherited from)
> Bio::Matrix::GenericMatrix. These methods are mutators for generic
> matrices/tables:
>
> * add_row($row) -- adds row $row to matrix
> * remove_row($row) -- removes row $row from matrix
> * add_column($col) -- adds column $col to matrix
> * remove_column($col) -- removes column $col from matrix

here's where the fun starts; what happens if you execute these methods on a matrix already associated with a CDAT, and that CDAT already has associated tree(s)?

I think the "rebless into CDAT-aware subclass" idea almost has to happen to be able to intercept these calls and either a) try to cascade the action if (easily) possible or b) throw a consistency error.

Thus, we'll also need a way to disassociate a matrix from its CDAT object to "restore" its normal base functionality.
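The "rebless into a CDAT-aware subclass" idea might be sketched roughly as follows. All package names, the cdat_has_trees flag and the associate/disassociate helpers are hypothetical, and only option b) (throwing a consistency error) is shown.

```perl
use strict;
use warnings;

{
    # Hypothetical plain matrix with no knowledge of CDAT.
    package My::PlainMatrix;
    sub new { my $class = shift; return bless { rows => {} }, $class }

    sub add_row {
        my ( $self, $name, $states ) = @_;
        $self->{rows}{$name} = $states;
        return $self;
    }

    sub remove_row {
        my ( $self, $name ) = @_;
        delete $self->{rows}{$name};
        return $self;
    }

    sub row_names { my $self = shift; return sort keys %{ $self->{rows} } }
}

{
    # CDAT-aware subclass: intercepts a mutator and throws a consistency error.
    package My::CDATAware::Matrix;
    our @ISA = ('My::PlainMatrix');

    sub remove_row {
        my ( $self, $name ) = @_;
        die "consistency error: '$name' is still referenced by a tree\n"
            if $self->{cdat_has_trees};
        return $self->SUPER::remove_row($name);
    }
}

# Associate: rebless into the aware subclass.
sub associate {
    my ( $matrix, $has_trees ) = @_;
    $matrix->{cdat_has_trees} = $has_trees;
    return bless $matrix, 'My::CDATAware::Matrix';
}

# Disassociate: rebless back to restore the base behaviour.
sub disassociate {
    my $matrix = shift;
    delete $matrix->{cdat_has_trees};
    return bless $matrix, 'My::PlainMatrix';
}

my $matrix = My::PlainMatrix->new;
$matrix->add_row( taxon_a => [qw(A C G T)] );

associate( $matrix, 1 );
my $ok    = eval { $matrix->remove_row('taxon_a'); 1 };
my $error = $ok ? '' : $@;    # the aware subclass refused the change

disassociate($matrix);
$matrix->remove_row('taxon_a');    # plain behaviour again
my @remaining = $matrix->row_names;
```

Reblessing changes the class of the existing object in place, so code that already holds a reference to the matrix sees the new behaviour without any copying.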

Thanks for the solid thinking, hopefully this discourse remains fruitful.

-Aaron

Date: Thu, 13 Jul 2006 13:54:14 -0700, From: Rutger Vos <rvosa@sfu.ca>

Date: Thu, 13 Jul 2006 13:54:14 -0700 From: Rutger Vos <rvosa@sfu.ca> Subject: Re: character state matrix api

Hi Aaron,

thanks for the reply! Below I will go through the points you're making one by one trying to explain myself a bit more, but here I'll say a bit more about "the vision thing" :-)

I, like everyone else, like APIs that are sensibly laid out and easy to remember. I agree that accessors and mutators should be explicitly separated as get_* and set_* methods. BioPerl sometimes violates this, by using variable argument lists, such as $node->branch_length( $length ) and $node->branch_length() for setting and getting, respectively. We can't really change these APIs, they've ossified, they're "stable" ;-)

For our problem space, we also have another type of ossified API, namely that of the nexus syntax. For character state matrices, there's a bunch of tokens that many phylogeneticists can recite: 'datatype', 'ntax', 'nchar', 'symbols', 'missing', etc.

If I try to imagine how people would use a shared API we design, I can see many people wanting to use a parser library from one package to obtain objects from a file, and serialize it in some way - to the cipres architecture, to a different data file format, to a visualization format, to another internal data structure.

The things I will want to know about an object once I've received it from a parser and query it for serialization will probably be the same things the programs for which nexus was designed (and those that have adopted it subsequently) want to know: how many rows, how many columns, what sort of symbols can I expect and what do they mean, what do the rows mean. The tokens to indicate that in nexus ('ntax', 'nchar', 'symbols', 'datatype', an implicit or explicit 'link' to a 'taxa' block) are in my view part of the "traditional", stable terminology. I think the API we design will suffer if we replace these with long-but-consistent names that will be soul destroying to type out every time ('get_num_rows', 'get_num_columns', 'get_matrix_symbols', 'get_matrix_data_type' etc.).

We are, after all, talking about perl programming, and Perl has a 'grep' function.

There is already some friction between these two forces (consistency versus convention), but there is a third force acting on the design: having to fit into BioPerl's inheritance tree. The way I see it, we could fit in like this:

_Matrices_
Bio::CDAT::CharMatrixI (now under discussion) would be the main interface for character state matrices. Ideally, this would be an interface that is relatively easy to implement in NEXPL and Bio::Phylo, so that either can function as a parser back end for Bio::CDAT::IO. The Bio::CDAT::CharMatrixI interface inherits from BioPerl's Bio::Matrix::MatrixI so that objects from the CDAT parser architecture are available to other BioPerl modules that want matrices. The matrix would be comprised of character sequence objects (basically, an encapsulated matrix row), for which we probably need a Bio::CDAT::CharSeqI interface.

_Trees_
We can use Bio::Tree::TreeI, which should be fairly easy to implement in NEXPL. Likewise, for nodes we can use Bio::Tree::NodeI, which NEXPL would have to implement. This would make the tree parsers available for the IO back end.

_Taxa_
There needs to be some notion like the 'taxa' block in nexus files. Taxon objects are basically encapsulated names to which sequences and nodes can link in some way for disambiguation purposes.

So that's the direction in which I'm thinking. Now, specifically:

aaron.j.mackey@gsk.com wrote:
> Rutger, I'm excited to see someone laying out an API, but it would help me
> (at least) to see your overall vision for class hierarchy and
> relationships (perhaps in UML or something more lightweight), and then
> start discussing API. In my comments/questions below, I'll try to infer
> your underlying "design" from your external API description, but I may
> have it wrong, so forgive me.
>
>> New methods in Bio::Phylo::Matrices::CharMatrixI (future name:
>> Bio::CDAT::CharMatrixI or Bio::Matrix::CharMatrixI). These methods are
>> accessors - i.e. read only - that map onto the respective nexus tokens
>> by the same name.
>>
>
> why read only? some of these I could imagine being "mutators" (e.g.
> $cdat->missing("?") would cause all N's to convert to ?'s if
> $cdat->missing() eq "N"). Also, if it's going to be read-only, then I
> would prefer to see "get_*" accessor names, with explicitly missing
> "set_*" mutator methods.
>
I'm on the fence here - I'd like to strike a balance between "consistent" and "easy to remember". In the context of stringifying a matrix to nexus (or another conceptually similar format) it'd be nice to have all the nexus tokens available. For example, in a template for the template toolkit, you could do:

########################
begin characters;
dimensions ntax=[% matrix.ntax %] nchar=[% matrix.nchar %];
format datatype=[% matrix.datatype %] missing=[% matrix.missing %] 
gap=[% matrix.gap %] symbols=[% matrix.symbols %];
charlabels [% matrix.charlabels %];
matrix
....
########################
...if you pass it a $matrix that is a CharMatrixI object. On the other hand, for Bio::Phylo I have religiously stuck to get_* and set_* methods, and if any of the above methods are "setters" too, that should be made explicit in the method names (rather than through 'dual usage' overloading, i.e. with/without arg). I would just hate to have to type get_num_taxa every time I want to know how many rows are in the matrix :-)
>
>> * datatype -- dna|rna|protein|standard|continuous|restriction
>>
>
> the "original" CDAT relational data model allowed mixed datatypes (e.g. a
> binary intron presence/absence state could be embedded in the CDS sequence
> at the position at which the intron may occur); I realize that the
> representation of this in Nexus flat file format must be via separate
> matrix objects, but does that necessarily limit CDAT matrices?

I forgot "mixed". It should be allowed. Note that nexus files for mrbayes have a mixed data type (essentially concatenated matrices of dna and standard, though).

>> * symbols -- array ref of single character symbols
>>
>
> or maybe "alphabet"? related to the above, the symbols/alphabet structure
> may differ per column if mixed datatypes are allowed ... so this could be
> perhaps the "default" alphabet of the matrix, though a given column in the
> matrix may utilize a different alphabet (or none at all, if continuous)

Again, this was for consistency with nexus tokens, as are the names below:

>> * missing -- the missing data symbol, usually '?' or 'N'
>> * gap -- the gap symbol, usually '-'
>> * ntax -- number of rows in matrix

> err, num_otus? or num_rows?

>> * nchar -- number of columns in matrix

> num_chars? num_cols? num_columns?
>
> in general the prefix "n" is not very meaningful; one of the most loved
> (and hated) aspects of BioPerl is the various method aliases that have
> arisen for just these differences in style and expectation.
>
>> * charstatelabels -- column labels
>
> I'm not sure what this is; is this related to charset's? what if a column
> belongs to more than one charset?
>

I meant the "charlabels" nexus token (i.e. column names).

>> New methods for data integrity:
>>
>> * set_charstate_lookup -- set character state lookup hash
>> * get_charstate_lookup -- get character state lookup hash
>
> I don't know what a "character state lookup hash" is (well, I can guess,
> but probably won't be entirely correct); what did you have in mind?

We need to be able to specify how the different symbols in a matrix map onto each other. For example, for restriction data, state '0' only ever maps onto '0', and '1' maps onto '1', i.e. both are unambiguous symbols. The '?' symbol could mean either '0' or '1'; the '-' symbol means neither. A hash that describes this is:

my $lookup = {
    '-' => [],
    '0' => [ '0' ],
    '1' => [ '1' ],
    '?' => [ '0', '1' ],
};

It indicates what we actually mean w.r.t. "missing" data and "gaps". For this data type this is not very complex, but think of how the IUPAC single character ambiguity symbols map onto each other: a rather bigger hash. For all datatypes (other than continuous) we can define a default hash in the classes, and users can get and set a new one, perhaps merging default hashes from different data types for "mixed" matrices.

Here's why we need this: i) symbols can be validated by checking whether they exist as keys in the hash; ii) if, while parsing a matrix, you come across "{ac}" (mrbayes) or "a&c" (mesquite) you can lookup the symbol that maps onto [ 'A', 'C' ] and use that internally; iii) by implementing an internal notion of ambiguity we can write out different dialects of nexus, i.e. with the {ac} construct if we're writing for mrbayes, and a&c if we're writing for mesquite; iv) the hashes can be modified/merged - e.g. for "mixed" data we could specify a hash that combines the "dna" and "standard" hashes.
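The four uses listed above could be sketched as below, using the restriction-data hash from this message. The function names are hypothetical, the DNA hash is deliberately partial, and the two ambiguity notations are simply the ones quoted above, not a full treatment of either program's dialect.

```perl
use strict;
use warnings;

# Lookup hash for restriction data, as given in the message above.
my $restriction = {
    '-' => [],
    '0' => ['0'],
    '1' => ['1'],
    '?' => [ '0', '1' ],
};

# i) A symbol is valid if it exists as a key in the hash.
sub is_valid_symbol {
    my ( $lookup, $symbol ) = @_;
    return exists $lookup->{$symbol} ? 1 : 0;
}

# ii) Reverse lookup: find the symbol whose state set equals @states.
sub symbol_for_states {
    my ( $lookup, @states ) = @_;
    my $wanted = join( ',', sort @states );
    for my $symbol ( sort keys %{$lookup} ) {
        my $have = join( ',', sort @{ $lookup->{$symbol} } );
        return $symbol if $have eq $wanted;
    }
    return;    # no symbol maps onto exactly these states
}

# iii) Write an ambiguous state set in the notation quoted for each program.
sub format_states {
    my ( $dialect, @states ) = @_;
    return $dialect eq 'mrbayes'
        ? '{' . join( '', @states ) . '}'
        : join( '&', @states );
}

# iv) Merge per-datatype hashes for "mixed" data; later hashes win on clashes.
sub merge_lookups {
    my %merged = map { %{$_} } @_;
    return \%merged;
}

# Deliberately partial DNA hash, for illustration only.
my $dna   = { 'A' => ['A'], 'C' => ['C'], 'M' => [ 'A', 'C' ] };
my $mixed = merge_lookups( $dna, $restriction );

print symbol_for_states( $dna, 'C', 'A' ), "\n";
print format_states( 'mrbayes',  'a', 'c' ), "\n";
print format_states( 'mesquite', 'a', 'c' ), "\n";
```

The reverse lookup is a linear scan, which is fine for small alphabets; a real implementation would probably precompute an inverted hash keyed on the sorted state set.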

(Mesquite and paup do things internally like this as well, albeit with some multidimensional array jiggery-pokery.)

>> Methods inherited from Bio::CDAT::ContainedObjectI. The idea is that
>> internally, the $cdat->add_matrix($matrix) method could check whether
>> $matrix->isa('Bio::CDAT::ContainedObjectI').
>>
>> * get_cdat -- get the cdat container
>> * set_cdat -- set the cdat container
>
> this seems like part of an inside-out design (which isn't necessarily bad,
> I just want to make sure I understand the design); so instead of a
> Bio::CDAT having matrices (or trees, or whatever else a CDAT contains),
> you want the ability to "back reference" the CDAT object directly from the
> matrix? Do you want to do this via soft-references (i.e. the Bio::CDAT
> truly contains the matrices, and the matrices have a back-reference for
> convenience), or does the Bio::CDAT truly not itself know what the
> associated matrices are (not good, I'd think).

I think $node needs to be able to find out whether $charseq belongs to the same Bio::CDAT container. The Bio::CDAT container would be some kind of array, so it could get at its contents and know what the associated matrices/trees/etc are, but it'll be handy if the contained objects can get at their container also. Perhaps just via their ID, not via actual references - as you suggested earlier (also to prevent issues with cyclical references and memory leaks, I realized later).

> Also, doesn't this mean that I can't easily instantiate a plain old
> Bio::Align::AlignI matrix and call $cdat->add_matrix($alignment) without
> declaring Bio::Align::AlignI ISA Bio::CDAT::ContainedObjectI? This seems
> unnecessarily prohibitive; I'd rather rebless the matrix into a new
> derived subclass that contains the get_cdat/set_cdat methods, if those
> backreferencing methods are so important to have.

Sure, I can't think of any name clashes right now, so objects contained by Bio::CDAT could perhaps be duck-typed by $obj->can('set_cdat'). Part of the point was that the CDAT container should be able to figure out whether what you're trying to add to it is a good idea or not, without a cascade of if/else statements.
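A rough sketch of the duck-typing check, combined with a weak back-reference to sidestep the cyclical-reference concern mentioned earlier. My::Matrix and My::CDAT are hypothetical stand-ins; Scalar::Util's blessed, weaken and isweak are the only library calls used.

```perl
use strict;
use warnings;
use Scalar::Util ();

{
    # Hypothetical contained object; only set_cdat/get_cdat matter here.
    package My::Matrix;
    sub new { my $class = shift; return bless {}, $class }

    sub set_cdat {
        my ( $self, $cdat ) = @_;
        $self->{cdat} = $cdat;

        # Weak back-reference, so container and contents do not form
        # a reference cycle that Perl's refcounting could never free.
        Scalar::Util::weaken( $self->{cdat} );
        return $self;
    }

    sub get_cdat { $_[0]{cdat} }
}

{
    # Hypothetical container.
    package My::CDAT;
    sub new { my $class = shift; return bless { matrices => [] }, $class }

    sub add_matrix {
        my ( $self, $matrix ) = @_;

        # Duck typing: accept anything that can set_cdat, whatever its class.
        die "cannot add: object does not support set_cdat\n"
            unless Scalar::Util::blessed($matrix) && $matrix->can('set_cdat');
        $matrix->set_cdat($self);
        push @{ $self->{matrices} }, $matrix;
        return $self;
    }
}

my $cdat   = My::CDAT->new;
my $matrix = My::Matrix->new;
$cdat->add_matrix($matrix);

# The matrix can find its container; a plain hash ref is refused.
my $same_container = $matrix->get_cdat == $cdat;
my $rejected       = !eval { $cdat->add_matrix( {} ); 1 };
```

The single can('set_cdat') test replaces a cascade of isa checks, at the cost of trusting that any object with that method name really behaves like a contained object.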

>> Methods inherited from Bio::Matrix::MatrixI. These methods are accessors
>> for generic matrices/tables:
>>
>> * matrix_id -- primary key
>> * matrix_name -- string, nexus token legal, i.e. single quoted with
>> spaces or underline-separated
>> * get_entry($rowname,$columnname) -- get single cell identified by
>> $rowname and $columnname
>
> for a CDAT, I'd like to see get_entry expanded to consider the notion of
> charsets (i.e. give me the entry that corresponds to a particular position
> in a charset, not the entire matrix).

Sounds good. Could be a third positional argument, I guess?
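One way the third positional argument could work, as a sketch only. The class My::CharsetMatrix, the charset storage format and the use of 0-based column indices (rather than column names) are assumptions made to keep the example short.

```perl
use strict;
use warnings;

{
    # Hypothetical matrix; the charset storage is an assumption.
    package My::CharsetMatrix;

    sub new {
        my ( $class, %args ) = @_;
        my $self = {
            rows     => $args{rows}     || {},    # row name => array ref of states
            charsets => $args{charsets} || {},    # charset name => 0-based column indices
        };
        return bless $self, $class;
    }

    # With two arguments, $col indexes the whole matrix. With a third,
    # $col is a position within the named charset.
    sub get_entry {
        my ( $self, $rowname, $col, $charset ) = @_;
        my $row = $self->{rows}{$rowname}
            or die "no such row: $rowname\n";
        if ( defined $charset ) {
            my $columns = $self->{charsets}{$charset}
                or die "no such charset: $charset\n";
            die "position $col is outside charset $charset\n"
                if $col < 0 || $col > $#{$columns};
            $col = $columns->[$col];
        }
        return $row->[$col];
    }
}

my $matrix = My::CharsetMatrix->new(
    rows     => { taxon_a => [qw(A C G T A T)] },
    charsets => { third_positions => [ 2, 5 ] },
);

# Column 1 of the whole matrix, then position 1 within the charset.
my $whole   = $matrix->get_entry( 'taxon_a', 1 );
my $charset = $matrix->get_entry( 'taxon_a', 1, 'third_positions' );
print "$whole $charset\n";
```

Keeping the charset optional means existing two-argument callers of get_entry are unaffected.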

>> * num_rows -- SAME AS "ntax"
>> * num_columns -- SAME AS "nchar"
>
> OK, so these are your stylistic aliases ;-)

Yup, there'll inevitably be some redundancy/aliasing.

>> Methods like (but probably not inherited from)
>> Bio::Matrix::GenericMatrix. These methods are mutators for generic
>> matrices/tables:
>>
>> * add_row($row) -- adds row $row to matrix
>> * remove_row($row) -- removes row $row from matrix
>> * add_column($col) -- adds column $col to matrix
>> * remove_column($col) -- removes column $col from matrix
>>
>
> here's where the fun starts; what happens if you execute these methods on
> a matrix already associated with a CDAT, and that CDAT already has
> associated tree(s)?

Adding columns I can't see having a great effect on associated trees, but here's how things work inside Bio::Phylo: If you add a row, that row is a datum object that is either identified by a name (string) or a taxon object. The taxon object is contained by a taxa container. If you insert the datum object in the matrix object, the matrix will check whether the datum object holds a reference to a taxon, and if it does, whether it belongs to the right taxa container. Matrices and trees can both reference the same taxa container, so that you get an architecture like in a nexus file.

> I think the "rebless into CDAT-aware subclass" idea almost has to happen
> to be able to intercept these calls and either a) try to cascade the
> action if (easily) possible or b) throw a consistency error.
>
> Thus, we'll also need a way to disassociate a matrix from its CDAT object
> to "restore" its normal base functionality.

Agreed.

> Thanks for the solid thinking, hopefully this discourse remains fruitful.
>
> -Aaron

Any further feedback very welcome.

Best wishes,

Rutger

Date: Sat, 15 Jul 2006 13:19:35 -0400, From: aaron.j.mackey@gsk.com

Date: Sat, 15 Jul 2006 13:19:35 -0400 From: aaron.j.mackey@gsk.com Subject: Re: character state matrix api

> For character state matrices, there's a bunch
> of tokens that many phylogeneticists can recite: 'datatype', 'ntax',
> 'nchar', 'symbols', 'missing', etc.

Fair enough, I'm convinced.

> the API we design will suffer if we replace these with
> long-but-consistent names that will be soul destroying to type out every
> time ('get_num_rows', 'get_num_columns', 'get_matrix_symbols',
> 'get_matrix_data_type' etc.).

yep, agreed.

> _Taxa_
> There needs to be some notion like the 'taxa' block in nexus files.
> Taxon objects are basically encapsulated names to which sequences and
> nodes can link in some way for disambiguation purposes.

OK, this makes more sense to me now; I was still thinking about the original CDAT relational model, that had no separate Taxa/OTU entity, only sequences and nodes directly linked. but having a separate taxa object will make things more flexible.

> For example, in a template for the template toolkit, you could do:
> ########################
> begin characters;
> dimensions ntax=[% matrix.ntax %] nchar=[% matrix.nchar %];
> format datatype=[% matrix.datatype %] missing=[% matrix.missing %]
> gap=[% matrix.gap %] symbols=[% matrix.symbols %];
> charlabels [% matrix.charlabels %];
> matrix
> ....
> ########################

Yes, this is nice, but I worry about defining an API based on one particular file format data-structure.

Besides, in your example, how does [% matrix.symbols %] and matrix.charlabels interpolate into the file, with quotes, commas, whitespace, etc? I'm not concerned about data input/output formats being so tightly bound to the API.

> >> * charstatelabels -- column labels
> I meant the "charlabels" nexus token (i.e. column names).

I guess I'm not familiar with this; is this just the numbers 1 to [nchar]?

> >> * set_charstate_lookup -- set character state lookup hash
> >> * get_charstate_lookup -- get character state lookup hash
> >
> We need to be able to specify how the different symbols in a matrix map
> onto each other. For example, for restriction data, state '0' only ever
> maps onto '0', and '1' maps onto '1', i.e. both are unambiguous symbols.
> The '?' symbol could mean either '0' or '1'; the '-' symbol means
> neither. A hash that describes this is:
>
> my $lookup = {
> '-' => [],
> '0' => [ '0' ],
> '1' => [ '1' ],
> '?' => [ '0', '1' ],
> };
>
> Here's why we need this: i) symbols can be validated by checking whether
> they exist as keys in the hash; ii) if, while parsing a matrix, you come
> across "{ac}" (mrbayes) or "a&c" (mesquite) you can lookup the symbol
> that maps onto [ 'A', 'C' ] and use that internally;

ahh, this is presumably because you require each "cell" in the matrix to be scalar. again, one of the goals of CDAT is to go beyond this simplifying assumption and allow states to be probabilistic across the defined alphabet, for both observed states and inferred ancestral states. Why would an observed state be probabilistic? For exactly the "ambiguity" reasons you define above, and others (sequence trace/assembly quality scores, suspect mutations, etc.). This is particularly relevant for applications such as SNPs where you might want to remember the fraction of the population that has this vs. that allele at a particular position.

So if we don't do this internal "translation", we don't need these hashes; for validation all we need is the alphabet/symbols method (specified earlier).
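To make the probabilistic alternative concrete, here is a sketch in which each cell is a hash of state => probability and validation needs only the alphabet. The function names, the cell representation and the choice of a uniform distribution for ambiguity codes are assumptions, not something settled in this thread.

```perl
use strict;
use warnings;

# Validate a probabilistic cell using only the alphabet: every state must
# be in the alphabet, no probability may be negative, and the
# probabilities must sum to 1 (within a small tolerance).
sub is_valid_cell {
    my ( $alphabet, $cell ) = @_;
    my %in_alphabet = map { $_ => 1 } @{$alphabet};
    my $total = 0;
    for my $state ( keys %{$cell} ) {
        return 0 unless $in_alphabet{$state};
        return 0 if $cell->{$state} < 0;
        $total += $cell->{$state};
    }
    return abs( $total - 1 ) < 1e-9 ? 1 : 0;
}

# An ambiguity such as "{ac}" becomes a uniform distribution over its states.
sub uniform_cell {
    my @states = @_;
    die "need at least one state\n" unless @states;
    my $p    = 1 / scalar @states;
    my %cell = map { $_ => $p } @states;
    return \%cell;
}

my @alphabet = qw(A C G T);
my $snp      = { A => 0.7, G => 0.3 };      # allele frequencies in a population
my $ambig    = uniform_cell( 'A', 'C' );    # replaces a lookup-hash symbol

print is_valid_cell( \@alphabet, $snp )   ? "snp ok\n"   : "snp invalid\n";
print is_valid_cell( \@alphabet, $ambig ) ? "ambig ok\n" : "ambig invalid\n";
```

An unambiguous observation is then just the special case of a single state with probability 1.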

> (Mesquite and paup do things internally like this as well, albeit with
> some multidimensional array jiggery-pokery.)

yep, I think that jiggery-pokery may be in our game-plan as well.

Of course, this may be a schism point between a Bio::CDAT::MatrixI and a Bio::CDAT::ProbabilisticMatrixI (which ISA Bio::CDAT::MatrixI), which would also be fine.

> >> Methods inherited from Bio::CDAT::ContainedObjectI. The idea is that
> >> internally, the $cdat->add_matrix($matrix) method could check whether
> >> $matrix->isa('Bio::CDAT::ContainedObjectI').
> >>
> >> * get_cdat -- get the cdat container
> >> * set_cdat -- set the cdat container
> >
> I think $node needs to be able to find out whether $charseq belongs to
> the same Bio::CDAT container.

I think that's fine, for utility/sanity checking.

> Sure, I can't think of any name clashes right now, so objects contained
> by Bio::CDAT could perhaps be duck-typed by $obj->can('set_cdat'). Part
> of the point was that the CDAT container should be able to figure out
> whether what you're trying to add to it is a good idea or not, without a
> cascade of if/else statements.

I see your point, but I fear any solution that requires some other monolithic project (BioPerl) to add interfaces and/or methods to support another arcane (though useful) project (Bio::CDAT).

> >> * add_row($row) -- adds row $row to matrix
> >> * remove_row($row) -- removes row $row from matrix
> >> * add_column($col) -- adds column $col to matrix
> >> * remove_column($col) -- removes column $col from matrix
> >
> > here's where the fun starts; what happens if you execute these methods on
> > a matrix already associated with a CDAT, and that CDAT already has
> > associated tree(s)?
> >
> Adding columns I can't see having a great effect on associated trees,

except that if the tree was inferred from the matrix, adding a new column negates/outdates the current inference.

> but here's how things work inside Bio::Phylo:
> If you add a row, that row is a datum object that is either identified
> by a name (string) or a taxon object. The taxon object is contained by a
> taxa container. If you insert the datum object in the matrix object, the
> matrix will check whether the datum object holds a reference to a taxon,
> and if it does, whether it belongs to the right taxa container. Matrices
> and trees can both reference the same taxa container, so that you get an
> architecture like in a nexus file.

OK, again you're thinking about discrete manipulations of the datastructure (which is good), but I'm thinking about possible utility. If I call remove_row() will that "cascade" to an equivalent remove_node() call in the associated tree? I guess in my head there are two "scenarios" under which data manipulation occurs: construction (in which I don't care so much about referential integrity until I'm all done) and analysis (in which I do care very much if I stupidly do something that invalidates/outdates some other piece of information I've also carefully constructed).

> > I think the "rebless into CDAT-aware subclass" idea almost has to happen
> > to be able to intercept these calls and either a) try to cascade the
> > action if (easily) possible or b) throw a consistency error.

One further thought on this is that we might consider using an Observer design pattern to do this: one (or more?) Bio::CDAT objects are registered as listeners to the events that occur on Bio::CDAT::ComponentI's (instead of ContainedObjectI's); thus the component gets to know its CDAT(s), the CDAT(s) gets to control (via callbacks) its components (and interfere when something bad happens), etc. We'd still need to be able to directly access components via the CDAT object, so some amount of cyclic referencing will be necessary, but weak referencing is pretty stable in Perl nowadays.

> > Thus, we'll also need a way to disassociate a matrix from its CDAT object
> > to "restore" its normal base functionality.

With the Observer pattern, this is simply a de-registration, no need to un/re-bless.
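A minimal sketch of that Observer arrangement, assuming hypothetical My::Component and My::CDAT classes and a hypothetical freeze/notify protocol in which the CDAT vetoes a change by dying. The component holds its observers weakly, the CDAT holds its components strongly.

```perl
use strict;
use warnings;
use Scalar::Util ();

{
    # Hypothetical component (stand-in for the proposed Bio::CDAT::ComponentI).
    package My::Component;
    sub new { my $class = shift; return bless { rows => {}, observers => {} }, $class }

    sub register {
        my ( $self, $observer ) = @_;
        my $id = Scalar::Util::refaddr($observer);
        $self->{observers}{$id} = $observer;
        Scalar::Util::weaken( $self->{observers}{$id} );    # component does not own its CDAT
        return $self;
    }

    sub deregister {
        my ( $self, $observer ) = @_;
        delete $self->{observers}{ Scalar::Util::refaddr($observer) };
        return $self;
    }

    sub add_row {
        my ( $self, $name, $states ) = @_;
        $self->{rows}{$name} = $states;
        return $self;
    }

    sub remove_row {
        my ( $self, $name ) = @_;

        # Notify first: an observer may veto the change by dying.
        for my $observer ( grep { defined $_ } values %{ $self->{observers} } ) {
            $observer->notify( $self, 'remove_row', $name );
        }
        delete $self->{rows}{$name};
        return $self;
    }
}

{
    # Hypothetical CDAT acting as the observer.
    package My::CDAT;
    sub new { my $class = shift; return bless { components => [], frozen => 0 }, $class }

    sub add_component {
        my ( $self, $component ) = @_;
        push @{ $self->{components} }, $component;    # strong: CDAT owns its components
        $component->register($self);
        return $self;
    }

    # "Analysis" mode: refuse changes that would invalidate other data.
    sub freeze { $_[0]{frozen} = 1; return $_[0] }

    sub notify {
        my ( $self, $component, $event, @args ) = @_;
        die "consistency error: $event(@args) refused\n" if $self->{frozen};
        return;
    }
}

my $cdat   = My::CDAT->new;
my $matrix = My::Component->new;
$matrix->add_row( taxon_a => [qw(A C G T)] );
$cdat->add_component($matrix);
$cdat->freeze;

my $vetoed      = !eval { $matrix->remove_row('taxon_a'); 1 };    # CDAT refuses
my $still_there = exists $matrix->{rows}{taxon_a};

$matrix->deregister($cdat);        # disassociate, no reblessing needed
$matrix->remove_row('taxon_a');    # now succeeds
```

A fuller version would let notify cascade the change (for example, pruning the matching node from a tree) instead of only refusing it.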

-Aaron

Date: Mon, 17 Jul 2006 18:06:16 -0700, From: Rutger Vos <rvosa@sfu.ca>

Date: Mon, 17 Jul 2006 18:06:16 -0700 From: Rutger Vos <rvosa@sfu.ca> Subject: Re: character state matrix api

Hi all,

in my head, I've summarized the exchanges on this thread as:

  1. we need a character state matrix api, AlignI is too specialized, MatrixI too general;

  2. there is some debate about the exact aesthetics of getters and setters;

  3. the matrix will have to be able to maintain type safety, and hold meta data about ambiguity/uncertainty;

  4. the matrix needs to fit into a more general CDAT architecture for naming, i.e. how do we know that a sequence and a node refer to the same entity?

  5. the matrix needs to fit into an architecture for maintaining referential integrity, so that state changes can cascade from one CDAT-contained object to another.

To start with the last item: Aaron mentioned the observer pattern. If I understand correctly, in this case it would mean that the main CDAT object is the observer, and that matrices, trees, nodes (etc.?) are subjects that register with the observer, and notify it when they change state (but please chime in if that's not how it would work). I think we can also see this in a caching context, i.e. in the implementation, objects might store intermediate calculation results and keep them until the object changes state (perhaps through a call by the CDAT object?).
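As a rough illustration of that reading of the Observer pattern, the sketch below has subjects (a matrix and a tree) register with a CDAT observer and notify it on state changes; the observer then clears its cache and cascades a row removal to the tree. Every package and method name here (`My::Subject`, `My::CDAT`, `update`, `notify`, and so on) is invented for the example, and the weak references a real implementation would want are left out for brevity.

```perl
use strict;
use warnings;

package My::Subject;    # hypothetical base class for matrices, trees, nodes
use Scalar::Util qw(refaddr);

sub new {
    my ( $class, %data ) = @_;
    return bless { %data, _observers => [] }, $class;
}

sub register {
    my ( $self, $observer ) = @_;
    push @{ $self->{_observers} }, $observer;
    return $self;
}

sub unregister {
    my ( $self, $observer ) = @_;
    @{ $self->{_observers} } =
      grep { refaddr($_) != refaddr($observer) } @{ $self->{_observers} };
    return $self;
}

# Tell every registered observer what just changed.
sub notify {
    my ( $self, $event, @args ) = @_;
    $_->update( $self, $event, @args ) for @{ $self->{_observers} };
    return $self;
}

package My::Matrix;
our @ISA = ('My::Subject');

sub remove_row {
    my ( $self, $name ) = @_;
    delete $self->{rows}{$name};
    return $self->notify( remove_row => $name );
}

package My::Tree;
our @ISA = ('My::Subject');

sub remove_node {
    my ( $self, $name ) = @_;
    delete $self->{nodes}{$name};
    return $self->notify( remove_node => $name );
}

package My::CDAT;    # the observer
sub new { my ($class) = @_; return bless { _trees => [], _cache => {} }, $class }
sub add_matrix { my ( $self, $m ) = @_; $m->register($self); return $m }

sub add_tree {
    my ( $self, $t ) = @_;
    push @{ $self->{_trees} }, $t;
    $t->register($self);
    return $t;
}

# Callback: drop cached results, then cascade row removals to the trees.
sub update {
    my ( $self, $subject, $event, @args ) = @_;
    %{ $self->{_cache} } = ();
    if ( $event eq 'remove_row' ) {
        $_->remove_node(@args) for @{ $self->{_trees} };
    }
    return $self;
}

package main;

my $cdat   = My::CDAT->new;
my $matrix = $cdat->add_matrix(
    My::Matrix->new( rows => { human => 'ACGT', chimp => 'ACGA' } ) );
my $tree = $cdat->add_tree( My::Tree->new( nodes => { human => 1, chimp => 1 } ) );

$matrix->remove_row('chimp');    # cascades to $tree via $cdat->update
$matrix->unregister($cdat);      # de-registration: no more cascading
$matrix->remove_row('human');    # the tree keeps its 'human' node
print join( ',', sort keys %{ $tree->{nodes} } ), "\n";
```

The same `update` callback is also where a consistency error could be thrown instead of cascading, which is the other option raised earlier in the thread.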

On the second-to-last item: perhaps the CDAT object maintains a pool of all the unique names/ids of all entities that it observes? Earlier upthread we talked about whether a cdat object was essentially { _trees => [], _matrices => [] }, but I think we were worried we'd end up with cdat object becoming a big soup of unrelated entities. If we've added a matrix to the cdat object, surely we will want to map in some way the names of sequences in that matrix onto matching names in a tree we subsequently add. Maybe CDAT is something like a hash with names as keys and as values some structure to define the objects by that name (perhaps holding object references or ids).
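A minimal sketch of that last idea, assuming a plain hash keyed by entity name whose values group the objects carrying that name by kind. The kinds `matrix_row` and `tree_node`, the helper `add_entity`, and the sample data are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical layout: entity name => { kind => [ objects with that name ] }
my %cdat;

sub add_entity {
    my ( $name, $kind, $object ) = @_;
    push @{ $cdat{$name}{$kind} }, $object;
    return;
}

# Add a matrix first ...
add_entity( 'Homo_sapiens',    'matrix_row', { seq => 'ACGT' } );
add_entity( 'Pan_troglodytes', 'matrix_row', { seq => 'ACGA' } );

# ... and subsequently a tree that only mentions one of the names.
add_entity( 'Homo_sapiens', 'tree_node', { branch_length => 0.12 } );

# Names that occur in a matrix but have no matching node in any tree.
my @unmatched =
  grep { exists $cdat{$_}{matrix_row} && !exists $cdat{$_}{tree_node} }
  sort keys %cdat;
print "@unmatched\n";
```

Here the name lookup doubles as the consistency check: anything left in `@unmatched` is a sequence without a counterpart in the tree.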

W.r.t. type safety and metadata: I am worried about speed/memory requirements - can we just relegate this to either the character sequence object or a matrix subclass?

Rutger

(Below are some more specific responses.)

>
> Yes, this is nice, but I worry about defining an API based on one
> particular file format data-structure.

There are so many nexus files out there, I think many people will be very happy if their contents can be made available through a simple API. I agree, though, that this may not necessarily be provided by 'core' cdat - perhaps by nexpl? It would just be aliasing the getters by the same name, I guess.

> Besides, in your example, how does [% matrix.symbols %] and
> matrix.charlabels interpolate into the file, with quotes, commas,
> whitespace, etc? I'm not concerned about data input/output formats being
> so tightly bound to the API.

array references are interpolated into a space separated string of their contents. I think symbols need to be double quoted in nexus, so there'd have to be quotes around that in the template. Charlabels aren't quoted.

>>>> * charstatelabels -- column labels
>>>>
>> I meant the "charlabels" nexus token (i.e. column names).
>
> I guess I'm not familiar with this; is this just the numbers 1 to [nchar]?

It's like taxlabels: a list of names. In molecular matrices they are often skipped.

> ahh, this is presumably because you require each "cell" in the matrix to
> be scalar. again, one of the goals of CDAT is to go beyond this
> simplifying assumption and allow states to be probabilistic across the
> defined alphabet, for both observed states and inferred ancestral states.

Individual cells would have to be pretty compressed one way or another. The big idea here was that this is done by mapping all ambiguity onto single character symbols. Another way that I know of is by packing the states into bitvectors, e.g. A: 1000, C: 0100, G: 0010, T: 0001, N: 1111, (which may or may not be surprisingly efficient depending on how perl rounds bytes). Whatever it is, I don't see every "cell" become a complex data structure, not to mention an object. It'll be impossible in terms of memory requirements. A matrix for 200 taxa, 2000 characters? It'll take ages to parse.
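For what it's worth, the bitvector idea can be tried out with Perl's built-in `vec`, which addresses a string in fixed-width bit fields. This is only a sketch under the encoding given above (A: 1000, C: 0100, G: 0010, T: 0001, N: 1111); the helper names `pack_row` and `cell_allows` and the extra symbol R are assumptions for the example.

```perl
use strict;
use warnings;

# One bit per state, as in the message above.
my %bits = ( A => 0b1000, C => 0b0100, G => 0b0010, T => 0b0001 );
$bits{N} = $bits{A} | $bits{C} | $bits{G} | $bits{T};    # 1111
$bits{R} = $bits{A} | $bits{G};                          # IUPAC purine, A or G

# Pack a row of symbols into a string, 4 bits (half a byte) per cell.
sub pack_row {
    my ($seq) = @_;
    my $packed  = '';
    my @symbols = split //, uc $seq;
    vec( $packed, $_, 4 ) = $bits{ $symbols[$_] } for 0 .. $#symbols;
    return $packed;
}

# Does the cell at $index admit the given state?
sub cell_allows {
    my ( $packed, $index, $state ) = @_;
    return ( vec( $packed, $index, 4 ) & $bits{$state} ) ? 1 : 0;
}

my $row = pack_row('ACGTNR');
print length($row), "\n";                   # 6 cells fit in 3 bytes
print cell_allows( $row, 5, 'G' ), "\n";    # R admits G: 1
print cell_allows( $row, 5, 'C' ), "\n";    # R does not admit C: 0
```

At 4 bits per cell, the 200 taxa by 2000 characters matrix mentioned above is 400,000 cells, or 200,000 bytes of packed data, before any per-row overhead.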

> Why would an observed state be probabilistic? For exactly the "ambiguity"
> reasons you define above, and others (sequence trace/assembly quality
> scores, suspect mutations, etc.). This is particularly relevant for
> applications such as SNPs where you might want to remember the fraction of
> the population that has this vs. that allele at a particular position.
>
> So if we don't do this internal "translation", we don't need these hashes;
> for validation all we need is the alphabet/symbols method (specified
> earlier).

It'd definitely be nice if metadata could be attached to each individual cell, but I'm just worried how this would work out memory-wise. I tried out character objects for bio::phylo, and it was really slow. In any case, though, that's more an issue for the charseq interface, not for the matrix that contains it.

>> (Mesquite and paup do things internally like this as well, albeit with
>> some multidimensional array jiggery-pokery.)
>
> yep, I think that jiggery-pokery may be in our game-plan as well.
>
> Of course, this may be a schism point between a Bio::CDAT::MatrixI and a
> Bio::CDAT::ProbabilisticMatrixI (which ISA Bio::CDAT::MatrixI), which
> would also be fine.

Sounds good.

>>> Thus, we'll also need a way to disassociate a matrix from its CDAT object to "restore" its normal base functionality.
>
> With the Observer pattern, this is simply a de-registration, no need to
> un/re-bless.

Could you say more about how this would work? $matrix->register( $cdat ); after which point the cdat gets notified about any change in the matrix so that it can cascade changes in other objects?

Date: Tue, 18 Jul 2006 09:42:51 -0400, From: aaron.j.mackey@GSK.com

Date: Tue, 18 Jul 2006 09:42:51 -0400 From: aaron.j.mackey@GSK.com Subject: Re: character state matrix api

> Individual cells would have to be pretty compressed one way or another.
> The big idea here was that this is done by mapping all ambiguity onto
> single character symbols. Another way that I know of is by packing the
> states into bitvectors, e.g. A: 1000, C: 0100, G: 0010, T: 0001, N:
> 1111, (which may or may not be surprisingly efficient depending on how
> perl rounds bytes). Whatever it is, I don't see every "cell" become a
> complex data structure, not to mention an object. It'll be impossible in
> terms of memory requirements. A matrix for 200 taxa, 2000 characters?
> It'll take ages to parse.

This is exactly what the Flyweight pattern is for. The classic example is a word processor application that "somehow" has to keep track of the independent state characteristics (font, size, color, embellishment) of each and every character in a document, possibly tens of thousands.

> > With the Observer pattern, this is simply a de-registration, no need
> > to un/re-bless.
>
> Could you say more about how this would work? $matrix->register(
> $cdat ); after which point the cdat gets notified about any change in
> the matrix so that it can cascade changes in other objects?

Here's one way to do it which maintains the original $matrix object, but "wraps" it with a CDAT-savvy matrix object:

$cdat->add_matrix(\$matrix) entails:

sub add_matrix {
  my ($self, $matrix) = @_;    # $matrix is a reference to the caller's variable

  # replace $matrix with a CDAT-compatible matrix object that
  # delegates to original $matrix
  $$matrix = Bio::CDAT::Component::Matrix->new($$matrix);

  # register this CDAT object ($self) as a listener on the wrapper
  $$matrix->register($self);
  push @{$self->{_matrices}}, $$matrix;

  return $$matrix;
}

Now when I call $matrix->delete_column(10), I'm really calling Bio::CDAT::Component::Matrix::delete_column(), which notifies its listeners, and delegates to the original $matrix's delete_column() method.

$cdat->remove_matrix(\$matrix) would do the opposite (replacing $$matrix with the original object), as might $matrix->unregister($cdat);
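The wrapper side of this arrangement might look roughly like the following sketch. It is an assumption-laden illustration, not the Bio::CDAT::Component::Matrix class itself: the packages `My::PlainMatrix` and `My::Component::Matrix`, the `unwrap` method, and the use of plain code references as listeners are all invented here. Intercepted methods notify listeners and then delegate; everything else is forwarded through `AUTOLOAD`.

```perl
use strict;
use warnings;

package My::PlainMatrix;    # hypothetical stand-in for the original $matrix class
sub new { my ( $class, @cols ) = @_; return bless { cols => [@cols] }, $class }
sub delete_column { my ( $self, $i ) = @_; return splice @{ $self->{cols} }, $i, 1 }
sub ncols { my ($self) = @_; return scalar @{ $self->{cols} } }

package My::Component::Matrix;    # hypothetical CDAT-savvy wrapper
our $AUTOLOAD;

sub new {
    my ( $class, $inner ) = @_;
    return bless { inner => $inner, listeners => [] }, $class;
}

# Listeners are plain code references in this sketch.
sub register {
    my ( $self, $callback ) = @_;
    push @{ $self->{listeners} }, $callback;
    return $self;
}

# Hand back the original object, e.g. for a remove_matrix() implementation.
sub unwrap { my ($self) = @_; return $self->{inner} }

# Intercepted call: notify the listeners first, then delegate.
sub delete_column {
    my ( $self, @args ) = @_;
    $_->( 'delete_column', @args ) for @{ $self->{listeners} };
    return $self->{inner}->delete_column(@args);
}

# Everything else is passed straight through to the wrapped object.
sub AUTOLOAD {
    my ( $self, @args ) = @_;
    ( my $method = $AUTOLOAD ) =~ s/.*:://;
    return if $method eq 'DESTROY';
    return $self->{inner}->$method(@args);
}

package main;

my @log;
my $matrix = My::PlainMatrix->new(qw(c0 c1 c2));
$matrix = My::Component::Matrix->new($matrix);    # wrap, as add_matrix would
$matrix->register( sub { push @log, "@_" } );

$matrix->delete_column(1);
print "$log[0]\n";             # the listener saw: delete_column 1
print $matrix->ncols, "\n";    # forwarded via AUTOLOAD: 2 columns left
```

One cost of delegation shown here is that `ref($matrix)` and `isa` checks no longer report the original class, which is one of the trade-offs against the re-blessing and @ISA approaches.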

There are other ways to do it that don't involve delegation (e.g. destructively convert the original matrix object into a CDAT-savvy object, or "decorate" the original object with CDAT-savvy methods via adding to the object's @ISA), but each has its pros and cons.

-Aaron

Date: Tue, 18 Jul 2006 13:19:14 -0700, From: Rutger Vos <rvosa@sfu.ca>

Date: Tue, 18 Jul 2006 13:19:14 -0700 From: Rutger Vos <rvosa@sfu.ca> Subject: Re: character state matrix api

You are the Gang of Four and I claim my $5.

But seriously, so would the matrix contain row objects which in turn contain flyweight cell objects? Who is to know what classes of flyweight objects are to be instantiated? Does the matrix decide? The matrix row? I'm curious to hear how you'd see this be organized architecturally.

Best wishes,

Rutger

Date: Wed, 19 Jul 2006 08:57:06 -0400, From: aaron.j.mackey@gsk.com

Date: Wed, 19 Jul 2006 08:57:06 -0400 From: aaron.j.mackey@gsk.com Subject: Re: character state matrix api

Perhaps it's time to have another little conference to discuss these ideas more fully? I can set up a webEx teleconference (so at least we'd have "whiteboard"-like capability) for the core group to discuss basic implementation details.

One realization I had today about this discussion is that (at least in my mind) we've been discussing two separate things: CDAT "native" matrix representation/implementation vs. the CDAT matrix API. Mixed in there have been ideas about object "IO" (e.g. $cdat->add_matrix($matrix), where $matrix is not a file to be parsed, but some Bio::Align::AlignI-like object), and the possibility of being able to achieve the CDAT matrix API without reconstructing a new CDAT matrix object, but by "decorating" the original $matrix object with the CDAT API (and on further thought, I'm not sure that will be possible in the long run).

Regardless, let's have a "voice-to-voice" sometime soon. I'm available all day Thursday and most of the day Friday. Let me know your availability for the next, say, 7 working days (through July 28th).

Thanks,

-Aaron

Date: Wed, 19 Jul 2006 18:17:56 -0700, From: Rutger Vos <rvosa@sfu.ca>

Date: Wed, 19 Jul 2006 18:17:56 -0700 From: Rutger Vos <rvosa@sfu.ca> Subject: Re: character state matrix api

In recent discussions we have had some confusion about what I mean by "taxon". This page describes more or less what I meant (and, hopefully, why we'd need something like that for CDAT): http://mesquiteproject.org/Mesquite_Folder/docs/mesquite/Taxa.html

Subject: Bio::Align::AlignI not suited, right?

Date: Sun, 16 Jul 2006 00:09:58 -0700, From: Rutger Vos <rvosa@sfu.ca>

Date: Sun, 16 Jul 2006 00:09:58 -0700 From: Rutger Vos <rvosa@sfu.ca> Subject: Bio::Align::AlignI not suited, right?

Hi all,

I want to verify with you whether you think Bio::Align::AlignI is suitable as an interface for character state matrices. I don't think it is, as it's too specifically dna-oriented. It's a shame it inherits directly from the root; ideally it would be a subclass of a character data matrix. Can you see CharMatrixI essentially being a subset of Bio::Align::AlignI, so that in the future maybe we can convince bioperl to make Bio::Align::AlignI inherit from it (so that alignments can be used by cdat directly)?

Rutger

Date: Sun, 16 Jul 2006 19:20:50 -0400, From: aaron.j.mackey@gsk.com

Date: Sun, 16 Jul 2006 19:20:50 -0400 From: aaron.j.mackey@gsk.com Subject: Re: Bio::Align::AlignI not suited, right?

Yep, I think I can agree with all of that, except the very last bit about CDAT using AlignI's directly. I'm content for an AlignI to not be immediately CDAT-usable, just as a Bio::SeqI won't ever be immediately CDAT-usable. I think we're always going to have to reinterpret these more basic objects in the context of the CDAT data model, bestowing various functionalities not inherent to the native object.

The first step to making something happen in BioPerl is to check out the CVS code, make the change (alter AlignI's ISA to include CharMatrixI), and then see what breaks in the test suite (of course, there are no tests, yet, to ensure that a particular implementation class of AlignI fully implements all methods defined in CharMatrixI).

-Aaron

Subject: Character flyweight sketch

Date: Mon, 24 Jul 2006 16:27:47 -0700, From: Rutger Vos <rvosa@sfu.ca>

Date: Mon, 24 Jul 2006 16:27:47 -0700 From: Rutger Vos <rvosa@sfu.ca> Subject: Character flyweight sketch

Hi all,

attached is a sketch for a flyweight character class, my interpretation of what Aaron meant a few days ago.

Rutger

--Boundary_(ID_7knUJLO/HrKgAF2Wv1AmUQ)
Content-type: text/plain; name=Character.pm
Content-transfer-encoding: 7BIT
Content-disposition: inline; filename=Character.pm

package Bio::Phylo::Matrices::Character;
use strict;
use constant { CHAR => 0, AMBIG => 1, RCASE => 2, POLY => 3 };
my $cache = {};
my $chars = [];
use Bio::Phylo::Util::CONSTANT qw(_DATUM_);
use Bio::Phylo::Util::IDPool;
sub new {
    my ( $class, @args ) = @_;
    my ( $respectcase, $is_poly ) = ( 0, 0 );
    my ( $char, $ambig, $key, %opt );
    if ( not scalar @args % 2 and scalar @args > 2 ) {
        %opt = @args;
        $is_poly = $opt{'-polymorphism'} ? $opt{'-polymorphism'} : 0;
        $respectcase = $opt{'-respectcase'} ? $opt{'-respectcase'} : 0;
        $char = $opt{'-char'};        
    }
    elsif ( scalar @args == 2 ) {
        ( $char, $ambig ) = ( $args[0], $args[1] );
    }
    elsif ( scalar @args == 1 ) {
        ( $char, $ambig ) = ( $args[0], uc( $args[0] ) );
    }
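    # a named -ambig wins; otherwise keep a positional array ref, else default
    $ambig = $opt{'-ambig'} if $opt{'-ambig'};
    $ambig = [ uc( $char ) ] unless ref( $ambig ) eq 'ARRAY';
    # states are strings, so sort as strings; the parentheses keep sort
    # from swallowing the trailing flags
    $key = join '', $char, ( sort @$ambig ), $respectcase, $is_poly;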
    if ( $cache->{$key} ) {
        return $cache->{$key};
    }
    else {
        my $self = Bio::Phylo::Util::IDPool->_initialize();
        $chars->[ $$self ] = [
            $char,
            $ambig,
            $respectcase,
            $is_poly,
        ];
        bless $self, $class;
        $cache->{$key} = $self;
        return $self;
    }

}
sub set_char {
    my ( $self, $char ) = @_;
    $self = __PACKAGE__->new(
        '-char'         => $char,
        '-polymorphism' => $self->is_polymorphic(),
        '-respectcase'  => $self->is_case_sensitive(),
        '-ambig'        => $self->get_ambig_lookup(),
    );
    return $self;
}
sub set_ambig_lookup {
    my ( $self, $ambig ) = @_;
    $self = __PACKAGE__->new(
        '-char'         => $self->get_char(),
        '-polymorphism' => $self->is_polymorphic(),
        '-respectcase'  => $self->is_case_sensitive(),
        '-ambig'        => $ambig,
    );
    return $self;
}
sub set_case_sensitivity {
    my ( $self, $cs ) = @_;
    $self = __PACKAGE__->new(
        '-char'         => $self->get_char(),
        '-polymorphism' => $self->is_polymorphic(),
        '-respectcase'  => $cs,
        '-ambig'        => $self->get_ambig_lookup(),
    );    
    return $self;    
}
sub set_polymorphism {
    my ( $self, $poly ) = @_;
    $self = __PACKAGE__->new(
        '-char'         => $self->get_char(),
        '-polymorphism' => $poly,
        '-respectcase'  => $self->is_case_sensitive(),
        '-ambig'        => $self->get_ambig_lookup(),
    );    
    return $self;    
}
# note: $$_[0] would parse as ${$_}[0]; the object is the scalar ref in $_[0]
sub get_char          { $chars->[ ${ $_[0] } ]->[ CHAR  ] }
sub get_ambig_lookup  { $chars->[ ${ $_[0] } ]->[ AMBIG ] }
sub is_case_sensitive { $chars->[ ${ $_[0] } ]->[ RCASE ] }
sub is_polymorphic    { $chars->[ ${ $_[0] } ]->[ POLY  ] }
sub _container        { _DATUM_ }

################################################################################
package main;
use Data::Dumper;
my @dna = qw(A C G T);
my @array;
push @array, Bio::Phylo::Matrices::Character->new($dna[int(rand(scalar @dna))]) for ( 0 .. 100000 );
print Dumper( \@array );

--Boundary_(ID_7knUJLO/HrKgAF2Wv1AmUQ)--

Date: Tue, 25 Jul 2006 09:36:18 -0400, From: aaron.j.mackey@gsk.com

Date: Tue, 25 Jul 2006 09:36:18 -0400 From: aaron.j.mackey@gsk.com Subject: Re: Character flyweight sketch

Yeah, without quibbling over what the various get/set methods are (or what might be missing), this is the general idea - keep a cache of very lightweight objects that all share a common underlying (possibly large) datastructure.

What you're missing (and what usually isn't discussed in the Flyweight pattern documentation) is a bulk constructor: i.e. if you were actually parsing an entire matrix and wanted to "preload" $chars in one bulk import. Of course, this would involve changing your identification pattern to one more rooted in the structure (i.e. using (row, column) tuples as a computable index into $chars) such that you don't also unnecessarily fill the cache with all the blessed scalars.
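A bulk constructor along those lines could be sketched as follows. This is one possible interpretation rather than a worked-out design: the packages `My::FlyweightMatrix` and `My::Cell` are hypothetical (they are not part of Bio::Phylo), the shared store is a single string, and the (row, column) tuple is turned into an offset with `row * ncol + col`, so no per-cell object is created until one is asked for.

```perl
use strict;
use warnings;

package My::FlyweightMatrix;    # hypothetical; not the Bio::Phylo API

# Bulk import: every row is a string of symbols, all of equal length.
sub new {
    my ( $class, @rows ) = @_;
    my $ncol = length $rows[0];
    # One shared store holding every cell, row after row.
    return bless { ncol => $ncol, store => join( '', @rows ) }, $class;
}

# Build a lightweight handle on demand; nothing is cached per cell.
sub cell {
    my ( $self, $row, $col ) = @_;
    my $index = $row * $self->{ncol} + $col;    # computable (row, column) index
    return bless { matrix => $self, index => $index }, 'My::Cell';
}

package My::Cell;

sub get_char {
    my ($self) = @_;
    return substr $self->{matrix}{store}, $self->{index}, 1;
}

sub set_char {
    my ( $self, $char ) = @_;
    substr( $self->{matrix}{store}, $self->{index}, 1, $char );
    return $self;
}

package main;

my $m = My::FlyweightMatrix->new( 'ACGT', 'AC-T', 'TCGA' );
print $m->cell( 1, 2 )->get_char, "\n";    # row 1, column 2 is the gap: -
$m->cell( 2, 0 )->set_char('N');
print $m->cell( 2, 0 )->get_char, "\n";    # a fresh handle sees the change: N
```

Because handles carry only a matrix reference and an offset, two handles for the same cell are interchangeable, which is what lets them be created and thrown away freely.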

But again, I thought we were going to wait to have an actual discussion before plunging into so much coding? And that you remained unavailable for such discussions until September?

-Aaron

Date: Wed, 26 Jul 2006 12:39:00 -0400, From: arlin.stoltzfus@nist.gov

From: stoltzfu@umbi.umd.edu Subject: Re: Character flyweight sketch Date: July 26, 2006 12:39:00 PM EDT

On Jul 25, 2006, at 9:36 AM, aaron.j.mackey@gsk.com wrote:

> But again, I thought we were going to wait to have an actual discussion
> before plunging into so much coding? And that you remained unavailable
> for such discussions until September?

I am having mixed feelings about where we are going here. On the one hand, it rarely hurts to actually DO something, and obviously Rutger is getting things done and in one sense, we don't want to slow him down by trying to get a group consensus on everything. It's possible that he is going to solve all of our problems while we stand on the sidelines watching. On the other hand, we don't want to develop too much without a plan, and our priorities also include:

  1. developing specific use cases to serve as target problems
  2. developing a spec for CDAT-BioPerl integration
  3. making plans for our next meeting.

My opinion is that if people want to start coding on an individual basis, that's great, but we should think of this as an experimental branch to be tested against a spec that is not yet fully developed.

With respect to #3, I will send a separate message today. I am working on the kinases case for #1. More later,

Arlin

Date: Wed, 26 Jul 2006 12:52:54 -0400, From: aaron.j.mackey@gsk.com

Date: Wed, 26 Jul 2006 12:52:54 -0400 From: aaron.j.mackey@gsk.com Subject: Re: Character flyweight sketch

Yes, I agree entirely. Rutger, please forgive my earlier message for sounding far too stodgy. As I've mentioned before, I truly appreciate your enthusiasm (both for discussion and actual coding). My (small) concern is ending up in a situation where we feel "locked in" to a particular implementation because of a significant effort that went into it. But in this "prototyping" stage, code does have the advantage (over "blather") of being concrete and testable.

-Aaron

Date: Wed, 26 Jul 2006 09:59:09 -0700, From: Rutger Vos <rvosa@sfu.ca>

Date: Wed, 26 Jul 2006 09:59:09 -0700 From: Rutger Vos <rvosa@sfu.ca> Subject: Re: Character flyweight sketch

Hi all,

sorry for stirring up a panic with that code sample :-)

The way I think about programming problems is sometimes fairly bottom-up, so playing around with smaller components in the system helps me get an idea as to how they might sensibly interact and what the higher level architecture might be like.

Please don't read too much into the ideas I am throwing at you right now, I'm thinking out loud.

By the way, on the topic of use cases, have we nominated NEXPL's set of nexus test files for the test suite yet? That looks like a great acid test for the IO system.

Rutger

cdat prototyping

CDAT design consideration: Mediator pattern

Post mortem

This section is an attempt to condense the preceding. (Please step in here to change things, but the goal is to keep things in this section on a "need to know" basis. A one page "executive summary". -- RutgerVos)

Requirements

The requirements of a character matrix object that materialize in the preceding include:

  • BioPerl compatible: the matrix may be composed of BioPerl utility objects, and the overall CoreEIG architecture will consume BioPerl objects. As BioPerl has no useful character matrix interface, we design one here following BioPerl's interface style.
  • Bio::CDAT compatible: there will be a link between the cdat object and the matrix. This connection maintains referential integrity between biological data objects involved in an analysis.
  • Data integrity for sophisticated types: the matrix will be heavily annotatable, will contain mixed data types, will have a mechanism to define probabilistic character states, true polymorphism, and uncertainty between categorical states.
  • Multiple IO types: the matrix will be populated from a variety of data sources, such as flat files, database streams, webservice/corba connections.

Interface design

The discussion also touched on the API design. The consensus was that the interface should (at least) follow BioPerl's interface style, or probably be more explicit about the distinction between "getters" and "setters".

Implementation

Several implementation details were discussed:

  • A way to specify correspondence between ambiguous data types through a lookup hash.
  • A callback system through which the cdat object can maintain data integrity, akin to the Observer pattern.
  • A Flyweight pattern implementation for characters in a matrix.

-- ArlinStoltzfus - 28 Aug 2006

-- RutgerVos - 30 Aug 2006

Topic attachments

Data_structures_for_phylogenetic_trees.pdf (122.7 K, 12 Apr 2007 - 19:00, HilmarLapp): Rutger's thesis Chapter I on design patterns for phylogenetics