Wikipedia:Peer review/Sequence alignment/archive1

Sequence alignment

This article and its child articles (multiple sequence alignment, structural alignment, and sequence alignment software) have undergone significant revision in the past couple of weeks, and I'd like some external commentary. There are two known issues with the main article: I'm working on getting a dot plot image, and the external links section will be removed once I finish merging the links into sequence alignment software. These are tangential to the main issue I'd like comments on, which is the level of technical detail and how it is distributed between the parent and child articles. I'd like to see this become an FA eventually, but for now I just want to be sure that people who understand basic biology but not much computer science (or vice versa) can understand it.

This is a subject that's often mis- or poorly understood by students and even by casual users of the tools, so I think it's important that the text be comprehensible to non-experts. Is there too much technical detail in the main article? Too little? Are the descriptions of selected methods useful, or too vague? Opabinia regalis 06:08, 3 July 2006 (UTC)

Please see automated peer review suggestions here. Thanks, Andy _t 15:01, 3 July 2006 (UTC)

A few points:

1.) Indels are actual (albeit hypothesized) evolutionary events, which may be important in their own right. I'm not familiar with the term "padding", and I find that it implies indels are not events as such. If anyone has seen it used in the bioinformatics literature, then keep it; I just find it misleading. There's a huge literature on the importance and treatment of gaps in phylogenetics, and it would be useful to incorporate that.

You're right, that's poor wording. "Padding" isn't a bioinformatics term, it's just an attempt to say "gap" without using the word "gap". Rewritten. Opabinia regalis 04:38, 4 July 2006 (UTC)

2.) Should molecular clocks and divergence times be brought into this somewhere? Sequence similarity implies a short time since divergence, a slow-evolving gene environment, and/or conservation due to functional significance.

Addressed in the new phylogenetics section. I think treatment of indels should probably go in phylogenetics or one of its relatives for space reasons. Opabinia regalis 04:38, 4 July 2006 (UTC)

3.) The programs align amino acids quite well, but I'm not sure many people will trust them with nucleotides without some hand editing. The intro to this article implies that computers do it better. The reality, at least in multiple alignment, is that the best alignments come from a combination of human and machine, going back and forth between the software and hand editing.

You got me, I work almost entirely with proteins, so I'm less aware of the usage with nucleotide sequences. I would think, though, that hand alignments of nucleotides would be harder due to the lesser amount of biochemical information in the sequence? Opabinia regalis 04:38, 4 July 2006 (UTC)

Exactly. That's why the computer programs perform so abysmally. The result is that there's often a mix of programs and humans with the goal of capitalizing on the strengths of each. Most methods sections in phylogenetics papers will have a sentence that says something like: "Sequences were aligned using Clustal and edited by eye." --Aranae 04:21, 6 July 2006 (UTC)

4.) I don't understand the statement "Alignments are also used to aid in establishing evolutionary relationships by constructing phylogenetic trees (an application of the "molecular clock" hypothesis)." This seems to imply that phylogenetic techniques are theoretically based on the notions of molecular clock. That is simply false except for a handful of phenetic techniques and certain maximum likelihood models. Is something else meant by this sentence?

Bad phrasing. I was thinking of molecular clock in the sense of "way to measure divergence times", not necessarily the strict constant-rate interpretation. Also rewritten. Opabinia regalis 04:38, 4 July 2006 (UTC)

5.) There should be mention of phylogenetic alignment programs like POY that will align sequence, build a tree, realign based on that tree, build a new tree, etc. There's also quite a bit of controversy surrounding this approach.

Added some info on the Sankoff-Morel-Cedergren algorithm and what I could find on the POY method, but since I haven't worked with it I think that section is a little thin. Opabinia regalis 04:38, 4 July 2006 (UTC)

6.) In the end, a large part of many alignments is simply excluded from analyses (again speaking of phylogenetics), because researchers are uncertain of character homology.

Do you know, or can you point me to, a description of the criteria for exclusion?

It's usually very arbitrary. Many papers will simply state something along the lines of: "Unalignable regions were excluded from the analysis." With some exceptions, phylogeneticists are more comfortable throwing out suspect data than violating major assumptions inherent to their analytical techniques, the assumption of homology in particular. --Aranae 04:21, 6 July 2006 (UTC)

--Aranae 17:32, 3 July 2006 (UTC)

Thanks for the comments! Opabinia regalis 04:38, 4 July 2006 (UTC)

I made a few minor changes related to your most recent comments. I also want to put this up for discussion since it came up on the talk page: the list of structural alignment software is currently housed in the daughter article sequence alignment software, but there's been a question of whether it would be more appropriate to keep it locally in the structural alignment article, since structural alignment programs produce information beyond just a sequence alignment. Any thoughts? Opabinia regalis 01:50, 8 July 2006 (UTC)