Talk:List of sequence alignment software

Missing read mapper

HISAT2 is missing from the list of short read mappers [1]. It is one of the best mappers for RNA-seq around. I do not have time to add it just now. Nicolas Le Novère (talk) 13:25, 25 October 2017 (UTC)

Conflict of interest edits

Thorwald (talk · contribs) asks in an edit summary why I removed the Geneious entry. The entry was made by Bm richard (talk · contribs), who is a Geneious developer. This type of edit is a conflict of interest and inappropriate on Wikipedia. The same editor's only other contributions were to place Geneious spam in other articles. I hope editors here will not tolerate that type of behavior and will remove that entry as well as any others that were added for self-interested reasons. Other list articles find that link spam is easily controlled by allowing only entries that also have articles. That ensures that the entry is sufficiently notable for inclusion, and any notability decisions happen in one place: the article itself. This article has a way to go in order to work with a system like that, but I urge the editors here to consider it to reduce abuse of Wikipedia by spammers. JonHarder 19:31, 27 October 2006 (UTC)

Making the set of list entries and the set of articles coextensive has a problem here - a lot of these programs are useful enough to list but there just isn't enough to say about them to merit independent articles. Maybe they should have more complete descriptions here in lieu of their own pages, except for the very notable ones. I agree with the deletion of the article on Geneious but don't have a particular problem with its inclusion in the list. Opabinia regalis 02:49, 28 October 2006 (UTC)
I will admit that I know nothing about Geneious. However, a quick look at their site reveals that it appears to do "sequence alignment". As this article is a list of sequence alignment software, I believe it should be exhaustive and not just "notable"; this is an encyclopaedia after all. I am also not sure that the author of a programme submitting a link constitutes spam or a conflict of interest. If they were to write a glowing article praising its virtues . . . I might think otherwise.--Thorwald 20:00, 28 October 2006 (UTC)

Which software is useful and better?

First, I think sequence alignment and structure alignment are two very different problems. Therefore, they should be described in two separate articles. Second, as a user of structure alignment software, I had a lot of trouble trying to determine what actually works and is reliable. There are some benchmarking papers, but they do not say: "use this program for that purpose", so I had to spend a lot of time running different tests myself. Finally, I realized that the SSM server (from EBI) is convenient and robust enough to compare remotely related proteins, although it cannot be applied to small proteins and peptides that almost lack secondary structure. Of course, I did not check all the methods indicated in this article. So, would it be appropriate if someone anonymous ran a couple of tests for different servers and made a table of their performance? Or maybe such a table has been published already? Such a benchmarking table would be very helpful for potential users, like me. I could provide some examples for testing. Biophys 22:46, 31 October 2006 (UTC)

  • Your first point is not necessarily true. High sequence similarity generally means structural similarity. Indeed, it has been found over and over again that all the information necessary for the secondary and tertiary structure is found in the sequence. Two identical sequences should yield identical structures. Therefore, these are not different problems. They are married to each other. As such, I believe they should remain under one article.
    As for your second point: This is how computational biology works. Any metric used to determine the accuracy of a programme will generally introduce bias. In science, we don't have absolutes. Therefore, there is no way for anyone to tell you to "use this program for that purpose" . . . it just doesn't work that way.
    On your final point: Yes. SSM is a good server. However, just like every other algorithm/server listed in this article, it has its drawbacks. For example, I prefer to run everything from the CLI (in Linux) and try to avoid servers. Using SSM can be tedious without a well-documented API.--Thorwald 01:20, 1 November 2006 (UTC)
Yes, protein sequence and structure are certainly related. But I still believe that protein sequence and protein structure are two completely different things, and therefore it is more convenient for the readers of Wikipedia to have them separately. What do other people think about it? Biophys 18:10, 1 November 2006 (UTC)
I know for sure, based on the results of my testing, that some of the methods in this table (I tried only a few of them) work worse than SSM for remotely related proteins. I think there is nothing bad about it. For example, after the CASP experiments, we know that some protein structure modeling methods work better than others, which is fine. After attending CASP meetings, I generally had a much better idea of "which software should be used for that purpose". Of course, there are important benchmarking tests, although they are not absolutes. But I am not going to insist. That was only a suggestion. What is CLI? I could not find it in the table. Why is it better than SSM? What is the trouble with SSM? Please tell me. Maybe I am wrong to trust this server? Biophys 18:10, 1 November 2006 (UTC)
You wrote that some of these "work worse" than SSM; how are you defining "worse" here? The CASP/CAPRI benchmarks are a good set of 50-80 structures for testing ab initio predictions, but how do you define a "good prediction"? CLI just means "command line interface"; id est running programmes from the CLI instead of via a web server.--Thorwald 01:35, 2 November 2006 (UTC)
This is very simple. One program finds a superposition in which the evolutionarily conserved residues in two structures (say Asp and Asp) are structurally superimposed, but another program does not. This is very easy to see when you are working with specific proteins. Such conserved residues can usually be identified from multiple sequence alignments. Of course, I am well aware of such problems as the existence of multiple alternative structural superpositions, or that the number of superimposed residues depends on the selected distance cutoff, etc. One of the good papers on this subject was published in JMB by Chothia and others (that was about superpositions of 4-alpha-helical bundles). Actually, I liked the superpositions in the old version of FSSP/DALI and was looking for a convenient server that does "difficult" superpositions at least as well as FSSP/DALI. That was SSM. Maybe there are better programs, but I personally have no time to check. The same goes for other typical users. That is why it is important to have some kind of independent testing. What can I say? "I have tried programs A and B but did not try C and D, and I liked program B better". That is very subjective. Biophys 15:24, 2 November 2006 (UTC)

Other pages like this?

I see the two categories for this page, namely 'bioinformatics' and 'lists of software', but how do I find pages that are also in this combination of categories? (And are there any?). Can we add a link to related pages?

I.e. 'List of sequence annotation software'.

--Dan|(talk) 07:42, 27 June 2007 (UTC)

For example, I found Software tools for molecular microscopy, which I think should be somehow categorized with this page. --Dan|(talk) 14:10, 27 June 2007 (UTC)

Adding new software

I am interested in writing an article about a sequence alignment program developed by my university, VU Amsterdam, named PRALINE. I am, however, concerned about the conflict of interest rule. The original PRALINE paper can be found here. According to the book Essential Bioinformatics, ...PRALINE is perhaps the most sophisticated and accurate alignment program available. Because of the high complexity of the algorithm, its obvious drawback is the extremely slow computation. I have no plans to include comparisons against other popular SA programs, mostly because there is no downloadable version yet, and it is therefore difficult to benchmark it against them. Please let me know what you think. I hope I receive a response within a reasonable amount of time. PervyPirate (talk) 19:20, 27 February 2008 (UTC)

I found out that the program ApE (A Plasmid Editor) can do alignment as well. I like this program because it's simple to use but still offers a lot of functionality (I'm a Bachelor student from Austria). 85.127.93.199 (talk) 18:49, 7 August 2009 (UTC)

Should it not be the license tab?

I find it surprising that there is no licensing information in the software listing. Audriusa (talk) 10:10, 6 September 2009 (UTC)

Read mappers?

Any info? I'm looking for a Global:Local (global in query) 'read mapper' that takes base quality information into account. So far I haven't found anything... --Dan|(talk) 11:04, 9 November 2009 (UTC)

Indication of availability with Linux distributions?

I am biased, since I develop for Debian, but if there is a distribution-neutral way (Linux flavours plus Mac plus Windows?) to indicate whether a package listed here can be installed directly, then this may be of interest to the reader and make the tools easier to compare. Such availability is not unimportant when tools are ten years old and no longer compile with modern compilers and/or require libraries that are no longer available. The distros help by community-maintaining such software - not always, not enough, but this again is why I think pages like this and Linux distros should collaborate more. Smoe (talk) 18:24, 4 September 2011 (UTC)

SPAM

The page is full of SPAM. No entry should be listed unless it is verifiable and cites reliable sources to assert notability. -- Alexf(talk) 13:48, 2 March 2017 (UTC)