FAQ

Q: What is the standard deviation good for?

A: The standard deviation tells you whether the sub-classifiers agree with one another. To be precise, it tells you how strongly the prediction depends on the random choice of false proteins in the training set. If the standard deviation is high and the mean score is borderline, treat the result with caution. However, keep in mind that the classifiers are not entirely independent of each other. Even if the classifiers agree with one another and give a low standard deviation, you could still get a false positive.
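
As a rough illustration, the sketch below computes a mean and standard deviation for two sets of sub-classifier scores. The score values are made up for the example, and the use of the population standard deviation (pstdev) is an assumption; ClanTox may compute it differently.

```python
from statistics import mean, pstdev

def summarize(sub_scores):
    """Return (mean, standard deviation) of a list of sub-classifier scores."""
    return mean(sub_scores), pstdev(sub_scores)

# Hypothetical scores from 10 sub-classifiers (made-up values, not ClanTox output).
agreeing = [0.20, 0.22, 0.18, 0.21, 0.19, 0.20, 0.23, 0.17, 0.20, 0.20]
disagreeing = [0.9, -0.5, 0.8, -0.4, 0.7, -0.3, 0.6, -0.2, 0.5, -0.1]

for name, scores in (("agreeing", agreeing), ("disagreeing", disagreeing)):
    m, s = summarize(scores)
    print(f"{name}: mean={m:.2f} std={s:.2f}")
```

Both lists have the same mean, but the second has a much larger standard deviation: its verdict depends heavily on which sub-classifier you ask.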

Q: I got a positive score for my sequence. How can I tell whether the protein is a toxin or toxin-like?

A: The short answer is that you can't tell the difference between toxin and toxin-like proteins using ClanTox. In fact, it is likely that no computational method can do it. Toxin and toxin-like proteins can be extremely similar (consider toxins that are expressed in non-venom tissues as an extreme example). The main difference between them is the tissues in which they are expressed, and this information is normally not encoded in the protein sequence. The best way to tell the difference is to obtain expression localization data.

Q: A sequence that I entered was classified as toxin or toxin-like. What does this mean?

A: There are three possibilities: your protein is a toxin, a toxin-like protein, or a false positive. If your protein is a partial sequence (fragment), the risk of a false positive is higher. Also keep in mind that some groups of proteins consistently score high (especially cysteine-rich groups) without necessarily being toxin-like. The best way to learn more about your protein is to check it with other computational tools such as InterProScan. You may also learn about the protein by looking for homologs with sequence similarity searches (keep in mind that homologs often appear statistically insignificant in these methods, partly because of their length). If the other methods find nothing, and assuming you did not get a false positive, then your protein is most likely a toxin, a toxin homolog, or an antibacterial protein. All that is left is to test that hypothesis in the lab.

Q: To test your classifier, I entered a long sequence of cysteines and got a positive score. Is this a bug?

A: No, this is not a bug. ClanTox is trained to work on real protein sequences, not on arbitrary amino acid sequences.

Q: What kind of classifiers do you use?

A: Each of the sub-classifiers is a boosted stumps classifier. A decision stump is a one-dimensional linear separator that returns -1 or 1. We apply a parameter-tuned version of the AdaBoost algorithm with the decision stump as the weak learner, resulting in a linear combination of decision stumps that gives the final prediction. The prediction of each sub-classifier is normalized and sent to the meta-classifier, which returns the mean and standard deviation of the 10 sub-classifier predictions.
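
For readers who want to see the idea in code, here is a minimal sketch of boosted stumps feeding a mean/standard-deviation meta-classifier. It is not the ClanTox implementation. It uses plain AdaBoost rather than the parameter-tuned version, and it makes several assumptions: the toy two-feature data, the exhaustive stump search, normalizing each score by the sum of the stump weights, sampling 8 negatives per sub-classifier, and every function name shown.

```python
import math
import random
from statistics import mean, pstdev

def stump_predict(stump, x):
    """A decision stump: threshold a single feature and return -1 or 1."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost(X, y, rounds=10):
    """Plain AdaBoost; returns a list of (alpha, stump) pairs."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def normalized_score(ensemble, x):
    """Weighted vote divided by the total weight, so it lies in [-1, 1]."""
    total_alpha = sum(a for a, _ in ensemble) or 1.0
    return sum(a * stump_predict(s, x) for a, s in ensemble) / total_alpha

def meta_classify(sub_classifiers, x):
    """Mean and standard deviation of the sub-classifier scores."""
    scores = [normalized_score(e, x) for e in sub_classifiers]
    return mean(scores), pstdev(scores)

# Toy data (made up): positives have a high first feature, negatives a low one.
positives = [(0.6 + 0.04 * i, (i % 3) / 2) for i in range(10)]
negative_pool = [(0.02 * i, (i % 3) / 2) for i in range(20)]

# Each sub-classifier sees all positives plus its own random draw of negatives.
rng = random.Random(0)
sub_classifiers = []
for _ in range(10):
    negatives = rng.sample(negative_pool, 8)
    X = positives + negatives
    y = [1] * len(positives) + [-1] * len(negatives)
    sub_classifiers.append(adaboost(X, y))

for query in [(0.9, 0.5), (0.05, 0.5)]:
    m, s = meta_classify(sub_classifiers, query)
    print(query, round(m, 3), round(s, 3))
```

Because each sub-classifier is trained on a different random draw of negatives, a query near the decision boundary can receive different votes from different sub-classifiers, which is what the standard deviation reports.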

Q: I entered a toxin sequence and it was classified as "probably not toxin-like" (class 0). What is wrong?

A: Most toxins are classified as class 1 or higher. However, some toxins may be classified as class 0. These are mostly preproteins that are much longer than the final mature peptide. We advise using mature protein sequences whenever possible (signal peptides can be detected computationally using SignalP).
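
Once you know where the signal peptide ends, trimming it is a one-line operation. The sketch below assumes you already have the signal peptide length (for example, read by hand from a SignalP report). The function name and the toy sequence are hypothetical, and the sketch removes only the signal peptide, not any propeptide region.

```python
def mature_sequence(preprotein, signal_peptide_length):
    """Drop the first `signal_peptide_length` residues from a sequence."""
    if not 0 <= signal_peptide_length < len(preprotein):
        raise ValueError("cleavage position must fall inside the sequence")
    return preprotein[signal_peptide_length:]

# Made-up toy sequence: a 10-residue "signal peptide" followed by 7 residues.
toy_preprotein = "MKTLLVLAVC" + "GSCKRPC"
print(mature_sequence(toy_preprotein, 10))
```

The trimmed sequence is what you would then submit to ClanTox.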

Q: I got a positive score for a 2000 amino-acid sequence. Is this a toxin?

A: Probably not. ClanTox results should be read and used critically; they do not replace common sense. After all, ClanTox is only a prediction method. If the sequence is long, it is probably not a toxin or toxin-like protein. Check whether it has a signal peptide (try SignalP), and check with other function prediction methods (InterProScan, PHYRE).

Q: "Probably toxin-like"... Your predictions are too vague. Why shouldn't I just use a regular prediction server such as InterProScan?

A: First of all, InterProScan is a great tool and you should by all means use it. The problem is that most supervised methods (such as those in InterProScan) try to characterize specific known families. If your protein belongs to one of those families, great. However, what happens if your protein does not belong to one of those predefined families? This is where ClanTox can help. ClanTox does not learn to characterize features of specific toxin families, but rather a global toxin characteristic. Consider the fact that ClanTox is trained only on ion channel inhibitors, but is able to detect several different toxin strategies such as phospholipases, disintegrins, protease inhibitors, and more. Therefore, even if your protein belongs to some unknown class of toxin or toxin-like proteins, there is a good chance that ClanTox will detect it as such.

Q: What kinds of proteins are toxin-like?

A: Most toxin-like proteins are secreted protease inhibitors, phospholipases, or antibacterial proteins. Since these are generally detectable by other methods, you can cross-check whether your protein is one of them. Note that if your protein is predicted positive but is not detected as belonging to a known group, it might be an antibacterial protein. This is because ClanTox seems able to detect a variety of antibacterial proteins that are structurally similar to toxins, such as beta defensins, thionins, and cyclotides (there is a simple evolutionary logic to this).

Q: My input received a label of P3, P2, P1, or N. What does this mean quantitatively?

A: If you apply the classifier to a non-redundant set of ~30000 proteins (all SwissProt proteins of length <= 150 amino acids), 7% will be classified as class P1, P2, or P3; 3.2% will be classified as class P2 or P3; and 1.8% will be classified as class P3. For a set of proteins of arbitrary length, the percentages are much smaller.
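
The figures above can be broken down per class with simple subtraction. The sketch below assumes they are cumulative (each includes the classes above it) and that every protein outside P1, P2, and P3 is labeled N. The set size of 30000 is the approximate value quoted above, so the counts are rough estimates.

```python
# Cumulative percentages quoted above: "P1" means P1 or higher, and so on.
cumulative = {"P1": 7.0, "P2": 3.2, "P3": 1.8}
set_size = 30000  # approximate size of the SwissProt subset

per_class = {
    "P3": cumulative["P3"],
    "P2": round(cumulative["P2"] - cumulative["P3"], 1),
    "P1": round(cumulative["P1"] - cumulative["P2"], 1),
}
# Assumption: everything not in P1-P3 is labeled N.
per_class["N"] = round(100.0 - cumulative["P1"], 1)

for label in ("P3", "P2", "P1", "N"):
    pct = per_class[label]
    print(f"{label}: {pct}% (about {round(set_size * pct / 100)} proteins)")
```

Under these assumptions, P3 alone is 1.8%, P2 alone is 1.4%, P1 alone is 3.8%, and the remaining 93% are N.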

Q: When I click the back button in my browser, why do the ClanTox results disappear?

A: The website is implemented with Ajax, which does not load new pages but instead runs inside the current window. To keep the results page, navigate using the green tabs only, or simply click your browser's "reload" button. Note that this issue will be fixed in future releases.