Dr. Alex Bateman, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire

The protein sequence databases contain over 150 million proteins and are growing rapidly. It seems likely that we will have over 1 billion protein sequences to contend with in the near future. These protein sequences are generated by computational gene prediction, which sometimes makes mistakes. This means that an unknown fraction of the sequences in the databases are probably never translated. In this presentation I will describe work to estimate the number of spurious proteins that exist in the databases. These studies are based on two tools that we have created: AntiFam, a database of spurious protein families that we have identified, and Spurio, a new tool that predicts the likelihood that a protein sequence is not a true protein sequence.

29 Oct 2019. When is a protein not a protein? The ...
