Data mining framework for protein function prediction

Bibliographic Details
Published in: Proceedings - International Symposium on Information Technology 2008, ITSim
Main Author: Rahman S.A.; Hussein Z.A.M.; Bakar A.A.
Format: Conference paper
Language: English
Published: 2008
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-57349104125&doi=10.1109%2fITSIM.2008.4631683&partnerID=40&md5=e85c28539d521ebfd983e31610318d89
Description
Summary: Determining the functions of uncharacterized proteins from their sequences remains a challenge despite the growing number of prediction methods. This is due to the inherent limitations of current tools and databases and the ambiguity in how function is defined. Additionally, standard methods of functional assignment, which involve sequence alignment to a gene of known function, often fail to find significant matches. This paper proposes a machine learning framework for predicting protein function irrespective of sequence similarity. The framework aims to provide a workflow for predicting protein function that combines data mining and machine learning algorithms. Three main components are involved: pre-processing, model development, and testing and evaluation. The study is expected to yield a new feature selection method for predicting protein functional classes and to complement the existing conventional method of functional assignment. © 2008 IEEE.
DOI:10.1109/ITSIM.2008.4631683
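
The three components named in the summary (pre-processing, model development, testing and evaluation) can be illustrated with a minimal sketch. This record does not state which features or algorithms the paper uses, so everything below is an assumption made for illustration: amino acid composition stands in for the feature extraction step, a nearest-centroid classifier stands in for the model, and the sequences and class names are made-up toy data.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Pre-processing (assumed): 20-dimensional amino acid composition."""
    counts = Counter(sequence.upper())
    # Non-standard residues are ignored; `or 1` avoids division by zero.
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def train_centroids(sequences, labels):
    """Model development (assumed): mean feature vector per functional class."""
    grouped = {}
    for seq, label in zip(sequences, labels):
        grouped.setdefault(label, []).append(composition(seq))
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, sequence):
    """Assign the class whose centroid is closest in Euclidean distance."""
    features = composition(sequence)
    return min(centroids, key=lambda label: dist(centroids[label], features))


def accuracy(centroids, sequences, labels):
    """Testing and evaluation: fraction of held-out sequences classified correctly."""
    hits = sum(predict(centroids, s) == y for s, y in zip(sequences, labels))
    return hits / len(labels)


# Hypothetical toy data, not taken from the paper.
train_seqs = ["KKKRRKKR", "RRKKRKRK", "LLVVIILV", "VVLLIVLI"]
train_labels = ["basic", "basic", "hydrophobic", "hydrophobic"]
test_seqs = ["KRKRKKRR", "LIVLIVLL"]
test_labels = ["basic", "hydrophobic"]

model = train_centroids(train_seqs, train_labels)
score = accuracy(model, test_seqs, test_labels)
```

Because the features are computed from the sequence alone, no alignment against a database is involved, which is the sense in which such a workflow is independent of sequence similarity. A realistic study would replace the toy data with annotated sequences and add a feature selection step before training.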