Summary: | Determining the functions of uncharacterized proteins from their sequences remains a challenge despite the growing number of prediction methods. This is due to the inherent limitations of current tools and databases and to ambiguity in how protein function is defined. In addition, standard methods of functional assignment, which rely on aligning a query sequence to sequences of known function, often fail to find significant matches. This paper proposes a machine learning framework for predicting protein function irrespective of sequence similarity. The framework provides a workflow for protein function prediction that combines data mining and machine learning algorithms. It comprises three main components: pre-processing, model development, and testing and evaluation. The study is expected to yield a new feature selection method for predicting protein functional classes and to complement existing conventional methods of functional assignment. © 2008 IEEE.
|
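The following is only an illustration of how the three components (pre-processing, model development, and testing and evaluation) could fit together without any sequence alignment. It is a minimal sketch under stated assumptions, not the method of this paper. It assumes amino acid composition as the alignment-free feature set and a nearest-centroid classifier as the model, and the class labels and toy sequences are hypothetical.

```python
import math
from collections import Counter

# Assumption: the 20 standard amino acids form the feature alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Pre-processing (assumed): map a sequence to its 20-dim amino acid
    composition vector. Non-standard residues are ignored."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[a] / total for a in AMINO_ACIDS]


def train_centroids(sequences, labels):
    """Model development (assumed): nearest-centroid classifier that stores
    the mean composition vector of each functional class."""
    sums, n = {}, Counter()
    for seq, label in zip(sequences, labels):
        acc = sums.setdefault(label, [0.0] * len(AMINO_ACIDS))
        for i, v in enumerate(composition(seq)):
            acc[i] += v
        n[label] += 1
    return {label: [v / n[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, seq):
    """Assign the class whose centroid is closest in Euclidean distance.
    No alignment to annotated sequences is performed."""
    vec = composition(seq)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def accuracy(centroids, sequences, labels):
    """Testing and evaluation: fraction of correctly predicted classes."""
    hits = sum(predict(centroids, s) == y for s, y in zip(sequences, labels))
    return hits / len(labels)


# Hypothetical toy data: two made-up functional classes.
train_seqs = ["KKRKRKAK", "RKKRRKGK", "LLILVLAL", "ILLVLIVL"]
train_labels = ["class_A", "class_A", "class_B", "class_B"]
centroids = train_centroids(train_seqs, train_labels)
print(predict(centroids, "KRKRKKRA"))
```

In a real study the composition vector would be only one of many candidate descriptors, and the feature selection step the abstract mentions would decide which descriptors to keep before training.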