Protein Lysine Methylation Classification Using Decision Forest Models

Decision Forest
Machine Learning
Protein Lysine Methylation

This project was completed for one of my graduate courses in the winter of 2021, in collaboration with Joshua Tanner and Nhat Hieu Le.

Abstract

Lysine methylation plays a crucial role in gene regulation and is implicated in many human diseases, such as cancer, heart disease, and neurodegenerative disorders. Understanding the mechanism of methylation is therefore essential for drug design and disease treatment research, and an initial but important step is detecting methylation sites. In this project report, we apply Random Forest, an ensemble learning classifier, which is trained and validated on a provided dataset and tested on a blind test set. We also comprehensively investigate each stage of the iterative model development cycle: data preparation, model tuning, ensemble learning, and testing. Our method placed second, very close to first place, on the metric used in the course project.
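To make the train, validate, and blind-test workflow described in the abstract concrete, here is a minimal sketch using scikit-learn. It is not the code from the report: the data is synthetic (generated with `make_classification` as a stand-in for encoded lysine sites), the hyperparameter grid is arbitrary, and F1 is used only as an assumed placeholder, since the abstract does not name the course metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic features stand in for the real encoded lysine sites.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)

# Hold out a "blind" test set that is never touched during tuning.
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tune a small, illustrative grid with 3-fold cross-validation on the
# training portion. The grid values and the F1 metric are assumptions.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 8]},
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refit best model once on the blind set.
blind_pred = search.predict(X_blind)
blind_f1 = f1_score(y_blind, blind_pred)
print("best params:", search.best_params_)
print("blind-set F1:", round(blind_f1, 3))
```

Keeping the blind set out of the cross-validation loop is the point of the sketch: the reported score then reflects data the tuning process never saw.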

Final Report

The rest of the report (including figures) can be found here.