
Machine Learning concepts and tools in Biomedical Research, Cheminformatics and Genomics
Machine Learning in Bioscience Research using Programming in R
About the Program:
With the rapid growth of biological data from genomics, proteomics, and clinical studies, traditional analysis methods are often insufficient to uncover complex patterns. Machine learning provides tools for classification, prediction, clustering, and biomarker discovery. R, a leading language for statistical computing, offers a rich ecosystem for implementing ML workflows in the biosciences, including packages such as caret, randomForest, and e1071, as well as the Bioconductor project's collection of bioinformatics packages.
This workshop provides a hands-on, dry-lab approach to building ML models using R. Participants will learn data preprocessing, feature selection, model training, validation, and visualization. Real-world biological datasets will be used to demonstrate applications such as gene expression analysis, disease classification, and predictive modeling, preparing participants for research and industry applications.
Program Objectives:
- Understand ML fundamentals and their application in bioscience research.
- Learn data preprocessing and feature engineering using R.
- Build classification and regression models using R packages.
- Evaluate model performance using statistical and ML metrics.
- Apply ML workflows to genomics and biological datasets.
What will you learn?
Day 1: Genomics Data Processing & ML Foundations
- ML in bioscience
- Retrieval and preprocessing of genomic datasets from NCBI GEO / ENA (FASTQ/FASTA formats)
- Feature extraction from sequences (k-mers, GC content, embeddings)
- Encoding biological sequences using one-hot, k-mer vectors, and transformer embeddings
- Dimensionality reduction using PCA / t-SNE / UMAP on genomic features
- Variant analysis and SNP classification using ML pipelines
- Building a basic genomic prediction model (disease vs normal classification)
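To make the Day 1 feature-extraction topics concrete, here is a minimal sketch of GC content, k-mer count vectors, and one-hot encoding for a DNA sequence. It is written in Python with the standard library only, purely for illustration; the workshop itself uses R, and the function names (`gc_content`, `kmer_vector`, `one_hot`) and the toy sequence are assumptions, not workshop material.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_vector(seq: str, k: int = 2) -> list[int]:
    """Counts of every possible DNA k-mer in lexicographic order.

    Uses overlapping windows; k-mers containing non-ACGT bases are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def one_hot(seq: str) -> list[list[int]]:
    """One row per base, columns ordered A, C, G, T.

    Unknown bases (for example N) become an all-zero row.
    """
    return [[int(base == b) for b in "ACGT"] for base in seq.upper()]


example = "ATGCGC"  # toy sequence, not real data
print(gc_content(example))
print(kmer_vector(example, k=2))
print(one_hot(example)[:2])
```

The k-mer vector has a fixed length of 4^k regardless of sequence length, which is what lets sequences of different lengths share one feature table; one-hot encoding instead preserves position and is the usual input for sequence models.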
Day 2: Machine Learning in Drug Discovery and Preparing Bioscience Data for Real ML Modeling
- Retrieval and curation of chemical datasets from PubChem / ChEMBL
- Molecular descriptor calculation using RDKit (physicochemical properties)
- QSAR modeling using regression/classification ML models
- Drug-target interaction prediction using matrix factorization / ML models
- Virtual screening using ML
- Model evaluation using ROC-AUC, precision-recall, and RMSE metrics
- Essential statistics for ML: mean, median, variance, SD, correlation
- Hands-on: Preprocess a biological dataset for modeling
- Outcome: Clean, prepare, and structure biological data so that it is ready for machine learning workflows.
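The summary statistics and evaluation metrics named in the Day 2 topics can be sketched in a few lines. The example below is in Python with the standard library only, for illustration; the workshop itself uses R. The helper names (`pearson`, `rmse`, `roc_auc`) and all numbers are assumptions made up for the sketch, and the ROC-AUC here uses the pairwise-ranking definition rather than any particular package's implementation.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(y_true, y_pred):
    """Root mean squared error for regression predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(statistics.fmean(squared_errors))


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in sample count, fine for a sketch.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values for illustration only, not real measurements.
expression = [2.0, 4.0, 6.0, 8.0]
print(statistics.mean(expression), statistics.median(expression))
# variance() and stdev() are the sample (n - 1) versions.
print(statistics.variance(expression), statistics.stdev(expression))
print(pearson([1, 2, 3], [2, 4, 6]))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

ROC-AUC applies to classification scores and RMSE to regression outputs, so a QSAR workflow would report one or the other depending on whether activity is modeled as a class or as a continuous value.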
Day 3: Apply Machine Learning to Real Bioscience Problems
- Applied ML in Bioscience
- k-means clustering, PCA for biological pattern discovery
- Hands-on: disease prediction / species classification / gene-expression style
- Outcome: Evaluate ML models in R for disease prediction, classification, and biological data interpretation.
- Integration of genomic & chemical data for target identification
- Multi-omics data fusion using ML-based feature integration techniques
- Building a drug response prediction model using genomic signatures
- Implementation of deep learning models (CNNs for sequence analysis)
- Explainability using SHAP / feature importance in biological models
- End-to-end pipeline: target identification → lead prediction → validation
- Deployment of ML models using Streamlit / API for real-time predictions
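As an illustration of the k-means clustering topic listed for Day 3, the sketch below implements the basic assign-then-update loop (Lloyd's algorithm) in Python with the standard library only; the workshop itself uses R. The function name `kmeans`, the choice to start centroids at the first k points, and the toy two-feature samples are all assumptions for this example. Practical implementations typically use randomized or k-means++ initialization with several restarts.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm for k-means.

    Assumption: centroids are initialized to the first k points,
    which keeps the sketch deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy samples with two features each (for example, two expression values).
samples = [
    (1.0, 1.0), (5.0, 5.0), (1.2, 0.8),
    (5.2, 4.9), (0.9, 1.1), (4.8, 5.1),
]
labels, centroids = kmeans(samples, k=2)
print(labels)
print(centroids)
```

Because k-means relies on Euclidean distance, features measured on different scales are usually standardized first, which is one reason the preprocessing steps come before clustering in the schedule.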
Get an e-Certificate of Participation!

Intended For:
- Undergraduate or postgraduate students and degree holders in Bioinformatics, Biotechnology, Computational Biology, Data Science, or related fields.
- Professionals working in biomedical research, genomics, pharma R&D, or healthcare analytics sectors.
- Researchers and students interested in statistical analysis and machine learning using R.
- Individuals with a keen interest in data-driven bioscience research.
