Omics Data Analysis
I specialize in applying advanced statistical methods and developing software tools to process and analyze large-scale sequencing datasets. This work uncovers patterns and relationships across diverse omics datasets and supports hypothesis generation and data-driven decisions.
In my projects, I have integrated multi-omics data from a variety of sources, including:
- Gene expression data (RNA-seq, scRNA-seq)
- DNA methylation profiles (bisulfite-seq, RRBS, methylation arrays)
- Open chromatin regions (ATAC-seq)
- Transcription factor binding sites (ChIP-seq)
- DNA-RNA hybrids detected with specialized protocols (DRIP-seq, RDIP-seq)
- Therapy, drug, and biomarker information from internal and external clinical trial databases
By combining these data types, I have built integrative models that help explain complex biological systems.
Statistical Analysis
I have applied a wide range of statistical techniques to extract insights from complex datasets, including:
- Survival Analysis: Using the Kaplan-Meier estimator and the Cox proportional hazards model to assess time-to-event data and evaluate prognostic factors.
- Regression Analysis: Applying linear regression and other predictive modeling techniques to quantify relationships between variables.
- Classification Methods: Using logistic regression, elastic net-regularized models, random forests, support vector machines (SVMs), and positive-unlabeled (PU) learning for prediction and classification.
- Unsupervised Methods: Applying dimensionality reduction and latent factor methods (PCA, MOFA, autoencoders) together with clustering to uncover hidden structure in data.
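
As an illustration of the survival analysis item above, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is not code from any of the projects described here: the function name `kaplan_meier`, its arguments, and the toy follow-up data are all assumptions made for the example, and in practice an established survival analysis library would normally be used instead.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each subject (hypothetical input format)
    events:    1 if the event was observed at that time, 0 if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    # Number of observed events at each time point.
    observed = Counter(t for t, e in zip(durations, events) if e)
    # Number of subjects leaving the risk set at each time point
    # (either through an event or through censoring).
    leaving = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction of the
            # at-risk subjects that survive past time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at t are no longer at risk.
        at_risk -= leaving[t]
    return curve


# Toy data (made up for illustration): five subjects, two of them censored.
toy_durations = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_durations, toy_events)
```

Censored subjects count toward the risk set up to their last follow-up time but never trigger a drop in the curve, which is what distinguishes this estimator from a naive fraction of survivors.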