Omics Data Analysis

I specialize in applying statistical methods and developing software tools to process and analyze large-scale sequencing datasets. This work helps uncover patterns and relationships across diverse omics datasets and supports hypothesis generation grounded in the data.

In my projects, I integrated multi-omics data from a variety of sources, including:

  • Gene expression data (RNA-seq, scRNA-seq)
  • DNA methylation profiles (bisulfite-seq, RRBS, methylation arrays)
  • Open chromatin regions (ATAC-seq)
  • Transcription factor binding sites (ChIP-seq)
  • DNA-RNA hybrid data from specialized protocols such as DRIP-seq and RDIP-seq
  • Information on therapies, drugs, and biomarkers from internal and external clinical trial databases

By combining these diverse data types, I created comprehensive models that contribute to understanding complex biological systems.
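As a simple illustration of what combining data types can involve, the sketch below aligns several omics layers on the samples they share and concatenates their features into one matrix. It is a minimal example under stated assumptions, not a description of the models used in these projects: the function name `align_layers`, the layer names, and all values are hypothetical.

```python
def align_layers(layers):
    """Concatenate per-sample features from several omics layers.

    layers: dict mapping layer name -> {sample_id: [feature values]}
    Returns (sample_ids, rows), restricted to samples present in every layer.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    # Keep only samples measured in all layers
    shared = set.intersection(*(set(samples) for samples in layers.values()))
    sample_ids = sorted(shared)
    rows = []
    for sid in sample_ids:
        row = []
        for name in sorted(layers):  # fixed layer order keeps columns consistent
            row.extend(layers[name][sid])
        rows.append(row)
    return sample_ids, rows


# Toy, made-up values; layer and sample names are hypothetical.
toy_layers = {
    "expression": {"s1": [2.1, 0.3], "s2": [1.7, 0.9], "s3": [0.4, 1.2]},
    "methylation": {"s1": [0.80], "s2": [0.15]},
}
ids, matrix = align_layers(toy_layers)
```

Here sample `s3` is dropped because it has no methylation measurement. Real analyses usually also need per-layer normalization and a strategy for missing data, which this sketch leaves out.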

Statistical Analysis

I employed a wide range of statistical techniques to extract meaningful insights from complex datasets, including:

  • Survival Analysis: Using methods such as the Kaplan-Meier estimator and the Cox proportional hazards model to assess time-to-event data and evaluate prognostic factors.
  • Regression Analysis: Applying linear regression and other predictive modeling techniques to uncover relationships between variables.
  • Classification Methods: Using algorithms such as logistic regression, elastic net, random forests, support vector machines (SVMs), and positive-unlabeled (PU) learning for prediction and classification tasks.
  • Unsupervised Methods: Applying dimensionality reduction (PCA, MOFA, autoencoders) and clustering techniques to uncover hidden structure in data.
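
To make one of the techniques above concrete, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is for illustration only: the function name `kaplan_meier` and the toy data are assumptions, and in practice an established survival analysis library would normally be used instead.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per subject
    events: 1 if the event was observed, 0 if the subject was censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy, made-up data for illustration only
toy_durations = [5, 6, 6, 2, 4, 4]
toy_events = [1, 0, 1, 1, 1, 0]
km_curve = kaplan_meier(toy_durations, toy_events)
```

Subjects censored at a given time are still counted as at risk at that time, which is the usual convention. The curve only steps down at times where an event was actually observed.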