Skills
Collaboration & Communication :
Working independently and in cross-functional teams (clinicians, wet-lab scientists, data scientists, software engineers, UX/UI designers)
Gathering, evaluating, and synthesizing information into articles
Project management and mentoring
Strong communication and presentation skills
Strong problem-solving skills and adaptability
Curious and eager to learn new things!
Programming & Tools :
Python - NumPy, pandas, scikit-learn, seaborn, PyTorch, Matplotlib, Plotly
R - CRAN and Bioconductor packages (incl. caret, tidyverse)
Unit testing - R (testthat), Python (unittest, pytest)
Bash
Machine Learning :
Statistical tests: incl. t-tests, Wilcoxon, ANOVA
Regression: incl. linear regression, logistic regression, survival models (Cox proportional hazards, Kaplan-Meier estimation), ridge, LASSO, and elastic net regularization
Classification: incl. ensemble (random forests, XGBoost), kernel (SVM), linear/probabilistic (LDA, logistic regression), and distance-based (k-nearest neighbors) models
Clustering: K-means, EM algorithm
Probabilistic models: Hidden Markov models (HMMs), linear Gaussian state-space models
Dimensionality reduction & factorization: PCA, t-SNE, MOFA, NMF
Sampling & optimization: MCMC, replica exchange Monte Carlo, simulated annealing
Deep learning: CNNs, transfer learning, variational autoencoders (VAEs), transformer encoders, diffusion models (U-Net-based DDPM)
NLP: retrieval-augmented generation (RAG), LLM toolchains (LangChain, Mistral, Llama, Groq, Ollama, Chainlit, Qdrant, FastEmbed, SPARQL)
Federated learning
Infrastructure :
SLURM, Grid Engine
AWS (S3, EC2, Lambda, SageMaker, Elastic Beanstalk, Batch), DigitalOcean
Kubernetes
MLOps :
Workflow languages - Nextflow, Snakemake
Docker, Singularity
Databases : MySQL, SQLite, PostgreSQL
Version Control & Software Management :
Linux/Unix systems
Git
conda
CI/CD
virtualenv, pipenv, uv
Web Frameworks :
Django, FastAPI, Flask
JavaScript, HTML/CSS, jQuery
R Shiny
Data engineering :