Skills
Collaboration & Communication :
Working independently and in cross-functional teams (clinicians, wet-lab scientists, data scientists, software engineers, UX/UI designers)
Gathering, evaluating, and synthesizing information into articles
Project management and mentoring
Strong communication and presentation skills
Strong problem-solving skills and adaptability
Curious and eager to learn new things
Programming & Tools :
Python - NumPy, pandas, scikit-learn, seaborn, PyTorch, Matplotlib, Plotly, Biopython
R - CRAN, Bioconductor
Unit testing - R (testthat), Python (unittest, pytest)
Bash
Machine Learning :
Statistical tests: incl. t-tests, Wilcoxon, ANOVA
Regression: incl. linear regression, logistic regression, survival analysis (Cox proportional hazards, Kaplan-Meier estimation), ridge, LASSO, and elastic net regularization
Classification: incl. ensemble (random forests, XGBoost), kernel (SVM), linear/probabilistic (LDA, logistic regression), and distance-based (k-nearest neighbors) models
Clustering: K-means, EM algorithm
Probabilistic models: Hidden Markov models (HMMs), linear Gaussian state-space models
Dimensionality reduction & factorization: PCA, t-SNE, MOFA, NMF
Sampling & optimization: MCMC, replica exchange Monte Carlo, simulated annealing
Deep learning: variational autoencoders (VAEs), CNNs, transfer learning, transformers, retrieval-augmented generation (RAG), NLP, LLM toolchains (LangChain, Mistral, Llama, Groq, Ollama, Chainlit, Qdrant, FastEmbed, SPARQL)
Federated learning
MLOps :
Workflow languages - Nextflow, Snakemake
Docker, Singularity
SLURM, Grid Engine, Kubernetes
AWS (S3, EC2, AWS Batch, SageMaker), DigitalOcean
Databases : MySQL, SQLite, PostgreSQL
Version Control & Software Management : Linux/Unix systems, Git, SVN, conda, GNU Guix
Web Technologies : Django, CSS, JavaScript, HTML, jQuery, PHP