Skills

  • Collaboration & Communication:
    • Working independently and in cross-functional teams (clinicians, wet-lab scientists, data scientists, software engineers, UX/UI designers)
    • Gathering, evaluating, and synthesizing information into articles
    • Project management and mentoring
    • Strong communication and presentation skills
    • Problem solving and adaptability
    • Curious and eager to learn new things
  • Programming & Tools:
    • Python - NumPy, pandas, scikit-learn, seaborn, PyTorch, Matplotlib, Plotly, Biopython
    • R - CRAN, Bioconductor
    • Unit testing - R (testthat), Python (unittest, pytest)
    • Bash
  • Machine Learning:
    • Statistical tests: incl. t-tests, Wilcoxon, ANOVA
    • Regression: incl. linear regression, logistic regression, survival models (Cox proportional hazards, Kaplan-Meier estimation), ridge, LASSO, and elastic net regularization
    • Classification: incl. ensemble (random forests, XGBoost), kernel (SVM), linear/probabilistic (LDA, logistic regression), and distance-based (k-nearest neighbors) models
    • Clustering: k-means, EM algorithm
    • Probabilistic models: Hidden Markov models (HMMs), linear Gaussian state-space models
    • Dimensionality reduction & factorization: PCA, t-SNE, MOFA, NMF
    • Sampling & optimization: MCMC, replica exchange Monte Carlo, simulated annealing
    • Deep learning: variational autoencoders (VAEs), CNNs, transfer learning, transformers, retrieval-augmented generation (RAG), NLP, LLM toolchains (LangChain, Mistral, Llama, Groq, Ollama, Chainlit, Qdrant, FastEmbed, SPARQL)
    • Federated learning
  • MLOps:
    • Workflow languages - Nextflow, Snakemake
    • Docker, Singularity
    • SLURM, Grid Engine, Kubernetes
    • AWS (S3, EC2, AWS Batch, SageMaker), DigitalOcean
  • Databases: MySQL, SQLite, PostgreSQL
  • Version Control & Software Management: Linux/Unix systems, Git, SVN, conda, GNU Guix
  • Web Frameworks: Django, CSS, JavaScript, HTML, jQuery, PHP