Open source software

Genomation

I maintained and developed Genomation, a Bioconductor R package that provides a collection of functions for simplifying common tasks in genomic feature/interval analysis. It provides functions for reading BED and GFF files as GRanges objects and for summarizing genomic features over predefined windows, so that users can plot the average enrichment of features over defined regions or produce heatmaps. It can also annotate given regions with other genomic features such as exons, introns and promoters.
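Genomation itself is written in R. As a language-neutral illustration of the windowed summary described above, the Python sketch below assumes that features and windows are half-open integer intervals on a single chromosome and that all windows have the same width. The function names are hypothetical and are not Genomation's API. Each row of the matrix corresponds to one window (one row of a heatmap), and the column means give the average enrichment profile.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def coverage_in_window(features: List[Interval], window: Interval) -> List[int]:
    """Count, for each base of the window, how many features overlap it."""
    w_start, w_end = window
    cov = [0] * (w_end - w_start)
    for f_start, f_end in features:
        # Clip the feature to the window; the range is empty if they do not overlap.
        for pos in range(max(f_start, w_start), min(f_end, w_end)):
            cov[pos - w_start] += 1
    return cov


def score_matrix(features: List[Interval], windows: List[Interval]) -> List[List[int]]:
    """One row per window, one column per relative position (heatmap input)."""
    if len({end - start for start, end in windows}) != 1:
        raise ValueError("all windows must have the same width")
    return [coverage_in_window(features, w) for w in windows]


def meta_profile(matrix: List[List[int]]) -> List[float]:
    """Average over windows at each relative position (average enrichment profile)."""
    return [sum(column) / len(matrix) for column in zip(*matrix)]
```

For example, `score_matrix([(2, 5), (12, 14)], [(0, 6), (10, 16)])` gives two rows of per-base counts, and `meta_profile` of that matrix averages them position by position.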

People: Altuna Akalin, Vedran Franke and others

GitHub: https://github.com/BIMSBbioinfo/genomation

Publication: Akalin A, Franke V, Vlahovicek K, Mason C, Schubeler D (2014). “genomation: a toolkit to summarize, annotate and visualize genomic intervals.” Bioinformatics

MethylKit

I contributed to methylKit, a Bioconductor R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also from target-capture methods such as Agilent SureSelect methyl-seq. In addition, methylKit can handle base-pair resolution data for 5hmC obtained from TAB-seq or oxBS-seq. It can also handle whole-genome bisulfite sequencing data if the proper input format is provided.
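Bisulfite treatment converts unmethylated cytosines (read as T after sequencing) but leaves methylated cytosines as C, so methylation at each CpG is summarized from its C and T read counts. The minimal Python sketch below computes percent methylation, filters sites by coverage and takes a per-site difference between two samples. The class and function names and the default threshold of 10 reads are assumptions for illustration; methylKit's own statistical tests for differential methylation are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CpGCall:
    chrom: str
    pos: int
    num_c: int  # reads showing C: methylated, protected from conversion
    num_t: int  # reads showing T: unmethylated, converted by bisulfite

    @property
    def coverage(self) -> int:
        return self.num_c + self.num_t

    @property
    def percent_methylation(self) -> float:
        # Undefined for zero coverage; filter_by_coverage removes such sites.
        return 100.0 * self.num_c / self.coverage


def filter_by_coverage(calls: List[CpGCall], min_cov: int = 10) -> List[CpGCall]:
    """Keep only sites with at least min_cov reads."""
    return [c for c in calls if c.coverage >= min_cov]


def methylation_difference(a: List[CpGCall], b: List[CpGCall],
                           min_cov: int = 10) -> Dict[Tuple[str, int], float]:
    """Percent-methylation difference (b - a) at CpGs covered in both samples."""
    a_sites = {(c.chrom, c.pos): c for c in filter_by_coverage(a, min_cov)}
    diffs = {}
    for c in filter_by_coverage(b, min_cov):
        key = (c.chrom, c.pos)
        if key in a_sites:
            diffs[key] = c.percent_methylation - a_sites[key].percent_methylation
    return diffs
```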

People: Altuna Akalin, Alex Blume and others

GitHub: https://github.com/al2na/methylKit

Publication: Akalin A, Kormaksson M, Li S, Garrett-Bakelman FE, Figueroa ME, Melnick A, Mason CE (2012). “methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles.” Genome Biology, 13(10), R87.

PiGx - Reproducible pipelines in genomics using GNU Guix

PiGx is a collection of genomics pipelines implemented in Snakemake, Python and R. All pipelines are easily configured with a simple sample sheet and a descriptive settings file. Each pipeline produces comprehensive, interactive HTML reports summarizing the findings for your samples.

People: Altuna Akalin, Ricardo Wurmus and others

Website: http://bioinformatics.mdc-berlin.de/pigx/

Publication: Wurmus R, Uyar B, Osberg B, Franke V, Gosdschan A, Wreczycka K, Ronen J, Akalin A (2018). “PiGx: reproducible genomics analysis pipelines with GNU Guix.” GigaScience.

MotifActivity

The motifActivity R package predicts the key transcription factors (TFs) driving changes in gene expression or epigenetic marks across the input samples, together with the activity profiles of those TFs. As input it uses gene expression (e.g. RNA-seq) or epigenetic mark (e.g. BS-seq, ChIP-seq, ATAC-seq) measurements across samples, and a set of DNA motifs.
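The description above does not say how motifActivity fits its model. One widely used formulation for this kind of problem (motif activity response analysis, popularized by ISMARA) treats the signal in each region as a linear combination of that region's motif site counts, weighted by per-sample motif activities. The NumPy sketch below fits that idea with ridge regression; the function name, the per-region centering and the ridge penalty are assumptions for illustration, not the package's API or its exact method.

```python
import numpy as np


def motif_activities(site_counts: np.ndarray, signal: np.ndarray,
                     ridge: float = 1.0) -> np.ndarray:
    """Estimate a (motifs x samples) activity matrix A with signal ~ N @ A + region offset.

    site_counts: (regions x motifs) matrix N of motif site counts per region.
    signal:      (regions x samples) expression or epigenetic signal per region.
    ridge:       L2 penalty that stabilizes the fit when motifs are correlated.
    """
    # Subtract each region's mean over samples so that activities explain
    # differences between samples rather than baseline levels.
    centered = signal - signal.mean(axis=1, keepdims=True)
    n_motifs = site_counts.shape[1]
    # Ridge regression normal equations: (N^T N + ridge * I) A = N^T E.
    gram = site_counts.T @ site_counts + ridge * np.eye(n_motifs)
    return np.linalg.solve(gram, site_counts.T @ centered)
```

Rows of the returned matrix are motifs and columns are samples, so each row can be read as one TF's activity profile across the samples.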

People: Katarzyna Wreczycka under the supervision of Altuna Akalin

GitHub: https://github.com/katwre/motifActivity

Customized enhancements for existing tools for clients

IGV web application

Enhancements, implemented in JavaScript and Python, to the IGV web application (original source code), an interactive tool for the visual exploration of genomic data, included:

- visualization of publicly available and in-house data, and dynamically adding new in-house data to the IGV application
- adding and highlighting regions of interest, such as genetic variants, in the main window
- new visualization options for RefSeq and GENCODE gene annotations:
  - collapsing the transcripts of all genes
  - expanding the isoforms of a gene of interest
  - a button to control the width of window panes so that the contents fit
- displaying links to the databases from which the data originate
- a command-line application that automatically takes snapshots and saves sessions of regions/genes and tracks of interest

Example view from the IGV web app displaying genomic data tracks.
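The list above does not describe how the collapsed gene view was implemented. One simple way to draw a single collapsed model per gene is to merge the exon intervals of all its transcripts into non-overlapping blocks; the Python sketch below does that, assuming half-open exon coordinates on one strand. The function name and the input layout (transcript ID mapped to a list of exons) are hypothetical.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # half-open [start, end)


def collapse_transcripts(transcripts: Dict[str, List[Exon]]) -> List[Exon]:
    """Merge the exons of all transcripts of one gene into non-overlapping blocks."""
    exons = sorted(exon for exon_list in transcripts.values() for exon in exon_list)
    merged: List[Exon] = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous block: extend that block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Sorting first means each exon only has to be compared with the last merged block, so the merge is a single linear pass after the sort.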

Toy projects

Protein folding in the HP model

Implementation of the simulated annealing and replica exchange Monte Carlo algorithms for protein folding in the HP model, in Python (2.7.6) using the NumPy library (1.8.0).

The hydrophobic-polar (HP) protein folding model is used to study the general principles of protein folding. It is based on the observation that the hydrophobic effect, the tendency of hydrophobic amino acids to aggregate and ‘hide’ from water molecules, plays a key role in folding. Sequences are written over the alphabet {H, P}, where H denotes a hydrophobic and P a polar amino acid, and the residues are placed on a square lattice.
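In the standard HP model, each pair of H residues that occupy adjacent lattice sites without being neighbours in the chain contributes −1 to the energy, so lower energy means a more compact hydrophobic core. The minimal sketch below computes that energy on the 2D square lattice, assuming a conformation is given as one (x, y) coordinate per residue; the representation and function name are illustrative, not taken from the repository.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Energy of a 2D square-lattice HP conformation: -1 per H-H topological contact.

    A topological contact is a pair of H residues on adjacent lattice sites
    that are not consecutive in the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")
    index_at = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy
```

For example, folding "HPPH" into a unit square brings the two H residues into contact, giving an energy of −1, while the straight conformation has energy 0.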

The Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method that can sample from essentially any target probability distribution P; here it samples protein configurations according to the Boltzmann distribution, P(x) ∝ exp(−E(x)/(k_B T)), where E(x) is the energy of configuration x and T is the temperature. The algorithm generates a Markov chain in which each state x^{t+1} depends only on the previous state x^t. A proposal density Q(x'; x^t), which depends on the current state x^t, is used to generate a new proposed sample x'. This proposal is “accepted” as the next value (x^{t+1} = x') if a number α drawn from U(0,1) satisfies α < P(x') Q(x^t; x') / (P(x^t) Q(x'; x^t)); otherwise the chain keeps its current state (x^{t+1} = x^t).

The lattice protein hydrophobic-polar (HP) model, showing the global energy.
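With a symmetric proposal (Q(x'; x^t) = Q(x^t; x')) and the Boltzmann target, the acceptance ratio above reduces to exp(−ΔE/T) with ΔE = E(x') − E(x^t). The sketch below shows that rule, the standard replica exchange swap criterion, and a geometric cooling schedule for simulated annealing. It assumes k_B = 1; the function names and the choice of a geometric schedule are illustrative assumptions, not the repository's code.

```python
import math
import random
from typing import List


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion for a symmetric proposal and a Boltzmann target.

    delta_e = E(x') - E(x^t); temperature is in energy units (k_B = 1).
    Accepts with probability min(1, exp(-delta_e / temperature)).
    """
    if delta_e <= 0:
        return True  # downhill or neutral moves are always accepted
    return rng.random() < math.exp(-delta_e / temperature)


def replica_swap_accept(e_i: float, t_i: float, e_j: float, t_j: float,
                        rng: random.Random) -> bool:
    """Replica exchange: accept swapping the conformations held at t_i and t_j
    with probability min(1, exp((1/t_i - 1/t_j) * (e_i - e_j)))."""
    log_ratio = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    if log_ratio >= 0:
        return True
    return rng.random() < math.exp(log_ratio)


def geometric_schedule(t_start: float, t_end: float, n_steps: int) -> List[float]:
    """Simulated annealing temperatures from t_start to t_end, geometrically spaced."""
    if n_steps < 2:
        raise ValueError("need at least two temperatures")
    ratio = (t_end / t_start) ** (1.0 / (n_steps - 1))
    return [t_start * ratio ** k for k in range(n_steps)]
```

The swap rule always accepts when the colder replica holds the higher-energy conformation, which moves low-energy conformations toward low temperatures while hot replicas keep exploring.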

GitHub: https://github.com/katwre/bioinformatics-projects/tree/master/Molecular_Dynamics

Genome assembly - de Bruijn graph implementation with an Eulerian walk finder

Modern short-read assembly algorithms construct a de Bruijn graph by representing all k-mer prefixes and suffixes as nodes and then drawing edges that represent k-mers having a particular prefix and suffix [1]. An Eulerian walk, which traverses every edge of this graph exactly once, reconstructs the DNA sequence from its fragments (k-mers) [2].
[1] Phillip E. C. Compeau, Pavel A. Pevzner and Glenn Tesler (2011). How to apply de Bruijn graphs to genome assembly. Nature Biotechnology, 29, 987–991.
[2] Pavel A. Pevzner, Haixu Tang and Michael S. Waterman (2001). An Eulerian path approach to DNA fragment assembly. Proc Natl Acad Sci U S A, 98(17), 9748–9753.
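A minimal Python sketch of the two steps described above: build the graph with one edge per k-mer, from its (k−1)-mer prefix to its (k−1)-mer suffix as in [1], then find an Eulerian walk with Hierholzer's algorithm. It assumes error-free k-mers and a graph that actually has an Eulerian path; the function names are illustrative, not the repository's. When repeats allow several Eulerian walks, it returns one of them.

```python
from collections import defaultdict
from typing import Dict, List


def de_bruijn_graph(kmers: List[str]) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; each k-mer adds an edge from its prefix to its suffix."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_walk(graph: Dict[str, List[str]]) -> List[str]:
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    in_degree: Dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1
    # Start where out-degree exceeds in-degree by one (path start), else anywhere (cycle).
    start = next(iter(graph))
    for node, successors in graph.items():
        if len(successors) - in_degree[node] == 1:
            start = node
            break
    remaining = {node: list(successors) for node, successors in graph.items()}
    stack, walk = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            walk.append(stack.pop())  # no unused edges left: node is finished
    return walk[::-1]


def assemble(kmers: List[str]) -> str:
    """Spell the sequence: first node, then the last base of each following node."""
    walk = eulerian_walk(de_bruijn_graph(kmers))
    return walk[0] + "".join(node[-1] for node in walk[1:])
```

For example, the shuffled 3-mers of "ACGTTGCAT" are reassembled into the original sequence, because the walk visits the (k−1)-mers in their original order.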

GitHub: https://github.com/katwre/bioinformatics-projects/tree/master/genome_assembly

Sudoku

Sudoku implemented in JavaScript and jQuery.

GitHub: https://github.com/katwre/sudoku

Minesweeper

Minesweeper implemented in Java using the Swing and AWT graphics libraries.

GitHub: https://github.com/katwre/Minesweeper

Django-based web services

A Django-based server for multiple sequence alignment (MSA) visualization
GitHub: https://github.com/freesci/MSA-vis-project

Phone application built with Django 1.5.1, the manifesto app and localStorage:
GitHub: https://github.com/katwre/phone_application

Discover your career match

Find your best career matches based on a personality profile measured with the Big Five Aspects Scale. This interactive web tool lets users explore their personality traits and see how their profile aligns with different career paths. It uses Python, running directly in your browser via Pyodide without the need to precompile, and machine learning techniques such as PCA and clustering, using the scikit-learn and pandas libraries, to generate personalized visualizations and career matches.

Website: link
GitHub: https://github.com/katwre/Personalities


Figure: PCA plot showing career matches based on your personality profile.
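The description names PCA and clustering but not the exact matching rule. As an assumption, the sketch below ranks careers by Euclidean distance between the user's trait scores and each career's average trait profile (for example, the ten aspect scores of the Big Five Aspects Scale), and uses PCA only to place careers and the user on a 2D plot like the one above. PCA is computed here with NumPy's SVD, the same centred projection that `sklearn.decomposition.PCA` computes up to the sign of each axis. The career profiles and function names are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def fit_pca(x: np.ndarray, n_components: int = 2) -> Tuple[np.ndarray, np.ndarray]:
    """Return the column means and the top principal axes (as rows) of x."""
    mean = x.mean(axis=0)
    # SVD of the centred data: rows of vt are principal axes by decreasing variance.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def best_matches(user: np.ndarray, careers: np.ndarray, names: List[str],
                 n_components: int = 2, top_k: int = 3):
    """Rank careers by distance to the user's trait scores and project both for plotting."""
    # Matching uses the full trait space, so nothing is lost to the projection.
    distances = np.linalg.norm(careers - user, axis=1)
    top = [names[i] for i in np.argsort(distances)[:top_k]]
    # PCA is used only to place careers and the user on a 2D plot.
    mean, axes = fit_pca(careers, n_components)
    return top, (careers - mean) @ axes.T, (user - mean) @ axes.T
```

Keeping the ranking in the full trait space and the PCA only for display avoids matching on a lossy two-dimensional summary; a clustering step (e.g. k-means) could additionally group careers into families.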