Prof. Marek Gagolewski
My current research interests are in data science, with a focus on modelling complex phenomena, developing usable, general-purpose algorithms, studying their analytical properties, and finding out how people (laymen, decision makers, students, and researchers from different fields) use, misuse, understand, and misunderstand data analysis methods in scientific, business, political, social, and other settings. In my spare time, I write books for my students and develop open-source data analysis software.
- Deep R Programming (HTML) (PDF) (paper copy) (GitHub)
- Minimalist Data Wrangling in Python (HTML) (PDF) (paper copy) (GitHub)
- genieclust – Fast and robust hierarchical clustering with noise point detection (GitHub) (PyPI) (paper)
- deadwood – Outlier detection via trimming of mutual reachability minimum spanning trees (GitHub) (PyPI)
- quitefastmst – Euclidean and mutual reachability minimum spanning tree algorithms (GitHub) (PyPI)
- clustering-benchmarks – A framework for benchmarking clustering algorithms (GitHub) (PyPI) (paper)
- stringi – Fast and portable character string processing in R (one of the most often downloaded packages for R) (GitHub) (CRAN) (paper)
- genieclust – Fast and robust hierarchical clustering with noise point detection (GitHub) (CRAN) (paper)
- deadwood – Outlier detection via trimming of mutual reachability minimum spanning trees (GitHub) (CRAN)
- quitefastmst – Euclidean and mutual reachability minimum spanning tree algorithms (GitHub) (CRAN)
- stringx – Drop-in replacements for base R string functions powered by stringi (GitHub) (CRAN)
- realtest – Where expectations meet reality: Realistic unit testing in R (GitHub) (CRAN)
- TurtleGraphics – Learn computer programming in R while having a jolly time! (GitHub) (CRAN)
- Clustering benchmarks (framework, datasets, results)
- Datasets for teaching