Final-year BSc Data Science (Warsaw University of Technology)

Projects

Real-Time Finance - Big Data Pipeline

Spark · Hive · HBase · HDFS · Docker

Two-person team project. Stack: NiFi/Kafka → HDFS → Spark → Hive → HBase. My work was Spark/Hive/HBase (processing, analytics, serving).

  • Spark ETL for curated datasets + analytics (OHLC/volume, returns, rolling BTC–USD/PLN correlation).
  • Hive external tables/views and sanity checks for analytics outputs.
  • Serving layer: loading analytical facts into HBase for fast key-based reads.

NMAR - R package for estimation under nonignorable nonresponse

CRAN · R · CI/tests · Docs · Simulation studies

CRAN R package: unified nmar() API and method comparisons in simulation studies.

  • Implemented estimators from the literature behind a unified nmar() API.
  • Reproducible simulation studies for comparisons + validation.
  • Packaged for CRAN with documentation and CI/testing.

Mamut - AutoML toolkit for tabular classification

Python · PyPI · scikit-learn · Optuna · Ensembles · Reports

AutoML workflow for tabular classification (binary + multi-class): preprocessing, HPO, model comparison, and ensemble search, with reports and plots.

  • Preprocessing pipeline (imputation, scaling, encoding, skew correction, outliers; optional PCA/feature selection).
  • Model search across common classifiers (LogReg, RF, SVC, XGBoost, MLP, NB, KNN) with Bayesian or grid search.
  • Dynamic ensemble search with majority voting (hard/soft) + HTML report, notebook plots, and optional SHAP.

QuantumRAG - Grover-inspired top-k selection for RAG

RAG · FAISS · Embeddings · Streamlit · Qiskit · Benchmarking

Dense retrieval (FAISS) + GroverTopK selector (Qiskit Aer), with an evaluation harness on SQuAD 1.1 and end-to-end latency profiling.

  • Multi-model comparisons via Hugging Face Inference (llama-3-8b, mixtral-8x7b, phi-3.5), comparing answers with and without retrieved context.
  • Benchmark artifacts exported to CSV/JSON + plots; Streamlit demo for context inspection and model comparison.
  • Key finding: Grover and classical selection were nearly identical (≈99% context agreement) with ~30 ms overhead; top-3 contexts outperformed both top-1 and no-context.

Other projects

  • DermNet - DINOv2 embeddings for clustering (GitHub)
  • DoomRL - PPO/A2C agents for ViZDoom (GitHub)

Research

Research Software Engineer

Poznań University of Economics and Business · project: “Towards census-like statistics for foreign-born populations - quality, data integration and estimation” (2020/39/B/HS4/00941)

03/2025 - Present

  • Implemented NMAR estimators from the literature behind a unified API.
  • Built reproducible simulation studies to compare methods and validate behavior.
  • Maintained engineering quality: documentation/vignettes, CI, tests.
  • Talks: uRos (Romanian NSI, 2025) and ElementsX (AGH, 2025).

Leadership

President, Data Science Club (WUT)

2024–2025

  • Organized talks/workshops; hosted guests from Google, ING, Allegro.
  • Worked with a student team on outreach and events.

Co-organizer, ensembleAI hackathon

2024, 2025 (preparing 2026)

  • Sponsors, logistics, venue coordination, on-site operations.

Capitalize (student venture, Enactus WUT)

Demo app shipped to Google Play (testing track)

  • Backend features/APIs (FastAPI).
  • Python scripts for basic telemetry analysis from Amplitude exports.

Awards

  • 2nd place - Enactus Poland National Competition (Capitalize), 2023
  • Finalist - Consult IT business/technology hackathon (SGH Warsaw School of Economics), 2023
  • Laureate - AGH “Diamond Index” Olympiad in Physics, 2022
  • Finalist - National Technical Knowledge Olympiad (OWT), 2022

Skills

Data / Systems

  • SQL, Spark
  • Hive, HDFS, HBase
  • NiFi, Docker

Software Engineering

  • Python, Java, R
  • Git, Linux, CI/testing (GitHub Actions)
  • Backend: FastAPI, Spring Boot

ML / Evaluation

  • PyTorch, scikit-learn, Transformers
  • Optuna, NumPy, Pandas

Languages

  • Polish - native
  • English - C2 (CAE Grade A)
  • German - basic

Contact

Open to junior and intern roles in data engineering, data-heavy backend, and ML systems.