Curriculum Vitae

Jul – Sep 2013


KTH Royal Institute of Technology, Stockholm, Sweden
Research stay funded by the German Academic Exchange Service

Since Nov 2011




Humboldt-Universität zu Berlin, Berlin, Germany
Knowledge Management in Bioinformatics
Ph.D. student in the DFG-funded research training group SOAMED
Topic: Adaptive Scheduling of Scientific Workflows

2009 – 2011


Novartis Pharma AG, Basel, Switzerland
Research consultant employed by Elan Computing Schweiz AG

Jul – Sep 2009



Novartis Pharma AG, Basel, Switzerland
Intern at Text Mining Services of NIBR IT
Research study “Chemical structure search for text corpora”

2008 – 2009


Fraunhofer IPK, Berlin, Germany
Student assistant at the research division “Security Technologies”

2005 – 2011




Humboldt-Universität zu Berlin, Berlin, Germany
Study of Computer Science with minor subject Cognitive Psychology
Thesis: Comparing gene co-expression networks in colorectal cancer
Degree: Diploma in Computer Science, 1.0
Award: Best degree in Computer Science at Humboldt-Universität in 2011

Research Interests

Scientific Workflows
Today's scientific experiments typically involve running and refining a series of intertwined computational analysis and visualization tasks on large amounts of data. The complexity of these analysis pipelines has led to the emergence of e-Science and scientific workflows. Scientific workflows are compositions of sequential and concurrent data processing tasks whose order is determined by their data dependencies. A scientific workflow is usually specified as a directed acyclic graph (DAG) in which nodes represent individual tasks and edges represent the data dependencies between them.
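As a minimal sketch of the DAG view described above, the following Python snippet groups the tasks of a small workflow into "waves" of tasks that could run concurrently. The task names and the helper `execution_waves` are hypothetical and chosen only for illustration; the snippet relies on the standard-library `graphlib` module (Python 3.9 or later) and is not taken from any of the systems listed under Publications.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks whose output it consumes.
workflow = {
    "trim": set(),
    "align": {"trim"},
    "qc_report": {"trim"},
    "call_variants": {"align"},
    "summary": {"call_variants", "qc_report"},
}

def execution_waves(dependencies):
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose inputs are available
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as finished to unlock its successors
    return waves

for number, wave in enumerate(execution_waves(workflow), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

In this toy example, `align` and `qc_report` end up in the same wave because both depend only on `trim`, which is the kind of concurrency a workflow scheduler can exploit.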

Cloud Computing
Cloud computing describes a recently established form of distributed computing that provides rentable compute and storage resources on demand over the Internet. Commercial cloud providers typically follow a "pay-per-use" cost model that charges users by the hour, which makes cloud computing especially attractive in settings where data analysis is infrequent yet computationally intensive.
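The hourly cost model can be illustrated with a short calculation. The sketch below assumes that every started hour is billed in full and that all instances run for the same duration; the function name `rental_cost` and the example price are hypothetical and do not reflect any particular provider.

```python
import math

def rental_cost(runtime_minutes, instances, price_per_hour):
    """Cost of renting identical instances, assuming started hours are billed in full."""
    billed_hours = math.ceil(runtime_minutes / 60)  # assumption: round up to whole hours
    return billed_hours * instances * price_per_hour

# Hypothetical analysis: 10 instances for 90 minutes at 0.50 per instance-hour.
print(rental_cost(runtime_minutes=90, instances=10, price_per_hour=0.50))
```

Under these assumptions an infrequent, compute-heavy analysis only incurs cost while it runs, whereas dedicated hardware would sit idle between runs.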

Next Generation Sequencing
Genomic sequencing aims to determine the order of nucleotides in the DNA of a given sample, such as a human chromosome. A new generation of sequencing machines can produce millions of short DNA reads in parallel, which poses many interesting challenges in data management and processing.

Publications

M. Bux, J. Brandt, C. Witt, J. Dowling, U. Leser (2017), Hi-WAY: Execution of Scientific Workflows on Hadoop YARN, 20th International Conference on Extending Database Technology (EDBT), Venice, Italy. (Paper)

A. Bessani, J. Brandt, M. Bux, V. Cogo, J. Dowling, A. Gholami, M. Hummel, M. Ismail, E. Laure, U. Leser, J.-E. Litton, R. Martinez, S. Niazi, J. Reichel (2015), BiobankCloud: a Platform for the Secure Storage, Sharing, and Processing of Large Biomedical Data Sets, First International Workshop on Data Management and Analytics for Medicine and Healthcare, Hawaii, USA. (Paper)

M. Bux, J. Brandt, C. Lipka, K. Hakimzadeh, J. Dowling, U. Leser (2015), SAASFEE: Scalable Scientific Workflow Execution Engine, PVLDB 8(12): 1892–1903, Hawaii, USA. (Paper)

J. Brandt, M. Bux, U. Leser (2015), Cuneiform – A Functional Language for Large Scale Scientific Data Analysis, in Proceedings of the Workshops of the EDBT/ICDT, volume 1330, pages 17–26, Brussels, Belgium. (Paper)

M. Bux, U. Leser (2014), DynamicCloudSim: Simulating Heterogeneity in Computational Clouds, Future Generation Computer Systems 46(C): 85–99. (Paper)

S. Wandelt, J. Starlinger, M. Bux, U. Leser (2013), RCSI: Scalable similarity search in thousand(s) of genomes, PVLDB 6(13): 1534–1545, Hangzhou, China. (Paper)

M. Bux, U. Leser (2013), DynamicCloudSim: Simulating Heterogeneity in Computational Clouds, Int. Workshop on Scalable Workflow Enactment Engines and Technologies (SWEET'13), in conjunction with ACM SIGMOD Conference, New York, USA. (Paper, Slides)

S. Wandelt, M. Bux, U. Leser (2013), Trends in Genome Compression, Journal of Current Bioinformatics. (Paper)

M. Bux, U. Leser (2013), Parallelization in Scientific Workflow Management Systems, Technical Report CoRR/arXiv:1303.7195. (Paper)

S. Wandelt, A. Rheinländer, M. Bux, L. Thalheim, B. Haldemann, U. Leser (2012), Data Management Challenges in Next Generation Sequencing, Datenbank-Spektrum 12(3):161–171. (Paper)

M. Bux (2011), Comparing Literature-enriched Experimental Gene Co-expression Networks in Colorectal Cancer, Diploma Thesis, Humboldt-Universität zu Berlin. (Exposé)