Dependency Management for Python Applications on Databricks
Databricks offers a powerful platform for distributed data processing. However, managing dependencies for Python jobs running on Databricks can be challengin...
After not using Apache Spark at all in 2019, I am currently catching up on features and improvements I missed since version 2.1. While pandas UDFs are certai...
Last month Jérôme Petazzoni gave a talk about the security of Docker containers. Perhaps the most important message of his talk was that one should avoid runn...
I was curious how different compilers and the optimization options affect the speed of my DNA simulation program scrm. There are almost no hard computations i...
There are certain situations when you need to work with temporary files in R. For instance, my package Jaatha requires that an external simulation tool is ca...
I started playing with r-travis today. It’s a nice project from Craig Citro that makes it quite easy to use Travis CI to automatically build and test R pac...
After literally months of trial and error, I finally managed to run a large analysis using our program Jaatha on my new local supercomputer, superMUC. Jaatha...