Use Dynamic Partition Overwrite for ETL with Apache Spark

May 24, 2020 | By: Paul Staab

After not using Apache Spark at all in 2019, I am currently catching up on the features and improvements I have missed since version 2.1. While pandas UDFs are certainly the most prominent improvement, a colleague pointed me towards a less well-known and almost undocumented feature that dramatically simplifies the creation of Extract-Transform-Load (ETL) jobs with Spark: dynamic partition overwrite mode.
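To make the idea concrete, here is a minimal sketch of how the two overwrite modes differ. In Spark 2.3 and later the mode is selected with the setting `spark.sql.sources.partitionOverwriteMode` (`static` by default, or `dynamic`). The code below does not call Spark at all: it is a toy model in plain Python in which a partitioned table is a dictionary from partition value to rows. The names `overwrite`, `existing`, and `rerun` are made up for this illustration.

```python
from typing import Dict, List

# Toy stand-in for a partitioned table: partition value -> rows in that partition.
Table = Dict[str, List[str]]


def overwrite(table: Table, new_rows: Table, mode: str = "static") -> Table:
    """Model an overwrite-mode write of `new_rows` into a partitioned `table`.

    This is an illustration of the semantics only, not Spark's implementation.
    """
    if mode == "static":
        # Static mode: all existing partitions are dropped before writing,
        # so only the partitions contained in the new data survive.
        return {part: list(rows) for part, rows in new_rows.items()}
    if mode == "dynamic":
        # Dynamic mode: only partitions that appear in the new data are
        # replaced; all other existing partitions are left untouched.
        result = {part: list(rows) for part, rows in table.items()}
        for part, rows in new_rows.items():
            result[part] = list(rows)
        return result
    raise ValueError(f"unknown mode: {mode!r}")


# Table partitioned by date, as a daily ETL job might produce it.
existing = {"2020-05-22": ["a", "b"], "2020-05-23": ["c"]}

# A re-run of the job that recomputes one day and adds a new one.
rerun = {"2020-05-23": ["c_fixed"], "2020-05-24": ["d"]}

static_result = overwrite(existing, rerun, mode="static")
dynamic_result = overwrite(existing, rerun, mode="dynamic")
```

In this model, the static write loses the `2020-05-22` partition, while the dynamic write keeps it and replaces only `2020-05-23`. That is what makes re-running a job for a single day safe without hand-written partition cleanup.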
