“Rust: The Rising Star in Data Engineering 🦀 Embracing the Power of Performance, Safety, and Elegance”

abhijrathod
8 min read · Feb 15, 2024


While Rust may not outright replace Python, its encroachment on JavaScript tooling suggests a shift in the programming landscape. A growing number of projects aim to bring Rust into Python and data engineering. Let's dissect why Rust holds promise for data engineers, what its strengths are, and why it ranked as the most loved programming language for seven consecutive years. Join us as we navigate the intersection of Python and Rust and envision the future of data engineering.

Rust isn't just another programming language; it's a game-changer. Rust began as a personal project of Graydon Hoare, then a Mozilla employee, and grew into an effort to change how systems software is written. Since its stable 1.0 release on May 15, 2015, Rust has garnered attention for its versatility and innovation.

So, what makes Rust truly unique?
Rust is a flexible, multi-paradigm language that accommodates imperative procedural, concurrent, object-oriented, and functional styles. It also supports generic programming and metaprogramming through its macro system.

Rust prioritizes safety, concurrency, and performance without sacrificing usability. As a compiled language with a strict type system and a borrow checker, it runs thorough checks at compile time. Yes, that's right: preemptive error detection! Many bugs that would only surface at runtime in an interpreted language like Python are caught before the program ever runs, sparing developers endless headaches. Moreover, the Rust community's dedication to crafting informative error messages elevates the programming experience to new heights.
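Here is a minimal sketch that makes those compile-time checks concrete. The `FileFormat` enum and both functions are hypothetical examples, not from any library. A `match` over an enum must cover every variant, and an `Option` return type forces callers to handle the missing case. Both kinds of mistakes therefore surface when you compile, not when a job runs at night.

```rust
/// File formats a pipeline might ingest (hypothetical example type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FileFormat {
    Csv,
    Parquet,
    Json,
}

/// Returns the file extension for a format.
/// The `match` must be exhaustive: if someone later adds a variant such as
/// `Avro` to `FileFormat`, this function stops compiling until the new case
/// is handled, instead of silently misbehaving at runtime.
pub fn extension(format: FileFormat) -> &'static str {
    match format {
        FileFormat::Csv => "csv",
        FileFormat::Parquet => "parquet",
        FileFormat::Json => "json",
    }
}

/// Looks up a column index by name. Returning `Option` forces the caller
/// to decide what happens when the column is missing.
pub fn column_index(header: &[&str], name: &str) -> Option<usize> {
    header.iter().position(|col| *col == name)
}
```

A caller cannot use the result of `column_index` as a plain number without first matching on `Some`/`None` (or explicitly opting into a panic with `unwrap`).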

In essence, Rust isn't just a language; it's a testament to innovation and efficiency.

Why Choose Rust for Data Engineering?

In the realm of data engineering, reliability is paramount. Imagine writing code that doesn't just run smoothly during office hours but remains steadfast in the dead of night or over the weekend. Rust brings this dream to life by flagging errors and suggesting improvements as you code, minimizing the risk of runtime failures.

While Python has long been the darling of data engineers, its reputation for robustness and safety leaves much to be desired. Rust, on the other hand, revolutionizes the developer experience by prioritizing reliability from the ground up.

So, why should data engineers embrace Rust?

Unlike Python, Rust’s compiler is relentless in its pursuit of perfection. By flagging potential issues during development, Rust saves valuable time and resources that would otherwise be spent troubleshooting in production. This proactive approach to error detection significantly reduces the likelihood of costly runtime failures, a luxury Python struggles to offer.

But Rust isn’t just about catching bugs; it’s about fostering a culture of excellence in software development. Within the Rust ecosystem, every facet of creating and maintaining production-quality software is treated as a first-class citizen. From robust tooling to comprehensive documentation, Rust empowers data engineers to elevate their craft to new heights.

In essence, Rust isn't just a programming language; it's a commitment to reliability and efficiency.

What Rust Does Well

Python is dynamically typed (with only recent support for type hints) and requires writing extensive tests to catch these costly errors. But that takes a lot of time, and you must foresee all potential errors to write a test for it.

Rust is the opposite: it makes you define types (or infers them) and then enforces them. This does not make testing obsolete, of course. But the Rust compiler performs checks that most other compilers don't, such as borrow checking, and rust-analyzer brings these checks into your IDE of choice. This is valuable for data engineers because our pipelines have many moving parts, such as incoming data sets we do not control. Defining expectations with data types and running rigorous checks at coding and compile time prevents many errors.
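The next sketch shows what "defining expectations with data types" can look like. It assumes a hypothetical three-column order feed, and `Order`, `ParseError`, and `parse_order` are illustrative names. A real pipeline would more likely use a CSV crate than `split(',')`. Each incoming row is parsed into a typed struct, and every way a row can be malformed is an explicit error variant that the caller has to handle.

```rust
use std::fmt;

/// One row of an incoming data set we don't control (hypothetical schema).
#[derive(Debug, PartialEq)]
pub struct Order {
    pub id: u64,
    pub customer: String,
    pub amount_cents: i64,
}

/// Every way a row can be malformed, expressed as a type.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    WrongFieldCount(usize),
    BadInteger { field: &'static str, value: String },
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ParseError::WrongFieldCount(n) => write!(f, "expected 3 fields, got {}", n),
            ParseError::BadInteger { field, value } => {
                write!(f, "field `{}`: `{}` is not an integer", field, value)
            }
        }
    }
}

/// Parses one comma-separated line such as "42,acme,1999".
pub fn parse_order(line: &str) -> Result<Order, ParseError> {
    let fields: Vec<&str> = line.split(',').map(|s| s.trim()).collect();
    if fields.len() != 3 {
        return Err(ParseError::WrongFieldCount(fields.len()));
    }
    // A non-numeric id becomes a typed error instead of a crash downstream.
    let id = fields[0].parse::<u64>().map_err(|_| ParseError::BadInteger {
        field: "id",
        value: fields[0].to_string(),
    })?;
    let amount_cents = fields[2].parse::<i64>().map_err(|_| ParseError::BadInteger {
        field: "amount_cents",
        value: fields[2].to_string(),
    })?;
    Ok(Order { id, customer: fields[1].to_string(), amount_cents })
}
```

Because `parse_order` returns a `Result`, the compiler will not let you treat a row as an `Order` until you have decided what to do with bad rows (skip, quarantine, or fail the run).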

Less relevant for data engineers, but super helpful: speed. Rust, as a compiled language, is super fast at run-time. To many, Rust is primarily an alternative to other systems programming languages, like C or C++. But you don’t need a systems use case to use a systems language, as both Vercel and Crowdstrike are noticing.

Another strength is integration. Data pipelines are usually the glue code connecting otherwise foreign systems, and Rust runs on almost any platform. Rust makes it easy to integrate and communicate with other languages through a so-called foreign function interface (FFI). The FFI is a zero-cost abstraction: function calls between Rust and C have the same performance as C function calls. Rust can easily be called from C, Python, Ruby, and vice versa. Find more in Rust Once, Run Everywhere.
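Below is a minimal sketch of the Rust side of an FFI boundary, built around a hypothetical `mean` function. The `extern "C"` calling convention is what lets C, or Python through `ctypes`/`cffi`, call into Rust. A real shared library would also need `#[no_mangle]` and a `cdylib` crate type. Those steps are only mentioned in the comments so that the snippet compiles on its own.

```rust
use std::os::raw::c_double;

/// Computes the mean of `len` doubles starting at `values`, using the C ABI.
/// In a real library you would also add `#[no_mangle]` (written
/// `#[unsafe(no_mangle)]` in the 2024 edition) and build the crate as a
/// `cdylib`, so C, Python, or Ruby can load the symbol by name.
///
/// # Safety
/// `values` must point to at least `len` valid, initialized doubles.
pub unsafe extern "C" fn mean(values: *const c_double, len: usize) -> c_double {
    if values.is_null() || len == 0 {
        return 0.0;
    }
    // SAFETY: upheld by the caller, as documented above.
    let slice = unsafe { std::slice::from_raw_parts(values, len) };
    slice.iter().sum::<c_double>() / len as c_double
}
```

The function takes a raw pointer and a length because that is how a C caller (or a NumPy buffer passed through `ctypes`) hands over an array. The `unsafe` marker makes the caller's obligations explicit.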

A less technical but still important factor is enjoying the language you work in. Rust is more complex to learn, but it was voted the most loved programming language in the Stack Overflow Developer Survey for seven years in a row (2016 through 2022):

Loved vs. Dreaded and most Wanted Programming Language on StackOverflow Survey 2022

Beyond the love, Rust is also gaining visibility across trend measures such as Google Trends, a 2019 ranking on GitHub, and the Stack Overflow Trends chart below:

StackOverflow Trends

ℹ️ Why Is Rust Popular?
For software engineers, many issues in systems programming are memory errors, such as dangling pointers and use-after-free bugs. Rust's goal is to rule these out in safe code while keeping programs maintainable, readable, and fast at runtime.
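This small sketch shows how ownership and borrowing guard against memory errors. The functions are hypothetical examples. Shared borrows, mutable borrows, and moves are all checked at compile time, and the commented-out line is a use-after-move that the compiler would reject.

```rust
/// Borrows the data immutably: the caller keeps ownership.
pub fn total(values: &[i64]) -> i64 {
    values.iter().sum()
}

/// Takes ownership: after calling this, the caller can no longer use `values`.
pub fn into_sorted(mut values: Vec<i64>) -> Vec<i64> {
    values.sort();
    values
}

/// Borrows mutably: only one mutable borrow may exist at a time, which is
/// how the compiler rules out data races on `values`.
pub fn scale(values: &mut [i64], factor: i64) {
    for v in values.iter_mut() {
        *v *= factor;
    }
}

pub fn demo() -> (i64, Vec<i64>) {
    let mut batch = vec![3, 1, 2];
    scale(&mut batch, 10); // mutable borrow ends when `scale` returns
    let sum = total(&batch); // shared borrow; `batch` is still usable
    let sorted = into_sorted(batch); // ownership moves into the function
    // let again = total(&batch); // would not compile: use of moved value `batch`
    (sum, sorted)
}
```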

Interesting Open-Source Rust Projects

The language is always only as good as its community. Let’s look at some of the existing open-source tools and frameworks built in and around Rust:

  • DataFusion: An SQL query engine built on Apache Arrow, similar in spirit to Spark
  • Polars: Essentially a faster pandas; it may end up competing with DuckDB
  • Delta Lake Rust: A native Rust library for Delta Lake, with bindings into Python and Ruby
  • Cube: Headless BI for building data applications. Written mostly in Rust, Cube's data processing and storage are based on the Arrow DataFusion query execution framework, which uses Apache Arrow as its in-memory format. In particular, Cube Store, the cache layer at the core of Cube, is written entirely in Rust
  • Vector.dev: A high-performance observability data pipeline for collecting system data (logs, metrics)
  • ROAPI: Create full-fledged APIs for slowly moving datasets without writing a single line of code
  • Meilisearch: Lightning Fast, Ultra Relevant, and Typo-Tolerant search engine
  • Tantivy: A full-text search engine library
  • PRQL: Pipelined Relational Query Language for transforming data
  • Many more; please let me know of any

Less relevant to data engineering, but still cool:

  • Deno: A modern JavaScript and TypeScript runtime, positioned as a faster, more secure alternative to Node.js
  • Tauri: Tauri is a framework for building tiny, blazingly fast binaries for all major desktop platforms
  • Yew: A modern Rust framework for creating multi-threaded front-end web apps with WebAssembly.

Rust vs. Python

The downside of Rust is its learning curve, which is much steeper than that of languages such as Python. That's why, for a long time to come, most Rust programs in data engineering will ship with a Python wrapper so they can be plugged into existing Python data pipelines. Moving to Rust is also a shift away from an interpreted language like Python toward a more functional programming (FP) style, which Rust supports well.
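To illustrate the functional style, here is a minimal sketch with a hypothetical `Reading` record. The transformation is a chain of iterator adapters (`filter`, `map`, `collect`) rather than a mutable loop, and it leaves the input untouched.

```rust
/// A sensor reading (hypothetical record type).
#[derive(Debug, Clone, PartialEq)]
pub struct Reading {
    pub sensor: String,
    pub celsius: f64,
}

/// Drops invalid readings, converts Celsius to Fahrenheit, keeps readings at
/// or above `min_fahrenheit`, and returns the matching sensor names.
pub fn warm_sensors(readings: &[Reading], min_fahrenheit: f64) -> Vec<String> {
    readings
        .iter()
        .filter(|r| r.celsius.is_finite()) // skip NaN / infinite values
        .map(|r| (r.sensor.clone(), r.celsius * 9.0 / 5.0 + 32.0))
        .filter(|pair| pair.1 >= min_fahrenheit)
        .map(|(name, _)| name)
        .collect()
}
```

Each step is a small, pure function, which makes the chain easy to test and reorder. The iterators are also lazy, so no intermediate collections are allocated between steps.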

📝 The upside and downside of the Python language

What makes Python popular right now:
* It’s old
* It’s beginner-friendly
* It’s versatile

The downsides of Python:
* Speed / Multithreading
* Scope
* Mobile Development
* Runtime Errors
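For contrast with Python's multithreading limits (CPython's global interpreter lock), here is a minimal sketch of data-parallel work using only Rust's standard library. `parallel_sum` is a hypothetical helper. It relies on scoped threads (`std::thread::scope`, stable since Rust 1.63), which may borrow the input safely because they are all joined before the scope ends.

```rust
use std::thread;

/// Sums `data` by splitting it into chunks and summing each chunk on its
/// own OS thread. The borrow checker rejects any version of this code in
/// which threads could outlive `data` or write to it concurrently.
pub fn parallel_sum(data: &[u64], threads: usize) -> u64 {
    if data.is_empty() {
        return 0;
    }
    let threads = threads.max(1);
    // Ceiling division, so every element lands in some chunk.
    let chunk_size = (data.len() + threads - 1) / threads;
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // All spawned threads are joined here, before the scope returns.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}
```

Unlike Python threads running Python bytecode, these threads execute truly in parallel on multiple cores.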

For more, check out Why Python is not the programming language of the future, or a small Twitter poll on whether Rust is suited for data engineering.

Other Recent Programming Languages

Many newer programming languages lean toward functional programming. Examples include functional languages such as Scala (with Akka) and Elixir, and multi-paradigm languages such as Julia, Kotlin (one of the fastest-growing languages since Google made it the preferred language for Android development), and Rust.

Go (Golang) seems to be a good compiled programming language for DevOps.

Elixir ships with supervisors that monitor processes and restart them when they fail, so monitoring and retries for data pipelines come with the language; no extra framework is needed. That makes it an excellent fit for data engineering, and it could replace parts of data orchestrators.

Rust as a Primary Language?

Let’s see an example of a modern data pipeline integrating with Airbyte, dbt, and some ML models in Python.

Each step can have errors and data mismatches. That's why we have orchestrator frameworks such as Dagster, which push you to write functional code, in line with the concept of Functional Data Engineering. Python itself is moving in this direction, with wider adoption of type hints and a more functional programming style. JavaScript offers a parallel example: the rise of TypeScript.
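Here is a hypothetical sketch of the same idea in Rust. Each pipeline step is a plain function that returns a `Result`, and the `?` operator stops the run at the first failing step with a typed error. The step names and the `StepError` variants are illustrative; this is not Dagster's or any other orchestrator's API.

```rust
/// Errors a pipeline step can raise (hypothetical, one variant per failure kind).
#[derive(Debug, PartialEq)]
pub enum StepError {
    EmptyExtract,
    SchemaMismatch { expected: usize, found: usize },
}

/// Extract: parse raw comma-separated text into rows, skipping blank lines.
pub fn extract(raw: &str) -> Result<Vec<Vec<String>>, StepError> {
    let rows: Vec<Vec<String>> = raw
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.split(',').map(|c| c.trim().to_string()).collect::<Vec<String>>())
        .collect();
    if rows.is_empty() {
        Err(StepError::EmptyExtract)
    } else {
        Ok(rows)
    }
}

/// Validate: every row must have the expected number of columns.
pub fn validate(rows: Vec<Vec<String>>, expected: usize) -> Result<Vec<Vec<String>>, StepError> {
    for row in &rows {
        if row.len() != expected {
            return Err(StepError::SchemaMismatch { expected, found: row.len() });
        }
    }
    Ok(rows)
}

/// Transform: a pure function from validated rows to output values.
pub fn transform(rows: &[Vec<String>]) -> Vec<String> {
    rows.iter().map(|r| r.join("|").to_uppercase()).collect()
}

/// The pipeline composes the steps. `?` returns early with the failing
/// step's typed error instead of passing bad data downstream.
pub fn run(raw: &str, expected_columns: usize) -> Result<Vec<String>, StepError> {
    let rows = extract(raw)?;
    let valid = validate(rows, expected_columns)?;
    Ok(transform(&valid))
}
```

Because each step is a function from input to output with no hidden state, it can be tested in isolation and re-run idempotently, which is the core of the functional data engineering idea.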

❓ The exciting question to me is whether Rust will be adopted as a primary language that can do data orchestration work.

In our data pipelines, we typically load data into a data frame and then transform it or apply business logic. This could be done efficiently in Rust with Apache Arrow and DataFusion, which offer type safety and a good ecosystem. Time will tell.
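To give a feel for the columnar idea without pulling in any crates, here is a toy sketch. `SalesFrame` stores each column as its own `Vec`, loosely like Arrow's columnar memory layout, and applies a filter followed by a group-by as its business logic. This is not the Arrow or DataFusion API. In DataFusion you would express the same logic as SQL, roughly `SELECT region, SUM(amount) FROM sales WHERE amount > 10 GROUP BY region`.

```rust
use std::collections::BTreeMap;

/// A toy column-oriented table (hypothetical): each column is a contiguous
/// Vec, and row i is made of `region[i]` and `amount[i]`.
pub struct SalesFrame {
    pub region: Vec<String>,
    pub amount: Vec<f64>,
}

impl SalesFrame {
    /// Business logic: total amount per region, counting only rows whose
    /// amount is greater than `min_amount`. BTreeMap keeps the output sorted
    /// by region, so results are deterministic.
    pub fn total_by_region(&self, min_amount: f64) -> BTreeMap<String, f64> {
        let mut totals = BTreeMap::new();
        for (region, &amount) in self.region.iter().zip(self.amount.iter()) {
            if amount > min_amount {
                *totals.entry(region.clone()).or_insert(0.0) += amount;
            }
        }
        totals
    }
}
```

Storing columns contiguously is what lets engines like DataFusion scan and aggregate one column at a time efficiently. The toy version only hints at that design.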

Will Rust Be the Programming Language for Data Engineers?

Rust is a multi-use language and gets the job done for many of a data engineer's problems. But the data engineering space is dominated by Python (and SQL) and will stay that way for the foreseeable future. Don't expect a point where everyone has fully moved to Rust. It's hard to overstate how many tools and frameworks are written in Python to interoperate with other Python tools, and it's hard to imagine that inertia changing substantially in the next decade.

The Rust projects we have seen above are excellent and will continue to grow as vital core components, but to be helpful to the average data engineer, they need a Python interface. The role once expected of Scala will now go to Rust: a backend tooling language for tasks that need fast, well-maintained code, with a Python wrapper on top.

Writing libraries in Rust feels more like building long-term infrastructure than writing in higher-level languages such as Python or Java and other JVM languages.

What do you think? What is your take on Rust for data engineers?

Resources to Learn More on the Topic

If you want to be up and running within minutes, Karim Jedda has an article that carefully explores the Rust programming ecosystem from the perspective of a Python developer with 10+ years of experience, covering how to do everyday programming tasks and what the tooling looks like. Shared Services Canada did a hands-on example with Rust, converting raw archive files into JSON for data analysis. And Mehdi Ouazza's article debates the Battle for Data Engineer's Favorite Programming Language.

There are many excellent resources for learning Rust: A half-hour to learn Rust, The Rust Book, Rust By Example, Read Rust, and This Week in Rust.

Or learn Rust in different formats:

Or do you want to get hands-on and search for an example project? How about building an Airbyte Delta Lake Destination (Python interface) with delta-rs?
