
How I build a Rust backend service

Published: 2023-09-26
Updated: 2023-10-31

In early 2021 I built out a prototype Rust backend service at work. I knew Rust was powerful and ergonomic. It could fly through difficult computation, and language features like pattern matching and the idiomatic Result and Option enums made it fresh and exciting. Honestly it’s a language that’s hard not to fall in love with. But at that time I had only used it for game development, so it was a big change to serve a REST API, publish and consume Kafka messages, and talk to a database. On top of that, it seemed like no one else was using Rust for web applications. The weekly newsletter This Week in Rust was still describing Rust as a “systems language” (though the description did change later that year). There was a lot of trailblazing to do. Here’s everything I learned and how I build Rust backend services now.

Architectural philosophy

The first thing to recognize is that Rust isn’t going to hold your hand on the big-picture architecture of your application. If you’re writing an application in Java, odds are you’re using a framework like Spring Boot, which provides some guidance for laying out your code. Or if you’re writing Ruby, you’re probably going to use Rails, which is quite opinionated about file structure, how models should look, and so on. But Rust doesn’t have a ubiquitous do-everything-for-me framework the way Java and Ruby do. As a matter of culture, Rust crates are small. They’re intended to do one thing and do it well. That means you have the freedom to pick an HTTP server crate and a database crate and an observability crate and put it all together however you’d like.

With so much freedom it’s really easy to shoot yourself in the foot. To try to keep my feet intact, I started by researching architectural paradigms and ended up settling on Bob Martin’s Clean Architecture. One of his big ideas is that dependencies should point in the direction of stability (that is to say, depend on things that don’t change), since changes often ripple down the dependency chain. You want to minimize the number of additional changes that your changes will cause. If the number of changes isn’t kept in check, codebases become more and more time-consuming to work on, which is a big problem for a company looking to scale. In general, he’s found that the code least likely to change is the set of business entities and the rules that govern them, which he calls use cases. Use cases depend on entities, and then transformation and mapping layers like controllers and adapters depend on the use cases. Lastly, databases, messaging systems, and other external interfaces depend on that layer of mappers and transformers. This frees you up to change your database technology, or to switch from REST to gRPC without ever touching your core business rules.

Check out the full book Clean Architecture for more about software architecture! (Please note this and the following Amazon book links are affiliate links, but I only recommend books I've actually read and liked!)

Git repository layout

After picking an architectural framework, I started setting up my project. I’m frequently working on services that need to provide a REST API while also consuming messages from Kafka or SQS. The REST API often needs to be highly available, so I treat the HTTP server and the message consumer as two separate services. I like to deploy them independently so I can scale them independently and know that async message consumption isn’t going to bog down the services handling synchronous REST requests. Even if they were to run on the same hardware, it’s still worth treating them as separate components within your code to keep them decoupled. This allows you to change them independently without side effects or regression risks.

Since both the server and the message consumer use the same business entities and rules, I often also create a common component, which is an independently deployable library containing the entities and use cases for the service. I try to keep one common library (which is scoped to a single bounded context) per repository. There can be as many additional service components that depend on that common library as necessary. For instance, if I just need a REST API, there might only be one additional component. If I need a REST API and a Kafka consumer, there might be two components. If I have a REST API, Kafka consumer, and some background ETL jobs, I might have three components. The key is that those components do not contain the core business logic or entities; they simply use the logic in common to do some work or provide an interface to the outside world. I’ve written more about this project organization in my article about the mini-monorepo concept.

For a Rust service, you can get this nice separation of components by using Cargo workspaces. Cargo is Rust’s package manager, quite similar to package managers like npm. You configure a workspace by adding a Cargo.toml file at the root of your project and listing all the components in that file. Each component then lives in a separate directory inside your repo and has its own Cargo.toml. Components can be built independently with cargo build -p <project name> or together by running cargo build at the root level. Running tests works the same way. Keeping all the code in the same repo makes it easy to import common into your other components, since Cargo pulls it in locally rather than getting it from crates.io or GitHub. Importing becomes as simple as adding the line common = { path = "../common" } to the [dependencies] section of each component's Cargo.toml.
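
As a rough sketch, here's what that layout might look like, assuming components named common, api, and consumer (the names are placeholders):

    # Cargo.toml at the repository root
    [workspace]
    members = ["common", "api", "consumer"]

    # api/Cargo.toml
    [package]
    name = "api"
    version = "0.1.0"
    edition = "2021"

    [dependencies]
    common = { path = "../common" }

With that in place, cargo build -p api builds just the API component, and cargo build at the root builds everything.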

Serving a REST API

Since I write a lot of REST APIs, one of the first things I needed to track down was a solid HTTP server. The actix-web crate turned out to be a good candidate. I evaluated each server crate on a few criteria:

  1. It has to support async. Handling requests concurrently brings huge efficiency gains for IO-bound services (which is typically what I’m working on).
  2. It should support running on multiple threads, so the application can take full advantage of each vCPU.
  3. It should make it easy to handle path, query, and body params in a type-safe way.
  4. The API needs to be nice to use. I want defining and registering endpoints to be simple.
  5. The API needs to be well documented!
  6. It should be actively maintained.

Actix-web met all of those criteria. It comes with a Tokio runtime under the hood for safe, multi-threaded concurrency. It is well-documented (check out the site!) and the library is quite easy to use. With optional features for serde and serde-json, it was a breeze for me to ingest params and to serialize and deserialize JSON in a type-safe way. If requests come in with incorrect types, there’s no error handling necessary in the code; it simply fails to deserialize and automatically gives the user a 400 error. On top of being nice to work with, actix-web is stable and is still actively maintained. Plus it’s very performant.
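
To give a flavor of the API, here's a minimal sketch of an actix-web 4 handler with a typed path param and a JSON response. The Order type and route are made up for illustration, and serde's derive feature is assumed:

    use actix_web::{get, web, App, HttpServer, Responder};
    use serde::Serialize;

    #[derive(Serialize)]
    struct Order {
        id: u64,
        status: String,
    }

    // The path param is deserialized into a u64 for us; a non-numeric
    // id fails extraction and the client gets a 4xx automatically.
    #[get("/orders/{id}")]
    async fn get_order(path: web::Path<u64>) -> impl Responder {
        let id = path.into_inner();
        web::Json(Order { id, status: "shipped".to_string() })
    }

    #[actix_web::main]
    async fn main() -> std::io::Result<()> {
        HttpServer::new(|| App::new().service(get_order))
            .bind(("127.0.0.1", 8080))?
            .run()
            .await
    }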

There are definitely other libraries out there that meet my criteria. Rocket is probably a great choice as well. At the time I started looking, actix-web had a lot more GitHub activity than Rocket did, which swung me toward actix-web, but feel free to look around more if actix-web isn’t quite what you want.

Want to learn more about API design? I liked API Design Patterns.

Processing background jobs

In addition to REST APIs, I write a lot of event and message handling code. See my article on how Rust helps you write a great message processor for some nitty-gritty details about the logic. With the right abstractions in the code, you can consume pretty much anything that carries messages, including Kafka, SNS, SQS, or RabbitMQ. If you delegate the message retrieval to some code tucked behind an interface, all that “piping” becomes just a detail. Handling a message is largely the same regardless of where it came from.
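
As a sketch of what “tucked behind an interface” can look like, here's a hypothetical trait (not from any particular crate) using the async-trait and anyhow crates:

    use async_trait::async_trait;

    // A broker-agnostic message: the handler never needs to know
    // whether this came from Kafka, SQS, or somewhere else.
    pub struct Message {
        pub key: Option<String>,
        pub payload: Vec<u8>,
    }

    // Each broker gets its own implementation of this trait, and the
    // business logic only ever sees the trait.
    #[async_trait]
    pub trait MessageSource {
        async fn next_batch(&mut self) -> anyhow::Result<Vec<Message>>;
        async fn ack(&mut self, message: &Message) -> anyhow::Result<()>;
    }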

You will probably want your code to be async if your processor is doing IO-bound operations, like talking to a database. This allows you to make database calls concurrently, or to process entire messages concurrently if ordering doesn’t matter. Even if ordering does matter, Kafka provides partitions and FIFO SQS has message groups, which let you enforce ordering within a particular partition or group. You can pull from multiple partitions at once and process those messages concurrently, as long as the messages within each partition are processed in order. I like to use the tokio runtime to make my service async.
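
Here's a rough sketch of that pattern with tokio: partitions run concurrently, but messages within a partition stay in order. The handler and the batch shape are made up for illustration:

    use std::collections::HashMap;

    async fn handle(payload: String) {
        // IO-bound work goes here: database calls, HTTP requests, etc.
    }

    #[tokio::main]
    async fn main() {
        // A hypothetical batch of (partition, payload) pairs.
        let batch = vec![
            (0, "first".to_string()),
            (1, "second".to_string()),
            (0, "third".to_string()),
        ];

        // Group messages by partition.
        let mut by_partition: HashMap<i32, Vec<String>> = HashMap::new();
        for (partition, payload) in batch {
            by_partition.entry(partition).or_default().push(payload);
        }

        // One task per partition: concurrent across partitions,
        // sequential within each one.
        let handles: Vec<_> = by_partition
            .into_values()
            .map(|messages| {
                tokio::spawn(async move {
                    for payload in messages {
                        handle(payload).await;
                    }
                })
            })
            .collect();

        for handle in handles {
            handle.await.expect("processing task panicked");
        }
    }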

Lastly, a note on libraries: I’ve processed messages from both SQS and Kafka. If you’re talking to SQS, or really any AWS service, you can use Rusoto. This is pretty much just a Rust wrapper over the HTTP API provided by AWS. There is an official AWS Rust SDK in the works that I will definitely switch over to once it’s stable. At the time of writing they don’t recommend using it in production, but it is being actively developed, so hopefully it’s available soon! As for Kafka, I used rust-rdkafka, which is just a Rust interface to librdkafka. It’s fast and works well, but it does require quite a bit of setup outside of the normal Rust toolchain. That can be a pain when you go to build your code. It looks like the kafka-rust library has improved quite a bit from when I first looked at it, so that could be a nice, pure-Rust alternative.

Using a database

Most backend services, whether they’re serving REST APIs or processing jobs, need to talk to a data store. For my Rust services, I’ve worked primarily with DynamoDB and Postgres. DynamoDB is pretty easy to interact with using Rusoto, and until the AWS Rust SDK is finished that’s the obvious crate to use. You have a bit more freedom with Postgres, though, since there are many SQL-compatible crates out there. Personally I like using tokio-postgres. Like the name implies, it’s for talking to a Postgres database, so it won’t work for you if you’re not using Postgres. However, one thing about it is important regardless of your database engine: it’s nice to have a library that supports async. As mentioned before, async allows your code to make concurrent calls to the database, so your service can do other things while it waits for a query to finish. This is one reason I opted not to use the popular database library Diesel: it was synchronous, so I couldn’t run queries concurrently. (Note that this may have changed since I last looked at Diesel; it’s been a couple years.)

Lastly, and this is just personal preference so take it with a grain of salt: I like writing raw SQL so I know exactly what’s going on, and tokio-postgres was perfect for that. I’ve been bitten before by obscured queries, so my default for simple CRUD services is to just write SQL. I will note that Diesel allows queries to be type-checked at compile time, which tokio-postgres with raw SQL does not. That’s a big trade-off, and it has had me eyeing SQLx for my next project. SQLx can check your queries at compile time by connecting to a dev database, and it lets you use Postgres, MySQL, SQLite, or MSSQL. It’s also async. Given all that, it’s probably the library I’ll try next: it keeps everything I like about tokio-postgres while adding compile-time checks and the ability to change database engines.
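
A minimal tokio-postgres example, assuming a local Postgres instance and a hypothetical users table:

    use tokio_postgres::NoTls;

    #[tokio::main]
    async fn main() -> Result<(), tokio_postgres::Error> {
        let (client, connection) =
            tokio_postgres::connect("host=localhost user=postgres dbname=app", NoTls)
                .await?;

        // The connection object does the actual talking to Postgres,
        // so it gets spawned onto its own task.
        tokio::spawn(async move {
            if let Err(e) = connection.await {
                eprintln!("connection error: {e}");
            }
        });

        // Plain SQL, with type-safe params and results.
        let rows = client
            .query("SELECT email FROM users WHERE id = $1", &[&1_i32])
            .await?;
        let email: &str = rows[0].get("email");
        println!("user 1's email is {email}");
        Ok(())
    }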

Finding a library like tokio-postgres to talk to your database is only half the battle. Often you also need support for database migrations and database connection pooling. For migrations, I opted to use a crate called refinery. I didn’t have a strong preference for this library since my use cases are pretty simple. Basically I picked it because:

  1. It lets me write migrations in plain SQL without any DSL obscuring what’s going on (you can also write migrations in Rust if you want).
  2. It can embed migrations right into the startup code (see the sketch after this list). Refinery also has a CLI in case you want to run a separate migration script at an earlier stage in the deploy pipeline. I didn’t have any large migrations, so I opted to embed them directly to keep my deployment config a little simpler.
  3. It works with tokio-postgres connections.
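
Here's roughly what embedding looks like, assuming refinery's tokio-postgres feature is enabled and your SQL migrations live in a ./migrations directory (with file names like V1__create_users.sql):

    use refinery::embed_migrations;

    // Bundles the SQL files from ./migrations into the binary at
    // compile time, generating a `migrations` module.
    embed_migrations!("./migrations");

    async fn run_migrations(client: &mut tokio_postgres::Client) -> anyhow::Result<()> {
        migrations::runner().run_async(client).await?;
        Ok(())
    }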

As for connection pooling, I went with a crate called deadpool. There’s actually a crate built on top of it called deadpool_postgres, which is specifically for use with tokio-postgres. It fit in really nicely with my existing stack and was simple to use, so I went with it. If I move to SQLx in the future, I’ll be able to drop deadpool, since SQLx comes with a built-in connection pooling mechanism.
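
A quick sketch of deadpool_postgres in action (the connection parameters are placeholders):

    use deadpool_postgres::{Config, Runtime};
    use tokio_postgres::NoTls;

    #[tokio::main]
    async fn main() -> anyhow::Result<()> {
        let mut cfg = Config::new();
        cfg.host = Some("localhost".to_string());
        cfg.user = Some("postgres".to_string());
        cfg.dbname = Some("app".to_string());

        // Handlers check clients out of the pool instead of paying for
        // a new connection on every request.
        let pool = cfg.create_pool(Some(Runtime::Tokio1), NoTls)?;
        let client = pool.get().await?;
        let row = client.query_one("SELECT 1", &[]).await?;
        let one: i32 = row.get(0);
        println!("got {one} from the pool");
        Ok(())
    }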

If you're curious about when to use Postgres vs DynamoDB vs other data storage technologies, I'd recommend Designing Data-Intensive Applications by Martin Kleppmann. I definitely had a knowledge gap here which prompted me to pick up his book.

Logging and tracing

Another capability your service should have is robust logging and tracing to make it easy to see what it’s doing in production at all times. Logging is straightforward with the log crate. It describes itself as a “lightweight logging facade” because it provides an “API that abstracts over the actual logging implementation”. Basically, if you make all your log statements through the log API, you can then plug in whatever logger you prefer, and you can swap it out at any time without updating your log statements. I’ve taken to just using env_logger since it’s simple.
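
Usage is about as simple as it gets; here's a minimal example with env_logger as the backing implementation:

    fn main() {
        // env_logger reads the RUST_LOG environment variable, e.g.
        // RUST_LOG=info ./my-service
        env_logger::init();

        // These macros go through the log facade, so the backing
        // logger can be swapped without touching them.
        log::info!("service starting");
        log::error!("something went wrong");
    }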

Speaking of simple, I also have logs go right to stdout to get picked up by Cloudwatch Logs (or whatever the GCP or Azure equivalents are). From there, other services can forward the logs to places like Datadog, or can aggregate them, analyze them, and so on. This is a best practice described in the Twelve Factor App, which I generally try to adhere to. It’ll save your application from needing to deal with the details of forwarding and storage, and allows you to make changes to those systems independently of your application code.

As for tracing, it all starts with the tracing crate. This crate lets you create traces and spans in your code, with one or more spans making up a trace. A trace is intended to follow a single request through your service, even if it stops and starts due to being async. I like to just annotate methods with the instrument macro, which automagically generates a span when the method is entered. Spans are then collected by a subscriber, and there are many subscribers you can choose from. The OpenTelemetry subscriber is probably a good choice, since OpenTelemetry traces can be piped into a variety of platforms, like Datadog or New Relic. I wrote my own Datadog-specific tracing subscriber if you’re looking for a quick way to pipe trace data to your APM dashboards! And if you have your own specific use case, you can always write your own subscriber implementation.
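
Here's a small sketch of the instrument macro paired with the fmt subscriber from the tracing-subscriber crate. The function is hypothetical; swap in an OpenTelemetry subscriber to ship the spans somewhere real:

    use tracing::instrument;

    // Entering this function opens a span named after it, with the
    // arguments recorded as span fields.
    #[instrument]
    async fn fetch_user(user_id: u64) -> Option<String> {
        tracing::info!("looking up user");
        Some("user@example.com".to_string())
    }

    #[tokio::main]
    async fn main() {
        // A simple subscriber that prints spans and events to stdout.
        tracing_subscriber::fmt().init();

        fetch_user(42).await;
    }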

Testing

No service can be production-worthy without tests! For my Rust services, I try to write unit tests for all of my files, plus some end-to-end tests that can run against my nonprod environment before deploying to production.

Rust does have an idiomatic way of unit testing, where unit tests live in the same file as the source code. There are libraries out there to help you mock out your code for unit testing, like mockall. However, I can say from experience that the best thing to do is simply use dependency inversion to write unit-testable code from the start (that’s a link to a whole article I wrote on the subject). Dependency inversion also keeps your application code decoupled internally.
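
As a quick illustration of the idea (all names here are made up): the use case depends on a trait it owns, so tests can hand it a stub instead of a real database.

    trait UserRepo {
        fn find_email(&self, user_id: u64) -> Option<String>;
    }

    // The business logic only knows about the trait, never the database.
    fn welcome_message(repo: &dyn UserRepo, user_id: u64) -> String {
        match repo.find_email(user_id) {
            Some(email) => format!("Welcome back, {email}!"),
            None => "Welcome, new user!".to_string(),
        }
    }

    // Idiomatic Rust: unit tests live right next to the code.
    #[cfg(test)]
    mod tests {
        use super::*;

        struct StubRepo;
        impl UserRepo for StubRepo {
            fn find_email(&self, _user_id: u64) -> Option<String> {
                Some("user@example.com".to_string())
            }
        }

        #[test]
        fn greets_known_users_by_email() {
            assert_eq!(
                welcome_message(&StubRepo, 1),
                "Welcome back, user@example.com!"
            );
        }
    }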

If you want to measure your unit test code coverage, you could try the cargo-llvm-cov tool. When I first tried it, it required the nightly toolchain, but it seems to have stabilized with Rust version 1.66.

Lastly, Rust also provides a mechanism for integration testing the public bits of your API. I’ve used this to write and run my end-to-end tests, and I’ve written about that at length in this article.
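
For reference, integration tests go in a top-level tests/ directory and only see the crate's public API. Here's a bare-bones sketch using the reqwest crate against a hypothetical health endpoint, assuming the service is already running (e.g. in nonprod):

    // tests/health_check.rs
    #[tokio::test]
    async fn health_endpoint_returns_200() {
        let response = reqwest::get("http://localhost:8080/health")
            .await
            .expect("request failed");

        assert_eq!(response.status(), 200);
    }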

Packaging and Deployment

Finally, your Rust service needs to get to production. I normally use GitHub Actions to run my CI/CD pipelines. My typical pipeline will check that the code compiles (cargo check), check the formatting (cargo fmt --check), run a linter (cargo clippy --all-targets --all-features -- -D warnings), and run unit tests (cargo test), all in parallel. If those all pass, the next stage packages the code up as a Docker image and provisions any nonprod infrastructure (I normally use Terraform for infrastructure as code, and it lives in the same repo as the application code). From there, the image can be deployed to nonprod and tested with those end-to-end tests we talked about in the previous section. If the tests pass, then we know it’s safe to provision infrastructure in production and deploy our image there as well.
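
A stripped-down sketch of what those parallel checks might look like in a GitHub Actions workflow (deploy stages omitted, and assuming the runner's preinstalled Rust toolchain is sufficient):

    # .github/workflows/ci.yml
    name: CI
    on: [push]
    jobs:
      check:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: cargo check
      fmt:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: cargo fmt --check
      clippy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: cargo clippy --all-targets --all-features -- -D warnings
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: cargo test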

I’m not a Docker expert, so I won’t give much advice on how to set up your Dockerfile, but I did find this article to be extremely useful. I’ll usually write one Dockerfile containing a separate target for each of my independently deployable components (i.e. my REST API and my background worker). This lets me keep my images slightly smaller, plus it saves me from having to rebuild the worker image if only the REST API changed, or vice versa.
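
As a rough illustration of the multi-target idea (the Rust base image tag and the api/worker binary names are placeholders):

    # One builder stage compiles the whole workspace...
    FROM rust:1.75 AS builder
    WORKDIR /app
    COPY . .
    RUN cargo build --release

    # ...then each component gets its own small runtime target.
    # Build with: docker build --target api -t my-api .
    FROM debian:bookworm-slim AS api
    COPY --from=builder /app/target/release/api /usr/local/bin/api
    CMD ["api"]

    FROM debian:bookworm-slim AS worker
    COPY --from=builder /app/target/release/worker /usr/local/bin/worker
    CMD ["worker"]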

Conclusion

Since 2021, I’ve written two Rust services from scratch, migrated two services from Ruby, written three serverless services, and built four shared libraries in Rust. The ecosystem has matured quite a lot during that time. Now is an excellent time to give Rust a try if you’ve been curious! Remember to be intentional about your architecture since Rust isn’t all that opinionated, and do some research before settling on a crate. Some crates, like tokio, are ubiquitous and are easy choices. Others, like SQLx, are newer and might not be as battle-tested yet. On the bright side, Rust’s culture of small, single-purpose crates means you can compose your application from whatever set of crates you like best, and you can always swap pieces out later if you’re careful about your architecture.

As a final note, let me say that Rust has made my life as an engineer much better. I never have to worry about language-level bugs, and I rarely need to worry about latency or error rates for my production services. Rust is very consistent, as opposed to the Ruby and JVM languages I’ve used at work, which both have very long latency tails (their P95 and P99 latencies are dramatically worse than their P50s, whereas in Rust the P50 and P99 are quite similar). This means I can write code more confidently, deploy more frequently, and get paged much less often. Anecdotally, the Rust services on my team are some of the most performant and the most stable at the company (and yes, I did trawl through Datadog to verify that). This is not an accident: Rust is built to eliminate whole classes of failure modes. Again, I think it’s worth a try if you’ve been curious. Feel free to use my experience as a jumping-off point for your own services! Happy coding!

Thanks for reading! If you want to get started with Rust, I'd recommend reading the online Rust book, or if you prefer an actual book, the O'Reilly Rust book is a good choice. If you want to learn more about concurrency in Rust, I'd also recommend Hands-On Concurrency with Rust. Best of luck with your own services!