Automate document conversion in Rust with unoserver

Automating document conversion can remove tedious manual steps from your workflow, especially when
you need to convert dozens or even thousands of files. In this DevTip, you'll learn how to pair Rust
with unoserver, the modern successor to the widely used unoconv utility, to convert documents
quickly and reliably using LibreOffice under the hood.
Install unoserver and Rust dependencies
unoserver acts as a headless LibreOffice bridge. It relies on Python, LibreOffice, and a small
helper script. On Debian/Ubuntu systems, you can install these dependencies with:
sudo apt-get update
sudo apt-get install -y libreoffice python3-full pipx
# Install unoserver (isolated via pipx, recommended)
pipx install unoserver
# If ~/.local/bin is not on your PATH, add it (e.g., in .bashrc or .zshrc)
# export PATH="$PATH:$HOME/.local/bin"
After installation, verify unoserver is working:
unoserver --version # prints version if setup is correct
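If you'd rather check for the binary from Rust before trying to start it, a small PATH lookup is enough. The sketch below is not part of office-to-pdf: `find_in_path` and `find_unoserver` are hypothetical helper names, and the check only confirms that a file with that name exists in one of the PATH directories, not that it is executable.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Search each directory of a PATH-style value for a file called `binary`.
/// Assumption: a plain file with the right name counts as "installed";
/// permission bits are not inspected.
fn find_in_path(binary: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(binary))
        .find(|candidate| candidate.is_file())
}

/// Convenience wrapper that looks for `unoserver` on the current process's PATH.
fn find_unoserver() -> Option<PathBuf> {
    let path_var = env::var_os("PATH")?;
    find_in_path("unoserver", &path_var)
}
```

If `find_unoserver()` returns `None` after a pipx install, the most likely cause is that `~/.local/bin` is missing from PATH.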
Next, create a new Rust project and add the office-to-pdf crate, which provides a convenient async
API for unoserver:
cargo new doc_converter_rust
cd doc_converter_rust
Add the following to your Cargo.toml:
# Cargo.toml
[dependencies]
office-to-pdf = "0.9"
tokio = { version = "1.35", features = ["full"] }
anyhow = "1.0" # For easy error handling
futures = "0.3" # For stream processing in batch example
axum = { version = "0.7", features = ["multipart"] } # Only needed for the web API example
Integrate Rust with unoserver
Below is a minimal end-to-end example that converts a single .docx file to PDF. The office-to-pdf
crate can start unoserver for you if it's not already running.
use office_to_pdf::{start_unoserver, ConvertServer, ConvertServerHost};
use std::fs;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // 1️⃣ Connect to (or spin up) a unoserver instance on port 2003
    let server = ConvertServer::new(ConvertServerHost::Local { port: 2003 });
    if !server.is_running(ConvertServer::DEFAULT_RUNNING_TIMEOUT).await {
        // Interface port (e.g., 2002) must differ from main port (e.g., 2003)
        println!("Starting unoserver...");
        start_unoserver(2003, 2002).await?;
    }

    // 2️⃣ Create a placeholder input for testing if it doesn't exist.
    // Note: this writes plain text under a .docx name, not a real DOCX archive.
    // In a real scenario, this would be your input file.
    if fs::metadata("example.docx").is_err() {
        fs::write("example.docx", "This is a test DOCX file.")?;
    }
    let doc_bytes = fs::read("example.docx")?;

    // 3️⃣ Convert to PDF
    println!("Converting example.docx to PDF...");
    let pdf_bytes = server.convert_to_pdf(&doc_bytes).await?;

    // 4️⃣ Persist the result
    fs::write("example.pdf", pdf_bytes)?;
    println!("✅ Conversion successful → example.pdf");
    Ok(())
}
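The example above writes whatever bytes come back from the conversion. A cheap sanity check before persisting them is to confirm the output is non-empty and begins with the PDF signature `%PDF-`. This is a minimal sketch: `check_pdf_output` is a hypothetical helper, and passing it says nothing about whether the rest of the file is valid.

```rust
/// Reject obviously broken conversion output before writing it to disk.
/// Only the leading `%PDF-` signature is checked; the body is not parsed.
fn check_pdf_output(bytes: &[u8]) -> Result<(), String> {
    if bytes.is_empty() {
        return Err("converter returned no data".to_string());
    }
    if !bytes.starts_with(b"%PDF-") {
        return Err("output does not start with the %PDF- signature".to_string());
    }
    Ok(())
}
```

You could call it between steps 3 and 4, on a byte slice of the converted output, and bail out with an error instead of writing a corrupt file.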
Batch-convert whole folders
Need to convert many files? Walk an input directory, filter for supported document extensions, and process the files concurrently with Tokio tasks:
use futures::stream::{FuturesUnordered, StreamExt};
use office_to_pdf::{start_unoserver, ConvertServer, ConvertServerHost};
use std::{ffi::OsStr, fs};

async fn batch_convert_directory(dir: &str) -> anyhow::Result<()> {
    let server = ConvertServer::new(ConvertServerHost::Local { port: 2003 });
    if !server.is_running(ConvertServer::DEFAULT_RUNNING_TIMEOUT).await {
        start_unoserver(2003, 2002).await?;
    }

    let mut tasks = FuturesUnordered::new();
    for entry in fs::read_dir(dir)?.filter_map(Result::ok) {
        let path = entry.path();
        // Own the lowercased extension so the match does not borrow a temporary
        let ext = path.extension().and_then(OsStr::to_str).map(str::to_lowercase);
        let supported = matches!(
            ext.as_deref(),
            Some("doc" | "docx" | "odt" | "rtf" | "ppt" | "pptx" | "xls" | "xlsx")
        );
        if path.is_file() && supported {
            let server_clone = server.clone();
            tasks.push(tokio::spawn(async move {
                // Use Tokio's async file I/O so the task does not block the runtime
                let input_bytes = tokio::fs::read(&path).await?;
                let output_bytes = server_clone.convert_to_pdf(&input_bytes).await?;
                let output_path = path.with_extension("pdf");
                tokio::fs::write(&output_path, output_bytes).await?;
                println!("Converted: {} -> {}", path.display(), output_path.display());
                anyhow::Ok(())
            }));
        }
    }

    // Drain the tasks: a panicked task aborts the batch, a failed conversion is only logged
    while let Some(result) = tasks.next().await {
        if let Err(e) = result? {
            eprintln!("A conversion task failed: {}", e);
        }
    }
    Ok(())
}
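One thing to watch in the batch example: `path.with_extension("pdf")` maps both `report.docx` and `report.odt` to `report.pdf`, so one result can silently overwrite the other. A possible workaround, sketched below, is to keep the source extension in the output name and write into a separate directory. `unique_pdf_path` and the naming scheme are assumptions for illustration, not something the crate prescribes.

```rust
use std::path::{Path, PathBuf};

/// Build an output path that keeps the source extension in the file name,
/// so `report.docx` and `report.odt` do not both become `report.pdf`.
/// Returns `None` when the input path has no file name (e.g. `..`).
fn unique_pdf_path(input: &Path, out_dir: &Path) -> Option<PathBuf> {
    let mut name = input.file_name()?.to_os_string();
    name.push(".pdf");
    Some(out_dir.join(name))
}
```

With this helper, `in/report.docx` becomes `out/report.docx.pdf` and `in/report.odt` becomes `out/report.odt.pdf`.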
Handle errors gracefully
Robust applications require proper error handling. Consider these scenarios:
- Server start failures: The ensure_server_running function below demonstrates retrying after a fixed delay.
- Unsupported file types: Validate file extensions or use MIME types before attempting conversion, as shown in the batch example.
- Timeouts: For long-running conversions, wrap the call in tokio::time::timeout.
- Resource leaks: ConvertServer handles are lightweight, and dropping them is usually sufficient. Ensure unoserver processes terminate correctly if you manage them manually.
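File extensions can lie, so for the unsupported-file-types case it can help to look at the first few bytes as well. The sketch below recognizes three well-known signatures: ZIP (used by DOCX, XLSX, PPTX and the OpenDocument formats), the OLE2 compound file header (legacy DOC, XLS, PPT), and RTF. `Container` and `sniff_container` are hypothetical names, and this is a coarse pre-filter rather than real MIME detection: a ZIP signature does not prove the archive is an office document.

```rust
/// Coarse container type inferred from a file's leading bytes.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Container {
    /// ZIP archive: DOCX, XLSX, PPTX, ODT, ODS, ODP (and any other ZIP file)
    Zip,
    /// OLE2 compound file: legacy DOC, XLS, PPT
    Ole2,
    /// Rich Text Format
    Rtf,
    /// Anything else, including plain text
    Unknown,
}

/// Classify input bytes by magic number before sending them for conversion.
fn sniff_container(bytes: &[u8]) -> Container {
    const ZIP: &[u8] = b"PK\x03\x04";
    const OLE2: &[u8] = &[0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];
    const RTF: &[u8] = b"{\\rtf";

    if bytes.starts_with(ZIP) {
        Container::Zip
    } else if bytes.starts_with(OLE2) {
        Container::Ole2
    } else if bytes.starts_with(RTF) {
        Container::Rtf
    } else {
        Container::Unknown
    }
}
```

A caller might reject `Container::Unknown` for files that claim a `.docx` or `.xls` extension while still letting plain-text formats such as TXT and CSV through on extension alone.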
Here's an example of ensuring the server is running with retries:
use office_to_pdf::{start_unoserver, ConvertServer, ConvertServerHost};
use tokio::time::{sleep, Duration};

async fn ensure_server_running(port: u16, interface_port: u16) -> anyhow::Result<ConvertServer> {
    let server = ConvertServer::new(ConvertServerHost::Local { port });
    for attempt in 0..3 {
        if server.is_running(ConvertServer::DEFAULT_RUNNING_TIMEOUT).await {
            return Ok(server);
        }
        println!("Attempt {} to start unoserver on port {}...", attempt + 1, port);
        if let Err(e) = start_unoserver(port, interface_port).await {
            eprintln!("Failed to start unoserver: {}; retrying in 2s...", e);
            sleep(Duration::from_secs(2)).await;
        } else {
            // Give unoserver a moment to initialize fully after starting
            sleep(Duration::from_secs(1)).await;
            if server.is_running(ConvertServer::DEFAULT_RUNNING_TIMEOUT).await {
                return Ok(server);
            }
        }
    }
    anyhow::bail!("Unoserver refused to start after 3 attempts on port {}", port);
}
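The retry loop above waits a fixed two seconds between failed attempts. If you want the wait to grow with each attempt, a small helper can compute an exponential back-off delay with an upper bound. This is a sketch under assumptions: `backoff_delay` is a hypothetical function, and the doubling factor and cap are arbitrary choices, not values recommended by unoserver or office-to-pdf.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
/// Saturating arithmetic keeps large attempt numbers from overflowing.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}
```

With a one-second base and a 30-second cap, attempts 0, 1, 2, and 3 wait 1, 2, 4, and 8 seconds. In the loop above you could replace the fixed delay with `sleep(backoff_delay(attempt, Duration::from_secs(1), Duration::from_secs(30)))`, since Tokio's `Duration` is a re-export of the standard library type.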
And an example of using tokio::time::timeout:
// Inside an async function where `server` and `doc_bytes` are available
// let server: ConvertServer = ...;
// let doc_bytes: Vec<u8> = ...;
use tokio::time::{timeout, Duration};

let conversion_timeout = Duration::from_secs(30);
match timeout(conversion_timeout, server.convert_to_pdf(&doc_bytes)).await {
    Ok(Ok(pdf_bytes)) => {
        println!("Conversion successful within timeout.");
        // fs::write("output.pdf", pdf_bytes)?;
    }
    Ok(Err(conversion_err)) => {
        eprintln!("Conversion failed: {}", conversion_err);
    }
    Err(_timeout_err) => {
        eprintln!("Conversion timed out after {} seconds.", conversion_timeout.as_secs());
    }
}
Scale with multiple unoserver instances
The office-to-pdf crate includes ConvertLoadBalancer for distributing work across multiple
unoserver instances. Launch several unoserver processes on different ports and provide them to
the load balancer.
use office_to_pdf::{start_unoserver, ConvertLoadBalancer, ConvertServer, ConvertServerHost};
use tokio::time::{sleep, Duration};

async fn build_load_balancer() -> anyhow::Result<ConvertLoadBalancer> {
    let mut servers = Vec::new();
    let base_port = 2003;
    let base_interface_port = 2002; // Must be different from base_port
    for i in 0..3 {
        // Start three instances
        let port = base_port + i;
        // Ensure interface ports are unique and different from every server port
        let iface_port = base_interface_port + i * 10;
        let server = ConvertServer::new(ConvertServerHost::Local { port });
        if !server.is_running(ConvertServer::DEFAULT_RUNNING_TIMEOUT).await {
            println!("Starting unoserver instance on port {} (interface {})...", port, iface_port);
            start_unoserver(port, iface_port).await?;
            sleep(Duration::from_millis(500)).await; // Brief pause for server to init
        }
        servers.push(server);
    }
    Ok(ConvertLoadBalancer::new(servers))
}
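Each instance needs a server port and an interface port that collide with nothing else. One way to make that guarantee explicit is to allocate the ports in adjacent pairs. `port_pairs` is a hypothetical helper and the pairing scheme is an assumption for illustration; neither unoserver nor office-to-pdf requires it.

```rust
/// Allocate `count` (server_port, interface_port) pairs starting at `base`.
/// Pair i uses `base + 2*i` and `base + 2*i + 1`, so no port is reused.
/// Returns `None` if any port would exceed the u16 range.
fn port_pairs(base: u16, count: u16) -> Option<Vec<(u16, u16)>> {
    (0..count)
        .map(|i| {
            let server = base.checked_add(i.checked_mul(2)?)?;
            let interface = server.checked_add(1)?;
            Some((server, interface))
        })
        .collect()
}
```

`port_pairs(2003, 3)` yields `(2003, 2004)`, `(2005, 2006)`, and `(2007, 2008)`, which you could pass to `start_unoserver(port, interface_port)` in the loop above.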
Supported formats at a glance
unoserver leverages LibreOffice for conversions, so it supports a wide array of formats comparable
to the LibreOffice desktop suite.
Category | Input formats | Common outputs |
---|---|---|
Word processing | DOCX, DOC, ODT, RTF, TXT, WPD, PAGES | PDF, HTML, TXT |
Spreadsheets | XLSX, XLS, ODS, CSV, NUMBERS | PDF, CSV |
Presentations | PPTX, PPT, ODP, KEY | PDF, HTML (as images) |
Graphics | ODG, SVG, various image formats (via Draw) | PDF, PNG, JPG |
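If you want to route or report on files by type before converting them, the table translates into a small lookup. `Category` and `category_for` are illustrative names, and the extension list is copied from the table above rather than queried from LibreOffice, so treat it as a starting point. The "various image formats" entry is left out because the table does not enumerate them.

```rust
/// Document categories from the format table.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Category {
    WordProcessing,
    Spreadsheet,
    Presentation,
    Graphics,
}

/// Map a file extension (without the dot, any case) to its category.
/// Returns `None` for extensions that the table does not list.
fn category_for(ext: &str) -> Option<Category> {
    match ext.to_ascii_lowercase().as_str() {
        "docx" | "doc" | "odt" | "rtf" | "txt" | "wpd" | "pages" => Some(Category::WordProcessing),
        "xlsx" | "xls" | "ods" | "csv" | "numbers" => Some(Category::Spreadsheet),
        "pptx" | "ppt" | "odp" | "key" => Some(Category::Presentation),
        "odg" | "svg" => Some(Category::Graphics),
        _ => None,
    }
}
```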
Real-world example: convert uploads in a web API
For a production-grade service, you'd typically spin up one or more ConvertServer instances
(perhaps managed by the ConvertLoadBalancer) at application launch and share them across request
handlers. Here's a trimmed Axum web server route that accepts multipart/form-data uploads and
returns the PDF bytes directly.
use axum::{
    extract::{DefaultBodyLimit, Multipart, State},
    http::StatusCode,
    response::{IntoResponse, Response},
    routing::post,
    Router,
};
use office_to_pdf::{start_unoserver, ConvertServer, ConvertServerHost};
use std::sync::Arc;
use tokio::sync::Mutex; // Mutex for single ConvertServer, or use ConvertLoadBalancer

struct AppState {
    server: Mutex<ConvertServer>,
}

// Initialize a single server instance for this simple example
async fn init_server_for_state() -> ConvertServer {
    let srv = ConvertServer::new(ConvertServerHost::Local { port: 2003 });
    if !srv.is_running(ConvertServer::DEFAULT_RUNNING_TIMEOUT).await {
        println!("Starting unoserver for Axum app...");
        // Fire-and-forget startup in a background task
        tokio::spawn(async move {
            if let Err(e) = start_unoserver(2003, 2002).await {
                eprintln!("Background unoserver startup failed: {}", e);
            }
        });
        // Give it a moment to start. In production, use a more robust check.
        tokio::time::sleep(tokio::time::Duration::from_secs(2)).await;
    }
    srv
}

async fn upload_and_convert(State(state): State<Arc<AppState>>, mut multipart: Multipart) -> Response {
    while let Some(field) = match multipart.next_field().await {
        Ok(field_option) => field_option,
        Err(e) => return (StatusCode::BAD_REQUEST, format!("Error reading multipart field: {}", e)).into_response(),
    } {
        if field.name() == Some("document") {
            let data = match field.bytes().await {
                Ok(bytes) => bytes,
                Err(e) => return (StatusCode::PAYLOAD_TOO_LARGE, format!("Error reading document data: {}", e)).into_response(),
            };
            // Hold the lock for the duration of the conversion (one conversion at a time)
            let server_guard = state.server.lock().await;
            return match server_guard.convert_to_pdf(&data).await {
                Ok(pdf_bytes) => ([(axum::http::header::CONTENT_TYPE, "application/pdf")], pdf_bytes).into_response(),
                Err(err) => (StatusCode::INTERNAL_SERVER_ERROR, format!("Conversion failed: {}", err)).into_response(),
            };
        }
    }
    (StatusCode::BAD_REQUEST, "No 'document' field found in the upload".to_string()).into_response()
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let initial_server = init_server_for_state().await;
    let shared_state = Arc::new(AppState { server: Mutex::new(initial_server) });

    let app = Router::new()
        .route("/upload", post(upload_and_convert))
        .with_state(shared_state)
        .layer(DefaultBodyLimit::max(10 * 1024 * 1024)); // 10MB limit

    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
    println!("Listening on {}", listener.local_addr()?);
    axum::serve(listener, app.into_make_service()).await?;
    Ok(())
}
Wrap-up
With unoserver, LibreOffice, and the office-to-pdf Rust crate, you can implement robust document
conversion capabilities in your Rust applications with relatively few lines of asynchronous code.
This setup is lightweight, scales with load balancing, and integrates well into various application
types, from CLI tools and background job processors to web servers handling on-the-fly conversions.
Need conversion without maintaining infrastructure? Transloadit's 🤖 /document/convert Robot offers fully managed document processing in the cloud. It supports conversion between numerous formats including PDF, DOCX, ODT, and more, handling all the complexity at scale—no LibreOffice install required.