Stream Tar archives between servers without local storage

Transferring files between servers can be challenging, especially when local storage is limited. Fortunately, you can bypass local storage entirely by piping tar archives directly from one server to another over SSH. This method is particularly useful for server migrations, backups, and moving large datasets.
Understanding Tar streaming basics
The tar command can write an archive to stdout and read one from stdin, so it works well in pipes. When you create an archive that will be extracted elsewhere, change to the source directory first or pass the -C option to tar cf. Otherwise the archive records the full source path (GNU tar strips only the leading /), and the files end up nested under path/to/source inside the destination directory, which is rarely what you want. The basic syntax for creating and extracting an archive through a pipe is:
tar cf - -C /path/to/source . | tar xf - -C /path/to/destination
Here, tar cf - -C /path/to/source . creates an archive of the contents of /path/to/source (the trailing . keeps member paths relative to that directory) and streams it to stdout. Then, tar xf - -C /path/to/destination reads the archive from stdin and extracts it into /path/to/destination. This ensures that the files are extracted into the intended directory structure.
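To see the effect of -C before touching a real server, the following minimal sketch runs the same pipe locally. The directories come from mktemp and the file names are made up for illustration; it first lists the member names tar records, then performs the copy.

```shell
# Throwaway source and destination directories (hypothetical contents).
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/assets"
echo "hello" > "$src/index.html"
echo "body {}" > "$src/assets/site.css"

# With -C and ".", member names are relative to $src, such as ./index.html.
relative_names=$(tar cf - -C "$src" . | tar tf -)
printf '%s\n' "$relative_names"

# Stream the archive through a pipe and unpack it under $dst.
tar cf - -C "$src" . | tar xf - -C "$dst"
ls -R "$dst"
```

Listing with tar tf - first is a cheap way to confirm the paths look right before extracting anything.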
Direct server-to-server transfers using ssh
To transfer files directly between two remote servers, combine tar with SSH:
ssh user@source-server "tar cf - -C /path/to/source ." | ssh user@destination-server "tar xf - -C /path/to/destination"
This command streams the archive from the source server to the destination server. The data passes through the machine that runs the command, but only in memory: nothing is written to its disk.
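If you run this kind of transfer often, the pipeline can be wrapped in a small function. The sketch below is one possible shape, not a hardened tool: the name stream_dir and its argument order are invented here, and it assumes the paths contain no single quotes because they are wrapped in single quotes for the remote shell.

```shell
# stream_dir SRC_HOST SRC_DIR DST_HOST DST_DIR
# Hypothetical helper: streams SRC_DIR on SRC_HOST into DST_DIR on DST_HOST.
# Assumes neither path contains a single quote.
stream_dir() {
  if [ "$#" -ne 4 ]; then
    echo "usage: stream_dir SRC_HOST SRC_DIR DST_HOST DST_DIR" >&2
    return 2
  fi
  src_host=$1
  src_dir=$2
  dst_host=$3
  dst_dir=$4

  # Same pipeline as above, with the paths quoted for the remote shell.
  ssh "$src_host" "tar cf - -C '$src_dir' ." |
    ssh "$dst_host" "tar xf - -C '$dst_dir'"
}

# Example call (host names and paths are placeholders):
# stream_dir user@source-server /path/to/source user@destination-server /path/to/destination
```

Note that the function's exit status is that of the last command in the pipeline, the extracting side, so a failure on the source side may need a separate check.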
Advanced techniques: ssh proxyjump
If your servers are behind a bastion host, use SSH's ProxyJump feature:
ssh -J user@bastion-host user@source-server "tar cf - -C /path/to/source ." | ssh -J user@bastion-host user@destination-server "tar xf - -C /path/to/destination"
This securely tunnels your tar stream through the bastion host.
Monitoring progress with pipe viewer (pv)
For large transfers, monitoring progress is helpful. Use pv to visualize the transfer:
ssh user@source-server "tar cf - -C /path/to/source ." | pv | ssh user@destination-server "tar xf - -C /path/to/destination"
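pv reports the transfer rate, elapsed time, and bytes transferred in real time. It can only show a percentage and ETA when it knows the total size, which you can supply with its -s option.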
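One way to give pv a size is to measure the source directory first. The sketch below demonstrates this locally and rests on a few assumptions: du -sb is GNU-specific, the tar stream is slightly larger than the reported size because of archive headers (so the percentage is an estimate), and the progress function is a made-up helper that falls back to cat when pv is not installed.

```shell
# progress SIZE_BYTES
# Hypothetical filter: copies stdin to stdout, showing progress with pv when
# it is installed. SIZE_BYTES is only an estimate for the percentage and ETA.
progress() {
  if command -v pv >/dev/null 2>&1; then
    pv -s "$1"
  else
    cat   # pv not installed: pass the stream through unchanged
  fi
}

# Local demonstration with a throwaway directory and a 1 MiB file.
src=$(mktemp -d)
dst=$(mktemp -d)
head -c 1048576 /dev/zero > "$src/blob.bin"

# Apparent size in bytes (GNU du). Over SSH this would run on the source
# server, e.g. ssh user@source-server "du -sb /path/to/source" | cut -f1
size=$(du -sb "$src" | cut -f1)

tar cf - -C "$src" . | progress "$size" | tar xf - -C "$dst"
```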
Encrypting streams on-the-fly
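SSH already encrypts each connection, but the stream is in plaintext on the machine that runs the pipeline. To keep the data encrypted end to end, encrypt it on the source and decrypt it on the destination. For production environments, using key files or prompting for a password (by omitting -pass) is generally better than hardcoding passwords, which end up in shell history and process listings. When using openssl enc for symmetric encryption, consider using a key derivation function like PBKDF2, enabled by the -pbkdf2 option (available in OpenSSL 1.1.1 and later), for enhanced security: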
# Consider prompting for password or using key files for better security
ssh user@source-server "tar cf - -C /path/to/source . | openssl enc -aes-256-cbc -pbkdf2 -salt -pass pass:yourpassword" | ssh user@destination-server "openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:yourpassword | tar xf - -C /path/to/destination"
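As a sketch of the key-file approach, openssl accepts -pass file:PATH and reads the passphrase from the first line of that file. The example below runs the whole round trip locally with throwaway paths from mktemp; in a real transfer the same passphrase file would have to exist on both servers, and how it gets there is outside the scope of this sketch.

```shell
# Throwaway directories and passphrase file (all hypothetical).
src=$(mktemp -d)
dst=$(mktemp -d)
keyfile=$(mktemp)
echo "secret report" > "$src/report.txt"

# Random 64-character hex passphrase, stored in the temporary key file.
openssl rand -hex 32 > "$keyfile"

# -pass file: keeps the passphrase off the command line.
tar cf - -C "$src" . |
  openssl enc -aes-256-cbc -pbkdf2 -salt -pass "file:$keyfile" |
  openssl enc -d -aes-256-cbc -pbkdf2 -pass "file:$keyfile" |
  tar xf - -C "$dst"

cat "$dst/report.txt"
```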
Alternatively, use gpg for encryption. You can use any valid GPG key identifier (such as a key ID, fingerprint, or an email address associated with the key) for --recipient. The corresponding private key must be available on the destination server to decrypt the stream.
ssh user@source-server "tar cf - -C /path/to/source . | gpg --encrypt --recipient your-key-identifier" | ssh user@destination-server "gpg --decrypt | tar xf - -C /path/to/destination"
Real-world example: migrating a web application
Imagine migrating a web application between cloud providers. You can stream the entire application directory directly:
ssh user@old-server "tar cf - -C /var/www/html ." | pv | ssh user@new-server "tar xf - -C /var/www/html"
Because the files are copied in a single pass with no staging step, this approach helps keep the cutover window short.
Performance considerations and optimization tips
- Compression: Use compression flags with tar to reduce bandwidth usage. For example, z for gzip, j for bzip2, or J for xz. The command below uses gzip (z):
ssh user@source-server "tar czf - -C /path/to/source ." | ssh user@destination-server "tar xzf - -C /path/to/destination"
To use bzip2, you would use cjf on the source and xjf on the destination. For xz, use cJf and xJf respectively. Choose the algorithm based on the desired balance between compression ratio and CPU usage (gzip is fast, xz offers higher compression).
- Network Speed: Ensure your network connection is stable and fast enough to handle large transfers.
- Resource Management: Monitor CPU and memory usage during transfers to avoid overloading your servers.
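Whether compression pays off depends on the data, so it can help to measure a sample before a large transfer. The sketch below compares the size of an uncompressed and a gzip-compressed tar stream for a throwaway directory; the log-like sample data is invented and compresses far better than typical real content.

```shell
src=$(mktemp -d)

# 2000 identical lines: highly compressible sample data (hypothetical).
i=0
while [ "$i" -lt 2000 ]; do
  echo "GET /index.html HTTP/1.1 200"
  i=$((i + 1))
done > "$src/access.log"

# Count the bytes each variant would send over the network.
raw_bytes=$(tar cf - -C "$src" . | wc -c | tr -d ' ')
gzip_bytes=$(tar czf - -C "$src" . | wc -c | tr -d ' ')

echo "uncompressed stream: $raw_bytes bytes"
echo "gzip stream:         $gzip_bytes bytes"
```

Already-compressed content such as images or video usually shrinks very little, in which case skipping compression saves CPU time.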
Troubleshooting common issues
- Permission Errors: Ensure the user has appropriate read permissions on all source files and directories, and write permissions on the destination directory.
- Network Interruptions: Use tools like screen or tmux to maintain sessions during network interruptions. This allows the command to continue running in the background and lets you reattach to the session even if your local connection drops.
- Corrupted Archives: Verify integrity by checking file sizes. For more robust checks, compare checksums (e.g., using md5sum or sha256sum) of a few key files on both the source and destination after the transfer.
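The checksum comparison can be scripted. The following sketch, which assumes GNU coreutils sha256sum and uses a made-up helper named manifest, builds a sorted checksum list for each directory and compares the two. It runs locally against throwaway directories; across servers, the body of manifest would run on each host through ssh.

```shell
# manifest DIR
# Hypothetical helper: prints "checksum  ./relative/path" for every regular
# file under DIR, sorted by path so two manifests can be compared directly.
manifest() {
  (cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2)
}

# Local demonstration with throwaway directories.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/conf"
echo "listen 80" > "$src/conf/app.conf"
echo "v1" > "$src/VERSION"
tar cf - -C "$src" . | tar xf - -C "$dst"

# Identical manifests mean every file has the same path and content.
if [ "$(manifest "$src")" = "$(manifest "$dst")" ]; then
  echo "manifests match"
else
  echo "manifests differ" >&2
fi
```

Because the paths in each manifest are relative, the comparison still works when the source and destination directories have different absolute paths.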
Conclusion
Streaming tar archives directly between servers is a powerful technique for efficient file transfers, especially when local storage is limited. At Transloadit, our 🤖 /file/compress Robot provides robust server-side file archiving into formats like tar (optionally gzipped) and zip, which can be part of a larger workflow for managing and transferring files efficiently.