Creating large tar archives in PHP can be tricky once your data set grows beyond a few hundred megabytes. While PHP's PharData class is stream-oriented for many operations, the overall process still operates within PHP’s configured memory_limit. When archiving a vast number of files or very large individual files, you might encounter the "Allowed memory size exhausted" error, especially with default limits like 128M. Fortunately, we can call the system-level tar utility with proc_open() and let the operating system handle the heavy lifting. The result is a constant-memory stream that you can pipe straight to the browser, object storage, or another process, enabling efficient PHP tar streaming.

Understand PHP memory limits with phardata

While PharData is efficient for many operations, it may face memory constraints with very large archives or when performing bulk operations. The main limitation comes from PHP's memory_limit setting rather than PharData itself. For instance, buffering, metadata handling, and opcode memory can collectively contribute to exceeding this limit when dealing with thousands of files or exceptionally large ones.

If you only need a write-once, read-never archive, there is often no significant benefit in keeping the entire operation within the PHP process. Offloading the work to the system's tar command frees you from PHP’s memory limitations and provides streaming capabilities inherently. This makes it an excellent solution for memory efficient tar creation.

Stream Tar output with proc_open()

proc_open() starts an external command and exposes its standard input, output, and error streams as PHP resources. The following helper function reads from the tar command's stdout and pushes the bytes directly to the client, ensuring memory usage remains constant, which is ideal for a PHP tar download.

<?php
function streamTarArchive(string $directory, string $downloadName = 'archive.tar'): void
{
    // Sanitize the download name to prevent header injection or other issues
    $safeDownloadName = basename(str_replace(['/', '\'], '', $downloadName));
    if (empty($safeDownloadName)) {
        $safeDownloadName = 'archive.tar';
    }

    $cmd = 'tar -cf - ' . escapeshellarg($directory);
    $descriptorSpec = [
        0 => ["pipe", "r"], // stdin
        1 => ["pipe", "w"], // stdout
        2 => ["pipe", "w"]  // stderr
    ];
    $pipes = [];
    $process = null;

    try {
        $process = proc_open($cmd, $descriptorSpec, $pipes);

        if (!is_resource($process)) {
            throw new RuntimeException('Failed to create tar process. Is tar installed and in PATH?');
        }

        header('Content-Type: application/x-tar');
        header('Content-Disposition: attachment; filename="' . $safeDownloadName . '"');
        header('X-Content-Type-Options: nosniff'); // Security: prevent MIME-sniffing

        // Disable output buffering if active
        if (ob_get_level()) {
            ob_end_clean();
        }

        // Stream the tar output
        while (!feof($pipes[1])) {
            $chunk = fread($pipes[1], 8192); // Read in 8KB chunks
            if ($chunk === false || $chunk === '') {
                // Check for read errors or empty chunks that are not EOF
                if (feof($pipes[1])) break; // Legitimate EOF
                // Potentially log an error here if fread returns false unexpectedly
                continue;
            }
            echo $chunk;
            flush(); // Flush output to the client
        }

        // Check for errors from tar
        $stderrOutput = stream_get_contents($pipes[2]);
        if (!empty($stderrOutput)) {
            // Log tar errors but don't send them to the client if headers are already sent
            error_log('tar process stderr: ' . $stderrOutput);
        }

    } catch (Throwable $e) {
        // Log the exception
        error_log('Error during tar streaming: ' . $e->getMessage());
        // If headers haven't been sent, we can send an error response
        if (!headers_sent()) {
            // Potentially clear any headers already set if appropriate
            header_remove();
            http_response_code(500);
            echo 'An error occurred while generating the archive.';
        }
        // Otherwise, the stream might be incomplete, which is hard to recover from at this point.
    } finally {
        // Ensure all pipes are closed
        if (isset($pipes[0]) && is_resource($pipes[0])) fclose($pipes[0]);
        if (isset($pipes[1]) && is_resource($pipes[1])) fclose($pipes[1]);
        if (isset($pipes[2]) && is_resource($pipes[2])) fclose($pipes[2]);

        // Close the process
        if (is_resource($process)) {
            proc_close($process);
        }
    }
}

This function returns a fully valid tar stream without ever touching the drive for intermediate storage. On a modern VPS, this approach happily pushes multi-gigabyte archives while PHP memory usage stays remarkably low, often below 10 MB.

Archive ci/cd logs in real time

Continuous integration (CI/CD) servers often generate numerous small log files—perfect candidates for real-time archiving PHP streaming. The snippet below validates input, helps prevent directory traversal, and then calls the streamTarArchive helper. This is a common use case for PHP CI/CD logs.

<?php
function downloadCiLogs(string $logDirIdentifier): void
{
    // Example: map an identifier to an actual path
    // In a real app, this might come from a config or database
    $logPathMappings = [
        'project-alpha-build-123' => '/var/logs/ci-cd/project-alpha/build-123',
        'project-beta-deploy-45' => '/var/logs/ci-cd/project-beta/deploy-45',
    ];

    if (!isset($logPathMappings[$logDirIdentifier])) {
        throw new InvalidArgumentException('Invalid log directory identifier.');
    }

    $actualLogDir = $logPathMappings[$logDirIdentifier];

    // Use realpath to resolve symbolic links and '..'
    $realDir = realpath($actualLogDir);
    $allowedRoot = realpath('/var/logs/ci-cd'); // Ensure this base path is secure

    if ($realDir === false || !is_dir($realDir)) {
        throw new InvalidArgumentException('Directory does not exist or is not accessible.');
    }
    // Check if the resolved path starts with the allowed root directory
    if ($allowedRoot === false || strpos($realDir, $allowedRoot) !== 0) {
        throw new RuntimeException('Access denied to the specified directory.');
    }

    streamTarArchive($realDir, 'ci-logs-' . $logDirIdentifier . '-' . date('Y-m-d') . '.tar');
}

// Example usage in a controller action:
try {
    // Ensure $_GET['path'] is validated/sanitized before use if it's a direct path.
    // Better to use an identifier as shown in downloadCiLogs.
    $logIdentifier = $_GET['log_id'] ?? '';
    if (empty($logIdentifier) || !preg_match('/^[a-zA-Z0-9_-]+$/', $logIdentifier)) {
        throw new InvalidArgumentException('Invalid log identifier format.');
    }
    downloadCiLogs($logIdentifier);
} catch (InvalidArgumentException $e) {
    http_response_code(400); // Bad Request
    echo 'Error: ' . htmlspecialchars($e->getMessage(), ENT_QUOTES, 'UTF-8');
} catch (RuntimeException $e) {
    http_response_code(403); // Forbidden or 500 Internal Server Error depending on context
    echo 'Error: ' . htmlspecialchars($e->getMessage(), ENT_QUOTES, 'UTF-8');
} catch (Throwable $e) {
    http_response_code(500); // Internal Server Error
    error_log('Unhandled error in CI log download: ' . $e->getMessage()); // Log for admin
    echo 'An unexpected error occurred. Please try again later.';
}

Report progress with server-sent events

When archiving thousands of files, providing progress feedback can greatly improve user experience. Since tar -v (verbose mode) prints each filename to stderr as it's processed, we can stream these lines as Server-Sent Events (SSE).

<?php
function sseTarProgress(string $directory): void
{
    // Validate and secure the directory path as in previous examples
    $realDir = realpath($directory);
    $allowedRoot = realpath('/var/logs/ci-cd'); // Example allowed root

    if ($realDir === false || !is_dir($realDir)) {
        throw new InvalidArgumentException('Directory does not exist or is not accessible.');
    }
    if ($allowedRoot === false || strpos($realDir, $allowedRoot) !== 0) {
        throw new RuntimeException('Access denied to the specified directory for progress reporting.');
    }

    header('Content-Type: text/event-stream');
    header('Cache-Control: no-cache');
    header('Connection: keep-alive'); // Important for SSE

    // Use -v for verbose output (filenames to stderr)
    // Redirect tar's stdout to /dev/null as we only care about stderr for progress
    // and don't want the actual tar archive data blocking the stderr pipe.
    $cmd = 'tar -cvf - ' . escapeshellarg($realDir) . ' > /dev/null';

    $descriptorSpec = [
        0 => ["pipe", "r"], // stdin - not used by tar -c
        1 => ["pipe", "w"], // stdout - redirected to /dev/null in $cmd
        2 => ["pipe", "w"]  // stderr - where tar -v outputs filenames
    ];
    $pipes = [];
    $process = null;

    try {
        $process = proc_open($cmd, $descriptorSpec, $pipes);

        if (!is_resource($process)) {
            throw new RuntimeException('tar process failed to start for progress reporting.');
        }

        // Disable output buffering for SSE
        if (ob_get_level()) {
            ob_end_clean();
        }

        // Ensure PHP doesn't close the connection prematurely
        ignore_user_abort(true);
        set_time_limit(0); // Allow script to run indefinitely for long tar processes

        // Read from stderr for progress
        while (is_resource($pipes[2]) && !feof($pipes[2])) {
            $line = fgets($pipes[2]); // Read verbose output line by line
            if ($line === false || trim($line) === '') {
                // Check for keep-alive or if process ended
                if (!is_resource($process) || proc_get_status($process)['running'] === false) {
                    break;
                }
                // Send a comment to keep the connection alive if no output from tar
                echo ": keepalive

";
                flush();
                usleep(250000); // Sleep for 250ms
                continue;
            }
            echo 'data: ' . json_encode(['file' => trim($line)]) . "

";
            flush(); // Flush data to the client
        }

        $stderrRemaining = stream_get_contents($pipes[2]); // Get any remaining stderr
        if (!empty(trim($stderrRemaining))) {
             // Could be an error message if tar failed after last file
            error_log("sseTarProgress remaining stderr: " . trim($stderrRemaining));
        }

    } catch (Throwable $e) {
        error_log('Error in sseTarProgress: ' . $e->getMessage());
        echo "event: error
";
        echo 'data: ' . json_encode(['message' => 'An error occurred during progress reporting.']) . "

";
        flush();
    } finally {
        if (isset($pipes[0]) && is_resource($pipes[0])) fclose($pipes[0]);
        if (isset($pipes[1]) && is_resource($pipes[1])) fclose($pipes[1]);
        if (isset($pipes[2]) && is_resource($pipes[2])) fclose($pipes[2]);
        if (is_resource($process)) {
            proc_close($process);
        }
        // Final event to signal completion or error to the client
        echo "event: complete
";
        echo 'data: {"done":true}

';
        flush();
    }
}

The client consumes this stream via the JavaScript EventSource API and can display a live list of processed files or update a progress bar.

Harden proc_open() calls

Calling shell utilities from PHP always invites command-injection bugs if not handled carefully. Keep these rules in mind:

  • Validate and Sanitize Inputs: Always validate any user-provided data. For file paths, resolve them using realpath() to get the canonicalized absolute pathname and to check existence.
  • Allowlist Directories: Maintain a strict allowlist of directories from which operations are permitted. Reject any paths not conforming to this list.
  • Escape Shell Arguments: Crucially, escape every argument passed to shell commands using escapeshellarg() or escapeshellcmd() as appropriate. escapeshellarg() is generally preferred for individual arguments. Never concatenate raw input directly into a shell command string.
  • Principle of Least Privilege: Run the PHP (and web server) process with the minimum necessary privileges. Avoid running as root. The user should only have read access to the directories being archived and execute permission for the tar utility.
  • Monitor stderr and Exit Codes: Always check stderr for error messages from the external command and inspect its exit code (via proc_close() or proc_get_status()) to detect failures early. Log these errors for monitoring and debugging.
  • Error Handling: Implement robust error handling around proc_open() calls to manage scenarios like the command not being found, failing to start, or exiting with an error.

The SecureTarStreamer class example demonstrates a good approach by encapsulating path validation and allowlist checks:

<?php
class SecureTarStreamer
{
    private array $allowedRoots;

    public function __construct(array $allowedRoots)
    {
        // Ensure allowedRoots are absolute and valid paths during construction
        $this->allowedRoots = array_map(function($root) {
            $realRoot = realpath($root);
            if ($realRoot === false || !is_dir($realRoot)) {
                throw new InvalidArgumentException("Invalid allowed root directory: {$root}");
            }
            return $realRoot;
        }, $allowedRoots);
    }

    public function send(string $userSuppliedDir, string $downloadName = 'archive.tar'): void
    {
        $path = realpath($userSuppliedDir); // Resolve the user-supplied path

        if ($path === false || !is_dir($path)) {
            throw new InvalidArgumentException('Invalid or non-existent directory specified.');
        }

        $isAllowed = false;
        foreach ($this->allowedRoots as $root) {
            // Check if the resolved path starts with one of the allowed root directories
            if (str_starts_with($path, $root)) {
                $isAllowed = true;
                break;
            }
        }

        if (!$isAllowed) {
            throw new RuntimeException('Access to the specified directory is not allowed.');
        }

        // Now it's safer to call the streaming function
        streamTarArchive($path, $downloadName);
    }
}

// Example Usage:
// $streamer = new SecureTarStreamer(['/var/www/safe_uploads', '/mnt/user_data']);
// $streamer->send($_GET['directory_to_archive']);

Compare approaches

Performance characteristics vary by use case:

  • PharData: Optimal for archive manipulation (reading/writing individual files within an archive) when memory limits are not a concern or for smaller archives.
  • proc_open + tar streaming: Best for creating large archives with minimal PHP memory usage (PHP archive streaming), especially for write-once scenarios like backups or log archiving. This is a highly memory efficient tar method.
  • Transloadit Robot: Ideal for scalable, production use where you want to offload the entire process. It offers automatic optimization, handles various formats (tar, zip, etc.), and integrates with cloud storage.

Choose the method that best matches your workload: random file access, on-premises streaming, or fully managed cloud compression.

What about interrupted downloads?

The tar format is sequential, so true byte-range resumes are not natively supported in a simple HTTP download without third-party tools or server-side logic that can index the stream. In practice, clients might not always gracefully handle interruptions of multi-gigabyte downloads. If resuming is critical for your application:

  1. Split Data: Consider splitting the data into multiple smaller tar files. Clients can then download these individually, and a failure only affects one part.
  2. Resumable Transport Protocols: Use a resumable transport mechanism. After the tar archive is produced (even if streamed to a temporary local file first, or directly piped), you could then serve it or upload it using protocols like tus.io or leverage features like S3 multipart uploads if the destination is cloud storage.
  3. Let Transloadit Handle It: Transloadit's platform is designed for robust file processing and delivery, inherently managing many complexities of large file transfers.

Wrap-up

Using proc_open() in combination with the system's tar utility is a simple yet powerful pattern for creating large archives in PHP with a near-zero memory footprint within the PHP process itself. This technique is particularly effective for tasks like archiving PHP CI/CD logs, creating nightly backups, or handling any large dataset that needs to be written once and streamed efficiently. It's a great way to achieve tar without memory limits in PHP.

Need something even more hands-off? Transloadit’s 🤖 /file/compress Robot can create .tar (optionally gzipped) archives for you. A minimal Assembly looks like this:

{
  "steps": {
    "compressed": {
      "robot": "/file/compress",
      "use": ":original",
      "format": "tar",
      "gzip": true
    }
  }
}

The Robot supports both tar and zip formats, with optional gzip compression. Give it a try and let our infrastructure worry about memory limits, concurrency, and edge-case handling while you focus on building features.