Downloading large files from web servers can be challenging, especially when dealing with network interruptions or slow connections. In this DevTip, we'll explore how to build a robust file downloader in Go that supports resumable downloads, concurrent chunk fetching, and real-time progress tracking.

Why build a custom file downloader?

Standard file download methods often lack resilience against network interruptions and don't efficiently utilize available bandwidth. By building a custom downloader, you can:

  • Resume interrupted downloads without starting over
  • Download files concurrently in chunks to maximize bandwidth
  • Provide real-time progress updates to users

Understanding HTTP range requests and partial content

HTTP range requests allow clients to request specific byte ranges from a server, enabling partial content downloads. This is essential for implementing resumable downloads and concurrent chunk fetching.

Here's how a typical HTTP range request looks:

GET /largefile.zip HTTP/1.1
Host: example.com
Range: bytes=0-1023

The server responds with the requested byte range, indicated by the 206 Partial Content status code.
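
For example, a successful partial-content response includes a Content-Range header describing which bytes were returned and the total size (the numbers below are illustrative):

HTTP/1.1 206 Partial Content
Content-Range: bytes 0-1023/52428800
Content-Length: 1024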

Basic file download implementation in Go

Let's start with a simple file download example using Go's built-in HTTP client. This first example needs only fmt, io, net/http, and os; the later snippets additionally use sync, time, and github.com/schollz/progressbar/v3, so add those imports as you work through them (Go rejects unused imports at compile time).

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("https://example.com/largefile.zip")
	if err != nil {
		fmt.Printf("Error making GET request: %v\n", err)
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		fmt.Printf("Server returned non-200 status: %s\n", resp.Status)
		return
	}

	file, err := os.Create("largefile.zip")
	if err != nil {
		fmt.Printf("Error creating file: %v\n", err)
		return
	}
	defer file.Close()

	_, err = io.Copy(file, resp.Body)
	if err != nil {
		fmt.Printf("Error copying content to file: %v\n", err)
		return
	}
	fmt.Println("Download completed successfully.")
}

Adding support for resumable downloads

To support resumable downloads, we need to:

  1. Check the existing file size.
  2. Request the remaining bytes using HTTP range requests.

Here's how you can implement this. For this sequential append-style resume, we open the file with os.O_APPEND|os.O_CREATE|os.O_WRONLY.

func downloadFileResumable(url, filepath string) error {
	file, err := os.OpenFile(filepath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		return fmt.Errorf("failed to open file: %w", err)
	}
	defer file.Close()

	stat, err := file.Stat()
	if err != nil {
		return fmt.Errorf("failed to get file stats: %w", err)
	}
	currentSize := stat.Size()

	// Note: http.Client.Timeout covers the entire request, including reading the
	// response body, so for very large files choose it generously or omit it.
	client := &http.Client{Timeout: 30 * time.Second}
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return fmt.Errorf("failed to create request: %w", err)
	}

	// Set the Range header to download from where it left off
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", currentSize))
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("failed to perform request: %w", err)
	}
	defer resp.Body.Close()

	// Check how the server handled the Range header.
	switch resp.StatusCode {
	case http.StatusPartialContent:
		// The server honored the range; the body picks up at currentSize.
	case http.StatusRequestedRangeNotSatisfiable:
		// The range starts at or past the end of the file: nothing left to fetch.
		return nil
	case http.StatusOK:
		// The server ignored the Range header and is sending the whole file.
		// Appending that to an existing partial file would corrupt it.
		if currentSize > 0 {
			return fmt.Errorf("server ignored Range header (%s); delete the partial file and restart", resp.Status)
		}
	default:
		return fmt.Errorf("server returned unexpected status: %s", resp.Status)
	}

	// Because the file is opened with O_APPEND, every write lands at the end of
	// the existing data, so no explicit Seek is needed. If you open the file with
	// plain O_WRONLY instead, seek to currentSize before copying:
	// _, err = file.Seek(currentSize, io.SeekStart)

	_, err = io.Copy(file, resp.Body)
	if err != nil {
		return fmt.Errorf("failed to copy content to file: %w", err)
	}
	return nil
}
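
Calling the function is straightforward; the URL and file name below are placeholders:

if err := downloadFileResumable("https://example.com/largefile.zip", "largefile.zip"); err != nil {
	fmt.Printf("Download failed: %v\n", err)
}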

Implementing concurrent chunk downloading

To speed up downloads, we can split the file into chunks and download them concurrently using goroutines. For concurrent chunk writing, open the file with os.O_RDWR|os.O_CREATE and pre-allocate it to its expected full size (for example with file.Truncate(totalSize)) before starting, so out-of-order chunk writes can't corrupt the result or leave sparse gaps. We'll use file.WriteAt, which is safe for concurrent use as long as each goroutine writes to a different, non-overlapping region of the file.

// Ensure these imports are present:
// import (
// 	"fmt"
// 	"io"
// 	"net/http"
// 	"os"
// 	"sync"
// 	"github.com/schollz/progressbar/v3" // If using progress bar
// )

func downloadChunk(client *http.Client, url string, start, end int64, file *os.File, wg *sync.WaitGroup, bar *progressbar.ProgressBar) {
	defer wg.Done()

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		fmt.Printf("Error creating request for chunk %d-%d: %v\n", start, end, err)
		return
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))

	resp, err := client.Do(req)
	if err != nil {
		fmt.Printf("Error downloading chunk %d-%d: %v\n", start, end, err)
		// Implement retry logic here if desired, or use the downloadChunkWithRetry example below
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusPartialContent {
		fmt.Printf("Server responded with %s for range %d-%d\n", resp.Status, start, end)
		return
	}

	bodyBytes, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Printf("Error reading chunk body for range %d-%d: %v\n", start, end, err)
		return
	}

	_, err = file.WriteAt(bodyBytes, start)
	if err != nil {
		fmt.Printf("Error writing chunk for range %d-%d: %v\n", start, end, err)
		return
	}
	if bar != nil {
		bar.Add(len(bodyBytes))
	}
}

// Example usage of downloadChunk involves the following steps (sketched below):
// 1. Getting total file size (e.g., via a HEAD request).
// 2. Deciding on chunk size and number of concurrent downloads.
// 3. Opening the file with os.O_RDWR|os.O_CREATE (and os.O_TRUNC if starting fresh).
// 4. Creating the file with the total size (e.g., file.Truncate(totalSize)).
// 5. Launching goroutines for each chunk, passing the HTTP client, file handle, etc.
// 6. Waiting for all goroutines to complete using sync.WaitGroup.
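
Here's a minimal sketch of that orchestration. The helper name downloadConcurrently, the fixed chunk count, and the error handling are illustrative assumptions rather than a prescribed API; a production tool would also collect per-chunk errors (for example via a channel or errgroup).

func downloadConcurrently(url, filepath string, numChunks int) error {
	client := &http.Client{
		Timeout: 60 * time.Second, // per-chunk request timeout
	}

	// 1. Determine the total size and range support via a HEAD request.
	head, err := client.Head(url)
	if err != nil {
		return fmt.Errorf("HEAD request failed: %w", err)
	}
	head.Body.Close()
	if head.Header.Get("Accept-Ranges") != "bytes" {
		return fmt.Errorf("server does not advertise byte-range support")
	}
	totalSize := head.ContentLength
	if totalSize <= 0 {
		return fmt.Errorf("could not determine content length")
	}

	// 2. Open the destination file and pre-allocate it to the full size.
	file, err := os.OpenFile(filepath, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0644)
	if err != nil {
		return fmt.Errorf("failed to open file: %w", err)
	}
	defer file.Close()
	if err := file.Truncate(totalSize); err != nil {
		return fmt.Errorf("failed to pre-allocate file: %w", err)
	}

	// 3. Split the byte range into chunks and download them concurrently.
	if numChunks < 1 {
		numChunks = 1
	}
	bar := progressbar.DefaultBytes(totalSize, "downloading")
	chunkSize := totalSize / int64(numChunks) // assumes totalSize >= numChunks
	var wg sync.WaitGroup
	for i := 0; i < numChunks; i++ {
		start := int64(i) * chunkSize
		end := start + chunkSize - 1
		if i == numChunks-1 {
			end = totalSize - 1 // the last chunk picks up the remainder
		}
		wg.Add(1)
		go downloadChunk(client, url, start, end, file, &wg, bar)
	}
	wg.Wait()
	return nil
}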

Adding a progress bar with real-time updates

To enhance user experience, add a progress bar using the github.com/schollz/progressbar/v3 package. First, install it: go get -u github.com/schollz/progressbar/v3.

// Import: import "github.com/schollz/progressbar/v3"

// Initialize the bar before starting downloads:
// bar := progressbar.DefaultBytes(totalSize, "downloading")

// Inside your sequential download loop (like in downloadFileResumable):
// _, err = io.Copy(io.MultiWriter(file, bar), resp.Body)

// For concurrent downloads (inside downloadChunk or downloadChunkWithRetry, after successful WriteAt):
// if bar != nil {
// 	 bar.Add(len(bodyBytes))
// }
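
In the resumable flow from downloadFileResumable, the right total for the bar is the number of bytes still to be fetched: for a 206 response, resp.ContentLength reports exactly that. A small sketch, reusing that function's variable names:

// Show progress for the remaining bytes of a resumed download:
bar := progressbar.DefaultBytes(resp.ContentLength, "downloading")
_, err = io.Copy(io.MultiWriter(file, bar), resp.Body)

If you prefer the bar to reflect the whole file, size it at currentSize+resp.ContentLength and pre-advance it by the number of bytes already on disk.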

Error handling and retry logic

Implement retry logic to handle transient network errors. This can be added within downloadChunk or a wrapper function like downloadChunkWithRetry.

func downloadChunkWithRetry(client *http.Client, url string, start, end int64, file *os.File, wg *sync.WaitGroup, bar *progressbar.ProgressBar, maxRetries int) {
	defer wg.Done()

	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		if attempt > 0 {
			fmt.Printf("Retrying chunk %d-%d, attempt %d/%d after error: %v\n", start, end, attempt, maxRetries-1, lastErr)
			time.Sleep(time.Duration(attempt*2) * time.Second) // Exponential backoff could be better
		}

		req, err := http.NewRequest("GET", url, nil)
		if err != nil {
			lastErr = fmt.Errorf("creating request for chunk %d-%d failed: %w", start, end, err)
			continue
		}
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))

		resp, err := client.Do(req)
		if err != nil {
			lastErr = fmt.Errorf("downloading chunk %d-%d failed: %w", start, end, err)
			continue
		}

		if resp.StatusCode != http.StatusPartialContent {
			resp.Body.Close() // Important to close body before next attempt
			lastErr = fmt.Errorf("server responded with %s for range %d-%d", resp.Status, start, end)
			continue
		}

		bodyBytes, err := io.ReadAll(resp.Body)
		resp.Body.Close() // Important to close body after reading or on error
		if err != nil {
			lastErr = fmt.Errorf("reading chunk body for range %d-%d failed: %w", start, end, err)
			continue
		}

		_, err = file.WriteAt(bodyBytes, start)
		if err != nil {
			lastErr = fmt.Errorf("writing chunk for range %d-%d failed: %w", start, end, err)
			// Depending on the error, retrying a write might not be useful, or might need specific handling.
			continue
		}

		if bar != nil {
			bar.Add(len(bodyBytes))
		}
		// fmt.Printf("Successfully downloaded and wrote chunk %d-%d\n", start, end)
		return // Success
	}
	fmt.Printf("Failed to download chunk %d-%d after %d attempts. Last error: %v\n", start, end, maxRetries, lastErr)
}

Optimizing performance with connection pooling

Reuse HTTP connections by configuring the HTTP client. This is especially useful for concurrent downloads.

client := &http.Client{
	Transport: &http.Transport{
		MaxIdleConns:        100, // Max total idle connections, can be tuned.
		MaxIdleConnsPerHost: 20,  // Max idle connections per host, tune based on expected concurrency.
		IdleConnTimeout:     90 * time.Second,
	},
	Timeout: 60 * time.Second, // Timeout for each chunk request (overall request timeout).
}
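
Create one such client up front and pass it to every downloadChunk or downloadChunkWithRetry goroutine (both already accept a *http.Client), so all chunks reuse pooled connections instead of each opening its own.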

Complete example: building a CLI downloader

Combining all these elements into a complete CLI tool means adding command-line argument parsing, determining the total file size (typically via a HEAD request), dividing the work into chunks, and coordinating the goroutines. The pieces covered above form the foundation for such a tool.
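
Here's a minimal sketch of such an entry point, assuming the downloadConcurrently helper sketched earlier and the standard flag package for argument parsing:

func main() {
	url := flag.String("url", "", "URL of the file to download")
	out := flag.String("out", "download.bin", "destination file path")
	chunks := flag.Int("chunks", 4, "number of concurrent chunks")
	flag.Parse()

	if *url == "" {
		fmt.Println("usage: downloader -url <URL> [-out <path>] [-chunks <n>]")
		os.Exit(1)
	}

	if err := downloadConcurrently(*url, *out, *chunks); err != nil {
		fmt.Printf("Download failed: %v\n", err)
		os.Exit(1)
	}
	fmt.Println("Download completed successfully.")
}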

Best practices and common pitfalls

  • Always handle errors gracefully and provide meaningful feedback.
  • Limit the number of concurrent downloads to avoid overwhelming the server or your own network resources.
  • Validate server support for HTTP range requests (check for Accept-Ranges: bytes header in response to a HEAD or initial GET request) before implementing chunked downloads.
  • Ensure proper file handling: open files with appropriate flags (os.O_RDWR|os.O_CREATE for WriteAt, os.O_APPEND|os.O_CREATE|os.O_WRONLY for simple sequential resume) and close them, typically using defer.
  • For concurrent writes with WriteAt, pre-allocate the file to its full size (e.g., call file.Truncate(totalSize) after opening with os.O_CREATE|os.O_RDWR and before starting downloads) so out-of-order chunk writes can't leave sparse gaps or corrupt the file on some file systems.

Conclusion

Building a custom file downloader in Go provides resilience, speed, and user-friendly progress tracking. At Transloadit, we leverage similar techniques in our 🤖 /import/http Robot, part of our File Importing service. For more Go integrations, check out our go-sdk.