Boost js file uploads using Web Workers and streams

Uploading files efficiently is crucial for modern web applications. Traditional approaches often read entire files into memory, which can freeze the UI, consume excessive RAM, and lead to a poor user experience, especially with large files. By offloading heavy processing to Web Workers and streaming data in manageable chunks using JavaScript Streams, you can maintain a responsive main thread—even when users upload multi-gigabyte files.
Challenges with traditional file uploads
A common, yet inefficient, approach involves:
- Reading the entire `File` object into memory using `FileReader`.
- Constructing a `FormData` object with the file data.
- Sending the `FormData` via a `fetch` or `XMLHttpRequest` POST request.
Uploading a single 2 GB video this way can cause significant memory spikes and make the page unresponsive. Attempting multiple large uploads in parallel using this method will likely crash the browser tab due to memory exhaustion.
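For contrast, here is a minimal sketch of that naive pattern (the element ID and endpoint are illustrative placeholders, not part of the original example):
// Naive approach: the whole file is buffered in memory before anything is sent
const input = document.querySelector('#file-input')
input.addEventListener('change', () => {
  const file = input.files[0]
  const reader = new FileReader()
  reader.onload = () => {
    // reader.result now holds the ENTIRE file as an ArrayBuffer in memory
    const formData = new FormData()
    formData.append('file', new Blob([reader.result]), file.name)
    fetch('/upload', { method: 'POST', body: formData })
  }
  reader.readAsArrayBuffer(file)
})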
Meet Web Workers and streams
Web Workers
Web Workers allow you to run JavaScript code in background threads, separate from the main execution
thread that handles the UI. This means computationally intensive tasks like hashing, compression, or
chunking data for uploads won't block rendering, scrolling, or user input, resulting in a smoother
experience and improved file upload performance.
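As a minimal illustration (the worker file name and message shape here are assumptions), offloading work amounts to posting a message to a background script:
// Main thread: spawn a background thread and hand it some work to do off the UI thread
const hashWorker = new Worker('hash-worker.js') // hypothetical worker script
hashWorker.onmessage = (event) => {
  console.log('Result from worker:', event.data)
}
hashWorker.postMessage({ type: 'hash', payload: new ArrayBuffer(1024) })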
JavaScript streams
The Streams API enables processing data incrementally as chunks. Instead of loading an entire file
into memory, you can read and process small pieces (typically `Uint8Array`s) as they become available. This drastically reduces memory usage and allows data to be sent over the network almost immediately after being read, making JavaScript Streams ideal for large js file upload tasks.
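In its simplest form, reading a `File` incrementally looks roughly like this sketch (the byte counting is only there to show that chunks arrive one at a time):
// Stream a File chunk by chunk instead of buffering it whole
async function measureStreamed(file) {
  const reader = file.stream().getReader()
  let bytesSeen = 0
  while (true) {
    const { value, done } = await reader.read()
    if (done) break
    bytesSeen += value.length // value is a Uint8Array chunk
  }
  reader.releaseLock()
  console.log(`Streamed ${bytesSeen} of ${file.size} bytes`)
}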
Architecture overview
A robust parallel processing upload system using these technologies typically involves:
- Main Thread: Handles UI interactions (like drag-and-drop), file selection, and displaying progress updates. It passes `File` objects to the worker pool.
- Worker Pool: A set of Web Workers manages the file processing tasks. Each available worker receives a `File` reference.
- Individual Worker: Uses the `Blob.stream()` or `File.stream()` API to read the file chunk by chunk. Each chunk is then POSTed to the back-end upload endpoint. Progress messages (percentage complete) and status updates (completion, errors) are sent back to the main thread.
- Back-end: Receives the chunks and reassembles them into the complete file. This often involves protocols like Tus, cloud storage multipart uploads (e.g., S3 Multipart Upload), or custom server-side logic (a minimal sketch follows this list).
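The back-end is outside the scope of this post, but as a rough orientation, a minimal reassembly endpoint could look like the Node sketch below. Express and multer are assumptions, the paths are placeholders, and a separate "complete" step would concatenate the stored chunks; the field names match the client code shown later in this post.
const express = require('express')
const multer = require('multer')
const fs = require('fs')
const path = require('path')

const app = express()
const upload = multer({ dest: 'tmp/' }) // multer writes each incoming chunk to a temp file

app.post('/upload', upload.single('fileChunk'), (req, res) => {
  const { chunkIndex } = req.body
  // Derive a per-upload directory from the original file name (simplified for this sketch)
  const dir = path.join('uploads', path.basename(req.file.originalname).split('.part')[0])
  fs.mkdirSync(dir, { recursive: true })
  // Store each chunk under its index so the file can be reassembled in order later
  fs.renameSync(req.file.path, path.join(dir, `chunk_${chunkIndex}`))
  res.sendStatus(200)
})

app.listen(3000)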
Streaming a file without freezing the UI
The following examples demonstrate a minimal but practical pattern for chunked uploads using a worker pool. Note the inclusion of error handling and cleanup mechanisms.
Main thread (`main.js`)
This script sets up the worker pool and handles file input events, delegating the processing of each file to the pool.
// Assumes WorkerPool class is defined elsewhere (see below)
const pool = new WorkerPool('upload-worker.js')
const fileInput = document.querySelector('#file-input')

fileInput.addEventListener('change', (evt) => {
  const files = Array.from(evt.target.files)
  files.forEach((file) => {
    console.log(`Queueing ${file.name} for upload...`)
    pool.processFile(file, {
      onProgress: (pct, msg) => updateProgressUI(file.name, pct, msg),
      onComplete: (msg) => showSuccess(file.name, msg),
      onError: (err) => showError(file.name, err),
    })
  })
})

function updateProgressUI(filename, pct, message) {
  // Update your progress bar or UI element here
  console.log(`${filename}: ${pct.toFixed(1)}% – ${message}`)
}

function showSuccess(filename, message) {
  // Update UI to show completion
  console.info(`${filename}: ${message}`)
}

function showError(filename, error) {
  // Update UI to show error state
  console.error(`${filename}: Upload failed - ${error}`)
}

// Example: Clean up the pool when the page unloads
window.addEventListener('unload', () => {
  pool.terminate()
})
Worker implementation (`upload-worker.js`)
This worker script receives a `File` object, reads it as a stream, uploads chunks, and sends progress/status messages back.
self.onmessage = async (event) => {
  const file = event.data
  let reader // Declare reader outside the try block for finally access

  try {
    if (!file || typeof file.stream !== 'function') {
      throw new Error('Invalid file object received.')
    }

    const stream = file.stream()
    reader = stream.getReader()
    const total = file.size
    let uploaded = 0
    let chunkIndex = 0

    while (true) {
      let value, done
      try {
        // Read the next chunk from the stream
        ;({ value, done } = await reader.read())
      } catch (readError) {
        // Handle potential errors during stream reading
        throw new Error(`Error reading file stream: ${readError.message}`)
      }

      if (done) {
        // End of stream reached
        break
      }

      try {
        // Upload the current chunk
        await uploadChunk(value, file.name, chunkIndex++)
        uploaded += value.length

        // Calculate and report progress
        const pct = total > 0 ? (uploaded / total) * 100 : 100
        self.postMessage({
          type: 'progress',
          progress: pct,
          message: `Uploading chunk ${chunkIndex}...`,
        })
      } catch (uploadError) {
        // Handle chunk upload errors
        // Consider implementing retry logic here before throwing
        console.error(
          `Chunk ${chunkIndex - 1} upload failed for ${file.name}: ${uploadError.message}`,
        )
        throw new Error(`Chunk upload failed: ${uploadError.message}`)
      }
    }

    // Signal completion if all chunks were uploaded successfully
    self.postMessage({ type: 'complete', message: 'Upload finished successfully.' })
  } catch (err) {
    // Catch any errors from the process and report them back
    self.postMessage({
      type: 'error',
      message: err.message || 'An unknown error occurred in the worker.',
    })
  } finally {
    // IMPORTANT: Ensure the stream lock is released regardless of success or failure
    if (reader) {
      try {
        await reader.releaseLock()
      } catch (releaseLockError) {
        // Log error if releasing the lock fails, but don't block completion/error reporting
        console.error('Error releasing stream lock:', releaseLockError)
      }
    }
    // Optional: Close the worker if it's designed for single use
    // self.close();
  }
}

async function uploadChunk(chunk, filename, index) {
  const formData = new FormData()
  // Send chunk index for server-side reassembly
  formData.append('chunkIndex', index.toString())
  // Send the actual chunk data as a Blob
  formData.append('fileChunk', new Blob([chunk]), `${filename}.part${index}`)
  // Add any other necessary data (e.g., unique upload ID, total chunks)
  // formData.append('uploadId', 'unique-upload-identifier');
  // formData.append('totalChunks', totalChunks.toString());

  // Replace '/upload' with your actual back-end endpoint
  const res = await fetch('/upload', {
    method: 'POST',
    body: formData,
    // Consider adding an AbortSignal here for cancellation support
    // signal: abortController.signal
  })

  if (!res.ok) {
    // Try to get more details from the server response on failure
    let errorText = `Server responded with status ${res.status}`
    try {
      const serverError = await res.text()
      if (serverError) {
        errorText += `: ${serverError}`
      }
    } catch (e) {
      /* Ignore errors reading response body */
    }
    throw new Error(errorText)
  }

  // Optionally process the server response if needed
  // const result = await res.json();
  // return result;
}
Why not read the whole file once in the worker?
While reading the entire file within the worker avoids blocking the main thread, it still consumes significant memory within the worker thread itself. Streaming the file chunk-by-chunk inside the worker offers several advantages:
- Lower Memory Footprint: Only small chunks reside in memory at any given time.
- Resumability: If an upload fails mid-way (e.g., network drop), you only need to retry the failed chunks, not the entire file.
- Parallel Chunk Uploads: Advanced implementations could potentially upload multiple chunks concurrently (though this adds complexity in ordering and server-side handling).
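As an illustration of that last point, here is a bounded-concurrency sketch that slices the file with `Blob.slice()` and reuses the `uploadChunk()` helper from the worker example above; the chunk size and concurrency limit are arbitrary choices, not values from the original example.
const CHUNK_SIZE = 5 * 1024 * 1024 // 5 MiB, arbitrary
const CONCURRENCY = 3 // arbitrary cap on simultaneous requests

async function uploadFileInParallel(file) {
  const totalChunks = Math.ceil(file.size / CHUNK_SIZE)
  let nextIndex = 0
  // Each "lane" keeps pulling the next chunk index until none remain
  async function lane() {
    while (nextIndex < totalChunks) {
      const index = nextIndex++
      const start = index * CHUNK_SIZE
      const slice = file.slice(start, Math.min(start + CHUNK_SIZE, file.size))
      // Blob.slice() is lazy, so only in-flight chunks occupy memory
      await uploadChunk(slice, file.name, index)
    }
  }
  await Promise.all(Array.from({ length: CONCURRENCY }, lane))
}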
Building a tiny worker pool
Using a single worker can still become a bottleneck if you need to process many files simultaneously. A `WorkerPool` distributes tasks among a limited number of workers (ideally matching CPU cores) and queues pending tasks.
class WorkerPool {
  constructor(script, size = navigator.hardwareConcurrency || 4) {
    this.workers = []
    this.idleWorkers = []
    this.taskQueue = []
    this.taskCallbacks = new Map() // Map task ID to callbacks

    console.log(`Initializing WorkerPool with size ${size}`)
    for (let i = 0; i < size; i++) {
      const worker = new Worker(script)
      worker.id = `worker_${i}`
      // Handle messages from the worker
      worker.onmessage = (e) => this.handleWorkerMessage(worker, e.data)
      // Handle errors occurring within the worker itself
      worker.onerror = (e) => this.handleWorkerError(worker, e)
      this.workers.push(worker)
      this.idleWorkers.push(worker)
    }
  }

  generateTaskId() {
    // Simple unique ID generator for tasks
    return Date.now().toString(36) + Math.random().toString(36).substring(2)
  }

  processFile(file, callbacks) {
    const taskId = this.generateTaskId()
    const task = { id: taskId, file }
    this.taskCallbacks.set(taskId, callbacks)

    const idleWorker = this.idleWorkers.pop()
    if (idleWorker) {
      // An idle worker is available, run the task immediately
      this.runTask(idleWorker, task)
    } else {
      // All workers are busy, add task to the queue
      this.taskQueue.push(task)
      console.log(
        `Worker pool busy. Queued task ${taskId} for ${file.name}. Queue size: ${this.taskQueue.length}`,
      )
    }
  }

  runTask(worker, task) {
    console.log(`Assigning task ${task.id} (${task.file.name}) to ${worker.id}`)
    worker.currentTask = task // Associate task metadata with the worker
    // Send the file object to the worker to start processing
    // For large files, consider if Transferable Objects are applicable/needed
    worker.postMessage(task.file)
  }

  handleWorkerMessage(worker, data) {
    const task = worker.currentTask
    if (!task) {
      console.warn(`Received message from worker ${worker.id} without an assigned task.`)
      return
    }

    const callbacks = this.taskCallbacks.get(task.id)
    if (!callbacks) {
      console.warn(`Received message for unknown or completed task ${task.id}`)
      return // Task might have been cancelled or already completed/failed
    }

    // Process messages based on their type
    switch (data.type) {
      case 'progress':
        if (callbacks.onProgress) callbacks.onProgress(data.progress, data.message)
        break
      case 'complete':
        if (callbacks.onComplete) callbacks.onComplete(data.message)
        this.finishTask(worker, task.id) // Mark task as finished
        break
      case 'error':
        if (callbacks.onError) callbacks.onError(data.message)
        this.finishTask(worker, task.id) // Also finish task on error
        break
      default:
        console.warn(`Received unknown message type from worker ${worker.id}:`, data.type)
    }
  }

  handleWorkerError(worker, errorEvent) {
    console.error(`Error in worker ${worker.id}:`, errorEvent.message, errorEvent)
    const task = worker.currentTask
    if (task) {
      const callbacks = this.taskCallbacks.get(task.id)
      if (callbacks && callbacks.onError) {
        // Report the error back via the task's error callback
        callbacks.onError(`Worker script error: ${errorEvent.message}`)
      }
      // Finish the task as it cannot continue
      this.finishTask(worker, task.id)
    } else {
      console.error(
        `Unhandled error in idle worker ${worker.id}. Consider worker replacement strategy.`,
      )
      // If an idle worker errors, you might want to remove it or try respawning
      // For simplicity, we just log it here.
    }
  }

  finishTask(worker, taskId) {
    const task = worker.currentTask
    if (task && task.id === taskId) {
      console.log(`Task ${taskId} (${task.file.name}) finished on ${worker.id}`)
      worker.currentTask = null // Disassociate task from worker
    }
    this.taskCallbacks.delete(taskId) // Remove callbacks

    // Check the queue for the next task
    if (this.taskQueue.length > 0) {
      const nextTask = this.taskQueue.shift()
      console.log(`Dequeuing task ${nextTask.id}. Queue size: ${this.taskQueue.length}`)
      this.runTask(worker, nextTask) // Assign the next task to this now-free worker
    } else {
      this.idleWorkers.push(worker) // No pending tasks, return worker to the idle pool
      console.log(`Worker ${worker.id} is now idle. Idle workers: ${this.idleWorkers.length}`)
    }
  }

  terminate() {
    console.log('Terminating worker pool...')
    this.workers.forEach((worker) => {
      console.log(`Terminating worker ${worker.id}`)
      worker.terminate()
    })
    // Clear internal state
    this.workers = []
    this.idleWorkers = []
    this.taskQueue = []
    this.taskCallbacks.clear()
  }
}
Browser compatibility
Web APIs evolve, so always verify browser support for the features you rely on.
Feature | Chrome | Firefox | Safari | Edge | Notes |
---|---|---|---|---|---|
Web Workers | 4+ | 3.5+ | 4+ | 12+ | Widely supported. |
`File.stream()` | 76+ | 69+ | 15.2+ | 79+ | The core API for reading file contents as a stream. |
`Blob.stream()` | 76+ | 69+ | 14.1+ | 79+ | Similar to `File.stream()`. |
Transferable Streams | 77+ | 111+* | ✗ | 79+ | Allows transferring stream ownership between threads efficiently. |

\* Transferable streams might require enabling flags in some Firefox versions or might have limitations. Check current support on MDN or Can I Use.
For browsers lacking support for `File.stream()`, you might need to fall back to a `FileReader`-based approach (potentially within the worker to avoid blocking the main thread, but still using more memory) or use established libraries like Tus-JS-Client or Uppy, which handle compatibility and provide features like resumability.
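A minimal fallback sketch, assuming the same chunk size and `uploadChunk()` helper as the worker example above, could slice the file manually and read each slice with `FileReader`:
const CHUNK_SIZE = 5 * 1024 * 1024 // 5 MiB, arbitrary

function readSlice(blob) {
  // Wrap the callback-based FileReader API in a promise
  return new Promise((resolve, reject) => {
    const reader = new FileReader()
    reader.onload = () => resolve(new Uint8Array(reader.result))
    reader.onerror = () => reject(reader.error)
    reader.readAsArrayBuffer(blob)
  })
}

async function uploadWithFileReader(file) {
  const totalChunks = Math.ceil(file.size / CHUNK_SIZE)
  for (let index = 0; index < totalChunks; index++) {
    const start = index * CHUNK_SIZE
    const slice = file.slice(start, Math.min(start + CHUNK_SIZE, file.size))
    const chunk = await readSlice(slice) // only one chunk is held in memory at a time
    await uploadChunk(chunk, file.name, index)
  }
}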
Memory management best practices
- Limit Worker Count: Spawn workers based on available CPU cores, typically `navigator.hardwareConcurrency`. Creating too many workers can lead to excessive context switching and memory overhead.
- Terminate Workers: Explicitly call `worker.terminate()` or `pool.terminate()` when the workers are no longer needed (e.g., after all uploads complete, or on page unload) to release resources. Use `try...finally` blocks in your application logic to ensure termination happens even if errors occur during the upload process.
- Release References: In both the main thread and workers, nullify references to large objects (like `File` objects, `Blob`s, `ArrayBuffer`s, or stream readers) once they are no longer needed (`reader = null`, `file = null`, `chunk = null`) to allow garbage collection. Ensure stream readers are released using `reader.releaseLock()`.
- Chunk Size: Choose a sensible chunk size (e.g., 1-10 MiB). Very small chunks increase network overhead (more HTTP requests per file), while very large chunks negate some of the memory-saving benefits of streaming.
- Monitor Memory: Use browser developer tools (like Chrome's Memory tab or Firefox's Memory tool) or the `performance.memory` API (where available and applicable) during development and testing to monitor memory usage under load and identify potential leaks.
// Example using try...finally for pool termination in application code
const pool = new WorkerPool('upload-worker.js')
try {
  // ... use pool to process files ...
  // Example: await Promise.all(files.map(file => pool.processFileAsync(file)));
} finally {
  // Ensure pool is terminated regardless of success or failure
  console.log('Cleaning up worker pool...')
  pool.terminate()
}
Security and resilience
- CORS: Configure your upload endpoint's Cross-Origin Resource Sharing (CORS) policy carefully on the server. Allow only necessary HTTP methods (POST, potentially OPTIONS for preflight requests), required headers (like `Content-Type`, `Content-Range`, or custom headers like `X-Chunk-Index`), and restrict origins (`Access-Control-Allow-Origin`) to your application's domain. A server-side sketch appears at the end of this section.
- Authentication/Authorization: Secure your upload endpoint. For chunked uploads, ensure each chunk request is authenticated and authorized. Methods include using secure HTTP-only session cookies, bearer tokens (JWTs) sent in the `Authorization` header, or generating pre-signed URLs for each chunk or the entire upload session (common with cloud storage).
- Retries: Network issues are common. Implement a retry mechanism in your `uploadChunk` function for failed chunk uploads. Use exponential back-off (waiting progressively longer between retries: e.g., 1s, 2s, 4s) to avoid overwhelming the server or network. Abort retrying after a reasonable number of attempts (e.g., 3-5). A back-off sketch follows this list.
- Cancellation: Provide users with a way to cancel ongoing uploads. Use the `AbortController` API. Create an `AbortController` instance before starting the upload, pass its `signal` to each `fetch` request, and call `controller.abort()` when the user cancels. Ensure your error handling catches the `AbortError`.
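A minimal exponential back-off wrapper around the `uploadChunk()` helper could look like this sketch (the attempt count and delays are illustrative):
async function uploadChunkWithRetry(chunk, filename, index, maxAttempts = 4) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await uploadChunk(chunk, filename, index)
    } catch (err) {
      // Give up immediately on user cancellation, and after the final attempt
      if (err.name === 'AbortError' || attempt === maxAttempts) throw err
      const delay = 1000 * 2 ** (attempt - 1) // 1s, 2s, 4s, ...
      console.warn(`Chunk ${index} failed (attempt ${attempt}), retrying in ${delay} ms`)
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}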
// Example: Using AbortController for fetch cancellation
const ctrl = new AbortController()
const signal = ctrl.signal

// In uploadChunk function:
try {
  const res = await fetch('/upload', { method: 'POST', body: formData, signal })
  // ... handle response ...
} catch (err) {
  if (err.name === 'AbortError') {
    console.log('Chunk upload fetch aborted')
    // Propagate cancellation signal or specific error
    throw new Error('Upload cancelled by user')
  } else {
    console.error('Chunk upload fetch error:', err)
    throw err // Re-throw other errors
  }
}

// To cancel the upload associated with this controller:
// ctrl.abort();
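On the server side, a rough CORS sketch for the upload endpoint might look like this (Express is an assumption, continuing the hypothetical back-end sketch from the architecture overview; the origin and custom header name are placeholders):
// Hand-rolled CORS headers for the /upload route; adjust the origin and headers to your app
app.use('/upload', (req, res, next) => {
  res.setHeader('Access-Control-Allow-Origin', 'https://app.example.com') // not '*'
  res.setHeader('Access-Control-Allow-Methods', 'POST, OPTIONS')
  res.setHeader('Access-Control-Allow-Headers', 'Content-Type, Content-Range, X-Chunk-Index, Authorization')
  if (req.method === 'OPTIONS') return res.sendStatus(204) // answer preflight requests early
  next()
})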
Debugging Web Workers
Debugging workers can be slightly different from main thread debugging:
- Browser DevTools: Most modern browsers provide tools to inspect active workers. In Chrome DevTools, look under the "Sources" tab (you might see worker scripts listed) or the dedicated "Application" -> "Workers" section. In Firefox, check the "Debugger" tab (worker scripts appear in the sources list) or "Application" -> "Workers". You can set breakpoints, inspect variables, and view `console.log` messages from workers.
- Error Handling: Robust `postMessage` communication for errors (as shown in the examples) is crucial for understanding issues occurring within the worker, as a direct `try...catch` from the main thread won't catch worker errors. Ensure worker errors are explicitly caught and posted back.
Common pitfalls
- Excessive Workers: Spawning a new worker for every file instead of using a pool can overwhelm the system's resources (CPU and memory).
- Stream Locks: Forgetting to call `reader.releaseLock()` on a `ReadableStreamDefaultReader` after finishing reading or encountering an error. This prevents the stream from being properly closed or potentially read again. Always use a `finally` block for `releaseLock()`.
- Large Message Payloads: Avoid posting very large data objects between the main thread and workers using `postMessage`, as this involves serialization and deserialization overhead (or structured cloning). For large binary data, investigate using `Transferable` objects (like `ArrayBuffer`) for more efficient zero-copy transfers where supported and appropriate (see the sketch after this list).
- Chunk Ordering: Assuming the server will receive chunks in the exact order they were sent. Network latency and concurrent requests can cause reordering. Always include an index or byte offset with each chunk so the server can reassemble the file correctly.
- Unhandled Errors: Lack of proper `try...catch` blocks within the worker, especially around asynchronous operations like stream reading (`reader.read()`) and network requests (`fetch`), can cause silent failures or unhandled promise rejections within the worker.
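For illustration, transferring an `ArrayBuffer` instead of cloning it is a one-argument change to `postMessage`; the worker script name and message shape below are assumptions.
// Main thread: move the buffer to the worker instead of copying it
const transferWorker = new Worker('upload-worker.js')
const buffer = new ArrayBuffer(8 * 1024 * 1024) // e.g., 8 MiB of chunk data
transferWorker.postMessage({ type: 'chunk', buffer }, [buffer])
console.log(buffer.byteLength) // 0, because ownership has moved to the worker

// Worker: the buffer arrives without a structured-clone copy
self.onmessage = (event) => {
  if (event.data.type === 'chunk') {
    const chunk = new Uint8Array(event.data.buffer)
    // ... hash, compress, or upload the chunk ...
  }
}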
Key takeaways
- Web Workers are essential for moving CPU-intensive file processing (like chunking) off the main thread, keeping the UI responsive during js file uploads.
- JavaScript Streams allow efficient handling of large files by processing data in chunks, significantly reducing peak memory usage and enabling features like resumability.
- A worker pool manages parallel processing, preventing the system from being overloaded when handling multiple simultaneous uploads and improving overall file upload performance.
- Robust error handling (including network retries and stream error handling), proper resource cleanup (`terminate()`, `releaseLock()`), security considerations (CORS, auth), and attention to browser compatibility are vital for production-ready implementations.
For a production-ready solution that handles chunking, resumability, retries, and parallel uploads out of the box, consider using libraries like Uppy with its various upload plugins, or explore services designed for robust file handling. Transloadit's handling uploads service integrates these concepts for reliable large file uploads. Happy uploading!