Ensuring accurate file type verification in browsers is crucial for keeping your application—and your users—safe. Relying solely on file extensions or MIME types leaves room for spoofing, while deep-learning approaches such as Google’s open-source tool Magika analyze the bytes themselves for far greater accuracy.

Challenges with traditional file type verification

Traditional techniques inspect metadata or, at most, a few signature bytes:

  • File extensions can be renamed in seconds.
  • MIME types come from the client and are frequently wrong.
  • Magic numbers work for common formats but struggle with proprietary or polyglot files.

These weak spots enable malicious uploads—evil.exe masquerading as holiday.jpg, for example—and create a real attack surface for web apps.
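To make the spoofing risk concrete, here is a minimal sketch that uses only Python's standard library. The signature table, the labels, and the `sniff_magic` helper are illustrative assumptions rather than part of any library, and real signature databases are far larger.

```python
import mimetypes

# A handful of well-known leading signatures (illustrative, not exhaustive).
MAGIC_NUMBERS = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"MZ": "windows-executable",
}


def guess_from_name(filename):
    """Metadata-only check: looks at the extension, never at the bytes."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def sniff_magic(content):
    """Content check: compares the leading bytes against known signatures."""
    for signature, label in MAGIC_NUMBERS.items():
        if content.startswith(signature):
            return label
    return None


# An executable header uploaded under an innocent-looking name.
fake_upload = b"MZ\x90\x00" + b"\x00" * 60
name_guess = guess_from_name("holiday.jpg")
content_guess = sniff_magic(fake_upload)
```

Here the name-based guess reports an image while the leading bytes point to an executable, which is exactly the mismatch an attacker relies on.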

Introducing Magika: AI-powered file identification

Magika is Google’s open-source answer to that problem. A deep-learning model trained on approximately 100 million files, it recognizes over 200 binary and text file formats, even when metadata lies. Magika identifies a file within a few milliseconds on a single CPU and reaches around 99% accuracy on its test set.

Latest stable version: 0.6.1 (March 2025) • Requires Python 3.8+

How Magika works

  1. Reads a configurable byte slice from the file (fast, even on large uploads). Magika also supports stream-based analysis for very large files.
  2. Feeds that data into a lightweight neural network. It offers different identification modes, with HIGH_CONFIDENCE as the default.
  3. Returns a label, MIME type, and confidence score, typically within a few milliseconds on a single CPU.
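The first and last steps can be sketched in plain Python. This is not Magika's implementation: the sample sizes, the `0.9` threshold, and both helper names are assumptions chosen for illustration. It shows how a bounded sample can be taken from a large stream, and how a confidence mode might fall back to a generic label when the score is low.

```python
import io


def sample_head_tail(stream, head_size=1024, tail_size=1024):
    """Return (head, tail) byte samples without reading the whole stream."""
    stream.seek(0, io.SEEK_END)
    total = stream.tell()
    stream.seek(0)
    head = stream.read(min(head_size, total))
    # Start the tail after the head so short files are not sampled twice.
    tail_start = max(total - tail_size, len(head))
    stream.seek(tail_start)
    tail = stream.read(total - tail_start)
    return head, tail


def apply_confidence_mode(label, score, threshold=0.9, fallback="unknown"):
    """Keep the predicted label only when the score clears the threshold."""
    return label if score >= threshold else fallback


head, tail = sample_head_tail(io.BytesIO(b"0123456789"), head_size=4, tail_size=4)
```

Because only the two samples are read, the cost of the check stays roughly constant regardless of how large the upload is.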

Integrate Magika into a browser application

1. Install Magika

pip install magika

2. Create a minimal verification API (Flask)

from flask import Flask, request, jsonify
from magika import Magika

app = Flask(__name__)
magika = Magika()  # initialise once at startup

@app.route('/verify', methods=['POST'])
def verify_file():
    if 'file' not in request.files:
        return jsonify({'error': 'No file provided'}), 400

    content = request.files['file'].read()
    result = magika.identify_bytes(content)

    if not result.ok:
        return jsonify({'error': 'File analysis failed'}), 500

    return jsonify({
        'file_type': result.output.label,
        'mime_type': result.output.mime_type,
        'description': result.output.description,
        'score': result.score,
    })

if __name__ == '__main__':  # dev only—use Gunicorn in production
    app.run(debug=True)

3. Wire up the front-end

async function verifyFile(file) {
  const formData = new FormData()
  formData.append('file', file)

  try {
    const response = await fetch('/verify', {
      method: 'POST',
      body: formData,
    })

    if (!response.ok) {
      throw new Error(`HTTP error! Status: ${response.status}`)
    }

    const result = await response.json()
    return result
  } catch (error) {
    console.error('File verification request failed:', error)
    throw error
  }
}

document.getElementById('fileInput').addEventListener('change', async (event) => {
  const file = event.target.files[0]
  if (!file) {
    return // No file selected
  }

  try {
    const analysisResult = await verifyFile(file)
    document.getElementById('result').textContent =
      `Detected: ${analysisResult.file_type} (${(analysisResult.score * 100).toFixed(1)}% confidence)`
  } catch (error) {
    document.getElementById('result').textContent = 'Verification failed. See console for details.'
  }
})

4. Compare results to an allowlist

This client-side check can improve user experience, but always validate on the server too.

// Assuming 'analysisResult' is available from the previous step
const allowedFileTypes = ['pdf', 'jpeg', 'png'] // Using Magika's labels

if (!allowedFileTypes.includes(analysisResult.file_type)) {
  alert(
    `File type "${analysisResult.file_type}" is not allowed. Allowed types are: ${allowedFileTypes.join(', ')}.`,
  )
  // Or throw new Error(`File type ${analysisResult.file_type} is not permitted.`);
}
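Because a browser check can be bypassed, the same allowlist belongs on the server. The sketch below is one possible shape for that check, assuming you pass in the label and score that the `/verify` handler already obtains from Magika. `check_upload`, `Verdict`, and the `0.9` minimum score are hypothetical names and values, not part of Magika or Flask.

```python
from dataclasses import dataclass

ALLOWED_LABELS = frozenset({"pdf", "jpeg", "png"})  # same labels as the client
MIN_SCORE = 0.9  # hypothetical threshold; tune it against your own traffic


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


def check_upload(label, score, allowed=ALLOWED_LABELS, min_score=MIN_SCORE):
    """Accept a file only if its detected type is allowlisted and confident."""
    if label not in allowed:
        return Verdict(False, f"type '{label}' is not on the allowlist")
    if score < min_score:
        return Verdict(False, f"confidence {score:.2f} is below {min_score:.2f}")
    return Verdict(True, "ok")


verdict = check_upload("pdf", 0.98)
```

Returning a reason alongside the decision makes it easy to send a useful error message back to the browser and to log why an upload was rejected.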

Magika vs. Traditional methods

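| Feature | File extension / MIME | Magic numbers | Magika |
| --- | --- | --- | --- |
| Detects spoofed extensions | ❌ | ✅/Partial | ✅ |
| Handles polyglot files | ❌ | ❌ |  |
| Coverage (200+ formats) |  |  | ✅ |
| Speed (milliseconds per file) |  |  | ✅ |
| Open-source & maintained | N/A | Some | ✅ |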

Practical use cases

Secure upload forms

Validate uploads before saving them or passing them on for further processing. Enforce this check on the server, once the file has been received and analysed by Magika; the client-side version below only gives users faster feedback.

// Example client-side feedback based on server verification result
// const analysisResult = await verifyFile(file); // from server
if (!['jpeg', 'png', 'pdf'].includes(analysisResult.file_type)) {
  // Display error to user: Only images and PDFs allowed
  throw new Error('Only images and PDFs allowed')
}

Content moderation

Route files to specialised pipelines based on Magika’s label—images to an AI-moderation service, videos to FFmpeg, documents to OCR, and so on.
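A routing table is often enough for this. The sketch below assumes you already have the label from Magika; the pipeline names and the `route_upload` helper are hypothetical placeholders for your own services.

```python
# Map detected labels to downstream pipelines (names are placeholders).
PIPELINES = {
    "jpeg": "image-moderation",
    "png": "image-moderation",
    "mp4": "video-transcode",
    "pdf": "document-ocr",
}


def route_upload(label, default="manual-review"):
    """Pick a pipeline for a detected label, with a safe default for the rest."""
    return PIPELINES.get(label, default)


destination = route_upload("mp4")
```

Sending unrecognised labels to a manual-review queue, rather than guessing, keeps unexpected formats out of pipelines that were never designed for them.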

Pre-scan for malware

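Flag risky executables early. This is particularly useful on the server side, before any expensive processing.

# Server-side sketch (Python), reusing the `magika` instance from the Flask app above.
# Magika reports content-type labels, not extensions: 'pebin' covers Windows EXE and DLL files.
RISKY_LABELS = {'pebin', 'elf', 'macho', 'batch', 'shell', 'powershell'}

def is_risky(file_content):
    result = magika.identify_bytes(file_content)
    # Treat a failed analysis as risky instead of letting the file through
    return not result.ok or str(result.output.label) in RISKY_LABELS

# if is_risky(content): trigger a more intensive scan, or reject the file immediately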

Error handling & best practices

  1. Verify on the server. Client-side checks are a convenience, not a robust defence.
  2. Set request timeouts and size limits. Large files and cold-start containers can otherwise tie up your API endpoint.
  3. Log confidence scores. They help trace edge cases when Magika is unsure or for auditing.
  4. Update regularly. Each Magika release adds formats and improves accuracy.
  5. Layer defences. Combine Magika with antivirus software, denylist logic based on other criteria, and rate-limiting for your upload endpoint.
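As an illustration of the third practice, the following sketch logs each verification as one JSON line using the standard `logging` module. The logger name, the `log_verification` helper, and the `0.8` cut-off for low-confidence results are assumptions, not Magika features.

```python
import json
import logging

logger = logging.getLogger("upload.verify")  # hypothetical logger name


def log_verification(filename, label, score, low_score=0.8):
    """Log one verification result; low-confidence results are logged as warnings."""
    record = {"file": filename, "label": label, "score": round(score, 4)}
    level = logging.WARNING if score < low_score else logging.INFO
    logger.log(level, json.dumps(record, sort_keys=True))
    return level


level = log_verification("report.pdf", "pdf", 0.9912)
```

Logging low scores at a higher level makes it straightforward to filter for the edge cases worth reviewing during an audit.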

Transloadit alternative

Prefer an off-the-shelf SaaS? Our [🤖 /file/verify]({{ robot_links["/file/verify"] }}) Robot performs similar content-based checks in the cloud. A minimal Step looks like:

{
  "robot": "/file/verify",
  "use": ":original",
  "verify_to_be": "pdf",
  "error_on_decline": true,
  "error_msg": "File type verification failed"
}

Pair it with Uppy for a complete, front-to-back upload pipeline.

Wrap-up

Magika significantly enhances file type verification by inspecting file content rather than trusting metadata. Whether you self-host Magika or use Transloadit’s Robot, content-based checks are a straightforward way to harden your upload forms against spoofing and malware.