Verify file types in browsers with Magika

Ensuring accurate file type verification in browsers is crucial for keeping your application—and your users—safe. Relying solely on file extensions or MIME types leaves room for spoofing, while deep-learning approaches such as Google’s open-source tool Magika analyze the bytes themselves for far greater accuracy.
Challenges with traditional file type verification
Traditional techniques generally inspect only metadata:
- File extensions can be renamed in seconds.
- MIME types come from the client and are frequently wrong.
- Magic numbers work for common formats but struggle with proprietary or polyglot files.
These weak spots enable malicious uploads (evil.exe masquerading as holiday.jpg, for example) and create a real attack surface for web apps.
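To make the magic-number point concrete, here is a minimal sketch of signature sniffing using only the Python standard library. The signature table, the type names, and both helper functions are assumptions made for illustration; they are not part of Magika or any other library. It catches a renamed executable, but it knows nothing about formats missing from its table.

```python
from pathlib import PurePath
from typing import Optional

# A tiny, deliberately incomplete signature table (illustrative only).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"%PDF-": "pdf",
    b"MZ": "exe",  # DOS/PE executables start with "MZ"
}

# Extensions we would expect for each detected type (also illustrative).
EXPECTED_EXTENSIONS = {
    "png": {".png"},
    "jpeg": {".jpg", ".jpeg"},
    "pdf": {".pdf"},
    "exe": {".exe", ".dll"},
}


def sniff_signature(data: bytes) -> Optional[str]:
    """Return a type name if the data starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def extension_matches_content(filename: str, data: bytes) -> bool:
    """True only when the signature is known and agrees with the extension."""
    detected = sniff_signature(data)
    if detected is None:
        return False  # unknown content: a signature table cannot vouch for it
    suffix = PurePath(filename).suffix.lower()
    return suffix in EXPECTED_EXTENSIONS[detected]


# An executable renamed to look like a photo is caught by its leading bytes.
print(extension_matches_content("holiday.jpg", b"MZ\x90\x00\x03"))    # False
print(extension_matches_content("holiday.jpg", b"\xff\xd8\xff\xe0"))  # True
```

Every new format needs another hand-written table entry, and a file that carries valid signatures for two formats at once would still slip through, which is the gap a learned model aims to close.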
Introducing Magika: AI-powered file identification
Magika is Google’s open-source answer to that problem. A deep-learning model trained on approximately 100 million files, it recognizes over 200 binary and text file formats—even when metadata lies. Magika can identify file types within milliseconds on a single CPU with around 99% accuracy across its extensive test set.
Latest stable version: 0.6.1 (March 2024) • Requires Python 3.8+
How Magika works
- Reads a configurable byte slice from the file (fast, even on large uploads). Magika also supports stream-based analysis for very large files.
- Feeds that data into a lightweight neural network. It offers different identification modes, with HIGH_CONFIDENCE as the default.
- Returns a label, MIME type, and confidence score, typically within a few milliseconds on a single CPU.
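The first step is why detection time stays roughly constant regardless of upload size. The sketch below illustrates the idea with a hypothetical `sample_head_tail` helper that reads a small block from the start and the end of a seekable stream. Sampling both ends and the block sizes used here are assumptions for illustration only, not Magika's actual implementation.

```python
import io
from typing import BinaryIO, Tuple


def sample_head_tail(stream: BinaryIO, block_size: int = 1024) -> Tuple[bytes, bytes]:
    """Read up to block_size bytes from the start and the end of a seekable stream."""
    stream.seek(0, io.SEEK_END)
    size = stream.tell()

    stream.seek(0)
    head = stream.read(min(block_size, size))

    # For small files the tail overlaps the head, which is fine for a sketch.
    stream.seek(max(0, size - block_size))
    tail = stream.read(min(block_size, size))
    return head, tail


# A roughly 10 MB payload costs two small reads rather than one large one.
payload = io.BytesIO(b"%PDF-1.7" + b"\x00" * 10_000_000 + b"%%EOF")
head, tail = sample_head_tail(payload, block_size=16)
print(len(head), len(tail), head[:5], tail[-5:])  # 16 16 b'%PDF-' b'%%EOF'
```

Only the sampled bytes would be handed to a model, so the cost of the read does not grow with the size of the upload.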
Integrate Magika into a browser application
1. Install Magika
pip install magika
2. Create a minimal verification API (Flask)
from flask import Flask, request, jsonify
from magika import Magika

app = Flask(__name__)
magika = Magika()  # initialise once at startup

@app.route('/verify', methods=['POST'])
def verify_file():
    if 'file' not in request.files:
        return jsonify({'error': 'No file provided'}), 400
    content = request.files['file'].read()
    result = magika.identify_bytes(content)
    if not result.ok:
        return jsonify({'error': 'File analysis failed'}), 500
    return jsonify({
        'file_type': result.output.label,
        'mime_type': result.output.mime_type,
        'description': result.output.description,
        'score': result.score,
    })

if __name__ == '__main__':  # dev only; use Gunicorn in production
    app.run(debug=True)
3. Wire up the front-end
async function verifyFile(file) {
  const formData = new FormData()
  formData.append('file', file)
  try {
    const response = await fetch('/verify', {
      method: 'POST',
      body: formData,
    })
    if (!response.ok) {
      throw new Error(`HTTP error! Status: ${response.status}`)
    }
    const result = await response.json()
    return result
  } catch (error) {
    console.error('File verification request failed:', error)
    throw error
  }
}

document.getElementById('fileInput').addEventListener('change', async (event) => {
  const file = event.target.files[0]
  if (!file) {
    return // No file selected
  }
  try {
    const analysisResult = await verifyFile(file)
    document.getElementById('result').textContent =
      `Detected: ${analysisResult.file_type} (${(analysisResult.score * 100).toFixed(1)}% confidence)`
  } catch (error) {
    document.getElementById('result').textContent = 'Verification failed. See console for details.'
  }
})
4. Compare results to an allowlist
This client-side check can improve user experience, but always validate on the server too.
// Assuming 'analysisResult' is available from the previous step
const allowedFileTypes = ['pdf', 'jpeg', 'png'] // Using Magika's labels
if (!allowedFileTypes.includes(analysisResult.file_type)) {
  alert(
    `File type "${analysisResult.file_type}" is not allowed. Allowed types are: ${allowedFileTypes.join(', ')}.`,
  )
  // Or throw new Error(`File type ${analysisResult.file_type} is not permitted.`);
}
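Since the allowlist has to be enforced on the server as well, here is a minimal sketch of what that server-side decision might look like in Python. The `check_upload` function, the `ALLOWED_LABELS` set, and the `MIN_SCORE` threshold are hypothetical names and values chosen for illustration; in the Flask handler you would pass in the label and score that Magika returned.

```python
from typing import Iterable, Tuple

ALLOWED_LABELS = frozenset({"pdf", "jpeg", "png"})  # same labels as the client-side list
MIN_SCORE = 0.90  # hypothetical threshold; tune it for your own traffic


def check_upload(
    label: str,
    score: float,
    allowed: Iterable[str] = ALLOWED_LABELS,
    min_score: float = MIN_SCORE,
) -> Tuple[bool, str]:
    """Decide whether to accept an upload from its detected label and score."""
    if label not in set(allowed):
        return False, f"file type '{label}' is not allowed"
    if score < min_score:
        return False, f"confidence {score:.2f} is below {min_score:.2f}"
    return True, "ok"


print(check_upload("pdf", 0.99))    # (True, 'ok')
print(check_upload("pebin", 0.99))  # (False, "file type 'pebin' is not allowed")
print(check_upload("png", 0.42))    # (False, 'confidence 0.42 is below 0.90')
```

Keeping the decision in a pure function like this makes it easy to unit-test without loading the model, and the rejection reason can be returned to the client alongside a 4xx status.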
Magika vs. traditional methods
Feature | File extension / MIME | Magic numbers | Magika |
---|---|---|---|
Detects spoofed extensions | ❌ | ✅/Partial | ✅ |
Handles polyglot files | ❌ | ❌ | ✅ |
Coverage (200+ formats) | ❌ | ✅ | ✅ |
Speed (milliseconds per file) | ✅ | ✅ | ✅ |
Open-source & maintained | N/A | Some | ✅ |
Practical use cases
Secure upload forms
Validate uploads before saving them or passing them on for further processing. Make the accept-or-reject decision on your server, based on Magika's analysis of the bytes it actually received; the client only displays the outcome.
// Example client-side feedback based on server verification result
// const analysisResult = await verifyFile(file); // from server
if (!['jpeg', 'png', 'pdf'].includes(analysisResult.file_type)) {
  // Display error to user: Only images and PDFs allowed
  throw new Error('Only images and PDFs allowed')
}
Content moderation
Route files to specialised pipelines based on Magika’s label—images to an AI-moderation service, videos to FFmpeg, documents to OCR, and so on.
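A dispatch table is one simple way to express this routing. The sketch below is in Python to match the Flask service; the pipeline names, the `route_for` helper, and the fallback queue are all hypothetical, and the labels are assumed to match the ones your Magika version emits.

```python
from typing import Dict

# Hypothetical pipeline names keyed by detected label.
PIPELINES: Dict[str, str] = {
    "jpeg": "image-moderation",
    "png": "image-moderation",
    "mp4": "video-transcode",
    "pdf": "document-ocr",
}
DEFAULT_PIPELINE = "manual-review"  # anything unrecognised goes to a human


def route_for(label: str) -> str:
    """Pick a processing pipeline for a detected content-type label."""
    return PIPELINES.get(label, DEFAULT_PIPELINE)


for label in ("png", "mp4", "pdf", "pebin"):
    print(label, "->", route_for(label))
```

Falling back to a review queue rather than raising keeps unexpected formats visible instead of silently dropping them.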
Pre-scan for malware
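Flag risky executables early. This is particularly useful on the server side, before any expensive processing.
# Server-side sketch (conceptual); reuses `magika` and the uploaded bytes from the Flask example.
# Magika reports content-type labels rather than file extensions. The labels below are
# assumptions; check them against the content-type list of your installed version.
result = magika.identify_bytes(file_content)
risky_labels = {'pebin', 'elf', 'macho', 'batch', 'shell'}
if result.ok and result.output.label in risky_labels:
    # deep_scan(file_content)  # hypothetical helper: trigger a more intensive scan
    # Or reject the file immediately
    pass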
Error handling & best practices
- Verify on the server. Client-side checks are a convenience, not a robust defence.
- Set request timeouts. Large files or cold-start containers still need limits for your API endpoint.
- Log confidence scores. They help trace edge cases when Magika is unsure or for auditing.
- Update regularly. Each Magika release adds formats and improves accuracy.
- Layer defences. Combine Magika with antivirus software, denylist logic based on other criteria, and rate-limiting for your upload endpoint.
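For the logging advice, a structured record per upload is usually enough to trace edge cases later. The following is a minimal sketch using only the standard library; the `audit_record` and `log_detection` helpers, the logger name, and the `LOW_CONFIDENCE` threshold are hypothetical, and the label and score would come from Magika's result in the Flask handler.

```python
import json
import logging
from typing import Dict, Union

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("upload-audit")

LOW_CONFIDENCE = 0.80  # hypothetical threshold for flagging uncertain detections

Record = Dict[str, Union[str, float, bool]]


def audit_record(filename: str, label: str, score: float) -> Record:
    """Build a JSON-serialisable record of one detection."""
    return {
        "filename": filename,
        "label": label,
        "score": round(score, 4),
        "low_confidence": score < LOW_CONFIDENCE,
    }


def log_detection(filename: str, label: str, score: float) -> Record:
    """Log the record, raising the level when the detection is uncertain."""
    record = audit_record(filename, label, score)
    level = logging.WARNING if record["low_confidence"] else logging.INFO
    logger.log(level, json.dumps(record))
    return record


log_detection("invoice.pdf", "pdf", 0.9987)
log_detection("mystery.bin", "unknown", 0.31)
```

Emitting JSON keeps the records easy to filter in a log aggregator, for example by selecting every entry where `low_confidence` is true.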
Transloadit alternative
Prefer an off-the-shelf SaaS? Our [🤖 /file/verify]({{ robot_links["/file/verify"] }}) Robot performs similar content-based checks in the cloud. A minimal Step looks like:
{
  "robot": "/file/verify",
  "use": ":original",
  "verify_to_be": "pdf",
  "error_on_decline": true,
  "error_msg": "File type verification failed"
}
Pair it with Uppy for a complete, front-to-back upload pipeline.
Wrap-up
Magika significantly enhances file type verification by inspecting file content rather than trusting metadata. Whether you self-host Magika or use Transloadit’s Robot, content-based checks are a straightforward way to harden your upload forms against spoofing and malware.