How I Fixed the Mystery of Corrupted File Downloads from AWS S3


Jun 2, 2025 - 11:30

When your users report "corrupted files" but everything looks fine on your end

Last month, I ran into one of those bugs that makes you question everything you think you know about web development. Users were complaining that downloaded files from our document management system were "corrupted" and wouldn't open. The twist? The files were perfectly fine in our AWS S3 bucket.

The Problem That Started It All

Picture this: you're building an application where users upload important KYC documents (driver's licenses, bank statements, tax documents). Everything seems to work perfectly during testing. Files upload fine, they're stored securely in S3, and when you download them through your admin panel, they open without issues.

Then the support tickets start rolling in:

"I can't open the driver's license I just downloaded"

"The bank statement file seems corrupted"

"Why is my tax document showing as a 'Blob' file?"

My first instinct was to blame the users (classic developer move, I know). Maybe they're using old browsers? Maybe their antivirus is interfering? But when I started investigating, I realized the problem was much closer to home.

The Investigation

I pulled up our file download function—a simple, straightforward piece of code that I'd written dozens of times before:

export async function downloadBlobFile(
  blobUrl: string,
  fileName: string
): Promise<void> {
  try {
    const response = await fetch(blobUrl);
    const blob = await response.blob();
    const url = window.URL.createObjectURL(blob);

    const a = document.createElement('a');
    a.href = url;
    a.download = fileName;
    document.body.appendChild(a);
    a.click();
    // cleanup...
  } catch (err) {
    // error handling...
  }
}

Looks innocent enough, right? The issue became clear when I started examining what was actually happening:

  1. Our S3 objects had no file extensions - Files were stored as doc-12381-0530103115-bank-statement instead of bank-statement.pdf
  2. S3 wasn't setting Content-Type headers - Everything was coming back as application/octet-stream or no content type at all
  3. Files were being saved as .bin files - Because the browser had no idea what they actually were

The "corruption" wasn't corruption at all—it was a case of mistaken identity. A PDF being saved as a .bin file naturally won't open in a PDF viewer.

My First (Failed) Approach

My initial thought was simple: "I'll just add the file extension to the filename when saving!"

But how do you know if a file should be .pdf, .jpg, or .docx when the server isn't telling you?

I tried extracting it from the URL, but our presigned URLs looked like this:

https://uploads.s3.amazonaws.com/bank-statement-12381-0530103115?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...

No extension in sight.
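
To make that dead end concrete, here is a minimal sketch of the URL approach. The helper name extensionFromUrl is hypothetical (it is not part of the original utility), and the sample URL is a shortened version of the presigned URL above.

```typescript
// Hypothetical helper: returns the extension found in a URL's path, or null.
function extensionFromUrl(rawUrl: string): string | null {
  // URL.pathname excludes the query string, so the X-Amz-* parameters are ignored.
  const { pathname } = new URL(rawUrl);
  const lastSegment = pathname.split('/').pop() ?? '';
  const dotIndex = lastSegment.lastIndexOf('.');
  // No dot, a leading dot, or a trailing dot means there is no usable extension.
  if (dotIndex <= 0 || dotIndex === lastSegment.length - 1) return null;
  return lastSegment.slice(dotIndex).toLowerCase();
}

// Shortened, illustrative version of the presigned URL shown above.
const presigned =
  'https://uploads.s3.amazonaws.com/bank-statement-12381-0530103115?X-Amz-Algorithm=AWS4-HMAC-SHA256';
console.log(extensionFromUrl(presigned)); // null
```

With object keys like ours there is simply nothing in the path to extract, so the URL cannot answer the question.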

I considered asking users to specify the file type when uploading, but that felt like admitting defeat. There had to be a better way.

The Eureka Moment

Then I remembered something from a few blog posts and articles I'd read: file signatures (or "magic numbers"). Many binary file formats begin with a characteristic byte sequence that identifies the format, regardless of the file's name or extension.

  • PDFs start with %PDF (hex: 25 50 44 46)
  • JPEGs start with FF D8 FF
  • PNGs start with 89 50 4E 47

I could read the first few bytes of each downloaded file, identify its actual type, and assign the correct extension!
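
As a rough sketch of the idea, the three signatures listed above can be kept in a small table and compared against the start of a byte array. The names SIGNATURES, startsWith, and identify are made up for this illustration and are not taken from the final utility.

```typescript
// The signatures listed above, expressed as byte prefixes.
const SIGNATURES: { name: string; prefix: number[] }[] = [
  { name: 'pdf', prefix: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { name: 'jpeg', prefix: [0xff, 0xd8, 0xff] },
  { name: 'png', prefix: [0x89, 0x50, 0x4e, 0x47] },
];

// True when `bytes` begins with every value in `prefix`.
function startsWith(bytes: Uint8Array, prefix: number[]): boolean {
  if (bytes.length < prefix.length) return false;
  return prefix.every((value, i) => bytes[i] === value);
}

// Returns the name of the first matching signature, or null if none match.
function identify(bytes: Uint8Array): string | null {
  const match = SIGNATURES.find((sig) => startsWith(bytes, sig.prefix));
  return match ? match.name : null;
}

// "%PDF-" as raw bytes
const header = Uint8Array.from([0x25, 0x50, 0x44, 0x46, 0x2d]);
console.log(identify(header)); // "pdf"
```

A table like this keeps each new format to a one-line addition, at the cost of only supporting signatures that sit at offset 0.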

Building the Solution

Here's the approach I took:

  1. Download the file as a blob (needed anyway, because browsers ignore the download attribute on cross-origin URLs)
  2. Check if the server provided a useful Content-Type
  3. If not, read the first 20 bytes and analyze the file signature
  4. Map the signature to the correct MIME type and extension
  5. Download with the proper filename
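
Steps 2 and 5 are small enough to sketch as pure functions. Both helper names below are hypothetical, and treating binary/octet-stream as "not useful" is an assumption based on S3 typically reporting that type when none was set at upload.

```typescript
// Step 2 (sketch): a missing or generic Content-Type tells us nothing about the file.
function isUsefulContentType(contentType: string | null): boolean {
  if (!contentType) return false;
  // Drop parameters such as "; charset=utf-8" before comparing.
  const base = (contentType.split(';')[0] ?? '').trim().toLowerCase();
  return (
    base !== '' &&
    base !== 'application/octet-stream' &&
    base !== 'binary/octet-stream' // assumption: S3's fallback type
  );
}

// Step 5 (sketch): append the detected extension unless the name already ends with it.
function withExtension(fileName: string, extension: string): string {
  return fileName.toLowerCase().endsWith(extension.toLowerCase())
    ? fileName
    : `${fileName}${extension}`;
}

console.log(isUsefulContentType('application/octet-stream')); // false
console.log(withExtension('bank-statement', '.pdf')); // "bank-statement.pdf"
```

In a browser, the value passed to isUsefulContentType would come from response.headers.get('content-type'), which returns null when the header is absent.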

The implementation handles the most common file types we encounter:

async function detectFileType(
  blob: Blob
): Promise<{ mimeType: string; extension: string } | null> {
  // Read only the first 20 bytes; the signature lives at the start of the file
  const arrayBuffer = await blob.slice(0, 20).arrayBuffer();
  const bytes = new Uint8Array(arrayBuffer);

  // PDF signature: "%PDF"
  if (
    bytes[0] === 0x25 &&
    bytes[1] === 0x50 &&
    bytes[2] === 0x44 &&
    bytes[3] === 0x46
  ) {
    return { mimeType: 'application/pdf', extension: '.pdf' };
  }

  // JPEG signature
  if (bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) {
    return { mimeType: 'image/jpeg', extension: '.jpg' };
  }

  // ... more file types

  // No known signature matched
  return null;
}

The Results

The transformation was immediate and dramatic:

  • Zero corruption complaints since deploying the fix
  • User satisfaction improved - files now open in the correct applications automatically
  • Support tickets dropped by about 60% for file-related issues
  • Developer confidence restored - no more mysterious "it works on my machine" scenarios

Users went from downloading bank-statement.blob files that wouldn't open, to getting bank-statement.pdf files that opened immediately in their PDF viewers.

Current Limitations (And Future Plans)

While this solution works great for our use case, it's not perfect:

What it handles well:

  • Common image formats (JPEG, PNG, GIF, WebP)
  • PDF documents
  • Basic Office documents (as ZIP files)
  • Text-based formats (JSON, XML, HTML)
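
For illustration, here is how a few of these formats could be recognized from their well-known signatures. This is a sketch with hypothetical helper names, not the code from the utility. Note that DOCX, XLSX, and PPTX are ZIP containers, so a signature check alone reports them as ZIP and cannot tell them apart.

```typescript
// True when `expected` appears in `bytes` starting at `offset`.
function hasBytes(bytes: Uint8Array, offset: number, expected: number[]): boolean {
  if (bytes.length < offset + expected.length) return false;
  return expected.every((value, i) => bytes[offset + i] === value);
}

// Hypothetical detector for GIF, WebP, and ZIP-based files.
function detectContainerOrImage(bytes: Uint8Array): string | null {
  // GIF: "GIF8" (covers both GIF87a and GIF89a)
  if (hasBytes(bytes, 0, [0x47, 0x49, 0x46, 0x38])) return 'image/gif';
  // WebP: "RIFF" at offset 0 and "WEBP" at offset 8
  if (
    hasBytes(bytes, 0, [0x52, 0x49, 0x46, 0x46]) &&
    hasBytes(bytes, 8, [0x57, 0x45, 0x42, 0x50])
  ) {
    return 'image/webp';
  }
  // ZIP local file header: "PK" 03 04 (also the start of DOCX/XLSX/PPTX)
  if (hasBytes(bytes, 0, [0x50, 0x4b, 0x03, 0x04])) return 'application/zip';
  return null;
}
```

The WebP check shows why reading more than the first four bytes matters: the distinguishing marker sits at offset 8, after the RIFF size field.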

What it doesn't handle yet:

  • Audio/video files (MP3, MP4, etc.)
  • More exotic document formats
  • Encrypted or password-protected files
  • Very large files (signature detection only reads the first 20 bytes, but the whole file is still buffered in memory as a blob before it is saved)
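
If audio and video support gets added, a first pass might look something like the sketch below. It is an assumption about a possible extension, not existing code: MP4-family files carry "ftyp" at byte offset 4, and MP3 files that begin with an ID3v2 tag start with "ID3". MP3 files without that tag, and other "ftyp"-based formats such as MOV or M4A, would need extra checks.

```typescript
// True when the ASCII string `ascii` appears in `bytes` starting at `offset`.
function asciiAt(bytes: Uint8Array, offset: number, ascii: string): boolean {
  if (bytes.length < offset + ascii.length) return false;
  for (let i = 0; i < ascii.length; i++) {
    if (bytes[offset + i] !== ascii.charCodeAt(i)) return false;
  }
  return true;
}

// Hypothetical detector; the mapping to .mp4 is a simplification, since
// several related formats also use the "ftyp" marker.
function detectMedia(
  bytes: Uint8Array
): { mimeType: string; extension: string } | null {
  if (asciiAt(bytes, 4, 'ftyp')) {
    return { mimeType: 'video/mp4', extension: '.mp4' };
  }
  if (asciiAt(bytes, 0, 'ID3')) {
    return { mimeType: 'audio/mpeg', extension: '.mp3' };
  }
  return null;
}
```

Both markers fall inside the first 20 bytes, so the existing read size would be enough for this check.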

Performance considerations:

  • Adds a small overhead to read file signatures
  • Uses more memory than direct downloads
  • Still requires the blob method for CORS compatibility

Lessons Learned

This experience taught me a few valuable lessons:

  1. User complaints about "corruption" aren't always about actual file corruption - sometimes it's about incorrect file associations
  2. Cloud storage won't necessarily hold useful file metadata - S3 serves whatever Content-Type was recorded at upload, so you can't assume the header will be present or accurate
  3. File extensions matter more than we think - they're not just cosmetic; they determine how operating systems handle files
  4. Sometimes the old-school approaches (like file signatures) are still the best solutions to modern problems

Want to Use This Solution?

I've open-sourced this file download utility because I suspect other developers are running into similar issues. The complete implementation handles edge cases, includes extensive logging for debugging, and supports the most common file types you'll encounter in web applications.

You can find it in this GitHub Gist with full documentation and usage examples.

Have you run into similar file download mysteries in your applications? I'd love to hear about your experiences and solutions in the comments below.

Building reliable file handling in web applications is trickier than it looks. Sometimes the solution isn't more sophisticated cloud configuration—it's going back to the fundamentals of how computers identify files.