How to Block Requests with Puppeteer

May 2, 2025 - 13:09

When working with Puppeteer for web automation and scraping, you'll often encounter situations where you need to block specific requests to improve performance or reduce bandwidth usage. In this guide, we'll explore effective techniques for intercepting and blocking unwanted network requests in Puppeteer.

Why Block Requests in Puppeteer?

Blocking unnecessary requests offers several benefits:

  1. Faster page loading - Skipping images, analytics, and ads can dramatically reduce load times
  2. Reduced bandwidth usage - Especially important for cloud-based scraping or serverless functions
  3. Lower memory consumption - Fewer resources to process mean less memory used
  4. Improved stability - Fewer network requests mean fewer potential points of failure

Let's dive into the practical implementations.

Basic Request Interception in Puppeteer

The foundation of request blocking in Puppeteer is the request interception mechanism. Here's a simple example:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // Enable request interception
    await page.setRequestInterception(true);

    // Add event listener to intercept requests
    page.on('request', (request) => {
      // Your logic to determine which requests to block
      // ...

      // Either abort or continue the request
      request.continue();
    });

    await page.goto('https://example.com');

    // Continue with your automation
    await page.screenshot({ path: 'example.png' });

  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
})();

This is the basic structure, but the real power comes in how you decide which requests to block.
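
If you reuse this structure across scripts, it can be wrapped in a small helper. The following is a minimal sketch rather than an official Puppeteer utility: attachBlocker and shouldBlock are hypothetical names, and the helper only relies on page.setRequestInterception, page.on, request.abort, and request.continue.

```javascript
// Hypothetical helper: enable interception and route every request
// through a caller-supplied predicate.
async function attachBlocker(page, shouldBlock) {
  await page.setRequestInterception(true);

  page.on('request', (request) => {
    // Every intercepted request must be either aborted or continued,
    // otherwise the page stalls waiting for it.
    if (shouldBlock(request)) {
      request.abort();
    } else {
      request.continue();
    }
  });
}

// Usage (inside an async function, with `page` from browser.newPage()):
//   await attachBlocker(page, (request) => request.resourceType() === 'image');
```

Keeping the decision in a separate predicate makes it easy to swap blocking rules without touching the interception plumbing.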

Blocking Requests by Resource Type

One of the most common approaches is to block requests based on resource type. Puppeteer allows you to identify the type of each request:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // Enable request interception
    await page.setRequestInterception(true);

    // Define which resources to block (created once, not on every request)
    const blockResources = ['image', 'stylesheet', 'font', 'media'];

    // Block requests by resource type
    page.on('request', (request) => {
      if (blockResources.includes(request.resourceType())) {
        request.abort();
      } else {
        request.continue();
      }
    });

    await page.goto('https://example.com');

    // Take a screenshot to see the effect
    await page.screenshot({ path: 'no-images-css.png' });

    console.log('Page loaded without images, CSS, fonts, and media!');

  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
})();

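request.resourceType() commonly returns one of the following values (recent Puppeteer versions can also report less common types such as prefetch, ping, and preflight):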

  • document
  • stylesheet
  • image
  • media
  • font
  • script
  • texttrack
  • xhr
  • fetch
  • eventsource
  • websocket
  • manifest
  • other

Choose which types to block based on your specific use case. For pure data extraction, blocking images, stylesheets, fonts, and media is often a good choice.
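
For pure data extraction, an allowlist can be easier to maintain than a blocklist, because any resource type you did not anticipate is blocked by default. This is a sketch under that assumption: ALLOWED_TYPES and shouldBlockType are illustrative names, and the set of allowed types is a guess you should adjust for the site you are scraping.

```javascript
// Resource types a data-extraction job usually still needs.
// This selection is an assumption, not a Puppeteer default.
const ALLOWED_TYPES = new Set(['document', 'script', 'xhr', 'fetch']);

// Returns true when a request of the given type should be aborted.
function shouldBlockType(resourceType, allowed = ALLOWED_TYPES) {
  return !allowed.has(resourceType);
}

// Wiring it into a request handler:
//   page.on('request', (request) => {
//     if (shouldBlockType(request.resourceType())) request.abort();
//     else request.continue();
//   });
```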

Blocking Requests by URL Pattern

Another approach is to block requests that match specific URL patterns:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    await page.setRequestInterception(true);

    // Block requests by URL pattern
    page.on('request', (request) => {
      const url = request.url();

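      // Define patterns to block. These are plain substring checks, so short
      // entries are greedy: 'ads' also matches '/downloads/' and '/uploads/',
      // and '.png' matches anywhere in the URL, not only at the end of the path.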
      const blockedPatterns = [
        'google-analytics.com',
        'googletagmanager.com',
        'doubleclick.net',
        'facebook.net',
        'ads',
        'tracking',
        '.png',
        '.jpg',
        '.jpeg',
        '.gif'
      ];

      // Check if URL contains any blocked pattern
      if (blockedPatterns.some(pattern => url.includes(pattern))) {
        request.abort();
      } else {
        request.continue();
      }
    });

    await page.goto('https://example.com');

    await page.screenshot({ path: 'no-trackers-images.png' });

    console.log('Page loaded without tracking scripts and images!');

  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
})();

This technique is particularly useful for:

  • Blocking analytics and tracking scripts
  • Preventing ad networks from loading
  • Filtering out certain file types
  • Blocking third-party resources
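
When the goal is to block whole domains, comparing against the parsed hostname avoids the false positives that substring checks can produce. The sketch below uses the standard URL class; isBlockedHost and BLOCKED_DOMAINS are hypothetical names chosen for illustration.

```javascript
const BLOCKED_DOMAINS = [
  'google-analytics.com',
  'googletagmanager.com',
  'doubleclick.net',
  'facebook.net',
];

// True when the URL's hostname is a blocked domain or one of its subdomains.
function isBlockedHost(requestUrl, blockedDomains = BLOCKED_DOMAINS) {
  let hostname;
  try {
    hostname = new URL(requestUrl).hostname;
  } catch {
    return false; // Not a parseable URL: let the request through.
  }
  return blockedDomains.some(
    (domain) => hostname === domain || hostname.endsWith('.' + domain)
  );
}

// Usage inside a request handler:
//   if (isBlockedHost(request.url())) request.abort();
//   else request.continue();
```

Because only the hostname is inspected, a first-party page whose path happens to mention a tracker name is not blocked.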

Combining Multiple Blocking Strategies

For more comprehensive request blocking, combine both resource type and URL pattern approaches:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    await page.setRequestInterception(true);

    // Combined blocking strategy
    page.on('request', (request) => {
      const resourceType = request.resourceType();
      const url = request.url();

      // Block by resource type
      const blockedResourceTypes = ['image', 'media', 'font'];

      // Block by URL pattern
      const blockedUrlPatterns = [
        'analytics',
        'tracking',
        'advertisement',
        'doubleclick',
      ];

      // Block if either condition is met
      if (
        blockedResourceTypes.includes(resourceType) ||
        blockedUrlPatterns.some(pattern => url.includes(pattern))
      ) {
        request.abort();
      } else {
        request.continue();
      }
    });

    await page.goto('https://example.com');

    // Continue with your automation

  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
})();

This combined approach gives you the most control over which requests are allowed.
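
With two kinds of rules in play, it helps to know which one fired. The following sketch returns a reason instead of a boolean and keeps a simple tally; blockReason and createTally are hypothetical helpers, and the rule lists mirror the ones in the example above.

```javascript
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);
const BLOCKED_URL_PARTS = ['analytics', 'tracking', 'advertisement', 'doubleclick'];

// Returns 'type' or 'url' when a rule matches, or null when the request is allowed.
function blockReason(resourceType, url) {
  if (BLOCKED_TYPES.has(resourceType)) return 'type';
  if (BLOCKED_URL_PARTS.some((part) => url.includes(part))) return 'url';
  return null;
}

// Counts how many requests were blocked by each rule and how many were allowed.
function createTally() {
  const counts = { type: 0, url: 0, allowed: 0 };
  return {
    counts,
    record(reason) {
      counts[reason || 'allowed'] += 1;
    },
  };
}

// Usage inside a request handler:
//   const reason = blockReason(request.resourceType(), request.url());
//   tally.record(reason);
//   if (reason) request.abort(); else request.continue();
```

Printing the counts after navigation shows whether a rule is doing useful work or blocking more than intended.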

Using Wildcards for URL Pattern Matching

For more flexible URL matching, consider using a wildcard matching library:

const puppeteer = require('puppeteer');
const wildcardMatch = require('wildcard-match');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    await page.setRequestInterception(true);

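    // By default wildcard-match treats "/" as a segment separator, so "*"
    // does not span the slashes in a full URL. Match the hostname (split on
    // ".") and the path (no splitting) separately instead.
    const isBlockedDomain = wildcardMatch(['*.analytics.com', '*.ads.*', 'tracker.*'], { separator: '.' });
    const isBlockedFile = wildcardMatch(['*.png', '*.jpg', '*.gif'], { separator: false });

    page.on('request', (request) => {
      const { hostname, pathname } = new URL(request.url());

      if (isBlockedDomain(hostname) || isBlockedFile(pathname)) {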
        request.abort();
      } else {
        request.continue();
      }
    });

    await page.goto('https://example.com');

  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
})();

You'll need to install the wildcard-match package:

npm install wildcard-match
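
If you would rather avoid a dependency, basic "*" matching is small enough to write by hand. This is a simplified sketch, not a replacement for the library: globToRegExp, makeMatcher, and shouldBlockUrl are hypothetical names, and only the "*" wildcard is supported.

```javascript
// Convert a pattern in which "*" means "any run of characters" into a RegExp.
function globToRegExp(pattern) {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&')) // escape regex syntax
    .join('.*');
  return new RegExp('^' + escaped + '$');
}

// Build a function that tests a string against several patterns.
function makeMatcher(patterns) {
  const regexes = patterns.map(globToRegExp);
  return (value) => regexes.some((regex) => regex.test(value));
}

const matchesBlockedHost = makeMatcher(['*.analytics.com', 'tracker.*']);
const matchesBlockedPath = makeMatcher(['*.png', '*.jpg', '*.gif']);

// Match hostname and path separately so query strings do not hide extensions.
function shouldBlockUrl(requestUrl) {
  const { hostname, pathname } = new URL(requestUrl);
  return matchesBlockedHost(hostname) || matchesBlockedPath(pathname);
}
```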

Measuring Performance Improvements

To understand the impact of request blocking, you can measure the performance difference:

const puppeteer = require('puppeteer');

async function measureLoadTime(blockRequests) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // Set up request interception if enabled
    if (blockRequests) {
      await page.setRequestInterception(true);
      page.on('request', (request) => {
        const resourceType = request.resourceType();
        if (['image', 'stylesheet', 'font'].includes(resourceType)) {
          request.abort();
        } else {
          request.continue();
        }
      });
    }

    // Measure performance
    const startTime = Date.now();

    await page.goto('https://example.com', {
      waitUntil: 'networkidle2',
    });

    const loadTime = Date.now() - startTime;
    return loadTime;
  } finally {
    await browser.close();
  }
}

(async () => {
  console.log('Testing performance...');

  // Without blocking
  const regularLoadTime = await measureLoadTime(false);
  console.log(`Regular load time: ${regularLoadTime}ms`);

  // With blocking
  const optimizedLoadTime = await measureLoadTime(true);
  console.log(`Optimized load time: ${optimizedLoadTime}ms`);

  // Calculate improvement
  const improvement = ((regularLoadTime - optimizedLoadTime) / regularLoadTime * 100).toFixed(2);
  console.log(`Performance improvement: ${improvement}%`);
})();

This script helps you quantify the benefits of request blocking for your specific use case.
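
A single run per configuration is sensitive to network jitter and warm-up effects, so repeating each measurement and comparing medians tends to give steadier numbers. The helpers below are a sketch; median, percentImprovement, and medianOfRuns are hypothetical names, and the default of five runs is an arbitrary choice.

```javascript
// Median of a non-empty list of numbers.
function median(values) {
  if (values.length === 0) throw new Error('median() needs at least one value');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Percentage saved relative to the baseline; 0 when the baseline is not positive.
function percentImprovement(baselineMs, optimizedMs) {
  if (baselineMs <= 0) return 0;
  return ((baselineMs - optimizedMs) / baselineMs) * 100;
}

// Run an async measurement several times, one after another, and return the median.
async function medianOfRuns(measure, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i += 1) {
    samples.push(await measure());
  }
  return median(samples);
}

// Usage with measureLoadTime from the script above:
//   const regular = await medianOfRuns(() => measureLoadTime(false));
//   const optimized = await medianOfRuns(() => measureLoadTime(true));
//   console.log(percentImprovement(regular, optimized).toFixed(2) + '%');
```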

Potential Issues with Request Interception

While request interception is powerful, be aware of these potential issues:

  1. Performance impact - Puppeteer's documentation notes that enabling request interception disables page caching, which can slow down navigation
  2. Breaking functionality - Blocking certain resources might break website functionality if you need to interact with the page
  3. Race conditions - If more than one handler tries to abort or continue the same request, Puppeteer reports that the request is already handled

For simple use cases, these issues are rarely a problem, but keep them in mind for more complex scenarios.
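
One way to reduce the risk of race conditions when several listeners are registered is to check whether a request has already been resolved before acting on it. The sketch below uses request.isInterceptResolutionHandled(), which is available in recent Puppeteer versions; resolveOnce is a hypothetical helper name.

```javascript
// Abort or continue a request only if no other handler has resolved it yet.
// Returns true when this call resolved the request, false when it was skipped.
function resolveOnce(request, shouldBlock) {
  if (request.isInterceptResolutionHandled()) {
    return false;
  }

  const action = shouldBlock(request) ? request.abort() : request.continue();

  // abort() and continue() return promises; log failures instead of
  // leaving an unhandled rejection.
  Promise.resolve(action).catch((error) => {
    console.error('Could not resolve request:', error.message);
  });
  return true;
}

// Usage:
//   page.on('request', (request) => {
//     resolveOnce(request, (r) => r.resourceType() === 'image');
//   });
```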

Simplified Solution: Using a Screenshot API

If you're using Puppeteer primarily for taking screenshots, CaptureKit Screenshot API offers a simpler approach with built-in request blocking:

curl "https://api.capturekit.dev/capture?url=https://example.com&block_resources=image,media,font&access_key=YOUR_ACCESS_KEY"

With CaptureKit, you can specify which resource types to block without managing browser instances or request interception:

Parameter        Type    Description
block_resources  string  Comma-separated list of resource types to block (e.g., "image,stylesheet,font")
block_urls       string  Comma-separated list of URL patterns to block (e.g., "analytics,tracking,advertisement")

This approach eliminates the complexity of managing browser instances and request interception directly.
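
If you call the API from a script, building the query string programmatically avoids manual escaping. This is a sketch only: buildCaptureUrl is a hypothetical helper, the endpoint and parameter names are taken from the curl example and table above, and it assumes the service accepts percent-encoded commas in list values.

```javascript
// Build a capture URL from the parameters described above.
function buildCaptureUrl({ url, accessKey, blockResources = [], blockUrls = [] }) {
  const endpoint = new URL('https://api.capturekit.dev/capture');
  endpoint.searchParams.set('url', url);
  if (blockResources.length > 0) {
    endpoint.searchParams.set('block_resources', blockResources.join(','));
  }
  if (blockUrls.length > 0) {
    endpoint.searchParams.set('block_urls', blockUrls.join(','));
  }
  endpoint.searchParams.set('access_key', accessKey);
  return endpoint.toString();
}

// Usage:
//   const captureUrl = buildCaptureUrl({
//     url: 'https://example.com',
//     accessKey: 'YOUR_ACCESS_KEY',
//     blockResources: ['image', 'media', 'font'],
//   });
```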

Conclusion

Blocking unnecessary requests in Puppeteer can significantly improve performance, reduce bandwidth usage, and make your automation scripts more efficient. By selectively filtering requests based on resource type, URL patterns, or a combination of both, you can focus on the content that matters for your specific use case.

Whether you implement request blocking directly with Puppeteer or use a service like CaptureKit API, this technique should be part of your web automation toolkit.

Happy optimizing!