How to Block Requests with Puppeteer

When working with Puppeteer for web automation and scraping, you'll often encounter situations where you need to block specific requests to improve performance or reduce bandwidth usage. In this guide, we'll explore effective techniques for intercepting and blocking unwanted network requests in Puppeteer.
Why Block Requests in Puppeteer?
Blocking unnecessary requests offers several benefits:
- Faster page loading - Skipping images, analytics, and ads can dramatically reduce load times
- Reduced bandwidth usage - Especially important for cloud-based scraping or serverless functions
- Lower memory consumption - Fewer resources to process mean less memory used
- Improved stability - Fewer network requests mean fewer potential points of failure
Let's dive into the practical implementations.
Basic Request Interception in Puppeteer
The foundation of request blocking in Puppeteer is the request interception mechanism. Here's a simple example:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Enable request interception
    await page.setRequestInterception(true);
    // Add event listener to intercept requests
    page.on('request', (request) => {
      // Your logic to determine which requests to block
      // ...
      // Either abort or continue the request
      request.continue();
    });
    await page.goto('https://example.com');
    // Continue with your automation
    await page.screenshot({ path: 'example.png' });
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
})();
This is the basic structure, but the real power comes in how you decide which requests to block.
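One way to keep that decision logic separate from the wiring is to wrap it in a small factory. The sketch below uses a hypothetical createRequestHandler helper (it is not part of Puppeteer) that takes a predicate and makes sure every request is either aborted or continued, because an intercepted request that is never resolved will stall.

```javascript
// Hypothetical helper (not a Puppeteer API): turns a predicate into a request handler.
// `shouldBlock` receives the intercepted request and returns true to abort it.
function createRequestHandler(shouldBlock) {
  return (request) => {
    let block = false;
    try {
      block = Boolean(shouldBlock(request));
    } catch (error) {
      // If the rule itself throws, let the request through instead of leaving it stalled
      console.error('Blocking rule failed:', error);
    }
    // abort() and continue() return promises; ignore rejections from requests
    // that were cancelled (for example by a navigation) while we were deciding
    const result = block ? request.abort() : request.continue();
    Promise.resolve(result).catch(() => {});
  };
}

// Usage (assumes `page` is a Puppeteer page):
//   await page.setRequestInterception(true);
//   page.on('request', createRequestHandler((req) => req.resourceType() === 'image'));
```

The examples in the rest of this guide inline the same abort-or-continue decision directly in the listener; the factory is just a convenient way to reuse it.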
Blocking Requests by Resource Type
One of the most common approaches is to block requests based on resource type. Puppeteer allows you to identify the type of each request:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Enable request interception
    await page.setRequestInterception(true);
    // Block requests by resource type
    page.on('request', (request) => {
      const resourceType = request.resourceType();
      // Define which resources to block
      const blockResources = ['image', 'stylesheet', 'font', 'media'];
      if (blockResources.includes(resourceType)) {
        request.abort();
      } else {
        request.continue();
      }
    });
    await page.goto('https://example.com');
    // Take a screenshot to see the effect
    await page.screenshot({ path: 'no-images-css.png' });
    console.log('Page loaded without images, CSS, fonts, and media!');
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
})();
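Puppeteer reports resource types such as the following (recent versions add a few more, for example prefetch, ping, and preflight):
- document
- stylesheet
- image
- media
- font
- script
- texttrack
- xhr
- fetch
- eventsource
- websocket
- manifest
- other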
Choose which types to block based on your specific use case. For pure data extraction, blocking images, stylesheets, fonts, and media is often a good choice.
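To make that choice explicit per use case, you could keep the blocked types in named sets. The sketch below is only an illustration: BLOCK_PROFILES and blockResourceTypes are hypothetical names, and the groupings are assumptions you should adjust for your own pages. It also passes 'blockedbyclient' to abort(), one of the error codes Puppeteer accepts, so blocked requests are easy to tell apart from genuine failures.

```javascript
// Hypothetical profiles; the names and groupings are illustrative, not a Puppeteer feature.
const BLOCK_PROFILES = {
  // Pure data extraction: layout does not matter, so drop everything visual
  dataExtraction: new Set(['image', 'stylesheet', 'font', 'media']),
  // Screenshots: keep images, stylesheets, and fonts so the page still renders correctly
  screenshots: new Set(['media']),
};

// Returns a listener for page.on('request', ...) that blocks the given resource types
function blockResourceTypes(blockedTypes) {
  return (request) => {
    if (blockedTypes.has(request.resourceType())) {
      // 'blockedbyclient' is an accepted abort error code; the default is 'failed'
      return request.abort('blockedbyclient');
    }
    return request.continue();
  };
}

// Usage (assumes `page` is a Puppeteer page with interception enabled):
//   page.on('request', blockResourceTypes(BLOCK_PROFILES.dataExtraction));
```

Using a Set built once, instead of an array created inside the listener, avoids re-allocating the list for every request.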
Blocking Requests by URL Pattern
Another approach is to block requests that match specific URL patterns:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setRequestInterception(true);
    // Block requests by URL pattern
    page.on('request', (request) => {
      const url = request.url();
      // Define patterns to block
      const blockedPatterns = [
        'google-analytics.com',
        'googletagmanager.com',
        'doubleclick.net',
        'facebook.net',
        'ads',
        'tracking',
        '.png',
        '.jpg',
        '.jpeg',
        '.gif'
      ];
      // Check if URL contains any blocked pattern
      if (blockedPatterns.some(pattern => url.includes(pattern))) {
        request.abort();
      } else {
        request.continue();
      }
    });
    await page.goto('https://example.com');
    await page.screenshot({ path: 'no-trackers-images.png' });
    console.log('Page loaded without tracking scripts and images!');
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
})();
This technique is particularly useful for:
- Blocking analytics and tracking scripts
- Preventing ad networks from loading
- Filtering out certain file types
- Blocking third-party resources
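Plain substring checks are easy to write, but short patterns can match more than intended: 'ads' also appears in "downloads", and '.png' can appear inside a query string. A stricter variant, sketched here with two hypothetical helpers (isBlockedHost and hasBlockedExtension are illustrative names), parses the URL and compares the hostname and the path separately.

```javascript
// Returns true when the URL's hostname equals a blocked domain or is a subdomain of it
function isBlockedHost(url, blockedDomains) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // Not a parseable URL; leave it alone
  }
  return blockedDomains.some(
    (domain) => hostname === domain || hostname.endsWith('.' + domain)
  );
}

// Returns true when the URL's path (ignoring the query string) ends with a blocked extension
function hasBlockedExtension(url, extensions) {
  let pathname;
  try {
    pathname = new URL(url).pathname.toLowerCase();
  } catch {
    return false;
  }
  return extensions.some((ext) => pathname.endsWith(ext));
}

// Usage inside a request listener:
//   const url = request.url();
//   if (isBlockedHost(url, ['doubleclick.net']) || hasBlockedExtension(url, ['.png'])) {
//     request.abort();
//   } else {
//     request.continue();
//   }
```

The trade-off is that you have to list domains explicitly, so this works best for a known set of trackers and ad networks rather than loose keywords.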
Combining Multiple Blocking Strategies
For more comprehensive request blocking, combine both resource type and URL pattern approaches:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setRequestInterception(true);
    // Combined blocking strategy
    page.on('request', (request) => {
      const resourceType = request.resourceType();
      const url = request.url();
      // Block by resource type
      const blockedResourceTypes = ['image', 'media', 'font'];
      // Block by URL pattern
      const blockedUrlPatterns = [
        'analytics',
        'tracking',
        'advertisement',
        'doubleclick',
      ];
      // Block if either condition is met
      if (
        blockedResourceTypes.includes(resourceType) ||
        blockedUrlPatterns.some(pattern => url.includes(pattern))
      ) {
        request.abort();
      } else {
        request.continue();
      }
    });
    await page.goto('https://example.com');
    // Continue with your automation
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
})();
This combined approach gives you the most control over which requests are allowed.
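One thing to watch with combined rules is that a broad URL pattern such as 'tracking' can also match the page you are navigating to (for example an order-tracking page), which would make page.goto() fail. The sketch below shows one possible guard: a hypothetical createFilter function that builds the rule once and never blocks navigation requests, using Puppeteer's request.isNavigationRequest() method.

```javascript
// Hypothetical factory (illustrative name): builds a combined blocking rule once,
// instead of re-creating the lists for every request.
function createFilter({ blockedTypes = [], blockedUrlPatterns = [] } = {}) {
  const types = new Set(blockedTypes);
  return function shouldBlock(request) {
    // Never block navigations, otherwise page.goto() itself can be aborted
    if (request.isNavigationRequest()) return false;
    if (types.has(request.resourceType())) return true;
    const url = request.url();
    return blockedUrlPatterns.some((pattern) => url.includes(pattern));
  };
}

// Usage (assumes `page` is a Puppeteer page with interception enabled):
//   const shouldBlock = createFilter({
//     blockedTypes: ['image', 'media', 'font'],
//     blockedUrlPatterns: ['analytics', 'tracking', 'advertisement', 'doubleclick'],
//   });
//   page.on('request', (request) => {
//     if (shouldBlock(request)) request.abort();
//     else request.continue();
//   });
```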
Using Wildcards for URL Pattern Matching
For more flexible URL matching, consider using a wildcard matching library:
const puppeteer = require('puppeteer');
const wildcardMatch = require('wildcard-match');
(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setRequestInterception(true);
    // Create matchers for different patterns.
    // By default wildcard-match splits its input into segments on "/", so "*"
    // would not match across the slashes of a full URL. separator: false makes
    // "*" match any run of characters.
    const isBlockedDomain = wildcardMatch(['*.analytics.com', '*.ads.*', 'tracker.*'], { separator: false });
    const isBlockedFile = wildcardMatch(['*.png', '*.jpg', '*.gif'], { separator: false });
    page.on('request', (request) => {
      // Match domain patterns against the hostname and file patterns against the path
      const { hostname, pathname } = new URL(request.url());
      if (isBlockedDomain(hostname) || isBlockedFile(pathname)) {
        request.abort();
      } else {
        request.continue();
      }
    });
    await page.goto('https://example.com');
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await browser.close();
  }
})();
You'll need to install the wildcard-match package:
npm install wildcard-match
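If you would rather not add a dependency for simple patterns, a small converter from "*" wildcards to regular expressions is often enough. This is a minimal sketch with hypothetical helper names (globToRegExp and makeGlobMatcher); it supports only "*" and is not a replacement for everything wildcard-match does.

```javascript
// Converts a pattern where "*" means "any run of characters" into an anchored RegExp
function globToRegExp(glob) {
  // Escape regex metacharacters except "*", then expand "*" to ".*"
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Returns a predicate that is true when the input matches any of the patterns
function makeGlobMatcher(globs) {
  const regexes = globs.map(globToRegExp);
  return (input) => regexes.some((regex) => regex.test(input));
}

// Usage inside a request listener:
//   const isBlockedDomain = makeGlobMatcher(['*.analytics.com', 'tracker.*']);
//   const { hostname } = new URL(request.url());
//   if (isBlockedDomain(hostname)) request.abort(); else request.continue();
```

Because the patterns are anchored at both ends, '*.analytics.com' matches www.analytics.com but not analytics.com.evil.io.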
Measuring Performance Improvements
To understand the impact of request blocking, you can measure the performance difference:
const puppeteer = require('puppeteer');
async function measureLoadTime(blockRequests) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Set up request interception if enabled
    if (blockRequests) {
      await page.setRequestInterception(true);
      page.on('request', (request) => {
        const resourceType = request.resourceType();
        if (['image', 'stylesheet', 'font'].includes(resourceType)) {
          request.abort();
        } else {
          request.continue();
        }
      });
    }
    // Measure performance
    const startTime = Date.now();
    await page.goto('https://example.com', {
      waitUntil: 'networkidle2',
    });
    const loadTime = Date.now() - startTime;
    return loadTime;
  } finally {
    await browser.close();
  }
}
(async () => {
  console.log('Testing performance...');
  // Without blocking
  const regularLoadTime = await measureLoadTime(false);
  console.log(`Regular load time: ${regularLoadTime}ms`);
  // With blocking
  const optimizedLoadTime = await measureLoadTime(true);
  console.log(`Optimized load time: ${optimizedLoadTime}ms`);
  // Calculate improvement
  const improvement = ((regularLoadTime - optimizedLoadTime) / regularLoadTime * 100).toFixed(2);
  console.log(`Performance improvement: ${improvement}%`);
})();
This script helps you quantify the benefits of request blocking for your specific use case.
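A single run per configuration can be skewed by network jitter or a slow first connection, so it may be worth repeating each measurement and comparing medians. The helpers below are a sketch with hypothetical names (median, percentImprovement, medianOfRuns); the default of five runs is an arbitrary assumption.

```javascript
// Median of a non-empty list of numbers
function median(values) {
  if (values.length === 0) throw new Error('median() needs at least one value');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Percentage by which the optimized time improves on the baseline
function percentImprovement(baselineMs, optimizedMs) {
  if (baselineMs <= 0) return 0; // Avoid dividing by zero
  return ((baselineMs - optimizedMs) / baselineMs) * 100;
}

// Runs an async measurement several times, one after another, and returns the median
async function medianOfRuns(measure, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    samples.push(await measure());
  }
  return median(samples);
}

// Usage with the measureLoadTime function from the script above:
//   const regular = await medianOfRuns(() => measureLoadTime(false));
//   const optimized = await medianOfRuns(() => measureLoadTime(true));
//   console.log(`Improvement: ${percentImprovement(regular, optimized).toFixed(2)}%`);
```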
Potential Issues with Request Interception
While request interception is powerful, be aware of these potential issues:
- Performance impact - In older versions of Puppeteer, request interception disables the native browser cache, which can slow down navigation
- Breaking functionality - Blocking certain resources might break website functionality if you need to interact with the page
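- Race conditions - Each intercepted request must be resolved exactly once; if more than one listener (yours or a library's) tries to abort or continue the same request, the later call fails because the request is already handled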
For simple use cases, these issues are rarely a problem, but keep them in mind for more complex scenarios.
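If several listeners may end up handling the same request, a defensive handler can check whether the request has already been resolved before touching it. The sketch below assumes a recent Puppeteer version that provides request.isInterceptResolutionHandled(); guardedHandler is a hypothetical name.

```javascript
// Hypothetical wrapper: resolves a request only if no other listener has done so already
function guardedHandler(shouldBlock) {
  return async (request) => {
    // Another listener may already have aborted, continued, or responded
    if (request.isInterceptResolutionHandled()) return;
    try {
      if (shouldBlock(request)) {
        await request.abort();
      } else {
        await request.continue();
      }
    } catch (error) {
      // Losing a race with another handler or a closing page is not fatal here
      console.warn(`Could not resolve ${request.url()}: ${error.message}`);
    }
  };
}

// Usage (assumes `page` is a Puppeteer page with interception enabled):
//   page.on('request', guardedHandler((req) => req.resourceType() === 'image'));
```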
Simplified Solution: Using a Screenshot API
If you're using Puppeteer primarily for taking screenshots, CaptureKit Screenshot API offers a simpler approach with built-in request blocking:
curl "https://api.capturekit.dev/capture?url=https://example.com&block_resources=image,media,font&access_key=YOUR_ACCESS_KEY"
With CaptureKit, you can specify which resource types to block without managing browser instances or request interception:
| Parameter | Type | Description |
|---|---|---|
| block_resources | string | Comma-separated list of resource types to block (e.g., "image,stylesheet,font") |
| block_urls | string | Comma-separated list of URL patterns to block (e.g., "analytics,tracking,advertisement") |
This approach eliminates the complexity of managing browser instances and request interception directly.
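If you call the API from Node.js rather than curl, building the query string with URLSearchParams takes care of the encoding. This is a sketch that uses only the endpoint and parameter names shown above; buildCaptureUrl is a hypothetical helper, and it only constructs the URL without sending a request.

```javascript
// Hypothetical helper: builds a capture URL from the parameters documented above
function buildCaptureUrl({ url, accessKey, blockResources = [], blockUrls = [] }) {
  const endpoint = new URL('https://api.capturekit.dev/capture');
  endpoint.searchParams.set('url', url);
  if (blockResources.length > 0) {
    endpoint.searchParams.set('block_resources', blockResources.join(','));
  }
  if (blockUrls.length > 0) {
    endpoint.searchParams.set('block_urls', blockUrls.join(','));
  }
  endpoint.searchParams.set('access_key', accessKey);
  return endpoint.toString();
}

// Usage:
//   const captureUrl = buildCaptureUrl({
//     url: 'https://example.com',
//     accessKey: 'YOUR_ACCESS_KEY',
//     blockResources: ['image', 'media', 'font'],
//   });
```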
Conclusion
Blocking unnecessary requests in Puppeteer can significantly improve performance, reduce bandwidth usage, and make your automation scripts more efficient. By selectively filtering requests based on resource type, URL patterns, or a combination of both, you can focus on the content that matters for your specific use case.
Whether you implement request blocking directly with Puppeteer or use a service like CaptureKit API, this technique should be part of your web automation toolkit.
Happy optimizing!