Scrapebase + Permit.io: Web Scraping with API-First Authorization


May 5, 2025 - 06:36

This is a submission for the Permit.io Authorization Challenge: Permissions Redefined

What I Built

I built Scrapebase - a web scraping service with tiered access controls that demonstrates API-first authorization using Permit.io. The project separates business logic from authorization concerns using Permit.io's policy-as-code approach.

In many applications, authorization is implemented as an afterthought, resulting in security vulnerabilities and technical debt. Scrapebase demonstrates how to build with authorization as a first-class concern from day one.

Key Features

  • Tiered Service Levels: Free, Pro, and Admin tiers with different capabilities
  • API Key Authentication: Simple authentication using API keys
  • Role-Based Access Control: Permissions managed through Permit.io
  • Domain Blacklist System: Resource-level restrictions for sensitive domains
  • Text Processing: Basic and advanced text processing with role-based restrictions

How It Works

The core authentication and authorization flow:

  1. User sends request with x-api-key header
  2. permitAuth middleware intercepts the request
  3. Middleware maps API key to user role (free_user, pro_user, or admin)
  4. User is synced to Permit.io
  5. Permission check runs against Permit.io cloud PDP
  6. Request is allowed or denied based on policy decision
┌──────────┐    ┌───────────────┐    ┌────────────┐    ┌──────────────┐
│  Client  │───▶│ Scrapebase API│───▶│permitAuth  │───▶│  Permit.io   │
│          │◀───│               │◀───│ middleware │◀───│  Cloud PDP   │
└──────────┘    └───────────────┘    └────────────┘    └──────────────┘
     │                                                        ▲
     │                                                        │
     └────────────────────────────────────────────────────────┘
       Permission policies defined in Permit.io dashboard
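
Steps 1 to 3 of the flow can be sketched as a small lookup, shown here as a minimal illustration rather than the project's actual code. The `KeyTable` shape, the `resolveUser` name, and the user identifiers are assumptions made for this example; only the role names and the demo API keys come from the article.

```typescript
// Role names as listed in the article.
type Role = "free_user" | "pro_user" | "admin";

interface ResolvedUser {
  key: string; // identifier that is later synced to Permit.io
  role: Role;
}

// Hypothetical lookup table: API key -> user. In the project the keys
// come from environment variables such as ADMIN_API_KEY.
type KeyTable = Record<string, ResolvedUser>;

// Returns the user for a known API key, or null for a missing/unknown key.
function resolveUser(apiKey: string | undefined, table: KeyTable): ResolvedUser | null {
  if (!apiKey) return null;
  // Own-property check so inherited names like "toString" never match.
  if (!Object.prototype.hasOwnProperty.call(table, apiKey)) return null;
  const user = table[apiKey];
  return user === undefined ? null : user;
}

// Demo table using the test keys from the article (user ids are made up).
const demoKeys: KeyTable = {
  "2025DEVChallenge_free": { key: "demo_free", role: "free_user" },
  "2025DEVChallenge_admin": { key: "demo_admin", role: "admin" },
};
```

Returning null for unknown keys lets the middleware answer with a 401 before any call to the PDP is made.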

Demo

Scrapebase Demo

You can test the API with the following requests:

# Test with free user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_free" \
  -d '{"url": "https://example.com"}'

# Test with admin user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_admin" \
  -d '{"url": "https://example.com", "advanced": true}'

Project Repo

0xtamizh / scrapebase-permit-IO

Scrapebase with Permit.io Authorization

A powerful web scraping API with fine-grained authorization controls powered by Permit.io. This project demonstrates how to implement sophisticated authorization patterns in a real-world API service.

Features

  • Tiered Access Control: Different permissions for Free, Pro, and Admin users
  • Resource-Based Authorization: Control access based on target domains
  • Rate Limiting: Tier-specific rate limits enforced through policies
  • Advanced Scraping Features: Premium capabilities restricted to Pro users
  • Real-time Policy Updates: Changes to permissions take effect immediately
  • Audit Logging: Track all authorization decisions

Quick Start

  1. Clone the repository:
   git clone https://github.com/yourusername/scrapebase-permit
   cd scrapebase-permit
  2. Install dependencies:
   npm install
  3. Set up environment variables:
   cp .env.example .env

   Edit .env with your Permit.io API key and other configurations:

   PERMIT_API_KEY=your_permit_api_key
   ADMIN_API_KEY=2025DEVChallenge_admin
   USER_API_KEY=2025DEVChallenge_user
  4. Start the development server:
   npm run dev
  5. Visit http://localhost:3000 to access the testing UI

Testing the Authorization Features

Test Credentials

Admin User:

  • Username: admin
  • API Key: 2025DEVChallenge_admin

Regular…

My Journey

The Problem with Traditional Authorization

Traditional approaches to authorization often result in permission checks scattered throughout application code, creating maintenance nightmares and security risks. When I started this project, I wanted to demonstrate how modern applications can embrace externalized authorization as a core architectural principle.

I chose to build a web scraping service because it presents meaningful access control requirements:

  1. Tiered service levels that mirror real-world SaaS subscription models
  2. Administrative functions that require elevated permissions
  3. Resource-based restrictions through a domain blacklist system

The Power of API-First Authorization

The key insight that drove this project was the separation of concerns: business logic should be distinct from authorization decisions. By using Permit.io, I was able to:

  1. Define all permission policies in one place
  2. Enforce consistent access control across all endpoints
  3. Update policies without changing application code

The implementation was straightforward - here's the core middleware that powers the authorization flow:

// Read the API key sent by the client
const apiKey = req.headers['x-api-key'];
let userKey;
let tier;

// Map API key to user role
switch (apiKey) {
  case process.env.ADMIN_API_KEY:
    userKey = '2025DEVChallenge_admin';
    tier = 'admin';
    break;
  // ...other keys
}

// Build the user record once so the same object is synced and checked
const user = {
  key: userKey,
  email: `${userKey}@scrapebase.xyz`,
  attributes: { tier, roles: [tier] }
};

// Sync user to Permit.io
await permit.api.syncUser(user);

// Check permission
const action = req.body.advanced ? 'scrape_advanced' : 'scrape_basic';
const permissionCheck = await permit.check(user.key, action, 'website');

if (!permissionCheck) {
  return res.status(403).json({
    success: false,
    error: 'Access denied by Permit.io'
  });
}
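
The middleware above can also be written as a factory that receives its dependencies, which makes it testable without a network call and lets it fail closed when the PDP is unreachable. This is a sketch under stated assumptions: `Req`, `Res`, and `Next` are minimal stand-ins for the Express types, `CheckFn` mirrors the `permit.check(user, action, resource)` call shape used in this article, and the 401 and 503 responses are choices made for the example, not behavior taken from the project.

```typescript
// Minimal stand-ins for Express types so the sketch has no dependencies.
interface Req {
  headers: Record<string, string | undefined>;
  body: { advanced?: boolean };
}
interface Res {
  status(code: number): Res;
  json(payload: unknown): Res;
}
type Next = () => void;

// Hypothetical signatures: CheckFn mirrors permit.check, ResolveFn maps an API key to a user key.
type CheckFn = (userKey: string, action: string, resource: string) => Promise<boolean>;
type ResolveFn = (apiKey: string | undefined) => string | null;

function makePermitAuth(resolve: ResolveFn, check: CheckFn) {
  return async function permitAuth(req: Req, res: Res, next: Next): Promise<void> {
    const userKey = resolve(req.headers["x-api-key"]);
    if (userKey === null) {
      res.status(401).json({ success: false, error: "Missing or unknown API key" });
      return;
    }

    const action = req.body.advanced ? "scrape_advanced" : "scrape_basic";
    let allowed = false;
    try {
      allowed = await check(userKey, action, "website");
    } catch {
      // Fail closed: an unreachable PDP must never grant access.
      res.status(503).json({ success: false, error: "Authorization service unavailable" });
      return;
    }

    if (!allowed) {
      res.status(403).json({ success: false, error: "Access denied by Permit.io" });
      return;
    }
    next();
  };
}
```

In an Express app this would be wired up roughly as `app.use(makePermitAuth(resolve, (u, a, r) => permit.check(u, a, r)))`, with the real request and response objects taking the place of the stand-in types.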

Challenges Faced

Cloud PDP Limitations

Initially, I tried implementing Attribute-Based Access Control (ABAC) by passing resource attributes:

// This DIDN'T work with cloud PDP
const resource = {
  type: 'website',
  key: hostname,
  attributes: {
    is_blacklisted: isBlacklistedDomain
  }
};

const permissionCheck = await permit.check(user.key, action, resource);

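The cloud PDP returned 501 (Not Implemented) errors because it supports only RBAC checks; decisions based on resource attributes need a self-hosted PDP. I simplified to a pure RBAC approach and passed only the resource type: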

// This works with cloud PDP
const permissionCheck = await permit.check(user.key, action, resourceType);
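
With an RBAC-only check, one way to keep the domain restriction is to evaluate the blacklist in application code and translate a hit into an extra action that only some roles hold. The sketch below assumes this design; the article does not say how the project enforces it. The `access_blacklisted` action name and both function names are hypothetical, and the action would have to be defined in the Permit.io dashboard.

```typescript
// True when hostname equals a blacklisted domain or is a subdomain of one.
function isBlacklistedDomain(hostname: string, blacklist: readonly string[]): boolean {
  // Normalize case and drop a trailing dot ("example.com." -> "example.com").
  const host = hostname.toLowerCase().replace(/\.$/, "");
  return blacklist.some((entry) => {
    const domain = entry.toLowerCase();
    return host === domain || host.endsWith("." + domain);
  });
}

// Returns the extra action to check for this hostname, or null when the
// domain is not restricted. The action name is an assumption for this example.
function blacklistAction(
  hostname: string,
  blacklist: readonly string[]
): "access_blacklisted" | null {
  return isBlacklistedDomain(hostname, blacklist) ? "access_blacklisted" : null;
}
```

The middleware would then run a second `permit.check(user.key, 'access_blacklisted', 'website')` only when `blacklistAction` returns a value, so the decision about who may bypass the blacklist still lives in the policy rather than in the code.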

Role Assignment

Another challenge was ensuring roles were properly synchronized and recognized. The solution was two-fold:

  1. Properly sync users with their role information
  2. Manually configure role permissions in the Permit.io dashboard
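
The first step can be sketched as a sync followed by an explicit role assignment, cached so it does not run on every request. This is an illustration under assumptions, not the project's code: `UserApi` is a stand-in interface whose method names mirror `permit.api.syncUser` and `permit.api.assignRole`, the parameter shapes and the `default` tenant should be checked against the SDK version in use, and `makeUserSync` is a name invented for this example.

```typescript
// Stand-in for the subset of the Permit.io API client used here (assumed shapes).
interface UserApi {
  syncUser(user: { key: string; attributes?: Record<string, unknown> }): Promise<unknown>;
  assignRole(assignment: { user: string; role: string; tenant: string }): Promise<unknown>;
}

// Returns a function that syncs a user and assigns its role at most once
// per (user, role) pair for the lifetime of the process.
function makeUserSync(api: UserApi, tenant: string = "default") {
  const done: Record<string, boolean> = {};

  return async function ensureUserWithRole(userKey: string, role: string): Promise<void> {
    const cacheKey = `${userKey}:${role}`;
    if (done[cacheKey]) return;

    await api.syncUser({ key: userKey, attributes: { tier: role } });
    await api.assignRole({ user: userKey, role, tenant });

    // Mark as done only after both calls succeeded, so failures are retried.
    done[cacheKey] = true;
  };
}
```

The cache is per process and never expires, which is acceptable for a demo with a handful of fixed API keys but would need invalidation if users could change tiers at runtime.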

Using Permit.io for Authorization

Setting up Permit.io involved these key steps:

  1. Creating a project in the Permit.io dashboard
  2. Defining resources (website), actions (scrape_basic, scrape_advanced), and roles (free_user, pro_user, admin)
  3. Configuring the permission matrix in the dashboard
  4. Integrating the Permit.io SDK into my application

Here's the role-based capability matrix I implemented:

Feature Free User Pro User Admin
Basic Scraping
Advanced Scraping
Text Cleaning
AI Summarization
View Blacklist
Manage Blacklist
Access Blacklisted Domains

Permission Enforcement

Permissions are enforced in two places:

  1. The permitAuth middleware for API endpoints:
   const permissionCheck = await permit.check(user.key, action, 'website');
   if (!permissionCheck) {
     return res.status(403).json({ success: false, error: 'Access denied' });
   }
  2. Directly in route handlers for specific features:
   // src/routes/summarize.ts
   if (summarize) {
     const userTier = req.user?.attributes?.tier;
     if (userTier !== 'pro_user' && userTier !== 'admin') {
       return res.status(403).json({
         success: false,
         error: 'Access denied',
         details: 'Text summarization is only available for Pro and Admin users'
       });
     }
   }
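
The tier check in the route handler could alternatively be expressed as one more action checked through the same policy engine, so that the rule for summarization also lives in Permit.io. This is a sketch of that alternative, not what the project does: the `summarize` action name is hypothetical and would have to be defined in the dashboard, and `CheckFn` mirrors the `permit.check(user, action, resource)` call shape used above.

```typescript
interface ScrapeRequestBody {
  advanced?: boolean;
  summarize?: boolean;
}

// Hypothetical signature mirroring permit.check(user, action, resource).
type CheckFn = (userKey: string, action: string, resource: string) => Promise<boolean>;

// Lists every action the request needs; "summarize" is an assumed action name.
function requiredActions(body: ScrapeRequestBody): string[] {
  const actions = [body.advanced ? "scrape_advanced" : "scrape_basic"];
  if (body.summarize) actions.push("summarize");
  return actions;
}

// Returns the first action the user is not allowed to perform, or null if all pass.
async function firstDeniedAction(
  check: CheckFn,
  userKey: string,
  body: ScrapeRequestBody
): Promise<string | null> {
  for (const action of requiredActions(body)) {
    if (!(await check(userKey, action, "website"))) return action;
  }
  return null;
}
```

The handler can use the returned action name to build the `details` message of the 403 response, and changing who may summarize then becomes a dashboard change instead of a code change.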

What I Learned

Building Scrapebase with Permit.io taught me how to:

  1. Separate authorization concerns from business logic
  2. Implement role-based access control with external policy management
  3. Design a flexible permission system that doesn't require code changes to update policies

The advantages of this approach are clear:

  1. Separation of concerns: Business logic remains focused on core functionality while authorization is handled externally
  2. Adaptable policies: Permissions can be updated without code changes or redeployments
  3. Consistent enforcement: Authorization decisions follow the same rules across all application endpoints
  4. Improved security: Centralized policy management reduces the risk of inconsistent permission checks
  5. Developer experience: Cleaner codebase with reduced authorization-related complexity

This externalized approach enables business stakeholders to manage authorization policies directly through the Permit.io dashboard, while developers focus on building features - the hallmark of a well-designed API-first authorization system.

Future Improvements

With more time, I would:

  1. Set up a local PDP to enable ABAC with resource attributes
  2. Implement tenant isolation for multi-tenant support
  3. Add UI components in the admin dashboard to view permission audit logs
  4. Create more granular roles and permissions beyond the three tiers
  5. Add a user management section to assign roles through the UI

Scrapebase demonstrates how modern SaaS apps can delegate complex authorization to a specialized service like Permit.io, allowing developers to focus on core features while maintaining robust access controls.