A Coding Guide Implementing ScrapeGraph and Gemini AI for an Automated, Scalable, Insight-Driven Competitive Intelligence and Market Analysis Workflow

Jun 3, 2025 - 09:20

The post appeared first on MarkTechPost.

In this tutorial, we demonstrate how to leverage ScrapeGraph’s powerful scraping tools in combination with Gemini AI to automate the collection, parsing, and analysis of competitor information. By using ScrapeGraph’s SmartScraperTool and MarkdownifyTool, users can extract detailed insights from product offerings, pricing strategies, technology stacks, and market presence directly from competitor websites. The tutorial then employs Gemini’s advanced language model to synthesize these disparate data points into structured, actionable intelligence. Throughout the process, ScrapeGraph ensures that the raw extraction is both accurate and scalable, allowing analysts to focus on strategic interpretation rather than manual data gathering.

%pip install --quiet -U langchain-scrapegraph langchain-google-genai pandas matplotlib seaborn

We quietly install or upgrade the latest versions of the essential libraries: langchain-scrapegraph for advanced web scraping, langchain-google-genai for the Gemini AI integration, and pandas, matplotlib, and seaborn for data analysis and visualization, ensuring the environment is ready for the competitive intelligence workflow.

import getpass
import os
import json
import pandas as pd
from typing import List, Dict, Any
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns

We import essential Python libraries for setting up a secure, data-driven pipeline: getpass and os manage passwords and environment variables, json handles serialized data, and pandas offers robust DataFrame operations. The typing module provides type hints for better code clarity, while datetime records timestamps. Finally, matplotlib.pyplot and seaborn equip us with tools for creating insightful visualizations.

if not os.environ.get("SGAI_API_KEY"):
    os.environ["SGAI_API_KEY"] = getpass.getpass("ScrapeGraph AI API key:\n")


if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key for Gemini:\n")

We check if the SGAI_API_KEY and GOOGLE_API_KEY environment variables are already set; if not, the script securely prompts the user for their ScrapeGraph and Google (Gemini) API keys via getpass and stores them in the environment for subsequent authenticated requests.

from langchain_scrapegraph.tools import (
    SmartScraperTool,
    SearchScraperTool,
    MarkdownifyTool,
    GetCreditsTool,
)
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig, chain
from langchain_core.output_parsers import JsonOutputParser


smartscraper = SmartScraperTool()
searchscraper = SearchScraperTool()
markdownify = MarkdownifyTool()
credits = GetCreditsTool()


llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.1,
    convert_system_message_to_human=True
)

Here, we import and instantiate the ScrapeGraph tools (SmartScraperTool, SearchScraperTool, MarkdownifyTool, and GetCreditsTool) for extracting and processing web data, then configure ChatGoogleGenerativeAI with the "gemini-1.5-flash" model, using a low temperature for consistent output and converting system messages to human messages for Gemini compatibility. We also bring in ChatPromptTemplate, RunnableConfig, chain, and JsonOutputParser from langchain_core to structure prompts and parse model outputs.

class CompetitiveAnalyzer:
    def __init__(self):
        self.results = []
        self.analysis_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
   
    def scrape_competitor_data(self, url: str, company_name: str = None) -> Dict[str, Any]:
        """Scrape comprehensive data from a competitor website"""
       
        extraction_prompt = """
        Extract the following information from this website:
        1. Company name and tagline
        2. Main products/services offered
        3. Pricing information (if available)
        4. Target audience/market
        5. Key features and benefits highlighted
        6. Technology stack mentioned
        7. Contact information
        8. Social media presence
        9. Recent news or announcements
        10. Team size indicators
        11. Funding information (if mentioned)
        12. Customer testimonials or case studies
        13. Partnership information
        14. Geographic presence/markets served
       
        Return the information in a structured JSON format with clear categorization.
        If information is not available, mark as 'Not Available'.
        """
       
        try:
            result = smartscraper.invoke({
                "user_prompt": extraction_prompt,
                "website_url": url,
            })
           
            markdown_content = markdownify.invoke({"website_url": url})
           
            competitor_data = {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": result,
                "markdown_length": len(markdown_content),
                "analysis_date": self.analysis_timestamp,
                "success": True,
                "error": None
            }
           
            return competitor_data
           
        except Exception as e:
            return {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": None,
                "error": str(e),
                "success": False,
                "analysis_date": self.analysis_timestamp
            }
   
    def analyze_competitor_landscape(self, competitors: List[Dict[str, str]]) -> Dict[str, Any]:
        """Analyze multiple competitors and generate insights"""
       
        # NOTE: the original article is truncated at this point; the loop
        # below is a minimal reconstruction based on the methods above.
        print(f"Analyzing {len(competitors)} competitors...")
        for comp in competitors:
            data = self.scrape_competitor_data(comp["url"], comp.get("name"))
            self.results.append(data)

        successful = [r for r in self.results if r["success"]]
        return {
            "total_analyzed": len(self.results),
            "successful": len(successful),
            "analysis_date": self.analysis_timestamp,
            "results": self.results,
        }
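Once the analyzer has collected per-competitor dictionaries like those returned by scrape_competitor_data, they load directly into a pandas DataFrame for the downstream analysis the tutorial describes. The sketch below uses hypothetical sample records (no live scraping) to show the shape of that step.

```python
import pandas as pd

# Hypothetical sample of the per-competitor dicts scrape_competitor_data returns.
results = [
    {"company_name": "AcmeCo", "url": "https://acme.example", "success": True, "error": None},
    {"company_name": "BetaCo", "url": "https://beta.example", "success": False, "error": "timeout"},
]

# Tabulate the scrape results and compute the fraction that succeeded.
df = pd.DataFrame(results)
success_rate = df["success"].mean()
```

From here, matplotlib or seaborn can chart success rates or any extracted fields across competitors.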