This tutorial shows how to combine ScrapeGraph's powerful scraping tools with Gemini AI to automate the collection and analysis of competitor information. ScrapeGraph's SmartScraperTool and MarkdownifyTool let users extract detailed insights about product offerings, pricing strategies, technology stacks, and overall market presence directly from competitors' websites. We then use Gemini's advanced language model to synthesize these disparate data points into structured, actionable intelligence. Throughout the process, ScrapeGraph ensures that the raw extraction is accurate and scalable, allowing analysts to focus on strategic interpretation rather than manual data collection.
%pip install --quiet -U langchain-scrapegraph langchain-google-genai pandas matplotlib seaborn
This quietly installs or upgrades the essential libraries: langchain-scrapegraph for advanced web scraping tools, langchain-google-genai for the Gemini AI integration, and pandas, matplotlib, and seaborn for data analysis and visualization, ensuring your environment is ready for a seamless competitive-intelligence workflow.
import getpass
import os
import json
import pandas as pd
from typing import List, Dict, Any
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
These are the core Python libraries for setting up a secure, data-driven pipeline: getpass and os handle password entry and environment-variable management, json handles serialized data, and pandas provides robust DataFrame manipulation. The typing module supplies type hints for better code readability, while datetime records the analysis timestamp. Finally, matplotlib.pyplot and seaborn provide tools for creating insightful visualizations.
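As a quick aside, the timestamp format used throughout the analyzer below can be sanity-checked in isolation (a minimal illustration with a fixed date, not part of the original tutorial):

```python
from datetime import datetime

# The same format string the analyzer uses for its analysis_timestamp
ts = datetime(2024, 1, 5, 9, 30, 0).strftime("%Y-%m-%d %H:%M:%S")
print(ts)  # → 2024-01-05 09:30:00
```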
if not os.environ.get("SGAI_API_KEY"):
    os.environ["SGAI_API_KEY"] = getpass.getpass("ScrapeGraph AI API key:\n")
if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key for Gemini:\n")
This checks whether the SGAI_API_KEY and GOOGLE_API_KEY environment variables are already set. If not, the script securely prompts the user for their ScrapeGraph and Google (Gemini) API keys via getpass and stores them in the environment for subsequent authenticated requests.
from langchain_scrapegraph.tools import (
    SmartScraperTool,
    SearchScraperTool,
    MarkdownifyTool,
    GetCreditsTool,
)
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig, chain
from langchain_core.output_parsers import JsonOutputParser

smartscraper = SmartScraperTool()
searchscraper = SearchScraperTool()
markdownify = MarkdownifyTool()
credit = GetCreditsTool()

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.1,
    convert_system_message_to_human=True
)
Here, the ScrapeGraph tools (SmartScraperTool, SearchScraperTool, MarkdownifyTool, and GetCreditsTool) are imported and instantiated for extracting and processing web data, and the Gemini "gemini-1.5-flash" model is configured with a low temperature (0.1) for focused, consistent output and with system messages converted to human-readable form. LangChain core's ChatPromptTemplate, RunnableConfig, chain, and JsonOutputParser are also imported for building prompts and parsing the model output.
class CompetitiveAnalyzer:
    def __init__(self):
        self.results = []
        self.analysis_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    def scrape_competitor_data(self, url: str, company_name: str = None) -> Dict[str, Any]:
        """Scrape comprehensive data from a competitor website"""
        extraction_prompt = """
        Extract the following information from this website:
        1. Company name and tagline
        2. Main products/services offered
        3. Pricing information (if available)
        4. Target audience/market
        5. Key features and benefits highlighted
        6. Technology stack mentioned
        7. Contact information
        8. Social media presence
        9. Recent news or announcements
        10. Team size indicators
        11. Funding information (if mentioned)
        12. Customer testimonials or case studies
        13. Partnership information
        14. Geographic presence/markets served

        Return the information in a structured JSON format with clear categorization.
        If information is not available, mark it as 'Not Available'.
        """
        try:
            result = smartscraper.invoke({
                "user_prompt": extraction_prompt,
                "website_url": url,
            })
            markdown_content = markdownify.invoke({"website_url": url})
            competitor_data = {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": result,
                "markdown_length": len(markdown_content),
                "analysis_date": self.analysis_timestamp,
                "success": True,
                "error": None
            }
            return competitor_data
        except Exception as e:
            return {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": None,
                "error": str(e),
                "success": False,
                "analysis_date": self.analysis_timestamp
            }

    def analyze_competitor_landscape(self, competitors: List[Dict[str, str]]) -> Dict[str, Any]:
        """Analyze multiple competitors and generate insights"""
        print(f"🔍 Starting competitive analysis for {len(competitors)} companies...")
        for i, competitor in enumerate(competitors, 1):
            print(f"📊 Analyzing {competitor['name']} ({i}/{len(competitors)})...")
            data = self.scrape_competitor_data(
                competitor['url'],
                competitor['name']
            )
            self.results.append(data)

        analysis_prompt = ChatPromptTemplate.from_messages([
            ("system", """
            You are a senior business analyst specializing in competitive intelligence.
            Analyze the scraped competitor data and provide comprehensive insights including:
            1. Market positioning analysis
            2. Pricing strategy comparison
            3. Feature gap analysis
            4. Target audience overlap
            5. Technology differentiation
            6. Market opportunities
            7. Competitive threats
            8. Strategic recommendations
            Provide actionable insights in JSON format with clear categories and recommendations.
            """),
            ("human", "Analyze this competitive data: {competitor_data}")
        ])

        clean_data = []
        for result in self.results:
            if result['success']:
                clean_data.append({
                    'company': result['company_name'],
                    'url': result['url'],
                    'data': result['scraped_data']
                })

        analysis_chain = analysis_prompt | llm | JsonOutputParser()
        try:
            competitive_analysis = analysis_chain.invoke({
                "competitor_data": json.dumps(clean_data, indent=2)
            })
        except Exception:
            # Fall back to the raw text response if the model output is not valid JSON
            analysis_chain_text = analysis_prompt | llm
            competitive_analysis = analysis_chain_text.invoke({
                "competitor_data": json.dumps(clean_data, indent=2)
            })
        return {
            "analysis": competitive_analysis,
            "raw_data": self.results,
            "summary_stats": self.generate_summary_stats()
        }

    def generate_summary_stats(self) -> Dict[str, Any]:
        """Generate summary statistics from the analysis"""
        successful_scrapes = sum(1 for r in self.results if r['success'])
        failed_scrapes = len(self.results) - successful_scrapes
        return {
            "total_companies_analyzed": len(self.results),
            "successful_scrapes": successful_scrapes,
            "failed_scrapes": failed_scrapes,
            "success_rate": f"{(successful_scrapes/len(self.results)*100):.1f}%" if self.results else "0%",
            "analysis_timestamp": self.analysis_timestamp
        }

    def export_results(self, filename: str = None):
        """Export results to JSON and CSV files"""
        if not filename:
            filename = f"competitive_analysis_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
        with open(f"{filename}.json", 'w') as f:
            json.dump({
                "results": self.results,
                "summary": self.generate_summary_stats()
            }, f, indent=2)
        df_data = []
        for result in self.results:
            if result['success']:
                df_data.append({
                    'Company': result['company_name'],
                    'URL': result['url'],
                    'Success': result['success'],
                    'Data_Length': len(str(result['scraped_data'])) if result['scraped_data'] else 0,
                    'Analysis_Date': result['analysis_date']
                })
        if df_data:
            df = pd.DataFrame(df_data)
            df.to_csv(f"{filename}.csv", index=False)
        print(f"✅ Results exported to {filename}.json and {filename}.csv")
The CompetitiveAnalyzer class coordinates end-to-end competitor research: it uses the ScrapeGraph tools to scrape detailed company information, collates and cleans the results, and leverages Gemini AI to generate structured competitive insights. It also provides utility methods for tracking success rates and timestamps, and for exporting both raw and summary data to JSON and CSV formats to facilitate downstream reporting and analysis.
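The summary-statistics and CSV-flattening logic can be exercised offline, without spending any scraping credits. Here is a minimal sketch using a hand-built results list that mimics the shape produced by scrape_competitor_data (the company names and values are invented for illustration):

```python
# Toy results list mimicking what scrape_competitor_data returns
results = [
    {"company_name": "Acme", "url": "https://acme.example", "success": True,
     "scraped_data": {"tagline": "Build faster"}, "analysis_date": "2024-01-05 09:30:00"},
    {"company_name": "Globex", "url": "https://globex.example", "success": False,
     "scraped_data": None, "analysis_date": "2024-01-05 09:30:00"},
]

# Same arithmetic as generate_summary_stats
successful = sum(1 for r in results if r["success"])
stats = {
    "total_companies_analyzed": len(results),
    "successful_scrapes": successful,
    "failed_scrapes": len(results) - successful,
    "success_rate": f"{successful / len(results) * 100:.1f}%" if results else "0%",
}
print(stats["success_rate"])  # → 50.0%

# Only successful scrapes become CSV rows, as in export_results
rows = [
    {"Company": r["company_name"], "URL": r["url"],
     "Data_Length": len(str(r["scraped_data"])) if r["scraped_data"] else 0}
    for r in results if r["success"]
]
print(len(rows))  # → 1
```

Keeping failed scrapes in the JSON export but out of the CSV means the spreadsheet view stays clean while the raw record still documents every attempted URL.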
def run_ai_saas_analysis():
    """Run a comprehensive analysis of AI/SaaS competitors"""
    analyzer = CompetitiveAnalyzer()
    ai_saas_competitors = [
        {"name": "OpenAI", "url": "https://openai.com"},
        {"name": "Anthropic", "url": "https://anthropic.com"},
        {"name": "Hugging Face", "url": "https://huggingface.co"},
        {"name": "Cohere", "url": "https://cohere.ai"},
        {"name": "Scale AI", "url": "https://scale.com"},
    ]
    results = analyzer.analyze_competitor_landscape(ai_saas_competitors)
    print("\n" + "="*80)
    print("🎯 COMPETITIVE ANALYSIS RESULTS")
    print("="*80)
    print("\n📊 Summary Statistics:")
    stats = results['summary_stats']
    for key, value in stats.items():
        print(f"   {key.replace('_', ' ').title()}: {value}")
    print("\n🔍 Strategic Analysis:")
    if isinstance(results['analysis'], dict):
        for section, content in results['analysis'].items():
            print(f"\n   {section.replace('_', ' ').title()}:")
            if isinstance(content, list):
                for item in content:
                    print(f"   • {item}")
            else:
                print(f"   {content}")
    else:
        print(results['analysis'])
    analyzer.export_results("ai_saas_competitive_analysis")
    return results
The function above kicks off the competitive analysis by instantiating a CompetitiveAnalyzer and defining the key AI/SaaS players to evaluate. It then runs the full scraping-and-insights workflow, prints formatted summary statistics and strategic findings, and finally exports the detailed results to JSON and CSV for further use.
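The report-printing loop handles both list-valued and scalar sections of the analysis dict. That pattern can be tried offline with a made-up analysis result (the section names and findings here are invented for illustration):

```python
# Invented analysis dict standing in for what the Gemini chain might return
analysis = {
    "market_positioning": ["Acme targets SMBs", "Globex targets enterprise"],
    "competitive_threats": "Pricing pressure from open-source alternatives",
}

# Same formatting logic as the strategic-analysis printout
lines = []
for section, content in analysis.items():
    lines.append(f"{section.replace('_', ' ').title()}:")
    if isinstance(content, list):
        lines.extend(f"  • {item}" for item in content)
    else:
        lines.append(f"  {content}")
print("\n".join(lines))
```

Branching on the value type keeps the report readable whether the model returns a bulleted list of findings or a single paragraph per section.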
def run_ecommerce_analysis():
    """Analyze e-commerce platform competitors"""
    analyzer = CompetitiveAnalyzer()
    ecommerce_competitors = [
        {"name": "Shopify", "url": "https://shopify.com"},
        {"name": "WooCommerce", "url": "https://woocommerce.com"},
        {"name": "BigCommerce", "url": "https://bigcommerce.com"},
        {"name": "Magento", "url": "https://magento.com"},
    ]
    results = analyzer.analyze_competitor_landscape(ecommerce_competitors)
    analyzer.export_results("ecommerce_competitive_analysis")
    return results
The function above sets up a CompetitiveAnalyzer that evaluates the major e-commerce platforms by scraping details from each site, generating strategic insights, and exporting the findings to both JSON and CSV files under the name "ecommerce_competitive_analysis".
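To point either runner at your own market, all you need is a list of `{"name": ..., "url": ...}` dicts. A small validation helper (a convenience sketch, not part of the original tutorial) can catch malformed entries before any scraping credits are spent:

```python
def validate_competitors(competitors):
    """Return a list of error messages for malformed competitor entries."""
    errors = []
    for i, entry in enumerate(competitors):
        if not isinstance(entry, dict) or not entry.get("name"):
            errors.append(f"entry {i}: missing 'name'")
        elif not str(entry.get("url", "")).startswith(("http://", "https://")):
            errors.append(f"entry {i}: 'url' must start with http:// or https://")
    return errors

problems = validate_competitors([
    {"name": "Shopify", "url": "https://shopify.com"},
    {"name": "Magento", "url": "magento.com"},  # missing URL scheme
])
print(problems)  # → ["entry 1: 'url' must start with http:// or https://"]
```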
@chain
def social_media_monitoring_chain(company_urls: List[str], config: RunnableConfig):
    """Monitor the social media presence and engagement strategies of competitors"""
    social_media_prompt = ChatPromptTemplate.from_messages([
        ("system", """
        You are a social media strategist. Analyze the social media presence and strategies
        of these companies. Focus on:
        1. Platform presence (LinkedIn, Twitter, Instagram, etc.)
        2. Content strategy patterns
        3. Engagement tactics
        4. Community building approaches
        5. Brand voice and messaging
        6. Posting frequency and timing
        Provide actionable insights for improving social media strategy.
        """),
        ("human", "Analyze social media data for: {urls}")
    ])
    social_data = []
    for url in company_urls:
        try:
            result = smartscraper.invoke({
                "user_prompt": "Extract all social media links, community engagement features, and social proof elements",
                "website_url": url,
            })
            social_data.append({"url": url, "social_data": result})
        except Exception as e:
            social_data.append({"url": url, "error": str(e)})
    analysis_chain = social_media_prompt | llm
    analysis = analysis_chain.invoke({"urls": json.dumps(social_data, indent=2)}, config=config)
    return {
        "social_analysis": analysis,
        "raw_social_data": social_data
    }
This chain defines a pipeline for collecting and analyzing competitors' social media footprints. It uses ScrapeGraph's smart scraper to extract social media links and engagement elements from each site, then feeds that data into a prompt focused on platform presence, content strategy, and community tactics. Finally, it returns both the raw scraped information and the AI-generated social media insights in a single structured output.
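Note that the collection loop records failures instead of aborting, so one unreachable site cannot sink the whole batch. The pattern can be isolated with a stand-in scraper function (entirely invented for illustration; the real code calls smartscraper.invoke):

```python
def fake_scraper(url):
    """Stand-in for the smart scraper; fails for one URL to simulate an outage."""
    if "down" in url:
        raise ConnectionError("site unreachable")
    return {"links": [f"{url}/twitter"]}

# Same try/except accumulation pattern as the monitoring chain
social_data = []
for url in ["https://ok.example", "https://down.example"]:
    try:
        social_data.append({"url": url, "social_data": fake_scraper(url)})
    except Exception as e:
        social_data.append({"url": url, "error": str(e)})

print([("error" in d) for d in social_data])  # → [False, True]
```

Because every URL yields an entry either way, the downstream prompt still sees the full list and can comment on which sites could not be reached.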
def check_credits():
    """Check available credits"""
    try:
        credits_info = credit.invoke({})
        print(f"💳 Available Credits: {credits_info}")
        return credits_info
    except Exception as e:
        print(f"⚠️ Could not check credits: {e}")
        return None
The function above calls GetCreditsTool to retrieve and display the available ScrapeGraph API credits, printing either the result or a warning if the check fails, and returning the credit information (or None on error).
if __name__ == "__main__":
    print("🚀 Advanced Competitive Analysis Tool with Gemini AI")
    print("="*60)
    check_credits()
    print("\n🤖 Running AI/SaaS Competitive Analysis...")
    ai_results = run_ai_saas_analysis()
    run_additional = input("\n❓ Run e-commerce analysis as well? (y/n): ").lower().strip()
    if run_additional == 'y':
        print("\n🛒 Running E-commerce Platform Analysis...")
        ecom_results = run_ecommerce_analysis()
    print("\n✨ Analysis complete! Check the exported files for detailed results.")
Finally, this last block serves as the script's entry point: it prints the header, checks the API credits, runs the AI/SaaS competitive analysis (and optionally the e-commerce analysis), and signals that all results have been exported.
In conclusion, integrating ScrapeGraph's scraping capabilities with Gemini AI turns a traditionally time-consuming competitive-intelligence workflow into an efficient, reproducible pipeline. ScrapeGraph handles the heavy lifting of web-based information acquisition and normalization, while Gemini's language understanding turns the raw data into high-level strategic recommendations. As a result, companies can rapidly assess market positioning, identify feature gaps, and spot new opportunities with minimal manual intervention. By automating these steps, users gain speed and consistency, along with the flexibility to extend their analysis to new competitors and markets as needed.

