
Generate Themes from a Website with ORGN Gateway

Analyze a company website and generate business themes with ORGN Gateway inference. This example workflow uses the OpenAI SDK:

  1. Read sitemap.xml
  2. Filter relevant pages (/services, /product, /platform)
  3. Scrape text content
  4. Send combined content to a Gateway model
  5. Return structured themes

Only the theme-generation step uses Gateway. Scraping is standard HTTP — respect robots.txt and site terms.
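
One way to honor robots.txt before scraping is the standard library's urllib.robotparser. The sketch below is a minimal illustration, not part of the Gateway workflow itself: the helper names (build_robots_parser, filter_allowed), the USER_AGENT string, and the sample rules are all assumptions made for this example. In practice the rules text would come from the site's own /robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent for this sketch; use one that identifies your scraper.
USER_AGENT = "theme-analyzer-bot"

def build_robots_parser(robots_txt):
    """Parse the text of a robots.txt file into a reusable parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def filter_allowed(urls, parser, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt allows this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]

# Inline sample rules (illustrative only).
sample_rules = """User-agent: *
Disallow: /platform/internal
"""
robots = build_robots_parser(sample_rules)
allowed = filter_allowed(
    [
        "https://example.com/services",
        "https://example.com/platform/internal/admin",
    ],
    robots,
)
print(allowed)
```

Running the filter on the URL list from Step 1, before any page is scraped, keeps disallowed pages out of the pipeline.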

Step 1: Read the sitemap

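import requests
import xml.etree.ElementTree as ET

def get_relevant_urls(sitemap_url):
   """Return sitemap URLs that contain one of the relevant path segments."""
   response = requests.get(sitemap_url, timeout=10)
   response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
   root = ET.fromstring(response.content)
   urls = []
   # "{*}loc" matches <loc> in any XML namespace (Python 3.8+)
   for loc in root.findall(".//{*}loc"):
       link = (loc.text or "").strip()  # guard against empty <loc/> elements
       if any(path in link for path in ["/services", "/product", "/platform"]):
           urls.append(link)
   return urls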
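
To try the filtering logic without a network call, the parsing can be separated from the HTTP request. The following is a sketch under that assumption: filter_sitemap_urls, RELEVANT_PATHS, and the sample sitemap are illustrative names and data, not part of the original workflow.

```python
import xml.etree.ElementTree as ET

RELEVANT_PATHS = ("/services", "/product", "/platform")

def filter_sitemap_urls(sitemap_xml, paths=RELEVANT_PATHS):
    """Return <loc> URLs from sitemap XML whose text contains one of `paths`."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall(".//{*}loc"):
        link = (loc.text or "").strip()
        if any(path in link for path in paths):
            urls.append(link)
    return urls

# Illustrative sitemap using the standard sitemaps.org namespace.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/services/consulting</loc></url>
  <url><loc>https://example.com/blog/launch</loc></url>
  <url><loc>https://example.com/platform</loc></url>
</urlset>"""

print(filter_sitemap_urls(sample))
```

Because the match is a plain substring test, a URL such as /blog/product-news would also pass the filter. If that is too loose for a given site, comparing against the start of the URL path is a stricter alternative.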

Step 2: Scrape page content

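import requests
from bs4 import BeautifulSoup

def scrape_page(url):
   """Fetch a page and return its text with whitespace collapsed."""
   response = requests.get(url, timeout=10)
   response.raise_for_status()
   soup = BeautifulSoup(response.text, "html.parser")
   # Remove script/style elements so code never ends up in the extracted text
   for tag in soup.find_all(["script", "style", "noscript"]):
       tag.decompose()
   text = soup.get_text(separator=" ", strip=True)
   return " ".join(text.split())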

Step 3: Generate themes via Gateway

from openai import OpenAI

client = OpenAI(
   base_url="https://api.gateway.orgn.com/v1",
   api_key="sk-ollm-YOUR_API_KEY"
)

def generate_themes(content):
   response = client.chat.completions.create(
       model="near_glm_4_7",
       messages=[
           {
               "role": "system",
               "content": "Extract high-level business themes from website content."
           },
           {
               "role": "user",
               "content": f"Analyze and extract themes:\n\n{content}"
           }
       ]
   )
   return response.choices[0].message.content

near_glm_4_7 is a TEE (trusted execution environment) model: inference runs in a Trust Domain with optional Scanner attestation. For frontier models without hardware receipts, use a vercel_* ID instead.
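
The last stage of the workflow is to return structured themes, while generate_themes returns the model's free-text reply. One possible approach, assuming the system prompt is extended to ask for a JSON array of strings (an assumption of this sketch, not documented Gateway behavior), is to parse the reply defensively. parse_themes is a hypothetical helper name.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in

def parse_themes(raw):
    """Parse a model reply expected to hold a JSON array of theme strings.

    Falls back to one theme per non-empty line if the reply is not a JSON list.
    """
    text = raw.strip()
    # Drop a surrounding code fence (opening line and closing line) if present.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines).strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, list):
        return [str(item).strip() for item in data if str(item).strip()]
    # Fallback: treat each non-empty line as a theme and strip list bullets.
    bullets = "-•* \t"
    return [line.strip(bullets) for line in text.splitlines() if line.strip(bullets)]

print(parse_themes('["Cloud security", "Data analytics"]'))
```

The fallback keeps the pipeline usable when the model ignores the format request, at the cost of less reliable structure.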

Step 4: Full workflow

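def run_analysis():
   urls = get_relevant_urls("https://example.com/sitemap.xml")
   if not urls:
       print("No matching pages found in the sitemap.")
       return
   # Join page texts with blank lines so page boundaries survive in the prompt
   combined_content = "\n\n".join(scrape_page(url) for url in urls)
   themes = generate_themes(combined_content)
   print(themes)

if __name__ == "__main__":
   run_analysis()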

Production considerations

  • Chunk or truncate large content (MAX_CHARS) to stay within context limits
  • Record response.usage.total_tokens for cost tracking
  • Handle HTTP and API errors gracefully
  • For end-to-end confidential pipelines, run scraping and analysis inside CDE cloud worktrees
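
The first two considerations can be sketched as follows. This is a minimal illustration under stated assumptions: the MAX_CHARS value is a placeholder to tune against the chosen model's context window, and chunk_content and tokens_used are hypothetical helper names. tokens_used reads the usage.total_tokens field mentioned above and tolerates a response without usage data.

```python
MAX_CHARS = 12_000  # Assumed budget for this sketch; tune to the model's context limit.

def chunk_content(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue
        # Hard-split any single paragraph that is longer than the budget.
        pieces = [paragraph[i:i + max_chars] for i in range(0, len(paragraph), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks

def tokens_used(response):
    """Return response.usage.total_tokens, or 0 if usage data is missing."""
    usage = getattr(response, "usage", None)
    return getattr(usage, "total_tokens", 0) or 0

demo = "a" * 10 + "\n\n" + "b" * 10 + "\n\n" + "c" * 25
demo_chunks = chunk_content(demo, max_chars=22)
print([len(chunk) for chunk in demo_chunks])
```

Each chunk could then be passed to generate_themes separately, with the per-chunk themes merged in a final call. Whether that beats simple truncation depends on how much of the site's content matters for the analysis.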
