Module M.06

Cloudflare Block

The Invisible Wall

Many websites sit behind Cloudflare or a similar bot-protection service. When GenLayer validators call gl.nondet.web.render() on such a URL, they receive a challenge or block page instead of the actual content. Because each validator can be served a different page (rotating tokens, rate-limit messages, captcha HTML), the responses diverge and consensus fails. To avoid this, use GenLayer's Intelligent Crawler or maintain a health-checked URL registry.
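One way to read "health-checked URL registry" is a list of hosts that is re-probed off-chain and pruned whenever a probe comes back as a challenge page. The following is a minimal sketch under that assumption; the marker list, the function names, and the probe callable are all hypothetical and not part of any GenLayer API.

```python
from typing import Callable, Dict, List

# Assumed marker phrases for illustration only; tune them per target site.
CHALLENGE_MARKERS = [
    "checking your browser",
    "ddos protection",
    "are you human",
    "captcha",
]


def looks_like_challenge(body: str) -> bool:
    """Return True when the body contains a known challenge-page phrase."""
    lower = body.lower()
    return any(marker in lower for marker in CHALLENGE_MARKERS)


def healthy_hosts(candidates: List[str], probe: Callable[[str], str]) -> List[str]:
    """Keep only hosts whose probe body is non-empty and not a challenge page.

    `probe` is a hypothetical callable (host -> page text). In practice it
    would be an off-chain fetch run on a schedule.
    """
    healthy = []
    for host in candidates:
        try:
            body = probe(host)
        except Exception:
            continue  # unreachable hosts are treated as unhealthy
        if body and not looks_like_challenge(body):
            healthy.append(host)
    return healthy


# Canned responses stand in for real fetches so the sketch runs offline.
FAKE_PAGES: Dict[str, str] = {
    "text.example.org": "Top stories: markets rally, rain expected.",
    "walled.example.com": "Checking your browser before accessing the site...",
}

print(healthy_hosts(list(FAKE_PAGES), FAKE_PAGES.__getitem__))
```

The surviving hosts are what an owner would then push into the contract's allow-list; the probing itself stays off-chain so it never takes part in consensus.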

Side-by-side · Vulnerable vs. Patched

two contracts · proven by paired transactions
vulnerable · contracts/vulnerable/VulnerableCrawler.py
Failed TX
> consensus failed · validators diverged
# { "Depends": "py-genlayer:15qfivjvy80800rh998pcxmd2m8va1wq2qzqhz850n8ggcr4i9q0" }

from genlayer import *

# Module 6 (Vulnerable) -- Anti-bot wall.
# The contract accepts any URL from the caller and pulls it through
# nondet.web.render. If the target is fronted by an anti-bot service,
# different validators see different challenge pages (rotating tokens,
# rate-limit messages, captcha HTML), so strict_eq cannot agree.


class VulnerableCrawler(gl.Contract):
    last_excerpt: str

    def __init__(self):
        self.last_excerpt = ""

    @gl.public.write
    def crawl(self, url: str) -> None:
        def _fetch() -> str:
            return gl.nondet.web.render(url, mode="text")[:400]

        # strict_eq + an unconstrained URL == fragile.
        self.last_excerpt = gl.eq_principle.strict_eq(_fetch)

    @gl.public.view
    def get_last_excerpt(self) -> str:
        return self.last_excerpt
patched · contracts/patched/HealthCheckedCrawler.py
Success TX
> consensus reached · all validators agree
# { "Depends": "py-genlayer:15qfivjvy80800rh998pcxmd2m8va1wq2qzqhz850n8ggcr4i9q0" }

from genlayer import *
import json
import re

# Module 6 (Patched) -- Health-checked crawler.
# Defenses:
#   1. URL must be on a mutable allow-list of pre-validated hosts.
#   2. The fetched body is fed through prompt_comparative so validators
#      agree on its semantic content even if rendering differs slightly.
#   3. Bodies that look like challenge pages are rejected (so we never
#      confuse a captcha for content).

DEFAULT_HOSTS = [
    "lite.cnn.com",
    "text.npr.org",
    "news.ycombinator.com",
]

CHALLENGE_MARKERS = [
    "checking your browser",
    "cloudflare",
    "ddos protection",
    "are you human",
    "captcha",
]


class HealthCheckedCrawler(gl.Contract):
    owner: str
    allowed_hosts_json: str
    last_summary: str

    def __init__(self):
        self.owner = str(gl.message.sender_address)
        self.allowed_hosts_json = json.dumps(DEFAULT_HOSTS)
        self.last_summary = ""

    def _require_owner(self) -> None:
        if str(gl.message.sender_address) != self.owner:
            raise Exception("only owner")

    @gl.public.write
    def crawl(self, url: str) -> None:
        m = re.match(r"^https://([^/]+)(/.*)?$", url)
        if not m:
            raise ValueError("https URL required")
        host = m.group(1).lower()
        if host.startswith("www."):
            host = host[4:]
        allowed = json.loads(self.allowed_hosts_json)
        if host not in allowed:
            raise ValueError(f"host not on allow-list: {host}")

        def _fetch() -> str:
            body = gl.nondet.web.render(url, mode="text") or ""
            lower = body.lower()
            for marker in CHALLENGE_MARKERS:
                if marker in lower:
                    raise Exception(f"challenge page detected: {marker}")
            return body[:1000]

        self.last_summary = gl.eq_principle.prompt_comparative(
            _fetch,
            principle="Bodies must convey the same top news content; ignore minor formatting differences.",
        )

    @gl.public.write
    def add_host(self, host: str) -> None:
        self._require_owner()
        allowed = json.loads(self.allowed_hosts_json)
        if host not in allowed:
            allowed.append(host)
            self.allowed_hosts_json = json.dumps(allowed)

    @gl.public.write
    def remove_host(self, host: str) -> None:
        self._require_owner()
        allowed = json.loads(self.allowed_hosts_json)
        allowed = [h for h in allowed if h != host]
        self.allowed_hosts_json = json.dumps(allowed)

    @gl.public.write
    def check_url(self, url: str) -> None:
        """Runs the host allow-list gate only -- no web call, deterministic.
        Demonstrates the first layer of the fix: reject unknown hosts before
        spending gas on a remote render that would likely diverge anyway."""
        m = re.match(r"^https://([^/]+)(/.*)?$", url)
        if not m:
            self.last_summary = "REJECTED: not https"
            return
        host = m.group(1).lower()
        if host.startswith("www."):
            host = host[4:]
        allowed = json.loads(self.allowed_hosts_json)
        self.last_summary = f"ALLOWED: {host}" if host in allowed else f"REJECTED: {host} not on allow-list"

    @gl.public.view
    def get_last_summary(self) -> str:
        return self.last_summary

    @gl.public.view
    def get_allowed_hosts(self) -> str:
        return self.allowed_hosts_json
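The patched contract extracts the host with a regular expression, which treats a port or userinfo as part of the host (such URLs simply fail the allow-list check). As a possible alternative, and not the contract's own method, the host could be parsed with Python's standard urllib.parse. The helper name extract_host below is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def extract_host(url: str) -> Optional[str]:
    """Return the normalized host of an https URL, or None if it is not usable."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return None  # malformed URL, e.g. an unbalanced IPv6 bracket
    if parts.scheme != "https" or parts.hostname is None:
        return None
    host = parts.hostname  # lowercased by urlsplit, without port or userinfo
    if host.startswith("www."):
        host = host[4:]
    return host


print(extract_host("https://Text.NPR.org:443/news"))
print(extract_host("https://lite.cnn.com@evil.example/"))
print(extract_host("http://text.npr.org/"))
```

The second call shows why parsing matters: the text before the "@" is userinfo, so the real host is evil.example, which an allow-list lookup would then reject.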
Call invoked
crawl("https://www.cloudflare.com")

strict_eq on Cloudflare-fronted body -> rotating tokens / challenge pages diverge
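The divergence can be illustrated offline with a small simulation. This is only a sketch: the page template, the per-validator token, and the all_equal helper are invented for illustration and do not reproduce Cloudflare's real pages or GenLayer's strict_eq implementation.

```python
from typing import List


def fake_challenge_page(validator_id: int) -> str:
    """Stand-in for a challenge page: same template, different token per request."""
    token = f"{validator_id:08x}"
    return f"Checking your browser before accessing the site. token={token}"


def fake_static_page(validator_id: int) -> str:
    """Stand-in for an unprotected page that renders identically for everyone."""
    return "Top stories: markets rally, rain expected."


def all_equal(outputs: List[str]) -> bool:
    """Byte-for-byte agreement, the kind of match a strict comparison needs."""
    return len(set(outputs)) == 1


# Five simulated validators each fetch the page independently.
challenge_outputs = [fake_challenge_page(i)[:400] for i in range(5)]
static_outputs = [fake_static_page(i)[:400] for i in range(5)]

print(all_equal(challenge_outputs))  # False: every validator saw a different token
print(all_equal(static_outputs))     # True: identical bodies agree
```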

Call invoked
check_url("https://reuters.com/article/123")

host allow-list gate rejects an unknown host -- deterministic first line of defense
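The gate can be exercised outside the contract by lifting the same logic into a plain function. The sketch below mirrors check_url from the patched contract; the function name gate is hypothetical, and it returns the status string rather than writing it to contract storage.

```python
import re
from typing import List

DEFAULT_HOSTS = ["lite.cnn.com", "text.npr.org", "news.ycombinator.com"]


def gate(url: str, allowed: List[str]) -> str:
    """Apply the host allow-list check with no web call, so it is deterministic."""
    m = re.match(r"^https://([^/]+)(/.*)?$", url)
    if not m:
        return "REJECTED: not https"
    host = m.group(1).lower()
    if host.startswith("www."):
        host = host[4:]
    if host in allowed:
        return f"ALLOWED: {host}"
    return f"REJECTED: {host} not on allow-list"


print(gate("https://reuters.com/article/123", DEFAULT_HOSTS))
# REJECTED: reuters.com not on allow-list
print(gate("https://www.text.npr.org/", DEFAULT_HOSTS))
# ALLOWED: text.npr.org
print(gate("http://text.npr.org/", DEFAULT_HOSTS))
# REJECTED: not https
```

Because the function depends only on its inputs, every validator computes the same string, which is why this first layer cannot cause a consensus failure.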

Knowledge check · M.06

Two questions on this incident. Pick the best answer; the question locks once committed.

Question 01 / 02
Why does Cloudflare break GenLayer consensus?