Module M.06
Cloudflare Block
The Invisible Wall
Many websites sit behind Cloudflare or similar bot protection. When GenLayer validators call gl.nondet.web.render() on these URLs, they can receive a challenge or block page instead of the actual content. Because those pages vary from one request to the next (rotating tokens, rate-limit messages, captcha HTML), validators see different responses and consensus fails. To avoid this, use GenLayer's Intelligent Crawler or maintain a health-checked URL registry.
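One way to picture a health-checked URL registry is the minimal sketch below. It is plain Python rather than a GenLayer contract, and the names UrlRegistry, looks_healthy, the probe callable, and the example.com URL are all hypothetical, introduced only for illustration. The stubbed FAKE_PAGES dictionary stands in for real network responses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical markers; a real deployment would tune this list.
CHALLENGE_MARKERS = ("checking your browser", "are you human", "captcha")


def looks_healthy(body: str) -> bool:
    """A body is healthy if it is non-empty and contains no challenge marker."""
    lower = body.lower()
    return bool(lower.strip()) and not any(m in lower for m in CHALLENGE_MARKERS)


@dataclass
class UrlRegistry:
    """Tracks which registered URLs passed their most recent health probe."""

    probe: Callable[[str], str]  # hypothetical fetcher: url -> page text
    status: Dict[str, bool] = field(default_factory=dict)

    def register(self, url: str) -> None:
        # New URLs start as unhealthy until a probe succeeds.
        self.status.setdefault(url, False)

    def refresh(self) -> None:
        # Re-probe every registered URL and record the result.
        for url in self.status:
            try:
                self.status[url] = looks_healthy(self.probe(url))
            except Exception:
                # A failed fetch counts as unhealthy.
                self.status[url] = False

    def healthy_urls(self) -> List[str]:
        return sorted(u for u, ok in self.status.items() if ok)


# Stubbed pages stand in for real network responses.
FAKE_PAGES = {
    "https://text.npr.org": "Top stories: ...",
    "https://example.com/guarded": "Checking your browser before accessing...",
}

registry = UrlRegistry(probe=lambda url: FAKE_PAGES[url])
for page_url in FAKE_PAGES:
    registry.register(page_url)
registry.refresh()
print(registry.healthy_urls())  # ['https://text.npr.org']
```

The point of the design is that a URL only becomes eligible for a crawl after it has recently returned real content, so a site that starts serving challenge pages drops out of the registry instead of breaking consensus.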
Side-by-side · Vulnerable vs. Patched
two contracts · proven by paired transactions
vulnerable ▸contracts/vulnerable/VulnerableCrawler.py
Failed TX> consensus failed · validators diverged
# { "Depends": "py-genlayer:15qfivjvy80800rh998pcxmd2m8va1wq2qzqhz850n8ggcr4i9q0" }

from genlayer import *

# Module 6 (Vulnerable) -- Anti-bot wall.
# The contract accepts any URL from the caller and pulls it through
# nondet.web.render. If the target is fronted by an anti-bot service,
# different validators see different challenge pages (rotating tokens,
# rate-limit messages, captcha HTML), so strict_eq cannot agree.


class VulnerableCrawler(gl.Contract):
    last_excerpt: str

    def __init__(self):
        self.last_excerpt = ""

    @gl.public.write
    def crawl(self, url: str) -> None:
        def _fetch() -> str:
            return gl.nondet.web.render(url, mode="text")[:400]

        # strict_eq + an unconstrained URL == fragile.
        self.last_excerpt = gl.eq_principle.strict_eq(_fetch)

    @gl.public.view
    def get_last_excerpt(self) -> str:
        return self.last_excerpt
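The divergence described in the contract's comments can be illustrated with a toy simulation. This is plain Python, not GenLayer code: strictly_equal is a simplified stand-in that assumes strict equality means every validator's output must match exactly, and challenge_page and static_page are hypothetical fakes for what a guarded and an unguarded site might return.

```python
from typing import List


def challenge_page(validator_id: int) -> str:
    """Fake anti-bot response: each visitor receives a different token."""
    return f"Checking your browser... token=tok-{validator_id}"


def static_page(validator_id: int) -> str:
    """Fake plain page that renders identically for every visitor."""
    return "Top stories: ..."


def strictly_equal(outputs: List[str]) -> bool:
    """Toy model of strict equality: all outputs must be identical."""
    return len(set(outputs)) == 1


validators = range(5)
# Mirror the vulnerable contract's 400-character excerpt.
guarded = [challenge_page(v)[:400] for v in validators]
plain = [static_page(v)[:400] for v in validators]

print(strictly_equal(guarded))  # False: every validator saw a different token
print(strictly_equal(plain))    # True: every validator saw the same text
```

Truncating to 400 characters does not help here, because the varying token falls inside the excerpt.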
patched ▸contracts/patched/HealthCheckedCrawler.py
Success TX> consensus reached · all validators agree
# { "Depends": "py-genlayer:15qfivjvy80800rh998pcxmd2m8va1wq2qzqhz850n8ggcr4i9q0" }

from genlayer import *
import json
import re

# Module 6 (Patched) -- Health-checked crawler.
# Defenses:
#   1. URL must be on a mutable allow-list of pre-validated hosts.
#   2. The fetched body is fed through prompt_comparative so validators
#      agree on its semantic content even if rendering differs slightly.
#   3. Bodies that look like challenge pages are rejected (so we never
#      confuse a captcha for content).

DEFAULT_HOSTS = [
    "lite.cnn.com",
    "text.npr.org",
    "news.ycombinator.com",
]

CHALLENGE_MARKERS = [
    "checking your browser",
    "cloudflare",
    "ddos protection",
    "are you human",
    "captcha",
]


class HealthCheckedCrawler(gl.Contract):
    owner: str
    allowed_hosts_json: str
    last_summary: str

    def __init__(self):
        self.owner = str(gl.message.sender_address)
        self.allowed_hosts_json = json.dumps(DEFAULT_HOSTS)
        self.last_summary = ""

    def _require_owner(self) -> None:
        if str(gl.message.sender_address) != self.owner:
            raise Exception("only owner")

    @gl.public.write
    def crawl(self, url: str) -> None:
        m = re.match(r"^https://([^/]+)(/.*)?$", url)
        if not m:
            raise ValueError("https URL required")
        host = m.group(1).lower()
        if host.startswith("www."):
            host = host[4:]
        allowed = json.loads(self.allowed_hosts_json)
        if host not in allowed:
            raise ValueError(f"host not on allow-list: {host}")

        def _fetch() -> str:
            body = gl.nondet.web.render(url, mode="text") or ""
            lower = body.lower()
            for marker in CHALLENGE_MARKERS:
                if marker in lower:
                    raise Exception(f"challenge page detected: {marker}")
            return body[:1000]

        self.last_summary = gl.eq_principle.prompt_comparative(
            _fetch,
            principle="Bodies must convey the same top news content; ignore minor formatting differences.",
        )

    @gl.public.write
    def add_host(self, host: str) -> None:
        self._require_owner()
        allowed = json.loads(self.allowed_hosts_json)
        if host not in allowed:
            allowed.append(host)
            self.allowed_hosts_json = json.dumps(allowed)

    @gl.public.write
    def remove_host(self, host: str) -> None:
        self._require_owner()
        allowed = json.loads(self.allowed_hosts_json)
        allowed = [h for h in allowed if h != host]
        self.allowed_hosts_json = json.dumps(allowed)

    @gl.public.write
    def check_url(self, url: str) -> None:
        """Runs the host allow-list gate only -- no web call, deterministic.
        Demonstrates the first layer of the fix: reject unknown hosts before
        spending gas on a remote render that would likely diverge anyway."""
        m = re.match(r"^https://([^/]+)(/.*)?$", url)
        if not m:
            self.last_summary = "REJECTED: not https"
            return
        host = m.group(1).lower()
        if host.startswith("www."):
            host = host[4:]
        allowed = json.loads(self.allowed_hosts_json)
        self.last_summary = f"ALLOWED: {host}" if host in allowed else f"REJECTED: {host} not on allow-list"

    @gl.public.view
    def get_last_summary(self) -> str:
        return self.last_summary

    @gl.public.view
    def get_allowed_hosts(self) -> str:
        return self.allowed_hosts_json
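The challenge-page check inside _fetch can be exercised on its own, without a web call. The sketch below lifts the marker list from the patched contract into a standalone helper. find_challenge_marker is a hypothetical name used only here, and the sample bodies are invented for illustration.

```python
from typing import Optional

# Same markers as the patched contract's CHALLENGE_MARKERS.
CHALLENGE_MARKERS = [
    "checking your browser",
    "cloudflare",
    "ddos protection",
    "are you human",
    "captcha",
]


def find_challenge_marker(body: Optional[str]) -> Optional[str]:
    """Return the first marker found in body (case-insensitive), else None."""
    lower = (body or "").lower()
    for marker in CHALLENGE_MARKERS:
        if marker in lower:
            return marker
    return None


print(find_challenge_marker("Checking your browser before accessing the site."))
# checking your browser
print(find_challenge_marker("Top stories for today"))
# None
print(find_challenge_marker("Cloudflare reports quarterly earnings"))
# cloudflare
```

The third call shows a trade-off of plain substring matching: a legitimate article that merely mentions Cloudflare is also rejected. Depending on the sources on the allow-list, narrower markers may be worth considering.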
Call invoked
crawl("https://www.cloudflare.com")
strict_eq on Cloudflare-fronted body -> rotating tokens / challenge pages diverge
Call invoked
check_url("https://reuters.com/article/123")
host allow-list gate rejects an unknown host -- deterministic first line of defense
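The allow-list gate exercised by this call can be reproduced off-chain as a pure function. The sketch below reuses the regular expression and the www. stripping from the patched contract. extract_host and the module-level ALLOWED_HOSTS set are hypothetical names for this illustration, and the function returns the result string instead of writing contract state.

```python
import re
from typing import FrozenSet, Optional

# Default hosts from the patched contract.
ALLOWED_HOSTS: FrozenSet[str] = frozenset(
    {"lite.cnn.com", "text.npr.org", "news.ycombinator.com"}
)


def extract_host(url: str) -> Optional[str]:
    """Return the lower-cased host of an https URL, or None if not https."""
    m = re.match(r"^https://([^/]+)(/.*)?$", url)
    if not m:
        return None
    host = m.group(1).lower()
    if host.startswith("www."):
        host = host[4:]
    return host


def check_url(url: str, allowed: FrozenSet[str] = ALLOWED_HOSTS) -> str:
    """Deterministic gate: no network access, same answer on every validator."""
    host = extract_host(url)
    if host is None:
        return "REJECTED: not https"
    if host in allowed:
        return f"ALLOWED: {host}"
    return f"REJECTED: {host} not on allow-list"


print(check_url("https://reuters.com/article/123"))
# REJECTED: reuters.com not on allow-list
print(check_url("https://www.text.npr.org/"))
# ALLOWED: text.npr.org
print(check_url("http://text.npr.org/"))
# REJECTED: not https
```

Because the gate depends only on the URL string and the stored list, every validator computes the same result, which is why it works as a first line of defense before any render call.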
Knowledge check · M.06
Two questions on this incident. Pick the best answer; the question locks once committed.
Question 01 / 02
Why does Cloudflare break GenLayer consensus?