Overview
This project addresses a structural inefficiency in B2B outbound sales: the gap between intent signal detection and actionable sales intelligence. Enterprise intent data platforms surface account-level buying signals but leave the last mile (identifying the specific person, confirming ICP fit, and routing to the right rep) to manual effort. At the other end, LinkedIn automation tools find contacts but offer little intelligence tailored to a given business, no compliance controls, and no operational resilience.
The LinkedIn intent pipeline occupies the gap between these two categories. It is more actionable than intent data platforms (as it delivers named, scored contacts directly into the client's CRM) and more reliable than point automation tools (running unattended with resumable execution, structured progress tracking, and conservative rate limiting at every external API boundary).
The Challenge
B2B sales development teams spend 11+ hours per rep per week on manual prospecting research — scrolling LinkedIn, cross-referencing job titles, guessing at buyer intent — with no systematic way to distinguish contacts who are actively in-market from those on a stale lead list. Off-the-shelf intent data platforms (Bombora, 6sense, Cognism) cost £2,000–5,000/month and deliver account-level signals that still require human interpretation. The alternative — building an internal automation — typically produces a script that works for three weeks before silently breaking, with no monitoring, no error recovery, and no audit trail.
Prospecting into the client's industry (social housing sector) presents a unique set of difficulties that generic sales intelligence tools are poorly equipped to address:
- Fragmented Market. The sector comprises registered providers ranging from large groups managing 100,000+ homes to individual non-profits with fewer than 50 units. Many small providers operate under umbrella organisations, which makes it unclear which entity to target. No single database maps decision-makers across this full landscape.
- Inconsistent Role Titles. The person responsible for asset investment at one housing association may be titled "Director of Asset Management," at another "Head of Property Services," and at a third "Executive Director of Operations." Keyword-based prospecting misses a significant proportion of relevant contacts. The pipeline addresses this with 21 regex patterns covering asset management, compliance, sustainability, decarbonisation, retrofit, digital transformation, and housing strategy roles.
- Intent Opacity. Traditional prospecting identifies who holds a relevant role but not who is actively thinking about the problem the client solves. A Head of Sustainability who commented on an Awaab's Law post yesterday is a fundamentally different prospect from one who has been silent for six months — but conventional tools treat them identically.
- Topic Specificity. The client's sales conversations are anchored to specific regulatory and funding developments: Awaab's Law (damp and mould compliance deadlines from October 2025), the Social Housing Decarbonisation Fund and Warm Homes Plan, Tenant Satisfaction Measures (TSMs), and fuel poverty initiatives. These topics generate active LinkedIn discussion that is invisible to standard lead databases but directly indicates intent or readiness for a sales conversation.
- Manual Overhead. Without automation, monitoring LinkedIn for engagement on these topics requires a sales rep to manually search for posts, scroll through commenters, check each person's profile, cross-reference against target accounts, and log the result in the CRM — a workflow that does not scale beyond a handful of posts per week.
The Solution
QualitaX designed and operates a managed intent pipeline that inverts the traditional prospecting workflow. Instead of finding companies, then finding contacts, then guessing at intent, the pipeline discovers intent signals first (people actively engaging with topic-relevant LinkedIn posts). It then retrieves their profiles, enriches them with company and role data, cross-references them against the client's target organisation list, scores them by role relevance and engagement depth, and pushes qualified leads directly into HubSpot with full provenance. The system is designed for continuous weekly operation, with each run spanning multiple days, every step saving progress incrementally, and conservative LinkedIn safety limits respected throughout.
The Approach: Intent-First Architecture
The pipeline inverts the conventional prospecting funnel. Rather than starting with a target account list and searching for contacts, it starts with live engagement signals and filters backward to ICP-matched contacts at verified organisations.
Five-Stage Pipeline
The system executes five discrete stages in sequence, each with independent progress tracking and error handling. Every stage saves state incrementally, so a network error, rate limit, or manual interruption at any point loses at most the work done since the last save.
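The incremental-save behaviour can be illustrated with a minimal sketch. It assumes progress is kept as a JSON list of completed item IDs on disk; the function names, the file layout, and the save_every parameter are hypothetical and not taken from the pipeline itself.

```python
import json
import os
import tempfile
from typing import Callable, Iterable


def load_done(path: str) -> set[str]:
    """Return the IDs already processed, or an empty set on a first run."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        return set(json.load(fh))


def save_done(path: str, done: set[str]) -> None:
    """Write progress to a temp file, then swap it in, so a crash cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(sorted(done), fh)
    os.replace(tmp, path)


def run_resumable(items: Iterable[str], action: Callable[[str], None],
                  path: str, save_every: int = 1) -> int:
    """Process items not yet recorded in the progress file; return how many ran."""
    done = load_done(path)
    processed = 0
    try:
        for item in items:
            if item in done:
                continue  # finished in an earlier run
            action(item)
            done.add(item)
            processed += 1
            if processed % save_every == 0:
                save_done(path, done)
    finally:
        # Also save on the way out, whether the loop finished or an action raised.
        save_done(path, done)
    return processed
```

Calling run_resumable again with the same items and path skips everything already recorded, which is the property the stages rely on. A larger save_every trades a little repeated work after a hard kill for fewer disk writes.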
Stage 1a — People Search (SearchAPI.io). Google search queries find LinkedIn profiles of people using four role keywords (asset, compliance, sustainability, carbon). This produces 6,316 queries (1,579 orgs × 4 keywords), executed over multiple days at a configurable daily limit (default: 2,000 queries/day). Each query is structured as site:linkedin.com/in "{Organisation}" "{keyword}" -intitle:hiring -intitle:apply. Progress is saved after every 25 queries.
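The query template and the daily budget can be sketched as follows. The template string and the four keywords come from the description above; the function names and the way completed queries are counted are assumptions made for illustration.

```python
import math
from itertools import product

ROLE_KEYWORDS = ["asset", "compliance", "sustainability", "carbon"]
QUERY_TEMPLATE = 'site:linkedin.com/in "{org}" "{keyword}" -intitle:hiring -intitle:apply'


def build_queries(orgs: list[str]) -> list[str]:
    """One query per (organisation, keyword) pair, in a stable order."""
    return [QUERY_TEMPLATE.format(org=org, keyword=kw)
            for org, kw in product(orgs, ROLE_KEYWORDS)]


def todays_batch(queries: list[str], completed: int, daily_limit: int = 2000) -> list[str]:
    """Slice out the next queries to run, given how many are already done."""
    return queries[completed:completed + daily_limit]


def days_needed(total_queries: int, daily_limit: int = 2000) -> int:
    """Number of daily runs required to exhaust the query list."""
    return math.ceil(total_queries / daily_limit)


if __name__ == "__main__":
    orgs = [f"Org {i}" for i in range(1579)]  # placeholder names, not real data
    queries = build_queries(orgs)
    print(len(queries))               # 6316
    print(days_needed(len(queries)))  # 4
```

At the default limit of 2,000 queries per day, the 6,316 queries take four runs: three full days and a final partial batch of 316.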
Stage 1b — Topic Post Discovery (SearchAPI.io). Four curated Google search queries discover recent LinkedIn posts discussing Awaab's Law, SHDF/retrofit funding, tenant satisfaction, and fuel poverty. Each query is restricted to site:linkedin.com/posts with topic-specific keywords, date-filtered to the past month, and paginated up to three pages. This stage typically completes in under a minute with approximately 12 API calls.
Stage 2 — Engager Extraction (PhantomBuster). The Post Commenter and Liker Scraper extracts every person who engaged with the discovered posts. This is the pipeline's core insight: engagement with a topic-relevant post is a direct intent signal. A compliance manager who commented on an Awaab's Law post with specific concerns about damp remediation timelines has publicly demonstrated both domain expertise and active problem awareness — a categorically different signal from a name on a purchased list.
The extraction step is operationally complex. The PhantomBuster Post Commenter and Liker Scraper is a multi-agent orchestrator comprising four internal agents: a master coordinator, a post extractor, a commenter worker, and a liker worker. Processing a single post requires a six-step orchestration sequence: save the post URL to the master config, launch the post extractor and wait for completion, launch the master to process commenters, launch the liker worker, launch the master again to combine results, then download the result CSV from S3 storage. Each post takes 3–5 minutes to process.
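The six-step sequence can be sketched as one function over a small client interface. Everything here is hypothetical: ScraperClient, its method names, and the agent labels are illustrative stand-ins, not PhantomBuster's actual API, and DryRunClient only records calls so the ordering can be checked without touching any service.

```python
from typing import Protocol


class ScraperClient(Protocol):
    """Hypothetical interface; these method names are not PhantomBuster's API."""

    def save_post_url(self, post_url: str) -> None: ...
    def run_and_wait(self, agent: str) -> None: ...
    def download_result_csv(self) -> str: ...


def process_post(client: ScraperClient, post_url: str) -> str:
    """Run the six steps for one post and return the result CSV text."""
    client.save_post_url(post_url)          # 1. save the post URL to the master config
    client.run_and_wait("post_extractor")   # 2. launch the post extractor, wait for it
    client.run_and_wait("master")           # 3. launch the master to process commenters
    client.run_and_wait("liker_worker")     # 4. launch the liker worker
    client.run_and_wait("master")           # 5. launch the master again to combine results
    return client.download_result_csv()     # 6. download the result CSV


class DryRunClient:
    """In-memory stand-in that records calls instead of contacting any service."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def save_post_url(self, post_url: str) -> None:
        self.calls.append(f"save:{post_url}")

    def run_and_wait(self, agent: str) -> None:
        self.calls.append(f"run:{agent}")

    def download_result_csv(self) -> str:
        self.calls.append("download")
        return ""  # placeholder body; a real run would return CSV rows
```

Keeping the sequence in a single function makes the ordering explicit and lets a real client be swapped for the dry-run one during testing.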
Stage 3 — Profile Enrichment (PhantomBuster, Two-Pass). A two-pass strategy minimises expensive LinkedIn Profile Scraper calls. Pass A (free, instant) parses the occupation field returned by the engager scraper — patterns like "Director of Asset Management at ABC" — to extract company names using separator detection (at, -, |). Pass B (PhantomBuster) enriches only the profiles where company extraction failed, at a rate of 40 profiles per day during UK business hours (9am–5pm) with randomised 15–45 second delays between actions. This two-pass approach saves approximately 40–50% of enrichment API calls.
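Pass A can be approximated with a few lines of separator detection. This is a sketch under stated assumptions, not the pipeline's parser: it assumes the company follows " at " when that word is present, and otherwise is the last segment after " - " or "|". The function name and the example organisation names are hypothetical.

```python
import re
from typing import Optional

_AT = re.compile(r"\s+at\s+", re.IGNORECASE)
_OTHER = re.compile(r"\s+-\s+|\s*\|\s*")


def extract_company(occupation: Optional[str]) -> Optional[str]:
    """Best-effort company name from an occupation string, or None if unclear."""
    text = (occupation or "").strip()
    if not text:
        return None
    at_parts = _AT.split(text, maxsplit=1)
    if len(at_parts) == 2:
        # Keep only the first segment after " at "; taglines often follow another separator.
        company = _OTHER.split(at_parts[1], maxsplit=1)[0].strip()
        return company or None
    other_parts = _OTHER.split(text)
    if len(other_parts) > 1:
        # Assumption: with "-" or "|" separators the company comes last.
        return other_parts[-1].strip() or None
    return None  # no separator found: leave this profile for Pass B


if __name__ == "__main__":
    print(extract_company("Director of Asset Management at ABC"))        # ABC
    print(extract_company("Head of Property Services | Example Homes"))  # Example Homes
    print(extract_company("Housing professional"))                       # None
```

Profiles for which this returns None are the ones that would be queued for the rate-limited Pass B. The heuristic will misread phrases such as "looking at opportunities", so a production parser would need additional guards.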
Stage 4 — Cross-Reference and Score (Python). Enriched profiles are matched against a target list using a three-tier fuzzy matching algorithm: exact match, substring match, and suffix-stripped match. Job titles are evaluated against 21 regex patterns covering the client's ICP roles. A composite intent score is calculated:
- Works at a target organisation: +10 points
- Job title matches target role pattern: +10 points
- Commented on a topic post: +5 points
- Liked a topic post: +3 points
- Also found in Stage 1a people search: +5 points
- Engaged with multiple topic posts: +3 per additional post
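The matching tiers and the point values above can be sketched together. Several details are assumptions made for illustration: the suffix list is hypothetical (the source does not enumerate it), comment and like points are treated as additive, and the 21 title patterns are represented only by a boolean input rather than reproduced.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical suffix list; the source does not say which suffixes are stripped.
_SUFFIXES = re.compile(r"\b(limited|ltd|plc|group|housing association)\b\.?", re.IGNORECASE)


def _norm(name: str) -> str:
    return re.sub(r"\s+", " ", name.strip().lower())


def _strip_suffix(name: str) -> str:
    return _norm(_SUFFIXES.sub("", name))


def match_org(company: str, targets: list[str]) -> Optional[str]:
    """Return the matching target organisation, trying the three tiers in order."""
    c = _norm(company)
    if not c:
        return None
    normed = {_norm(t): t for t in targets}
    if c in normed:                        # tier 1: exact match
        return normed[c]
    for n, original in normed.items():     # tier 2: substring match, either direction
        if c in n or n in c:
            return original
    stripped = _strip_suffix(company)      # tier 3: suffix-stripped match
    if stripped:
        for t in targets:
            if _strip_suffix(t) == stripped:
                return t
    return None


@dataclass
class Signals:
    at_target_org: bool
    title_matches: bool      # result of the role-pattern check, computed elsewhere
    commented: bool
    liked: bool
    in_people_search: bool
    posts_engaged: int       # number of distinct topic posts engaged with


def intent_score(s: Signals) -> int:
    """Composite score using the point values listed above."""
    score = 0
    score += 10 if s.at_target_org else 0
    score += 10 if s.title_matches else 0
    score += 5 if s.commented else 0
    score += 3 if s.liked else 0
    score += 5 if s.in_people_search else 0
    score += 3 * max(0, s.posts_engaged - 1)  # +3 per additional post
    return score
```

For example, a contact at a target organisation with a matching title who commented on two topic posts scores 10 + 10 + 5 + 3 = 28. The substring tier is deliberately permissive, so very short company names may need a minimum-length guard in practice.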
Stage 5 — CRM Delivery (HubSpot). Qualified leads (score ≥ 10, matched organisation required) are pushed to HubSpot via the CRM API. Six custom contact properties are auto-created on first run: intent_topic (dropdown), intent_last_signal_date (date), intent_interaction_type (dropdown: commented/liked/both), intent_signal_detail (post snippet or comment text, truncated to 200 words), intent_score (number), and intent_source (text label). Smart update logic prevents overwriting existing standard fields and only updates intent scores upward — a contact whose score was 20 last week will not be downgraded to 15 this week.
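The qualification gate and the upward-only score rule can be sketched as pure functions that decide what to write before any CRM call is made. The function names are hypothetical, no HubSpot client is used, and two behaviours are assumptions: non-score intent_ properties are always refreshed, and standard fields are written only when the existing value is blank.

```python
from typing import Any, Optional

QUALIFY_THRESHOLD = 10


def is_qualified(score: int, matched_org: Optional[str]) -> bool:
    """A lead qualifies only with a matched organisation and a score of at least 10."""
    return matched_org is not None and score >= QUALIFY_THRESHOLD


def build_update(existing: dict[str, Any], incoming: dict[str, Any]) -> dict[str, Any]:
    """Return only the properties that should be written to the contact."""
    update: dict[str, Any] = {}
    for key, value in incoming.items():
        if value is None or value == "":
            continue  # never write blanks
        if key == "intent_score":
            # Stored values may arrive as strings, so compare numerically.
            if float(value) > float(existing.get(key) or 0):
                update[key] = value  # scores only move upward
        elif key.startswith("intent_"):
            update[key] = value      # assumption: other intent fields are refreshed
        elif not existing.get(key):
            update[key] = value      # standard fields: fill blanks, never overwrite
    return update


if __name__ == "__main__":
    existing = {"jobtitle": "Head of Assets", "intent_score": "20"}
    incoming = {"jobtitle": "Director", "intent_score": 15, "intent_topic": "awaabs_law"}
    print(build_update(existing, incoming))  # {'intent_topic': 'awaabs_law'}
```

Separating the decision from the API call keeps the overwrite rules easy to test, and an empty result means the contact can be skipped without a write.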
Operational Controls
The pipeline is built with operational discipline that separates it from typical prospecting automation:
- Resumable execution at every stage. Every step saves progress at short, fixed intervals: the people search after every 25 queries, the engager scraper after every post, and the enrichment step after every 5 profiles. Re-running the same command after any interruption (network error, rate limit, cookie expiration, manual kill) resumes from the last saved checkpoint, so at most one interval of work is repeated and no saved data is lost.
- Conservative LinkedIn rate limiting. PhantomBuster operations are governed by five safety parameters: 40 profile views per day, 50 total actions per day, UK business hours only (9am–5pm), randomised delays of 15–45 seconds between actions, and a maximum of 10 posts per scraping run. These limits are deliberately well below LinkedIn's thresholds. A full pipeline run is designed to span multiple days — the system optimises for sustainability over speed.
- PhantomBuster orchestration resilience. The multi-agent scraping sequence handles the specific failure modes discovered during development: 429 parallel-execution limits (retry with a linearly increasing backoff of 15s × attempt number), agent status polling via the correct endpoint (agents/fetch-output, not agents/fetch, which returns stale data), multiple S3 download URL patterns with automatic fallback, and output log parsing to extract the actual result URL when CDN endpoints fail.
- Configurable query budgets. SearchAPI.io usage is capped per run via the --daily-limit flag (default: 2,000). The pipeline tracks query consumption across all stages and halts gracefully when the budget is exhausted, saving full progress for the next run. This prevents accidental overspend.
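The safety limits and the 429 retry delay listed above can be sketched as a few small guards. The numeric values come from the list; the function names are hypothetical, and the sketch assumes the caller passes a datetime already expressed in UK local time and tracks the daily counters itself.

```python
import random
from datetime import datetime

MAX_PROFILE_VIEWS_PER_DAY = 40
MAX_ACTIONS_PER_DAY = 50
MAX_POSTS_PER_RUN = 10
BUSINESS_START_HOUR, BUSINESS_END_HOUR = 9, 17  # 9am to 5pm, UK local time


def within_business_hours(now_uk: datetime) -> bool:
    """True between 09:00 and 16:59; assumes now_uk is already UK local time."""
    return BUSINESS_START_HOUR <= now_uk.hour < BUSINESS_END_HOUR


def may_view_profile(now_uk: datetime, views_today: int, actions_today: int) -> bool:
    """Allow a profile view only if every daily limit still has headroom."""
    return (within_business_hours(now_uk)
            and views_today < MAX_PROFILE_VIEWS_PER_DAY
            and actions_today < MAX_ACTIONS_PER_DAY)


def action_delay() -> float:
    """Randomised pause between actions, in seconds (15 to 45)."""
    return random.uniform(15, 45)


def backoff_seconds(attempt: int) -> int:
    """Delay before retrying after a 429: 15 s times the attempt number (1-based)."""
    return 15 * attempt


def posts_for_run(post_urls: list[str]) -> list[str]:
    """Cap a scraping run at the per-run post limit."""
    return post_urls[:MAX_POSTS_PER_RUN]
```

With this formula the third retry waits 45 seconds. Checking the guards before every action, rather than once per run, is what lets a run stop cleanly at 5pm and resume the next morning.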
Key Benefits and Results
The pipeline delivers three concrete outcomes that manual prospecting and generic intent platforms cannot match:
- Intent-first lead generation. Every contact delivered to HubSpot has demonstrably engaged with a topic the client sells into — not inferred from website visits or firmographic signals, but directly observed as a comment or like on a specific LinkedIn post. The contact record includes what they engaged with, what they said (if they commented), and which topic it maps to. This enables hyper-personalised outreach: "I saw your comment on the Warm Homes Plan discussion about fabric-first retrofit — at ABC we're working with housing associations on exactly this."
- Continuous ICP enforcement. The 21-pattern regex filter and three-tier organisation matching apply identical qualification criteria to every contact, every run. ICP drift — the gradual loosening of qualification standards that degrades manual prospecting quality over time — is structurally eliminated.
- Operational resilience. The pipeline is designed to run unattended over multiple days, resuming from exactly where it left off after any interruption. Progress is saved incrementally. Rate limits are enforced conservatively. Every external API interaction (SearchAPI.io, PhantomBuster, HubSpot) has independent error handling. An expired LinkedIn cookie — the most common and hardest-to-detect failure mode — produces zero results rather than corrupted data, and the pipeline logs the anomaly rather than pushing empty batches.
Scaling and Operational Outlook
The pipeline is designed for continuous weekly operation. A recommended two-tier architecture combines Tier 1 (the automated intent pipeline, answering "who is actively talking about our topics right now?") with Tier 2 (LinkedIn Sales Navigator for direct prospecting, answering "who are the decision-makers at target org X?"). Sales Navigator handles people-by-role search more effectively than Google scraping, eliminates the 6,316-query people search entirely, and provides native HubSpot integration — while the intent pipeline provides the unique signal that no off-the-shelf tool delivers.