Build an AI Lead Generation Tool (Not Another Scraper)
By Ergini, Software & AI Developer in Pristina, Kosovo
TL;DR
Apollo and ZoomInfo sell lists. A real AI lead-gen system reads signals and prioritizes who is actually buying. Here is the architecture for one - intent detection, enrichment, scoring, drafting - with code and prompts.
Most things sold as an "AI lead generation tool" in 2026 are the same scraper from 2018 with a GPT wrapper on the email step. They still pull 50,000 contacts from a firmographic filter, still hammer the list with mail-merge variations, still kill your domain reputation, and still report a 1.4% reply rate as a win. That game is over. Inbox providers got better, recipients got numbed, and the volume race ended the moment everyone got access to the same tools.
The system that replaces it is signal-first. You stop hunting for who matches your ICP on paper and start hunting for who is showing buying behavior right now - a new VP of Engineering, a Series B, a job posting that mentions your competitor, a stack change visible in BuiltWith. The list shrinks 100x. The personalization gets real. The reply rate jumps an order of magnitude. The same stack that powers VC Automation, my outbound system for venture capital outreach, is what this post walks through. By the end you will have an architecture, real code for each stage, the legal guardrails, and the cost math to build a real one - not another scraper.
Why scrapers killed cold email
Cold email used to work because volume was scarce. In 2018, sending 2,000 personalized-looking emails per week from a single inbox was novel and a 4 to 6% reply rate was normal. Then Apollo, ZoomInfo, Lusha, and twenty others commoditized the data layer. Then the sequencing tools (Outreach, Salesloft, Lemlist) commoditized the sending layer. Then GPT commoditized the "personalization" layer. Every B2B inbox now receives 30 to 80 outbound emails per week, most of which follow the same {first_name} + {company} + {recent_news} template.
The result: average cold email reply rates collapsed from ~4% in 2019 to under 1% in 2026. Inbox providers got smarter - Google and Microsoft now classify based on engagement signals (opens, replies, marks-as-spam) and silently route low-engagement senders to Promotions or Spam. Domain reputation, once a niche concern, became existential. Send 500 emails to an unengaged list and your primary domain's deliverability dies for months.
The new advantage is signal, not volume. If you can find the 50 accounts this week where something specific just happened that makes them a buyer, and you send each one a three-sentence email that references that thing, you bypass the entire deliverability war. You also bypass the prospect's "outbound email" pattern-match, because the email reads like it was written by a person who actually paid attention. That is the system worth building.
What a real AI lead-gen system looks like in 2026
The architecture decomposes into five stages. Every production system I have shipped has the same shape - different tools at each stage, but the same flow:
- Source. Pull raw signals from public data - funding announcements, job postings, hiring patterns, GitHub activity, tech-stack changes, podcast appearances, conference speaker lists.
- Enrich. For each candidate account, fire an LLM research pass that pulls a 200-word company profile, identifies the decision-maker, and extracts intent context.
- Score. Run a structured-output classifier that assigns a 0 to 100 intent score across four dimensions: stack fit, hiring signal, trigger event, ICP match.
- Personalize. Generate a 3-sentence first email that references something real from the enrichment - not a {first_name} mail merge, an actual observation tied to the trigger.
- Route. Confidence-based router: high-intent goes to a human for direct outreach, mid-intent enters an automated nurture sequence, low-intent gets dropped.
Each stage has its own evaluation surface and its own failure modes. The mistake teams make is collapsing this into a single "AI agent that does cold email" - you lose every checkpoint where you could catch a bad lead before it costs you reputation, money, or a salesperson's afternoon. Keep the stages separate. Score them independently.
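To make "keep the stages separate" concrete, here is a minimal sketch of a pipeline runner that records one outcome per stage and stops at the first rejection. Everything in it is hypothetical (the `Lead` fields, the `Stage` contract, the stub stages), and it is synchronous only to stay short; real stages would be async LLM or API calls.

```typescript
// Illustrative sketch: stage names and the 45-point threshold mirror the post,
// but the types and helpers here are hypothetical.
interface Lead {
  domain: string;
  signal: string;
  profile?: string; // filled by the enrich stage
  score?: number;   // filled by the score stage
  draft?: string;   // filled by the personalize stage
}

type Outcome =
  | { status: "ok"; lead: Lead }
  | { status: "stopped"; reason: string };

interface Stage {
  name: string;
  run: (lead: Lead) => Outcome;
}

interface TraceEntry {
  stage: string;
  status: "ok" | "stopped";
  reason?: string;
}

// Runs stages in order, records one trace entry per stage, and stops at the
// first stage that rejects the lead.
function runPipeline(lead: Lead, stages: Stage[]) {
  const trace: TraceEntry[] = [];
  let current = lead;
  for (const stage of stages) {
    const outcome = stage.run(current);
    if (outcome.status === "stopped") {
      trace.push({ stage: stage.name, status: "stopped", reason: outcome.reason });
      return { lead: current, trace, completed: false };
    }
    trace.push({ stage: stage.name, status: "ok" });
    current = outcome.lead;
  }
  return { lead: current, trace, completed: true };
}

// Stub stages standing in for the real LLM-backed ones.
const demoStages: Stage[] = [
  { name: "enrich", run: (l) => ({ status: "ok", lead: { ...l, profile: "stub profile" } }) },
  { name: "score", run: (l) => ({ status: "ok", lead: { ...l, score: 40 } }) },
  {
    name: "route",
    run: (l) =>
      (l.score ?? 0) >= 45
        ? { status: "ok", lead: l }
        : { status: "stopped", reason: "score below nurture threshold" },
  },
];

const demoRun = runPipeline({ domain: "example.com", signal: "Series B" }, demoStages);
```

The trace is the point: each entry is a checkpoint you can log, sample, and evaluate on its own, which a single end-to-end agent call would not give you.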
Sourcing - signal beats volume
The sourcing layer is where most builds quietly become scrapers again. The temptation is to pull a 50K-row Apollo export and call it the source. Resist. Apollo is a fine database for lookup - pulling contact details once you know an account is interesting - but it is not a signal source. Signals come from events: someone hired, funded, posted, switched, launched.
The signals worth chasing depend on your ICP, but here is the menu I work from with clients:
| Signal | Source | What it means |
|---|---|---|
| Funding round | Crunchbase, PitchBook, TechCrunch RSS | Budget unlocked, hiring incoming, vendor decisions in 90 days |
| Key hire (VP/Director) | LinkedIn jobs, The Org, public announcements | New decision-maker likely to evaluate vendors |
| Job posting mentioning competitor | Indeed, LinkedIn, Greenhouse boards | Active use of a tool you replace - direct displacement opportunity |
| Tech-stack change | BuiltWith, Wappalyzer, public GitHub | Active migration - they are already buying in your category |
| GitHub activity | GitHub API, public repo events | Engineering org maturity, dev-tool fit, OSS adoption signal |
| Podcast / conference appearance | Listen Notes, conference programs | Personal warm-open - reference what they said |
| Public price/plan changes | Pricing-page diff trackers, Wayback Machine | Strategic shift - often a procurement signal |
Build vs buy at the source layer is straightforward. Clay is the leader for stitching 50+ signal providers into a no-code canvas - if your monthly Clay spend stays under $2K, do not rebuild it. Apollo is the cheap contact-data layer underneath. Roll your own scrapers only for signals nobody else has - that is the genuine moat. Mine for VC Automation is a tracker for partner content (new podcast episodes, Substack posts, conference talks) tied back to firms; nothing in the market did it the way I needed.
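Whatever mix of providers feeds the source layer, the events need one shape before enrichment. The sketch below assumes a hypothetical `Signal` record and a `maxAgeDays` freshness window (both my assumptions, not part of any provider's API): it drops stale signals and keeps the freshest one per account and signal type, so a funding round reported by three feeds counts once.

```typescript
// Hypothetical shape for a normalized signal; field names are assumptions.
type SignalType =
  | "funding"
  | "key_hire"
  | "competitor_job_post"
  | "stack_change"
  | "github_activity"
  | "appearance"
  | "pricing_change";

interface Signal {
  domain: string;
  type: SignalType;
  description: string;
  observedAt: string; // ISO 8601 date
  url?: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Drops stale, future-dated, or undated signals, then keeps the freshest
// signal per (domain, type) pair.
function dedupeSignals(signals: Signal[], now: Date, maxAgeDays: number): Signal[] {
  const freshest = new Map<string, Signal>();
  for (const signal of signals) {
    const observed = Date.parse(signal.observedAt);
    const ageDays = (now.getTime() - observed) / DAY_MS;
    if (Number.isNaN(observed) || ageDays < 0 || ageDays > maxAgeDays) continue;
    const key = `${signal.domain.toLowerCase()}|${signal.type}`;
    const existing = freshest.get(key);
    if (!existing || observed > Date.parse(existing.observedAt)) {
      freshest.set(key, signal);
    }
  }
  return Array.from(freshest.values());
}

const rawSignals: Signal[] = [
  { domain: "Acme.com", type: "funding", description: "Series B reported in press", observedAt: "2026-03-01" },
  { domain: "acme.com", type: "funding", description: "Series B listed in funding database", observedAt: "2026-03-03" },
  { domain: "acme.com", type: "key_hire", description: "New VP of Engineering", observedAt: "2025-11-01" },
  { domain: "beta.io", type: "stack_change", description: "Switched analytics vendor", observedAt: "2026-03-05" },
];

const freshSignals = dedupeSignals(rawSignals, new Date("2026-03-10T00:00:00Z"), 30);
```

With a 30-day window, the two funding entries collapse into the later one and the November hire is dropped as stale, leaving two signals to enrich.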
Enrichment with LLM-driven research
Once a signal fires, you need to know enough about the account to write something specific. Off-the-shelf enrichment fields (industry, headcount, revenue) are not enough - they tell you who they are, not what they care about right now. The fix is an LLM research pass: give the model a domain plus the triggering signal, let it browse the public web, and have it return a structured 200-word profile.
The pattern below uses the Vercel AI SDK with a web-search tool. The model gets the company URL, the trigger, and access to search the web. It returns a typed profile with stack, team size, recent moves, and a free-text "intent context" field that gets used downstream for personalization.
```typescript
// src/enrich.ts
import { generateObject, generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
import { webSearch } from "./tools/web-search.js";

const ProfileSchema = z.object({
  oneLineDescription: z.string(),
  industry: z.string(),
  estimatedHeadcount: z.string(),
  stack: z.array(z.string()).max(10),
  recentMoves: z.array(z.string()).max(5),
  intentContext: z
    .string()
    .describe("Free-text observation tied to the trigger signal."),
  decisionMakerHypothesis: z.object({
    role: z.string(),
    rationale: z.string(),
  }),
});

// Exported so the scoring and drafting stages can share the type.
export type Profile = z.infer<typeof ProfileSchema>;

export async function enrichAccount(input: {
  domain: string;
  signal: { type: string; description: string; url?: string };
}): Promise<Profile> {
  // Pass 1: tool-using research loop. generateObject does not accept tools,
  // so the web search runs through generateText.
  // maxSteps is the AI SDK 4.x option; on 5.x the equivalent cap is
  // stopWhen: stepCountIs(6).
  const research = await generateText({
    model: openai("gpt-5-mini"),
    tools: { webSearch },
    maxSteps: 6,
    system: `You research B2B companies for a sales team.
You will be given a domain and a triggering signal.
Use web search to gather public info from the company site,
LinkedIn, press, and engineering blogs. Be concrete, never invent.
If you cannot verify a fact, omit it.`,
    prompt: `Domain: ${input.domain}
Trigger: ${input.signal.type} - ${input.signal.description}
${input.signal.url ? `Source: ${input.signal.url}` : ""}
Research this company. Focus on what makes the trigger meaningful
and what a salesperson should know before reaching out.`,
  });

  // Pass 2: shape the research notes into the typed profile.
  const { object } = await generateObject({
    model: openai("gpt-5-mini"),
    schema: ProfileSchema,
    system: `Convert the research notes into the profile schema.
Use only facts present in the notes. If a fact is missing, omit it.`,
    prompt: research.text,
  });
  return object;
}
```

Three things to call out. First, structured output (Zod schema) keeps the downstream stages typed and reliable; it runs as a second, tool-free call because generateObject does not accept tools - if you want the full treatment on structured outputs, see my OpenAI API cost post and the structured-output one. Second, maxSteps: 6 caps the research loop - without it, the model can spiral into 30 tool calls and burn $0.50 per lead. Third, the "intentContext" field is the load-bearing field for personalization downstream. Treat it like a one-sentence elevator pitch the salesperson would write after 5 minutes of LinkedIn stalking.
Intent scoring - beyond firmographics
Firmographic scoring (50-200 employees, SaaS, US-based) is necessary but it tells you who matches your ICP on paper, not who is buying. The scoring layer I ship rates each account on four independent dimensions:
- Stack fit (0-25). Does their current tech stack imply pain you solve? A team running HubSpot + Intercom + a stale knowledge base is a different score than a team running custom everything.
- Hiring signal (0-25). Are they hiring roles adjacent to your buyer? A new VP of Customer Experience is a strong signal for support tooling; a new Head of RevOps is a strong signal for sales tools.
- Trigger event (0-25). How fresh and how relevant is the triggering signal? A Series B announced yesterday outscores a board appointment from 6 months ago.
- ICP match (0-25). Classic firmographic fit - size, industry, geography, business model.
The classifier is a single structured-output LLM call that takes the enriched profile and emits the four sub-scores plus a one-sentence rationale for each. Total score lives in 0 to 100 - the routing layer downstream uses it directly.
```typescript
// src/score.ts
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
import type { enrichAccount } from "./enrich.js";

// The profile type is whatever the enrichment stage returns.
type Profile = Awaited<ReturnType<typeof enrichAccount>>;

// Each dimension is worth 0 to 25 points and carries its own rationale.
const SubScore = z.object({
  score: z.number().min(0).max(25),
  rationale: z.string(),
});

const ScoreSchema = z.object({
  stackFit: SubScore,
  hiringSignal: SubScore,
  triggerEvent: SubScore,
  icpMatch: SubScore,
});

export async function scoreAccount(profile: Profile, icpDefinition: string) {
  const { object } = await generateObject({
    model: openai("gpt-5-mini"),
    schema: ScoreSchema,
    system: `You score B2B accounts for sales prioritization.
Be conservative - most accounts should land in the 30-60 range.
Reserve 80+ for accounts with a fresh, specific trigger AND strong fit.
Always cite evidence in rationale; never invent facts.`,
    prompt: `ICP: ${icpDefinition}
Account profile:
${JSON.stringify(profile, null, 2)}`,
  });

  // The total (0 to 100) is computed in code, not by the model.
  const total =
    object.stackFit.score +
    object.hiringSignal.score +
    object.triggerEvent.score +
    object.icpMatch.score;
  return { ...object, total };
}
```

Two non-obvious moves. Rationale strings are mandatory - they let a human spot-check why an account scored what it did, which catches the model's biases inside the first 50 leads. And the "be conservative" instruction is load-bearing: without it, LLMs inflate scores into the 70-90 range for everything, which destroys the signal. Spend an afternoon hand-labeling 30 accounts (high/mid/low) and tune the prompt until the model matches your labels.
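The hand-labeling loop is easier to run with a number attached. A minimal sketch, assuming the 75 and 45 cut-offs the router uses later in this post and a hypothetical `LabeledAccount` record: map each model total to a bucket, compare it to the human label, and list the disagreements to read before the next prompt tweak.

```typescript
type Bucket = "high" | "mid" | "low";

// Cut-offs follow the routing tiers in this post: 75+ high, 45-74 mid, else low.
function bucketFor(total: number): Bucket {
  if (total >= 75) return "high";
  if (total >= 45) return "mid";
  return "low";
}

// Hypothetical record: one hand-labeled account plus the model's total score.
interface LabeledAccount {
  domain: string;
  humanLabel: Bucket;
  modelTotal: number;
}

// Returns the share of accounts where the model's bucket matches the human
// label, plus the mismatches so a person can read the rationales.
function labelAgreement(accounts: LabeledAccount[]) {
  const mismatches = accounts.filter((a) => bucketFor(a.modelTotal) !== a.humanLabel);
  const rate =
    accounts.length === 0 ? 0 : (accounts.length - mismatches.length) / accounts.length;
  return { rate, mismatches };
}

const labeled: LabeledAccount[] = [
  { domain: "a.example", humanLabel: "high", modelTotal: 82 },
  { domain: "b.example", humanLabel: "mid", modelTotal: 58 },
  { domain: "c.example", humanLabel: "low", modelTotal: 61 }, // model is too generous here
  { domain: "d.example", humanLabel: "low", modelTotal: 30 },
];

const agreement = labelAgreement(labeled);
```

In this toy set the agreement rate is 0.75 and the single mismatch is an inflated score, the failure the "be conservative" instruction is meant to suppress.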
Personalization - beyond {{first_name}}
The personalization stage is where the system earns its reply rate. The goal is a 3-sentence email that references something specific from the enrichment - a hire, a fundraise, a stack signal, a podcast episode - and ties it to a concrete value prop. No flattery, no long-form "just wanted to reach out", no AI tells like "I noticed your impressive growth."
I keep a small tone library - three or four voice profiles I have hand-tuned for different ICPs - and the generator picks one based on the account profile. Founders get a peer-to-peer tone, enterprise buyers get a measured consultant tone, technical buyers get a no-fluff engineer tone. The structured output forces the model to commit to a specific reference, a value prop, and a single CTA.
```typescript
// src/personalize.ts
import { generateObject } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";
import type { enrichAccount } from "./enrich.js";
import type { scoreAccount } from "./score.js";

// Input types are whatever the two upstream stages return.
type Profile = Awaited<ReturnType<typeof enrichAccount>>;
type Score = Awaited<ReturnType<typeof scoreAccount>>;

const DraftSchema = z.object({
  subject: z.string().max(60),
  specificReference: z
    .string()
    .describe("One concrete thing from the enrichment we are referencing."),
  valueProp: z.string(),
  cta: z.enum([
    "15-min call",
    "share a 2-min loom",
    "send a 1-pager",
    "intro by mutual",
  ]),
  body: z.string().max(400),
});

const TONES = {
  founder: "Peer-to-peer, founder-to-founder, no buzzwords, plain English.",
  enterprise: "Measured, consultant-like, respect their time, hint at peers.",
  technical: "No fluff, lead with a technical observation, terse.",
};

export async function draftFirstTouch(input: {
  profile: Profile;
  score: Score;
  senderName: string;
  product: string;
}) {
  const role = input.profile.decisionMakerHypothesis.role;
  // Word boundaries keep "Director" (which contains "cto") and
  // "Business Development" out of the technical bucket.
  const tone = /founder|\bceo\b/i.test(role)
    ? TONES.founder
    : /\b(cto|eng(ineer(ing)?)?|dev(eloper|ops)?)\b/i.test(role)
      ? TONES.technical
      : TONES.enterprise;

  const { object } = await generateObject({
    model: anthropic("claude-sonnet-4-6"),
    schema: DraftSchema,
    system: `You write first-touch sales emails for ${input.senderName}.
Product: ${input.product}.
Tone: ${tone}.
Rules:
- 3 sentences max in the body.
- Reference one specific thing from the profile. Never invent.
- One clear CTA. Never multiple asks.
- No flattery, no "hope this finds you well", no "just reaching out".
- Subject line is lowercase, under 6 words, no clickbait.`,
    prompt: `Account profile:
${JSON.stringify(input.profile, null, 2)}
Score rationale:
${input.score.triggerEvent.rationale}`,
  });
  return object;
}
```

Always require human review on the first 50 to 100 drafts before sending. The model will get the tone wrong, get the trigger wrong, or invent a fact roughly 10 to 20% of the time in the first batch. A quick approval queue (Slack, Linear, or a custom dashboard) catches those before they hit a prospect's inbox. See my human-in-the-loop post for the four approval patterns that actually ship.
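Some of the drafting rules are mechanical enough to check in code before a draft reaches the approval queue. The sketch below is one possible pre-review lint, not part of the pipeline above: the `DraftToCheck` shape, the phrase list, and the naive sentence counter are all assumptions for illustration.

```typescript
// Hypothetical subset of the draft fields needed for linting.
interface DraftToCheck {
  subject: string;
  body: string;
  specificReference: string;
}

// Illustrative list; extend it with whatever "AI tells" show up in review.
const BANNED_PHRASES = [
  "hope this finds you well",
  "just reaching out",
  "just wanted to reach out",
  "impressive growth",
];

// Naive sentence counter: splits on ., ! or ? followed by whitespace or end of
// string. Abbreviations such as "e.g. " will over-count.
function countSentences(text: string): number {
  return text.split(/[.!?]+(?:\s+|$)/).filter((part) => part.trim().length > 0).length;
}

// Returns a list of rule violations; an empty list means the draft passes.
function lintDraft(draft: DraftToCheck): string[] {
  const problems: string[] = [];
  const sentences = countSentences(draft.body);
  if (sentences > 3) problems.push(`body has ${sentences} sentences (max 3)`);
  if (draft.subject !== draft.subject.toLowerCase()) {
    problems.push("subject is not lowercase");
  }
  const words = draft.subject.trim().split(/\s+/).filter((w) => w.length > 0).length;
  if (words >= 6) problems.push(`subject has ${words} words (must be under 6)`);
  const lowerBody = draft.body.toLowerCase();
  for (const phrase of BANNED_PHRASES) {
    if (lowerBody.includes(phrase)) problems.push(`body contains banned phrase "${phrase}"`);
  }
  if (draft.specificReference.trim().length === 0) {
    problems.push("no specific reference recorded");
  }
  return problems;
}

const cleanDraft: DraftToCheck = {
  subject: "your series b hiring",
  body: "Saw the Series B announcement last week. We cut ticket backlog for teams scaling past 100 people. Open to a 15-min call?",
  specificReference: "Series B announcement",
};

const sloppyDraft: DraftToCheck = {
  subject: "Quick Question About Your Growth Plans Today",
  body: "Hope this finds you well. I saw your Series B. We help teams like yours. Can we talk?",
  specificReference: "Series B",
};

const cleanProblems = lintDraft(cleanDraft);
const sloppyProblems = lintDraft(sloppyDraft);
```

A lint like this does not replace human review; it only keeps reviewers from spending time on drafts that break the written rules. Invented facts and wrong references still need a person.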
Routing - high to sales, mid to nurture, low to drop
The router is the dumbest part of the system and also the most important. Given the total intent score, it splits leads into three buckets:
- Score 75+: route to a human. Notify the assigned salesperson in Slack with the profile, the score rationale, and the AI-drafted first touch. They decide whether to send as-is, tweak, or send a custom note. No autosend at this tier.
- Score 45-74: route to a nurture sequence. Add to a 3-touch email sequence in Smartlead or similar. Autosend with caps (30 to 50 emails per inbox per day, 4 to 7 day cadence between touches).
- Score under 45: drop or log for later. Do not email. Log the signal so if the account ever scores higher (new trigger, new hire), it surfaces back into the pipeline.
The mechanics are a router function that reads the score and fires a webhook to Slack, your CRM, or your sequencing tool. Wire it into n8n or Make if you want the no-code version; wire it into a simple TypeScript function if you want the maintainable version. The point is that the routing is rule-based and inspectable - not another LLM call.
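As a sketch of the TypeScript version, the function below encodes the three tiers as plain rules and returns a decision object. The `Destination` names and the `RoutingDecision` shape are hypothetical; the caller would map each destination to its own Slack, CRM, or sequencer webhook.

```typescript
// Hypothetical destination names; map them to your own webhooks.
type Destination = "sales_review" | "nurture_sequence" | "drop_and_log";

interface RoutingDecision {
  destination: Destination;
  autosend: boolean;
  reason: string;
}

// Rule-based router for a 0 to 100 intent score: 75+ goes to a human,
// 45-74 enters nurture, anything lower is logged and dropped.
function routeLead(total: number): RoutingDecision {
  if (!Number.isFinite(total) || total < 0 || total > 100) {
    throw new RangeError(`score out of range: ${total}`);
  }
  if (total >= 75) {
    return { destination: "sales_review", autosend: false, reason: "score 75+: human decides" };
  }
  if (total >= 45) {
    return { destination: "nurture_sequence", autosend: true, reason: "score 45-74: capped autosend" };
  }
  return { destination: "drop_and_log", autosend: false, reason: "score under 45: log for later" };
}

const hotLead = routeLead(82);   // destination: "sales_review", autosend: false
const warmLead = routeLead(60);  // destination: "nurture_sequence", autosend: true
const coldLead = routeLead(30);  // destination: "drop_and_log", autosend: false
```

Because the thresholds live in one small function, changing a tier boundary is a one-line diff that shows up in code review, which is the inspectability the routing layer is supposed to provide.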
Anti-spam guardrails
A perfect lead gen system that torches your domain reputation is worth zero. The deliverability layer is non-negotiable. The defaults I ship with every client:
- Dedicated sending domain. Buy a separate .com (or .co, .io) for cold outbound - never use your primary. If deliverability dies, you swap the domain, not your business.
- Warm for 4 to 6 weeks before real volume. Tools like Smartlead and Instantly automate inbox-to-inbox warmup. Ramp from 5 to 50 emails per day per inbox over the warmup window.
- Send caps per inbox. 30 to 50 per day per inbox is the safe ceiling in 2026. Rotate across 3 to 10 inboxes for any serious volume.
- SPF, DKIM, DMARC properly aligned. Non-negotiable. DMARC at p=quarantine minimum. Tools like EasyDMARC and Smartlead help; otherwise hire a deliverability person for one week.
- Reply-rate and complaint-rate monitoring. If reply rate drops below 5% on a sequence, pause it. If complaint rate exceeds 0.1%, stop and audit the list.
- Opt-out enforcement. Every email includes a one-click unsubscribe (or a clear "reply STOP"). Honor it inside 24 hours. Suppression list lives in your database, checked before every send.
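Two of these guardrails sit on your side of the sending tool: the suppression check before every send and the pause/stop thresholds. A minimal sketch under stated assumptions: the types are hypothetical, the suppression set stands in for a database lookup, and the `minSample` floor (so three sends with no replies do not pause a sequence) is my addition, not a rule from the list above.

```typescript
interface SendRequest {
  recipient: string;
  inboxSentToday: number;
}

interface GuardConfig {
  suppressed: Set<string>;  // lowercased addresses that opted out
  dailyCapPerInbox: number; // the list above suggests 30 to 50
}

// Pre-send check: suppression first, then the per-inbox daily cap.
function canSend(req: SendRequest, cfg: GuardConfig): { allowed: boolean; reason?: string } {
  if (cfg.suppressed.has(req.recipient.trim().toLowerCase())) {
    return { allowed: false, reason: "recipient is on the suppression list" };
  }
  if (req.inboxSentToday >= cfg.dailyCapPerInbox) {
    return { allowed: false, reason: "inbox hit its daily cap" };
  }
  return { allowed: true };
}

type SequenceStatus = "ok" | "pause" | "stop_and_audit";

// Applies the two thresholds: complaint rate above 0.1% stops the sequence,
// reply rate below 5% pauses it. Integer comparisons avoid float rounding.
function sequenceHealth(
  stats: { sent: number; replies: number; complaints: number },
  minSample = 100, // assumption: do not judge a sequence on a handful of sends
): SequenceStatus {
  if (stats.sent < minSample) return "ok";
  if (stats.complaints * 1000 > stats.sent) return "stop_and_audit";
  if (stats.replies * 100 < stats.sent * 5) return "pause";
  return "ok";
}

const guard: GuardConfig = {
  suppressed: new Set(["optout@example.com"]),
  dailyCapPerInbox: 40,
};

const blocked = canSend({ recipient: "OptOut@example.com", inboxSentToday: 3 }, guard);
const capped = canSend({ recipient: "cto@example.org", inboxSentToday: 40 }, guard);
const cleared = canSend({ recipient: "cto@example.org", inboxSentToday: 12 }, guard);
```

The complaint check runs before the reply check on purpose: a sequence with healthy replies and too many complaints should still stop.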
Smartlead is my default sender. It handles warmup, rotation, send caps, reply detection, and unsubscribe in one stack. Instantly and Lemlist are close competitors. Do not build the sender layer yourself - the deliverability ops surface is full-time work and there are products that already solve it for $97 a month.
The legal angle
Cold outreach is regulated and the regulations differ sharply by jurisdiction. The short version, with caveats:
- United States (CAN-SPAM). B2B cold email is legal if you include a valid physical postal address, an accurate From line, and a clear opt-out mechanism, and honor opt-outs within 10 business days. CAN-SPAM covers B2B mail as well as consumer mail - there is no B2B exemption - but enforcement against B2B senders with clean compliance is rare.
- EU and UK (GDPR + ePrivacy). B2B cold email to a work address is generally permissible under GDPR Article 6(1)(f) (legitimate interest) if you document the balancing test, only process minimal data, and provide opt-out. ePrivacy Directive (and PECR in the UK) layers additional consent requirements for some marketing - but business-to-business is the easier path. Mailing personal Gmail/Yahoo addresses in the EU without consent is a meaningful risk.
- Article 14 notice (GDPR). When you collect personal data from third-party sources (not from the data subject directly), you owe them a privacy notice - typically a link in the first email. Most B2B senders fold this into the opt-out language.
- Canada (CASL). Express or implied consent required for commercial electronic messages. Stricter than CAN-SPAM. Implied consent for existing business relationships, public business contacts, and inquiries.
- Australia (Spam Act). Consent required (express, or inferred, for example from an existing relationship), valid From info, functional opt-out honored within 5 working days.
Practical defaults that keep you compliant in most jurisdictions: mail only role-based or work email addresses (no personal Gmail), include a physical postal address and opt-out in every email, honor opt-out within 24 hours (not the regulatory maximum), suppress unsubscribed contacts globally across all sender domains, and never enrich data on individual consumers - only businesses and their public business roles. If you operate in healthcare, finance, or any regulated industry, get a lawyer to review the sequence - the cost is trivial compared to a regulatory complaint.
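Several of these defaults can be enforced as a pre-send lint. The sketch below is illustrative only and is not a legal test: the free-mail domain list is deliberately incomplete, and the `OutboundEmail` shape and opt-out wording check are assumptions.

```typescript
// Illustrative, incomplete list of consumer mailbox providers.
const FREE_MAIL_DOMAINS = new Set([
  "gmail.com",
  "googlemail.com",
  "yahoo.com",
  "outlook.com",
  "hotmail.com",
  "icloud.com",
  "aol.com",
  "proton.me",
]);

// Heuristic only: flags addresses hosted on a known consumer provider.
function isLikelyPersonalAddress(email: string): boolean {
  const domain = email.trim().toLowerCase().split("@").pop() ?? "";
  return FREE_MAIL_DOMAINS.has(domain);
}

// Hypothetical outbound email record.
interface OutboundEmail {
  to: string;
  body: string;
  postalAddress: string; // the sender's physical postal address
}

// Returns the defaults this email breaks; an empty list means it passes.
function complianceProblems(email: OutboundEmail): string[] {
  const problems: string[] = [];
  if (isLikelyPersonalAddress(email.to)) {
    problems.push("recipient looks like a personal mailbox");
  }
  if (email.postalAddress.trim() === "" || !email.body.includes(email.postalAddress)) {
    problems.push("physical postal address missing from body");
  }
  if (!/unsubscribe|reply stop/i.test(email.body)) {
    problems.push("no opt-out instruction found");
  }
  return problems;
}

const compliant = complianceProblems({
  to: "vp.eng@example.com",
  body: "Short note here.\n\nExample Ltd, 1 Main Street, Pristina\nReply STOP to opt out.",
  postalAddress: "Example Ltd, 1 Main Street, Pristina",
});

const risky = complianceProblems({
  to: "someone@gmail.com",
  body: "Short note here.",
  postalAddress: "Example Ltd, 1 Main Street, Pristina",
});
```

A passing lint means the email carries the mechanical elements, nothing more; whether the outreach is lawful in a given jurisdiction still depends on the points in the list above.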
Stack: build vs Clay vs Apollo vs Smartlead
Different stages of the pipeline have different build-vs-buy answers. Here is the comparison I run with every client:
| Stage | Tool | Monthly cost | Best for |
|---|---|---|---|
| Contact data | Apollo | $60 to $300 | Cheapest reliable database, decent enrichment API |
| Signal aggregation | Clay | $350 to $2,000+ | No-code signal stitching, 100+ providers in one canvas |
| Custom enrichment | Vercel AI SDK + GPT-5-mini | $50 to $500 | Bespoke profiles, novel signals, deep ICP fit |
| Intent scoring | Custom (this post) | $30 to $200 | Tuned to your ICP - never trust a generic scorer |
| Email drafting | Vercel AI SDK + Claude Sonnet 4.6 | $50 to $300 | Tone control, structured drafts, version control |
| Sending and warmup | Smartlead / Instantly | $97 to $400 | Warmup, inbox rotation, deliverability ops |
| Routing and CRM sync | n8n / Make / custom webhook | $0 to $50 | Rule-based, transparent, dirt cheap |
The default stack I deploy: Apollo for contact data, Clay for signal aggregation (until spend exceeds $2K), custom Vercel AI SDK pipeline for enrichment + scoring + drafting, Smartlead for sending, and a small n8n flow for CRM sync. Total all-in monthly cost for a ~2,000 leads-per-month operation: $800 to $1,500 in tools plus $200 to $500 in LLM tokens. Compare that to a single SDR at $80K fully loaded and the math is obvious - but the system is an SDR multiplier, not an SDR replacement.
Real client case - VC Automation
VC Automation is the outbound system I built for venture capital outreach - finding the right partner at the right firm to pitch a round to. The constraint: partners get hundreds of cold emails per week, so the only way through is hyper-specific personalization tied to something the partner publicly cares about.
The signal sources I built are non-standard. Instead of funding-round firehose, I track partner content: new podcast appearances, Substack posts, tweets that get organic engagement, conference talks. The enrichment step pulls the actual content (transcript for podcasts, full text for posts) and extracts what the partner argued for. The scoring step weights heavily on thesis-fit (does the partner's recent content map to the founder's stage and category) and recency (post from this week vs post from 6 months ago).
The personalization step generates a 3-sentence email that opens with a specific reference to the partner's content ("Your point on consumption-based pricing in {podcast} is exactly why..."), ties it to the founder's thesis in one sentence, and ends with a single CTA - usually "15-min call next week, here is my Calendly." No deck attached, no long pitch, no flattery.
The numbers, anonymized: across the last 6 months, the system generated ~180 first-touch emails per week to scored partner accounts. Average reply rate sits at 18 to 24% (versus a baseline of 1 to 3% for generic VC outreach). Positive reply rate (asking for call or deck) runs 11 to 15%. Meetings booked per week: 18 to 25. The deflection rate - accounts that scored under 45 and got dropped - is roughly 65% of all signal-sourced accounts, which keeps the volume manageable and the inbox reputation pristine.
Cost math
Per-lead economics for a fully loaded qualified lead with enrichment, scoring, and a personalized first draft:
| Stage | Cost per lead | Notes |
|---|---|---|
| Signal sourcing | $0.01 to $0.05 | Clay credits or self-hosted scraper compute |
| LLM enrichment (research) | $0.02 to $0.10 | GPT-5-mini with web search, ~4K tokens in, ~600 out |
| Intent scoring | $0.005 to $0.02 | GPT-5-mini, ~2K tokens in, ~400 out |
| Personalized first-touch draft | $0.01 to $0.10 | Claude Sonnet 4.6, ~2K tokens in, ~300 out |
| Send (warmup + sequencing) | $0.005 to $0.02 | Smartlead amortized over volume |
| Total per qualified lead | $0.05 to $0.30 | Drops to lower bound at scale with prompt caching |
Two scaling tricks cut the cost in half. Prompt caching on Anthropic and OpenAI (the same trick from my OpenAI API cost breakdown) cuts the price of cached input tokens by up to ~90% on the enrichment and scoring stages, which share large stable system prompts. The Batch API on OpenAI cuts the per-token rate by 50% if you can tolerate a 24-hour turnaround - fine for nurture leads, not for time-sensitive triggers like fresh funding rounds.
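The per-lead total is just the column sums of the table. The sketch below redoes that arithmetic in mills (thousandths of a dollar) to avoid floating-point drift, then projects a monthly figure; the 2,000 leads per month is the volume used in the stack section, and the type and function names are illustrative.

```typescript
// 1 mill = $0.001. Ranges are copied from the cost table above.
interface CostRow {
  stage: string;
  lowMills: number;
  highMills: number;
}

const COST_ROWS: CostRow[] = [
  { stage: "Signal sourcing", lowMills: 10, highMills: 50 },
  { stage: "LLM enrichment (research)", lowMills: 20, highMills: 100 },
  { stage: "Intent scoring", lowMills: 5, highMills: 20 },
  { stage: "Personalized first-touch draft", lowMills: 10, highMills: 100 },
  { stage: "Send (warmup + sequencing)", lowMills: 5, highMills: 20 },
];

// Sums the low and high ends of every row.
function perLeadRange(rows: CostRow[]): { lowMills: number; highMills: number } {
  return rows.reduce(
    (acc, row) => ({
      lowMills: acc.lowMills + row.lowMills,
      highMills: acc.highMills + row.highMills,
    }),
    { lowMills: 0, highMills: 0 },
  );
}

// Converts a per-lead cost in mills into dollars per month at a given volume.
function monthlyDollars(millsPerLead: number, leadsPerMonth: number): number {
  return (millsPerLead * leadsPerMonth) / 1000;
}

const perLead = perLeadRange(COST_ROWS);                    // 50 and 290 mills
const monthlyLow = monthlyDollars(perLead.lowMills, 2000);   // 100
const monthlyHigh = monthlyDollars(perLead.highMills, 2000); // 580
```

The rows sum to $0.05 to $0.29 per lead, which the table rounds up to $0.30; at 2,000 leads a month that is roughly $100 to $580 in per-lead variable cost, before fixed tool subscriptions.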
Anti-patterns I see weekly
The same five mistakes show up in every "AI lead gen" project I get asked to audit. None of them are technical - they are product decisions made too early:
- Same template for everyone. The whole point of AI-driven personalization is that each email is different. If you built a system that produces three variants of one template, you built a more expensive mail merge. Force the model to commit to a specific reference per email and validate it differs across leads.
- No opt-out enforcement. The unsubscribe list lives in someone's head, gets honored sporadically, and the same contact gets mailed twice. Move suppression into your database, run it as a pre-send check, and audit weekly. One regulatory complaint costs more than the entire engineering bill.
- No human review on day one. Autosend on a new pipeline is malpractice. The first 50 to 100 drafts will have tonal misfires, invented facts, or wrong references that a human would catch in seconds. Approval queue first; relax to autosend once you have ground truth.
- Scoring blind to context. A 90/100 account from 6 months ago is not a 90/100 account today. Trigger freshness must be a decay function in your scoring - cut scores by roughly 30% per month since the trigger event, which works out to a drop of 60% or more after 90 days.
- No tracking of what worked. The system spits out drafts, you send them, you have no idea which references drove replies. Log the structured fields (reference type, CTA, tone) and tie them to reply outcome. The system gets sharper every week.
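The freshness decay from the scoring anti-pattern can be a few lines of code. This sketch assumes the 30% monthly drop compounds and that a month is 30 days; both are my reading, since the rule above gives only the two reference points. Compounding lands at roughly a 66% drop at 90 days, in the same range as the 60% figure.

```typescript
// Exponential decay: the score loses `monthlyDecay` of its remaining value
// every 30 days since the trigger. Compounding and the 30-day month are
// assumptions, not part of the original rule.
function decayedScore(score: number, daysSinceTrigger: number, monthlyDecay = 0.3): number {
  if (daysSinceTrigger <= 0) return score;
  const months = daysSinceTrigger / 30;
  return score * Math.pow(1 - monthlyDecay, months);
}

const fresh = decayedScore(90, 0);          // unchanged: 90
const oneMonth = decayedScore(90, 30);      // about 63
const threeMonths = decayedScore(90, 90);   // about 30.9, a drop of roughly 66%
const sixMonths = decayedScore(90, 180);    // about 10.6
```

Applied before routing, a 90/100 account from six months ago lands near 11 and falls into the drop tier, so a stale trigger cannot keep an account in the sales queue on its own.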
Where this fits in your wider AI stack
A signal-driven lead gen tool is one of three outbound automations that compound when wired together. The others are AI email automation for triage on the inbound side (so the meetings you book do not get lost in an unread inbox), and AI sales automation for the follow-up and pipeline-management layer that turns booked meetings into closed deals.
If you are scoping a lead gen build and want a senior engineer who has actually shipped one, AI workflow automation and AI integration cover this scope end to end. I work with teams worldwide and you can also hire an AI developer in Kosovo directly. Same person who built VC Automation and runs it daily.
Frequently asked questions
What is an AI lead generation tool?
An AI lead generation tool is a system that uses LLMs, web data, and intent signals to find, qualify, and prioritize prospects who are actually likely to buy - instead of scraping a giant list and blasting it. In 2026 the bar is signal-first sourcing (funding rounds, job postings, stack changes), enrichment with LLM-driven research, structured intent scoring, and personalized first-touch drafts that reference something real. The output is a prioritized queue, not a list - high-intent leads route to a salesperson, mid-intent into a nurture sequence, low-intent gets dropped.
How is AI lead generation different from a scraper or Apollo list?
Apollo, ZoomInfo, and scrapers sell volume - you get a list of 50,000 contacts that match firmographic filters and you mail all of them. AI lead generation reads signals to find the 200 accounts inside that list who are actually showing buying behavior right now: a new VP of Engineering, a Series B announcement, a job posting that mentions your competitor, a public GitHub repo using a tool you replace. The list shrinks 100x, the response rate goes up 5 to 20x, and your domain reputation does not get torched.
How much does it cost to build an AI lead generation system?
Per-lead cost lands at roughly $0.05 to $0.30 fully loaded for a qualified, enriched, scored lead with a personalized first draft. Breakdown: $0.01 to $0.05 for sourcing signals (Clay credits, scraper compute), $0.02 to $0.10 for LLM enrichment (web research, profile generation), $0.005 to $0.02 for scoring, and $0.01 to $0.10 for personalized draft generation. Build cost for the whole pipeline ranges from $15K to $60K depending on whether you wire it into Clay, build custom on top of the Vercel AI SDK, or sit it on Smartlead for sending.
Is AI lead generation legal under GDPR and CAN-SPAM?
Yes, but with constraints. In the US, CAN-SPAM allows cold B2B email if you have a valid physical address, a clear opt-out, and accurate sender info. In the EU and UK, GDPR Article 6(1)(f) (legitimate interest) covers B2B outreach to business addresses if you can document the balancing test and honor opt-out within 30 days. Mailing personal Gmail accounts in the EU without consent is risky. The practical rule: only mail role-based or work addresses, always include opt-out, honor it within 24 hours, and never enrich data on consumers - only businesses and their public business contacts.
Build vs Clay vs Apollo vs Smartlead - which one wins?
They solve different problems and the right answer is usually a stack. Apollo provides the contact database. Clay handles signal enrichment and AI-driven research as a no-code canvas. Smartlead handles sending, warmup, and deliverability. Custom code earns its keep when you need a signal Clay does not have, when monthly Clay spend exceeds $2K, or when you want the scoring and routing logic to live inside your own product. My default stack for clients: Apollo or Clay for data, custom Vercel AI SDK pipeline for enrichment and scoring, Smartlead for sending.
What response rates should I expect from an AI lead gen tool?
Mass cold email in 2026 averages 0.5 to 2% reply rate, with most replies being negative. A signal-driven AI lead gen pipeline targeting 200 hand-scored accounts per week typically hits 8 to 25% positive reply rate on the first touch, and 30 to 50% positive on a 3-touch sequence - because every email references something specific (a hire, a fundraise, a stack signal). The volume is much smaller, the conversion much higher, the meetings booked per week roughly the same.
Will AI-generated cold emails hurt deliverability?
Only if you send too many, too fast, from a domain that is not warmed up. The AI-generated part is not the problem - Google and Microsoft do not care that an email was drafted by an LLM. They care about send volume per day, reply rate, spam complaints, and DMARC alignment. The safer pattern: dedicated sending domain (not your primary), warm it for 4 to 6 weeks, cap at 30 to 50 emails per day per inbox, rotate across 3 to 10 inboxes, and monitor reply rate and complaint rate daily. Smartlead and Instantly automate most of this.
Can I use an AI lead gen tool without a salesperson on the receiving end?
Not really. The point of the system is to surface the 5 to 20 accounts per week that need human follow-up - discovery calls, custom demos, contract negotiation. If you have no one to take those calls, the high-intent leads die in a Gmail inbox and the whole pipeline ROI evaporates. The minimum viable team is one founder or AE who can take 10 to 15 booked meetings per week. Below that, focus on inbound, content, or partner channels instead.