feedstock

Introduction

High-performance web crawler and scraper for TypeScript, powered by Bun and Playwright.

Feedstock is a TypeScript web crawling library built for speed and developer experience. It runs on Bun and uses Playwright for browser automation, giving you the full power of a real browser with the ergonomics of TypeScript.

Why Feedstock?

  • Native TypeScript — no build step, no wrappers. Playwright's API was designed for TypeScript.
  • Bun-powered — native SQLite caching, fast test runner, instant startup.
  • Strategy pattern — swap out scraping, extraction, and markdown generation strategies.
  • Deep crawling — BFS, DFS, and BestFirst traversal with filters and scorers.
  • Multiple backends — Playwright (Chromium/Firefox/WebKit) or Lightpanda (local/cloud).
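
To make the deep crawling bullet concrete, here is a minimal sketch of breadth-first traversal with a depth limit and a URL filter. It is a generic illustration, not Feedstock's deep-crawl API: `LinkGraph`, `UrlFilter`, `bfsCrawlOrder`, and the in-memory `site` map are hypothetical names that stand in for real fetched pages.

```typescript
// Generic illustration only: none of these names come from feedstock's API.
type LinkGraph = Record<string, string[]>;
type UrlFilter = (url: string) => boolean;

// Breadth-first traversal: visit pages level by level, up to maxDepth,
// skipping URLs the filter rejects and URLs that were already queued.
function bfsCrawlOrder(
  graph: LinkGraph,
  start: string,
  maxDepth: number,
  filter: UrlFilter = () => true,
): string[] {
  const visited = new Set<string>([start]);
  const order: string[] = [];
  let frontier: string[] = [start];

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      order.push(url);
      for (const link of graph[url] ?? []) {
        if (!visited.has(link) && filter(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return order;
}

// Hypothetical site structure standing in for real fetched pages.
const site: LinkGraph = {
  "/": ["/docs", "/blog", "/login"],
  "/docs": ["/docs/api", "/"],
  "/blog": ["/blog/post-1"],
  "/docs/api": [],
  "/blog/post-1": [],
  "/login": [],
};

// Crawl one level below the start page and never queue "/login".
const order = bfsCrawlOrder(site, "/", 1, (url) => url !== "/login");
console.log(order);
```

Replacing the level-by-level frontier with a stack would give depth-first order, and replacing it with a priority queue ordered by a scoring function would give best-first order.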

Quick Example

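import { WebCrawler, CacheMode } from "feedstock";

const crawler = new WebCrawler();

try {
  // CacheMode.Bypass: skip the cache for this crawl.
  const result = await crawler.crawl("https://example.com", {
    cacheMode: CacheMode.Bypass,
  });

  // markdown may be undefined, hence the optional chaining.
  console.log(result.markdown?.rawMarkdown);
  console.log(result.links.internal);
  console.log(result.media.images);
} finally {
  // Always close the crawler, even if the crawl throws.
  await crawler.close();
}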

What You Get

Every crawl returns a CrawlResult with:

  • html — raw page HTML
  • cleanedHtml — scripts, styles, and noise removed
  • markdown — converted to Markdown with citations
  • links — internal and external, classified automatically
  • media — images, videos, and audio with scoring
  • metadata — title, description, OG tags, canonical URL
  • extractedContent — structured data via CSS or regex strategies
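
As a rough guide to working with these fields, the sketch below models a result and prints a short summary. The `CrawlResultSketch` interface is an assumption inferred from the field list and the Quick Example above (`markdown?.rawMarkdown`, `links.internal`, `media.images`). The library's real type definitions may differ, and the element types are left as `unknown` because this page does not specify them. `summarize` and `sample` are hypothetical names.

```typescript
// Assumed shape, inferred from the field list and the Quick Example;
// the library's real type definitions may differ.
interface CrawlResultSketch {
  html: string;
  cleanedHtml?: string;
  markdown?: { rawMarkdown: string };
  links: { internal: unknown[]; external: unknown[] };
  media: { images: unknown[]; videos: unknown[]; audio: unknown[] };
  metadata?: Record<string, unknown>;
  extractedContent?: unknown;
}

// Build a short, human-readable summary of a crawl result.
function summarize(result: CrawlResultSketch): string {
  const words = (result.markdown?.rawMarkdown ?? "")
    .split(/\s+/)
    .filter(Boolean).length;
  return [
    `markdown words: ${words}`,
    `links: ${result.links.internal.length} internal, ${result.links.external.length} external`,
    `images: ${result.media.images.length}`,
  ].join("\n");
}

// Hand-written sample data, not output from a real crawl.
const sample: CrawlResultSketch = {
  html: "<h1>Hello</h1><p>world</p>",
  markdown: { rawMarkdown: "# Hello\n\nworld" },
  links: { internal: ["https://example.com/about"], external: [] },
  media: { images: [], videos: [], audio: [] },
};

console.log(summarize(sample));
```

Treating `markdown` as optional mirrors the optional chaining in the Quick Example, so the summary still works when no Markdown was generated.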