feedstock

Chunking

Split content into segments for extraction pipelines.

Chunking strategies split text into bounded segments, useful for feeding content into extraction pipelines or processing large pages in parts.

Strategies

IdentityChunking

Returns the entire text as a single chunk (no splitting).

import { IdentityChunking } from "feedstock";

const chunks = new IdentityChunking().chunk("Full text here");
// ["Full text here"]

RegexChunking

Splits by regex patterns. Defaults to splitting on double newlines (paragraphs).

import { RegexChunking } from "feedstock";

const chunks = new RegexChunking().chunk("Para 1\n\nPara 2\n\nPara 3");
// ["Para 1", "Para 2", "Para 3"]

// Custom separator
const customChunks = new RegexChunking([/---/]).chunk("A---B---C");
// ["A", "B", "C"]
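
Because the constructor accepts an array of patterns, it may help to see one plausible way several separators could combine. The sketch below is a standalone illustration and not feedstock's implementation: the `splitByPatterns` helper is hypothetical, and it assumes each pattern is applied in turn to every piece produced so far, with empty pieces dropped.

```typescript
// Hypothetical helper, not part of feedstock: split on several patterns in sequence.
function splitByPatterns(text: string, patterns: RegExp[] = [/\n\n/]): string[] {
  let pieces: string[] = [text];
  for (const pattern of patterns) {
    const next: string[] = [];
    for (const piece of pieces) {
      // Re-split every existing piece on the current pattern.
      for (const part of piece.split(pattern)) next.push(part);
    }
    pieces = next;
  }
  // Drop empty strings left behind by leading or trailing separators.
  return pieces.filter((piece) => piece.length > 0);
}

const multiDemo = splitByPatterns("A---B\n\nC---D", [/\n\n/, /---/]);
// ["A", "B", "C", "D"]
```

Whether RegexChunking itself trims or filters empty pieces is not stated here, so check its output on your own input before relying on that behavior.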

SlidingWindowChunking

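Splits text into windows of a fixed word count. Consecutive windows share a set number of overlapping words, so context is preserved across window boundaries.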

import { SlidingWindowChunking } from "feedstock";

const chunker = new SlidingWindowChunking(
  500,  // words per window
  50,   // overlap words
);
const chunks = chunker.chunk(longText); // longText: the string to split
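
To make the overlap concrete, here is a minimal standalone sketch of word-based sliding windows. It is an illustration of the general technique, not feedstock's code: the `slidingWindowChunks` function is hypothetical, and it assumes words are separated by whitespace and that each window starts `windowSize - overlap` words after the previous one.

```typescript
// Hypothetical helper, not part of feedstock: word windows with overlap.
function slidingWindowChunks(text: string, windowSize: number, overlap: number): string[] {
  if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
    throw new RangeError("expected windowSize > 0 and 0 <= overlap < windowSize");
  }
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const step = windowSize - overlap; // how far each window advances
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + windowSize).join(" "));
    // Stop once the current window has reached the end of the text.
    if (start + windowSize >= words.length) break;
  }
  return chunks;
}

const windowDemo = slidingWindowChunks("a b c d e f g h i j", 4, 1);
// ["a b c d", "d e f g", "g h i j"]
```

Each window repeats the last word of the previous one, which is the effect of an overlap of 1. How SlidingWindowChunking treats a short final window may differ from this sketch.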

FixedSizeChunking

Splits text into chunks of a fixed character count. Consecutive chunks share a set number of overlapping characters.

import { FixedSizeChunking } from "feedstock";

const chunker = new FixedSizeChunking(
  2000, // characters per chunk
  200,  // overlap characters
);
const chunks = chunker.chunk(longText); // longText: the string to split
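
The same idea applied to characters can be sketched as follows. This is a standalone illustration rather than feedstock's implementation: the `fixedSizeChunks` function is hypothetical, and it assumes sizes are measured in UTF-16 code units, which is what `String.prototype.slice` counts.

```typescript
// Hypothetical helper, not part of feedstock: fixed-size character chunks with overlap.
function fixedSizeChunks(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("expected chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap; // how far each chunk start advances
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const fixedDemo = fixedSizeChunks("abcdefghij", 4, 2);
// ["abcd", "cdef", "efgh", "ghij"]
```

Character-based splitting can cut words in half, so an overlap gives downstream extraction a chance to see each boundary word whole in at least one chunk.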

Custom Strategy

Extend ChunkingStrategy and implement chunk(text), returning the array of segments:

import { ChunkingStrategy } from "feedstock";

class SentenceChunking extends ChunkingStrategy {
  chunk(text: string): string[] {
    return text.split(/(?<=[.!?])\s+/);
  }
}
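
To see what the sentence-splitting regex in this example produces, the snippet below applies it directly with `String.prototype.split`, outside any class. It is only an illustration; the variable names are hypothetical. The lookbehind `(?<=[.!?])` keeps the punctuation attached to the preceding sentence, and the whitespace that follows is consumed as the separator.

```typescript
// Same pattern as in SentenceChunking: split on whitespace that follows ., ! or ?
const sentenceBoundary = /(?<=[.!?])\s+/;

const sentences = "Hello world. How are you? Fine!".split(sentenceBoundary);
// ["Hello world.", "How are you?", "Fine!"]

// Limitation: any period followed by whitespace counts as a boundary.
const abbreviated = "Dr. Smith arrived.".split(sentenceBoundary);
// ["Dr.", "Smith arrived."]
```

Abbreviations and decimal-free initials are split like sentence ends, so a production strategy would likely need a more careful boundary rule.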