hypvector

Vector search over Parquet, no server · v0.2.1 · MIT

You already have embeddings, and the only thing you actually want is nearest-neighbor search over them. The usual answer is a vector database: another always-on service to run, secure, and pay for by the hour, sitting warm around the clock whether you query it once a day or ten times a second. hypvector skips that service. It stores the embeddings in a Parquet file in object storage and computes similarity in the client, reading only the bytes a query needs over HTTP range requests, so the Parquet file itself is the index.

That trades a little latency for zero idle cost and nothing to operate, which is the right deal for a static or slow-changing dataset. The file is self-describing: dimension, metric, normalization, and the cluster centroids that drive approximate search all live in the Parquet metadata, so a reader opens it and knows how to search it. The same code runs in the browser and in Node.

$ npm install hypvector

// search a Parquet file on S3, no server in the path
import { searchVectors } from 'hypvector'

const results = await searchVectors({
  source: 'https://example.com/vectors.parquet',
  query: queryVec,  // Float32Array
  topK: 10,
})

1 What it does

hypvector runs similarity search over a set of embeddings stored as Apache Parquet. Each row holds an id and a vector, and the file carries its own configuration in Parquet metadata: the dimension, the metric, whether vectors were normalized on write, and any cluster centroids. Because the file describes itself, you point a reader at it and search without telling it how the data was laid out.

The search runs in the client against the file directly. The source can be a URL, a local path, or a pre-opened buffer, and when it is a URL backed by HTTP range support, hypvector fetches only the pages a query touches rather than the whole object. For exact search it scans the vectors; for approximate search it reads a compact binary signature per vector, narrows to the most promising clusters, and reranks the survivors against their full-precision vectors. Either way there is no server to keep warm: you pay for storage and the per-query reads, and nothing between queries.
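To make the range-read idea concrete, here is a minimal sketch of fetching one byte span of a remote file. It is not hypvector's internal reader: fetchRange, its parameters, and the injectable fetchImpl argument are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper, not part of hypvector's API: read bytes [start, end)
// of a remote object with a single HTTP range request.
async function fetchRange(url, start, end, fetchImpl = fetch) {
  // The Range header is inclusive on both ends, hence end - 1.
  const res = await fetchImpl(url, {
    headers: { Range: `bytes=${start}-${end - 1}` },
  })
  // 206 Partial Content means the server honored the range.
  // A 200 would mean it ignored the header and sent the whole object.
  if (res.status !== 206) {
    throw new Error(`expected 206 Partial Content, got ${res.status}`)
  }
  return new Uint8Array(await res.arrayBuffer())
}
```

Parquet keeps its file metadata in a footer at the end of the file, so a reader built this way would typically start with a range request for the tail of the object and then fetch only the pages the footer points it to.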

2 Quickstart

Write a Parquet index from any iterable of embeddings, then search it. On write, hypvector L2-normalizes by default and records the dimension and metric into the file:

// build the index once
import { writeVectors } from 'hypvector'

await writeVectors({
  writer: fileWriter('vectors.parquet'),  // fileWriter is assumed to be in scope; its import is not shown here
  dimension: 384,
  normalize: true,            // L2-normalize on write
  vectors: myEmbedder(),      // (async) iterable of { id, vector }
})
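For readers unfamiliar with the normalize option, the following sketch shows what L2 normalization does to a single vector. It is an illustration only, not hypvector's code, and l2Normalize is a hypothetical name.

```javascript
// Illustrative only: scale a vector to unit L2 length.
function l2Normalize(vector) {
  let sumSquares = 0
  for (const x of vector) sumSquares += x * x
  const norm = Math.sqrt(sumSquares)
  // Leave an all-zero vector untouched rather than dividing by zero.
  if (norm === 0) return Float32Array.from(vector)
  return Float32Array.from(vector, (x) => x / norm)
}

const unit = l2Normalize(new Float32Array([3, 4]))
// unit is approximately [0.6, 0.8], which has length 1
```

Once every stored vector and the query have unit length, cosine similarity and dot product give the same value, so the cheaper dot product can stand in for cosine at query time.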

Then query it from the browser straight off a CDN, with no build step and no backend. Tune the accuracy/bandwidth trade-off with algorithm, probe, and rerankFactor:

// search anywhere, no install
const { searchVectors } =
  await import('https://cdn.jsdelivr.net/npm/hypvector/src/index.js')

const results = await searchVectors({
  source: 'https://example.com/vectors.parquet',
  query: queryVec,            // Float32Array
  topK: 10,
  algorithm: 'auto',         // 'auto' | 'exact' | 'binary'
  probe: 0.25,               // fraction of clusters to scan
  rerankFactor: 10,          // candidate pool = topK × rerankFactor
})
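As a rough worked example of what those two knobs mean in numbers, the sketch below turns them into a cluster count and a candidate pool size. The helper name searchBudget, the clusterCount of 256, and the round-up behavior are assumptions for illustration; the source does not say how hypvector rounds a fractional probe.

```javascript
// Hypothetical helper: turn the tuning options into concrete sizes.
function searchBudget({ topK, probe, rerankFactor, clusterCount }) {
  return {
    // probe as a fraction of all clusters, rounded up (assumed),
    // and never fewer than one cluster
    clustersScanned: Math.max(1, Math.ceil(probe * clusterCount)),
    // candidate pool = topK × rerankFactor, as in the snippet above
    candidatePool: topK * rerankFactor,
  }
}

const budget = searchBudget({ topK: 10, probe: 0.25, rerankFactor: 10, clusterCount: 256 })
// budget → { clustersScanned: 64, candidatePool: 100 }
```

Raising probe or rerankFactor widens the search and costs more reads; lowering them does the opposite.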

To inspect a file from the command line, point the CLI at it and it prints the format version, vector count, dimension, metric, whether a binary signature is present, the cluster count, and the metadata overhead:

$ npx hypvector vectors.parquet

3 Features

A Parquet file is the database. Embeddings live in standard Parquet in object storage. There is no separate index service to run, scale, or pay for between queries.
HTTP range requests. Search reads only the pages a query touches. At 3.2M vectors a query pulls about 6 MB across roughly 160 fetches, not the whole file.
Exact and approximate search. Choose an exact scan, or the approximate path that uses a binary signature plus clusters and reranks against full-precision vectors. algorithm: 'auto' picks based on the file.
Self-describing files. Dimension, metric, normalization, and cluster centroids are stored in Parquet metadata, so a reader knows how to search a file without external config.
Tunable recall. probe sets how many clusters to scan and rerankFactor sizes the candidate pool, so you can move the recall-versus-bandwidth line per query.
Metric choice. Cosine, dot product, or Euclidean. The file records a default metric and a query can override it.
Browser and Node. The same ES module runs in both, over a URL, a local file, or any pre-opened async buffer. Ships TypeScript definitions.
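The approximate path described above (a binary signature to shortlist, then a full-precision rerank) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not hypvector's implementation: it assumes one sign bit per dimension as the signature, omits the cluster step entirely, and computes signatures on the fly where a real index would store them at write time. All function names are hypothetical.

```javascript
// Assumption: one sign bit per dimension. The source does not specify how
// hypvector derives its binary signature.
function signature(vector) {
  const bits = new Uint8Array(Math.ceil(vector.length / 8))
  for (let i = 0; i < vector.length; i++) {
    if (vector[i] > 0) bits[i >> 3] |= 1 << (i & 7)
  }
  return bits
}

// Number of differing bits between two equal-length signatures.
function hamming(a, b) {
  let distance = 0
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i]
    while (x) {
      x &= x - 1 // clear the lowest set bit
      distance++
    }
  }
  return distance
}

function dot(a, b) {
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i]
  return sum
}

// rows: array of { id, vector }. Shortlists the topK * rerankFactor rows with
// the smallest Hamming distance, then reranks them by full-precision dot product.
function approximateSearch(rows, query, topK, rerankFactor) {
  const querySig = signature(query)
  const candidates = rows
    .map((row) => ({ row, distance: hamming(signature(row.vector), querySig) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, topK * rerankFactor)
  return candidates
    .map(({ row }) => ({ id: row.id, score: dot(row.vector, query) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}
```

The shortlist step touches only one bit per dimension for each row, which is why a larger rerankFactor buys recall at the price of reading more full-precision vectors.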

4 API

Three entry points cover the lifecycle: write an index, search it, and read it back. Each takes a plain options object and returns a promise or an async iterable.

searchVectors(options)

The query path. Takes source (a URL, file path, or pre-opened async buffer), a query Float32Array, and topK. The optional parameters are algorithm ('auto', 'exact', or 'binary'), probe (the fraction of clusters to scan, from 0 to 1, or an absolute cluster count), rerankFactor (the candidate pool as a multiple of topK; set it to 0 for an exact scan), and metric, which overrides the file’s stored metric. Resolves to the ranked matches.
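To show how the metric changes a ranking, here is a small sketch of an exact scan with the three metrics as interchangeable scoring functions. It is illustrative rather than hypvector's code: the metric keys 'cosine', 'dot', and 'euclidean', the default of 'cosine', and the exactSearch helper are all assumptions, since the source does not list the accepted metric strings.

```javascript
// Illustrative scoring functions, one per metric; higher means more similar.
const metrics = {
  dot(a, b) {
    let sum = 0
    for (let i = 0; i < a.length; i++) sum += a[i] * b[i]
    return sum
  },
  cosine(a, b) {
    let ab = 0, aa = 0, bb = 0
    for (let i = 0; i < a.length; i++) {
      ab += a[i] * b[i]
      aa += a[i] * a[i]
      bb += b[i] * b[i]
    }
    const denom = Math.sqrt(aa) * Math.sqrt(bb)
    return denom === 0 ? 0 : ab / denom
  },
  // Negated distance, so that larger is still better.
  euclidean(a, b) {
    let sum = 0
    for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2
    return -Math.sqrt(sum)
  },
}

// Exact scan: score every row against the query and keep the best topK.
function exactSearch(rows, query, topK, metric = 'cosine') {
  const score = metrics[metric]
  return rows
    .map(({ id, vector }) => ({ id, score: score(vector, query) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}
```

With rows [1, 0], [0, 1], and [10, 10] and the query [1, 0], cosine and Euclidean both rank [1, 0] first, while dot product prefers the long vector [10, 10]. That is the kind of difference a per-query metric override exposes on unnormalized data.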

writeVectors(options)

Builds the Parquet index. Takes a writer, the dimension, an (async) iterable of { id, vector } as vectors, and normalize (default true, L2-normalizes on write). It records the configuration into the file metadata so search needs no separate config.
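The vectors option accepts any (async) iterable of { id, vector }, so an async generator is a natural fit. The sketch below wraps an in-memory array and checks each row's length before yielding it. The name embeddingRows and the input shape { id, values } are hypothetical, and converting to Float32Array is an assumption, since the source does not state which array type writeVectors expects.

```javascript
// Hypothetical source of embeddings: wraps an in-memory array as an async
// iterable of { id, vector } and checks each vector's length on the way.
async function* embeddingRows(items, dimension) {
  for (const { id, values } of items) {
    if (values.length !== dimension) {
      throw new Error(`row ${id}: expected ${dimension} values, got ${values.length}`)
    }
    yield { id, vector: Float32Array.from(values) }
  }
}

// Usage sketch (not executed here), with items and writer supplied by you:
// await writeVectors({ writer, dimension: 384, vectors: embeddingRows(items, 384) })
```

Because the generator yields one row at a time, the full set of embeddings never has to sit in memory at once when the underlying source is itself a stream.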

readVectors(options)

Streams the stored vectors back. Takes a file and yields { id, vector } as an async iterable, for re-indexing, export, or inspection.

5 Benchmarks

Semantic retrieval over 3,199,860 real LLM conversations (WildChat-4.8M), run against the same embeddings on every engine and queried over the network the way each is actually deployed. hypvector keeps the index in object storage and computes in the client, so the all-in cost is about $0.32/month with no server idling between queries. Pinecone, pgvector, and Qdrant answer faster (70 to 85 ms against hypvector's 147 ms) because a box stays hot around the clock, which is exactly the cost hypvector removes.

Engine	Storage	Recall@10	Query latency	All-in cost / month	Server
hypvector	13.7 GB	0.925	147 ms	~$0.32	none
Pinecone	13.1 GB	0.920	85 ms	$50 min	managed
turbopuffer	13.1 GB	0.915	198 ms	$16 min	managed
S3 Vectors	13.1 GB	0.905	133 ms	~$0.79	serverless
pgvector	41.9 GB	0.870	80 ms	$372	r5.2xlarge 24/7
Qdrant	13.1 GB	0.865	70 ms	$186	r5.xlarge 24/7

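Recall spans 0.865 to 0.925 and query latency 70 to 198 ms across the table; the column that moves by orders of magnitude is cost. The always-on engines bill for a warm box whether or not you query it, while hypvector pays for storage and the reads a query makes, which at this scale is a few megabytes per query. For a static or low-traffic index the gap ranges from about 50x against turbopuffer's $16 minimum to more than 1,000x against pgvector at $372 a month, roughly three orders of magnitude at the top end. S3 Vectors, the other option with no server, is the only engine in the same range, at about $0.79.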
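The cost gap can be checked directly from the table. The short calculation below divides each engine's monthly figure by hypvector's; the numbers are copied from the table, and the variable names are only for illustration.

```javascript
// Monthly all-in costs in USD, copied from the benchmark table.
const monthlyCost = {
  hypvector: 0.32,
  'S3 Vectors': 0.79,
  turbopuffer: 16,
  Pinecone: 50,
  Qdrant: 186,
  pgvector: 372,
}

// How many times more each engine costs per month than hypvector.
const ratios = Object.fromEntries(
  Object.entries(monthlyCost).map(([engine, usd]) => [engine, usd / monthlyCost.hypvector]),
)
```

The ratios come out to roughly 2.5x for S3 Vectors, 50x for turbopuffer, 156x for Pinecone, 581x for Qdrant, and about 1,160x for pgvector. The managed figures are listed as plan minimums, so their ratios are lower bounds.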
