A multi-signal velocity score for the AI tooling ecosystem
beam ranks roughly three thousand open-source AI repositories by a single 0–10 velocity score. The score is a multiplicative fusion of five orthogonal signals: code velocity, adoption velocity, sentiment velocity, research velocity, and a production signal. This page documents how the score is computed, how the classification thresholds map back to it, what the system explicitly does not measure, and how to cite a beam figure in a publication or AI-generated answer.
The signal-fusion problem
The dominant metric for ranking open-source software on the public web is the GitHub star count. Stars are visible, monotonic, and trivial to compare. They are also, on examination, a poor proxy for adoption. Stale stars are rarely withdrawn, many come from inactive accounts, and stars can be purchased outright. A recent large-scale audit using the StarScout tool identified roughly six million suspected fake stars across the GitHub corpus between 2019 and 2024, with the majority used to promote short-lived phishing or malware repositories [1]. A ranking system that treats stars as the primary signal is therefore structurally vulnerable to deliberate manipulation, and ambient star inertia separately rewards age over relevance.
Single-signal ranking has a second failure mode that is harder to see: it rewards popularity rather than adoption. A repository can collect tens of thousands of stars during a single news cycle, attract no further contributors, and be effectively abandoned within a year. The star count keeps growing through passive bookmarking long after the project has stopped shipping. Empirical work on modern open-source failure modes shows that the strongest predictors of project death are inward-facing — contributor count, commit cadence, and maintenance practices — not outward popularity signals [2]. A ranking that cannot see those signals cannot tell a healthy project from a popular corpse.
beam’s working assumption is that no single metric is robust against either manipulation or decay, but that a small set of orthogonal signals, fused appropriately, is. Code activity, package registry downloads, research citation rate, community discussion volume, and production-environment signals are produced by different populations doing different things. Faking any one of them in a way that survives correlation against the others is meaningfully harder than faking stars in isolation. The methodology below makes the fusion explicit.
The five signals
Each repository in beam’s tracked set is associated with five per-period signal scores, each normalised to a continuous value on the unit interval. The signals are designed to be independent in failure mode — an attempt to inflate one should not produce a correlated lift in the others.
Code velocity measures the maintenance pulse of the repository itself. It combines commit frequency, the count of distinct contributors active in the period, the rate at which issues are closed relative to opened, and release cadence. These map directly onto the CHAOSS Working Group’s released metrics for code-change activity, contributor population, and release frequency [3]. Code velocity captures whether the project is being actively worked on. A repository with zero commits in the trailing thirty days has a code-velocity score at or near zero regardless of how many stars it accumulates in the same window.
Adoption velocity measures whether the project is being installed, not just bookmarked. The primary inputs are package registry download counts — npm for JavaScript and TypeScript projects, PyPI for Python, crates.io for Rust, and the equivalents where available — differenced over the trailing period and normalised within the project’s niche. Adoption velocity is the signal most directly resistant to passive popularity. Downloads require a deliberate installation, are aggregated across automated CI pipelines as well as humans, and are harder to inflate at scale than star clicks. Repositories that publish no package (research artefacts, demos, plugins) score zero on this axis and are flagged for that fact in the per-tool surface.
Sentiment velocity measures whether builders are actually talking about the project, weighted by the quality of the discussion context. Inputs include posts that reach the front page of Hacker News, threads on developer-focused subreddits, and referenced mentions in technical blog posts. Volume alone is not enough; the signal is weighted by thread depth and by the share of discussion from accounts that have demonstrably built software before. The sentiment signal lags the others by a few days because it depends on humans choosing to write about a project, which is both a feature (it surfaces real interest) and a known limitation (it is the noisiest of the five signals).
Research velocity measures citation rate in the scholarly literature. The input is the rolling count of preprint and conference-paper mentions of the project across arXiv and the major venue indices, normalised by paper count in the niche. For AI tooling specifically, this is a more discriminating signal than in most software domains: AI frameworks and models are routinely cited in the methods sections of papers that depend on them, so research velocity captures actual research use rather than mere awareness. Repositories with no scholarly footprint score zero on this axis; this is common and not a defect.
Production signal is a coarser composite intended to detect whether the project is used in real systems rather than in toy demos. Inputs include the count of public dependents (other repositories that depend on the project at the package level), the presence and freshness of an SBOM or release-engineering footprint, and the appearance of the project in production-engineering surveys and inventories. The production signal is the slowest-moving of the five and is intended as a stability anchor: a high score here is evidence that an apparent stall is more likely a plateau than a collapse.
Each signal is windowed to the trailing seven days for the current-state score and to the trailing thirty days for the stability check that informs the velocity class. The exact weighting inside each signal is part of beam’s ongoing calibration against observed class transitions and is not published; the signal definitions and their inputs are published on this page.
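As a rough illustration of how a raw input such as registry downloads might become a unit-interval signal, the sketch below differences a daily cumulative download series over a trailing window and then ranks the result against peers in the same niche. The percentile-rank normalisation, the tie convention, and both function names are assumptions made for illustration; beam's actual normalisation and weighting are not published.

```python
from bisect import bisect_left, bisect_right


def trailing_delta(cumulative_downloads: list[int], window: int = 7) -> int:
    """Downloads gained over the last `window` days of a daily cumulative series."""
    if len(cumulative_downloads) <= window:
        raise ValueError("need more than `window` daily observations")
    return cumulative_downloads[-1] - cumulative_downloads[-1 - window]


def percentile_rank(value: float, niche_values: list[float]) -> float:
    """Map a raw value to [0, 1] by its rank among peers in the same niche.

    Ties receive the midpoint of their rank range (an assumed convention).
    """
    if not niche_values:
        raise ValueError("niche_values must not be empty")
    ordered = sorted(niche_values)
    below = bisect_left(ordered, value)
    at_or_below = bisect_right(ordered, value)
    return (below + at_or_below) / (2 * len(ordered))
```

A rank-based normalisation is used here because it is insensitive to the heavy-tailed download counts typical of package registries; other choices (log scaling, min-max) would serve the same purpose.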
Why a multiplicative fusion
beam combines the five signals into a single per-period velocity score with a multiplicative form:
velocity_score(repo) = (code × adoption × sentiment × research × production)^(1/5) × 10
Each input is on the unit interval, and the geometric-mean form preserves the unit. The choice of multiplicative rather than additive fusion is deliberate and has one important consequence: a zero on any single axis collapses the entire score toward zero. A repository with no commits in the period earns a code-velocity score near zero, and the product of that with any value of the other four axes is also near zero. The repository’s overall score reflects the fact that it is not being actively maintained, regardless of how loudly it is being discussed.
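The formula can be written out directly. The following is a minimal sketch that applies the geometric mean exactly as stated, with no clamping of inputs; the function name and the input validation are illustrative choices, not beam's code.

```python
import math


def velocity_score(code: float, adoption: float, sentiment: float,
                   research: float, production: float) -> float:
    """Geometric mean of five unit-interval signals, scaled to 0-10."""
    signals = (code, adoption, sentiment, research, production)
    if any(not 0.0 <= s <= 1.0 for s in signals):
        raise ValueError("each signal must lie in [0, 1]")
    # Fifth root of the product, then rescale from [0, 1] to [0, 10].
    return math.prod(signals) ** (1 / 5) * 10
```

With every axis at 1.0 the score is 10, and a zero on any axis gives a score of exactly 0 whatever the other four values are.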
A weighted sum has the opposite behaviour. With a sum, a strong signal on one axis (viral sentiment, say) can substantially mask a collapse on another. The empirical record of modern open-source failures suggests that masking is exactly the wrong failure mode for a ranking that is meant to flag decay [2]. The multiplicative form is the cheapest available insurance against that mistake. It is also legible: a single inspection of the per-axis scores makes the dominant input obvious, which matters for explainability on the per-tool status pages.
A worked example. Consider two repositories with identical seven-day star deltas. The first has thirty commits, twelve contributors, a growing package-install curve, and a flat sentiment axis. The second has zero commits in the period, a single contributor, no package, and a single viral discussion thread. Under a weighted sum where each axis is given equal weight, the second repository can finish within a few points of the first, because the sentiment spike compensates for the structural zeros. Under the geometric-mean fusion used here, the second repository’s score collapses to a small fraction of the first’s, because the multiplication runs through every zero. The ranking that comes out the other end correctly reflects which of the two is actually being built on.
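The comparison can be made concrete with numbers. The per-axis values below are hypothetical, chosen only to resemble the two repositories described above, and the second repository's structural zeros are represented by an assumed floor of 0.01 so that the geometric mean stays non-zero.

```python
import math

# Hypothetical per-axis scores: (code, adoption, sentiment, research, production).
# The second repository's zeros are shown as a small assumed floor of 0.01.
active_repo = (0.80, 0.70, 0.30, 0.40, 0.50)
viral_repo = (0.01, 0.01, 1.00, 0.40, 0.50)


def equal_weight_sum(signals):
    """Arithmetic mean of the axes, scaled to 0-10."""
    return sum(signals) / len(signals) * 10


def geometric_fusion(signals):
    """Geometric mean of the axes, scaled to 0-10."""
    return math.prod(signals) ** (1 / len(signals)) * 10


for name, repo in (("active", active_repo), ("viral", viral_repo)):
    print(name, round(equal_weight_sum(repo), 2), round(geometric_fusion(repo), 2))
```

With these inputs the equal-weight sum puts the two repositories at about 5.4 and 3.8, while the geometric fusion puts them at about 5.1 and 1.1. The sentiment spike closes most of the gap under the sum and very little of it under the product.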
The cost of the multiplicative form is sensitivity to noise near the lower bound. A signal that is genuinely small but non-zero should not produce a score that is indistinguishable from one that is genuinely zero, and the geometric mean does not by itself make that distinction. beam clamps each per-axis input above a small floor before the fusion, calibrated against the empirical null-signal distribution within each niche. The floor is small enough that a structural zero still drags the overall score toward the bottom, but large enough that ordinary measurement noise does not. The floor values are part of the published score configuration and are versioned with the method version stamped at the top of this page.
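A minimal sketch of the clamp-then-fuse step follows. The single `floor` parameter and its 0.01 default are placeholders: the text above describes floors calibrated per niche, and their actual values are not reproduced here.

```python
import math


def fuse_with_floor(signals: list[float], floor: float = 0.01) -> float:
    """Clamp each unit-interval signal to at least `floor`, then take the
    geometric mean and scale to 0-10. The 0.01 default is a placeholder."""
    if not 0.0 < floor <= 1.0:
        raise ValueError("floor must lie in (0, 1]")
    # Values below the floor are raised to it; values above 1.0 are capped.
    clamped = [min(1.0, max(floor, s)) for s in signals]
    return math.prod(clamped) ** (1 / len(clamped)) * 10
```

With this placeholder floor, one axis at zero and four axes at 0.8 gives a score a little above 3, against 8 when all five axes are at 0.8: the zero still pulls the score down sharply, but no longer forces it to exactly zero.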
Classification thresholds
The continuous velocity score is mapped to a five-class classification used across the dashboard: accelerating, stable, stalling, dying, and new. The thresholds are stated below in plain language; they correspond to the per-niche distribution of scores and to the seven-day trajectory of the score for the repository under evaluation.
Accelerating — the repository is in the top decile of its niche and its seven-day trajectory is non-negative. The class is intended to mark current upward momentum, not to forecast a winner.
Stable — the seven-day trajectory is non-negative and the score is within a band consistent with normal maintenance. The repository is being maintained and is not currently gaining or losing momentum.
Stalling — the seven-day trajectory is negative but the score has not yet fallen into the bottom decile. The class is a leading indicator that maintenance signals or adoption signals have softened and warrants a closer look at the per-tool status surface.
Dying — the score is in the bottom decile of the niche and the trajectory is negative, or one of two structural triggers fires: the contributor count has dropped to or below one for the trailing period, or there have been no commits for a window long enough that the code-velocity axis is at zero. Repositories in this class are explicitly flagged on /tools/{id}/status with a direct-answer first paragraph stating that the project is not actively maintained.
New — the repository has been tracked by beam for fewer than the minimum window required to establish a stable seven-day trajectory. The class is informational; new repositories are not eligible for the accelerating or dying classifications until enough history accumulates to support them.
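The five classes can be read as a small decision procedure. The sketch below is one possible encoding under stated assumptions: the field names, the 14-day minimum tracking window, the order in which rules are checked, and the treatment of a bottom-decile repository with a non-negative trajectory as stable are all illustrative choices, not published thresholds.

```python
from dataclasses import dataclass


@dataclass
class RepoState:
    niche_percentile: float   # 0.0 = lowest score in niche, 1.0 = highest
    trajectory_7d: float      # change in velocity score over seven days
    days_tracked: int
    active_contributors: int  # distinct contributors in the trailing period
    code_velocity: float      # unit-interval code-velocity signal


def classify(repo: RepoState, min_days_tracked: int = 14) -> str:
    """Return one of: new, dying, accelerating, stalling, stable."""
    # Too little history to establish a trajectory (window length is assumed).
    if repo.days_tracked < min_days_tracked:
        return "new"
    # Structural triggers: at most one contributor, or code velocity at zero.
    structural_trigger = repo.active_contributors <= 1 or repo.code_velocity == 0.0
    in_bottom_decile = repo.niche_percentile <= 0.10
    if structural_trigger or (in_bottom_decile and repo.trajectory_7d < 0):
        return "dying"
    if repo.niche_percentile >= 0.90 and repo.trajectory_7d >= 0:
        return "accelerating"
    if repo.trajectory_7d < 0:
        return "stalling"
    return "stable"
```

Checking the structural triggers before the decile rules means a top-decile repository with a single remaining contributor is still classified as dying, which matches the trigger as described above.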
Anti-pattern flags
The velocity score is robust to the easy failure modes but is not self-aware. beam runs a set of explicit anti-pattern detectors in parallel with the score to surface cases that the score alone might handle but that humans reading the dashboard want to see called out by name.
Fake-star detection. beam mirrors the heuristic family used by the StarScout audit [1] to flag repositories whose star history shows the signatures of co-ordinated inflation: tight temporal clustering of stars from accounts with shallow activity histories, bursts that are uncorrelated with any release or news event, and a star-to-fork ratio that is far from the niche norm. Flagged repositories are not removed from the tracked set, but the fake-star flag is visible on the per-tool surface and the affected portion of their star history is excluded from any signal that consumes it.
Bus-factor flag. Repositories whose contributor history shows a single dominant author responsible for the overwhelming majority of commits over the trailing year are flagged as bus-factor-one. This is not a value judgment (many well-loved tools are single-author projects), but it is a consequential piece of information for anyone considering a production dependency on the project, and it is one of the failure signals identified in the modern-OSS-failure literature [2].
Stars-up-commits-down divergence. When the star count is rising at a rate that is unusually high relative to the niche norm while commit activity is flat or declining over the same window, beam flags the divergence explicitly. The divergence is the canonical pattern of a project coasting on past attention, and is the case the multiplicative score is best at penalising, but the visible flag is useful for users who want to understand the score rather than only consume it.
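Two of the flags above reduce to simple checks once thresholds are chosen. The sketch below shows one way the bus-factor and divergence flags might be expressed; the 0.9 dominance share, the 3x niche-growth ratio, and the function signatures are hypothetical, since the text gives no numeric thresholds.

```python
from collections import Counter


def is_bus_factor_one(commit_authors: list[str], dominance: float = 0.9) -> bool:
    """True when one author wrote at least `dominance` of the commits."""
    if not commit_authors:
        return False
    top_count = Counter(commit_authors).most_common(1)[0][1]
    return top_count / len(commit_authors) >= dominance


def stars_up_commits_down(star_growth: float, niche_star_growth: float,
                          commits_now: int, commits_before: int,
                          ratio: float = 3.0) -> bool:
    """True when stars grow unusually fast while commits are flat or falling."""
    # "Unusually high" is modelled as exceeding `ratio` times the niche norm.
    stars_unusual = star_growth > ratio * niche_star_growth
    commits_not_rising = commits_now <= commits_before
    return stars_unusual and commits_not_rising
```

Here `commit_authors` is assumed to hold one author identifier per commit over the trailing year, and the two growth rates are assumed to be measured over the same window.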
Known limitations
Open-source coverage only. beam tracks repositories with public source code on the major code-hosting platforms. Closed-source tools, commercial-only services, and proprietary internal forks are out of scope. For the AI tooling ecosystem specifically, this is a meaningful gap on the model side — several of the most-used commercial models have no open repository and therefore no velocity score.
English-language sentiment bias. The sentiment-velocity signal currently weights English-language discussion most heavily. Discussion in other languages is ingested where available but is under-represented relative to the actual global community of builders. This is a known gap and is on the roadmap; users in non-English communities should treat the sentiment axis as a lower-bound estimate.
Lag on funding and news events. Funding announcements, acquisitions, and major news events are ingested from public sources and typically appear in the relevant signals within a thirty-day window. Velocity scores do not respond to non-public information; if a round is announced but the project has not yet seen commit, adoption, or sentiment movement, the score will not yet reflect the event.
Incomplete package-registry coverage. Some AI tools publish to neither npm nor PyPI nor crates.io — standalone binaries, container images, or model artefacts hosted on dedicated registries. For these projects, adoption velocity is partial or absent, and the multiplicative score is partly carried by the other four axes. Where this is structurally the case for a repository, the per-tool surface notes it explicitly.
The classification is a description, not a forecast. A repository labelled accelerating is not predicted to win its category. A repository labelled dying is not predicted to be extinct in twelve months. The classifications describe what is true about the project today and over the trailing seven and thirty days, and nothing more. The point of the dashboard is to compress a state that is laborious to observe into one that is cheap to consume; the user supplies the judgment about what to do with the compressed signal.
Token overhead (TOKEN-OVERHEAD-1.0)
Each top-velocity tool page surfaces a figure labelled token overhead: a rough estimate of how many tokens an LLM would consume to ingest enough of the repository to be useful to a developer. The figure is intentionally crude. It is published so a reader comparing two AI tools can see at a glance that one is roughly three times more token-hungry than another, not so that a finance team can predict a monthly API bill.
The estimator (methodology label TOKEN-OVERHEAD-1.0) is a pure function of three repository signals already in beam's ingest pipeline: the README byte size, the GitHub description length, and the number of declared topics. Bytes are converted to tokens with a fixed divisor of four (the long-run GPT-tokenizer average for English markdown with code fences), description characters with a divisor of three, and each declared topic adds a fixed one hundred and twenty-token skill-shape penalty up to a cap of ten topics. The formula is the same on both sides of the data plane: a backfill job at pipelines/repo_token_overhead.py writes the integer column repo_token_overhead.token_overhead_estimated, and the read path in lib/quant/token-overhead.ts reproduces it identically so a manual recomputation against the published constants always returns the published figure.
tokens ≈ floor(readme_bytes / 4)
       + floor(description_chars / 3)
       + min(topics, 10) × 120
The topics penalty is the load-bearing piece of intuition in the formula. A single-purpose command-line tool with one topic carries roughly one hundred and twenty tokens of skill-shape overhead; a multi-agent orchestration framework with topics such as agents, multi-agent, orchestration, observability, and tools carries six hundred. That ratio is the qualitative judgment the figure is meant to make legible: a framework asks the assistant to learn a larger surface of concepts than a utility does, and the token cost of priming the assistant grows accordingly.
The figure has three known limitations. First, when a repository has not yet been crawled for its real README byte size, the estimator substitutes a four-kilobyte fallback, which corresponds to a one-thousand-and-twenty-four-token floor. The intent is to avoid synthesising fake precision for repositories with no documentation signal yet; in practice the figure should be read as a lower bound until the README ingester expansion lands. Second, the bytes-per-token divisor is calibrated on English-language markdown; repositories with substantial non-English content or with very long code-fenced examples will be over- or under-estimated by a constant factor. Third, the topics penalty is blunt by design: it cannot distinguish a topic that signals real additional surface area (such as multi-agent) from one that signals taxonomy (typescript). Both contribute the same one hundred and twenty tokens.
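A manual recomputation against the published constants might look like the sketch below. It is not the content of pipelines/repo_token_overhead.py or lib/quant/token-overhead.ts; the function name, the constant names, and the use of `None` to represent an uncrawled README are assumptions.

```python
from typing import Optional

BYTES_PER_TOKEN = 4
DESCRIPTION_CHARS_PER_TOKEN = 3
TOKENS_PER_TOPIC = 120
MAX_TOPICS = 10
FALLBACK_README_BYTES = 4096  # four-kilobyte fallback for uncrawled READMEs


def estimate_token_overhead(readme_bytes: Optional[int],
                            description_chars: int,
                            topics: int) -> int:
    """Estimate token overhead from the TOKEN-OVERHEAD-1.0 constants."""
    if readme_bytes is None:
        # No crawled README size yet: substitute the documented fallback.
        readme_bytes = FALLBACK_README_BYTES
    return (readme_bytes // BYTES_PER_TOKEN
            + description_chars // DESCRIPTION_CHARS_PER_TOKEN
            + min(topics, MAX_TOPICS) * TOKENS_PER_TOPIC)
```

For example, an 8,000-byte README with a 90-character description and three topics gives 2,000 + 30 + 360 = 2,390 tokens, and a repository with no crawled README, no description, and no topics gives the 1,024-token floor.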
The token-overhead column is published under CC BY 4.0 alongside the rest of beam's data, with the methodology version pinned on every row. If the formula is iterated — for example, to swap the static divisor for an empirical per-repository tokenizer pass, or to weight non-English content differently — the methodology label will be bumped to TOKEN-OVERHEAD-1.1 or 2.0, the backfill re-run, and the older rows kept attributable by their pinned label. Honest empty state applies: when no row exists for a repository (it is outside the top-velocity slice the backfill currently covers, or its inputs are unavailable), the figure is hidden rather than estimated with a placeholder.
The column is exposed in the per-tool JSON-LD payload as a PropertyValue named token_overhead with the methodology label in its description string, so AI search engines and downstream agents can cite the figure with attribution back to this section. The canonical anchor is /about/methodology#token-overhead.
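For readers consuming the payload, the node might be shaped roughly as follows. Only the PropertyValue type, the token_overhead name, and the presence of the methodology label in the description come from the text above; the exact description wording, the helper name, and where the node sits inside the larger per-tool JSON-LD document are assumptions.

```python
import json


def token_overhead_property(tokens: int, label: str = "TOKEN-OVERHEAD-1.0") -> dict:
    """Build a schema.org PropertyValue node for the token-overhead figure."""
    return {
        "@type": "PropertyValue",
        "name": "token_overhead",
        "value": tokens,
        # Hypothetical wording; the real description string may differ.
        "description": (
            f"Estimated LLM token overhead ({label}); "
            "see /about/methodology#token-overhead"
        ),
    }


print(json.dumps(token_overhead_property(2390), indent=2))
```

A downstream agent can then read the methodology label out of the description string and cite the figure against the matching version of this section.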
How to cite a beam figure
beam is published under the Creative Commons Attribution 4.0 licence. A beam figure or ranking may be reproduced and quoted in academic, journalistic, or AI-generated work with attribution. The recommended citation format is:
beam — AI ecosystem intelligence (www.beamforai.com). Retrieved {date} from {canonical URL}.
Velocity scores are recomputed daily, so any reproduced figure should be tagged with the retrieval date. Per-tool stable URLs are at /tools/{id}, per-niche rankings at /n/{niche} and /best/{niche}, and the per-week trending archive at /trending/{ISO-week}. The machine-readable equivalents are at /api/v1/.
Changelog
- METHOD-1.0 · May 16, 2026 — initial publication. Five-signal multiplicative score; classification thresholds; anti-pattern flags; CC BY 4.0 licence.
- TOKEN-OVERHEAD-1.0 · May 18, 2026 — per-repo estimated LLM context-window token overhead added. Pure-function derivation honored by both SQL backfill and TS read path; pinned methodology label on every row; surfaced on canonical tool pages and in per-tool JSON-LD.
References
[1] He, H., Yang, H., Burckhardt, P., Kapravelos, A., Vasilescu, B., & Kästner, C. Six Million (Suspected) Fake Stars in GitHub: A Growing Spiral of Popularity Contests, Spams, and Malware. arXiv:2412.13459. https://arxiv.org/abs/2412.13459
[2] Coelho, J., & Valente, M. T. Why Modern Open Source Projects Fail. ESEC/FSE 2017. https://arxiv.org/abs/1707.02327
[3] CHAOSS Working Groups (Linux Foundation). CHAOSS Metrics: Community Health Analytics in Open Source Software. CHAOSS Project. https://chaoss.community/kbtopic/all-metrics/