🚀 Meteor 3.5 (RC Available): Change Streams & Performance improvements

nachocodoner · June 4, 2026, 12:53pm

filipenevola · June 11, 2026, 12:06pm

I opened an issue with production evidence from our 3.5-rc.1 rollout:

Meteor 3.5-rc.1 Change Streams caused production method/pub latency regression; pinning to oplog/polling restored normal performance

opened 12:05PM - 11 Jun 26 UTC

## How this report was produced

This report was assembled with Codex using:

- …Monti APM data queried through the open-source [`quavedev/montiapm-mcp`](https://github.com/quavedev/montiapm-mcp) MCP server.
- Quave ONE environment/deployment metadata queried through the [Quave ONE MCP](https://docs.quave.one/mcp/).
- MongoDB profiler and `currentOp` data queried from a [Quave ONE](https://quave.one) managed MongoDB deployment through remote access over VPN.

Sensitive values such as connection strings, credentials, hostnames, and raw profiler documents are intentionally omitted or redacted.

## Summary

After upgrading a production Meteor app from the previous Meteor 3.x line to `METEOR@3.5-rc.1`, we observed a large production regression in Meteor method and publication response times. The regression started immediately after the `3.5-rc.1` deployment.

The strongest evidence points to the new Mongo Change Streams observer path:

- The same app recovered after explicitly pinning Mongo reactivity to `["oplog", "polling"]` and restoring `MONGO_OPLOG_URL`.
- Before mitigation, MongoDB `currentOp` showed hundreds of active `$changeStream` cursors.
- During a later reconnect/restart spike, MongoDB profiler captured slow `$changeStream` / `getMore` operations with `fullDocument: "updateLookup"` and `fullDocumentBeforeChange: "whenAvailable"`.
- After mitigation rollout completed, `currentOp` showed `0` active Change Stream cursors and Monti APM response times returned to normal.

We do not yet have a minimal reproduction. This report is production evidence from an app running during the `3.5-rc.1` phase so the Meteor team can investigate likely failure modes before stable release.

This may be related to, but is not identical to, https://github.com/meteor/meteor/issues/14452. That issue reports a write fence stuck after a Change Stream observer stops. Our symptom was broader production latency and Change Stream cursor/slow-profile growth, not a confirmed permanent method hang.

## Environment

- Meteor release: `METEOR@3.5-rc.1`
- Mongo package: expected `mongo@2.4.0-rc350.1` from the RC
- MongoDB server: `8.2.0`
- Mongo deployment: replica set, 3 members
- App workload: production Meteor app with DDP methods and many publications
- Monitoring: Monti APM, MongoDB profiler, MongoDB `currentOp`
- Profiler status during investigation: enabled, `slowms: 100`, `sampleRate: 1`

The application has many account-scoped publications and operational dashboard publications. Some high-cardinality or frequently-changing collections were involved; their real collection names are omitted below and replaced with stable `collection-*` identifiers.

## Timeline

All times are UTC.

- `2026-06-10T21:14:19.854Z`: `METEOR@3.5-rc.1` deployment went live.
- `2026-06-10T21:14:19.854Z` to `2026-06-10T23:14:19.854Z`: immediate post-deploy window showed a major latency regression in Monti APM.
- `2026-06-11T04:18:00Z`: Mongo profiler captured a spike of slow Change Stream operations.
- `2026-06-11T11:32:28Z`: mitigation applied by restoring `MONGO_OPLOG_URL` and pinning reactivity to `["oplog", "polling"]`.
- `2026-06-11T11:34:00Z` to `2026-06-11T11:40:00Z`: post-mitigation Monti/Mongo window was normal.

## Configuration change that mitigated the issue

The production app was mitigated with the documented rollback configuration from the Meteor 3.5 RC announcement:

```json
{
  "packages": {
    "mongo": {
      "reactivity": ["oplog", "polling"]
    }
  }
}
```

We also restored `MONGO_OPLOG_URL`, which had existed before the `3.5-rc.1` deployment.

## Monti APM evidence

Method, publication, and HTTP endpoint names are included because they are already visible to clients through normal DDP/browser tooling. Mongo collection names and infrastructure details are anonymized with stable `collection-*` identifiers because they come from server-side production profiling data.

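
The report does not say how the stable `collection-*` identifiers were generated. As a rough sketch of one way to do it, the hypothetical helper below hands out `collection-NN` labels in first-seen order and returns the same label whenever a name repeats. The function name and the sample collection names are made up for illustration.

```typescript
// Hypothetical helper, not taken from the report: assigns `collection-NN`
// labels in first-seen order and returns the same label on repeat lookups.
function createAnonymizer(prefix: string = "collection-"): (realName: string) => string {
  // Null-prototype object so names like "__proto__" behave as ordinary keys.
  const labels: Record<string, string> = Object.create(null);
  let next = 1;
  return (realName: string): string => {
    if (!(realName in labels)) {
      // Zero-pad to two digits to match the `collection-01` style.
      labels[realName] = prefix + (next < 10 ? "0" : "") + String(next);
      next += 1;
    }
    return labels[realName];
  };
}

// Example with made-up collection names.
const anonymize = createAnonymizer();
const first = anonymize("deployments");      // "collection-01"
const second = anonymize("metrics");         // "collection-02"
const firstAgain = anonymize("deployments"); // "collection-01"
```

Labels produced this way are only stable within one anonymizer instance, so a report assembled across several sessions would need to persist the mapping somewhere private.
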
### Before/after summary

This table compares the `pre_4h` window before `METEOR@3.5-rc.1` went live with the first `post_2h` window after the deployment, when Change Streams were active.

These are weighted averages across the breakdown entries returned by Monti APM for each surface. The weighting uses each entry's throughput, so higher-volume methods/publications/endpoints contribute more than rare calls. This is useful as an overall app-level signal, but the per-method and per-publication tables below are more useful for debugging specific shapes.

| Surface | Before `3.5-rc.1` / oplog | After `3.5-rc.1` / Change Streams | Absolute increase | Percent increase |
| --- | ---: | ---: | ---: | ---: |
| Meteor methods | `1726ms` | `7073ms` | `+5347ms` | `+310%` |
| Meteor publications | `884ms` | `1415ms` | `+531ms` | `+60%` |
| HTTP/API endpoints | `110ms` | `224ms` | `+114ms` | `+104%` |

After pinning the same app back to `["oplog", "polling"]`, the first clean post-mitigation window dropped to `38ms` for methods, `116ms` for publications, and `14ms` for HTTP/API endpoints.

### Named method before/after comparison

This table compares method names that appeared in both the `pre_4h` and `post_2h` Monti breakdowns.

| Method | Before | After | Absolute change | Percent change | Throughput before | Throughput after |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| `appEnvApplyChanges` | `488ms` | `120490ms` | `+120002ms` | `+24616%` | `0.01/min` | `0.04/min` |
| `appEnvSave` | `2925ms` | `88103ms` | `+85178ms` | `+2912%` | `0.02/min` | `0.03/min` |
| `logs.startStreamSession` | `5037ms` | `12907ms` | `+7869ms` | `+156%` | `0.20/min` | `0.58/min` |
| `checkCnameRecords` | `2369ms` | `5331ms` | `+2962ms` | `+125%` | `0.20/min` | `0.12/min` |
| `integrations/checkGitHubAccessToken` | `3717ms` | `4214ms` | `+497ms` | `+13%` | `0.19/min` | `0.68/min` |
| `login` | `1076ms` | `3600ms` | `+2524ms` | `+235%` | `0.94/min` | `0.59/min` |
| `changeAnonymousToUserId` | `1449ms` | `2182ms` | `+734ms` | `+51%` | `0.88/min` | `0.54/min` |
| `addEvent` | `1549ms` | `1752ms` | `+203ms` | `+13%` | `0.53/min` | `0.52/min` |
| `getAppEnvCname` | `617ms` | `1124ms` | `+506ms` | `+82%` | `0.23/min` | `0.13/min` |
| `redeployAppContent` | `177ms` | `858ms` | `+682ms` | `+386%` | `0.01/min` | `0.02/min` |
| `getDashboardUrls` | `53ms` | `36ms` | `-16ms` | `-31%` | `0.01/min` | `0.10/min` |

Methods that appeared as slow only in the `post_2h` top breakdown:

| Method | After | Throughput after |
| --- | ---: | ---: |
| `getLogs` | `8583ms` | `0.02/min` |
| `clearScroll` | `1802ms` | `0.01/min` |

### Named publication before/after comparison

This table compares publication names that appeared in both the `pre_4h` and `post_2h` Monti breakdowns.

| Publication | Before | After | Absolute change | Percent change | Sub rate before | Sub rate after |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| `clusterPoolPower` | `1535ms` | `3196ms` | `+1662ms` | `+108%` | `0.04/min` | `0.02/min` |
| `currentAccountData` | `1244ms` | `3140ms` | `+1896ms` | `+152%` | `1.62/min` | `3.08/min` |
| `userAppsAndEnvs` | `1522ms` | `2844ms` | `+1321ms` | `+87%` | `0.57/min` | `0.48/min` |
| `userApp` | `1486ms` | `1991ms` | `+506ms` | `+34%` | `0.74/min` | `0.57/min` |
| `userAccounts` | `619ms` | `1623ms` | `+1004ms` | `+162%` | `1.16/min` | `0.61/min` |
| `accountAppEnvContentData` | `1438ms` | `1589ms` | `+151ms` | `+10%` | `0.70/min` | `0.51/min` |
| `userData` | `751ms` | `1536ms` | `+784ms` | `+104%` | `1.16/min` | `0.66/min` |
| `contentsCount` | `612ms` | `1513ms` | `+901ms` | `+147%` | `0.70/min` | `0.51/min` |
| `agent.llmProviderKeys.byContext` | `1092ms` | `1487ms` | `+395ms` | `+36%` | `0.57/min` | `0.48/min` |
| `currentAppContentAndCurrentStatus` | `646ms` | `1311ms` | `+665ms` | `+103%` | `12.90/min` | `14.61/min` |
| `meteor_autoupdate_clientVersions` | `628ms` | `1294ms` | `+666ms` | `+106%` | `0.98/min` | `0.63/min` |
| `meteor.loginServiceConfiguration` | `628ms` | `1294ms` | `+666ms` | `+106%` | `0.98/min` | `0.63/min` |
| `accountNotifications` | `739ms` | `1177ms` | `+437ms` | `+59%` | `1.44/min` | `0.84/min` |
| `latestAppEnvMetrics` | `1021ms` | `1115ms` | `+94ms` | `+9%` | `12.90/min` | `14.61/min` |
| `myClusterPoolAccess` | `474ms` | `915ms` | `+441ms` | `+93%` | `1.42/min` | `0.82/min` |
| `appEnvDeploymentPods` | `2943ms` | `1418ms` | `-1525ms` | `-52%` | `0.20/min` | `0.24/min` |
| `userAppEnv` | `1735ms` | `1335ms` | `-400ms` | `-23%` | `0.74/min` | `0.52/min` |
| `userAccountSecrets` | `1903ms` | `1204ms` | `-699ms` | `-37%` | `0.45/min` | `0.29/min` |

Publications that appeared as slow only in the `post_2h` top breakdown:

| Publication | After | Sub rate after |
| --- | ---: | ---: |
| `userAppEnvs` | `4638ms` | `0.03/min` |
| `accountRegionsBillingMonth` | `2929ms` | `0.03/min` |
| `accountBillingMonth` | `2909ms` | `0.03/min` |
| `userAccount` | `2900ms` | `0.03/min` |
| `accountPendingBillingMonths` | `2896ms` | `0.03/min` |

### Named HTTP/API before/after comparison

| Endpoint | Before | After | Absolute change | Percent change | Throughput before | Throughput after |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| `DELETE-/app-api/logs/session` | `2861ms` | `5909ms` | `+3047ms` | `+107%` | `0.60/min` | `0.66/min` |
| `GET-<static file>` | `16ms` | `22ms` | `+5ms` | `+33%` | `7.34/min` | `4.17/min` |
| `GET-<app>` | `14ms` | `13ms` | `-1ms` | `-7%` | `13.41/min` | `13.75/min` |

### Before deploy: `pre_4h`

Window:

- Start: `2026-06-10T17:14:19.854Z`
- End: `2026-06-10T21:14:19.854Z`

Summary:

- Methods weighted average response time: `1726ms`
- Publications weighted average response time: `884ms`
- HTTP weighted average response time: `110ms`

Top methods by response time:

| Method | Avg response time | Throughput |
| --- | ---: | ---: |
| `triggerAppEnvBuild` | `12294ms` | `0` |
| `listAccountEvents` | `7554ms` | `0.01/min` |
| `logs.startStreamSession` | `5037ms` | `0.20/min` |
| `listAccountMembersForFilter` | `4759ms` | `0.01/min` |
| `countAccountEvents` | `4040ms` | `0.01/min` |
| `integrations/checkGitHubAccessToken` | `3717ms` | `0.19/min` |
| `setCurrentAccount` | `3551ms` | `0` |
| `listAccountAppsForFilter` | `3338ms` | `0.01/min` |
| `appEnvSave` | `2925ms` | `0.02/min` |
| `integrations/listUserInstallations` | `2846ms` | `0.01/min` |

Top publications by response time:

| Publication | Avg response time | Sub rate |
| --- | ---: | ---: |
| `appEnvDeploymentPods` | `2943ms` | `0.20/min` |
| `userAccountSecrets` | `1903ms` | `0.45/min` |
| `userAppEnv` | `1735ms` | `0.74/min` |
| `clusterPoolPower` | `1535ms` | `0.04/min` |
| `userAppsAndEnvs` | `1522ms` | `0.57/min` |
| `userApp` | `1486ms` | `0.74/min` |
| `accountAppEnvContentData` | `1438ms` | `0.70/min` |
| `currentAccountData` | `1244ms` | `1.62/min` |
| `alerts.byAppEnv` | `1221ms` | `0.05/min` |
| `agent.llmProviderKeys.byContext` | `1092ms` | `0.57/min` |

### After deploy: `post_2h`

Window:

- Start: `2026-06-10T21:14:19.854Z`
- End: `2026-06-10T23:14:19.854Z`

Summary:

- Methods weighted average response time: `7073ms`
- Publications weighted average response time: `1415ms`
- HTTP weighted average response time: `224ms`

This is a `4.1x` increase in weighted method response time and a `1.6x` increase in weighted publication response time compared with the `pre_4h` window.

Top methods by response time:

| Method | Avg response time | Throughput |
| --- | ---: | ---: |
| `appEnvApplyChanges` | `120490ms` | `0.04/min` |
| `appEnvSave` | `88103ms` | `0.03/min` |
| `getDatabaseHostCredentialUrl` | `36759ms` | `0.02/min` |
| `getDecodedSecret` | `15658ms` | `0.05/min` |
| `logs.startStreamSession` | `12907ms` | `0.58/min` |
| `getLogs` | `8583ms` | `0.02/min` |
| `checkCnameRecords` | `5331ms` | `0.12/min` |
| `integrations/checkGitHubAccessToken` | `4214ms` | `0.68/min` |
| `login` | `3600ms` | `0.59/min` |
| `changeAnonymousToUserId` | `2182ms` | `0.54/min` |

Top publications by response time:

| Publication | Avg response time | Sub rate |
| --- | ---: | ---: |
| `userAppEnvs` | `4638ms` | `0.03/min` |
| `clusterPoolPower` | `3196ms` | `0.02/min` |
| `currentAccountData` | `3140ms` | `3.08/min` |
| `accountRegionsBillingMonth` | `2929ms` | `0.03/min` |
| `accountBillingMonth` | `2909ms` | `0.03/min` |
| `userAccount` | `2900ms` | `0.03/min` |
| `accountPendingBillingMonths` | `2896ms` | `0.03/min` |
| `userAppsAndEnvs` | `2844ms` | `0.48/min` |
| `userApp` | `1991ms` | `0.57/min` |
| `userAccounts` | `1623ms` | `0.61/min` |

HTTP routes:

| Route | Avg response time | Throughput |
| --- | ---: | ---: |
| `DELETE-/app-api/logs/session` | `5909ms` | `0.66/min` |
| `GET-<static file>` | `22ms` | `4.17/min` |
| `GET-<app>` | `13ms` | `13.75/min` |
| `GET-/robots.txt` | `1ms` | `0.08/min` |

### After mitigation

Window:

- Start: `2026-06-11T11:34:00.000Z`
- End: `2026-06-11T11:40:00.000Z`

Summary:

- Methods weighted average response time: `38ms`
- Publications weighted average response time: `116ms`
- HTTP weighted average response time: `14ms`
- Traces above `500ms`: `0`
- Mongo pool checkout delay: `0`
- Mongo pool pending checkouts: `0`

Top methods after mitigation:

| Method | Avg response time | Throughput |
| --- | ---: | ---: |
| `integrations/checkGitHubAccessToken` | `72ms` | `0.50/min` |
| `getDashboardUrls` | `29ms` | `0.17/min` |
| `logs.startStreamSession` | `25ms` | `0.50/min` |
| `addEvent` | `9ms` | `0.33/min` |

Top publications after mitigation:

| Publication | Avg response time | Sub rate |
| --- | ---: | ---: |
| `currentAppContentAndCurrentStatus` | `193ms` | `4.83/min` |
| `latestAppEnvMetrics` | `107ms` | `4.83/min` |
| `userAppsAndEnvs` | `44ms` | `0.17/min` |
| `userAppEnv` | `20ms` | `0.17/min` |
| `userApp` | `14ms` | `0.17/min` |
| `accountAppEnvContentData` | `9ms` | `0.17/min` |
| `contentsCount` | `8ms` | `0.17/min` |
| `agent.llmProviderKeys.byContext` | `8ms` | `0.17/min` |
| `currentAccountData` | `7ms` | `1.50/min` |
| `meteor.loginServiceConfiguration` | `1ms` | `0.17/min` |

## MongoDB evidence

### Before mitigation

MongoDB `currentOp` showed hundreds of active Change Stream cursors.

One snapshot before mitigation:

- Total cursor count: `692`
- Change Stream cursor count: `690`

Top collections by active Change Stream cursors:

| Collection | Active Change Stream cursors |
| --- | ---: |
| `collection-01` | `362` |
| `collection-02` | `53` |
| `collection-03` | `47` |
| `collection-04` | `40` |
| `collection-05` | `38` |
| `collection-06` | `38` |
| `collection-07` | `31` |
| `collection-08` | `19` |
| `collection-09` | `14` |
| `collection-10` | `11` |

A later sample during rollout overlap showed even more active Change Stream cursors, but we do not treat that sample as final because old and new pods were briefly overlapping.

### Profiler spike

At approximately `2026-06-11T04:18:00Z`, MongoDB profiler captured a spike of slow Change Stream operations:

- `168` slow profile entries around that spike
- `85` slow entries in one minute
- Slow entries included `$changeStream` / `getMore` with `fullDocument: "updateLookup"` and `fullDocumentBeforeChange: "whenAvailable"`

Examples of slow Change Stream profile entries:

| Collection | Max duration | Docs examined |
| --- | ---: | ---: |
| `collection-11` | `8305ms` | `402875` |
| `collection-10` | `8110ms` | not captured in summary |
| `collection-08` | `7609ms` | `201638` |
| `collection-09` | `5978ms` | `206976` |

Those `docsExamined` counts were far larger than the normal collection sizes we expected for the specific observed data. This looked more like Change Stream resume/history scanning than a single missing index on a normal query shape.

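
For anyone wanting to produce a similar per-collection cursor breakdown, here is a minimal sketch, not the tooling used for this report. It assumes you already have an array of `currentOp`-style documents (for example from a `$currentOp` aggregation with `idleCursors: true`) and that each one exposes its originating aggregate under `cursor.originatingCommand.pipeline`. The interface and function names are hypothetical.

```typescript
// Minimal shape of the fields used here (an assumption about the input);
// real currentOp documents carry many more fields.
interface CurrentOpDoc {
  ns?: string;
  cursor?: { originatingCommand?: { pipeline?: Array<Record<string, unknown>> } };
}

// True when the originating aggregate starts with a $changeStream stage.
function isChangeStream(op: CurrentOpDoc): boolean {
  const pipeline = op.cursor?.originatingCommand?.pipeline;
  return Array.isArray(pipeline) && pipeline.length > 0 && "$changeStream" in pipeline[0];
}

// Groups change stream cursors by namespace, highest count first.
function countChangeStreamsByNamespace(
  ops: CurrentOpDoc[]
): Array<{ ns: string; count: number }> {
  const counts: Record<string, number> = {};
  for (const op of ops) {
    if (!isChangeStream(op)) continue;
    const ns = op.ns ?? "(unknown)";
    counts[ns] = (counts[ns] ?? 0) + 1;
  }
  return Object.keys(counts)
    .map((ns) => ({ ns, count: counts[ns] }))
    .sort((a, b) => b.count - a.count);
}
```

Sampling this a few times, and avoiding rollout overlap windows, gives a comparable picture of which collections carry the most Change Stream cursors.
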
### After mitigation

After the oplog/polling mitigation rollout completed:

- `currentOp` cursor count: `0`
- `currentOp` Change Stream cursor count: `0`
- Slow Change Stream profile entries since mitigation: `0`

Profiler since mitigation only showed a few small `collection-12` entries around the restart:

| Namespace | Op | Plan | Count | Max duration |
| --- | --- | --- | ---: | ---: |
| `collection-12` | `command` | n/a | `1` | `221ms` |
| `collection-12` | `update` | `IXSCAN { startedAt: 1 }` | `5` | `120ms` |
| `collection-12` | `query` | `IXSCAN { startedAt: 1 }` | `5` | `117ms` |

No slow Change Stream entries remained in that post-mitigation window.

## Why this looks Change Streams-specific

1. The regression started immediately after the `3.5-rc.1` deployment, where Change Streams became the default reactivity mechanism for MongoDB 6+ replica sets.
2. Mongo profiler did not show a matching normal-query problem immediately after deploy, but `currentOp` showed hundreds of active Change Stream cursors.
3. A later restart/reconnect spike produced slow Change Stream profile entries, including `getMore`-like behavior and large `docsExamined` counts.
4. Pinning back to oplog/polling restored normal response times and removed active Change Stream cursors.
5. The same methods and publications became fast again without application code changes.

## Hypotheses for Meteor core investigation

These are hypotheses, not confirmed root cause:

1. Change Stream observer startup/resume may be expensive for apps with many active publications and many per-user/per-account cursor variants.
2. Some publication shapes may create too many active Change Stream cursors or too little observer reuse compared with the previous oplog path.
3. Reconnect/restart behavior may cause a Change Stream resume/catch-up storm.
4. `fullDocument: "updateLookup"` and `fullDocumentBeforeChange: "whenAvailable"` may amplify cost on some collection shapes or resume windows.
5. The issue may be related to observer lifecycle/catch-up handling, possibly adjacent to https://github.com/meteor/meteor/issues/14452, even though our production symptom was latency rather than a confirmed stuck method.

## Follow-up investigation plan

We are sharing this production evidence during the RC phase so the Meteor team and the community can investigate in parallel if needed.

Our next step is not to assume every publication or method is equally affected. Instead, we plan to use the named Monti breakdowns above to narrow the problem into smaller shapes that can become a useful reproduction.

The plan:

1. Compare the same method/publication names across pre-change, Change Streams, and post-mitigation windows.
2. For publications, add PubSub-specific dimensions such as sub rate, active subs, observer reuse, created/deleted observers, total observer handlers, lifetime, and trace DB/compute/wait breakdown.
3. Cross-reference the publication/method regression with MongoDB `currentOp` Change Stream cursor groups and profiler spikes.
4. Identify whether the regression is concentrated in a few publication shapes or appears broadly across unrelated publications.
5. Reduce the affected shapes into a smaller beta reproduction, then ideally a standalone reproduction for `meteor/meteor`.

Community members seeing similar symptoms can run the same triage on their apps: compare before/after Monti breakdowns by name, inspect PubSub observer reuse/churn, and sample MongoDB `currentOp` for active `$changeStream` cursors.

Collections associated with slow profile entries or active Change Stream cursor counts are anonymized in this report as `collection-*` identifiers. We can share more precise shapes privately or in a smaller sanitized reproduction if needed.

## Suggested beta reproduction plan

We plan to use a non-production environment to narrow this down without destabilizing production:

1. Keep production pinned to `["oplog", "polling"]`.
2. In a non-production environment, record a baseline with oplog/polling:
   - Monti method/publication breakdown
   - PubSub observer metrics
   - slow traces above `500ms`
   - MongoDB `currentOp`
   - MongoDB profiler
3. Enable Change Streams only in that non-production environment.
4. Drive the same user flows:
   - login
   - authenticated dashboard load
   - detail page with several live publications
   - streaming endpoint start/stop
   - representative write workflow on a safe test object
5. Compare named publications/methods by:
   - response time
   - sub rate / throughput
   - active subs
   - lifetime
   - observer reuse
   - created/deleted observers
   - trace DB/compute/wait breakdown
   - active Change Stream cursor counts by collection

If a small set of publications regress, we can try to build a minimal reproduction around those shapes. If many unrelated publications regress together, that points more toward a broad Change Stream observer lifecycle or resume behavior issue.

## What we cannot provide yet

- We do not yet have a standalone minimal reproduction.
- We cannot share production Mongo connection strings, hostnames, tokens, or full profiler documents because they may contain sensitive operational data.
- We can share redacted aggregate outputs and can run targeted beta experiments if Meteor maintainers suggest specific instrumentation or flags.

## References

- Meteor contributing guide asks for clear bug reports and ideally a minimal reproduction: https://github.com/meteor/meteor/blob/devel/CONTRIBUTING.md
- Meteor 3.5 RC announcement says Change Streams are default on MongoDB 6+ replica sets and documents the `["oplog", "polling"]` rollback path: https://forums.meteor.com/t/meteor-3-5-rc-available-change-streams-performance-improvements/64461
- Related existing issue: https://github.com/meteor/meteor/issues/14452

Short version: after moving to 3.5-rc.1, we saw a significant method/publication latency regression in production. Pinning Mongo reactivity back to ["oplog", "polling"] and restoring MONGO_OPLOG_URL brought latency back to normal. The issue includes Monti APM before/after data, MongoDB currentOp/profiler evidence, and our follow-up investigation plan.

cc @italojs @nachocodoner

filipenevola · June 11, 2026, 12:12pm

Another one that could be helpful to check if you are testing Change Streams with MONGO_OPLOG_URL still present: Meteor 3.5-rc.1: Change Streams did not appear to activate until MONGO_OPLOG_URL was removed · Issue #14454 · meteor/meteor · GitHub

filipenevola · June 11, 2026, 2:35pm

A public reproduction repo is now available at quavedev/meteor-changestream-fanout-repro. It reproduces the Change Streams fanout shape with many distinct publication selectors observing the same hot collection.

cc @nachocodoner @italojs

dupontbertrand · June 11, 2026, 3:41pm

My 2 cents in the issue: Meteor 3.5-rc.1 Change Streams caused production method/pub latency regression; pinning to oplog/polling restored normal performance · Issue #14453 · meteor/meteor · GitHub
(Didn’t see your repro app, @filipenevola, before posting mine, but nvm.)

paulishca · June 12, 2026, 5:31am

DX comment: it would be nice to be able to simply use oplog instead of CS in development (and CS in production). I already run multiple local MongoDB servers for multiple projects and doing replica sets would be a nightmare.

paulishca · June 12, 2026, 2:36pm

Has anyone managed to deploy 3.5 with MUP?!

italojs · June 12, 2026, 3:16pm

Oplog requires a replica set too. If you run a standalone Mongo instance locally, it should try CS → fall back to oplog → fall back to polling.
Isn’t that happening for you?

paulishca · June 12, 2026, 9:12pm

I haven’t set this, and when I started my project I got a lot of replica set errors in the server console and the fans on my M5 MacBook Pro started spinning like crazy.