## How this report was produced
This report was assembled with Codex using:
- Monti APM data queried through the open-source
[`quavedev/montiapm-mcp`](https://github.com/quavedev/montiapm-mcp) MCP
server.
- Quave ONE environment/deployment metadata queried through the
[Quave ONE MCP](https://docs.quave.one/mcp/).
- MongoDB profiler and `currentOp` data queried from a
[Quave ONE](https://quave.one) managed MongoDB deployment through remote
access over VPN.
Sensitive values such as connection strings, credentials, hostnames, and raw
profiler documents are intentionally omitted or redacted.
## Summary
After upgrading a production Meteor app from the previous Meteor 3.x line to
`METEOR@3.5-rc.1`, we observed a large production regression in Meteor method
and publication response times. The regression started immediately after the
`3.5-rc.1` deployment.
The strongest evidence points to the new Mongo Change Streams observer path:
- The same app recovered after explicitly pinning Mongo reactivity to
`["oplog", "polling"]` and restoring `MONGO_OPLOG_URL`.
- Before mitigation, MongoDB `currentOp` showed hundreds of active
`$changeStream` cursors.
- During a later reconnect/restart spike, MongoDB profiler captured slow
`$changeStream` / `getMore` operations with `fullDocument: "updateLookup"`
and `fullDocumentBeforeChange: "whenAvailable"`.
- After mitigation rollout completed, `currentOp` showed `0` active Change
Stream cursors and Monti APM response times returned to normal.
We do not yet have a minimal reproduction. This report is production evidence
from an app running during the `3.5-rc.1` phase so the Meteor team can
investigate likely failure modes before stable release.
This may be related to, but is not identical to,
https://github.com/meteor/meteor/issues/14452. That issue reports a write fence
stuck after a Change Stream observer stops. Our symptom was broader production
latency and Change Stream cursor/slow-profile growth, not a confirmed permanent
method hang.
## Environment
- Meteor release: `METEOR@3.5-rc.1`
- Mongo package: expected `mongo@2.4.0-rc350.1` from the RC
- MongoDB server: `8.2.0`
- Mongo deployment: replica set, 3 members
- App workload: production Meteor app with DDP methods and many publications
- Monitoring: Monti APM, MongoDB profiler, MongoDB `currentOp`
- Profiler status during investigation: enabled, `slowms: 100`, `sampleRate: 1`
The application has many account-scoped publications and operational
dashboard publications. Some high-cardinality or frequently changing
collections were involved; their real collection names are omitted below and
replaced with stable `collection-*` identifiers.
## Timeline
All times are UTC.
- `2026-06-10T21:14:19.854Z`:
`METEOR@3.5-rc.1` deployment went live.
- `2026-06-10T21:14:19.854Z` to `2026-06-10T23:14:19.854Z`:
immediate post-deploy window showed a major latency regression in Monti APM.
- `2026-06-11T04:18:00Z`:
Mongo profiler captured a spike of slow Change Stream operations.
- `2026-06-11T11:32:28Z`:
mitigation applied by restoring `MONGO_OPLOG_URL` and pinning reactivity to
`["oplog", "polling"]`.
- `2026-06-11T11:34:00Z` to `2026-06-11T11:40:00Z`:
post-mitigation Monti/Mongo window was normal.
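The `pre_4h` and `post_2h` comparison windows used in the Monti APM sections are plain offsets from the deployment timestamp above. A minimal sketch of that arithmetic, assuming a hypothetical helper named `windowsAroundDeploy` (it is not part of Monti APM or Meteor):

```javascript
// Hypothetical helper: derive comparison windows around a deploy timestamp.
// The function name and return shape are illustrative only.
function windowsAroundDeploy(deployIso, hoursBefore, hoursAfter) {
  const deployMs = Date.parse(deployIso);
  const hourMs = 60 * 60 * 1000;
  return {
    before: {
      start: new Date(deployMs - hoursBefore * hourMs).toISOString(),
      end: new Date(deployMs).toISOString(),
    },
    after: {
      start: new Date(deployMs).toISOString(),
      end: new Date(deployMs + hoursAfter * hourMs).toISOString(),
    },
  };
}

const windows = windowsAroundDeploy("2026-06-10T21:14:19.854Z", 4, 2);
console.log(windows.before.start); // 2026-06-10T17:14:19.854Z
console.log(windows.after.end); // 2026-06-10T23:14:19.854Z
```

Anchoring both windows to the same deploy instant keeps the before/after comparison free of overlap.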
## Configuration change that mitigated the issue
The production app was mitigated with the documented rollback configuration
from the Meteor 3.5 RC announcement:
```json
{
  "packages": {
    "mongo": {
      "reactivity": ["oplog", "polling"]
    }
  }
}
```
We also restored `MONGO_OPLOG_URL`, which had existed before the `3.5-rc.1`
deployment.
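A rollback like this is easy to lose in a later deployment, so a small startup check may help. The sketch below is an assumption-heavy illustration: `checkRollbackConfig` is a hypothetical helper, and it assumes the settings JSON above is available to the app as a plain object and that environment variables are passed in as an object such as `process.env`.

```javascript
// Hypothetical startup guard; not a Meteor API.
// Assumes `settings` has the same shape as the JSON above.
const ALLOWED_DRIVERS = ["oplog", "polling"];

function checkRollbackConfig(settings, env) {
  const reactivity = settings?.packages?.mongo?.reactivity;
  const problems = [];
  if (!Array.isArray(reactivity)) {
    problems.push("packages.mongo.reactivity is not set");
    return problems;
  }
  // Anything other than the two pinned drivers means the pin was changed.
  const unexpected = reactivity.filter((d) => !ALLOWED_DRIVERS.includes(d));
  if (unexpected.length > 0) {
    problems.push(`unexpected reactivity entries: ${unexpected.join(", ")}`);
  }
  // The oplog path needs MONGO_OPLOG_URL, which this report restored.
  if (reactivity.includes("oplog") && !env.MONGO_OPLOG_URL) {
    problems.push("MONGO_OPLOG_URL is missing but oplog is listed");
  }
  return problems;
}

const pinned = { packages: { mongo: { reactivity: ["oplog", "polling"] } } };
console.log(checkRollbackConfig(pinned, { MONGO_OPLOG_URL: "mongodb://redacted" })); // []
```

An empty array means the pin and the oplog URL are both in place; any other result lists what drifted.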
## Monti APM evidence
Method, publication, and HTTP endpoint names are included because they are
already visible to clients through normal DDP/browser tooling. Mongo collection
names and infrastructure details are anonymized with stable `collection-*`
identifiers because they come from server-side production profiling data.
### Before/after summary
This table compares the `pre_4h` window before `METEOR@3.5-rc.1` went live
with the first `post_2h` window after the deployment, when Change Streams were
active.
These are weighted averages across the breakdown entries returned by Monti APM
for each surface. The weighting uses each entry's throughput, so higher-volume
methods/publications/endpoints contribute more than rare calls. This is useful
as an overall app-level signal, but the per-method and per-publication tables
below are more useful for debugging specific shapes.
| Surface | Before `3.5-rc.1` / oplog | After `3.5-rc.1` / Change Streams | Absolute increase | Percent increase |
| --- | ---: | ---: | ---: | ---: |
| Meteor methods | `1726ms` | `7073ms` | `+5347ms` | `+310%` |
| Meteor publications | `884ms` | `1415ms` | `+531ms` | `+60%` |
| HTTP/API endpoints | `110ms` | `224ms` | `+114ms` | `+104%` |
After pinning the same app back to `["oplog", "polling"]`, the first clean
post-mitigation window dropped to `38ms` for methods, `116ms` for publications,
and `14ms` for HTTP/API endpoints.
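The throughput-weighted averages described above can be sketched as follows. The field names `avgMs` and `throughputPerMin` are assumptions about how a breakdown entry might be represented, not the actual Monti APM response format, and the sample entries are made up to show the effect of weighting.

```javascript
// Throughput-weighted average response time across breakdown entries.
// Entry shape ({ avgMs, throughputPerMin }) is an assumption for illustration.
function weightedAverage(entries) {
  let weightedSum = 0;
  let totalThroughput = 0;
  for (const { avgMs, throughputPerMin } of entries) {
    weightedSum += avgMs * throughputPerMin;
    totalThroughput += throughputPerMin;
  }
  // No traffic in the window: there is nothing meaningful to average.
  return totalThroughput === 0 ? null : weightedSum / totalThroughput;
}

// Made-up entries: one fast high-volume call and one slow rare call.
const illustrative = [
  { avgMs: 100, throughputPerMin: 9 },
  { avgMs: 1000, throughputPerMin: 1 },
];
console.log(weightedAverage(illustrative)); // 190 (the unweighted mean would be 550)

// Ratio and percent increase for the methods row of the summary table.
console.log((7073 / 1726).toFixed(1)); // "4.1"
console.log(Math.round(((7073 - 1726) / 1726) * 100)); // 310
```

Weighting by throughput keeps a single rare, very slow call from dominating the app-level number, which is why the per-name tables remain the better debugging tool.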
### Named method before/after comparison
This table compares method names that appeared in both the `pre_4h` and
`post_2h` Monti breakdowns.
| Method | Before | After | Absolute change | Percent change | Throughput before | Throughput after |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| `appEnvApplyChanges` | `488ms` | `120490ms` | `+120002ms` | `+24616%` | `0.01/min` | `0.04/min` |
| `appEnvSave` | `2925ms` | `88103ms` | `+85178ms` | `+2912%` | `0.02/min` | `0.03/min` |
| `logs.startStreamSession` | `5037ms` | `12907ms` | `+7869ms` | `+156%` | `0.20/min` | `0.58/min` |
| `checkCnameRecords` | `2369ms` | `5331ms` | `+2962ms` | `+125%` | `0.20/min` | `0.12/min` |
| `integrations/checkGitHubAccessToken` | `3717ms` | `4214ms` | `+497ms` | `+13%` | `0.19/min` | `0.68/min` |
| `login` | `1076ms` | `3600ms` | `+2524ms` | `+235%` | `0.94/min` | `0.59/min` |
| `changeAnonymousToUserId` | `1449ms` | `2182ms` | `+734ms` | `+51%` | `0.88/min` | `0.54/min` |
| `addEvent` | `1549ms` | `1752ms` | `+203ms` | `+13%` | `0.53/min` | `0.52/min` |
| `getAppEnvCname` | `617ms` | `1124ms` | `+506ms` | `+82%` | `0.23/min` | `0.13/min` |
| `redeployAppContent` | `177ms` | `858ms` | `+682ms` | `+386%` | `0.01/min` | `0.02/min` |
| `getDashboardUrls` | `53ms` | `36ms` | `-16ms` | `-31%` | `0.01/min` | `0.10/min` |
Methods that appeared as slow only in the `post_2h` top breakdown:
| Method | After | Throughput after |
| --- | ---: | ---: |
| `getLogs` | `8583ms` | `0.02/min` |
| `clearScroll` | `1802ms` | `0.01/min` |
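The two method tables above amount to a join by name between the `pre_4h` and `post_2h` breakdowns. A minimal sketch of that join, assuming each breakdown is an array of `{ name, avgMs }` objects (a hypothetical shape, not the Monti APM format). Because it works from the rounded values shown in this report, absolute differences can be off by about `1ms` from the tables.

```javascript
// Join two breakdowns by name; report change for shared names and list
// names that only appear after the deploy. Input shape is an assumption.
function compareBreakdowns(before, after) {
  const beforeByName = new Map(before.map((e) => [e.name, e]));
  const both = [];
  const onlyAfter = [];
  for (const entry of after) {
    const prev = beforeByName.get(entry.name);
    if (!prev) {
      onlyAfter.push(entry);
      continue;
    }
    const absoluteMs = entry.avgMs - prev.avgMs;
    both.push({
      name: entry.name,
      beforeMs: prev.avgMs,
      afterMs: entry.avgMs,
      absoluteMs,
      percent: prev.avgMs === 0 ? null : Math.round((absoluteMs / prev.avgMs) * 100),
    });
  }
  // Slowest "after" values first, matching the ordering used in the tables.
  both.sort((a, b) => b.afterMs - a.afterMs);
  return { both, onlyAfter };
}

const result = compareBreakdowns(
  [{ name: "login", avgMs: 1076 }, { name: "appEnvSave", avgMs: 2925 }],
  [{ name: "login", avgMs: 3600 }, { name: "appEnvSave", avgMs: 88103 }, { name: "getLogs", avgMs: 8583 }]
);
console.log(result.both.map((r) => `${r.name} ${r.percent}%`)); // [ 'appEnvSave 2912%', 'login 235%' ]
console.log(result.onlyAfter.map((r) => r.name)); // [ 'getLogs' ]
```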
### Named publication before/after comparison
This table compares publication names that appeared in both the `pre_4h` and
`post_2h` Monti breakdowns.
| Publication | Before | After | Absolute change | Percent change | Sub rate before | Sub rate after |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| `clusterPoolPower` | `1535ms` | `3196ms` | `+1662ms` | `+108%` | `0.04/min` | `0.02/min` |
| `currentAccountData` | `1244ms` | `3140ms` | `+1896ms` | `+152%` | `1.62/min` | `3.08/min` |
| `userAppsAndEnvs` | `1522ms` | `2844ms` | `+1321ms` | `+87%` | `0.57/min` | `0.48/min` |
| `userApp` | `1486ms` | `1991ms` | `+506ms` | `+34%` | `0.74/min` | `0.57/min` |
| `userAccounts` | `619ms` | `1623ms` | `+1004ms` | `+162%` | `1.16/min` | `0.61/min` |
| `accountAppEnvContentData` | `1438ms` | `1589ms` | `+151ms` | `+10%` | `0.70/min` | `0.51/min` |
| `userData` | `751ms` | `1536ms` | `+784ms` | `+104%` | `1.16/min` | `0.66/min` |
| `contentsCount` | `612ms` | `1513ms` | `+901ms` | `+147%` | `0.70/min` | `0.51/min` |
| `agent.llmProviderKeys.byContext` | `1092ms` | `1487ms` | `+395ms` | `+36%` | `0.57/min` | `0.48/min` |
| `currentAppContentAndCurrentStatus` | `646ms` | `1311ms` | `+665ms` | `+103%` | `12.90/min` | `14.61/min` |
| `meteor_autoupdate_clientVersions` | `628ms` | `1294ms` | `+666ms` | `+106%` | `0.98/min` | `0.63/min` |
| `meteor.loginServiceConfiguration` | `628ms` | `1294ms` | `+666ms` | `+106%` | `0.98/min` | `0.63/min` |
| `accountNotifications` | `739ms` | `1177ms` | `+437ms` | `+59%` | `1.44/min` | `0.84/min` |
| `latestAppEnvMetrics` | `1021ms` | `1115ms` | `+94ms` | `+9%` | `12.90/min` | `14.61/min` |
| `myClusterPoolAccess` | `474ms` | `915ms` | `+441ms` | `+93%` | `1.42/min` | `0.82/min` |
| `appEnvDeploymentPods` | `2943ms` | `1418ms` | `-1525ms` | `-52%` | `0.20/min` | `0.24/min` |
| `userAppEnv` | `1735ms` | `1335ms` | `-400ms` | `-23%` | `0.74/min` | `0.52/min` |
| `userAccountSecrets` | `1903ms` | `1204ms` | `-699ms` | `-37%` | `0.45/min` | `0.29/min` |
Publications that appeared as slow only in the `post_2h` top breakdown:
| Publication | After | Sub rate after |
| --- | ---: | ---: |
| `userAppEnvs` | `4638ms` | `0.03/min` |
| `accountRegionsBillingMonth` | `2929ms` | `0.03/min` |
| `accountBillingMonth` | `2909ms` | `0.03/min` |
| `userAccount` | `2900ms` | `0.03/min` |
| `accountPendingBillingMonths` | `2896ms` | `0.03/min` |
### Named HTTP/API before/after comparison
| Endpoint | Before | After | Absolute change | Percent change | Throughput before | Throughput after |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| `DELETE-/app-api/logs/session` | `2861ms` | `5909ms` | `+3047ms` | `+107%` | `0.60/min` | `0.66/min` |
| `GET-<static file>` | `16ms` | `22ms` | `+5ms` | `+33%` | `7.34/min` | `4.17/min` |
| `GET-<app>` | `14ms` | `13ms` | `-1ms` | `-7%` | `13.41/min` | `13.75/min` |
### Before deploy: `pre_4h`
Window:
- Start: `2026-06-10T17:14:19.854Z`
- End: `2026-06-10T21:14:19.854Z`
Summary:
- Methods weighted average response time: `1726ms`
- Publications weighted average response time: `884ms`
- HTTP weighted average response time: `110ms`
Top methods by response time:
| Method | Avg response time | Throughput |
| --- | ---: | ---: |
| `triggerAppEnvBuild` | `12294ms` | `0/min` |
| `listAccountEvents` | `7554ms` | `0.01/min` |
| `logs.startStreamSession` | `5037ms` | `0.20/min` |
| `listAccountMembersForFilter` | `4759ms` | `0.01/min` |
| `countAccountEvents` | `4040ms` | `0.01/min` |
| `integrations/checkGitHubAccessToken` | `3717ms` | `0.19/min` |
| `setCurrentAccount` | `3551ms` | `0/min` |
| `listAccountAppsForFilter` | `3338ms` | `0.01/min` |
| `appEnvSave` | `2925ms` | `0.02/min` |
| `integrations/listUserInstallations` | `2846ms` | `0.01/min` |
Top publications by response time:
| Publication | Avg response time | Sub rate |
| --- | ---: | ---: |
| `appEnvDeploymentPods` | `2943ms` | `0.20/min` |
| `userAccountSecrets` | `1903ms` | `0.45/min` |
| `userAppEnv` | `1735ms` | `0.74/min` |
| `clusterPoolPower` | `1535ms` | `0.04/min` |
| `userAppsAndEnvs` | `1522ms` | `0.57/min` |
| `userApp` | `1486ms` | `0.74/min` |
| `accountAppEnvContentData` | `1438ms` | `0.70/min` |
| `currentAccountData` | `1244ms` | `1.62/min` |
| `alerts.byAppEnv` | `1221ms` | `0.05/min` |
| `agent.llmProviderKeys.byContext` | `1092ms` | `0.57/min` |
### After deploy: `post_2h`
Window:
- Start: `2026-06-10T21:14:19.854Z`
- End: `2026-06-10T23:14:19.854Z`
Summary:
- Methods weighted average response time: `7073ms`
- Publications weighted average response time: `1415ms`
- HTTP weighted average response time: `224ms`
Weighted method response time rose to `4.1x` its `pre_4h` value, and weighted
publication response time rose to `1.6x`.
Top methods by response time:
| Method | Avg response time | Throughput |
| --- | ---: | ---: |
| `appEnvApplyChanges` | `120490ms` | `0.04/min` |
| `appEnvSave` | `88103ms` | `0.03/min` |
| `getDatabaseHostCredentialUrl` | `36759ms` | `0.02/min` |
| `getDecodedSecret` | `15658ms` | `0.05/min` |
| `logs.startStreamSession` | `12907ms` | `0.58/min` |
| `getLogs` | `8583ms` | `0.02/min` |
| `checkCnameRecords` | `5331ms` | `0.12/min` |
| `integrations/checkGitHubAccessToken` | `4214ms` | `0.68/min` |
| `login` | `3600ms` | `0.59/min` |
| `changeAnonymousToUserId` | `2182ms` | `0.54/min` |
Top publications by response time:
| Publication | Avg response time | Sub rate |
| --- | ---: | ---: |
| `userAppEnvs` | `4638ms` | `0.03/min` |
| `clusterPoolPower` | `3196ms` | `0.02/min` |
| `currentAccountData` | `3140ms` | `3.08/min` |
| `accountRegionsBillingMonth` | `2929ms` | `0.03/min` |
| `accountBillingMonth` | `2909ms` | `0.03/min` |
| `userAccount` | `2900ms` | `0.03/min` |
| `accountPendingBillingMonths` | `2896ms` | `0.03/min` |
| `userAppsAndEnvs` | `2844ms` | `0.48/min` |
| `userApp` | `1991ms` | `0.57/min` |
| `userAccounts` | `1623ms` | `0.61/min` |
HTTP routes:
| Route | Avg response time | Throughput |
| --- | ---: | ---: |
| `DELETE-/app-api/logs/session` | `5909ms` | `0.66/min` |
| `GET-<static file>` | `22ms` | `4.17/min` |
| `GET-<app>` | `13ms` | `13.75/min` |
| `GET-/robots.txt` | `1ms` | `0.08/min` |
### After mitigation
Window:
- Start: `2026-06-11T11:34:00.000Z`
- End: `2026-06-11T11:40:00.000Z`
Summary:
- Methods weighted average response time: `38ms`
- Publications weighted average response time: `116ms`
- HTTP weighted average response time: `14ms`
- Traces above `500ms`: `0`
- Mongo pool checkout delay: `0`
- Mongo pool pending checkouts: `0`
Top methods after mitigation:
| Method | Avg response time | Throughput |
| --- | ---: | ---: |
| `integrations/checkGitHubAccessToken` | `72ms` | `0.50/min` |
| `getDashboardUrls` | `29ms` | `0.17/min` |
| `logs.startStreamSession` | `25ms` | `0.50/min` |
| `addEvent` | `9ms` | `0.33/min` |
Top publications after mitigation:
| Publication | Avg response time | Sub rate |
| --- | ---: | ---: |
| `currentAppContentAndCurrentStatus` | `193ms` | `4.83/min` |
| `latestAppEnvMetrics` | `107ms` | `4.83/min` |
| `userAppsAndEnvs` | `44ms` | `0.17/min` |
| `userAppEnv` | `20ms` | `0.17/min` |
| `userApp` | `14ms` | `0.17/min` |
| `accountAppEnvContentData` | `9ms` | `0.17/min` |
| `contentsCount` | `8ms` | `0.17/min` |
| `agent.llmProviderKeys.byContext` | `8ms` | `0.17/min` |
| `currentAccountData` | `7ms` | `1.50/min` |
| `meteor.loginServiceConfiguration` | `1ms` | `0.17/min` |
## MongoDB evidence
### Before mitigation
MongoDB `currentOp` showed hundreds of active Change Stream cursors.
One snapshot before mitigation:
- Total cursor count: `692`
- Change Stream cursor count: `690`
Top collections by active Change Stream cursors:
| Collection | Active Change Stream cursors |
| --- | ---: |
| `collection-01` | `362` |
| `collection-02` | `53` |
| `collection-03` | `47` |
| `collection-04` | `40` |
| `collection-05` | `38` |
| `collection-06` | `38` |
| `collection-07` | `31` |
| `collection-08` | `19` |
| `collection-09` | `14` |
| `collection-10` | `11` |
A later sample during rollout overlap showed even more active Change Stream
cursors, but we do not treat that sample as final because old and new pods were
briefly overlapping.
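The per-collection cursor counts above can be derived by grouping `currentOp` output by namespace. The sketch below is one possible way to do that, not the exact query used for this report. It assumes the operations were fetched with idle cursors included (for example via the `$currentOp` aggregation stage with `idleCursors: true`) and that Change Stream cursors expose their pipeline under `cursor.originatingCommand` or `command`. The sample documents and the `app` database name are made up.

```javascript
// True when a command's pipeline contains a $changeStream stage.
function isChangeStreamCommand(command) {
  const pipeline = command ? command.pipeline : undefined;
  return (
    Array.isArray(pipeline) &&
    pipeline.some((s) => s !== null && typeof s === "object" && "$changeStream" in s)
  );
}

// Group currentOp-style documents by namespace, counting Change Stream cursors.
function countChangeStreamCursors(ops) {
  const counts = new Map();
  for (const op of ops) {
    const originating = op.cursor ? op.cursor.originatingCommand : undefined;
    if (!isChangeStreamCommand(originating) && !isChangeStreamCommand(op.command)) continue;
    const ns = op.ns || "(unknown)";
    counts.set(ns, (counts.get(ns) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([ns, cursors]) => ({ ns, cursors }))
    .sort((a, b) => b.cursors - a.cursors);
}

// Made-up sample input shaped like currentOp output.
const changeStreamCmd = { pipeline: [{ $changeStream: { fullDocument: "updateLookup" } }] };
const sampleOps = [
  { type: "idleCursor", ns: "app.collection-01", cursor: { originatingCommand: changeStreamCmd } },
  { type: "idleCursor", ns: "app.collection-01", cursor: { originatingCommand: changeStreamCmd } },
  { type: "op", ns: "app.collection-02", command: changeStreamCmd },
  { type: "op", ns: "app.collection-12", command: { find: "collection-12" } },
];
console.log(countChangeStreamCursors(sampleOps));
// [ { ns: 'app.collection-01', cursors: 2 }, { ns: 'app.collection-02', cursors: 1 } ]
```

Sampling this repeatedly, rather than once, helps separate a steady cursor count from a transient rollout overlap.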
### Profiler spike
At approximately `2026-06-11T04:18:00Z`, MongoDB profiler captured a spike of
slow Change Stream operations:
- `168` slow profile entries around that spike
- `85` slow entries in one minute
- Slow entries included `$changeStream` / `getMore` with
`fullDocument: "updateLookup"` and
`fullDocumentBeforeChange: "whenAvailable"`
Examples of slow Change Stream profile entries:
| Collection | Max duration | Docs examined |
| --- | ---: | ---: |
| `collection-11` | `8305ms` | `402875` |
| `collection-10` | `8110ms` | not captured in summary |
| `collection-08` | `7609ms` | `201638` |
| `collection-09` | `5978ms` | `206976` |
Those `docsExamined` counts were far larger than the sizes we expected for the
observed collections. This looked more like Change Stream resume/history
scanning than a missing index on a normal query shape.
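The profiler figures above (slow entries per namespace, maximum duration, and the busiest minute) can be reproduced from exported `system.profile` documents with a small summarizer. This is a sketch under stated assumptions: it relies on the profiler fields `ns`, `millis`, `docsExamined`, `ts`, `command`, and `originatingCommand`, and the sample documents are made up rather than taken from production.

```javascript
// True when an aggregation command contains a $changeStream stage.
function hasChangeStreamStage(command) {
  const pipeline = command ? command.pipeline : undefined;
  return (
    Array.isArray(pipeline) &&
    pipeline.some((s) => s !== null && typeof s === "object" && "$changeStream" in s)
  );
}

// Summarize slow Change Stream profile entries; slowMs mirrors `slowms: 100`.
function summarizeSlowChangeStreams(profileDocs, slowMs = 100) {
  const byNs = new Map();
  const perMinute = new Map();
  for (const doc of profileDocs) {
    if (doc.millis < slowMs) continue;
    if (!hasChangeStreamStage(doc.command) && !hasChangeStreamStage(doc.originatingCommand)) continue;

    const row = byNs.get(doc.ns) || { ns: doc.ns, count: 0, maxMillis: 0, maxDocsExamined: null };
    row.count += 1;
    row.maxMillis = Math.max(row.maxMillis, doc.millis);
    if (typeof doc.docsExamined === "number") {
      row.maxDocsExamined = Math.max(row.maxDocsExamined ?? 0, doc.docsExamined);
    }
    byNs.set(doc.ns, row);

    // Bucket by minute, e.g. "2026-06-11T04:18".
    const minute = new Date(doc.ts).toISOString().slice(0, 16);
    perMinute.set(minute, (perMinute.get(minute) || 0) + 1);
  }
  const byNamespace = [...byNs.values()].sort((a, b) => b.maxMillis - a.maxMillis);
  const busiest = [...perMinute.entries()].sort((a, b) => b[1] - a[1])[0] || null;
  return { byNamespace, busiestMinute: busiest && { minute: busiest[0], count: busiest[1] } };
}

// Made-up sample documents.
const streamCmd = { pipeline: [{ $changeStream: { fullDocument: "updateLookup" } }] };
const summary = summarizeSlowChangeStreams([
  { op: "getmore", ns: "app.collection-a", millis: 1200, docsExamined: 5000, ts: "2026-06-11T04:18:05Z", originatingCommand: streamCmd },
  { op: "command", ns: "app.collection-a", millis: 300, ts: "2026-06-11T04:18:40Z", command: streamCmd },
  { op: "query", ns: "app.collection-c", millis: 500, ts: "2026-06-11T04:18:41Z", command: { find: "collection-c" } },
]);
console.log(summary.byNamespace[0].maxMillis, summary.busiestMinute.count); // 1200 2
```

Entries without a `docsExamined` value leave `maxDocsExamined` as `null`, which corresponds to "not captured in summary" in the table above.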
### After mitigation
After the oplog/polling mitigation rollout completed:
- `currentOp` cursor count: `0`
- `currentOp` Change Stream cursor count: `0`
- Slow Change Stream profile entries since mitigation: `0`
Since mitigation, the profiler has shown only a few small `collection-12`
entries around the restart:
| Namespace | Op | Plan | Count | Max duration |
| --- | --- | --- | ---: | ---: |
| `collection-12` | `command` | n/a | `1` | `221ms` |
| `collection-12` | `update` | `IXSCAN { startedAt: 1 }` | `5` | `120ms` |
| `collection-12` | `query` | `IXSCAN { startedAt: 1 }` | `5` | `117ms` |
No slow Change Stream entries remained in that post-mitigation window.
## Why this looks Change Streams-specific
1. The regression started immediately after the `3.5-rc.1` deployment, where
Change Streams became the default reactivity mechanism for MongoDB 6+
replica sets.
2. Mongo profiler did not show a matching normal-query problem immediately
after deploy, but `currentOp` showed hundreds of active Change Stream
cursors.
3. A later restart/reconnect spike produced slow Change Stream profile entries,
including `$changeStream` / `getMore` operations with large `docsExamined` counts.
4. Pinning back to oplog/polling restored normal response times and removed
active Change Stream cursors.
5. The same methods and publications became fast again without application code
changes.
## Hypotheses for Meteor core investigation
These are hypotheses, not confirmed root cause:
1. Change Stream observer startup/resume may be expensive for apps with many
active publications and many per-user/per-account cursor variants.
2. Some publication shapes may create too many active Change Stream cursors or
too little observer reuse compared with the previous oplog path.
3. Reconnect/restart behavior may cause a Change Stream resume/catch-up storm.
4. `fullDocument: "updateLookup"` and
`fullDocumentBeforeChange: "whenAvailable"` may amplify cost on some
collection shapes or resume windows.
5. The issue may be related to observer lifecycle/catch-up handling, possibly
adjacent to https://github.com/meteor/meteor/issues/14452, even though our
production symptom was latency rather than a confirmed stuck method.
## Follow-up investigation plan
We are sharing this production evidence during the RC phase so the Meteor team
and the community can investigate in parallel if needed. We do not assume that
every publication or method is equally affected. Instead, we plan to use the
named Monti breakdowns above to narrow the problem down to smaller shapes that
can become a useful reproduction.
The plan:
1. Compare the same method/publication names across pre-change, Change Streams,
and post-mitigation windows.
2. For publications, add PubSub-specific dimensions such as sub rate, active
subs, observer reuse, created/deleted observers, total observer handlers,
lifetime, and trace DB/compute/wait breakdown.
3. Cross-reference the publication/method regression with MongoDB `currentOp`
Change Stream cursor groups and profiler spikes.
4. Identify whether the regression is concentrated in a few publication shapes
or appears broadly across unrelated publications.
5. Reduce the affected shapes into a smaller beta reproduction, then ideally a
standalone reproduction for `meteor/meteor`.
Community members seeing similar symptoms can run the same triage on their
apps: compare before/after Monti breakdowns by name, inspect PubSub observer
reuse/churn, and sample MongoDB `currentOp` for active `$changeStream` cursors.
Collections associated with slow profile entries or active Change Stream cursor
counts are anonymized in this report as `collection-*` identifiers. We can share
more precise shapes privately or in a smaller sanitized reproduction if needed.
## Suggested beta reproduction plan
We plan to use a non-production environment to narrow this down without
destabilizing production:
1. Keep production pinned to `["oplog", "polling"]`.
2. In a non-production environment, record a baseline with oplog/polling:
- Monti method/publication breakdown
- PubSub observer metrics
- slow traces above `500ms`
- MongoDB `currentOp`
- MongoDB profiler
3. Enable Change Streams only in that non-production environment.
4. Drive the same user flows:
- login
- authenticated dashboard load
- detail page with several live publications
- streaming endpoint start/stop
- representative write workflow on a safe test object
5. Compare named publications/methods by:
- response time
- sub rate / throughput
- active subs
- lifetime
- observer reuse
- created/deleted observers
- trace DB/compute/wait breakdown
- active Change Stream cursor counts by collection
If a small set of publications regress, we can try to build a minimal
reproduction around those shapes. If many unrelated publications regress
together, that points more toward a broad Change Stream observer lifecycle or
resume behavior issue.
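The "small set versus many unrelated publications" decision above can be made explicit with a simple rule. The sketch below is only one way to frame it: the `minPercent` and `broadShare` thresholds are arbitrary assumptions chosen for illustration, and the input rows are made up.

```javascript
// Classify a regression as "concentrated" or "broad" from per-name percent changes.
// Both thresholds are illustrative assumptions, not measured cut-offs.
function classifyRegression(rows, { minPercent = 50, broadShare = 0.5 } = {}) {
  const regressed = rows.filter((r) => r.percent !== null && r.percent >= minPercent);
  const share = rows.length === 0 ? 0 : regressed.length / rows.length;
  return {
    regressed: regressed.map((r) => r.name),
    share,
    pattern: share >= broadShare ? "broad" : "concentrated",
  };
}

// Made-up rows: two of five publications regress by at least 50%.
const verdict = classifyRegression([
  { name: "pubA", percent: 152 },
  { name: "pubB", percent: 10 },
  { name: "pubC", percent: -37 },
  { name: "pubD", percent: 87 },
  { name: "pubE", percent: 9 },
]);
console.log(verdict.pattern, verdict.share); // concentrated 0.4
```

A "concentrated" result suggests building the reproduction around the listed names first; a "broad" result points toward observer lifecycle or resume behavior.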
## What we cannot provide yet
- We do not yet have a standalone minimal reproduction.
- We cannot share production Mongo connection strings, hostnames, tokens, or
full profiler documents because they may contain sensitive operational data.
- We can share redacted aggregate outputs and can run targeted beta experiments
if Meteor maintainers suggest specific instrumentation or flags.
## References
- Meteor contributing guide asks for clear bug reports and ideally a minimal
reproduction: https://github.com/meteor/meteor/blob/devel/CONTRIBUTING.md
- Meteor 3.5 RC announcement says Change Streams are default on MongoDB 6+
replica sets and documents the `["oplog", "polling"]` rollback path:
https://forums.meteor.com/t/meteor-3-5-rc-available-change-streams-performance-improvements/64461
- Related existing issue:
https://github.com/meteor/meteor/issues/14452