SSR memory leaks

captainn · September 20, 2019, 5:05am

I’m getting a severe memory leak on Galaxy if I leave my SSR code in. It causes my app to restart every half hour or so. This is after only a handful of SSR renders (or even just one). I’m sure it’s something simple, but how do I get access to some kind of memory dump when this happens - preferably from the server running on my local machine (I’m sure it’s happening here too, I just have a lot more RAM and restart more regularly in development).

Hmm, can I just hook up the Chrome debugger (haven’t done that in a while, heh)

minhna · September 20, 2019, 6:51am

Is there anything in the logs? I deploy my apps on my own server, so I can see the logs via the journalctl command.

alawi · September 20, 2019, 7:04am

I had a memory leak with SSR before, and I had to nullify all variables after each request and/or eliminate any file-scoped constants to ensure GC was done properly. It could be some query result (or just the user doc) accumulating in memory after each request.

I’ve used heap dump in the past to capture the heap and send it to S3 from a remote machine (basically a fork of heap save). I’ve not done it locally, but from this article it doesn’t seem too hard to achieve.
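
For the local case, a minimal sketch of writing a snapshot once the heap crosses a threshold could look like this. It assumes a Node version that ships `v8.writeHeapSnapshot` (11.13 or later; an older runtime would need a package such as heapdump instead). `createHeapWatcher`, its option names and the 500 MB threshold are hypothetical, chosen only for illustration.

```javascript
const v8 = require('v8')

// Hypothetical helper: writes at most one heap snapshot, the first time
// heapUsed reaches thresholdBytes. getHeapUsed and writeSnapshot are
// injectable so the logic can be exercised without touching the disk.
function createHeapWatcher ({
  thresholdBytes = 500 * 1024 * 1024, // assumed threshold, tune per container
  getHeapUsed = () => process.memoryUsage().heapUsed,
  writeSnapshot = () => v8.writeHeapSnapshot() // returns the file name written
} = {}) {
  let written = null
  return function check () {
    if (written === null && getHeapUsed() >= thresholdBytes) {
      written = writeSnapshot()
    }
    return written
  }
}

// Usage (not run here): poll every 10 seconds without keeping the process alive.
// const check = createHeapWatcher()
// setInterval(check, 10000).unref()
```

The resulting `.heapsnapshot` file can be loaded in the Memory tab of Chrome DevTools. Writing a snapshot blocks the process and needs extra memory while it runs, so it is probably best tried locally or on staging first.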

captainn · September 21, 2019, 4:02am

Okay something else weird - it only leaks in production. If I put the app up on Staging (and then hammer it, with more traffic than we get on the main site) it doesn’t leak memory. The only difference between staging and production is the database they each connect to (both on Atlas, both Replica Sets with 3 nodes, on Version 4.0.12 - staging is a “sandbox”), and on production we use a “professional” container to get access to APM - but the memory leaks and restarts were a problem before that.

Obviously, the production database is much larger (Atlas says 1.8Gig, which seems larger than it should be), and the number of connections - I get 30 on production, and only 4 on staging. I have no idea what that means - I guess I have my reading cut out for me.

rjdavid · September 21, 2019, 7:52am

We experienced something similar before: if a link to a page was posted on a Facebook page, a handful of Facebook bots plus regular traffic could kill an AWS EC2 instance, through memory use alone, before a replacement could spawn.

We did not think it was a memory leak at the time, but we ended up caching our SSR pages through Redis. We are using your react-loadable fork for SSR.
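
A rough sketch of that kind of page cache, under stated assumptions: `createMemoryStore` is a Map-backed stand-in for Redis, and `createCachedRenderer`, `render` and `ttlMs` are hypothetical names. Any store exposing async `get`/`set` could be swapped in.

```javascript
// Hypothetical in-memory store with a TTL, standing in for a Redis client.
function createMemoryStore (now = () => Date.now()) {
  const entries = new Map()
  return {
    async get (key) {
      const entry = entries.get(key)
      if (!entry) return null
      if (entry.expiresAt <= now()) {
        entries.delete(key) // expired entries are dropped when they are next read
        return null
      }
      return entry.value
    },
    async set (key, value, ttlMs) {
      entries.set(key, { value, expiresAt: now() + ttlMs })
    }
  }
}

// Renders a URL on a cache miss and serves repeat hits from the store
// until the entry expires.
function createCachedRenderer ({ store, render, ttlMs = 60000 }) {
  return async function renderCached (url) {
    const cached = await store.get(url)
    if (cached !== null) return cached
    const html = await render(url)
    await store.set(url, html, ttlMs)
    return html
  }
}
```

Keying by URL only makes sense for pages that render the same for every visitor, which is usually the case for the bot traffic described above.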

captainn · September 21, 2019, 3:31pm

I guess it could be bots. This particular site gets almost no traffic according to the connections count in Galaxy (there are literally 0 right now). I’ll try sharing from staging and see what happens.

I’m also going to try heap save and see what that reveals.

Thanks for the tips all!

captainn · October 1, 2019, 5:54am

Another thing I just noticed - The CPU when the server is leaking memory spikes up to 60% every 30 seconds on the production server. This doesn’t happen on the staging server.

rjdavid · October 3, 2019, 1:24am

Did you see anything in the APM accessing or executing every 30 seconds?

captainn · October 3, 2019, 1:38am

Yes! It only does that on production though. I put the same code connected to the same database on staging (another galaxy site), and no leak there. I put in a support ticket last night, but no response yet.

Oh, do you mean a specific function? I don’t know how to check that - APM doesn’t seem as useful as I remember back in the day. Maybe I’m just not finding the right buttons.
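
If APM doesn’t show it, one fallback is to log memory and CPU from inside the process and line the timestamps up with the 30-second spikes. A minimal sketch using only Node’s built-in `process.memoryUsage()` and `process.cpuUsage()`; `createSampler` and the field names are hypothetical.

```javascript
// Hypothetical sampler: each call reports current memory and the CPU time
// consumed since the previous call, as a percentage of wall-clock time.
function createSampler () {
  let lastCpu = process.cpuUsage() // cumulative user/system time in microseconds
  let lastTime = Date.now()
  return function sample () {
    const cpu = process.cpuUsage()
    const now = Date.now()
    const usedMicros = (cpu.user - lastCpu.user) + (cpu.system - lastCpu.system)
    const elapsedMs = Math.max(now - lastTime, 1) // avoid dividing by zero
    lastCpu = cpu
    lastTime = now
    const { heapUsed, rss } = process.memoryUsage()
    return {
      heapUsedMb: heapUsed / 1048576,
      rssMb: rss / 1048576,
      // can exceed 100 when more than one core is busy
      cpuPercent: (usedMicros / 1000 / elapsedMs) * 100
    }
  }
}

// Usage (not run here): log a sample every 5 seconds.
// const sample = createSampler()
// setInterval(() => console.log(new Date().toISOString(), sample()), 5000).unref()
```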

minhna · October 3, 2019, 2:49am

I had this issue once. I had to replace one for loop with a map call, and that solved the issue. I really don’t know why, but it works. Haha.

minhna · October 3, 2019, 2:52am

It looks like it can only handle a limited number of async/await levels.

captainn · October 3, 2019, 10:38pm

I wonder if this can be causing my leaks. https://github.com/facebook/react/issues/13854

I’m going to try this package (https://github.com/panter/meteor-fiber-save-react-context), which seems to patch for that.

minhna · October 4, 2019, 2:43am

I have that problem too. Still looking for a solution. Thank you.

captainn · October 4, 2019, 4:34am

It didn’t work, still leaking. But I think that issue may have something to do with it. Right before I run out of memory, I get this message:

2019-10-03 21:15:56-04:00 Error: Minified React error #321; visit https://reactjs.org/docs/error-decoder.html?invariant=321 for the full message or use the non-minified dev environment for full errors and additional helpful warnings.
2019-10-03 21:15:56-04:00 at W (/app/bundle/programs/server/npm/node_modules/react/cjs/react.production.min.js:20:386)
2019-10-03 21:15:56-04:00 at useContext (/app/bundle/programs/server/npm/node_modules/react/cjs/react.production.min.js:22:416)
2019-10-03 21:15:56-04:00 at Loadable (packages/npdev:react-loadable/react-loadable-server.js:48:21)
2019-10-03 21:15:56-04:00 at d (/app/bundle/programs/server/npm/node_modules/react-dom/cjs/react-dom-server.node.production.min.js:36:498)
2019-10-03 21:15:56-04:00 at Za (/app/bundle/programs/server/npm/node_modules/react-dom/cjs/react-dom-server.node.production.min.js:39:16)
2019-10-03 21:15:56-04:00 at a.b.render (/app/bundle/programs/server/npm/node_modules/react-dom/cjs/react-dom-server.node.production.min.js:44:476)
2019-10-03 21:15:56-04:00 at a.b.read (/app/bundle/programs/server/npm/node_modules/react-dom/cjs/react-dom-server.node.production.min.js:44:18)
2019-10-03 21:15:56-04:00 at renderToString (/app/bundle/programs/server/npm/node_modules/react-dom/cjs/react-dom-server.node.production.min.js:54:364)
2019-10-03 21:15:56-04:00 at Promise.asyncApply (server/ssr.js:40:12)
2019-10-03 21:15:56-04:00 at /app/bundle/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/fiber_pool.js:43:u5408:

That error means a hook was called outside of a component - and it was not (it appears to have been called inside the Loadable component).

Hmmm, Loadable uses an internal pseudo-global store for determining if Loadables have been loaded. I wonder if the Fibers could be messing with that.
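
To make that suspicion concrete, here is a small sketch of how module-level state can get crossed when two renders interleave. This is not react-loadable’s actual code: every name is hypothetical, and `pause` stands in for anything that yields mid-render (for example a Fiber yielding while a Mongo query is in flight).

```javascript
// Module-level "current capture": the kind of pseudo-global under suspicion.
let current = null

async function renderWithGlobal (id, pause) {
  current = { id, loadables: [] }
  await pause() // another request may run here and overwrite `current`
  current.loadables.push(`module-for-${id}`)
  return current
}

// Same work, but the state travels with the request instead of the module.
async function renderWithHandle (id, pause) {
  const handle = { id, loadables: [] }
  await pause()
  handle.loadables.push(`module-for-${id}`)
  return handle
}
```

Running two `renderWithGlobal` calls concurrently leaves both writing into whichever capture was assigned last, while `renderWithHandle` keeps them apart. That would explain crossed state between requests; whether it also explains the memory growth is a separate question.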

minhna · October 10, 2019, 4:08am

@captainn how is it going?

captainn · October 10, 2019, 8:09pm

I’m a bit stuck, and simply out of time to work on this for the moment. I think there is some kind of bad interaction happening between Fibers and React hooks, but I’m not 100% certain. The only real feedback I have going so far is the stack trace I posted earlier. It shows something happening in my Loadable component, which uses hooks and the root of the stack trace comes from the fiber pool. … I just thought of something. Maybe my loadable is leaking. Investigating…

captainn · October 10, 2019, 10:15pm

Could this be creating a memory leak?

import { createContext, createElement } from 'react'
import { EJSON } from 'meteor/ejson'

const LoadableContext = createContext(false)
export const LoadableCaptureProvider = ({ handle, children }) => {
  if (!handle.loadables) {
    handle.loadables = [] // this in particular
    handle.toEJSON = () => ( // or maybe these closures
      EJSON.stringify(handle.loadables)
    )
    handle.toScriptTag = () => (
      `<script type="text/ejson" id="__preloadables__">${EJSON.stringify(handle.loadables)}</script>`
    )
  }
  return createElement(LoadableContext.Provider, { value: handle }, children)
}

That provider sets a number of values meant to help with capturing some data when running in SSR. It runs each time the server-render package’s onPageLoad is invoked, to create a new set of state for the current render. Is it possible that something about this pattern is preventing the array or other enclosed values from being collected?

macrozone · October 10, 2019, 10:31pm

hmmm… some thoughts:

the cpu spikes when memory is short could be garbage collecting and / or swapping, so probably not a problem on its own
the context issue mentioned above with React.createContext is surely not the problem (and fixed), it just led to wrong context values
Maybe you have some client code that stores stuff in global variables, which now runs on the server, so every request might fill that global variable. This can happen with code that is not really optimized to run on the server, or that has to be treated differently there. E.g. take some cache like apollo-cache. On the client, you can initialize apolloClient and its cache globally, as a singleton. But you have to be careful when doing SSR: you should treat every request as a separate entity, so you should initialize the apolloClient and its cache inside the function that handles this request. Once the request has sent its data to the client, everything initialized for it should get garbage collected. So I would check if you initialize something globally and not within the server-render sink. Check if you have anything that might cache stuff (e.g. an i18n library). Meteor collections can be global on the server, because they are isomorphically designed, but maybe you use some library around them that is not aware of SSR.
maybe it has to do with the loadable code above, but it does not seem to add so much data… the code is a bit weird. how does LoadableCaptureProvider receive a handle?
try to reproduce it locally, maybe copy the whole database and check memory consumption after every render? there must be something that you can reproduce
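
The singleton-versus-per-request point can be sketched like this. `createClient` is a hypothetical stand-in for something like an Apollo client with its cache, not a real library API.

```javascript
// Hypothetical stand-in for a client object that carries its own cache.
function createClient () {
  return { cache: new Map() }
}

// Leaky shape on the server: one client for the whole process, so every
// request adds to the same cache and nothing ever clears it.
const sharedClient = createClient()
function renderShared (requestId, doc) {
  sharedClient.cache.set(requestId, doc)
  return sharedClient.cache.size
}

// Per-request shape: the client is created inside the handler and becomes
// unreachable (and so collectable) once the handler returns.
function renderPerRequest (requestId, doc) {
  const client = createClient()
  client.cache.set(requestId, doc)
  return client.cache.size
}
```

Calling `renderShared` repeatedly with new request ids returns a growing cache size, while `renderPerRequest` always returns 1.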

moberegger · October 10, 2019, 10:34pm

Not sure if this is your problem or if this will be helpful, but https://blog.meteor.com/an-interesting-kind-of-javascript-memory-leak-8b47d2e7f156 is an article I refer to every once in a while whenever I see a weird memory leak.

It’s old and perhaps no longer relevant… it’s just something that stuck with me because it looked like such an easy thing to do.

captainn · October 11, 2019, 2:40am

I’m fairly sure I’m not creating any global (or package level) variables. This seems to be leaking out of the Mongo queries. I’m using my collections package, which uses a separate code base for the server and the client. On the server I’m using Mongo queries directly, which means behind the scenes it’s using Fibers. Since the fiber pool file is at the bottom of the stack trace (the error I get right before “out of memory” in server logs) I have been thinking it has something to do with some interplay between Fibers and React SSR.

Could parts of the React render be getting garbage collected before the hook is run? It seems weird that React tells me I’m calling a hook outside of the component tree, when the component I’m calling it from is right there in the stack.

Reproducing it has been challenging - I actually can’t even reproduce it on the same service (Galaxy) when hooked up to the same production Mongo database, with a different URL. That’s very frustrating.

If I do let it run locally and grab a heap dump, I see a lot of Mongo driver allocations, but nothing super obvious.

The handle is created in the SSR code, and then handed to the Provider, which sets up the handy tools, like toScriptTag. I assumed that would make it GC safe, but I’m not so sure.

import { WebApp } from 'meteor/webapp'
import React from 'react'
import s2s from 'string-to-stream'
import sq from 'streamqueue'
import { StaticRouter } from 'react-router'
import { renderToString } from 'react-dom/server'
import { FastRender } from 'meteor/staringatlights:fast-render'
import { LoadableContext, preloadAllLoadables } from 'meteor/npdev:react-loadable'
import { HelmetProvider } from 'react-helmet-async'
import { ServerStyleSheets, ThemeProvider } from '@material-ui/styles'
import App from '/imports/App'
import theme from '/imports/ui/common/theme'
import { DataCaptureProvider } from 'meteor/npdev:collections'
import { EJSON } from 'meteor/ejson'

preloadAllLoadables().then(() => FastRender.onPageLoad(async sink => {
  const context = {}
  const helmetContext = {}
  const dataHandle = {}
  const loadableHandle = { loadables: [] }

  const sheets = new ServerStyleSheets()

  const app = <ThemeProvider theme={theme}>
    <HelmetProvider context={helmetContext}>
      <LoadableContext.Provider value={loadableHandle}>
        <DataCaptureProvider handle={dataHandle}>
          <StaticRouter location={sink.request.url} context={context}>
            <App />
          </StaticRouter>
        </DataCaptureProvider>
      </LoadableContext.Provider>
    </HelmetProvider>
  </ThemeProvider>

  let html
  try {
    html = renderToString(sheets.collect(app))
  } catch (e) {
    console.error(e)
    WebApp.addHtmlAttributeHook(() => ({
      lang: 'en'
    }))
    return
  }

  const { helmet } = helmetContext
  const meta = helmet.meta.toString()
  meta && sink.appendToHead(meta + '\n')
  const title = helmet.title.toString()
  title && sink.appendToHead(title + '\n')
  const link = helmet.link.toString()
  link && sink.appendToHead(link + '\n')

  WebApp.addHtmlAttributeHook(() => (
    Object.assign({
      lang: 'en',
      'xmlns:og': 'http://ogp.me/ns#'
    }, helmet.htmlAttributes.toComponent())
  ))
  // :TODO: Figure out how to do helmet.bodyAttributes...

  const css = sheets.toString()
  sink.appendToHead(`<style id="jss-server-side">\n${css}\n</style>`)

  // :HACK: The meteor css bundle should come after the JSS output, so we just move it manually
  sink.appendToHead('\n<script id="css-fixer">elm=document.getElementsByClassName("__meteor-css__")[0];elm.parentNode.appendChild(elm);elm=document.getElementById("css-fixer");elm.parentNode.removeChild(elm);delete elm</script>')

  const queuedStreams = sq(
    () => s2s(html),
    () => s2s(`<script type="text/ejson" id="__preloadables__">${EJSON.stringify(loadableHandle.loadables)}</script>`),
    () => s2s(dataHandle.toScriptTag())
  )
  sink.renderIntoElementById('root', queuedStreams)
}))

Actually, this is half way through a refactoring I’m doing to test whether my libraries are leaking the memory.