Acceptance and load testing

vblagomir · February 21, 2017, 12:15pm

I have created a rather complex fixture-generation tool for my app and am thinking about adapting it for testing, so that it could serve for both acceptance and load testing.

For that, instead of using meteor test (which could still be used for unit testing), I am thinking about creating a separate application that consists solely of instructions for Nightmare to open the app in a virtual browser, check and interact with specific DOM elements, enter the fixture data (generated by reusing the code of the fixture generators), and thus ‘emulate’ the behaviour of actual users.

Before I start, I would like to ask the community for some help:

1. Is it worth doing this in general? I assume I won’t be able to do this on a CI server, right? Meaning this approach won’t allow me to actually test the app before pushing new code to master/live, right?
2. Could this approach help with load testing? I could theoretically program several servers to generate thousands of real-user-like calls to a pre-production app on Galaxy and see how well it performs, to estimate how many servers, and of which size, I might need under a given load.
3. Have you done something like this, or should I consider something different?

janka · February 21, 2017, 4:33pm

I think this covers your third question:

We use chimp with cucumber for e2e testing - and fixture generating.

Each cucumber feature declares its dependency on another feature through a special comment at the top of the file. This way, we create a tree of features.

After each feature is run, we do a database dump (feature dump).
Before each feature is run, we load the feature dump created by the feature it depends on.
The top most feature depends on nothing, and starts running on an empty database.

This way, our fixtures are always created by the actual app, and therefore never get out of sync with the code.
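The feature tree described above could be turned into a run order roughly like this. It is only a sketch: the thread does not say what the special comment looks like, so the `# depends-on: <feature>` marker and the function names `parseDependency` and `orderFeatures` are assumptions made up for illustration.

```javascript
// Hypothetical marker format; the real comment used by the team is not shown in the thread.
const DEPENDS_RE = /^#\s*depends-on:\s*(\S+)\s*$/;

// Read the dependency from the first line of a feature file, or null for a root feature.
function parseDependency(featureSource) {
  const firstLine = featureSource.split('\n')[0].trim();
  const match = DEPENDS_RE.exec(firstLine);
  return match ? match[1] : null;
}

// features: object mapping feature name -> feature file contents.
// Returns the names ordered so that every feature comes after the one it depends on.
function orderFeatures(features) {
  const parentOf = {};
  for (const [name, source] of Object.entries(features)) {
    parentOf[name] = parseDependency(source);
  }

  const ordered = [];
  const visiting = new Set();
  const done = new Set();

  function visit(name) {
    if (done.has(name)) return;
    if (!(name in parentOf)) throw new Error(`Unknown feature: ${name}`);
    if (visiting.has(name)) throw new Error(`Dependency cycle at: ${name}`);
    visiting.add(name);
    if (parentOf[name] !== null) visit(parentOf[name]); // parent first
    visiting.delete(name);
    done.add(name);
    ordered.push(name);
  }

  Object.keys(parentOf).sort().forEach(visit);
  return ordered;
}
```

A runner would then walk the returned list, loading the parent's dump before each feature and writing a new dump afterwards.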

Not usable for load testing I guess…

vblagomir · February 21, 2017, 5:07pm

Thanks for your reply @janka. Could you please elaborate on why you create a separate database dump for each feature? It sounds like overhead… Why not have one set of fixture data that is generated before testing and is structurally identical to the final data, so that the ‘tests’ are performed against this ‘complete’ set of data?

vblagomir · February 21, 2017, 8:49pm

As a proof of concept I have managed to create a separate ‘slave’ app and connect the master app to it over DDP.connect.

So the master app calls a method on the slave app, and the slave app opens an Electron browser and browses the master app. I can run multiple browsers from the slave app so that the master app registers multiple connections. My only concern is that there might be a physical limit on the number of simultaneously open Electron browsers: when I tried to make 50 connections, I started to get timeout errors from the watchdog/electron… :-/

But so far the approach looks promising, maybe not for proper load testing, but at least for acceptance testing.

janka · February 21, 2017, 9:27pm

There is virtually no overhead in dumping and restoring the database. Our test script does however make sure not to restore the db if the previously run feature is the one the next is depending on.
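The "skip the restore when the previous feature is the dependency" rule can be sketched as a small planning function. This is an illustration only, not the team's actual test script: the step names (`reset`, `restore`, `run`, `dump`) and the `planRun` function are hypothetical, and the input is assumed to be already ordered so that parents come first.

```javascript
// orderedFeatures: [{ name, dependsOn }] with dependsOn === null for the root feature.
// Returns a flat list of steps a runner could execute in sequence.
function planRun(orderedFeatures) {
  const steps = [];
  let previous = null; // name of the feature that ran last; its state is still in the db

  for (const { name, dependsOn } of orderedFeatures) {
    if (dependsOn === null) {
      steps.push('reset'); // root feature starts on an empty database
    } else if (dependsOn !== previous) {
      steps.push(`restore ${dependsOn}`); // db does not hold the needed state yet
    }
    // When dependsOn === previous the db is already in the right state: no restore.
    steps.push(`run ${name}`, `dump ${name}`);
    previous = name;
  }
  return steps;
}
```

For a chain such as signup, cart, checkout this produces no restore steps at all; restores only appear where the tree branches.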

Having a dump of each feature also means the developer is able to load a set of fixtures suited to developing a certain thing, and/or to write a feature that branches off from that point without having to run the entire feature path to get to that db state.

Regarding fixtures: The whole point of doing it this way is to avoid having a set of fixtures to maintain. Now, we generate fixtures while testing the app, so we get it for free!

And, just as important, we know for sure that our app is actually capable of generating the data we have. When maintaining a set of fixtures by hand or otherwise, that is not a given and you risk weirdness.

vblagomir · February 22, 2017, 12:53am

I am actually doing the same thing. It is just that, since the data is interconnected (creating one document also creates another, for example), the fixtures are generated not per feature but according to the ‘natural flow’ of how users would normally create the data themselves…

By the way, I think that the original idea of using nightmare/electron for load testing won’t work, simply because rendering each page in Electron seems to take drastically more computing power than actually processing the request (someone please correct me if I am wrong). So I am thinking about load testing only the methods and somehow simulating publications (for example by requesting the same amount of data as a publication would ‘normally’ publish…). It won’t be scientific, however, because I don’t know how many resources the server uses for tasks other than methods… Do we have any measurements?
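A method-only load test along these lines could be driven by a small harness like the one below. It is a sketch under assumptions: `callMethod` is a hypothetical function supplied by the caller (for example a wrapper around a DDP method call that returns a promise), and `loadTestMethod` with its `totalCalls` and `concurrency` options is made up for illustration. It measures client-side latency only and says nothing about publication cost.

```javascript
// Runs `totalCalls` invocations of callMethod with at most `concurrency` in flight,
// and reports simple latency statistics in milliseconds.
async function loadTestMethod(callMethod, { totalCalls, concurrency }) {
  const latencies = [];
  let failures = 0;
  let next = 0; // index of the next call to start, shared by all workers

  async function worker() {
    while (next < totalCalls) {
      const index = next++;
      const started = process.hrtime.bigint();
      try {
        await callMethod(index);
      } catch (err) {
        failures++; // failed calls still count towards latency below
      }
      latencies.push(Number(process.hrtime.bigint() - started) / 1e6);
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));

  latencies.sort((a, b) => a - b);
  const percentile = (p) =>
    latencies[Math.min(latencies.length - 1, Math.floor(p * latencies.length))];

  return {
    calls: latencies.length,
    failures,
    p50: percentile(0.5),
    p95: percentile(0.95),
  };
}
```

Feeding `callMethod` with documents from the fixture generator would keep the payloads realistic while avoiding the cost of rendering pages in a browser.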

awatson1978 · February 22, 2017, 12:58am

Nightwatch + TestArmada + Magellan

Originally developed by Walmart to test Node/React apps (aka Electrode).

vblagomir · February 22, 2017, 11:03am

So I have investigated the topic further. Here are some thoughts:

It seems that PhantomJS is ‘getting old’ and cannot render some web pages properly, so the headless browser alternatives are Electron (works well and is said to be fast) and potentially headless Chrome, which I have not found much information about.
Nightwatch is great in that it is basically an all-in-one combination of Mocha+Selenium, and it is mostly suited to automated browser testing on a local machine. There seems to be a technical possibility of supporting Electron. Its advantage over Nightmare is that it is a full test package that lets you write ‘assert’ or ‘expect’ calls right within the browsing/clicking scenario, whereas Nightmare seems to browse first and only then check the result. So checking a landing page with 10 links, for example, would take one script in Nightwatch and 10 separate scripts in Nightmare (that is what I understand from the docs). The Nightwatch docs are also much better structured (and there is a potential that they will come up with a cloud testing solution).
I do not get the point of TestArmada yet, mostly because of the cost of parallel acceptance testing. Either you set up a ‘farm’ of testing computers in an office, or you rent machines from SauceLabs and other providers. But the cost would be ridiculous if you’d like to do it at scale (if I understand it correctly). So instead of trying to emulate user behaviour through browsers, for load testing I would still prefer to call the methods and not open one thousand browser windows. Unless one computer can safely open at least 100 browser instances at once… @awatson1978, how does it work in your setup?

So the plan seems to be the following: try to install Nightwatch and make it work with Electron (so that acceptance tests can be run on a server), and then potentially add TestArmada with Magellan.

Update: Won’t work, as even for Electron Nightwatch would require the Selenium Java server to be running… which seems like overhead… :-/

vblagomir · February 22, 2017, 1:13pm

So for acceptance testing I have finally decided to proceed with Nightwatch instead of Nightmare, because:

the syntax of chained browser commands with inline test clauses is much better - ‘go there, press this, check if it equals this, then go there and try that, check…’ (versus having to combine Nightmare browsing commands and write more complex Mocha tests for each Nightmare action)
it opens real, different browsers on the local machine (it would be good if it could also work with Windows and Linux in virtual machines), versus having Electron-only testing, which may not be identical to real-world browsers
it can potentially integrate with browserstack.com and run the same automated tests against all possible browsers
the setup can potentially be upgraded with TestArmada to enter the next dimension of customization (while still reusing the same Nightwatch-syntax tests)
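To illustrate the ‘go there, press this, check…’ style from the first point, a Nightwatch test module might look like the sketch below. The URL and the selectors (`a.pricing`, `a.about`) are hypothetical placeholders, not taken from any real app; the commands used (`url`, `waitForElementVisible`, `click`, `assert.visible`, `assert.urlContains`, `end`) are standard Nightwatch commands and assertions.

```javascript
// Each key of the exported object is one test; Nightwatch passes in the `browser` API object.
const BASE_URL = 'http://localhost:3000'; // assumed local Meteor dev server

const landingPageTests = {
  'landing page links lead to the right pages': function (browser) {
    browser
      .url(BASE_URL)
      .waitForElementVisible('body', 5000)
      .assert.visible('a.pricing')       // check...
      .click('a.pricing')                // ...press this...
      .assert.urlContains('/pricing')    // ...check again
      .url(BASE_URL)                     // go back to the landing page
      .waitForElementVisible('a.about', 5000)
      .click('a.about')
      .assert.urlContains('/about')
      .end();
  },
};

module.exports = landingPageTests;
```

Browsing and asserting alternate inside a single chain, which is the difference from Nightmare described above.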

DB load testing is to be done by calling methods with the fixtures generator and emulating publications through a separate mirror app that is connected over DDP.
Heavy, real-world-like ‘brute force’ load testing is to be done using Loader.io or something similar.

The only downside is the need to run a separate Java Selenium Server (when testing on different platforms simultaneously; otherwise Chrome or Firefox can use their built-in servers), but overall this one-time setup seems to be a lesser evil than the more complex Nightmare+Mocha syntax.

awatson1978 · February 23, 2017, 12:23am

Relevant documentation:

Nightwatch - Configuration & Setup
Acceptance Testing (with Nightwatch)
Continuous Integration & Device Clouds (with Nightwatch)

Nightwatch scales up to use BrowserStack and SauceLabs, so it can connect to device farms for testing different browsers. ChromeDriver support is very good, and my personal favorite for local testing and CI testing.

evolross · August 17, 2017, 2:30am

I had some luck with PhantomJS and a service called www.redline13.com to simulate real, headless browsers hitting your app using WebDriver tests. I found this is pretty much required to get an accurate load test of a Meteor app. HTTP JMeter-style tests don’t really load test your Meteor server at all. This was done pretty quickly though. I’m interested in researching Nightwatch and Magellan further for a more complete solution.

Here’s a thread about it if you’re curious:

https://forums.meteor.com/t/poor-galaxy-meteor-performance-serving-small-bursts-of-users-load-test/38671

vblagomir · September 27, 2018, 1:13pm

I have double-checked redline13.com and it really seems to be the proper way of load testing Meteor apps. Especially with their webdriver package https://www.npmjs.com/package/redline13-webdriver and an example of a typical test here https://gist.github.com/richardfriedman/3df9be3ae82e24386a7dd171c1d5fb38 the job can be done well.

Their website and documentation are a mess; a proper page to start reading would probably be this: https://www.redline13.com/blog/2017/02/selenium-webdriver-cloud-performance-testing/

Are there any alternatives to Redline13 to compare it with?

alawi · November 29, 2018, 11:15am

@evolross any chance you could share a snippet/info of a simple test against a Meteor app for redline13?

I was able to run PhantomJS tests against the app, but the APM is not registering the session. What I’m trying to do is hit the app URL, log in, and just hold there so that the session is tracked by the monitoring agent.

vblagomir · November 29, 2018, 11:31am

@alawi It could happen because the connection is very fast; just give it a few seconds to load. Here is the example that worked for me:

// include redline
const redline = require( 'redline13-webdriver' )

// load your driver via redline13
const browser = redline.loadBrowser('chrome')

// also login and pass for basic auth if you have one
browser.get( "https://login:password@your-app-domain.com" )

// important to wait a bit as otherwise APM won't register connection
browser.sleep(4000);

// if you have pro account you can have screens saved
redline.snap('screenshot.png')

I have noticed stable performance with Headless Chrome selected, using about 20 connections per single m3 instance.

alawi · November 29, 2018, 11:37am

Thanks @vblagomir for the quick response.

I did put in a sleep of around 5 min actually. Any idea how to open a new tab to simulate many users ramping up?

vblagomir · November 29, 2018, 11:41am

I have not dug deeper yet (but might do so within the next weeks). So far I have just tried to create around 300 simultaneous connections and noticed almost all of them in APM (and examined the screenshots created in redline13). Feel free to share your findings as well; I would start by examining Selenium scripting for the “tabs opening”…

alawi · November 29, 2018, 11:42am

Alright, hopefully @evolross has some insight as well. I think it’d be useful to have a tutorial on this once we get it going.

evolross · November 30, 2018, 12:06am

Actually Redline13 put out a blog post covering my use-case and they mention a lot of the tips I included in the above Meteor post. Here’s the blog post:

A Case Study: Load Testing with Galaxy, Meteor

Yeah Redline13 is pretty bare-bones and definitely has some quirks and rough edges you have to work around, but they’ve improved it since I used it. A lot of tests used to just fail for no reason or never start. Once you use it a lot you get used to knowing what to look for. Most of my tips are in the article about two-thirds down the page. The cost is just amazing though, compared to other tools. Their free tier is literally just passing on the AWS cost to your AWS account. It’s dirt cheap for the power they give. It’s basically a fairly thin wizard/UI wrapped around AWS.

Some tips off the top of my head:

You can push a lot of users to some of those big AWS machines. I was regularly doing tests of up to 4000 simultaneous users with (for example) 8 m5.24xlarge EC2 Spot Instances (so about 500 PhantomJS users/instances per AWS machine) hitting about 12 Galaxy Quad containers and measuring performance in Kadira/Meteor APM. I also did a lot of tests with various settings of redis-oplog (enabled or not), MongoDB Atlas (what tier of service/machines), etc.
Always use spot instances, it’s cheaper.
One strange quirk I noticed was that with PhantomJS every user shared the same localStorage so I had to tweak my code that uses localStorage because all 500 users per AWS machine were sharing the same values for some reason. This must be a setting or something somewhere that may or may not be able to be set in the free tier of Redline13.
I could never get my Selenium tests to do much with conditionals (e.g. like trying to determine what page of the app the test is on). The most I could get would be like hitting a URL, filling out a field, pressing a button (to create a doc - which as I mention in the blog is a good way to test how many users actually ran), waiting, pressing another button. I had to know the sequence ahead of time. I could never get Selenium to do if statements based on different templates in my app. If I tried to do conditionals on what view/template the app was showing it would crash or have a bug. I’m no expert though.
I could never initially hit the server with all the users at nearly the same time - which is a use-case I have. Redline13 and all those PhantomJS users/instances take a while to get created and start. It’s fairly quick (once the test begins, which can take several minutes) but it’s not the same as like 2500 all loading your app at the same time. However, I did add some logic that I could trigger from an admin page in my app that would cause all the users to do something at the same time once they were all loaded in and waiting. But the initial hit of all the users is a slow ramp.
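On the conditionals point in the tips above, one approach that may be worth trying is to branch on `findElements`, which in the selenium-webdriver JavaScript bindings resolves to an empty array when nothing matches instead of throwing like `findElement` does. The sketch below is untested against PhantomJS on Redline13; the helper names `isPresent` and `detectView` and the selectors in the usage note are hypothetical.

```javascript
// Resolves to true when at least one element matches the CSS selector.
// Uses the locator-object form ({ css: ... }) accepted by selenium-webdriver.
async function isPresent(driver, cssSelector) {
  const matches = await driver.findElements({ css: cssSelector });
  return matches.length > 0;
}

// viewSelectors: object mapping a view name -> a CSS selector unique to that view.
// Returns the first view whose selector is on the page, or null if none match.
async function detectView(driver, viewSelectors) {
  for (const [view, selector] of Object.entries(viewSelectors)) {
    if (await isPresent(driver, selector)) return view;
  }
  return null;
}
```

A test could then call something like `detectView(browser, { login: '#login-form', dashboard: '.dashboard' })` and pick its next action from the result, provided the template has had time to render first.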

I’m happy to answer any questions as you proceed.

alawi · November 30, 2018, 6:19am

Thanks @evolross for the tips.

Wow! I must be doing something wrong because I’m unable to simulate more than 40 users per instance. I’m not sure how you are simulating the concurrent users in PhantomJS. I’ve used browser.executeScript('window.open("https://target.app.com");') to open multiple tabs per browser session. This is causing two issues:

The CPU gets maxed out after around 40 tabs, preventing any new tabs from being created.
If there is an exception on one of the pages, it’ll crash the whole browser with all its sessions. The exception I’m sometimes getting is a timeout exception at executeScript. So please, any tips on how you managed to overcome those issues would be really appreciated.
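For the second issue, one option is to give every simulated user its own browser session instead of a tab, and to collect failures rather than let one of them abort the run. The sketch below shows only that isolation pattern: `runIsolatedUsers` and the `runOneUser` callback (which would open its own browser, do its steps, and quit) are hypothetical names, and this does nothing about the CPU limit from the first issue.

```javascript
// Starts `userCount` independent user sessions and waits for all of them to finish.
// A rejected session is recorded in the summary instead of failing the whole batch.
async function runIsolatedUsers(userCount, runOneUser) {
  const sessions = Array.from({ length: userCount }, (_, index) =>
    // Promise.resolve().then(...) also turns synchronous throws into rejections.
    Promise.resolve().then(() => runOneUser(index))
  );

  const results = await Promise.allSettled(sessions);
  const errors = results
    .filter((result) => result.status === 'rejected')
    .map((result) => result.reason);

  return {
    succeeded: results.length - errors.length,
    failed: errors.length,
    errors,
  };
}
```

With this shape, a timeout in one session shows up as one entry in `errors` while the other sessions keep running.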

The other hard limit I hit is the 20 instances per zone, but if you’re able to simulate 500 users per server I don’t think you’re running into this.

In the interim, I did manage to get some load/spike testing done using puppeteer on a Linux machine on Digital Ocean with custom boot scripts. I’m still only getting around 40 tabs per vCPU, so I’m trying to run those in parallel. I’ve managed to simulate 500 users using 10 instances, still nothing near your 500 per test instance!

evolross · December 1, 2018, 1:55am

I never tried to add any additional users, tabs, logins, etc. in my actual Selenium test. Just one user in the Selenium code. You achieve your scale of simultaneous users within the Redline13 UI. You set the number of machine instances, users per machine, and then it calculates the total users.
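The sizing arithmetic behind those two settings is simple enough to sketch. The function names are made up for illustration; the numbers in the usage note are the ones quoted earlier in this thread.

```javascript
// Total simulated users for a given number of machines and users per machine.
function totalUsers(instances, usersPerInstance) {
  return instances * usersPerInstance;
}

// Machines needed to reach a target user count, rounding up to whole machines.
function instancesNeeded(targetUsers, usersPerInstance) {
  return Math.ceil(targetUsers / usersPerInstance);
}
```

For example, `totalUsers(8, 500)` gives the 4000 users mentioned above, while at roughly 20 connections per instance `instancesNeeded(300, 20)` comes to 15 machines.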

So you’re literally running (as I understand it) 500 instances/threads of PhantomJS per machine. That’s why you need the beefy machine or it will crap out.

That’s why I was so puzzled by the shared localStorage issue between the instances. Because it’s 500 separate running instances. Perhaps PhantomJS simulates localStorage to some shared location on the hard-drive. As I said above, I’m sure there’s a setting for that but it’s probably behind the “curtain” of Redline13’s UI (at least the free tier).