The State of Velocity and Testing in Meteor

Who says I’m not relaxed? :smile: And I totally get it. I just think this kind of ranting about gender/tech belongs in its own post, as it has nothing to do with the stated topic.

To get back to the subject - what do you guys think Meteor will advocate for the testing section in the Meteor guide?

Searching for .mp4 in the source and pasting that link in a new window solved it here.
(There’s a related blog post here.)

I’ve just started using Chimp with Cucumber and WebDriverIO (Selenium) for a few complicated, automated UX tests. Works great for that and was very quick to set up even though I only have (basic) experience with Mocha (using mike:mocha(-package)).

With just those few tests in place, I’m already wanting to decouple test logic as much as possible, to reuse within that and other testing solutions.

On the value and scope of testing (i.e. pyramid structure), I think everyone agrees it’s very context-dependent. Maybe 70/20/10 is a good starting point until your project informs adjustment. Here’s a timed link to a lucid, mere mortal summary of the DHH, Martin Fowler and Kent Beck hangouts, which touches on some of that.

1 Like

Hmmm… I must be the only one whose sarcasm detector was going off the charts.

edit: I stand corrected :confused:

This is the same video here: https://vimeo.com/149564297 - it’s very similar to the blog post, but it shows a LOT more detail of how to do it in reality.

Emily is absolutely right and, in my opinion, has a better grasp of the problem than DHH does. I really like the TextTest tool she created. You can achieve very similar results by using subcutaneous testing to capture the data that will be used by the UI, instead of testing the UI itself. This can’t work if you put a ton of logic into the view; however, if you write your code using UI components, this is awesome, as you can unit test the UI components like crazy (with something like React Test, for example). Subcutaneous testing draws the boundary above units and below the UI, so such tests are considered integration/acceptance tests, without the penalty of being sensitive to UI changes.
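To make the subcutaneous idea concrete, here is a minimal sketch (names like `buildTodoListViewModel` are invented for illustration, not from any real codebase): the test exercises the function that produces the data the UI will render, with no DOM involved.

```javascript
// Hypothetical "subcutaneous" layer: the function that prepares the data
// a UI component would render. The names here are invented for illustration.
function buildTodoListViewModel(todos, filter) {
  // Business logic the view would otherwise hide.
  const visible = todos.filter(function (t) {
    if (filter === 'active') return !t.done;
    if (filter === 'done') return t.done;
    return true;
  });
  return {
    items: visible.map(function (t) { return t.title; }),
    remainingCount: todos.filter(function (t) { return !t.done; }).length
  };
}

// The test asserts on the data the UI consumes - no DOM, no selectors,
// so it doesn't break when the markup changes.
const vm = buildTodoListViewModel(
  [{ title: 'a', done: false }, { title: 'b', done: true }],
  'active'
);
```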

Emily was also right in saying that education needs to happen with both novice and advanced developers, to show how acceptance testing is properly done and how it’s different from end-to-end testing.

Just my 2c!

3 Likes

I’m really curious about your thoughts on using an ‘hourglass shape’ that relies heavily on unit tests and end-to-end tests to make sure the units are ‘plugged together’ correctly. I’ve tried both, and the hourglass shape has (so far) led to faster iteration and more confident releases (well, more unit tests than e2e, but few integration tests). The video snippet at the bottom is what prompted me to share, as it seems some other people use this too.

Here are my experiences:

tl;dr
Functional programming has made unit testing faster, easier and more reliable, and therefore I don’t need as many integration tests. E2e tests have become less flaky in recent years, and not testing design/CSS/DOM in e2e tests helps keep them less brittle. It ends up looking like 90% / 5% / (all of the happy paths & a few critical failure paths).


End-to-end testing has gotten a lot less 'flaky' (for me) in the past year. Currently I don't have nearly the number of issues I did 3 years ago trying to get my 'working feature' to pass green. With the right abstractions, tests (mostly) only broke when user behavior changed (as expected). Keeping the CSS out of them helped the most.

Having the right abstractions for clicking a button or filling an input also helped. Filling an input by its label name or clicking a button by its text (instead of jQuery/XPath selectors) led to fewer tests failing due to design changes. Granted, if that text changes, the test breaks; but those fixes only take seconds, and keeping the DOM out of the tests means the HTML structure can change as needed, so I consider it the lesser of two evils.
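As a rough illustration of that locator idea (the helper names below are hypothetical, not part of WebdriverIO or Chimp), selectors can be derived from visible text so the test never mentions the HTML structure:

```javascript
// Hypothetical helpers: build XPath locators from visible text, so tests
// say "click the 'Save' button" instead of hard-coding CSS paths.
function buttonByText(text) {
  return "//button[normalize-space(text())='" + text + "']";
}

function inputByLabel(labelText) {
  // Assumes <label for="..."> markup; resolves the input via the label's @for.
  return "//input[@id=string(//label[normalize-space(text())='" +
    labelText + "']/@for)]";
}

// In a WebdriverIO/Chimp step you might then write something like:
//   browser.click(buttonByText('Save'));
//   browser.setValue(inputByLabel('Email'), 'a@b.com');
const saveButton = buttonByText('Save');
```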

80% of the time, my integration tests gave me the most issues. Perhaps this is because of Meteor. They seemed to be the most brittle under refactoring and provided a false sense of security that my ‘units’ were working together correctly. Trying to test/mock out all the dependencies correctly was tedious as well (in a unit test, it’s clear what to mock/stub).

Some pieces, like models, are great for integration tests, because I want to make sure they’re working with my DB correctly; and things like sending out an email when creating a user can’t be tested with end-to-end tests.

Migrating to functional programming has made my test code the most maintainable and made testing easy in general. It allowed me to use unit testing very heavily and effectively as I didn’t have to worry about globals and instances of objects that I couldn’t test.

React and Redux are also very functional and allowed me to unit test the tricky bits of UI with heavy logic and Redux allows one to unit test user interactions without the UI.
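As a small sketch of that last point: a Redux reducer is a pure function, so a user interaction can be unit tested as a sequence of dispatched actions with no UI at all (the action shapes below are invented for illustration):

```javascript
// A plain Redux-style reducer: pure function of (state, action) -> state.
// The action types and shapes here are invented for illustration.
function todosReducer(state, action) {
  state = state || [];
  switch (action.type) {
    case 'ADD_TODO':
      return state.concat([{ title: action.title, done: false }]);
    case 'TOGGLE_TODO':
      return state.map(function (t, i) {
        return i === action.index ? { title: t.title, done: !t.done } : t;
      });
    default:
      return state;
  }
}

// Simulate a user "interaction" as a sequence of dispatched actions,
// then assert on the resulting state - no DOM, no React rendering.
let state = todosReducer(undefined, { type: '@@INIT' });
state = todosReducer(state, { type: 'ADD_TODO', title: 'write tests' });
state = todosReducer(state, { type: 'TOGGLE_TODO', index: 0 });
```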

Some counterarguments may be that my apps had more user-facing functionality, and perhaps a more backend-service-heavy app would require more integration tests. Also, I’ve only been using this approach for 8 months, so you could argue that the e2e tests haven’t had enough time to let atrophy sink in (although the UI has changed quite a bit).

Anyhow, I’d love to hear any pushback/poking of holes in this, as it will help me build better tests :thumbsup:


[How to Stop Hating your Test Suite](https://www.youtube.com/watch?v=VD51AkG8EZw) *(queued up to the hourglass part)*

2 Likes

Because there is a year-long history of some Velocity team members playing favorites, scrubbing contributions from contributors in a biased manner, adding sexist and harassing language to package names and APIs, and making Velocity inappropriate for use in enterprise environments. There was a blatant power grab around the ‘officially sanctioned’ testing framework, and it played out in a typical sexist boys club manner. Typical.

Now that StarryNight can produce basic FDA compatible documentation, and provides an isomorphic API that is committed to avoiding non-inclusive language, I’m beginning a call-to-action to scrub any documents that go into the guide of language that can get flagged by an HR department.

I respect the fact that there are some people with egos and reputations on the line who want to save face. So, I’m not doubling down on the muckraking, and am giving them time to update APIs, documentation, and package names. But they certainly didn’t respect my contributions during Velocity development, so I’m not backing down either. Calling out inappropriate language is necessary in order to raise the bar with regard to professional language.

And I’m not just complaining from the peanut gallery. We’re providing an alternative solution, with working examples and documentation via an entire release track. Velocity may have been designed with industry ‘best-practices’ in mind; but if industry best practices mean bro-culture and harassing language, you better damn believe that some of us will rally the funds and resources necessary to produce an alternative option that complies with federal regulatory approval processes and has inclusive language.

Having worked 15 years in QA and testing, and been involved with clinical trials and FDA approval of drugs and devices, in all that time, I never once had to ‘spy’ or ‘mock’ anything, nor use ‘cucumbers’ or ‘jasmine’ in a hospital or clinic. Not before Meteor. So I’m calling that language out as not being a best practice. It’s certainly not the healthcare industry’s best practice.

The state of the testing ecosystem is that there is now an FDA ready testing solution available. Some of us want to see the language cleaned up in the broader testing ecosystem; so we’re beginning the process of stating grievances and proposing an alternative solution. From here on out, there is a higher standard available.

ps: I personally don’t have any particular problem with chimp, or the technologies behind velocity and chimp. StarryNight uses most of the same technologies, in fact. I and my clients do care about federal regulation 21CFR820.75, having verification and validation tests, a clean isomorphic API, printing test results to white paper, and being able to run the test runner on common 3rd-party continuous integration service providers. We’re starting from the assumption that there will necessarily be thousands of validation test scripts as part of any app, and that clients will want to shop around for CI servers. That’s how our industry operates. But since the self-appointed leaders of the ‘official testing framework’ decided those concerns weren’t valid, we’re speaking up, putting our money where our mouths are, and providing an alternative testing solution for the clinical track.

1 Like

Justin reminds me of Jesse Pinkman from Breaking Bad :slight_smile:

Someone sent that presentation to me a while back, and I had it on my to-watch list, so I just watched the whole thing now and have to say it’s one of the best presentations about testing that I’ve ever watched, second only to Konstantin’s Modeling by Example breakthroughs. Let me explain why (which will also answer your questions).

tl;dr
You can still maintain the testing pyramid and reap all the benefits of Justin’s presentation if you start with acceptance tests to drive the development of your app at the domain layer, then create a UI on top and write a handful of UI based tests.


The testing pyramid came from a measurement that took into account the proportions of test types in relation to one another. A direct correlation was observed between testing strategies that did not have the proportions of a pyramid, and those that were painful, difficult to maintain and slow. The testing pyramid is a guideline based on the symptoms of test strategies; it does not really tell you much about the cause at a glance.

Justin’s presentation is awesome at talking about the cause and is 100% correct in every single thing he said. In particular, he speaks about what frameworks provide, and how integration testing is tightly coupled to the infrastructure a framework provides. In Rails that may be Active Record; in Meteor it would be pub/sub. He stipulates that testing at this level creates redundancy with unit testing for very little gain, if any. In that respect, he’s right that it is pointless and better to trust that a framework is doing its job. Of course, units alone are not enough, so he says you should absolutely have some e2e tests that make sure the whole thing works.

It’s what Justin didn’t say that I find most interesting! He has, in my opinion, missed out the domain. All of the tiered application architectures speak of some sort of domain layer (AKA a service layer) which handles business or domain logic. This layer uses the framework infrastructure as a means to an end: creating value for the end consumer (user or system).

The domain consists of models that abstract the business domain of the real world into concepts and entities, and the functions (or services) that transform the models based on interactions (user or system) with the business logic. You are already doing this in your functional programming approach, which is awesome! What I love about document-based DBs like Mongo is that they usually model the domain nicely without requiring an ORM - but I digress!

If you imagine an application that has a completely speech based UI, then you can imagine simply talking to the domain and it responding to you. The UI therefore is a “joystick” to control the domain (as Konstantin mentions in his talk). So by using the domain, you get the value of the application and the UI is one way for you to get that value.

Acceptance testing is about making sure the service layer is using the domain entities and that the infrastructure provided by the framework is working. This is much more than infrastructure and configuration testing, which is what Justin is advising against. Konstantin talks about this in the same talk and refers to the infrastructure leakage that occurs when you try to model more than the domain.

On the subject of tools being much better these days, and your approach of using UI text as the locator strategy instead of the more brittle CSS and ID rules: these are 100% awesome practices to be employing, and they do indeed reduce flakiness. Ultimately, what you’re doing is domain testing through the UI. This will work out for you, especially if you are not in a huge team, where shared understanding of what tests are doing is more difficult to communicate. Typically, tests that run through the UI and click a few buttons don’t tell you WHAT an app is doing; they tell you HOW it’s doing it. This creates a translation cost for the next developer who picks it up, whereas the domain language does not have this cost. So there is more to UI testing than flakiness.

I was a HUGE proponent of outside-in testing up until quite recently, when I discovered there’s a much better way. When creating new features, we start a discovery process at the domain layer and cover the domain fully, then we paint the UI on top. Using some polymorphism, the exact same acceptance tests can have a UI-based version that actually goes through the UI. This ends up with a triangle and not an hourglass.
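A minimal sketch of that polymorphism, with all names invented for illustration: the acceptance test speaks only the domain language, and a driver decides whether the steps go straight to the service layer or through the UI. Only an in-memory domain driver is shown here; a UI driver would implement the same interface using a browser.

```javascript
// Hypothetical domain driver: talks straight to the service layer.
// A UiDriver with the same three methods would perform the same steps
// through the browser instead.
function makeDomainDriver() {
  const lists = {};
  return {
    createList: function (name) { lists[name] = []; },
    addTodo: function (list, title) { lists[list].push(title); },
    todosIn: function (list) { return lists[list]; }
  };
}

// The acceptance test only speaks the domain language, so it runs
// unchanged against either driver.
function userCanAddATodo(driver) {
  driver.createList('groceries');
  driver.addTodo('groceries', 'milk');
  return driver.todosIn('groceries');
}

const result = userCanAddATodo(makeDomainDriver());
```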

1 Like

Does that even matter? If they don’t want to accept contributions, isn’t it their right?

Could you provide some examples? I haven’t heard of these, and I want to understand what you’re talking about :confused:

Why do you care about the “official test framework” title, and do you feel there’s a better candidate than Velocity? It’s REALLY hard for me to believe that the fact that you’re a woman has any influence on any of this. Then again, I don’t want to impose ignorance on a real issue - it’s just really hard for me to believe people can be clever enough to write code and still be dumb enough to think women are unequal to men.

3 Likes

I think the best thing is to split your app into packages and use test-packages

The todos example seems to use a special package just for its own testing?

https://github.com/meteor/todos/blob/master/packages/lists/package.js#L23

which uses some other test packages.
https://github.com/meteor/todos/blob/master/packages/todos-test-lib/package.js

Is that a best practice or just the most practical way to decouple the shifting landscape around tests?

Yeah we thought it was an easy way to list the testing packages once rather than reproducing them in every dependency list. You can see we did the same with the “todos-lib” package.

Hey all,

So I think most of the details have come out in this thread already but once more, all in one spot, here’s the current plan regarding the testing article of the guide:

Firstly, we are going to wait for 1.3 support for app testing (which should look a fair bit like the idea I outlined) before writing up this article [1]. Once that lands, we expect to recommend something like the following:

Unit + Integration tests

To run and report tests, we’ll recommend a system built off the above simple changes to core, combined with the Mocha runner and practicalmeteor’s driver package. All in all it’s a pretty simple system, so I expect that once a test driver’s job is better understood, other options for reporters and test frameworks will slot in easily. The differences between the popular unit test frameworks and assertion libraries seem pretty cosmetic, so the concepts should be applicable whatever flavour you prefer.

I handwavily wrote up what I thought this might look like in 1.3 (note: a lot of this does not yet work) here, if you are interested.

Mocks, Stubs, Spies, Factories, Fixtures etc

I actually think the orthogonal question of how you mock things out and test them sensibly in a Meteor context is more interesting than the mechanical question of how you run tests. We have a fair few ideas we’ve tinkered with to build the test suite for the Todos app, but I think there’s more shared code we could all produce that would make unit testing Meteor stuff easier. I’d love to hear more about what everyone’s doing in this direction in their own tests.
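For what it’s worth, here is a tiny hand-rolled sketch of the spy idea (a real suite would use something like Sinon; `createUser` and `sendWelcomeEmail` are invented names, not Meteor APIs): passing collaborators in as arguments makes them trivial to replace in a unit test.

```javascript
// Minimal hand-rolled spy: records every call's arguments so the test
// can assert on how the collaborator was used.
function makeSpy() {
  const calls = [];
  function spy() {
    calls.push(Array.prototype.slice.call(arguments));
  }
  spy.calls = calls;
  return spy;
}

// Unit under test (invented for illustration): takes its email-sending
// dependency as an argument so it's easy to stub out.
function createUser(email, sendWelcomeEmail) {
  const user = { _id: 'fake-id', email: email };
  sendWelcomeEmail(user.email);
  return user;
}

const emailSpy = makeSpy();
const user = createUser('a@b.com', emailSpy);
// The test can now verify the email was "sent" without any SMTP server.
```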

End to end tests

This part is simpler I think. Apart from the need to provide hooks into the app to prepare data and initialize the test environment [2], it’s really just the same question as how to run e2e tests against any modern websocket-driven web app. At this stage we are looking to recommend Chimp because it supports Mocha (for consistency with the above) and has some nice Meteor-specific features like watching the DDP connection for app changes. But at the end of the day, whatever webdriver based tool you prefer to drive client sessions will probably work equally well.

Continuous Integration

Seems pretty important, so a command-line reporter for whatever test suite you are using seems useful. We’re recommending practicalmeteor:mocha-console-runner. You can point a phantom instance at it easily with spacejam. The Todos app’s circle.yml shows how it’s all pretty simple to hook up once the pieces are in place.

Velocity

As @sashko alluded to above, the goals of the Velocity project were a lot more ambitious than what I’ve outlined above (a multi-framework reporter and a mirror system for parallel builds). Right now we can only consider those features as nice-to-have so we’re looking at what can be achieved without it.

[1] We do have a fair few unit tests in the current Todos example app, but they’re all built off an all-package approach, and we’re sure that’s not what we are going to recommend in 1.3, so it makes sense to wait. If people need to write tests right now and are already using an all-package approach, then it makes sense to go for it in the same way we did. You can refer to the outline of our package testing approach if that’s helpful.

[2] Right now the best way to do this seems to be defining those hooks in a debugOnly package and running your tests in dev mode. This works although there’s probably something a bit nicer we can work out eventually…

8 Likes

I can’t quite figure out why, but for some reason whenever I start writing test cases for my Meteor apps again, I feel like I’m about to be an invalidated Tracker.Computation …

Sounds great to me! :+1:

Hello,
Why should we dismiss starrynight and Nightwatch for end-to-end testing?

What are the benefits of starrynight and nightwatch over the approaches Tom suggested above?

1 Like

Can you provide some samples? I am a bit surprised, because I have never seen anything like that.

I am not a testing expert, but at least “mock” is a common term in computer science for mimicking something that has not been implemented yet. And “spying” is used quite a lot to describe inspecting something from the outside; I don’t see any problem with this term.

Regarding ‘cucumbers’ and ‘jasmine’: Are these really terms the Velocity team created? I always thought that these were 3rd party frameworks they had integrated.

2 Likes

Thank you for weighing in, Tom.

All this looks great. Ronen has had the most sensible approach to backwards compatibility for well over a year now; and it was completely baffling when his contributions were rejected from the Velocity architecture. For those of us looking to generate audit documents for the entire stack, not just our own individual apps, it was obviously an integral part of the overall testing solution. StarryNight has had support for SpaceJam and Ronen’s approach for the past 6 or 9 months now; and my clients and I are 100% on board with this approach.

I’ve truly been trying to keep chill on this matter, and have let things slide for far longer than I should have.

Thank you for framing the discussion in this manner. FDA approval doesn’t require any of these items, per se, so my clients and I are fairly agnostic about it all. If other people jive with these problems, we’re happy to accept other people’s recommendations.

However, my clients and I do have HR departments to worry about. I’ve been truly trying to keep chill on this topic, but it irritates me to no end. I’m trying to take the compassionate view here: other people in the Meteor development community may not be aware of what might get flagged by HR departments in the healthcare industry. Some of the developers aren’t native English speakers, or haven’t had administrative experience sitting on HR review panels and hiring/firing people.

Suffice it to say that ‘Mocks’ and ‘Spies’ are inappropriate terms for an API within the healthcare industry. Perhaps these terms are okay in the gaming or military-industrial industries. But in healthcare, they’ll get flagged.

We’re fine with Stubs, Factories, and Fixtures. We’re fine with the underlying technologies of creating pseudo objects and observing them. But the Spy and Mock terms are problematic when it comes to team management and the sales pipeline.

Nightwatch has supported Mocha since 8.0. Gagarin also supports Mocha. Between these two, StarryNight is completely compatible with Mocha (and Chai) as an isomorphic API. StarryNight has a ton of Meteor specific features.

All this seems great.

Maybe this holds true for the US, but in the rest of the world, people are way more relaxed. Even in the health and pharma industry. Been there, done that.

1 Like

I’m suggesting starrynight and nightwatch in addition to what Tom is suggesting. They’re complementary technologies.

Must everything be a competition? Truly, having the leaderboard as the default app that people test with has created a toxic culture of competition in the Meteor community. It’s not a zero-sum game.

Look, it’s all well and good to have unit testing. And truth be told, my clients and I really don’t give a crap about how the unit testing is implemented. All of those details are obviously in good hands. Tinytest was sufficient for our verification testing needs; so anything more is just icing on the cake.

However, the acceptance testing and end-to-end integration… that’s simply never been up to par with what we need. So that’s where we’ve been concentrating our work. We have federal regulations that we have to pass… FDA, JCAHO, CLIA, CCHIT… all these regulatory agencies look to the FDA testing requirements as a guide. And it simply doesn’t matter if Google or Facebook use 100% unit tests, and have written off end-to-end testing as a dead practice.

The federal government requires validation tests for any software devices involved in food or medicine, and therefore my clients and I have to support them. And that means long-ass test scripts. 15 years ago, a QA team would have gone through those tests by hand. That’s how I started out in testing… once a week, a team QA meeting, and afternoons spent running through hundreds of EMR test scripts by hand. Nowadays we can automate it, but it still needs to be a test script that a human could walk through. Click here, input this text, check this loads, etc. Hundreds of pages of them. That’s what’s standard in the healthcare industry, and expected by federal regulatory agencies.

And the simple fact of the matter is that Nightwatch has long had the most sophisticated end-to-end and acceptance testing around. Its method-chaining syntax, custom commands, and the nightwatch.json file put it in a league of its own. It’s an expert system, for sure. Not for beginners new to testing. But for complex testing scenarios, I have yet to see anything match its configurability. I’ve run scripts with over a thousand commands in a single method chain. Rock solid. Is that the right testing solution for everybody? No. But it’s right for us, and it gets us the validation tests we need for regulatory approval.
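For readers unfamiliar with the style, here is a toy sketch of the command-queue pattern behind that kind of method chaining: each call records a command and returns the same object, and a runner would later execute the queue in order. This is illustrative only, not Nightwatch’s actual implementation.

```javascript
// Toy sketch of method chaining via a command queue. Each call pushes a
// command and returns the same browser object, so long chains read as a
// linear test script. (Illustrative only; not Nightwatch internals.)
function makeBrowser() {
  const queue = [];
  const browser = {
    url: function (u) { queue.push(['url', u]); return browser; },
    setValue: function (sel, v) { queue.push(['setValue', sel, v]); return browser; },
    click: function (sel) { queue.push(['click', sel]); return browser; },
    queue: queue
  };
  return browser;
}

// A script of any length stays a single chain; a runner would walk
// browser.queue and execute each recorded command against a real browser.
const b = makeBrowser()
  .url('http://localhost:3000')
  .setValue('input[name=q]', 'meteor')
  .click('button[type=submit]');
```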

Plus, Nightwatch is in active development, has a strong community, a clean API, a fairly balanced gender representation, and excellent documentation. And every time we see an alternative solution announce a new feature, it’s usually already been implemented by Nightwatch 6 months earlier.

Starrynight extends the Nightwatch testing story by adding:

  • scaffolding of default Nightwatch test files for Meteor
  • .meteor/nightwatch.js generation
  • continuous integration config file templates
  • custom nightwatch commands tailored to Meteor
  • package level acceptance testing commands
  • atom.io IDE integration
  • environment debugging
  • sharing of ChromeDriver between Gagarin and Nightwatch
  • integration tests and all the benefits of Gagarin
  • package verification tests
  • etc.

To put it in perspective: practicalmeteor:mocha, velocity, and chimp are running 100 and 400 meter dashes, and focusing on winning sprints. But my clients and I are marathon runners, doing a completely different kind of race. We truly don’t care how the sprinters want to train or what techniques they use. They have the best tool available for unit testing? God bless. But for the sprinters to appoint themselves leaders of the local Track and Field Athletics club, to claim that they’ve got long-distance running covered by their 1600 meter relay… and to scrub any contributions from the marathoners? That behavior is going to get pushback.

And that’s why I’ve been holding my tongue. The problem is endemic to the broader JavaScript and tech industry. Let’s drop this, before I dig up more skeletons and mud.

Sure. But my clients and I are generally in the US, trying to get projects FDA certified. They’re trying to sell into Stanford, UC San Francisco, Harvard Chan School of Public Health, Harvard Medical School, University of Chicago, New York Presbyterian Healthcare System, UPenn, etc.