Using built-in MongoDB schemas

I know that MongoDB allows you to define a built-in schema that validates document writes before committing them. I suppose documents would then be validated even when modified from outside the application, unlike when using a library (like Simpl-Schema).

Does anybody know if there is a way to use this validation method?

Thanks for your help!

I don’t know of a way to do this using the collections API available in Meteor, in part because I never needed to (see below). But it shouldn’t be difficult to use the Mongo driver and create collections like this: Schema Validation — MongoDB Manual

Why not use GitHub - Meteor-Community-Packages/meteor-collection2: A Meteor package that extends Mongo.Collection to provide support for specifying a schema and then validating against that schema when inserting and updating, which basically enforces Simpl-Schema checks on inserts and updates?
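Attaching a schema with Collection2 is just a couple of lines; roughly something like this (the collection and fields are made up for illustration):

import { Mongo } from 'meteor/mongo';
import SimpleSchema from 'simpl-schema';

const Books = new Mongo.Collection('books');

// Collection2 adds attachSchema(); inserts and updates on Books are
// then validated against this Simpl-Schema.
Books.attachSchema(new SimpleSchema({
  title: { type: String },
  pages: { type: SimpleSchema.Integer, min: 1 },
}));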

1 Like

I guess (and hope!) that MongoDB would prevent writes from any origin if they don’t respect the schema, whereas Collection2 would do it for a single application only. If multiple applications share the same DB, each must define their collections the same way, with the same Simpl-Schema, to ensure consistency. Trouble might arise if one says that foo should be a string, and the other a number.

I just think that validation should be done the closest to the insertion point. And I was wondering if anyone had experience using the built-in schemas.

2 Likes

Good point! Now I understand what you meant by “outside the application” in your original post. I actually took an interest in the matter as a result of your question, because schema checks as close as possible to the database are important. Besides, I think there is the performance question - Simpl-Schema can be quite slow on large objects.

We also have several services writing to MongoDB, and pre-3.6 we got away with it by sharing schemas as git submodules.

While searching, I came across several writeups, like this one; they all point to the fact that Mongo will respect the rules you enforce through $jsonSchema. This even includes preventing the insertion of fields not declared in the schema.
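To illustrate (a sketch with made-up fields), a validator that rejects undeclared fields could look like this:

const validator = {
  $jsonSchema: {
    bsonType: 'object',
    required: ['name', 'createdAt'],
    // Rejects any field not listed under properties - note that _id
    // must then be listed too, or every write fails.
    additionalProperties: false,
    properties: {
      _id: { bsonType: 'objectId' },
      name: { bsonType: 'string' },
      createdAt: { bsonType: 'date' },
    },
  },
};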

Also, there seems to be a validationLevel option which dictates how validation is enforced when updating existing documents; more info here.
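For example (a sketch; 'MyCollection' is a placeholder and db is assumed to be the raw driver Db handle, more on that below), switching an existing collection to the more lenient moderate level would be something like:

// 'moderate' applies the validator to inserts and to updates of documents
// that already satisfy it; existing invalid documents can still be updated.
await db.command({
  collMod: 'MyCollection',
  validationLevel: 'moderate',
});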

Coming back to Meteor: I would assume you get a reference to the database your collection is using by calling rawDatabase(), and then write something like this:

db.createCollection("MyCollection", {
   validator: {
      $jsonSchema: { ... }
   }
});

After that, just use the Mongo.Collection API to declare your collection the usual way. Try inserting / updating some documents and I’d expect it to work as advertised.
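Putting it together, a rough sketch of what I have in mind (the collection name and schema are placeholders):

import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

const MyCollection = new Mongo.Collection('MyCollection');

Meteor.startup(async () => {
  // rawDatabase() returns the underlying Node driver Db instance.
  const db = MyCollection.rawDatabase();

  // createCollection fails if the collection already exists;
  // in that case change its validator with collMod instead.
  await db.createCollection('MyCollection', {
    validator: {
      $jsonSchema: {
        bsonType: 'object',
        required: ['name'],
        properties: { name: { bsonType: 'string' } },
      },
    },
  });
});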

Let us know the results. Good luck!

2 Likes

I’ve been using this feature for quite some time now (since October 2018, as I checked) and I can say it works as advertised in the documentation. As I really prefer JSON Schema over any other validation solution, having one schema for both the app and the DB was even better.

I do use highly restrictive validation (validationAction: 'error' and validationLevel: 'strict'), but you don’t have to. Just to make sure everything works fine, start with warnings only (docs) or bypass validation completely.
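For instance (a sketch, assuming MyCollection is a regular Mongo.Collection), a single write can skip the validator entirely via the raw driver collection:

// rawCollection() exposes the Node driver collection, which accepts
// bypassDocumentValidation on writes (Meteor's own API does not).
await MyCollection.rawCollection().insertOne(
  { name: 'legacy-document' },
  { bypassDocumentValidation: true },
);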

Performance-wise, I haven’t noticed any impact myself, but I believe it does add a little overhead, which may matter for a DB with a high volume of writes. Just keep that in mind.

However, while SERVER-20547 is technically closed, it’s not released yet - it’s scheduled for MongoDB 5.0 (mid-2021). It means that you won’t know why the validation failed. That’s why you have to validate the document in your app anyway, at least as long as you care about meaningful error messages.

Here's a script I'm using to update the schemas on application startup:
import get from 'lodash/get';
import isEqual from 'lodash/isEqual';
import { MongoInternals } from 'meteor/mongo';

// One way to get the raw Db handle of the app's default MongoDB connection.
const db = MongoInternals.defaultRemoteCollectionDriver().mongo.db;

async function updateSchema(name: string, schema: object) {
  // Read the collection's current options, including its validator (if any).
  const definition = await db.command({ listCollections: 1, filter: { name } });
  const prevSchema = get(
    definition,
    'cursor.firstBatch.0.options.validator.$jsonSchema',
  );

  // Only touch the collection when the schema actually changed.
  if (!isEqual(schema, prevSchema)) {
    await db.command({
      collMod: name,
      validationAction: 'error',
      validationLevel: 'strict',
      validator: { $jsonSchema: schema },
    });
  }
}
6 Likes

Wow, thanks guys! I’ll have a look at what you shared. So basically, if I understand correctly, I would:

  1. create the MongoDB collections with validators on startup;
  2. create the Meteor collections normally.

Then, writes would be prevented, but without much detail about why they failed, right?


Use case

Simpl-Schema is able, from what I understand, to interpret a modifier object and determine if the result should still yield a valid document when the update completes, without having to fetch each document beforehand. However, Simpl-Schema is really hard to work with when trying to define sophisticated checks referencing multiple fields or with varying schemas.
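For reference, this is the kind of modifier-level validation I mean (just a sketch with a made-up schema):

import SimpleSchema from 'simpl-schema';

const schema = new SimpleSchema({
  title: { type: String },
  pages: { type: SimpleSchema.Integer, min: 1 },
});

// Validates the modifier itself, without fetching the target document;
// this throws a ValidationError because pages must be at least 1.
schema.validate({ $set: { pages: 0 } }, { modifier: true });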

So, in my project, I use a more robust validation library. Validating on insert with it is pretty easy, since I have the entire document in my hands before committing. Validating on updates is trickier (see the sketch after the list below), since I need to:

  1. fetch the existing document;
  2. simulate the impact of applying the modifier on it (with a Minimongo utility function);
  3. validate the resulting document with the validation library.
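A minimal sketch of that flow, assuming a recent Meteor with the async collection methods and a generic validate() function that throws on invalid documents; instead of a dedicated Minimongo utility, it applies the modifier through a throwaway local collection:

import { Mongo } from 'meteor/mongo';

// Hypothetical helper: simulate an update and validate the result
// before committing it to the real collection.
async function simulateAndValidate(collection, selector, modifier, validate) {
  const doc = await collection.findOneAsync(selector);  // 1. fetch the existing document
  const scratch = new Mongo.Collection(null);           // in-memory Minimongo collection
  await scratch.insertAsync(doc);
  await scratch.updateAsync(doc._id, modifier);         // 2. simulate the modifier
  const updated = await scratch.findOneAsync(doc._id);
  validate(updated);                                     // 3. validate; throws if invalid
  return updated;
}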

It’s all right when I update one document through a form, for example. However, I also use after-update hooks on other collections to trigger updates, and those can target a large number of documents at the same time. If I need to fetch each document, simulate the modification, and validate before committing, it’s a very big overhead for the application server and the DB.

Hence, I would be happy to validate single-document updates coming from the client in methods with my fancy library, ensuring changes are coherent, while using a simplified schema attached to the Mongo collection itself just to ensure data integrity in case I miss something crucial in an after-update hook.

Would you have other suggestions?

[…] Then, writes would be prevented, but without much detail about why they failed, right?

Yep.

Would you have other suggestions?

IMO a schema in the DB is a last resort. It’s good to have one, but you shouldn’t rely on it as data validation - it’s more of an integrity check. Do your best with the validation in your app (whatever that means to you) and don’t be afraid to return a “database said your data is bad” error. It’s much better than inconsistent data.
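As a sketch (assuming the driver error keeps its code; 121 is MongoDB's DocumentValidationFailure), surfacing that generic error from a method could look like:

import { Meteor } from 'meteor/meteor';

async function insertChecked(collection, doc) {
  try {
    return await collection.insertAsync(doc);
  } catch (error) {
    // The $jsonSchema validator rejected the write.
    if (error.code === 121) {
      throw new Meteor.Error('invalid-document', 'Database said your data is bad');
    }
    throw error;
  }
}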

@radekmie, when you get the current validator to check if it has changed, could you use the db.getCollectionInfos() function instead of using the path described in your code snippet? Just wondering…

1 Like

Is it actually accessible in the MongoDB Node driver? I couldn’t find it.

I’m not sure, I haven’t tried it yet… I just read about it in the docs and thought it might be relevant for you.

Take a look at jam:easy-schema. I think it has what you’re looking for. Let me know what you think!

1 Like