Meteor database architecture: static or dynamic creation of Mongo collections, which do you prefer?

Hello,

I’m dynamically creating forms and I would like to ask about your preferences. A user is able to create and design the structure of their forms. I have already implemented it and it works, but I’m still thinking about the Mongo data design. When a user creates a new form, which variant do you prefer?

Variant1:
Do you prefer storing data from all forms in a single collection, for example “formData”, holding documents from all the user-created forms (the documents inside will have different data structures), so that the set of Mongo collections stays fixed and unchanged?

Variant2:
…or do you prefer dynamically creating new collections, one separate collection for each form?

*Note: my app is multi-tenant, therefore I will need to put a namespace prefix before the collection name.

**For example, I’m thinking about the situation where I remove a form (and its related collection) that may still be subscribed to on the frontend… or, I don’t know, maybe there are other disadvantages…
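
A minimal sketch of how the two variants differ once a tenant namespace is involved. The helper names `collectionNameFor` and `selectorFor`, the `tenantId`/`formId` field names and the naming rule are assumptions for illustration, not part of any Meteor API.

```javascript
// Variant 2: one collection per form, prefixed with the tenant namespace.
// Only letters, digits and underscores are accepted here (a conservative,
// assumed rule) so the generated name never contains "$" or ".".
function collectionNameFor(tenantId, formId) {
  const safe = /^[A-Za-z0-9_]+$/;
  if (!safe.test(tenantId) || !safe.test(formId)) {
    throw new Error("tenantId and formId may only contain letters, digits and _");
  }
  return `${tenantId}_form_${formId}`;
}

// Variant 1: one fixed collection; tenant and form become part of the selector.
// tenantId and formId are spread last so a caller cannot override them.
function selectorFor(tenantId, formId, extra = {}) {
  return { ...extra, tenantId, formId };
}
```

With Variant 1, removing a form is a matter of deleting documents that match `selectorFor(tenantId, formId)`; with Variant 2 it means dropping a collection that clients may still be subscribed to.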

Which approach do you prefer in your projects?

Thanks a lot, guys, for any help!

I think it depends on how you use the data. If you only need to fetch all the form data from a form using its ID, then I prefer Variant 1.
Variant 2 may look more organized, but it complicates your system.


Hello @minhna, how are you? Thank you for your view.

I’m currently using Variant 1. But the “anti-pattern” starts, for example, when I have “fixed, developed features” in separate collections (for example, some more complicated features) and I need to use them in forms (as drop-down lists) inside the “dynamically created forms” (stored in a single collection).

I’m not saying that it’s impossible, but I’m probably mixing two different approaches. This is the reason why I’m thinking about changing my current “model” and splitting the data into separate collections.

BUT: there is another point. Because my app is multi-tenant, it will make some “mess” in my dynamically generated collection names, which I must prefix with the tenant namespace.

Thanks a lot for any advice.

Caveat: I do not have experience in dynamically creating collections/tables.

With that caveat in mind, I would go with a single collection containing the form structure and another collection containing the data.

Off the top of my head, the biggest issue I can think of (without knowing the other features of your app) is performant search on the form inputs once you already have millions of documents. If this is an issue, then instead of creating dynamic collections, I would send the data to something like OpenSearch and create dynamic object fields when creating dynamic indices for each form, i.e., MongoDB holds the structure and raw data, while OpenSearch holds a per-form index for easy searching.
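
A rough sketch of the per-form index idea, assuming each stored form structure has a `fields` array of `{ field, type }` entries. The function name `buildFormIndex`, the index naming scheme and the type mapping are all hypothetical; only the `mappings.properties` body shape is what an index-creation request expects.

```javascript
// Assumed mapping from form field types to search index field types.
const FIELD_TYPE_TO_INDEX_TYPE = new Map([
  ["text", "text"],
  ["select", "keyword"],
  ["number", "double"],
  ["date", "date"],
]);

// Build the index name and the request body that would be sent when
// creating the index for one form. Unknown field types fall back to "keyword".
function buildFormIndex(tenantId, form) {
  const properties = {};
  for (const { field, type } of form.fields) {
    properties[field] = { type: FIELD_TYPE_TO_INDEX_TYPE.get(type) ?? "keyword" };
  }
  return {
    // Index names must be lowercase; case-sensitive ids would need
    // a different encoding to stay unique.
    index: `${tenantId}-form-${form._id}`.toLowerCase(),
    body: { mappings: { properties } },
  };
}
```

MongoDB stays the source of truth here; the index can be rebuilt from the stored form structure and raw data at any time.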


When you have a large number of documents, you can use the sharding feature to distribute your data across shards. That is impractical with dynamic collections, because sharding has to be set up for each collection separately.


Hi @klaucode, here is how I do it:

  • I use react-hook-form
  • I use multiple static collections. For example, I have a collection of form components, one for lengthy text disclaimers, one for packages (when you want to add tiers of service/product to a form), and one collection for the actual forms, where users save the forms they create.

Form data is … JSON returned from MongoDB; it can come from anywhere. With react-hook-form I pull in things such as a person’s registration data and populate the React form with them (as default values).

What a form looks like:

{
  "_id": "sBa2es2AReTwPnevm",
  "type": "form",
  "form": [
    {
      "type": "form",
      "id": "Rthf2bCqnDPrqPaFP"
    },
    {
      "type": "form",
      "id": "eSalx5y7SXiT5mL3Q" // this is detailed below.
    },
    {
      "type": "form",
      "id": "oPelQ3eZxavEWfinr"
    },
    {
      "type": "form",
      "id": "Azhh2bCqnFarqTa21"
    }
  ],
  "entity": "orgs",
  "conditions": {},
  "name": "Agency Player Form",
  "description": "Registration or update of player",
  "locale": "en",
  "id": "cZbRM9ENz6sY3dYFR"
}

Here is a form component that includes multiple form components from the same collection (think form area and form fields):

{
  "_id": "eSalx5y7SXiT5mL3Q",
  "id": "cZbRM9ENz6sY3dYFR",
  "type": "org",
  "name": "Player Details",
  "description": "FirstName, LastName, DOB, Nationality",
  "components": [
    {
      "id": "ArLorfzDS55gtisSs",
      "type": "text",
      "field": "firstName",
      "label": "First Name",
      "placeholder": "Enter first name",
      "size": "small",
      "xs": 12,
      "md": 6,
      "lg": 6,
      "rules": {
        "required": true
      }
    },
    {
      "id": "sfeii3Xxxq6NnS2Wk",
      "type": "text",
      "field": "lastName",
      "label": "Last Name",
      "placeholder": "Enter last name",
      "size": "small",
      "xs": 12,
      "md": 6,
      "lg": 6,
      "rules": {
        "required": true
      }
    },
    {
      "id": "7LiderotrqSvhmNNG",
      "type": "date",
      "field": "dob",
      "label": "Date of Birth",
      "placeholder": "Date of Birth",
      "size": "small",
      "xs": 12,
      "sm": 6,
      "md": 6,
      "lg": 6,
      "rules": {
        "required": true
      },
      "openTo": "year",
      "inputFormat": "DD-MMM-YYYY"
    },
    {
      "id": "oMsJua3czDSsPjiub",
      "type": "select",
      "field": "passport",
      "label": "Nationality",
      "placeholder": "Select Nationality",
      "size": "small",
      "xs": 12,
      "sm": 6,
      "md": 6,
      "lg": 6,
      "rules": {
        "required": true
      }
    }
  ],
  "locale": "ae"
}
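
A small sketch of how the two documents above could be combined on the client, assuming the ids in the form’s `form` array point at the `_id` of component documents like the one just shown. `resolveFields`, `defaultValuesFor` and the `docsById` lookup are hypothetical names, not something taken from the platform.

```javascript
// Collect the field definitions of a form by following the ids in `form.form`
// to the component documents. `docsById` stands in for a lookup against the
// collection (for example a Map built from the result of a find()).
function resolveFields(form, docsById) {
  const fields = [];
  for (const ref of form.form) {
    const section = docsById.get(ref.id);
    if (!section) continue; // referenced section not loaded or deleted
    for (const component of section.components || []) {
      fields.push(component);
    }
  }
  return fields;
}

// Build default values for the form from an existing record,
// using an empty string for fields the record does not have.
function defaultValuesFor(fields, record = {}) {
  const values = {};
  for (const { field } of fields) {
    values[field] = record[field] ?? "";
  }
  return values;
}
```

The resulting object is the kind of thing that can be handed to react-hook-form as `defaultValues`, while each component’s `rules` can be passed along when the field is registered.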

Here you can see a form built and stored as above: EMIRATES SKATING CLUB


Thank you @minhna for your answer, it’s very important; now I’m a little bit closer to the shared collection pattern.

If sharding is not an option, the best approach depends on the number of tenants and data volume per tenant. Here’s a breakdown of the options without sharding:


:one: Shared Database, Shared Collection (One Collection for All Tenants)

:pushpin: All tenants share a single collection, distinguished by tenantId.

Example Structure

db.orders.insertOne({ tenantId: "tenant1", userId: "user123", orderId: 1, item: "Laptop" });
db.orders.insertOne({ tenantId: "tenant2", userId: "user456", orderId: 2, item: "Phone" });

Pros

:heavy_check_mark: Simple management – No need to create collections dynamically.
:heavy_check_mark: Efficient indexing – One index works for all tenants:

db.orders.createIndex({ tenantId: 1, userId: 1 });

:heavy_check_mark: Good for a smaller number of tenants (up to a few thousand) with moderate data per tenant.

Cons

:x: Performance issues at very large data volumes – Queries can slow down if the collection grows to hundreds of millions of documents.
:x: No strict data isolation – All tenants share the same collection.

:small_blue_diamond: When to choose?
:point_right: If you have fewer than 1,000 tenants and a moderate amount of data per tenant (up to tens of millions of records).
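
Because a shared collection has no strict isolation, the tenant filter has to be applied by the application on every call. One way to make that hard to forget is a thin wrapper; this is only a sketch, `scopeToTenant` is a hypothetical helper, and `collection` is assumed to be any object exposing `find(selector, options)` and `insert(doc)`.

```javascript
// Wrap a collection so every read and write is forced onto one tenant.
// tenantId is spread last, so a caller-supplied tenantId is overwritten.
function scopeToTenant(collection, tenantId) {
  return {
    find(selector = {}, options) {
      return collection.find({ ...selector, tenantId }, options);
    },
    insert(doc) {
      return collection.insert({ ...doc, tenantId });
    },
  };
}
```

Server code would then obtain a scoped handle once per request or method call and never touch the raw collection directly.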


:two: Shared Database, Separate Collections (Each Tenant Has Its Own Collection, Shared DB)

:pushpin: Each tenant has its own collection, e.g., orders_tenant1, orders_tenant2.

Example Structure

db.orders_tenant1.insertOne({ userId: "user123", orderId: 1, item: "Laptop" });
db.orders_tenant2.insertOne({ userId: "user456", orderId: 2, item: "Phone" });

Pros

:heavy_check_mark: Better data isolation – No risk of one tenant affecting another’s data.
:heavy_check_mark: Faster queries for a single tenant – MongoDB scans a smaller collection.

Cons

:x: Scalability issues with a high number of tenants – Managing thousands of collections can be inefficient.
:x: Higher memory usage – Each collection requires its own indexes.

:small_blue_diamond: When to choose?
:point_right: If you have 1,000 – 10,000 tenants, and each has a significant amount of data.
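
With per-tenant collections the application needs a single place that turns a tenant into a collection handle and reuses it, since Meteor, for instance, complains when the same collection name is instantiated twice. A minimal sketch, where `createCollectionRegistry` is a hypothetical helper and `createCollection` is whatever factory the app uses:

```javascript
// Cache of collection handles, keyed by the generated collection name.
function createCollectionRegistry(createCollection) {
  const cache = new Map();
  return function getCollection(tenantId, baseName) {
    const name = `${baseName}_${tenantId}`; // e.g. "orders_tenant1"
    if (!cache.has(name)) {
      cache.set(name, createCollection(name));
    }
    return cache.get(name);
  };
}
```

In a Meteor app the factory could be something like `(name) => new Mongo.Collection(name)`; the registry itself does not depend on Meteor.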


:three: Separate Database per Tenant (Each Tenant Has Its Own Database)

:pushpin: Each tenant has a completely separate database, e.g., tenant1_db, tenant2_db.

Example Structure

use tenant1_db;
db.orders.insertOne({ userId: "user123", orderId: 1, item: "Laptop" });

use tenant2_db;
db.orders.insertOne({ userId: "user456", orderId: 2, item: "Phone" });

Pros

:heavy_check_mark: Best data isolation – No shared indexes or risks of cross-tenant access.
:heavy_check_mark: Easier backup and archival – Each database can be managed separately.

Cons

:x: Not scalable for a large number of tenants – Managing 10,000+ databases is impractical.
:x: Hard to scale without sharding – Each database has its own size limitations.

:small_blue_diamond: When to choose?
:point_right: If you have fewer than 1,000 tenants, but each has a very large dataset (100+ million records).


:rocket: Summary: Which Solution is Best Without Sharding?

| Number of Tenants | Data per Tenant | Best Approach |
| --- | --- | --- |
| < 1,000 | Small/Medium | Shared Collection |
| 1,000 – 10,000 | Large Data Volume | Separate Collections |
| > 10,000 | Extremely Large Data | Difficult without sharding, sharding recommended |

:point_right: If sharding is not an option and you have a high number of tenants (>10,000) or large data volumes per tenant, scaling will be challenging.

If possible, Shared Collection remains the most efficient solution as long as the data volume stays within what a single unsharded deployment can comfortably handle (roughly terabytes). :rocket:

If sharding is not an option, the best choice depends on the number of tenants and data volume per tenant. Here’s a simple guide to help you decide:


:white_check_mark: Best Solution Without Sharding?

| Number of Tenants | Data per Tenant | Best Approach |
| --- | --- | --- |
| < 1,000 | Small/Medium | Shared Collection (1 collection for all tenants) |
| 1,000 – 10,000 | Large | Separate Collections (1 collection per tenant) |
| > 10,000 | Very Large | Difficult without sharding, consider sharding |

:trophy: Recommended Solution for Most Cases: Shared Collection

If you have less than 10,000 tenants, the best option is a single shared collection where each document includes tenantId.

Why?

:heavy_check_mark: Easier to manage – No need to create thousands of collections.
:heavy_check_mark: Efficient indexing – A single index can serve multiple tenants.
:heavy_check_mark: Better query performance – MongoDB optimizes queries when properly indexed.

When to Avoid It?

:x: If you have millions of records per tenant, queries might slow down without sharding.


:trophy: When to Use Separate Collections?

If you have 1,000+ tenants and large data per tenant, separate collections (orders_tenant1, orders_tenant2, etc.) can improve performance.

Why?

:heavy_check_mark: Better query performance – Each query targets a smaller dataset.
:heavy_check_mark: More isolation – Each tenant’s data is in its own collection.

Downsides?

:x: MongoDB struggles with too many collections (10,000+ collections can cause indexing issues).


:trophy: When to Use Separate Databases?

If you have fewer than 1,000 tenants but each has a huge dataset (100M+ records), separate databases (tenant1_db, tenant2_db) offer full isolation.

Why?

:heavy_check_mark: Best for high-volume tenants – No shared indexes, better scalability.
:heavy_check_mark: Easier backup and archiving – Each tenant can be managed separately.

Downsides?

:x: Not scalable beyond 1,000 tenants – Managing thousands of databases is impractical.


:dart: Final Answer

:one: If you have <10,000 tenants: use Shared Collection (best for performance & management).
:two: If you have large data per tenant (100M+ records per tenant): use Separate Collections.
:three: If you have very few but extremely large tenants: use Separate Databases.

:rocket: Without sharding, Shared Collection is the best balance between simplicity and scalability.

Given your setup (<1000 tenants, ~50 dynamic forms per tenant), the best approach without sharding is Shared Collection.

Why?

:heavy_check_mark: Efficient Indexing – A single collection with indexes on tenantId and formId will keep queries fast.
:heavy_check_mark: Easier Management – No need to create and maintain thousands of collections.
:heavy_check_mark: Scalable Enough – As long as individual forms don’t store massive amounts of data, performance will remain good.

Recommended Schema

db.forms.insertOne({
  tenantId: "tenant1",
  formId: "form123",
  name: "Customer Feedback",
  fields: [{ label: "Name", type: "text" }, { label: "Rating", type: "number" }]
});

Indexing Strategy

db.forms.createIndex({ tenantId: 1, formId: 1 });

This ensures fast lookups per tenant while keeping index memory usage low.
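
The schema above only covers the form definition. A sketch of how a submission could be checked against that definition before it is stored in a shared data collection follows; `buildSubmission`, the `values` sub-document and the two supported types are assumptions for illustration.

```javascript
// Check user input against the form definition and build the document
// that would be stored. Fields not declared on the form are dropped.
function buildSubmission(form, input) {
  const values = {};
  const errors = [];
  for (const { label, type } of form.fields) {
    const value = input[label];
    if (value === undefined) continue; // field left empty
    const valid =
      type === "number"
        ? typeof value === "number" && Number.isFinite(value)
        : typeof value === "string";
    if (valid) {
      values[label] = value;
    } else {
      errors.push(`${label}: expected ${type}`);
    }
  }
  if (errors.length > 0) {
    throw new Error(errors.join("; "));
  }
  return { tenantId: form.tenantId, formId: form.formId, values };
}
```

Every stored document then carries `tenantId` and `formId`, so the same compound index pattern shown above can serve the data collection as well.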


Thank you very much, @paulishca, for the data and the production example!
Ice hockey is the most popular sport in our country :smiley:

The most important thing for me is how you store the data from your forms. In my case I have multiple forms (about 50 forms for each tenant), and I’m currently storing all data from all forms in a single collection. But I’m wondering whether that is a good approach, or whether it would be better to split the form data into separate collections: 1 dynamically created form = 1 dynamically created collection.

…in my case, forms use data from other related forms, for example:

  • fruitTypes {title, description, …another fields}
  • fruits {title, description, => fruitTypeId (dropDown item related to fruitTypes with {_id, title} fields) <= }
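
A sketch of how such a relation could be served from a single shared collection, assuming every stored document carries a `formId` next to its own fields. The function name `dropdownOptions` and the `{ value, label }` option shape are hypothetical.

```javascript
// Turn the documents of a related form (e.g. "fruitTypes") into drop-down
// options for a field such as fruitTypeId on another form.
function dropdownOptions(docs, relatedFormId, labelField = "title") {
  return docs
    .filter((doc) => doc.formId === relatedFormId)
    .map((doc) => ({ value: doc._id, label: doc[labelField] }));
}
```

In practice the filter would be a query on `formId` (plus the tenant) rather than an in-memory scan; the point is only that the related form is identified by a field value instead of by a collection name.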

Thanks a lot for any suggestions.

In my country the national sport is Mall-ing (this is UAE :slight_smile: )

Ok, I will soon build something for Cycling and maybe this can give you some ideas.
Looking at the example you give here, it looks like e-commerce, or purchase order processing or maybe delivery app/services.
I am not sure what you use forms for. Is it more to create products, or rather to order them?

I see some things here that need to be considered.

  1. Who owns the item, the form, the inventory, etc.?
  2. Is the item open quantity/price (or without a bar code, for instance)?

Example Structure

db.orders.insertOne({ tenantId: "tenant1", userId: "user123", orderId: 1, item: "Laptop" }); 
db.orders.insertOne({ tenantId: "tenant2", userId: "user456", orderId: 2, item: "Phone" });

Use itemId instead of Laptop, Phone, because this Laptop might be sold by multiple tenants and you want to help everyone with the correct photo, name, P/N, and specs. Phone and Laptop would exist in the SKUs collection.

Look, an example from my cycling platform:

You can sell a Cannondale bike such as mine by saying: Cannondale Trail 3, 2019 black (you created the item). But I don’t want you to struggle with insufficient data or trying to get it from different places and get it wrong or find a photo on the net with a bearded fat guy in colorful Lycra.
So I give you all that by being the owner of all the SKUs and you take it from me:

It could be me, or it could be another user who needs a new SKU and adds it to your platform, and you just approve it into the “global SKUs” of the platform.

I would use a collection of items with SKU. These are “official” items, they cannot be created or destroyed by others (like gov taxation).

Then I would use a separate collection with more flexible items and standard fields for category, subcategory, type of units (pcs, kg), etc.
Fruits is the category; fruitTypes would be subcategories, I guess.
Open price items can be owned by the one who creates them; you may have 1000 different types of 1 kg of apples.
However, SKUs are unique; they belong to the platform; others can only own an inventory of the SKU but not the SKU. I could buy the exact condom from 10 different shops. Just to clarify … I blow them (with air) and can make rabbit, giraffes or the MeteorJS logo.
All collections should be on the same cluster (DB) so you can use aggregations.

When the information object is an order (purchase order or deliveries), you can save it in a collection of orders. The reference to the item itself is via the itemId, and you can get its data with an aggregation (get children but avoid getting children of children, which should not be your case anyway).
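
A sketch of the kind of aggregation meant here, assuming orders store an `itemId` and the items live in a collection named `products` in the same database. The function name and the collection and field names are illustrative assumptions; `$match`, `$lookup` and `$unwind` are standard aggregation stages.

```javascript
// Build a pipeline that loads one order together with the item it references.
// Only one level is joined (the order's item), no children of children.
function orderWithItemPipeline(tenantId, orderId) {
  return [
    { $match: { tenantId, orderId } },
    {
      $lookup: {
        from: "products",      // assumed name of the SKU/open item collection
        localField: "itemId",
        foreignField: "_id",
        as: "item",
      },
    },
    // $lookup yields an array; unwind it, keeping orders whose item is missing.
    { $unwind: { path: "$item", preserveNullAndEmptyArrays: true } },
  ];
}
```

In Meteor this could be run on the server through the raw driver collection, for example `Orders.rawCollection().aggregate(pipeline).toArray()`, where `Orders` is a hypothetical collection.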

Where to put data:

  1. SKUs and open items all in one collection (let’s say Products), indexed by category, subcategory, ownerId. (If no owner Id, the platform is the owner).
  2. Inventories of those items in per-tenant collection or all in one collection. For instance: the price, available quantity, rating etc, stay in the inventory (or products) collection. The item of an inventory collection references a product in the SKU/ open item collection.
  3. All form components in one DB. You would have fields, selectors, checkpoints (interaction), titles, subtitles, headers, dividers etc - items for only presentation and no interaction.
  4. Form components with interaction follow the SKU/product schema and the inventory schema.

In a form you could have:

// fields

{ label: "Name", type: "text" }, // as per SKU schema
{ label: "Category", type: "text" }, // as per SKU schema
{ label: "Subcategory", type: "text" }, // as per SKU schema
{ label: "Rating", type: "number" }, // as per Inventory schema
{ label: "Price", type: "number" }, // as per inventory schema. 
// I call it an inventory schema but it can very well be the Products collection while the SKU collection contains the items from which a product can be "built".

One other thing I would consider is maintainability and scalability. For analytics (for instance, the average price of apples over time, over regions, rural vs small urban vs large urban), all this data should ideally be in the same small set of collections, or, as you operate the platform, you can copy analytical data to specific collections for small or big data querying.

Ok, I might have gone off topic here; your question might have been simpler or a whole different question.
You would use sharding if you cross borders and have specific data, such as citizens’ private information, that needs to stay within the country, or if you need to be close to your users (you have a client in France and one in Australia), e.g. users in collections in each country, SKUs shared by both.

Hello @paulishca, thanks a lot for your longer response. Sometime I plan to travel to the UAE, probably for the next Dubai Expo. Now back to the app…

In my case I have 3 different points:

Point1:
Static or semi-static data structure:
In my case, some data from the “fixed” logic sits in separate “fixed” collections, for example “projects”, “invoices”, “products”, etc. But the forms related to these collections will be dynamically generated: each form field will be dynamically generated, and the fields that some “code logic” depends on will be fixed. You will be able to work with them, set their design, etc., but they still have to be preset in the form. The remaining fields, without logic, will be fully flexible.

Point2:
Fully flexible and configurable structure:
I’m able to:

  • create fully configurable forms (for example “customers”)
  • create fully configurable form sub-fields,
  • create fully configurable sub-forms (for example “customerType”)
  • create a sub-form field in the parent form (for example customers: { customerTypeId: “_id of customerType” })

Point3:
Currently not solved (data acquisition data):
Store data acquisition information, approx. 500K new collection items per day per customer (although my app supports multiple tenants, all app instances will run separately; running them all on a single instance is risky, I think).
I also want to make the structure of these data acquisition collections fully configurable.

And now, I have tons of questions: :smiley:

### Question/Problem 1:
I need to make a decision about the data model and choose between 2 approaches: store into multiple collections or store into a single collection.

  • If I later end up with some functionality that is “very hard” to make universal and modular, I will still be able to “hard code” that functionality, and the collections will stay almost unchanged (if I store each form’s data in a separate collection; I think this is a strong argument).

### Question/Problem 2:
Searching for a solution: how to store relations between forms to ensure data integrity when I remove some data. For example, if I remove a customer, but that customer is “linked/used” as “customerId” in multiple collections.
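
One possible direction, sketched under assumptions: keep the relations between forms as data, and check them before a delete. The relation shape (`fromForm`, `field`, `toForm`), the form names and the helper names are hypothetical, and the in-memory `docsByForm` object stands in for real queries.

```javascript
// Declared relations: which field of which form points at which other form.
const exampleRelations = [
  { fromForm: "orders", field: "customerId", toForm: "customers" },
  { fromForm: "invoices", field: "customerId", toForm: "customers" },
];

// Return the documents that still reference document `id` of form `formId`.
function findReferences(relations, docsByForm, formId, id) {
  const hits = [];
  for (const rel of relations) {
    if (rel.toForm !== formId) continue;
    for (const doc of docsByForm[rel.fromForm] || []) {
      if (doc[rel.field] === id) {
        hits.push({ form: rel.fromForm, _id: doc._id });
      }
    }
  }
  return hits;
}

// A delete is allowed only when nothing references the document any more.
function canRemove(relations, docsByForm, formId, id) {
  return findReferences(relations, docsByForm, formId, id).length === 0;
}
```

Against a database, the inner loop would become one count query per relation; the same relation list could also drive a cascade or a "set to null" policy instead of blocking the delete.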

### Question/Problem 3:
Searching for a solution: some data structure to define “very robust” form validation based on field values from other related forms (this is related to Question 2), combined with the current user’s role permissions, etc.
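
A sketch of what such a data structure might look like, assuming validation rules are stored as plain objects next to the form definition. The rule keys `editableBy` and `mustExistIn`, the `context` shape and the function name are invented for illustration only.

```javascript
// Evaluate declarative rules against submitted values.
// context.roles: roles of the current user
// context.relatedDocs: documents of related forms, keyed by form name
function validateAgainstRules(rules, values, context) {
  const errors = [];
  for (const rule of rules) {
    const value = values[rule.field];
    if (value === undefined) continue; // nothing submitted for this field

    // Role check: only listed roles may set this field.
    if (rule.editableBy && !rule.editableBy.some((role) => context.roles.includes(role))) {
      errors.push(`${rule.field}: not allowed for current role`);
    }

    // Cross-form check: the value must be the _id of a related document.
    if (rule.mustExistIn) {
      const related = context.relatedDocs[rule.mustExistIn] || [];
      if (!related.some((doc) => doc._id === value)) {
        errors.push(`${rule.field}: no matching ${rule.mustExistIn} document`);
      }
    }
  }
  return errors;
}
```

Because the rules are data, they can be edited in the same form designer as the fields and evaluated on both client and server.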

### Question/Problem 4:
Searching for a solution (currently I have no idea): how to manage triggers (when I make a CRUD operation in some collection) and how to define a “calculation formula” with logic such as: "sum(fieldX) in form1 and store it somewhere else, in some fieldY in form2".
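
A minimal sketch of the "sum(fieldX) in form1, store into fieldY in form2" example, assuming the formula is stored as a small rule object and that form1 documents point at their form2 document through a link field. The rule keys and the function name `applySumRule` are hypothetical.

```javascript
// Recompute a stored aggregate for one target document.
// rule.sourceField: field to sum on the source (form1) documents
// rule.linkField:   field on source documents holding the target's _id
// rule.targetField: field on the target (form2) document to write
function applySumRule(rule, sourceDocs, targetDoc) {
  const total = sourceDocs
    .filter((doc) => doc[rule.linkField] === targetDoc._id)
    // Missing or non-numeric values count as 0.
    .reduce((sum, doc) => sum + (Number(doc[rule.sourceField]) || 0), 0);
  return { ...targetDoc, [rule.targetField]: total };
}
```

The "trigger" part is then just a matter of calling this from wherever form1 writes go through, for example the server method that inserts, updates or removes form1 documents.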

### Question/Problem 5:
…all the others :smiley: