Anyone tried mongodb.com's Atlas Search?

a.com · May 6, 2020, 11:17pm

Is this like a (backend) alternative to Algolia?

Anyone given it a go?

I usually do a fuzzy search just in native Mongo… not great, but it’s passable for apps that aren’t search engines.

mullojo · May 7, 2020, 3:22am

Oh, this looks cool! I have some smaller search algorithms that I built manually after loading data in my client memory, but I have a scaling limitation with that, so I’ve been open to new search solutions.

A proper search indexing system built into MongoDB is the ideal concept in my mind; hopefully it’s one less thing I need to do.

I’ll try this with my app soon

mullojo · May 7, 2020, 3:45am

Wow, so easy: just a couple of clicks inside Atlas to turn on the search indexes, then to run a search you just use an aggregate. I’ve used many of these already for other data analysis, so easy!

Add search to any Meteor app now in 5 min

db.movies.aggregate([
  {
    $search: {
      "text": {
        "query": "baseball",
        "path": "plot"
      }
    }
  },
  {
    $limit: 5
  },
  {
    $project: {
      "_id": 0,
      "title": 1,
      "plot": 1
    }
  }
])

mullojo · May 7, 2020, 3:57am

Docs are here if anyone is feeling lazy:

Use rawCollection() to query MongoDB with an aggregate function…

See Meteor Docs if needed: Collections | Meteor API Docs
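For anyone wiring this into server-side Meteor code, here is a minimal sketch of the movies pipeline from the earlier post wrapped in a helper. The name `buildSearchPipeline`, its parameters, and the `Movies` collection in the usage comment are made up for illustration; only `$search`, `$limit`, `$project` and `rawCollection()` come from the thread.

```javascript
// Hypothetical helper: builds the same three-stage pipeline as the movies example.
function buildSearchPipeline(query, path, limit = 5) {
  return [
    // Full-text match of `query` against the field named by `path`.
    { $search: { text: { query, path } } },
    { $limit: limit },
    // Return only the title and the searched field, without _id.
    { $project: { _id: 0, title: 1, [path]: 1 } },
  ];
}

// Usage inside server-side Meteor code (Movies is an assumed Mongo.Collection):
//   const docs = await Movies.rawCollection()
//     .aggregate(buildSearchPipeline('baseball', 'plot'))
//     .toArray();
```

Keeping the pipeline in a plain function makes it easy to log or unit test without a database connection.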

marklynch · May 7, 2020, 9:25am

We’re using this and really impressed by it. With the regular text search we had to generate ngrams to make it work “as you type” but with this search it’s much easier. The only downside I see (well… other than the lock-in to Atlas) is that the searchBeta stage needs to be the first one in your aggregation so if you run a multi-tenant system like us that’s a very inefficient way to find results.
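The post above doesn’t say how the ngrams were generated. As a rough sketch of one common approach, the helper below produces the prefixes ("edge n-grams") of each word, which could be stored in an extra text-indexed field so that a partially typed word matches. The name `edgeNgrams` and the `minLength` parameter are assumptions for illustration.

```javascript
// Hypothetical helper: returns the unique prefixes of every word in `text`,
// lowercased, starting at `minLength` characters.
function edgeNgrams(text, minLength = 2) {
  const grams = new Set();
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  for (const word of words) {
    for (let end = minLength; end <= word.length; end += 1) {
      grams.add(word.slice(0, end));
    }
  }
  // A Set keeps insertion order, so prefixes come out in the order they were built.
  return [...grams];
}
```

With Atlas Search this extra field is no longer needed, which is the simplification being described.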

hemalr87 · November 28, 2020, 5:17am

Hmm, this sounds promising. Trying to experiment with it to replace some in-code search processes we have that are getting a bit slow as our collection grows.

For some reason though, I keep getting an error:

MongoError: Unrecognized pipeline stage name: '$search'

I created a simple collection called Notifications for this test, with a single field: ‘title’ and added some dummy documents to it.

The rawCollection/aggregate part seems to be working okay because the below code returns the single matched document:

try {
	const res = await Notifications.rawCollection()
		.aggregate([
			{
				$match: {
					_id: 'EAMo8wdqMzWKN96da',
				},
			},
		])
		.toArray();
	console.log(res); // Single document returned and logged
} catch (err) {
	console.error(err);
}

I’ve set up a search index on the ‘title’ field via the Atlas UI as per the tutorial instructions. The code below, however, throws the error mentioned above.

try {
	const res = await Notifications.rawCollection()
		.aggregate([
			{
				$search: {
					text: {
						query: 'event',
						path: 'title',
					},
				},
			},
		])
		.toArray();
	console.log(res);
} catch (err) {
	console.error(err); // MongoError: Unrecognized pipeline stage name: '$search'
}

The db is running version 4.2, and the index is building successfully.

Have I missed anything obvious here?

Edit - Would this be a case of certain outdated packages? If so, which ones? (I’m on Meteor 1.11 - so the drivers should have support for Mongo version 4.2).

rjdavid · November 29, 2020, 3:23am

Are you using Atlas? If I remember this correctly, this is still not available outside Atlas.

hemalr87 · November 29, 2020, 3:58am

Yup, this is on Atlas. I confirmed that all the settings on Atlas are correct by setting up a stand-alone node.js app and running the same code using the native node mongo drivers.

So this code works and returns the results as expected:

const MongoClient = require('mongodb').MongoClient;
const uri = '<MongoURL>'; // placeholder for the Atlas connection string
const client = new MongoClient(uri, { useNewUrlParser: true });
client.connect(async (err) => {
	if (err) return console.error(err); // connection failed, nothing to query
	try {
		const res = await client
			.db('meteor')
			.collection('notifications')
			.aggregate([
				{
					$search: {
						text: {
							query: 'event',
							path: 'title',
						},
					},
				},
			])
			.toArray();
		console.log(res); // Logs a list of search result documents
		console.log(`${res.length} results`);
	} catch (error) {
		console.error(error);
	}
	client.close();
});

I tried to mimic a workaround by using the mongo client within the Meteor server:

import mongodb from 'mongodb';
...
const client = new mongodb.MongoClient(uri, { useNewUrlParser: true }); // Error: <rejected> TypeError: MongoClient is not a constructor

Haven’t figured out why that isn’t working, because I was under the impression that Meteor ships with the native Node.js mongodb driver. @mullojo and @marklynch got this working though, so the fault must be in my Meteor code somewhere.

a.com · November 29, 2020, 1:03pm

hemalr87 · November 29, 2020, 11:13pm

Cheers, for that.

I went ahead and gave that timestamped code from your first suggested video a try:

try {
	const res = await Notifications.rawCollection()
		.aggregate([
			{
				$search: {
					autocomplete: {
						query: 'event',
						path: 'title',
					},
				},
			},
		])
		.toArray();
	console.log(res);
} catch (err) {
	console.error(err); // Same error: MongoError: Unrecognized pipeline stage name: '$search'
}

Unfortunately the error persists: MongoError: Unrecognized pipeline stage name: '$search'

I just saw your other Atlas Search related thread so I imagine you have got it working on your end. Is there anything I’m missing from this code snippet? Or is there any chance you can share a snippet from your end that’s working for you?

Thanks in advance!

a.com · November 30, 2020, 12:30am

You have to make sure you set up the index on Atlas like in the second video with Karen.

Then this works for me:

export default async (root, args) => {
  try {
    const searchStep = {
      $search: {
        text: {
          query: args.searchText,
          path: 'name',
          fuzzy: {
            maxEdits: 1,
          },
        },
        highlight: {
          path: 'name',
        },
      },
    };

    let steps = [searchStep];

    const res = await MasterProducts.aggregate([
      ...steps,
      {$limit: args.limit || 25},
    ]);

    const countQuery = await MasterProducts.aggregate([
      ...steps,
      {$limit: 250}, // we limit the count query because otherwise it'll take forever, as it sometimes returns 10k+ documents... the assumption is nobody will ever infinite scroll through more than 250 search results
      {$count: 'count'},
    ]);

    return {
      result: res,
      count: (countQuery[0] && countQuery[0].count) || 0,
    };
  } catch (err) {
    console.log(err);
  }
};

Note that this is in a Node.js app with Mongoose, though.

hemalr87 · November 30, 2020, 12:49am

This makes sense then - when I run my exact code on a nodejs app (with the native mongodb npm package), the code runs fine. Something about Meteor’s implementation of it seems to be causing this but I’m not sure what because as far as I can tell, Meteor’s drivers are using either the 3.6.2 or the 3.6.3 nodejs drivers, both of which should be supporting this feature.

@a.com - Question out of interest - why do you use mongoose vs Meteor’s version?

a.com · November 30, 2020, 1:28am

I don’t use meteor anymore… just vanilla nodejs, mongoose and accountsjs.

But Meteor’s much more active now, so I think it’s worth opening a ticket about this on the repo (or wherever the new team is watching).

hemalr87 · November 30, 2020, 2:30am

Ah right.

I think it’s worth opening a ticket about this on the repo

Yeah I might do this. I generally come here first and try to ensure I’m not missing something obvious on my end.

Cheers

radekmie · November 30, 2020, 8:00am

We’ve also tried it, but it never got past the design phase, as it’s impossible to search across multiple collections (yet). The requirement that $search be the first stage is also a possible performance loss, especially for multi-tenant systems. Hopefully, both are already on their feature request list.

Also, if you already have some search in the app (in our case, a very large pipeline full of $lookups, built with SparrowQL, plus a text query transformed into a set of regexes), remember to compare both approaches. Of course, it’d be really hard to implement a proper scoring algorithm, but the performance of the existing approach may be surprising. For example, if your query can be narrowed well (e.g., a set of IDs for multi-tenant systems, or some date range) and indexed, then it may be faster (as it was for us).
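To make the regex approach concrete, here is a rough sketch of a narrowed `$match` stage. It is not the SparrowQL pipeline described above: the helper names, the `tenantIds` field and the default `title` field are all assumptions for illustration.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: the document must belong to one of the given tenants
// (an indexable condition that narrows the scan) and every word of the search
// text must appear somewhere in `field`, case-insensitively.
function buildRegexMatch(searchText, tenantIds, field = 'title') {
  const words = searchText.trim().split(/\s+/).filter(Boolean);
  const match = { tenantIds: { $in: tenantIds } };
  if (words.length > 0) {
    // $and must not be an empty array, so it is only added when there are words.
    match.$and = words.map((word) => ({
      [field]: { $regex: escapeRegex(word), $options: 'i' },
    }));
  }
  return { $match: match };
}
```

Unlike `$search`, this stage can sit anywhere in a pipeline, but it gives no relevance scoring.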

hemalr87 · November 30, 2020, 9:23am

Cheers mate. Question - did you manage to get the $search working on Meteor in your trials?

Yes that limitation of it needing to be the first stage of an aggregation seems odd. I can’t imagine they haven’t thought of it - there must be a reason for it. I suspect it’s to do with the fact that they are different processes.

I just voted on the feature request (direct link for anyone to add their voice, if interested). I also came across this SO question and the strategy of using the ‘filter’ seems like it would provide a good level of performance for a multi-tenant system even with the $search having to come first.
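As a sketch of what the ‘filter’ strategy could look like (not copied from the SO answer), the helper below builds a `compound` search with a `filter` clause for the tenant. The helper name and the `tenantId` field are hypothetical, and it assumes that field is indexed in a way that matches the ID as a whole, for example with a keyword analyzer.

```javascript
// Hypothetical helper: full-text match on `path`, restricted to a single tenant.
// Assumes `tenantId` is a string field in the search index that is matched
// as one whole token rather than being split into words.
function buildTenantSearchStage(searchText, tenantId, path = 'title') {
  return {
    $search: {
      compound: {
        // `must` clauses have to match and contribute to the relevance score.
        must: [{ text: { query: searchText, path } }],
        // `filter` clauses have to match but do not contribute to the score.
        filter: [{ text: { query: tenantId, path: 'tenantId' } }],
      },
    },
  };
}
```

The stage is still the first one in the pipeline, but the tenant restriction is applied inside the search itself instead of in a later `$match`.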

radekmie · November 30, 2020, 9:39am

I can’t recall now, but I’m pretty sure it did not reach the app code at all.

The reason this stage has to be the first one is (I guess) that Lucene is queried only once, and then the data is streamed into the MongoDB pipeline. I think it is possible to remove this restriction, but it may be complex to do so without a performance loss.

And about using filter for solving multi-tenancy queries: it depends on your multi-tenancy scheme. In our case, it is a list of tenant IDs on each document. Now, using text search on IDs may not work. We haven’t performed any extensive tests, but I can imagine an unsolvable class of hard-to-debug problems regarding partial matches on IDs.

hemalr87 · November 30, 2020, 10:01am

Yup makes sense.

I was going to experiment with using the keyword analyzer for the IDs that need to match - https://docs.atlas.mongodb.com/reference/atlas-search/analyzers/keyword/#ref-keyword-analyzer

And then writing some code to double-check that the results are correct. I don’t know what the performance cost of double handling the data like that would be, but I’d like to find out. If it isn’t too big, it’ll provide some assurance, and some data, on whether $search is returning any results that it shouldn’t.

Edit - reading the ‘filter’ docs I may have misunderstood how to go about this. Will figure it out eventually…

marklynch · November 30, 2020, 12:33pm

@hemalr87 - It’s been working for us for a long time now so don’t think it’s a Meteor issue. We’re using mongo@1.10.0 on Meteor 1.11.1. Sorry to ask such an obvious question but could you have forgotten to point the mongo_url to the atlas db running 4.2? IIRC the “Unrecognized pipeline stage name: '$search'” error comes from the db.

Thanks for that. Will check it out. That’s been our main issue with Atlas search as well.

hemalr87 · November 30, 2020, 1:06pm

Heh obvious question (and I think I have) but it’s a new cluster so I may have set things up wrong? I’ll go double check!

Thanks for confirming that the issue is on my end though

Edit:

Sorry to ask such an obvious question but could you have forgotten to point the mongo_url to the atlas db running 4.2

Day just starting here in Aus and, very embarrassingly, this was indeed the case! I had forgotten to put quotation marks around my MONGO_URL when starting the app and didn’t realise that it would fail silently and revert to the local mongo.
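A startup check along these lines could catch that kind of silent fallback. This is only a sketch: the helper names are made up, and it assumes Atlas hosts end in `.mongodb.net`.

```javascript
// Hypothetical helper: extracts the host names from a MongoDB connection string,
// dropping credentials, ports, the database name and query options.
function hostsFromMongoUrl(url) {
  const match = /^mongodb(?:\+srv)?:\/\/(?:[^@/]*@)?([^/?]+)/.exec(url || '');
  if (!match) return [];
  return match[1]
    .split(',')
    .map((host) => host.replace(/:\d+$/, '').toLowerCase());
}

// Hypothetical helper: true only if every host looks like an Atlas host.
// Assumption: Atlas clusters are reachable under *.mongodb.net.
function looksLikeAtlas(url) {
  const hosts = hostsFromMongoUrl(url);
  return hosts.length > 0 && hosts.every((host) => host.endsWith('.mongodb.net'));
}

// Usage at server startup:
//   if (!looksLikeAtlas(process.env.MONGO_URL)) {
//     console.warn('MONGO_URL does not point at Atlas; $search will not work.');
//   }
```

The check only inspects the connection string, so it works before any connection is made.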

$search has lift off!

Thank you so much @marklynch