Docs are here if anyone is feeling lazy:
Use rawCollection() to query MongoDB with an aggregate function…
See Meteor Docs if needed: Collections | Meteor API Docs
We’re using this and really impressed by it. With the regular text search we had to generate ngrams to make it work “as you type” but with this search it’s much easier. The only downside I see (well… other than the lock-in to Atlas) is that the searchBeta stage needs to be the first one in your aggregation so if you run a multi-tenant system like us that’s a very inefficient way to find results.
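For anyone who hasn't seen the ngram approach mentioned above, here is a rough sketch of what generating prefix ("edge") ngrams can look like. The function name, the length limits and the idea of storing the output in a separate text-indexed field are assumptions for illustration, not necessarily how the poster did it.

```javascript
// Hypothetical helper: builds prefix ("edge") n-grams for a field value so that
// a regular text index can match partial words as the user types.
function edgeNgrams(text, minLength = 2, maxLength = 15) {
  const grams = new Set();
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  for (const word of words) {
    const upper = Math.min(word.length, maxLength);
    // Collect every prefix of the word from minLength up to the full word.
    for (let len = minLength; len <= upper; len += 1) {
      grams.add(word.slice(0, len));
    }
  }
  return [...grams];
}

// The joined string could be saved next to the document, e.g. in a
// hypothetical `titleNgrams` field that the text index covers.
console.log(edgeNgrams('Event reminder').join(' '));
```

With Atlas Search, the autocomplete operator takes over this job, so the extra field and the code that maintains it are no longer needed.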
Hmm, this sounds promising. Trying to experiment with it to replace some in-code search processes we have that are getting a bit slow as our collection grows.
For some reason though, I keep getting an error:
MongoError: Unrecognized pipeline stage name: '$search'
I created a simple collection called Notifications for this test, with a single field: ‘title’ and added some dummy documents to it.
The rawCollection/aggregate part seems to be working okay because the below code returns the single matched document:
try {
const res = await Notifications.rawCollection()
.aggregate([
{
$match: {
_id: 'EAMo8wdqMzWKN96da',
},
},
])
.toArray();
console.log(res); // Single document returned and logged
} catch (err) {
console.error(err);
}
I’ve set up a search index on the ‘title’ field via the Atlas UI as per the tutorial instructions. The code below, however, brings up the error mentioned above.
try {
const res = await Notifications.rawCollection()
.aggregate([
{
$search: {
text: {
query: 'event',
path: 'title',
},
},
},
])
.toArray();
console.log(res);
} catch (err) {
console.error(err); // MongoError: Unrecognized pipeline stage name: '$search'
}
The db is running version 4.2, and the index is building successfully.
Have I missed anything obvious here?
Edit - Would this be a case of certain outdated packages? If so, which ones? (I’m on Meteor 1.11 - so the drivers should have support for Mongo version 4.2).
Are you using Atlas? If I remember this correctly, this is still not available outside Atlas.
Yup, this is on Atlas. I confirmed that all the settings on Atlas are correct by setting up a stand-alone node.js app and running the same code using the native node mongo drivers.
So this code works and returns the results as expected:
const MongoClient = require('mongodb').MongoClient;
const uri = '<MongoURL>'; // placeholder: replace with your Atlas connection string
const client = new MongoClient(uri, { useNewUrlParser: true });
client.connect(async (err) => {
try {
const res = await client
.db('meteor')
.collection('notifications')
.aggregate([
{
$search: {
text: {
query: 'event',
path: 'title',
},
},
},
])
.toArray();
console.log(res); // Logs a list of search result documents
console.log(`${res.length} results`);
} catch (error) {
console.error(error);
}
client.close();
});
I tried to mimic a workaround by using the mongo client within the Meteor server:
import mongodb from 'mongodb';
...
const client = new mongodb.MongoClient(uri, { useNewUrlParser: true }); // Error: <rejected> TypeError: MongoClient is not a constructor
Haven’t figured out what/why that isn’t working because I was under the impression that Meteor ships with the native node mongodb driver. @mullojo and @marklynch got this working though so the fault must be in my Meteor code somewhere.
Cheers, for that.
I went ahead and gave that timestamped code from your first suggested video a try:
try {
const res = await Notifications.rawCollection()
.aggregate([
{
$search: {
autocomplete: {
query: 'event',
path: 'title',
},
},
},
])
.toArray();
console.log(res);
} catch (err) {
console.error(err); // Same error: MongoError: Unrecognized pipeline stage name: '$search'
}
Unfortunately the error persists: MongoError: Unrecognized pipeline stage name: ‘$search’
I just saw your other Atlas Search related thread so I imagine you have got it working on your end. Is there anything I’m missing from this code snippet? Or is there any chance you can share a snippet from your end that’s working for you?
Thanks in advance!
You have to make sure you set up the index on Atlas like in the second video with Karen.
Then this works for me
export default async (root, args) => {
try {
const searchStep = {
$search: {
text: {
query: args.searchText,
path: 'name',
fuzzy: {
maxEdits: 1,
},
},
highlight: {
path: 'name',
},
},
};
let steps = [searchStep];
const res = await MasterProducts.aggregate([
...steps,
{$limit: args.limit || 25},
]);
const countQuery = await MasterProducts.aggregate([
...steps,
{$limit: 250}, // we cap the count query because otherwise it'll take forever, as it sometimes returns 10k+ documents... the assumption is nobody will ever infinite-scroll through more than 250 search results
{$count: 'count'},
]);
return {
result: res,
count: (countQuery[0] && countQuery[0].count) || 0,
};
} catch (err) {
console.log(err);
}
};
Note this is in a nodejs app with mongoose though
This makes sense then - when I run my exact code on a nodejs app (with the native mongodb npm package), the code runs fine. Something about Meteor’s implementation of it seems to be causing this but I’m not sure what because as far as I can tell, Meteor’s drivers are using either the 3.6.2 or the 3.6.3 nodejs drivers, both of which should be supporting this feature.
@a.com - Question out of interest - why do you use mongoose vs Meteor’s version?
I don’t use meteor anymore… just vanilla nodejs, mongoose and accountsjs.
But Meteor’s much more active now, so I think it’s worth opening a ticket about this on the repo (or wherever the new team is watching).
Ah right.
I think it’s worth opening a ticket about this on the repo
Yeah I might do this. I generally come here first and try to ensure I’m not missing something obvious on my end.
Cheers
We’ve also tried it, but it never got out of the design phase, as it’s impossible to search across multiple collections (yet). The requirement that $search be the first stage is also a possible performance loss, especially for multi-tenant systems. Hopefully, both are already on their feature request list.
Also, if you already had some search in the app (in our case, a very large pipeline full of $lookups, built with SparrowQL, with the text query transformed into a set of regexes), remember to compare both approaches. Of course, it’d be really hard to implement a proper scoring algorithm that way, but its performance may be surprising. For example, if your query can be narrowed well (e.g., a set of IDs for multi-tenant systems, or some date range) and the narrowing fields are indexed, then it may be faster (that was the case for us).
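To illustrate the "narrow first, then regex" idea, here is a minimal sketch. It is not our actual SparrowQL pipeline; the tenantIds field name and both helper names are made up for the example.

```javascript
// Escape user input so it is matched literally inside a regular expression.
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical builder: narrows by tenant first (a condition an ordinary
// index can serve), then requires every search term to match `field`.
function buildNarrowedMatch(tenantIds, text, field) {
  const terms = text.trim().split(/\s+/).filter(Boolean);
  const stage = { $match: { tenantIds: { $in: tenantIds } } };
  if (terms.length > 0) {
    stage.$match.$and = terms.map((term) => ({
      [field]: { $regex: escapeRegExp(term), $options: 'i' },
    }));
  }
  return stage;
}

console.log(JSON.stringify(buildNarrowedMatch(['tenant-1'], 'event remind', 'title'), null, 2));
```

The case-insensitive, unanchored regexes cannot use an index efficiently on their own, so this only pays off when the tenant (or date) condition already cuts the candidate set down to something small.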
Cheers mate. Question - did you manage to get the $search working on Meteor in your trials?
Yes that limitation of it needing to be the first stage of an aggregation seems odd. I can’t imagine they haven’t thought of it - there must be a reason for it. I suspect it’s to do with the fact that they are different processes.
I just voted on the feature request (direct link for anyone to add their voice, if interested). I also came across this SO question and the strategy of using the ‘filter’ seems like it would provide a good level of performance for a multi-tenant system even with the $search having to come first.
I can’t recall now, but I’m pretty sure it did not reach the app code at all.
The reason for this phase being the first one is (I guess) that Lucene is queried only once, and then the data is streamed into the MongoDB pipeline. I think it is possible to remove this restriction, but it may be complex to do it without performance loss.
And about using filter for solving multi-tenancy queries - it depends on your multi-tenancy scheme. In our case, it is a list of tenant IDs on each document. Now, using text search on IDs may not work. We haven’t performed any extensive tests, but I can imagine an unsolvable class of hard to debug problems regarding partial matches on IDs.
Yup makes sense.
I was going to experiment with using the keyword analyzer for the IDs that need to match - https://docs.atlas.mongodb.com/reference/atlas-search/analyzers/keyword/#ref-keyword-analyzer
And then writing some code to ensure that things are correct. I don’t know what the performance cost of double handling the data like that would be but would like to find out and if it isn’t too big, it’ll provide some assurance and data on whether any results are being returned by the $search that shouldn’t be.
Edit - reading the ‘filter’ docs I may have misunderstood how to go about this. Will figure it out eventually…
@hemalr87 - It’s been working for us for a long time now so don’t think it’s a meteor issue. We’re using mongo@1.10.0 on Meteor 1.11.1. Sorry to ask such an obvious question but could you have forgotten to point the mongo_url to the atlas db running 4.2? IIRC the ‘Unrecognized pipeline stage name: ‘$search’’ comes from the db.
Thanks for that. Will check it out. That’s been our main issue with Atlas search as well.
Heh obvious question (and I think I have) but it’s a new cluster so I may have set things up wrong? I’ll go double check!
Thanks for confirming that the issue is on my end though
Edit:
Sorry to ask such an obvious question but could you have forgotten to point the mongo_url to the atlas db running 4.2
Day just starting here in Aus and, very embarrassingly, this was indeed the case! I had forgotten to put quotation marks around my MONGO_URL when starting the app and didn’t realise that it would fail silently and revert to the local mongo.
$search has lift off!
Thank you so much @marklynch
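In case it saves someone else the same morning: a small startup guard along these lines would have caught it. This is only a sketch. The function name is made up, and the check assumes the Atlas hostname contains ".mongodb.net", which may not hold for every setup.

```javascript
// Hypothetical startup guard: fail loudly instead of silently running
// against a local database when the app is supposed to use Atlas.
function assertAtlasMongoUrl(mongoUrl) {
  if (!mongoUrl) {
    throw new Error('MONGO_URL is not set');
  }
  if (!/^mongodb(\+srv)?:\/\//.test(mongoUrl)) {
    throw new Error('MONGO_URL does not look like a MongoDB connection string');
  }
  // Assumption: Atlas cluster hostnames contain ".mongodb.net".
  if (!mongoUrl.includes('.mongodb.net')) {
    throw new Error('MONGO_URL does not point at Atlas, so $search will not be available');
  }
  return mongoUrl;
}

// Possible usage on the server at startup:
// assertAtlasMongoUrl(process.env.MONGO_URL);
```

Throwing at startup is a deliberate choice here: the "Unrecognized pipeline stage name" error only shows up once a search runs, which is what made this hard to spot.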
For multi-tenant could you not use Search itself to filter the result to only specific tenants? So something like:
tenantId : {
must_be: [1, 2, 3]
}
Is there a reason that wouldn’t work? I’m just going by the references on: https://docs.atlas.mongodb.com/reference/atlas-search/performance#-match-aggregation-stage-usage.
I’m not sure whether I understand your idea. Of course, you can use $match right after $search, but this will search for all matches (across all tenants) and only filter them afterwards. While this may work, it will scale like $search alone, not like $match + $search (a $match on tenant ID may get rid of 99.9% of the results if there are 1,000 tenants of similar size).
Also, must itself is not an operator - it’s just part of the compound operator. That means you’d still have to use equals, which works only for booleans and ObjectIds.
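To make the point about compound and equals concrete, a tenant restriction inside $search would have to look roughly like the sketch below. It assumes a single tenantId field that is stored as an ObjectId and mapped in the search index; the field and function names are hypothetical, and I have not benchmarked this.

```javascript
// Hypothetical builder: `must` and `filter` are clauses of the `compound`
// operator, and the tenant restriction itself is expressed with `equals`.
function buildTenantFilteredSearch(tenantId, query, path) {
  return {
    $search: {
      compound: {
        // Filter clauses restrict the result set without affecting the score.
        filter: [{ equals: { path: 'tenantId', value: tenantId } }],
        // The actual text query the user typed.
        must: [{ text: { query, path } }],
      },
    },
  };
}

// `tenantId` would be an ObjectId in a real app; a string stands in here
// only so the shape of the stage can be printed.
console.log(JSON.stringify(buildTenantFilteredSearch('<tenant ObjectId>', 'event', 'title'), null, 2));
```

This does not fit a scheme like ours, where each document carries a list of tenant IDs.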
So the way I’m doing this for a multi-tenant app is to use a compound query:
await Collection.rawCollection().aggregate(
[
{
$search: {
compound: {
filter: [{
text: {
query: this.userId,
path: 'createdBy',
},
}],
must: [{
text: {
query,
path,
fuzzy: {
maxEdits: 2,
prefixLength: 1,
},
},
}],
},
},
},
{
$project: {
score: { $meta: 'searchScore' },
createdAt: 1,
},
},
]
).toArray();
Where the ‘createdBy’ path is indexed using lucene.keyword so that it requires an exact match. All my tests indicated that this is accurate and doesn’t pollute results and this was confirmed by a member of the Atlas Search team.
That said, I want to subscribe to the results so I have an added check whereby the search operation only brings up the relevant _ids and I then subscribe to those _ids and ensure that the match confirms that the createdBy matches the subscribing userId.
Hope that makes sense.
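A rough sketch of that second check, in case it helps. The helper name is made up, and it assumes the search results are plain documents with an _id:

```javascript
// Hypothetical helper: turns $search results into a selector that re-checks
// ownership, so a publication never returns another user's documents even if
// the search stage were to return something it shouldn't.
function buildOwnedResultsSelector(searchResults, userId) {
  const ids = searchResults.map((doc) => doc._id);
  return { _id: { $in: ids }, createdBy: userId };
}

// Possible usage inside a publication (Collection and results are placeholders):
// return Collection.find(buildOwnedResultsSelector(results, this.userId));
console.log(buildOwnedResultsSelector([{ _id: 'a1', score: 2.3 }], 'user-1'));
```

The createdBy condition is redundant when the search filter behaves as expected, which is the point: it acts as a cheap safety net on top of the keyword-analyzed filter.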