How to find (and return) elements in multidimensional arrays?


#1

This is the JSON structure, note that the last entry has 2 multidimensional arrays in it, one for chr 10 and one for chr 12:

{
"_id": "9oFvJYeG9wpdBYunu",
"segments": [{
    "chr": "7",
    "start": "140422294",
    "end": "155048283",
    "length": "29.1",
    "snps": "1666"
}]
} {
"_id": "HK4WXc5mR6fyesjpP",
"segments": [{
    "chr": "10",
    "start": "83865742",
    "end": "90981118",
    "length": "6.3",
    "snps": "1380"
}]
} {
"_id": "3N4Z2dtX5PiuqmCFv",
"segments": [{
    "chr": "10",
    "start": "83865742",
    "end": "90981118",
    "length": "6.3",
    "snps": "1380"
}, {
    "chr": "12",
    "start": "32853998",
    "end": "44834540",
    "length": "5.1",
    "snps": "1623"
}]
}

How can I identify all segments with chr = 10? I want to get the detailed information about the elements with chr = 10.

I want to return the following fields:

_id, chr, start, end, length, snps

back. How can I query this in Meteor? I tried $elemMatch an $in but to no avail.

Thanks in advance for your help! Cross posted to SO as well: http://stackoverflow.com/questions/32893342/mongodb-how-to-find-and-return-elements-in-a-multidimensional-array

P.S.: this is on the server side, not sure if that’s important


#3
.find({"segments.chr":"10"})

You don’t say how you expect to structure the returning fields so I’ve made no guess, but you’ll have to map the cursor.

edit: that query returns documents with none-matching segments, sorry… goose chase

edit2:

this .find({"segments.chr":"10"},{"segments": { $elemMatch: {chr:"10"}}}) returns

    /* 1 */
{
    "_id" : "HK4WXc5mR6fyesjpP",
    "segments" : [ 
        {
            "chr" : "10",
            "start" : "83865742",
            "end" : "90981118",
            "length" : "6.3",
            "snps" : "1380"
        }
    ]
}

/* 2 */
{
    "_id" : "3N4Z2dtX5PiuqmCFv",
    "segments" : [ 
        {
            "chr" : "10",
            "start" : "83865742",
            "end" : "90981118",
            "length" : "6.3",
            "snps" : "1380"
        }
    ]
}

then you’ll have to map the cursor


#4

Close but still gives problems with documents with more than one element with chr = 10.

EDIT: I want exactly those elements with chr = 10, not give me all documents (with all other their segments) where one of the elements has chr = 10.

Please further note that according to the MongoDB documentation both ‘$elemMatch’ and ‘$’ only return the first match. However my collection has documents where there are two or more elements with chr = 10. I want to get all elements, also in this case with multiple chr 10 elements in one array.


#5

Used db.matches.find({ “segments.chr”: “10” }, { “kit1”:1, “segments.$”:1 }).sort({“kit1”:1}) as kit1 (key) identifies a document with 2+ elements in the array. It does find both elements but the values are identical.

See here:
{ “_id” : “ZtojR5BLoFHGoXCih”, “kit1” : “F393988”, “segments” : [ { “chr” : “10”, “start” : “34068234”, “end” : “50002936”, “length” : “7.2”, “snps” : “895” } ] }
{ “_id” : “SdhYw8gR4m9CWNHY6”, “kit1” : “F393988”, “segments” : [ { “chr” : “10”, “start” : “34068234”, “end” : “50002936”, “length” : “7.2”, “snps” : “895” } ] }


#6

I was working off the json you provided earlier

how about .find({"segments": {$all: [{ $elemMatch: {"chr":"10"}}]}})


#7

Not an answer to your question, but since the _id is only pointing to an array, no other data, do you really need an array?

This would be much easier to query:

        {
            "_id" : "3N4Z2dtX5PiuqmCFv",
            "segment_id": "112"
            "chr" : "10",
            "start" : "83865742",
            "end" : "90981118",
            "length" : "6.3",
            "snps" : "1380"
        },
        {
            "_id" : "HK4WXc5mR6fyesjpP",
            "segment_id": "112"
            "chr" : "12",
            "start" : "83243243",
            "end" : "90981118",
            "length" : "6.3",
            "snps" : "1380"
        }

#8
db.test.aggregate([
  {
    $match:{"segments.chr":10}
  },
  {
    $unwind: "$segments"
  },
  {
    $match: {"segments.chr": 10}
  },
  {
    $group: 
      {
        _id: "$_id",
        segments: 
          {
            $push: "$segments"
          }
      }
  }
]).pretty()

#9

This gives a lot of elements, most of them not matching chr 10

@ralof I agree and it looks like I have to break up the data structure. The downside is that this increases the database size a lot but it makes it very easy to query the data. I will give a couple more hours and then will give up, seems schemaless is flexible and keeping the database small but it’s not always possible to query it.


#11

Robert, it gives the following error:TypeError: Property ‘matchesaggregate’ of object meteor is not a function


#12

I don’t think it would change the size noticeably. Another upside is that you would not only get easier queries, but also much more manage result sets.


#13

It depends on the average number of elements within an index. While the majority has only one element there are some cases with 2 and more (can be up to 22 I think). So that is increasing it. But yeah, the positive out of that flat data structure is already much visible for me (it’s indeed so much easier to find documents with all sorts of different queries).

In the end I think the queries and the performance are designing the data structure, not the other way round (same as in relational databases).


#14

Storage is very cheap nowadays :slight_smile:


#15

Not on hosted systems, actually that’s where most hosting companies force
you into upgrading first. But I agree in so far that storage in general is
cheap but my app won’t be run on my own server.