Mongo Hidden Gem in Aggregation Pipeline, Join Collections with $lookup

mongo

Having foreign key relationships in mongo is a bit controversial, but is often a necessary evil. And when it’s necessary, we can use an aggregation query with a $lookup pipeline to effectively join collections. Let’s say we have 2 collections Trainer and Monkey:

Trainer documents:

{
  "trainerId": 1,
  "name": "Chris"
}

Monkey documents:

{
  "name": "George",
  "parentTrainer": 1,
  "cage": "red"
},
{
  "name": "Mia",
  "parentTrainer": 1,
  "cage": "blue"
}

Now let’s say we wanted to get Chris and the child monkeys. We could query Trainer and Monkey separately, but more efficient is to use an aggregate query with a $lookup pipeline to join the collections together. This is what that query looks like:

db.getCollection('Trainer').aggregate([
  { $match: { trainerId: 1 } },
  {
    $lookup: {
      from: 'Monkey',
      localField: 'parentTrainerId',
      foreignField: 'trainerId',
      as: 'monkeys'
    }
  }
]);

Which produces a resultset with a single document that looks like this:

{
  "_id": {
    "$oid": "663acb196a9726ae6c28a58f"
  },
  "trainerId": 1,
  "name": "Chris",
  "monkeys": [
    {
      "_id": {
        "$oid": "663acb4e6a9726ae6c28a594"
      },
      "name": "George",
      "parentTrainer": 1,
      "cage": "red"
    },
    {
      "_id": {
        "$oid": "663acb5a6a9726ae6c28a596"
      },
    "name": "Mia",
    "parentTrainer": 1,
    "cage": "blue"
    }
  ]
}

When doing a simple join, the localField/foreignField are a convenience for the more verbose let/pipeline. However, the let/pipeline approach is more flexible, such as if you needed to join on multiple columns. Here’s what that looks like, with an extra $match expression to only include monkeys in the blue cage:

db.getCollection('Trainer').aggregate(
  [
    { $match: { trainerId: 1 } },
    {
      $lookup: {
        from: 'Monkey',
        let: {
          monkey_parentTrainerId:
            '$parentTrainerId'
        },
        as: 'monkeys',
        pipeline: [
          {
            $match: {
              $expr: {
                $and: [
                  {
                    $eq: [
                      '$trainerId',
                      '$$monkey_parentTrainerId'
                    ]
                  },
                  { $eq: ['blue', '$cage'] }
                ]
              }
            }
          }
        ]
      }
    }
  ]
);

Which now only has Mia in the monkeys:

{
  "_id": {
    "$oid": "663acb196a9726ae6c28a58f"
  },
  "trainerId": 1,
  "name": "Chris",
  "monkeys": [
    {
      "name": "Mia",
      "parentTrainer": 1,
      "cage": "blue"
    }
  ]
}