MongoDB学习笔记(五):聚合操作

158 阅读6分钟
函数格式描述
db.collection.aggregate()db.<collection>.aggregate(<pipeline>,<options>)<pipeline>文档定义了操作中使用的聚合管道阶段和聚合操作符
<options>:文档声明了一些聚合操作的参数

- 常见的几种表达式

- 字段路径表达式

表达式说明案例描述
$<field>使用$来指示字段路径$name指示银行账户文档中客户姓名的字段
$<field>.<sub-field>使用$.来指示内嵌文档字段路径$info.dateOpened指示银行账户文档中开户日期的字段

- 系统变量表达式

表达式说明
$$<variable>使用$$来指示系统变量
$$CURRENT指示管道中当前操作的文档

$$CURRENT.<field>$<filed>是等效的

- 常量表达式

表达式说明
$litteral:<value>指示常量<value>
$litteral: "$name"指示常量字符串 "$name"(这里的$被当做常量处理,而不是字段路径表达式)

- 聚合管道阶段

管道说明
$project对输入文档进行再次投影
$match对输入文档进行筛选
$limit筛选出管道内前N篇文档
$skip跳过管道内前N篇文档
$unwind展开输入文档中的数组字段
$sort对输入文档进行查询操作
$lookup对输入文档进行查询操作
$group对输入文档进行分组
$out将管道中的文档输出

- $project

  • 先创建几个文档
db.accounts.insertMany([
    {
        name: {firstName: "alice", lastName: "wong"},
        balance: 50
    },
    {
        name: { firstName: "bob", lastName: "yang"},
        balance: 20
    }
])
创建成功
{
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("63d77f181ce94226b81725a3"),
                ObjectId("63d77f181ce94226b81725a4")
        ]
}

  • 对银行账户文档进行重新投影
db.accounts.aggregate([
    {
        $project: {
            _id:0,
            balance:1,
            clientName: "$name.firstName"
        }
    }
])

{ "balance" : 50, "clientName" : "alice" }
{ "balance" : 20, "clientName" : "bob" }


db.accounts.aggregate([
    {
        $project: {
            _id:0,
            balance:1,
            nameArray:["$name.firstName","$name.middleName","$name.lastName"]
        }
    }
])

{ "balance" : 50, "nameArray" : [ "alice", null, "wong" ] }
{ "balance" : 20, "nameArray" : [ "bob", null, "yang" ] }

$project是一个很常用的聚合阶段

可以用来灵活的控制输出文档的格式

也可以用来剔除不相关的字段,以优化管道操作的性能

- $match

$match中使用的文档筛选语法,和读取文档时的筛选语法相同

  • 对银行账户文档进行筛选
db.accounts.aggregate([
    {
        $match: {
            "name.firstName":"alice"
        }
    }
])

{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50 }


db.accounts.aggregate([
    {
        $match: {
            $or: [
                { balance: { $gt: 40, $lt: 80 } },
                { "name.lastName": "yang" }
            ]
        }
    }
])

{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50 }
{ "_id" : ObjectId("63d77f181ce94226b81725a4"), "name" : { "firstName" : "bob", "lastName" : "yang" }, "balance" : 20 }

  • 将筛选和投影阶段结合在一起
db.accounts.aggregate([
    {
        $match: {
            $or: [
                { balance:{ $gt:40, $lt:80}},
                { "name.lastName": "yang"}
            ]
        }
    },
    {
        $project: {
            _id: 0
        }
    }
])

{ "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50 }
{ "name" : { "firstName" : "bob", "lastName" : "yang" }, "balance" : 20 }

$match也是一个很常用的聚合阶段

应该尽量在聚合管道的开始阶段应用$match

这样可以减少后续阶段中需要处理的文档数量,优化聚合操作性能

- $limit 和 $skip

  • 筛选第一篇银行账户文档
db.accounts.aggregate([
    { $limit: 1}
])
{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50 }

  • 跳过第一篇银行账户文档
db.accounts.aggregate([
    { $skip: 1}
])

{ "_id" : ObjectId("63d77f181ce94226b81725a4"), "name" : { "firstName" : "bob", "lastName" : "yang" }, "balance" : 20 }

- $unwind

展开数组元素变成多条数据

  • 向现有的银行账户文档中加入数组字段
> db.accounts.update(
    { "name.firstName": "alice"},
    { $set: { currency: [ "CNY","USD"]}}
)

WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

> db.accounts.update(
    { "name.firstName": "bob" },
    { $set: { currency: "GBP"}}
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

查看账户文档
> db.accounts.find()
{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : [ "CNY", "USD" ] }
{ "_id" : ObjectId("63d77f181ce94226b81725a4"), "name" : { "firstName" : "bob", "lastName" : "yang" }, "balance" : 20, "currency" : "GBP" }

  • 将文档中的货币种类数组展开
db.accounts.aggregate([    {        $unwind: {            path: "$currency"        }    }])
{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : "CNY" }
{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : "USD" }
{ "_id" : ObjectId("63d77f181ce94226b81725a4"), "name" : { "firstName" : "bob", "lastName" : "yang" }, "balance" : 20, "currency" : "GBP" }

前两篇除了currency字段,其他都一样,且由数组变字符串
  • 展开数组时添加元素位置
添加includeArrayIndex参数,并给一个值,在$unwind生成的文档当中,要再添加一个字段,这里叫ccyIndex,而这个字段的值将是原数组字段那一个元素在原数组中的位置
> db.accounts.aggregate([
    {
        $unwind: {
            path: "$currency",
            includeArrayIndex: "ccyIndex"
        }
    }
])
{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : "CNY", "ccyIndex" : NumberLong(0) }
{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : "USD", "ccyIndex" : NumberLong(1) }
{ "_id" : ObjectId("63d77f181ce94226b81725a4"), "name" : { "firstName" : "bob", "lastName" : "yang" }, "balance" : 20, "currency" : "GBP", "ccyIndex" : null }

CNY在原数组中位置为0,USD在原数组中位置为1,第三条原数据因为不是数组,所以位置为null
  • 将文档中的货币种类数组展开
先再添加几个文档 
> db.accounts.insertMany([
    {
        name: {firstName: "charlie", lastName: "gordon" },
        balance: 100
    },
    {
        name: { firstName: "david", lastName: "wu" },
        balance: 200,
        currency:[]
    },
    {
        name: { firstName: "eddie",lastName: "kim"},
        balance:20,
        currency:null
    }
])
{
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("63d89e3b1ce94226b81725a5"),
                ObjectId("63d89e3b1ce94226b81725a6"),
                ObjectId("63d89e3b1ce94226b81725a7")
        ]
}

再将文档中的货币种类数组展开
> db.accounts.aggregate( [
    {
        $unwind: {
            path: "$currency"
        }
    }
])

{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : "CNY" }
{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : "USD" }
{ "_id" : ObjectId("63d77f181ce94226b81725a4"), "name" : { "firstName" : "bob", "lastName" : "yang" }, "balance" : 20, "currency" : "GBP" }

$unwind默认去除掉空数组或不存在该字段,或者为null的文档
  • 展开数组时保留空数组或不存在该字段的文档
> db.accounts.aggregate([
    {
        $unwind: {
            path: "$currency",
            preserveNullAndEmptyArrays: true
        }
    }
])

{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : "CNY" }
{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : "USD" }
{ "_id" : ObjectId("63d77f181ce94226b81725a4"), "name" : { "firstName" : "bob", "lastName" : "yang" }, "balance" : 20, "currency" : "GBP" }
{ "_id" : ObjectId("63d89e3b1ce94226b81725a5"), "name" : { "firstName" : "charlie", "lastName" : "gordon" }, "balance" : 100 }    # 原数组是空的
{ "_id" : ObjectId("63d89e3b1ce94226b81725a6"), "name" : { "firstName" : "david", "lastName" : "wu" }, "balance" : 200 }    # 不存在该字段
{ "_id" : ObjectId("63d89e3b1ce94226b81725a7"), "name" : { "firstName" : "eddie", "lastName" : "kim" }, "balance" : 20, "currency" : null }   # 空值

- $sort

1代表升序,-1代表降序

  • 对银行账户文档进行排序
db.accounts.aggregate( [    { $sort: { balance:1, "name.lastName": -1 }}])

{ "_id" : ObjectId("63d77f181ce94226b81725a4"), "name" : { "firstName" : "bob", "lastName" : "yang" }, "balance" : 20, "currency" : "GBP" }
{ "_id" : ObjectId("63d89e3b1ce94226b81725a7"), "name" : { "firstName" : "eddie", "lastName" : "kim" }, "balance" : 20, "currency" : null }
{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : [ "CNY", "USD" ] }
{ "_id" : ObjectId("63d89e3b1ce94226b81725a5"), "name" : { "firstName" : "charlie", "lastName" : "gordon" }, "balance" : 100 }
{ "_id" : ObjectId("63d89e3b1ce94226b81725a6"), "name" : { "firstName" : "david", "lastName" : "wu" }, "balance" : 200, "currency" : [ ] }

- $lookup

使用单一字段值进行查询

$lookup: {
    from: <collection to join>,
    localField: <field from the input documents>,
    foreignField: <field from the documents of the "form" collection>,
    as: <output array field>
}
  • 增加一个集合用来存储外汇数据

db.forex.insertMany([
    {
        ccy: "USD",
        rate: 6.91,
        date: new Date("2018-12-21")
    },
    {
        ccy: "GBP",
        rate: 8.72,
        date: new Date("2018-08-21")
    },
    {
        ccy: "CNY",
        rate: 1.0,
        date: new Date("2018-12-21")
    }
])

{
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("63d8abd51ce94226b81725a8"),
                ObjectId("63d8abd51ce94226b81725a9"),
                ObjectId("63d8abd51ce94226b81725aa")
        ]
}

  • 将查询到的 外汇汇率 写入 银行账户 文档
> db.accounts.aggregate([    {        $lookup: {            from: "forex",            # 对应的查询集合            localField:"currency",    # 管道集合当中想要查询的字段            foreignField: "ccy"       # 查询集合的字段            as: "forexData"           # 当currency和ccy相等时,将ccy添加到forexData中,否则不会        }    }])

{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : [ "CNY", "USD" ], "forexData" : [ { "_id" : ObjectId("63d8abd51ce94226b81725a8"), "ccy" : "USD", "rate" : 6.91, "date" : ISODate("2018-12-21T00:00:00Z") }, { "_id" : ObjectId("63d8abd51ce94226b81725aa"), "ccy" : "CNY", "rate" : 1, "date" : ISODate("2018-12-21T00:00:00Z") } ] }  # 数组内两个都是相等的数据
{ "_id" : ObjectId("63d77f181ce94226b81725a4"), "name" : { "firstName" : "bob", "lastName" : "yang" }, "balance" : 20, "currency" : "GBP", "forexData" : [ { "_id" : ObjectId("63d8abd51ce94226b81725a9"), "ccy" : "GBP", "rate" : 8.72, "date" : ISODate("2018-08-21T00:00:00Z") } ] }
{ "_id" : ObjectId("63d89e3b1ce94226b81725a5"), "name" : { "firstName" : "charlie", "lastName" : "gordon" }, "balance" : 100, "forexData" : [ ] }
{ "_id" : ObjectId("63d89e3b1ce94226b81725a6"), "name" : { "firstName" : "david", "lastName" : "wu" }, "balance" : 200, "currency" : [ ], "forexData" : [ ] }
{ "_id" : ObjectId("63d89e3b1ce94226b81725a7"), "name" : { "firstName" : "eddie", "lastName" : "kim" }, "balance" : 20, "currency" : null, "forexData" : [ ] }

  • 如果localField是一个数组字段
先用unwind,再用lookup
> db.accounts.aggregate([
    {
        $unwind: {
            path: "$currency"
        }
    },
    {
        $lookup: {
            from: "forex",
            localField: "currency",
            foreignField: "ccy",
            as: "forexData"
        }
    }
])

使用复杂条件查询

$lookup: {
    from: <collection to join>,
    let: { <var_1>: <expression>,...,<var_n>: <expression>},   对查询集合中的文档使用聚合阶段进行处理时,如果需要参考管道文档中的字段,则必须使用let参数对字段进行声明,否则可以省略
    pipeline: [ <pipeline to execute on the collection to join> ], # 对查询集合中的文档进行处理
    as: <output array field>
}
  • 将特定日期外汇汇率写入银行账户文档
> db.accounts.aggregate([    {        $lookup: {            from: "forex",            pipeline: [                         { $match:                    {                        date: new Date("2018-12-21")                    }                }            ],
            as: "forexData"
        }
    }
])

{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : [ "CNY", "USD" ], "forexData" : [ { "_id" : ObjectId("63d8abd51ce94226b81725a8"), "ccy" : "USD", "rate" : 6.91, "date" : ISODate("2018-12-21T00:00:00Z") }, { "_id" : ObjectId("63d8abd51ce94226b81725aa"), "ccy" : "CNY", "rate" : 1, "date" : ISODate("2018-12-21T00:00:00Z") } ] }
{ "_id" : ObjectId("63d77f181ce94226b81725a4"), "name" : { "firstName" : "bob", "lastName" : "yang" }, "balance" : 20, "currency" : "GBP", "forexData" : [ { "_id" : ObjectId("63d8abd51ce94226b81725a8"), "ccy" : "USD", "rate" : 6.91, "date" : ISODate("2018-12-21T00:00:00Z") }, { "_id" : ObjectId("63d8abd51ce94226b81725aa"), "ccy" : "CNY", "rate" : 1, "date" : ISODate("2018-12-21T00:00:00Z") } ] }
{ "_id" : ObjectId("63d89e3b1ce94226b81725a5"), "name" : { "firstName" : "charlie", "lastName" : "gordon" }, "balance" : 100, "forexData" : [ { "_id" : ObjectId("63d8abd51ce94226b81725a8"), "ccy" : "USD", "rate" : 6.91, "date" : ISODate("2018-12-21T00:00:00Z") }, { "_id" : ObjectId("63d8abd51ce94226b81725aa"), "ccy" : "CNY", "rate" : 1, "date" : ISODate("2018-12-21T00:00:00Z") } ] }
{ "_id" : ObjectId("63d89e3b1ce94226b81725a6"), "name" : { "firstName" : "david", "lastName" : "wu" }, "balance" : 200, "currency" : [ ], "forexData" : [ { "_id" : ObjectId("63d8abd51ce94226b81725a8"), "ccy" : "USD", "rate" : 6.91, "date" : ISODate("2018-12-21T00:00:00Z") }, { "_id" : ObjectId("63d8abd51ce94226b81725aa"), "ccy" : "CNY", "rate" : 1, "date" : ISODate("2018-12-21T00:00:00Z") } ] }
{ "_id" : ObjectId("63d89e3b1ce94226b81725a7"), "name" : { "firstName" : "eddie", "lastName" : "kim" }, "balance" : 20, "currency" : null, "forexData" : [ { "_id" : ObjectId("63d8abd51ce94226b81725a8"), "ccy" : "USD", "rate" : 6.91, "date" : ISODate("2018-12-21T00:00:00Z") }, { "_id" : ObjectId("63d8abd51ce94226b81725aa"), "ccy" : "CNY", "rate" : 1, "date" : ISODate("2018-12-21T00:00:00Z") } ] }

在这个例子中,查询条件和管道文档之间,其实并没有直接的联系,这种查询被称作不相关查询,$lookup从3.6版本开始支持不相关查询
  • 将特定日期外汇汇率写入余额大于100的银行账户文档
> db.accounts.aggregate([
    {
        $lookup:{
            from: "forex",
            let: { bal: "$balance"},
            pipeline: [
                { $match: 
                    { $expr:
                        { $and:
                            [
                                { $eq: [ "$date", new Date("2018-12-21") ] },
                                { $gt: [ "$$bal",100] }
                            ]
                        }
                    }
                }
            ],
            as: "forexData"
        }
    }
])

{ "_id" : ObjectId("63d77f181ce94226b81725a3"), "name" : { "firstName" : "alice", "lastName" : "wong" }, "balance" : 50, "currency" : [ "CNY", "USD" ], "forexData" : [ ] }
{ "_id" : ObjectId("63d77f181ce94226b81725a4"), "name" : { "firstName" : "bob", "lastName" : "yang" }, "balance" : 20, "currency" : "GBP", "forexData" : [ ] }
{ "_id" : ObjectId("63d89e3b1ce94226b81725a5"), "name" : { "firstName" : "charlie", "lastName" : "gordon" }, "balance" : 100, "forexData" : [ ] }
{ "_id" : ObjectId("63d89e3b1ce94226b81725a6"), "name" : { "firstName" : "david", "lastName" : "wu" }, "balance" : 200, "currency" : [ ], "forexData" : [ { "_id" : ObjectId("63d8abd51ce94226b81725a8"), "ccy" : "USD", "rate" : 6.91, "date" : ISODate("2018-12-21T00:00:00Z") }, { "_id" : ObjectId("63d8abd51ce94226b81725aa"), "ccy" : "CNY", "rate" : 1, "date" : ISODate("2018-12-21T00:00:00Z") } ] }   # 因为时间也要进行筛选,所以只有两个结果
{ "_id" : ObjectId("63d89e3b1ce94226b81725a7"), "name" : { "firstName" : "eddie", "lastName" : "kim" }, "balance" : 20, "currency" : null, "forexData" : [ ] }

- $group

$group: {
    _id: <expression>,            定义分组规则
    <field1>: { <accumulator1>:<expression1> }, 可以使用聚合操作符来定义新字段
    ... 
}
  • 增加一个集合用来存储股票交易记录
> db.transactions.insertMany([
    {
        symbol: "600519",
        qty: 100,
        price: 567.4,
        currency: "CNY"
    },
    {
        symbol: "AMZN",
        qty:1,
        price: 1377.5,
        currency: "USD"
    },
    {
        symbol: "AAPL",
        qty:2,
        price: 150.7,
        currency: "USD"
    }
])

{
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("63d8d8931ce94226b81725ab"),
                ObjectId("63d8d8931ce94226b81725ac"),
                ObjectId("63d8d8931ce94226b81725ad")
        ]
}

  • 按照交易货币来分组交易记录
> db.transactions.aggregate([
    {
        $group: {
            _id: "$currency"
        }
    }
])
{ "_id" : "CNY" }
{ "_id" : "USD" }

不使用聚合操作符的情况下,$group可以返回管道文档中某一字段的所有(不重复)值
  • 使用聚合操作符计算分组聚合值
> db.transactions.aggregate([
    {
        $group: {
            _id: "$currency",
            totalQty: { $sum: "$qty"},    # 将各个小组里的qty值进行求和,给到totalQty
            totalNotional: { $sum: { $multiply: [ "$price","$qty" ] } },
            avgPrice: { $avg: "$price" },
            count: { $sum:1},
            maxNotional: { $max: { $multiply: [ "$price","$qty" ] } },
            minNotional: { $min: { $multiply: [ "$price","$qty" ] } }
            
        }
    }
])
{ "_id" : "CNY", "totalQty" : 100, "totalNotional" : 56740, "avgPrice" : 567.4, "count" : 1, "maxNotional" : 56740, "minNotional" : 56740 }
{ "_id" : "USD", "totalQty" : 3, "totalNotional" : 1678.9, "avgPrice" : 764.1, "count" : 2, "maxNotional" : 1377.5, "minNotional" : 301.4 }

  • 使用聚合操作符计算所有文档聚合值
> db.transactions.aggregate([
    {
        $group: {
            _id: null,
            totalQty: { $sum: "$qty"},    # 将各个小组里的qty值进行求和,给到totalQty
            totalNotional: { $sum: { $multiply: [ "$price","$qty" ] } },
            avgPrice: { $avg: "$price" },
            count: { $sum:1},
            maxNotional: { $max: { $multiply: [ "$price","$qty" ] } },
            minNotional: { $min: { $multiply: [ "$price","$qty" ] } }
            
        }
    }
])

{ "_id" : null, "totalQty" : 103, "totalNotional" : 58418.9, "avgPrice" : 698.5333333333333, "count" : 3, "maxNotional" : 56740, "minNotional" : 301.4 }

  • 使用聚合操作符创建数组字段
> db.transactions.aggregate([
    {
        $group: {
            _id: "$currency",
            symbols: { $push: "$symbol"}
        }
    }
])

{ "_id" : "CNY", "symbols" : [ "600519" ] }
{ "_id" : "USD", "symbols" : [ "AMZN", "AAPL" ] }

- $out

  • 将聚合管道中的文档写入一个新集合
> db.transactions.aggregate([
    {
        $group: {
            _id: "$currency",
            symbols: { $push: "$symbol"}
        }
    },
    {
        $out: "output"      # 将上面处理好的数据导入到新的集合output
    }
])

> db.output.find()
{ "_id" : "CNY", "symbols" : [ "600519" ] }
{ "_id" : "USD", "symbols" : [ "AMZN", "AAPL" ] }

  • 将聚合管道中的文档写入一个已存在的集合
> db.transactions.aggregate([
    {
        $group: {
            _id: "$symbol",
            totalNotional: { $sum: {$multiply: [ "$price","$qty"]}}
        }
    },
    {
        $out: "output"
    }
])


 查看output集合
> db.output.find()
{ "_id" : "AMZN", "totalNotional" : 1377.5 }
{ "_id" : "AAPL", "totalNotional" : 301.4 }
{ "_id" : "600519", "totalNotional" : 56740 }

$out会保留集合的索引,但是会清空旧文档内容,放入新的内容

如果聚合管道操作遇到错误,管道阶段不会创建新集合或是覆盖已存在的集合内容

- 聚合操作

- allowDiskUse

db.<collection>.aggregate( <pipeline>,<options>)
<options>:
    allowDiskUse: <boolean>
    
    每个聚合管道阶段使用的内存不能超过100MB
    
    如果数据量较大,为了防止聚合管道阶段超出内存上限并抛出错误,可以启用allowDiskUse选项
    
    allowDiskUse启用之后,聚合阶段可以在内存容量不足时,将操作数据写入临时文件中
    
    临时文件会被写入dbPath下的_tmp文件夹,dbPath的默认值为/data/db

> db.transactions.aggregate([
    {
        $group: {
            _id: "$currency",
            symbols:{ $push: "$symbol"}
        }
    }],
    { allowDiskUse:true}
)

- 聚合阶段顺序优化

- $project + $match

$match阶段会在$project阶段之前运行

原本写的顺序
> db.transactions.aggregate([
    {
        $project: {
            _id: 0,symbol:1,currency:1,
            notional: { $multiply:[ "$price","$qty"]}
        }
    },
    {
        $match: {
            currency: "USD",
            notional: { $gt: 1000 }
        }
    }
])

实际的执行顺序
> db.transactions.aggregate([
    {
        $macth: {
            currency: "USD",
        }
    },
    {
        $project: {
            _id: 0,symbol:1,currency:1,
            notional: { $multiply:[ "$price","$qty"]}
        }
    },
    {
        $match: {
            notional: { $gt: 1000 }
        }
    }
])

- $sort + $match

$match阶段会在$sort阶段之前运行

原本写的顺序
> db.transactions.aggregate(
    {
        $sort: {
            price: 1
        }
    },
    {
        $match: {
            currency: "USD"
        }
    }
)

实际执行顺序
> db.transactions.aggregate(
    {
        $match: {
            currency: "USD"
        }
    },
    {
        $sort: {
            price: 1
        }
    }
)

- $project + $skip

$skip阶段会在$project阶段之前运行

原本写的顺序
> db.transactions.aggregate([
    {
        $project: {
            _id: 0, symbol:1, currency:1,
            notional: { $multiply: [ "$price", "$qty"]}
        }
    },
        { 
        $skip:2 
    }
])

实际执行的顺序
> db.transactions.aggregate([
    { 
        $skip:2 
    },
    {
        $project: {
            _id: 0, symbol:1, currency:1,
            notional: { $multiply: [ "$price", "$qty"]}
        }
    }
])

- 聚合操作的合并优化

$sort + $limit

如果两者之间没有夹杂着会改变文档数量的聚合阶段,sortsort和limit阶段可以合并

原本写的顺序
> db.transactions.aggregate([
    {
        $sort: { price:1}
    },
    {
        $project: {
            _id:0, symbol:1,currency:1,
            notional: { $multiply: [ "$price", "$qty"] }
        }
    },
    {
        $limit: 2
    }
])

实际的执行顺序
> db.transactions.aggregate([
    {
        $sort: { price:1} +  $limit: 2    # 伪码,方便理解
    },
    {
        $project: {
            _id:0, symbol:1,currency:1,
            notional: { $multiply: [ "$price", "$qty"] }
        }
    },
])

$lookup + $unwind

连续排列在一起的$lookup$unwind阶段,如果$unwind应用在$lookup阶段创建的as字段上,则两者可以合并

> db.accounts.aggregate([
    {
        $lookup: {
            from: "forex",
            localField: "currency",
            foreignField: "ccy",
            as : "forexData"
        }
        +
        { $unwind: "$forexData"}
    }
])