nosql - A question about MongoDB's MR (MapReduce)!

怪我咯 2017-04-21 11:16:15

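I've heard that MongoDB's MapReduce is single-threaded and performs very poorly. Why is that, and how bad is it? Could someone explain how it works under the hood?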

Replies (3)

洪濤 2017-04-21 11:18:15, floor 3

I don't know whether it runs single-threaded internally, but in a production environment it's best not to run mapReduce every time you need the results. Depending on the data size, it still takes a noticeable amount of time: our data is in the tens of millions of documents, and each mapReduce run takes about 5-6 seconds. Fortunately our application doesn't need real-time results, so we cache the output for 2 hours and then rerun mapReduce to get fresh results.
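The caching approach described in this reply can be sketched in plain JavaScript. This is a minimal in-process sketch, assuming a single application process: `makeCachedQuery` is a hypothetical helper, and the `compute` callback stands in for whatever code actually runs mapReduce and reads its output. The clock is injectable (`now`) so the example can be run without waiting two hours.

```javascript
// Hypothetical helper: reuse the result of an expensive computation
// (e.g. a function that runs mapReduce) until ttlMs milliseconds have passed.
function makeCachedQuery(compute, ttlMs, now = Date.now) {
  let cached;                 // last computed result
  let computedAt = -Infinity; // when it was computed; -Infinity forces a first run
  return function get() {
    const t = now();
    if (t - computedAt >= ttlMs) {
      cached = compute();     // expired (or never computed): recompute
      computedAt = t;
    }
    return cached;
  };
}

// Usage with a fake clock and a stand-in for the slow mapReduce call
let fakeTime = 0;
let runs = 0;
const getTagCounts = makeCachedQuery(
  () => { runs += 1; return { run: runs }; },
  2 * 60 * 60 * 1000, // 2 hours, as in the reply above
  () => fakeTime
);
getTagCounts();               // first call: runs the computation
fakeTime += 60 * 60 * 1000;
getTagCounts();               // 1 hour later: served from the cache
fakeTime += 60 * 60 * 1000;
getTagCounts();               // 2 hours later: recomputed
console.log(runs);
```

This cache lives in one process; if several application servers need the same numbers, writing the mapReduce output to a collection (as `out` does) and rerunning the job on a schedule may be a better fit.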

左手右手慢動作 2017-04-21 11:18:15, floor 2

I think this Stack Overflow thread explains MongoDB's performance issues:
http://stackoverflow.com/questions/39...

伊謝爾倫 2017-04-21 11:18:15, floor 1

I did something similar with MapReduce before. Because it was slow, I later switched to an aggregate query for the statistics. Here is the specific example:

> db.user.findOne()
{
    "_id" : ObjectId("557a53e1e4b020633455b898"),
    "accountId" : "55546fc8e4b0d8376000b858",
    "tags" : [
        "金牌會員",
        "鉆石會員",
        "鉑金會員",
        "高級會員"
    ]
}

The basic document model is shown above. I created an index on accountId and tags:

db.user.createIndex({"accountId": 1, "tags": 1})

The requirement is to count how many times each tag occurs across users. The MapReduce job is designed as follows:

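// map: emit (tag, 1) once for every tag in the current document
var mapFunction = function() {
    if (this.tags) {
        for (var idx = 0; idx < this.tags.length; idx++) {
            var tag = this.tags[idx];
            emit(tag, 1);
        }
    }
};

// reduce: add up the partial counts for one tag; MongoDB may call it
// again on its own output, so it must stay associative
var reduceFunction = function(key, values) {
    var cnt = 0;
    values.forEach(function(val) { cnt += val; });
    return cnt;
};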


db.user.mapReduce(mapFunction, reduceFunction, {out: "mr1"})    // write the output to collection mr1
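To make it clearer what the map and reduce functions compute, here is a hypothetical in-memory emulation in plain JavaScript (runs in Node.js, no MongoDB needed). It is not how the server implements mapReduce; it only mimics the documented contract: map runs with `this` bound to each document and calls `emit(key, value)`, emitted values are grouped by key, and reduce may be called again on its own partial results. `emulateMapReduce` and its `batchSize` parameter are assumptions made for illustration.

```javascript
// Hypothetical emulator of the mapReduce contract, for illustration only.
function emulateMapReduce(docs, map, reduce, batchSize = 2) {
  if (batchSize < 2) throw new RangeError("batchSize must be at least 2");
  const groups = new Map(); // key -> values emitted for that key
  const stats = { input: 0, emit: 0, reduce: 0 };

  // map functions call a free-standing emit(); expose one on the global object
  globalThis.emit = (key, value) => {
    stats.emit += 1;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    for (const doc of docs) {
      stats.input += 1;
      map.call(doc); // `this` inside map is the current document
    }
  } finally {
    delete globalThis.emit;
  }

  const out = {};
  for (const [key, values] of groups) {
    // Reduce in small batches, then reduce the partial results again,
    // mimicking the server calling reduce more than once for one key.
    let partials = values;
    while (partials.length > 1) {
      const next = [];
      for (let i = 0; i < partials.length; i += batchSize) {
        const batch = partials.slice(i, i + batchSize);
        if (batch.length === 1) {
          next.push(batch[0]); // a lone value is passed through unreduced
        } else {
          next.push(reduce(key, batch));
          stats.reduce += 1;
        }
      }
      partials = next;
    }
    out[key] = partials[0];
  }
  return { out, stats };
}

// The map and reduce functions from the post, same behaviour
var mapFunction = function () {
  if (this.tags) {
    for (var idx = 0; idx < this.tags.length; idx++) emit(this.tags[idx], 1);
  }
};
var reduceFunction = function (key, values) {
  var cnt = 0;
  values.forEach(function (val) { cnt += val; });
  return cnt;
};

const docs = [
  { tags: ["金牌會員", "鉆石會員"] },
  { tags: ["鉆石會員"] },
  { tags: ["鉆石會員", "高級會員"] },
  {}, // no tags field: map emits nothing
];
const { out, stats } = emulateMapReduce(docs, mapFunction, reduceFunction);
console.log(out, stats);

// A reduce that counts its inputs instead of summing them breaks once
// partial results are re-reduced: "鉆石會員" comes out as 2 instead of 3.
const bad = emulateMapReduce(docs, mapFunction, (key, values) => values.length);
console.log(bad.out);
```

The second call shows why the summing reduce above is written the way it is: a value passed back into reduce may already be a partial total rather than a single `1`.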

Result:

> db.mr1.find().pretty()
{ "_id" : "金牌會員", "value" : 9000 }
{ "_id" : "鉆石會員", "value" : 43000 }
{ "_id" : "鉑金會員", "value" : 90000 }
{ "_id" : "銅牌會員", "value" : 3000 }
{ "_id" : "銀牌會員", "value" : 5000 }
{ "_id" : "高級會員", "value" : 50000 }

This appears to produce the result we want. I ran the test above on a small dataset of about 100,000 (10W) documents, and the call prints:

> db.mapReduceTest.mapReduce(mapFunction,reduceFunction,{out:"mr1"})
{
    "result" : "mr1",
    "timeMillis" : 815,                   // elapsed time in milliseconds
    "counts" : {
        "input" : 110000,             // number of documents scanned
        "emit" : 200000,              // number of emit() calls (computations MongoDB performed)
        "reduce" : 2001,
        "output" : 6
    },
    "ok" : 1
}

Because my mock data is simple and regular, you can see that the number of computations (emit: 200,000) is almost twice the number of scanned documents (input: 110,000). When I later tested with random data, the results were even worse, so I gave up on MapReduce entirely and implemented this another way.
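The reply mentions switching to an aggregate query but does not show it. Below is a minimal sketch of what such a query could look like, assuming the same `user` collection and `tags` array; it is not necessarily the query the poster used. `$unwind` produces one document per array element and `$group` with `$sum: 1` counts them, which is what the map/reduce pair above computes. The pipeline is a plain JavaScript array, the form the mongo shell accepts; `runTagCountLocally` is a hypothetical helper that only mimics these two stages in memory so the example runs without a MongoDB server. For context, mapReduce has been deprecated since MongoDB 5.0 in favor of the aggregation pipeline.

```javascript
// Sketch: aggregation-pipeline equivalent of the tag-count mapReduce job.
// In the mongo shell it would be run as: db.user.aggregate(tagCountPipeline)
const tagCountPipeline = [
  { $unwind: "$tags" },                             // one document per tag
  { $group: { _id: "$tags", value: { $sum: 1 } } }, // count documents per tag
];

// Hypothetical helper mimicking just these two stages in memory, so the
// expected result shape can be checked without a server.
function runTagCountLocally(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // $unwind drops documents whose tags field is missing, null or an empty
    // array; for simplicity this helper skips anything that is not an array.
    if (!Array.isArray(doc.tags)) continue;
    for (const tag of doc.tags) {
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  // Same { _id, value } shape as the documents in the mr1 collection
  return Array.from(counts, ([tag, n]) => ({ _id: tag, value: n }));
}

const sampleUsers = [
  { accountId: "a1", tags: ["金牌會員", "鉆石會員"] },
  { accountId: "a2", tags: ["鉆石會員"] },
  { accountId: "a3" }, // no tags: contributes nothing
];
console.log(runTagCountLocally(sampleUsers));
```

The server does not guarantee the order of `$group` output, so add a `$sort` stage if order matters; if the counts should be stored like `mr1`, a final `{ $out: "mr1" }` stage writes them to a collection.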
