亚洲精品色播一区二区,亚洲欧美乱日韩乱国产,free性欧美人与牛

亚洲国产日韩欧美一区二区三区,精品亚洲国产成人av在线,国产99视频精品免视看7,99国产精品久久久久久久成人热,欧美日韩亚洲国产综合乱

java - 面試問(wèn)題：查找出現(xiàn)次數(shù)最多的ip

巴扎黑 2017-04-18 10:31:54

768

就是現(xiàn)在有一個(gè)log 里面全都是訪問(wèn)者的ip(數(shù)據(jù)很大很大) 一行一個(gè)ip
比如
123.22.22.187
123.22.22.182
123.22.21.187
123.22.42.187
123.23.22.187
123.22.22.182
121.22.22.182
....

怎么查出來(lái)訪問(wèn)次數(shù)最多的10個(gè)IP?

數(shù)據(jù)量非常大不可能直接讀進(jìn)來(lái) 然后排序。。。

方式：
第一步：
如下讀?。?br>聲明一個(gè)變量 dictionary <string ,long> record
123.22.22.187
123.22.22.182
123.22.21.187
123.22.42.187
123.23.22.187
123.22.22.182
121.22.22.182
先與藍(lán)色部分為key ，key 出現(xiàn)的次數(shù)為value 存在 record中。
這樣，record的最大個(gè)數(shù)是255個(gè)，不算多，更說(shuō)不上要多少內(nèi)存。
再來(lái)一次大掃除，
使record 剩下最多的 10 個(gè)元素
以這10 個(gè)元素為條件，再掃描一次表，把符合條件的數(shù)據(jù)寫(xiě)到另一個(gè)[文件A]中，

第二步：
掃描 [文件A] 以下面數(shù)據(jù)藍(lán)色部分為key ，以出現(xiàn)次數(shù)為value 寫(xiě)到變量 dictionary <string ,long> record 中
123.22.22.187
123.22.22.182
123.22.21.187
123.22.42.187
123.23.22.187
123.22.22.182
121.22.22.182
。。。
第三步：
。。。
第四步：
。。。
就這樣，
用4 步相同的方法，提出最多的數(shù)據(jù)：10條，

問(wèn)題，上面這種方式感覺(jué)不行啊，是我沒(méi)理解嗎？？
比如說(shuō)：
123，125，196，158，假設(shè)這個(gè)ip地址出現(xiàn)的次數(shù)最多，那么如下，
121，123
121，14
121，56
123，125，196，158，
123，125，196，158，

如果分段比較次數(shù)，那么123，125，196，158，這個(gè)ip在第一步就被淘汰了？

巴扎黑

reply all(9)

伊謝爾倫2017-04-18 10:33:54 9 floor

The correct answer is this: sort|uniq -c

Like +0

Add Reply

大家講道理2017-04-18 10:33:54 8 floor

a.b.c.d

Deal with a first.

All files are divided into multiple reads (such as 255 times).

The records are then stored in 255 files, corresponding to a (such as 1.b.c.d; 2.b.c.d;)

This way you can get the largest ten.

Next, deal with b.

. . .

Finally got the answer.

Like +0

Add Reply

伊謝爾倫2017-04-18 10:33:54 7 floor

This kind of problem can use max-min heap.

Like +0

Add Reply

巴扎黑2017-04-18 10:33:54 6 floor

Basic idea

對(duì)所有的ip進(jìn)行分組，統(tǒng)計(jì)出現(xiàn)次數(shù)，ip做key，出現(xiàn)次數(shù)做val，保存在關(guān)聯(lián)數(shù)組中
數(shù)組按照鍵值（出現(xiàn)次數(shù)）降序排序，取出前 10 條記錄

Although it’s not java語(yǔ)言描述的，但javascript描述, I should be able to understand it. Anyway, I have tested it and it can achieve your effect:

The following is the complete example code:

    var data = [123.22.22.187 , 123.22.22.187 ....];  // 等價(jià)于日志中每行ip
    var data = getCount(data); // 對(duì)所有的ip進(jìn)行分組，統(tǒng)計(jì)出現(xiàn)次數(shù)
                               // ip做key，出現(xiàn)次數(shù)做val，保存在關(guān)聯(lián)數(shù)組中
    var rel  = getRecord(data , 10 , 'desc'); // 這個(gè)就是你想的結(jié)果了
    
    /*
     * ip 分組并統(tǒng)計(jì)次數(shù)的具體實(shí)現(xiàn)
     * @param Array data  日志文件中存放的那些所有ip組成的數(shù)組
     * @return Object
     */
    function getCount(data){
        var rel= {}; // 根據(jù)ip進(jìn)行分組后的結(jié)果，由于js沒(méi)有關(guān)聯(lián)數(shù)組
                      // 所以用對(duì)象替換
        var i;
        var count;
        var n;
        var cur;
        var curC;
        // 判斷是否已經(jīng)統(tǒng)計(jì)過(guò)了的
        var checkIsCount = function(ip){
            var k;
            
            for (k in rel)
                {
                    if (k === ip) {
                        return true;
                    }
                }
            
            return false;
        };
        
        // 每次固定一個(gè) ip
        // 先判斷當(dāng)前 ip 在 rel 中是否已經(jīng)存在（排除已被統(tǒng)計(jì)的ip，避免重復(fù)統(tǒng)計(jì)）
        // 不存在的時(shí)候，開(kāi)始統(tǒng)計(jì)其出現(xiàn)次數(shù)
        for (i = 0; i < data.length; ++i)
            {
                cur = data[i];
                count = 1;
                
                if (!checkIsCount(cur)) {
                    for (n = i + 1; n < data.length; ++n)
                        {
                            curC = data[n];
    
                            if (cur === curC) {
                                count++;
                            }    
                        }
    
                     rel[cur] = count;      
                }
            }
            
       // 保存了所有不同ip，及其出現(xiàn)次數(shù)的對(duì)象
       console.log('ip進(jìn)行了分組并統(tǒng)計(jì)后的結(jié)果：' , rel);
       return rel;
    }
    
    /*
     * 數(shù)組按照鍵值（出現(xiàn)次數(shù)）降序排序，取出前 len 條記錄
     * @param Object data 分組并統(tǒng)計(jì)出現(xiàn)次數(shù)后的對(duì)象
     * @param Number len  取出多少條
     * @param String sortType 升序還是降序(asc | desc)
     * @return Array
     */
    function getRecord(data , len , sortType){
        var rel = {};
        var result = [];
        var k;
        var k1;
        var curKey;
        var count = 0;
        // 檢查 ip 是否已被獲取
        var checkSorted = function(ip){
            var k;
            
            for (k in rel)
                {
                    if (k === ip) {
                        return true;
                    }
                }
                
            return false;
        };
    
        // 排序算法：選擇排序
        for (k in data)
            {
                
               if (count >= len) {
                   break;
               }
               
               curKey = k;
               
               if (!checkSorted(k)) {
                   for (k1 in data) 
                       {
                           if (!checkSorted(k1)) {
                              if (sortType === 'asc' ? data[curKey] > data[k1] : data[curKey] < data[k1]) {
                                  curKey = k1;
                              } 
                           }
                       }
                   
                   
                   // 按照鍵值排序后的最大值|最小值
                   rel[curKey] = data[curKey];
    
                   // 返回可視化結(jié)果
                   result.push(curKey);
                   
                   // 獲取的長(zhǎng)度增加
                   count++
               }
            }
    
        // 按照鍵值排序后的 len 長(zhǎng)度的記錄
        console.log('取出指定長(zhǎng)度記錄的實(shí)際結(jié)果：' , rel);
        console.log('取出指定長(zhǎng)度記錄的可視化結(jié)果：' , result);
        return result;
    }

Like +0

Add Reply

黃舟2017-04-18 10:33:54 5 floor

map-reduce or sort | uniq -c
Obviously use hadoop (map-reduce)?

Like +0

Add Reply

PHPzhong2017-04-18 10:33:54 4 floor

Sort it first, it will be easier to deal with after sorting.

Like +0

Add Reply

Ty802017-04-18 10:33:54 3 floor

This question should be broken into two parts to understand,
1. Reading a large file at once is inefficient
2. Find the top 10 records with the most occurrences
Question 1: For reading large files, you can use java AIO Implementation method reads files asynchronously
Question 2: LindedList needs to be extended. Each time a record is appended, first check whether a record exists, and if so, push the record position forward (this is more similar to the java memcached caching algorithm)

Like +0

Add Reply

巴扎黑2017-04-18 10:33:54 2 floor

These types of problems all have the same idea, "divide and conquer".
Each IP is equivalent to a 32-bit integer number. Traverse all the IPs in the log, divide them into n files according to the value range and save them (make sure each file is small enough). In this case, the same IP will definitely be assigned to the same file. In a file; then process these n files one by one, count the top 10 IPs that appear in each file; finally get the top 10 from these 10*n IPs.

Like +0

Add Reply