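I have a requirement: record click data for pages. The upstream system pushes the data into Redis;
how it does so is transparent to us,
so we only need to care about how the data is stored in Redis.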
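Query all clicks on a given page on a given day, i.e. total valid clicks + total invalid clicks.
Query the total valid clicks and total invalid clicks on a given page at a given resolution on a given day.
Query all coordinate points and their click counts on a given page at a given resolution on a given day.
Box-select query (effectively a range query): on a given page at a given resolution on a given day, the total valid clicks and total invalid clicks for the coordinate points inside a range (e.g. 100<x<1000, 30<y<600).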
On top of that, valid and invalid click counts are needed for various dimensions.
About valid and invalid clicks: we can distinguish them with 0 and 1 when storing; how the front end decides whether a click is valid is transparent to us.
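About resolution: it is distinguished by width, and there are three values, e.g. 1380, 1190 and 1000. In the current implementation, resolution lets us split the zsets into smaller groups: without resolution there might be about 100,000 zset keys in total, whereas with resolution a single query touches at most the zsets of one resolution, perhaps only about 30,000 keys.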
About box selection: the user drags a box on the page with the mouse from top-left to bottom-right, and we query the data for all points inside that box (e.g. 100<x<1000, 30<y<600).
About dimensions: the region of the user who clicked the point, and the browser they used.
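The points pushed from upstream are processed before being stored in Redis: x and y each go through
Math.ceil(realx / 4.0) * 4;
Math.ceil(realy / 4.0) * 4;
i.e. every 4 pixels along each axis are merged into one point before storage.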
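A minimal Java sketch of the snapping step, using the post's own formula; the class name `CoordinateBucket` and the method `snap` are hypothetical. Because `Math.ceil` rounds up, raw values 1 to 4 all map to 4 and 5 to 8 map to 8, so in two dimensions each stored point stands for a 4x4 pixel block.

```java
// Hypothetical helper wrapping the post's formula Math.ceil(raw / 4.0) * 4.
final class CoordinateBucket {
    static final int CELL = 4; // pixels per bucket along each axis

    // Round a raw coordinate up to the next multiple of CELL (0 stays 0).
    static int snap(int raw) {
        return (int) Math.ceil(raw / (double) CELL) * CELL;
    }

    public static void main(String[] args) {
        System.out.println(snap(1) + " " + snap(4) + " " + snap(5)); // 4 4 8
    }
}
```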
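The requirements are implemented with zsets. One zset records the data for a given page at a given resolution on a given day:
key: date_pageid_resolution
member: validOrInvalid_browser_region
score: number of clicks
Example:
key: 20140908_0001_1000
member: 0_1_1 (0 = invalid click, 1 = QQ Browser in the browser table, 1 = Shanghai in the region table)
score: 10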
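To make the page-level layout concrete, here is a Java sketch that builds key and member strings in the format above and counts clicks. It rests on assumptions: an in-memory map plays the role of Redis (against a real server the write would be `ZINCRBY key 1 member` and the read `ZRANGE key 0 -1 WITHSCORES`), and `PageZSetSketch` and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the page-level zsets: key -> (member -> score).
final class PageZSetSketch {
    final Map<String, Map<String, Double>> store = new HashMap<>();

    // Key format from the post: date_pageid_resolution
    static String key(String date, String pageId, int resolution) {
        return date + "_" + pageId + "_" + resolution;
    }

    // Member format from the post: validFlag_browserId_regionId (1 = valid, 0 = invalid)
    static String member(boolean valid, int browserId, int regionId) {
        return (valid ? 1 : 0) + "_" + browserId + "_" + regionId;
    }

    // Add one click to the member's score, like ZINCRBY key 1 member.
    void recordClick(String key, String member) {
        store.computeIfAbsent(key, k -> new HashMap<>()).merge(member, 1.0, Double::sum);
    }

    // Query B for one key: returns {invalid total, valid total}.
    long[] clickTotals(String key) {
        long[] totals = new long[2];
        Map<String, Double> zset = store.getOrDefault(key, Collections.emptyMap());
        for (Map.Entry<String, Double> e : zset.entrySet()) {
            int flag = e.getKey().charAt(0) - '0'; // first character is the 0/1 flag
            totals[flag] += e.getValue().longValue();
        }
        return totals;
    }
}
```

Query A would then be the sum of `clickTotals` over the three resolution keys of that page and day.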
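The data for each coordinate point is recorded in its own zset:
key: date_pageid_resolution_x_y
member: validOrInvalid_browser_region
score: number of clicks
Example:
key: 20140908_0001_1000_23_478
member: 0_1_2 (0 = invalid click, 1 = QQ Browser in the browser table, 2 = Beijing in the region table)
score: 12
This reads as: at the point (23,478), on 20140908, on the page with pageid 0001, at resolution 1000, users in Beijing using QQ Browser made 12 invalid clicks.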
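With this layout the number of zsets grows with the number of distinct points, and each point's coordinates exist only in its key name. A small hypothetical Java helper for building and parsing such keys, assuming neither the date nor the pageid ever contains an underscore:

```java
// Hypothetical helper for the per-point key layout date_pageid_resolution_x_y.
// Assumes date and pageid never contain '_'.
final class PointKey {
    static String build(String date, String pageId, int resolution, int x, int y) {
        return String.join("_", date, pageId,
                String.valueOf(resolution), String.valueOf(x), String.valueOf(y));
    }

    // Recover {x, y} from a key such as 20140908_0001_1000_23_478.
    static int[] coordinates(String key) {
        String[] parts = key.split("_");
        if (parts.length != 5) {
            throw new IllegalArgumentException("unexpected key: " + key);
        }
        return new int[] {Integer.parseInt(parts[3]), Integer.parseInt(parts[4])};
    }
}
```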
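Two more zsets support range queries: ZRANGEBYSCORE on each returns the keys matching the x range and the y range (e.g. 100<x<1000, 30<y<600), and the intersection of the two key sets is the set of keys that actually needs to be queried.
y auxiliary zset:
key: date_pageid_resolution_y, e.g. 20140908_0001_1000_y
member: date_pageid_resolution_x_y, e.g. 20140908_0001_1000_23_478
score: the y coordinate, e.g. 478
x auxiliary zset:
key: date_pageid_resolution_x, e.g. 20140908_0001_1000_x
member: date_pageid_resolution_x_y, e.g. 20140908_0001_1000_23_478
score: the x coordinate, e.g. 23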
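The auxiliary-index lookup can be simulated in Java as follows. `TreeMap.subMap` with inclusive bounds stands in for `ZRANGEBYSCORE` (whose bounds are inclusive by default), and all names are hypothetical. The sketch also shows where the cost comes from: the x-range result contains every point in the vertical band regardless of y (and vice versa), so both sets can be far larger than the final intersection.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// In-memory stand-in for the _x and _y auxiliary zsets: coordinate (score) -> point keys (members).
final class AxisIndexSketch {
    final TreeMap<Integer, Set<String>> xIndex = new TreeMap<>();
    final TreeMap<Integer, Set<String>> yIndex = new TreeMap<>();

    // Like ZADD ..._x x pointKey and ZADD ..._y y pointKey
    void add(String pointKey, int x, int y) {
        xIndex.computeIfAbsent(x, k -> new HashSet<>()).add(pointKey);
        yIndex.computeIfAbsent(y, k -> new HashSet<>()).add(pointKey);
    }

    // Like ZRANGEBYSCORE index lo hi (both bounds inclusive)
    static Set<String> rangeByScore(TreeMap<Integer, Set<String>> index, int lo, int hi) {
        Set<String> out = new HashSet<>();
        for (Set<String> keys : index.subMap(lo, true, hi, true).values()) {
            out.addAll(keys);
        }
        return out;
    }

    // Box selection as described: x-range keys intersected with y-range keys.
    // Each surviving key still needs its own zset read afterwards.
    Set<String> box(int x1, int x2, int y1, int y2) {
        Set<String> result = rangeByScore(xIndex, x1, x2);
        result.retainAll(rangeByScore(yIndex, y1, y2));
        return result;
    }
}
```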
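Problem: queries are too slow.
Example: to fetch all points for a given page at a given resolution on a given day, I may have to query tens of thousands of keys in one go, e.g. keys("20140908_0001_1000_*");
after getting the key set, I still need zrange(key) to get the members under each key, and then zscore(key, member) to get the score for each key and member.
As you can see, this work runs serially and is not easy to parallelize.
Temporary workaround: run it as an asynchronous task and cache the results to speed up queries,
but this may cause Redis slow-query problems.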
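Box selection
Example: for a query range such as 100<x<1000, 30<y<600,
use zrangeByScore(key, 100, 1000) on the x auxiliary zset and zrangeByScore(key, 30, 600) on the y auxiliary zset to get the keys matching x and y respectively, then take the intersection to get the final set of keys to query.
After getting the key set, I still need zrange(key) to get the members under each key, and then zscore(key, member) to get the score for each key and member.
Drawback: because the query range is arbitrary, the results can't be cached, and when the range is large (i.e. there are many keys), the query is very slow. As with fetching all coordinate points above, it runs serially, is not easy to parallelize, and may cause Redis slow-query problems.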
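Does anyone have a better optimization strategy for my current implementation,
or a better design for these query requirements?
This is my first post; thanks to @暗雨西喧 for the formatting tip.
Any advice is appreciated.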
Young man, you have remarkable natural talent and unlimited potential. Come and learn PHP!
"When there are many keys, the query speed is very slow"
When you say querying many keys is slow, do you mean the last step, where you query the zsets that actually hold the clicks?
How many resolutions will there be? You could drop the resolution from the zset key and put it in the value instead, which would cut the number of keys considerably. If a query condition includes a resolution, you can filter after fetching the values, and that should be very fast.
"But box selection, because the query range is arbitrary..."
"Box-select query (effectively a range query): on a given day, page and resolution, the total valid clicks and total invalid clicks for the coordinate points in a range (e.g. 100<x<1000, 30<y<600)."
This amounts to letting users draw an arbitrary search area by hand. Could you change the condition instead: divide the whole page image into 10 parts (or 100, or 10,000), each a square, and only allow selecting whole squares rather than drawing freely? That way the data in each square can be pre-aggregated ahead of time.
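A hedged Java sketch of the grid idea: each click is added to the running totals of the square that contains it, so a query over whole squares costs one lookup per square. The square size, the row-major numbering and the class name are assumptions for illustration; in Redis such per-square totals could, for instance, be kept with `HINCRBY` in one hash per date, page and resolution, which the answer itself does not specify.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pre-aggregation grid for one date/page/resolution.
final class GridSketch {
    final int cellSize; // edge length of one square in pixels (assumed value)
    final int columns;  // squares per row
    final Map<Integer, long[]> cells = new HashMap<>(); // cellId -> {invalid, valid}

    GridSketch(int pageWidth, int cellSize) {
        this.cellSize = cellSize;
        this.columns = (pageWidth + cellSize - 1) / cellSize; // ceiling division
    }

    // Row-major id of the square containing (x, y).
    int cellId(int x, int y) {
        if (x < 0 || y < 0 || x >= columns * cellSize) {
            throw new IllegalArgumentException("outside page: " + x + "," + y);
        }
        return (y / cellSize) * columns + (x / cellSize);
    }

    // Write path: add clicks to the square's running totals.
    void record(int x, int y, boolean valid, long clicks) {
        cells.computeIfAbsent(cellId(x, y), k -> new long[2])[valid ? 1 : 0] += clicks;
    }

    // Read path: one lookup per selected square, returning {invalid, valid}.
    long[] totals(int cellId) {
        long[] t = cells.get(cellId);
        return t == null ? new long[2] : t.clone();
    }
}
```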
That's all for now; see whether it helps. If you still need to optimize, you can revise the query description in the question. I had to fill in some parts by guessing and I'm not sure that's what you meant, so I've kept my answer simple. Please write the examples out in detail and format them; as it is, the question is tiring to read.
I wrote these separately. Below is what I added after you revised the question.
First of all, you aren't using the essence of zset, which is that it automatically keeps members sorted and indexed by score. It also seems you didn't understand what I meant above about putting the resolution in the value. Let me give an example.
One zset records the data for a given page at a given resolution on a given day
The key is date_pageid_resolution and the member is: valid OR invalid_browser_region
score is the number of clicks
Example: key: 20140908_0001_1000
member: 0_1_1 (0 = invalid click, 1 = QQ Browser in the browser table, 1 = Shanghai in the region table)
score: 10
Suppose there are 3 resolutions: A, B, C
Storing the keys your way looks like this:
20140908_0001_A
20140908_0001_B
20140908_0001_C
The storage method I am talking about is
key:20140908_0001
member:valid OR invalid_browser_region_number of clicks
score:resolution
When querying, you then only need the single key for page 0001 on 20140908 (just 1 key), range-query it by score for resolution A, and read its members. That said, this isn't convenient in practice, and resolution doesn't make a very useful score, so using a zset this way has its own problems.
The above is just an example! Don't actually do this; there is a better way. After you revised the question and I understood the requirements, I came up with a new approach:
zset:data set
key:date-page-resolution
score: the coordinates (think about how to turn x and y into a single number)
member: browser-region-valid clicks-invalid clicks
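One possible way to turn x and y into a single score (an assumption; the answer leaves the encoding open) is row-major linearization: `score = x * M + y`, with `M` larger than any y. An x range then maps to one contiguous score range for `ZRANGEBYSCORE`, and y is filtered on the client. Redis scores are double-precision floats, which represent integers exactly up to 2^53, so these values fit comfortably. `M = 100000` and the names below are hypothetical.

```java
// Hypothetical row-major encoding of a point into one zset score: score = x * M + y.
final class CoordScore {
    // Assumed multiplier; must be larger than the largest possible y on the page.
    static final long M = 100_000L;

    static long encode(int x, int y) {
        if (x < 0 || y < 0 || y >= M) {
            throw new IllegalArgumentException("coordinate out of range: " + x + "," + y);
        }
        return x * M + y;
    }

    static int decodeX(long score) { return (int) (score / M); }

    static int decodeY(long score) { return (int) (score % M); }

    // Inclusive score bounds covering every y for x1 <= x <= x2; pass them to ZRANGEBYSCORE,
    // then drop entries whose decoded y falls outside the box.
    static long[] xRangeBounds(int x1, int x2) {
        return new long[] {x1 * M, x2 * M + (M - 1)};
    }
}
```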
If the date can be a selectable range, you need another set just to index dates. Call it the date set:
key:page
score:date
member:data set key
The date set exists to index the data-set keys. Your use of keys() is very slow because it scans the entire keyspace. Your examples are all for a single day, so I assume there may be no date-range queries, in which case the date set is unnecessary. Likewise, if there are too many resolutions to keep track of, you can build a similar set to collect those keys!
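A hypothetical Java sketch of the date set: per page, dates (as yyyymmdd integers, which sort chronologically) point to data-set keys, so a date range becomes one ordered range lookup instead of a `KEYS` scan. The in-memory `TreeMap` stands in for a Redis zset queried with `ZRANGEBYSCORE page fromDate toDate`; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for one "date set" zset per page: score = date (yyyymmdd), member = data-set key.
final class DateIndexSketch {
    final Map<String, TreeMap<Integer, Set<String>>> byPage = new HashMap<>();

    // Like ZADD page date dataKey
    void register(String page, int date, String dataKey) {
        byPage.computeIfAbsent(page, p -> new TreeMap<>())
              .computeIfAbsent(date, d -> new TreeSet<>())
              .add(dataKey);
    }

    // Like ZRANGEBYSCORE page fromDate toDate: an ordered range lookup, no KEYS scan.
    List<String> keysBetween(String page, int fromDate, int toDate) {
        List<String> out = new ArrayList<>();
        TreeMap<Integer, Set<String>> index = byPage.get(page);
        if (index == null) {
            return out;
        }
        for (Set<String> keys : index.subMap(fromDate, true, toDate, true).values()) {
            out.addAll(keys);
        }
        return out;
    }
}
```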
As for your two coordinate zsets, I didn't look at them closely; think carefully about how you use zsets.
You gave 4 query examples:
A: Query all clicks on a given page on a given day, i.e. total valid clicks + total invalid clicks
B: Query the total valid clicks and total invalid clicks on a given page at a given resolution on a given day
C: Query all coordinate points and their click counts on a given page at a given resolution on a given day
D: Box-select query (effectively a range query): on a given day, page and resolution, the total valid clicks and total invalid clicks for the coordinate points in a range (e.g. 100<x<1000, 30<y<600)
A: You said there are 3 resolutions, so append each resolution to the key and fetch each key in full with range 0 to -1:
20150415-page1-1380, 20150415-page1-1190, 20150415-page1-1000
B: Easy: check one key and fetch everything with range 0 to -1:
20150415-page1-1380
C: Fine: the coordinates are also available from the first two queries, you just don't display them.
D: After using your coordinate set to find the key, range-query the data set by coordinate score.
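Query D against the proposed data set might look like the Java sketch below, assuming the coordinates are encoded as `score = x * M + y` with `M` larger than any y (one possible encoding; the answer leaves it open). One detail the answer does not address: members of a Redis zset are unique, so two points with identical browser-region-valid-invalid strings would collide. The sketch therefore prefixes the encoded coordinate to each member, which is an assumption layered on top of the proposal. An in-memory `TreeMap` stands in for Redis, and all names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for the "data set" zset of one date-page-resolution: score -> members.
final class BoxQuerySketch {
    static final long M = 100_000L; // hypothetical multiplier, must exceed the largest y

    final TreeMap<Long, Set<String>> dataSet = new TreeMap<>();

    // Member layout (assumed): score-browser-region-valid-invalid, so members stay unique.
    void put(int x, int y, int browser, int region, long valid, long invalid) {
        long score = x * M + y;
        String member = score + "-" + browser + "-" + region + "-" + valid + "-" + invalid;
        dataSet.computeIfAbsent(score, s -> new TreeSet<>()).add(member);
    }

    // Query D: {valid, invalid} totals for x1 <= x <= x2 and y1 <= y <= y2.
    long[] box(int x1, int x2, int y1, int y2) {
        long[] totals = new long[2];
        // One contiguous score range covers the x band (ZRANGEBYSCORE ... WITHSCORES in Redis).
        for (Map.Entry<Long, Set<String>> e :
                dataSet.subMap(x1 * M, true, x2 * M + (M - 1), true).entrySet()) {
            long y = e.getKey() % M;
            if (y < y1 || y > y2) {
                continue; // y is filtered on the client
            }
            for (String member : e.getValue()) {
                String[] f = member.split("-");
                totals[0] += Long.parseLong(f[3]); // valid
                totals[1] += Long.parseLong(f[4]); // invalid
            }
        }
        return totals;
    }
}
```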
I finished writing, but while proofreading I noticed a small problem: it seems you need to record valid and invalid clicks per browser within each region? If not, the data-set member can simply record the valid and invalid counts. If so, the design has to take into account how many browsers there are per region; your question doesn't seem to cover this.
Maybe my understanding of Redis differs from the asker's. My approach to the requirements above would be:
record logs, move and transform the data with ETL,
and finally make the results available for queries.
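The log-then-ETL route might start with an aggregation step like the Java sketch below: raw click records are snapped to the same 4-pixel grid as in the question and counted per date, page, resolution, point and validity. The `Click` record and its fields are hypothetical, since the thread does not describe the upstream log format.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class ClickEtlSketch {
    // Hypothetical raw log record; real field names depend on the upstream log format.
    static final class Click {
        final String date, page;
        final int resolution, x, y;
        final boolean valid;

        Click(String date, String page, int resolution, int x, int y, boolean valid) {
            this.date = date; this.page = page; this.resolution = resolution;
            this.x = x; this.y = y; this.valid = valid;
        }
    }

    // Same 4-pixel snapping as the question.
    static int snap(int raw) {
        return (int) Math.ceil(raw / 4.0) * 4;
    }

    // Transform step: group raw clicks by date_page_resolution_x_y_valid and count them.
    // The output could then be loaded into whichever Redis layout is chosen.
    static Map<String, Long> aggregate(List<Click> log) {
        return log.stream().collect(Collectors.groupingBy(
                c -> String.join("_", c.date, c.page, String.valueOf(c.resolution),
                        String.valueOf(snap(c.x)), String.valueOf(snap(c.y)), c.valid ? "1" : "0"),
                TreeMap::new,
                Collectors.counting()));
    }
}
```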