

Architecture Design for a Timeline-like System (MySQL)

May 31, 2016, 08:50 AM

Lately I have been very interested in the architecture of the timeline-like systems behind products such as Weibo, Twitter, and WeChat Moments, also known as a timeline, news feed, or status update stream. I read quite a bit of material and, combining it with my own accumulated understanding of architecture design, took a shot at designing one. The figure below shows a simple tweets interface.

First, let's be clear about the goal: the system to design is a platform with a large enough user base, data volume, and concurrency. By common rules of thumb:

1M+ active users, 10 TB of backend data, 3k QPS sustained, 6k QPS at peak

One caveat up front: business requirements determine the technical evolution path. Any project should start from the bottom up; never over-engineer. Everything below is exploratory thinking.

Let's start with a simplified model. When the system is a single machine plus MySQL, the table design can look like this:

User table:

| uid | username |

Relationship (follow) table:

| uid | following_uid |

Feed table:

| fid | uid | content | create_time |

Suppose user A follows B and C. When B and C each publish a feed, a plain INSERT into the feed table is all that is needed.

The SQL for A to view their own timeline can be expressed as:

SELECT uid, content FROM feed WHERE uid IN (SELECT following_uid FROM relationship WHERE uid = A) ORDER BY create_time DESC

If QPS stays low and total data volume is under 1 TB, a single machine plus MySQL can still cope simply by scaling up. But once users, data volume, and concurrency multiply, this model is clearly no longer enough.

A more mature design is shown in the figure below.

It breaks down into roughly four parts: Portal, Pusher, Puller, and storage. Each is described in detail below.

Storage & Cache

Storage comes first because, as data volume grows, the first problem to solve is capacity. Capacity planning should really be done at project kick-off. When a single database and a single table can no longer meet our needs, we shard: split the data across multiple databases and multiple tables.

This is especially true for MySQL, a battle-tested database that nevertheless lacks NoSQL features and offers no good built-in support for horizontal scaling (there are many reasons, which I won't go into here). So the initial sharding rules must be defined carefully up front, captured as a fixed set of rules together with a database routing / data access layer (DAL), to avoid the high operational cost of later data migration.

Databases can be split by business importance, with different databases serving different business domains. For example, the core feed database should live in its own physical database, while non-core data such as user profiles can be handled by a separate physical database plus logical databases.

In a system this large, the core database can be split into 64 databases and the feed table into 256 tables. The uid modulo is used to compute which database shard (slice) and which table a feed should be routed to. Another approach is to split tables by time, which I won't expand on here.
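
A minimal sketch of such a routing function, assuming 64 databases and 256 tables and routing purely by uid modulo (the shard counts, naming scheme, and class name are illustrative assumptions, not a prescription):

```java
/**
 * Minimal sharding router: maps a feed to one of 64 databases and
 * 256 tables based on the owner's uid. Shard counts and naming scheme
 * are assumptions for illustration only.
 */
public final class FeedShardRouter {

    private static final int DB_COUNT = 64;
    private static final int TABLE_COUNT = 256;

    /** e.g. uid 1234567 -> "feed_db_07" */
    public static String databaseFor(long uid) {
        return String.format("feed_db_%02d", uid % DB_COUNT);
    }

    /** e.g. uid 1234567 -> "feed_0135" */
    public static String tableFor(long uid) {
        return String.format("feed_%04d", uid % TABLE_COUNT);
    }

    public static void main(String[] args) {
        long uid = 1234567L;
        System.out.println(databaseFor(uid) + "." + tableFor(uid));
    }
}
```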

There are also tricks at the table level. As a rule of thumb, a single MySQL table performs well up to roughly 20 million rows, so hot tables are usually capped around the 20-million-row mark, while historical cold data is simply archived into a history table that is rarely queried. As for storage engines, hot tables can use InnoDB and cold tables MyISAM, with DBAs doing periodic maintenance to keep the hot tables' row counts stable rather than ballooning, which keeps hot-table queries efficient.

As for transactions: in many scenarios, chasing performance means giving something up, such as consistency or availability. For a feed scenario we can absolutely relax the strict consistency requirements and settle for eventual consistency, which reduces system complexity and improves performance. So within this kind of system we avoid enabling transactions wherever possible, whether database transactions or distributed transactions.

Second, consider the join-style query above. Once data volume is large, the uids involved no longer land in one database, let alone one table, so that query inevitably becomes a performance bottleneck at scale. A common solution pattern is caching, trading space for time: try to make as many requests as possible hit the cache, in other words raise the cache hit rate. If the timeline hit rate reaches, say, 90%, the user experience improves a great deal.

Cache here means a distributed cache. Compared with database access, its performance is very high: tens of thousands of QPS on a single node is not a problem, and response times are on the order of a millisecond or less.

For timeline storage, all the feeds from the people a user follows can be assembled into a list and kept in the distributed cache. With Redis, for example, the list data structure is a natural choice, and the RPUSHX command can be used to append new feeds to an existing timeline.
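
A minimal sketch of that cache update, assuming the Jedis client and a "timeline:{uid}" key scheme (the key scheme, the trim length, and the helper class are illustrative assumptions):

```java
import redis.clients.jedis.Jedis;

/**
 * Appends a newly published feed id to a follower's cached timeline.
 * RPUSHX only appends when the key already exists, so users whose
 * timeline cache has expired are skipped automatically.
 * Key scheme "timeline:{uid}" is an assumption for illustration.
 */
public class TimelineCacheUpdater {

    private final Jedis jedis;

    public TimelineCacheUpdater(Jedis jedis) {
        this.jedis = jedis;
    }

    public void appendToTimeline(long followerUid, long fid) {
        String key = "timeline:" + followerUid;
        // Returns 0 if the key does not exist (cache not built), otherwise the new length.
        long newLength = jedis.rpushx(key, String.valueOf(fid));
        if (newLength > 0) {
            // Keep only the most recent 1000 entries to bound memory usage.
            jedis.ltrim(key, -1000, -1);
        }
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            new TimelineCacheUpdater(jedis).appendToTimeline(42L, 1001L);
        }
    }
}
```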

Of course, when using an open-source NoSQL store such as Redis as a cache, a solid HA strategy is also needed: master/slave replication, dual writes with random reads, consistent hashing, and so on. Each has its pros and cons in terms of operational cost, performance, and client-side complexity, so pick whichever fits best. You also have to consider the avalanche effect when the cache becomes unavailable and whether it will punch through to the database.

As an aside: the Pusher's downstream is the data source, but cache updates do not have to go through the Pusher. Instead, a component can attach beneath MySQL disguised as a replica, receive the row-based binlog, and infer inserts, updates, and deletes from it, which is enough to drive the timeline cache update logic.
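
One way to sketch this is with the open-source mysql-binlog-connector-java library, which speaks the MySQL replication protocol and presents itself as a replica. The credentials, the assumed row layout, and the onFeedInserted hook below are illustrative assumptions; a real implementation would also track table map events to filter for the feed tables:

```java
import com.github.shyiko.mysql.binlog.BinaryLogClient;
import com.github.shyiko.mysql.binlog.event.EventType;
import com.github.shyiko.mysql.binlog.event.WriteRowsEventData;

import java.io.IOException;
import java.io.Serializable;

/**
 * Sketch: connect to MySQL as a fake replica, consume the row-based binlog,
 * and react to inserted feed rows. Host/credentials and the row layout
 * (fid at index 0, uid at index 1) are illustrative assumptions.
 */
public class BinlogCacheUpdater {

    public static void main(String[] args) throws IOException {
        BinaryLogClient client = new BinaryLogClient("127.0.0.1", 3306, "repl", "repl_password");
        client.registerEventListener(event -> {
            EventType type = event.getHeader().getEventType();
            if (type == EventType.EXT_WRITE_ROWS || type == EventType.WRITE_ROWS) {
                WriteRowsEventData data = event.getData();
                for (Serializable[] row : data.getRows()) {
                    long fid = ((Number) row[0]).longValue();
                    long uid = ((Number) row[1]).longValue();
                    // Hypothetical hook: fan the new feed out to followers' timeline caches.
                    onFeedInserted(uid, fid);
                }
            }
        });
        client.connect(); // blocks and streams binlog events
    }

    private static void onFeedInserted(long authorUid, long fid) {
        System.out.printf("feed %d by user %d -> update timeline caches%n", fid, authorUid);
    }
}
```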

Portal

The Portal provides the externally facing services for customers.

The topmost layer contains the WEB and API modules. A timeline-like system usually covers both web and mobile apps, so the WEB module serves the platform's own front end directly, while the API module can follow the REST architectural style, using OAuth 2.0 for authorization, and serve third parties.

The second layer, which I call the presentation tier, is mainly responsible for basic permission checks and for packaging different views for the front end.

The third layer is the core business logic layer.

Strictly speaking, a proper layering would also include a data resource layer and a physical access layer, but this diagram is meant more as a flow chart, so those two bottom layers are not shown.

Pusher

When the system is simple, the Portal can operate directly on the underlying MySQL data source. But remember that what we are building is a complex, large-scale distributed system, so we should borrow common patterns from internet-scale design, and the most suitable one here is asynchronous decoupling.

A single app server (such as Tomcat) can reach around 5,000 QPS, but that is a theoretical figure that ignores peaks. Running servers as a stateless cluster and scaling out horizontally relieves the bottleneck to some extent. But when clients have extremely strict latency requirements (for example, the timeline home page must return within 150 ms), we cannot afford to block the client or sacrifice performance. Instead we decouple asynchronously: the client returns immediately and backend services digest the work at their own pace. In Java, the Executors framework provides exactly this single-machine model: a BlockingQueue in front of a thread pool that processes tasks at whatever throughput the system can sustain.
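
A minimal single-machine sketch of that model, using a bounded queue so the system applies back-pressure instead of growing without limit (the pool size, queue capacity, and task body are illustrative assumptions):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Sketch: accept work immediately and let a bounded thread pool digest it.
 * Pool size, queue capacity and the rejection policy are tuning knobs,
 * not prescriptions.
 */
public class AsyncPushService {

    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            8, 8,                              // fixed pool of 8 workers
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(10_000),  // bounded backlog of pending tasks
            new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when the queue is full

    /** Called by the Portal; returns immediately while the pool works in the background. */
    public void publishAsync(long authorUid, String content) {
        pool.execute(() -> {
            // 1) persist the feed to MySQL (sharded), 2) fan it out to timeline caches
            System.out.printf("persist + fan-out feed from %d: %s%n", authorUid, content);
        });
    }

    public void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
    }
}
```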

The Pusher embodies a "heavy on write, light on read" approach: do the work when a feed is published so that reads stay cheap. As shown in the figure below, it has two responsibilities: 1) persist the feed to MySQL, and 2) fan out to update the timeline caches of all active users.

For both #1 and #2 there is a queue in front. That queue can be Redis or Kestrel. A word about Kestrel: it is a message queue open-sourced by Twitter, and its decentralized design matters a lot to small teams that value simplicity and ease of use:

“Dropping the requirement on cross communication makes it horizontally scale to infinity and beyond: no multicast, no clustering, no "elections", no coordination at all. No talking! Shhh!”

Delivery can be seen as the delivery service; its main job is fan-out. Fan-out, shown in the figure below, follows the "push" idea: new feeds are pushed into followers' timeline caches, that is, the distributed cache described in the previous section.

Of course, fan-out does not deliver to every follower. For a celebrity like Yao Chen, with tens of millions of followers, the instantaneous load would be far too high. So a tiered strategy is used: followers are classified by activity level, users who have logged in within the last day or two are pushed via a priority queue, and users who are inactive, or have not logged in for weeks, simply get no push at all.
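
A minimal sketch of that tiering decision, assuming each follower's last-login timestamp is available (the thresholds and names are illustrative assumptions):

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Sketch: classify a follower by recent activity and decide how (or whether)
 * to fan a new feed out to them. Thresholds are illustrative assumptions.
 */
public class FanOutTiering {

    enum Tier { PRIORITY_PUSH, NORMAL_PUSH, NO_PUSH }

    static Tier tierFor(Instant lastLogin, Instant now) {
        Duration idle = Duration.between(lastLogin, now);
        if (idle.compareTo(Duration.ofDays(2)) <= 0) {
            return Tier.PRIORITY_PUSH;   // logged in within the last 1-2 days
        }
        if (idle.compareTo(Duration.ofDays(14)) <= 0) {
            return Tier.NORMAL_PUSH;     // moderately active
        }
        return Tier.NO_PUSH;             // weeks of inactivity: rely on pull instead
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(tierFor(now.minus(Duration.ofHours(12)), now));  // PRIORITY_PUSH
        System.out.println(tierFor(now.minus(Duration.ofDays(30)), now));   // NO_PUSH
    }
}
```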

Puller

Now for the Puller, which can be seen as a "pull" process.

For an active user whose timeline cache has not yet expired, data can be pulled straight from the Redis cache by uid.

For an inactive user the timeline cache misses, so it has to be rebuilt. Following the simple model introduced at the beginning, we fetch all the followed uids and then query across the sharded databases and tables. This need not be a serial process: on a single machine the Java Executors framework helps again. Use invokeAll(Collection<Callable>) to obtain all the Futures, set a timeout, and simply ignore queries that fail to finish in time. A CyclicBarrier is another option when all sub-tasks must complete before the main task merges the multiple data sources and does the ranking, sorting, filtering, and paging.
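
A minimal sketch of that scatter-gather step, assuming a shardQuery helper that queries one shard and a 500 ms time budget (the helper, the budget, and the Feed record are illustrative assumptions):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Sketch: fan queries out to every shard holding a followed user's feeds,
 * wait at most 500 ms, drop shards that time out, then merge and sort.
 */
public class TimelineRebuilder {

    record Feed(long fid, long uid, String content, long createTime) {}

    private final ExecutorService pool = Executors.newFixedThreadPool(16);

    public List<Feed> rebuild(List<Long> followingUids) throws InterruptedException {
        List<Callable<List<Feed>>> tasks = new ArrayList<>();
        for (long uid : followingUids) {
            tasks.add(() -> shardQuery(uid)); // one task per followed user
        }

        // Tasks that have not finished within the budget come back cancelled.
        List<Future<List<Feed>>> futures = pool.invokeAll(tasks, 500, TimeUnit.MILLISECONDS);

        List<Feed> merged = new ArrayList<>();
        for (Future<List<Feed>> f : futures) {
            if (f.isCancelled()) {
                continue; // a slow shard: ignore rather than block the whole timeline
            }
            try {
                merged.addAll(f.get());
            } catch (Exception e) {
                // a failed shard query is also ignored
            }
        }
        merged.sort(Comparator.comparingLong(Feed::createTime).reversed());
        return merged;
    }

    /** Placeholder for the real sharded SELECT against feed_db_xx.feed_xxxx. */
    private List<Feed> shardQuery(long uid) {
        return List.of();
    }
}
```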

The caches the Puller reads from can actually be divided into three levels:

1) JVM cache: a per-machine, in-process cache. When the front-end Portal does not call Pullers at random but pins a given user to a fixed server, this cache becomes very important; Guava's cache is a good candidate for implementing it (see the sketch after this list).

2) Page or fragment cache: a celebrity's timeline is usually viewed by very many people, so the rendered timelines of these high-traffic users can themselves be cached; the cached result can even be fully rendered output, with JavaScript included and template evaluation already done.

3) The timeline cache described above, which needs no further explanation.
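
A minimal sketch of the level-1 JVM cache from item 1, using Guava's LoadingCache (the size limit, the TTL, and the loadFromRedis fallback are illustrative assumptions):

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Sketch: per-JVM timeline cache sitting in front of the distributed cache.
 * Sizing and TTL are tuning assumptions, not recommendations.
 */
public class LocalTimelineCache {

    private final LoadingCache<Long, List<Long>> cache = CacheBuilder.newBuilder()
            .maximumSize(100_000)                       // bound memory per server
            .expireAfterWrite(30, TimeUnit.SECONDS)     // keep timelines fresh enough
            .build(new CacheLoader<Long, List<Long>>() {
                @Override
                public List<Long> load(Long uid) {
                    return loadFromRedis(uid);          // fall through to the distributed cache
                }
            });

    public List<Long> timelineFor(long uid) {
        return cache.getUnchecked(uid);
    }

    /** Placeholder for reading the uid's fid list from the Redis timeline cache. */
    private List<Long> loadFromRedis(long uid) {
        return List.of();
    }
}
```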

The cache can also separate content from ids. As is well known, data structures that are fixed-length and purely numeric are better for lookups and save storage space, so the cache can be split into two kinds. The first-level cache is:

| uid | fid | following_uid |

The second-level cache is:

| fid | content |

The second-level cache can be stored and exchanged using high-performance binary serialization protocols such as protobuf or thrift, while the first-level cache takes far fewer resources to store; when the second-level cache is down, the first level can still be used to query MySQL quickly and fetch the data.
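
A minimal sketch of reading through both levels, assuming the fid list lives in a Redis list under "timeline:{uid}" and the serialized content lives under "feed:{fid}" keys (the key schemes and the protobuf message referenced in the comment are illustrative assumptions):

```java
import redis.clients.jedis.Jedis;

import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: resolve a timeline in two steps, first the compact fid list,
 * then the serialized content blobs. Key schemes "timeline:{uid}" and
 * "feed:{fid}" are assumptions; FeedContent in the comment stands for a
 * protobuf-generated message class and is not defined here.
 */
public class TwoLevelTimelineReader {

    private final Jedis jedis;

    public TwoLevelTimelineReader(Jedis jedis) {
        this.jedis = jedis;
    }

    public List<byte[]> readTimeline(long uid, int limit) {
        // Level 1: the cheap, numeric fid list for this user's timeline.
        List<String> fids = jedis.lrange("timeline:" + uid, -limit, -1);

        // Level 2: fetch the serialized content blob for each fid.
        List<byte[]> blobs = new ArrayList<>();
        for (String fid : fids) {
            byte[] blob = jedis.get(("feed:" + fid).getBytes());
            if (blob != null) {
                blobs.add(blob);
                // e.g. FeedContent content = FeedContent.parseFrom(blob);  // assumed protobuf message
            }
        }
        return blobs;
    }
}
```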

Comet

Clients usually see reminders such as "N unread updates". These are generally implemented with Comet-style long polling, which nginx's push stream module can handle, by continually comparing the client's last seen fid with the last fid in the timeline cache; if they differ, the count of the difference is shown as unread.
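
A minimal server-side sketch of that unread check, assuming fids grow monotonically and the cached timeline list is ordered oldest to newest (the key scheme and the comparison by fid are illustrative assumptions):

```java
import redis.clients.jedis.Jedis;

import java.util.List;

/**
 * Sketch: count how many feeds in the cached timeline are newer than the
 * client's last seen fid. Assumes fids grow monotonically and the list is
 * ordered oldest-to-newest; key scheme "timeline:{uid}" is an assumption.
 */
public class UnreadCounter {

    private final Jedis jedis;

    public UnreadCounter(Jedis jedis) {
        this.jedis = jedis;
    }

    public long unreadCount(long uid, long lastSeenFid) {
        List<String> fids = jedis.lrange("timeline:" + uid, 0, -1);
        return fids.stream()
                .mapToLong(Long::parseLong)
                .filter(fid -> fid > lastSeenFid)   // newer than what the client has seen
                .count();
    }
}
```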

Service-Oriented Architecture

On top of this modularization, Portal, Pusher, Puller, and the rest can be reworked along service-oriented lines. Twitter uses the Finagle framework for this, which provides a high-performance binary RPC protocol together with service registration, discovery, and governance. A small plug: navi-rpc, open-sourced by my project team, follows a similar service-framework philosophy.

Everything above is purely my own research and reflection. Real situations call for case-by-case analysis; there is no one-size-fits-all solution, but some design principles and patterns can be distilled and reused.

Original article; please credit neoremind.net when republishing.
