


How to build a Docker image for a Node service? A detailed look at extreme optimization
Oct 19, 2022 7:38 PM
Recently I have been developing an HTML dynamic service shared across all categories of Tencent Docs. To make it easy to generate and deploy for every category, and to follow the trend of moving to the cloud, I considered using Docker to pin down the service content and manage product versions in a unified way. This article shares the optimization experience I accumulated while Dockerizing the service, for your reference.
Let's start with an example. Most developers new to Docker will write the project's Dockerfile like this:
FROM node:14
WORKDIR /app
COPY . .
# install npm dependencies
RUN npm install
# expose the service port
EXPOSE 8000
CMD ["npm", "start"]
Build, package, and upload, all in one go. Then check the image status: to my shock, the size of this simple Node web service has reached an astonishing 1.3 GB, and image transfer and build speeds are also very slow:
It would be fine if this image only needed one deployed instance, but the service has to be provided to all developers for high-frequency integration and deployment of environments (see my earlier article for the high-frequency integration scheme). First, an oversized image inevitably slows image pulls and updates, degrading the integration experience. Second, once the project goes live, there may be tens of thousands of test-environment instances online at the same time; such container memory costs are unacceptable for any project. An optimized solution had to be found.
Having identified the problems, I started studying Docker optimization approaches, ready to perform surgery on my image.
Node project production environment optimization
The first thing to do is, of course, what front-end developers know best: optimizing the size of the code itself. The project was developed in TypeScript; to save trouble, it was previously compiled directly with tsc into ES5 and run as-is. There are two main size problems here. One is that the development-environment TypeScript source is left unprocessed, and the JavaScript used in production is not minified.
The other is that the referenced node_modules is bloated, still containing many npm packages used only for development and debugging, such as ts-node and typescript. Now that the project is compiled to JS, those dependencies should naturally be removed.
Generally speaking, since server-side code is not exposed the way front-end code is, services running on physical machines care more about stability and do not mind some extra size, so these areas are usually left untouched. After Dockerization, however, these problems become conspicuous as the deployment scale grows, and they need to be optimized for the production environment.
We are already very familiar with how to optimize these two points on the front end, and since they are not the focus of this article, only a brief mention. For the first point, use Webpack + babel to downgrade and minify the TypeScript source; if you are worried about troubleshooting, add a sourcemap, though for a Docker image that is somewhat redundant, as discussed later. For the second point, sort the npm package's dependencies and devDependencies, removing dependencies not needed at runtime, and install with npm install --production in the production environment.
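As a sketch of that split (package names and versions here are illustrative, not from the project), a package.json organized this way lets npm install --production skip the build-time tools entirely:

```json
{
  "scripts": {
    "build": "webpack --mode production"
  },
  "dependencies": {
    "express": "^4.18.0"
  },
  "devDependencies": {
    "typescript": "^4.8.0",
    "ts-node": "^10.9.0",
    "webpack": "^5.74.0",
    "babel-loader": "^8.2.0"
  }
}
```

With this layout, npm install --production (or npm install --omit=dev on newer npm) installs only the dependencies section, so the runtime image never sees the compiler toolchain.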
Optimizing the project's image size
Use the most streamlined base image possible
We know that container technology provides process isolation at the operating-system level: the Docker container itself is a process running under an independent operating system. In other words, a Docker image must package an operating-system-level environment that can run independently. An important factor in image size therefore becomes obvious: the size of the Linux operating system packaged into the image.
Generally speaking, reducing the size of the bundled operating system involves two considerations. The first is to remove as many unneeded Linux tool libraries as possible, such as python, cmake, and telnet. The second is to choose a more lightweight Linux distribution. Official images usually provide stripped-down variants of each release along these two lines.
Take node:14, the officially provided Node image, as an example. By default its base environment is Ubuntu, a large and comprehensive Linux distribution that guarantees maximum compatibility. The variant with unnecessary tool libraries removed is node:14-slim, and the smallest image variant is node:14-alpine. Alpine Linux is a highly streamlined lightweight distribution containing only basic tools; its own Docker image is only 4 to 5 MB, making it ideal for building minimal Docker images.
In our service, since the dependencies required to run it are fixed, we choose the alpine version as the production base image to keep the base image as small as possible.
Multi-stage builds
At this point we run into a new problem: alpine's basic tool library is too bare, while a bundler like webpack may pull in a great many plugin libraries behind the scenes, so building the project depends heavily on the environment. Moreover, these tool libraries are only needed at compile time and can be dropped at runtime. For this situation, we can use Docker's multi-stage build feature to solve the problem.
First, we run the full dependency install and build under the full-featured image, and give that stage an alias (here, build).
# install full dependencies and build the artifacts
FROM node:14 AS build
WORKDIR /app
COPY package*.json /app/
RUN ["npm", "install"]
COPY . /app/
RUN npm run build
Then we can start another image stage to run the production environment, and its base image can be switched to the alpine version. The compiled source can be fetched from the build stage's files with the --from parameter and moved into this stage.
FROM node:14-alpine AS release
WORKDIR /release
COPY package*.json /
RUN ["npm", "install", "--registry=http://r.tnpm.oa.com", "--production"]
# move in dependencies and source code
COPY public /release/public
COPY --from=build /app/dist /release/dist
# start the service
EXPOSE 8000
CMD ["node", "./dist/index.js"]
Docker's rule for generating images is that the final image is produced from the last image stage alone. The earlier stages therefore take up none of the final image's size, which solves the problem perfectly.
Of course, as the project grows more complex, tool-library errors may still crop up at runtime. If the failing tool library needs only a few dependencies, we can add the required dependencies ourselves, and the image size can still stay fairly small.
The most common case is references to the node-gyp and node-sass libraries. Since node-gyp is used to transpile modules written in other languages into Node modules, we need to manually add its three dependencies: g++, make, and python.
# install production dependencies (alpine needs extra packages to support node-gyp)
FROM node:14-alpine AS dependencies
RUN apk add --no-cache python make g++
COPY package*.json /
RUN ["npm", "install", "--registry=http://r.tnpm.oa.com", "--production"]
RUN apk del .gyp
For details, see: https://github.com/nodejs/docker-node/issues/282
Plan Docker layers sensibly
Optimizing build speed
We know that Docker uses the concept of layers to create and organize images: every instruction in a Dockerfile produces a new file layer containing the filesystem changes between the states before and after the command runs, and the more layers there are, the larger the image. Docker speeds up builds through caching: if a layer's instruction and its dependencies are unchanged, rebuilding that layer can reuse the local cache directly.
As shown below, when the words Using cache appear in the log, the cache has taken effect: that layer performs no computation and the original cached result is used directly as its output.
Step 2/3 : npm install
 ---> Using cache
 ---> efvbf79sd1eb
Studying Docker's caching algorithm reveals that during a build, once a layer cannot use the cache, none of the subsequent layers that depend on that step can load from the cache either. Take this example:
COPY . .
RUN npm install
Now if we change any file in the repository, the layer above npm install has changed, so even if the dependencies themselves are untouched, the cache will not be reused.
So, to make the most of the npm install layer's cache, we can change the Dockerfile to this:
COPY package*.json .
RUN npm install
COPY src .
Now when only the source code changes, the node_modules dependency cache can still be reused.
From this we derive the optimization principles:
Minimize the files handled per change, touching only the files the next step needs, to reduce cache invalidation during the build as much as possible.
Push file-changing ADD and COPY commands as late in the Dockerfile as possible.
Optimizing build size
With speed assured, size optimization also needs attention. There are three points to consider here:
Docker uploads images to the registry layer by layer, which also maximizes cache reuse. Commands whose results rarely change should therefore be split out into their own layers, as the npm install example above already demonstrates. At the same time, the fewer layers an image has, the smaller the total upload, so when a command sits at the tail of the execution chain and cannot affect other layers' caches, commands should be merged where possible to shrink the cache footprint. For instance, instructions that set environment variables and clean up useless files produce output that is never used, so they can be merged into a single RUN command.
RUN set ENV=prod && rm -rf ./trash
Docker cache downloads also work layer by layer, so to reduce image transfer and download time, it is best to build on fixed physical machines. Designating a dedicated build host in the pipeline, for example, greatly reduces image preparation time.
Of course, time and space optimization can never both be perfect, so we must weigh the number of Docker layers when designing the Dockerfile. For example, optimizing for time requires splitting up file-copy operations, which increases the layer count and slightly increases size.
My advice here is to prioritize build time first, and then, without hurting build time, shrink the build cache size as much as possible.
Managing services with a Docker mindset
Avoid process supervisors
When writing traditional backend services, we habitually use process supervisors such as pm2 or forever to make sure a service is detected and automatically restarted after an unexpected crash. Under Docker, however, this brings no benefit and instead adds instability.
First, Docker itself is a process manager, so the crash restarts, logging, and similar duties a supervisor provides are already covered by Docker itself or by Docker-based orchestrators such as Kubernetes; no extra application is needed. Beyond that, by its very nature a supervisor process inevitably affects the following:
Adding a process supervisor increases memory usage, and the image size grows accordingly.
Because the supervisor process itself keeps running normally, Docker's own restart policy will not trigger when the service fails, and Docker's logs will record no crash information, making troubleshooting and tracing difficult.
With the extra process in the mix, the CPU, memory, and other monitoring metrics Docker reports become inaccurate.
So even though supervisors like pm2 provide a Docker-adapted variant, pm2-runtime, I still do not recommend using a process supervisor.
This mistake really stems from our ingrained habits. When moving services to the cloud, the hard part is not only adjusting code style and architecture; the shift in development mindset matters most, and we come to appreciate that more deeply throughout the migration.
Persistent log storage
Whether for troubleshooting or auditing, backend services always need logging. Following the old approach, we would categorize logs and write them into log files under some directory. In Docker, however, no local file is persistent; everything is destroyed when the container's lifecycle ends. So log storage has to move outside the container.
The simplest approach is to use a Docker volume, a feature that bypasses the container's own filesystem and writes data directly to the host machine. Usage is as follows:
docker run -d -it --name=app -v /app/log:/usr/share/log app
When running docker, the -v parameter binds a volume to the container, mounting the host's /app/log directory (created automatically if it does not exist) onto the container's /usr/share/log. When the service writes logs into that folder, they are persisted on the host and no longer lost when the container is destroyed.
Of course, once the deployed cluster grows, logs scattered across physical hosts also become hard to manage, and a service orchestration system is needed for unified management. Purely from the angle of managing logs, we can report them over the network to a managed cloud logging service (such as Tencent Cloud CLS), or go further and manage containers in bulk with a container orchestration system like Kubernetes, where logs, as one module among many, are naturally well taken care of. There are many such approaches; I will not belabor them.
Choosing a k8s service controller
Beyond image optimization, service orchestration and the workload type controlling deployment also have a big impact on performance. Here, taking the two controllers of the most popular orchestrator, Kubernetes, namely Deployment and StatefulSet, as examples, let me briefly compare the two forms of organization to help pick the Controller best suited to the service.
StatefulSet is a Controller introduced in K8S version 1.5; its main feature is the ability to deploy, update, and destroy pods in a fixed order. So do our artifacts need a StatefulSet for pod management? The official docs sum it up in one sentence:
Deployment is used to deploy stateless services; StatefulSet is used to deploy stateful services.
That sentence is precise but not easy to grasp. So what is stateless? In my view, StatefulSet's characteristics can be understood through the following points:
Pods managed by a StatefulSet are deployed, updated, and deleted one by one in a fixed order. This suits cases where services depend on each other, such as starting the database service before the query service.
Because the pods depend on one another, each pod necessarily provides a different service, so the pods under a StatefulSet have no load balancing between them. And because each pod's service differs, every pod has its own independent storage space, not shared between pods.
To guarantee ordering during deployment and updates, pod names must be fixed, so unlike with a Deployment, the generated pod names do not carry a trailing random string. Since the pod name is fixed, the Service attached to a StatefulSet can use pod names directly as access domain names without providing a Cluster IP; such a Service is therefore called a Headless Service.
From this we can see that if you are deploying a single service on k8s, or multiple services with no dependencies between them, a Deployment is surely the simplest and best choice, giving automatic scheduling and automatic load balancing. If services must start and stop in a certain order, or the data volume mounted on each pod needs to survive destruction, then a StatefulSet is the recommended choice.
Following the principle of not adding entities unless necessary, I strongly recommend using a Deployment as the Controller for all workloads that run a single service.
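As a sketch of that recommendation (all names, labels, and the image tag below are illustrative, not from the article), a minimal Deployment for a stateless service like this one might look like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: html-dynamic-service      # illustrative name
spec:
  replicas: 3                     # pods are interchangeable, so scale freely
  selector:
    matchLabels:
      app: html-dynamic-service
  template:
    metadata:
      labels:
        app: html-dynamic-service
    spec:
      containers:
        - name: app
          image: registry.example.com/html-service:latest  # illustrative tag
          ports:
            - containerPort: 8000   # matches the EXPOSE in the Dockerfile
```

Because the pods carry no state, the attached Service load-balances across all replicas and the scheduler places them automatically, which is exactly the behavior this kind of service needs.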
Written at the end
After all this work I had almost forgotten the original goal, so I quickly rebuilt the Docker image to check the optimization results.
As you can see, the effect on image size is quite good: roughly a tenfold reduction. And if the project did not require such a recent Node version, the image size could be cut by about half again.
Afterwards, the image registry compresses the stored image files, and the node14-based image version ends up compressed to under 50 MB.
Of course, beyond the visible size numbers, the more important optimization lies in the architectural shift from services designed for physical machines to containerized cloud services.
Containerization is clearly the visible future. As developers, we should stay sensitive to cutting-edge technologies and practice them actively, so as to turn technology into productivity and contribute to the project's evolution.

