


Google releases "VLOGGER" model: a single picture generates a 10-second video
Mar 21, 2024, 12:21 PM
Google has released a new video framework:
You only need a picture of your face and a recording of your speech to get a lifelike video of yourself speaking.
The video duration is variable, and the examples shown so far run up to 10 seconds.
Both the mouth shapes and the facial expressions look very natural.
If the input image covers the entire upper body, the video can also include rich hand gestures:
After seeing it, netizens said:
Yes! With it, we will no longer need to fix our hair and get dressed for online video conferences.
Well, just take a portrait and record the speech audio (just kidding).
Use your voice to control the portrait to generate a video
This framework is called VLOGGER.
It is mainly based on the diffusion model and contains two parts:
One is a stochastic human-to-3D-motion diffusion model.
The other is a new diffusion architecture that augments text-to-image models.
The former takes the audio waveform as input and generates the character's body controls, including eye gaze, facial expressions, gestures, and overall body posture.
The latter is a temporal image-to-image model that extends a large image diffusion model and uses the predicted controls to generate the corresponding frames.
To make the results match a specific person's appearance, VLOGGER also takes the pose map of the reference image as input.
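The two-stage data flow described above can be sketched roughly as follows. This is only an illustration of how the pieces connect, not Google's implementation: the names `MotionControls`, `audio_to_motion`, `render_frames` and `generate_video` are hypothetical, and both stages are trivial stand-ins (mean audio energy and string formatting) where the real system would run diffusion models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MotionControls:
    """Per-frame body controls (hypothetical container, not VLOGGER's real format)."""
    gaze: float
    expression: float
    pose: float


def audio_to_motion(waveform: List[float], samples_per_frame: int) -> List[MotionControls]:
    """Stage 1 stand-in: produce one set of controls per video frame from audio.

    A real system would sample these from a motion diffusion model;
    here the mean absolute amplitude of each audio chunk is used instead.
    """
    controls = []
    for start in range(0, len(waveform) - samples_per_frame + 1, samples_per_frame):
        chunk = waveform[start:start + samples_per_frame]
        energy = sum(abs(s) for s in chunk) / len(chunk)
        controls.append(MotionControls(gaze=0.0, expression=energy, pose=energy / 2))
    return controls


def render_frames(reference_image: str, controls: List[MotionControls]) -> List[str]:
    """Stage 2 stand-in: emit one 'frame' per control set, conditioned on the reference image."""
    return [
        f"{reference_image}|expr={c.expression:.2f}|pose={c.pose:.2f}"
        for c in controls
    ]


def generate_video(reference_image: str, waveform: List[float], samples_per_frame: int) -> List[str]:
    """Chain the two stages: audio -> controls -> frames."""
    controls = audio_to_motion(waveform, samples_per_frame)
    return render_frames(reference_image, controls)


frames = generate_video(
    "portrait.png",
    [0.0, 0.5, -0.5, 1.0, 0.2, -0.2, 0.1, 0.3],
    samples_per_frame=4,
)
print(len(frames))  # one frame per audio chunk
```

The point of the split is that the first stage only sees audio, while the second stage only sees the predicted controls plus the reference image, so the person's appearance enters the pipeline at the rendering step.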
The training of VLOGGER is completed on a very large data set (named MENTOR).
How big is it? It contains 2,200 hours of video covering 800,000 identities.
The test set alone is 120 hours long and covers 4,000 identities.
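To put these figures in perspective, dividing the stated hours by the stated counts gives the average footage per person. This is a back-of-the-envelope calculation that assumes each counted entry corresponds to one person; the helper name `avg_seconds_per_identity` is made up for illustration.

```python
def avg_seconds_per_identity(total_hours: float, identities: int) -> float:
    """Average seconds of video per person, given total hours and head count."""
    return total_hours * 3600 / identities


full_set = avg_seconds_per_identity(2200, 800_000)
test_set = avg_seconds_per_identity(120, 4_000)

print(f"full set: {full_set:.1f} s per person")   # 9.9 s
print(f"test set: {test_set:.1f} s per person")   # 108.0 s
```

Under that assumption the full set averages roughly 10 seconds per person, which is in line with the clip length of the examples shown.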
According to Google, the most outstanding performance of VLOGGER is its diversity:
As shown in the figure below, the redder a pixel is in the final image, the more varied the motion at that location.
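One plausible way to build such a map is to generate several videos from the same input and measure how much each pixel changes across them. The sketch below is an assumption about the general idea, not the paper's exact procedure: it computes the per-pixel standard deviation over a handful of tiny grayscale images, and the function name `diversity_map` is hypothetical.

```python
from statistics import pstdev
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def diversity_map(samples: List[Image]) -> Image:
    """Per-pixel population standard deviation across equally sized images.

    A high value means the pixel differs a lot between samples,
    i.e. there is more variation in motion at that location.
    """
    height, width = len(samples[0]), len(samples[0][0])
    return [
        [pstdev([img[y][x] for img in samples]) for x in range(width)]
        for y in range(height)
    ]


# Two 2x2 "generated frames" that differ only in the top-right pixel.
samples = [
    [[0, 10], [5, 5]],
    [[0, 30], [5, 5]],
]
heat = diversity_map(samples)
print(heat)  # [[0.0, 10.0], [0.0, 0.0]]
```

Mapping these values to a color scale (low values dark, high values red) would give a picture of the kind described.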
Compared with previous similar methods, the biggest advantages of VLOGGER are that it does not need to be trained separately for each person, does not rely on face detection and cropping, and generates a complete video (face and lips as well as body movements), among other things.
Specifically, as shown in the following table:
Face reenactment methods cannot use audio or text to control the video generation.
Audio-to-motion methods can generate motion by encoding audio into 3D facial movements, but the results are not realistic enough.
Lip sync methods can handle videos of different subjects, but they can only animate the mouth.
By comparison, the last two methods, SadTalker and StyleTalk, come closest to Google's VLOGGER, but they also fall short because they cannot control the body or further edit the video.
Speaking of video editing, as shown in the figure below, this is one of the applications of the VLOGGER model. With one click it can make the character close the mouth, close both eyes, close only the left eye, or keep the eyes fully open:
Another application is video translation:
For example, the English speech in the original video can be changed into Spanish, with the mouth shapes adjusted to match.
Netizens complained
Finally, as usual, Google did not release the model, and all that is available now are more demos and the paper.
As a result, there are plenty of complaints:
The image quality is poor, the mouth shapes are not right, it still looks very robotic, and so on.
So some people did not hold back with their criticism:
Is this really the level Google is at?
It does not quite live up to the name "VLOGGER".
Compared with OpenAI's Sora, the netizen's remark is indeed not unreasonable.
What do you think?
More demos:
https://enriccorona.github.io/vlogger/
Full paper:
https://enriccorona.github.io/vlogger/paper.pdf