


Accelerating diffusion models to generate SOTA-level images in as few as 1 step: ByteDance's Hyper-SD is open source
Apr 25, 2024
Recently, diffusion models have made significant progress in image generation, bringing unprecedented opportunities to image and video generation tasks. Despite the impressive results, the multi-step iterative denoising inherent in the inference process of diffusion models incurs high computational cost. A series of diffusion model distillation algorithms have recently emerged to accelerate this inference process. These methods can be roughly divided into two categories: i) trajectory-preserving distillation; ii) trajectory-reconstruction distillation. However, both types of methods are limited either by a capped performance ceiling or by changes in the output domain.
To solve these problems, the ByteDance technical team proposed a trajectory segmented consistency model called Hyper-SD. The open-source release of Hyper-SD has also been recognized by Hugging Face CEO Clem Delangue.
This model is a novel diffusion model distillation framework that combines the advantages of trajectory-preserving distillation and trajectory-reconstruction distillation, compressing the number of denoising steps while maintaining near-lossless performance. Compared with existing diffusion model acceleration algorithms, this method achieves excellent acceleration results. Verified by extensive experiments and user studies, Hyper-SD achieves SOTA-level image generation performance in 1 to 8 generation steps on both the SDXL and SD1.5 architectures.
Project homepage: https://hyper-sd.github.io/
Paper link: https://arxiv.org/abs/2404.13686
Huggingface Link: https://huggingface.co/ByteDance/Hyper-SD
Single-step generated Demo link: https://huggingface.co/spaces/ByteDance/Hyper-SDXL-1Step-T2I
Real-time drawing board Demo link: https://huggingface.co/spaces/ByteDance/Hyper-SD15-Scribble
Method
1. Trajectory segmentation consistency distillation
Consistency Distillation (CD) [24] and Consistency Trajectory Model (CTM) [4] both aim to transform the diffusion model into a consistency model for the entire time step range [0, T] through one-shot distillation. However, these distillation models often fail to achieve optimality due to limitations in model fitting capabilities. Inspired by the soft consistency objective introduced in CTM, we refine the training process by dividing the entire time step range [0, T] into k segments and performing piecewise consistent model distillation step by step.
In the first stage, we set k = 8 and initialize both the student model and its EMA counterpart from the original diffusion model. The starting time step is uniformly randomly sampled from within the current segment, and the corresponding end time step is then sampled at the segment boundary (Equations 1-2 in the paper). The training loss (Equation 4) is a consistency loss between the student's prediction at the starting step and the prediction at the end step, where the target is computed by Equation 3 using the exponential moving average (EMA) of the student model.
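The segment-wise sampling described above can be sketched as follows. This is a minimal illustration, assuming a uniform grid of T discrete steps split evenly into k segments; the function and variable names are hypothetical, not the paper's exact formulation.

```python
import random

def sample_segment_steps(T: int, k: int) -> tuple[int, int, int]:
    """Pick a segment, a start step inside it, and the segment's end step.

    Illustrative sketch of trajectory segmented consistency distillation
    (TSCD) sampling: the consistency target lives at the boundary of the
    segment that contains the sampled start step, not at t = 0.
    """
    seg_len = T // k                      # assume T divisible by k for clarity
    seg = random.randrange(k)             # which of the k segments to train on
    seg_start = seg * seg_len             # segment boundary closer to t = 0
    seg_end = seg_start + seg_len         # segment boundary closer to t = T
    t_start = random.randrange(seg_start + 1, seg_end + 1)  # uniform in segment
    t_end = seg_start                     # consistency target: segment endpoint
    return t_start, t_end, seg

t_start, t_end, seg = sample_segment_steps(T=1000, k=8)
assert t_end <= t_start  # the target step is always closer to t = 0
```

With k = 1 the segment spans the whole range [0, T] and the sketch degenerates to the standard global consistency objective, which matches the paper's observation that k = 1 recovers CTM-style training.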
Subsequently, we restore the model weights from the previous stage and continue training, gradually reducing k through 4, 2, and 1. Note that k = 1 corresponds to the standard CTM training scheme. For the distance metric d, we employ a mixture of adversarial loss and mean squared error (MSE) loss. In experiments, we observed that the MSE loss is more effective when the predicted and target values are close (e.g., for k = 8, 4), while the adversarial loss becomes more effective as the gap between predicted and target values grows (e.g., for k = 2, 1). Therefore, we dynamically increase the weight of the adversarial loss and decrease the weight of the MSE loss over the training stages. In addition, we integrate a noise perturbation mechanism to enhance training stability. Take the two-stage Trajectory Segmented Consistency Distillation (TSCD) process as an example: as shown in the figure below, in the first stage we perform independent consistency distillation on the two halves of the time range, and then perform global consistency trajectory distillation based on the results of the two segment-wise distillations.
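The stage-dependent mixing of the two loss terms can be sketched as below. The specific weight values per stage are assumptions for illustration; the paper only states that the adversarial weight grows and the MSE weight shrinks as k decreases.

```python
def loss_weights(k: int) -> tuple[float, float]:
    """Return (mse_weight, adv_weight) for a TSCD stage.

    Hypothetical schedule: MSE dominates for large k (predictions close to
    targets), the adversarial term dominates for small k (larger gaps).
    """
    schedule = {8: (1.0, 0.1), 4: (0.8, 0.3), 2: (0.4, 0.7), 1: (0.1, 1.0)}
    return schedule[k]

def distill_distance(pred: list[float], target: list[float],
                     adv_score: float, k: int) -> float:
    """Distance metric d as a weighted sum of MSE and an adversarial score."""
    w_mse, w_adv = loss_weights(k)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return w_mse * mse + w_adv * adv_score
```

When predictions match targets exactly and the discriminator is fooled (`adv_score = 0`), the distance is zero at every stage, as a distance metric should be.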
The complete algorithm process is as follows:

2. Human feedback learning
In addition to distillation, we further incorporate feedback learning to improve the performance of the accelerated diffusion model. Specifically, we improve the generation quality of accelerated models by leveraging human aesthetic preferences and feedback from existing visual perception models. For aesthetic feedback, we utilize the LAION aesthetic predictor and the aesthetic preference reward model provided in ImageReward to guide the model to generate more aesthetic images, as follows:
Here the aesthetic reward model comprises the LAION aesthetic predictor and the ImageReward model, c is the text prompt, and a reward threshold is used together with the ReLU function to form a hinge loss. Beyond aesthetic preference feedback, we note that existing visual perception models, which embed rich prior knowledge about images, can also serve as good feedback providers. Empirically, we find that instance segmentation models can guide the model to generate well-structured objects. Specifically, we first diffuse noise onto an image in the latent space; then, similar to ImageReward, we perform iterative denoising down to a specific time step and directly predict the clean image. We then leverage an instance segmentation model to evaluate structure generation by comparing the instance segmentation annotations of real images with the instance segmentation predictions on the denoised images, as follows:
Here the perceptual feedback is provided by an instance segmentation model (such as SOLO). Instance segmentation models can more accurately capture the structural defects of generated images and provide more targeted feedback signals. Notably, other perceptual models are also applicable in addition to instance segmentation models. These perceptual models serve as complementary feedback to subjective aesthetics, focusing more on objective generation quality. Therefore, the overall feedback objective we use to optimize the diffusion model can be defined as:
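The hinge-style aesthetic term can be sketched as below. This is a minimal illustration assuming a scalar reward per image and a threshold `alpha`; the real objective uses the LAION predictor and ImageReward as the reward models, and both names here are placeholders.

```python
def relu(x: float) -> float:
    """ReLU used to clip the hinge at zero."""
    return max(0.0, x)

def aesthetic_feedback_loss(rewards: list[float], alpha: float) -> float:
    """Hinge loss over reward scores: only samples whose reward falls
    below the threshold alpha contribute gradient; samples already judged
    aesthetic enough are left untouched."""
    return sum(relu(alpha - r) for r in rewards) / len(rewards)

# Images scoring above the threshold incur zero loss.
assert aesthetic_feedback_loss([0.9, 0.8], alpha=0.5) == 0.0
# An image below the threshold is penalized in proportion to its shortfall.
assert aesthetic_feedback_loss([0.2], alpha=0.5) > 0.0
```

The same hinge pattern applies to the perceptual term: a segmentation-quality score on the denoised prediction replaces the aesthetic reward, and the two losses are summed into the overall feedback objective.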
3. One-step generation enhancement
Due to the inherent limitations of the consistency loss, one-step generation within the consistency model framework is not ideal. As analyzed in CM, a consistency-distilled model shows excellent accuracy in predicting the trajectory endpoint. Score distillation is therefore a suitable and effective way to further improve the one-step generation of our TSCD model. Specifically, we advance one-step generation through an optimized distribution matching distillation (DMD) technique. DMD enhances the model's output by utilizing two different score functions: one from the teacher model's distribution and one from the fake model. We combine a mean squared error (MSE) loss with the score-based distillation to improve training stability. The aforementioned human feedback learning techniques are also integrated in this process to fine-tune our model to generate images with high fidelity.
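The combined one-step objective can be sketched as follows. All names here are illustrative placeholders, not the released implementation: `score_real` and `score_fake` stand in for the teacher's and fake model's score functions, and `lam` is an assumed stabilization weight.

```python
def one_step_loss(x_gen: list[float], x_teacher: list[float],
                  score_real: list[float], score_fake: list[float],
                  lam: float = 0.25) -> float:
    """DMD-style surrogate plus an MSE anchor, per the paper's description.

    The DMD term moves generated samples along the difference between the
    real (teacher) and fake score functions; the MSE term toward the TSCD
    teacher's one-step prediction stabilizes training.
    """
    n = len(x_gen)
    # Distribution-matching surrogate: gradient direction is (real - fake).
    dmd = sum(x * (sr - sf)
              for x, sr, sf in zip(x_gen, score_real, score_fake)) / n
    # MSE anchor toward the distilled teacher's prediction.
    mse = sum((x - t) ** 2 for x, t in zip(x_gen, x_teacher)) / n
    return dmd + lam * mse
```

When the two score functions agree and the generator matches the teacher's prediction, both terms vanish, which is the fixed point the enhancement drives toward.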
By integrating these strategies, our method not only achieves excellent low-step inference results on both SD1.5 and SDXL (without classifier guidance), but also realizes an ideal global consistency model: a unified low-step inference model that does not require training a separate UNet or LoRA for each specific number of steps.
Experiment
Quantitative comparisons against existing acceleration algorithms on SD1.5 and SDXL show that Hyper-SD significantly outperforms the current state-of-the-art methods.
In addition, Hyper-SD can use a single model for inference at various low step counts, and the quantitative metrics above also reflect the performance of our method under this unified-model inference.
Visualizations of the acceleration effect on SD1.5 and SDXL intuitively demonstrate the superiority of Hyper-SD in accelerating diffusion model inference.
Extensive user studies also show the superiority of Hyper-SD over various existing acceleration algorithms.
The acceleration LoRA trained by Hyper-SD is also well compatible with text-to-image base models of different styles.
At the same time, Hyper-SD's LoRA can also work with existing ControlNets to achieve high-quality controllable image generation at low step counts.
Summary
The paper proposes Hyper-SD, a unified diffusion model acceleration framework that significantly improves the generation ability of diffusion models at low step counts, achieving new SOTA performance on both SDXL and SD1.5. The method uses trajectory segmented consistency distillation to enhance trajectory preservation during distillation, achieving generation quality close to the original model. It then unlocks the model's potential at extremely low step counts by further leveraging human feedback learning and variational score distillation, yielding a more optimized and efficient generator. The paper also open-sources the LoRA plug-ins for 1-to-8-step inference on SDXL and SD1.5, as well as a dedicated one-step SDXL model, aiming to further promote the development of the generative AI community.
