


10 lines of code improve the math ability of large models by 20%. This "unorthodox" research has also been tested by Google, and its lead author is entirely self-taught.
Aug 27, 2024, 03:31 PM
With fewer than 10 lines of code, the mathematical performance of large models (on GSM8K) can be improved by 20%!
Several independent researchers have proposed an improved sampling method for large models, which has drawn the attention of the open source community.
Currently, this method has achieved results on Mistral-7B, and testing on Llama3-70B is also ongoing.
The method is called min-p sampling, and it aims to balance the coherence and diversity of the generated text.
Simply put, it lets the model behave differently in different situations, for example staying stable on factual questions while being creative in scenarios such as writing.
In the paper, the authors note that the method is already widely used in the open source community.
They also revealed that closed-source model providers such as Anthropic and Google have tested, or are testing, min-p.
Google has confirmed this as well: Logan Kilpatrick, the developer community lead who moved from OpenAI to Google, replied "On it".
Abram Jackson, a researcher at Microsoft Copilot, said after reading it that this is the first improvement he has seen regarding token sampling in the inference process, and there is still a lot of room for improvement in the future.
It is worth mentioning that the lead author of this widely watched study, Minh Nhat Nguyen, has never formally studied computer science and is entirely self-taught.
Minh and the other members of the team completed the project with the help of an AI safety research organization called Apart Research.
Dynamic adjustment of the sampling threshold
min-p is a dynamic truncation sampling method, the core of which is to scale the minimum probability threshold according to the maximum probability of the token distribution at each step.
The purpose of this is mainly to balance the coherence and diversity of the generated text, especially under higher temperature conditions.
Specifically, min-p introduces a base probability threshold p_base, which represents the minimum probability required to enter the sampling pool.
At each generation step, min-p multiplies p_base by the largest token probability p_max in the current distribution to obtain a scaled absolute threshold p_scaled.
Only tokens with probability greater than or equal to p_scaled can enter the sampling pool.
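As a rough illustration of the thresholding rule just described, the following Python sketch computes p_scaled and keeps only the tokens that clear it. The function name min_p_pool and the example probabilities are invented for illustration and are not taken from the paper's code.

```python
def min_p_pool(probs, p_base=0.1):
    """Return the indices of tokens whose probability is >= p_base * p_max."""
    p_max = max(probs)            # largest token probability at this step
    p_scaled = p_base * p_max     # scaled absolute threshold
    return [i for i, p in enumerate(probs) if p >= p_scaled]


# Hypothetical next-token distribution over five tokens.
probs = [0.5, 0.3, 0.15, 0.04, 0.01]
print(min_p_pool(probs, p_base=0.1))  # threshold is 0.05, so this prints [0, 1, 2]
```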
When the model assigns a very high probability to one token (that is, p_max is large), p_scaled is also high, so the sampling pool shrinks sharply: the vast majority of low-probability tokens are filtered out and only a few high-confidence choices remain, which keeps the output coherent;
When the model's predicted probabilities for the tokens are relatively close (p_max is lower), p_scaled drops accordingly, relaxing the entry requirement for the sampling pool and admitting more medium-probability tokens, which gives the model more room to generate diverse content.
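To make the two cases concrete, here is a small sketch with two invented distributions, one sharply peaked and one nearly flat. The value p_base = 0.1 and the helper name pool_size are assumptions chosen for the example.

```python
def pool_size(probs, p_base=0.1):
    """Count how many tokens survive the min-p threshold p_base * max(probs)."""
    threshold = p_base * max(probs)
    return sum(1 for p in probs if p >= threshold)


# Confident step: one token dominates, so the threshold is high (0.09).
peaked = [0.90, 0.05, 0.03, 0.01, 0.01]
# Uncertain step: probabilities are close, so the threshold is low (0.025).
flat = [0.25, 0.22, 0.20, 0.18, 0.13, 0.02]

print(pool_size(peaked))  # 1
print(pool_size(flat))    # 5
```

The same p_base yields a pool of one token in the confident case and five in the uncertain case, which is the dynamic behavior the method relies on.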
After determining the sampling pool, min-p rescales the token probability distribution according to the temperature.
It divides each token's log-probability by a temperature parameter τ and then renormalizes, which yields the temperature-scaled probability distribution.
A τ value greater than 1 will make the probability distribution flatter, increasing the chance of low-probability tokens being selected; when τ is less than 1, it will make the distribution sharper, strengthening the advantages of high-probability tokens.
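The effect of τ can be checked with a few lines of Python. This sketch assumes every probability is strictly positive so that the logarithm is defined; apply_temperature is a name chosen here, not one from the paper.

```python
import math


def apply_temperature(probs, tau):
    """Divide log-probabilities by tau, then renormalize with a softmax."""
    scaled = [math.log(p) / tau for p in probs]   # assumes every p > 0
    m = max(scaled)                               # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


probs = [0.7, 0.2, 0.1]
flatter = apply_temperature(probs, tau=2.0)   # top probability falls below 0.7
sharper = apply_temperature(probs, tau=0.5)   # top probability rises above 0.7
print(flatter)
print(sharper)
```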
Finally, min-p randomly selects the next token from the scaled sampling pool according to the adjusted probability distribution.
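Putting the steps together in the order this article describes them (threshold first, then temperature, then a random draw) might look like the sketch below. The function min_p_sample, its defaults, and the example distribution are all hypothetical. Inference libraries may apply temperature at a different point in the pipeline, so treat this as an illustration rather than the authors' implementation.

```python
import math
import random


def min_p_sample(probs, p_base=0.1, tau=1.0, rng=random):
    """Draw one token index using min-p truncation followed by temperature scaling."""
    # Step 1: keep tokens whose probability is at least p_base * p_max.
    p_scaled = p_base * max(probs)
    pool = [i for i, p in enumerate(probs) if p >= p_scaled]
    # Step 2: temperature-scale the log-probabilities of the surviving tokens.
    scaled = [math.log(probs[i]) / tau for i in pool]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]   # unnormalized; choices() normalizes
    # Step 3: sample from the pool in proportion to the adjusted weights.
    return rng.choices(pool, weights=weights, k=1)[0]


# Even at a high temperature, only indices 0, 1 and 2 can be drawn here,
# because tokens 3 and 4 fall below the threshold 0.1 * 0.5 = 0.05.
rng = random.Random(0)
probs = [0.5, 0.3, 0.15, 0.04, 0.01]
print(sorted({min_p_sample(probs, p_base=0.1, tau=3.0, rng=rng) for _ in range(200)}))
```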
Stability and creativity, "I want it all"
How well does min-p work? The authors used Mistral-7B as the base model for testing. Let's look at the results by scenario.
For the reasoning task, the authors used the GPQA dataset. At a temperature of 1, min-p has a slight advantage over the earlier top-p method.
As the temperature increases, GPQA scores trend downward overall, but the score for min-p declines significantly more slowly than that for top-p.
The decline for min-p does not become pronounced until the temperature reaches 3, by which point the top-p score is close to 0.
In other words, compared with top-p, min-p better maintains the stability that reasoning tasks require.
Mathematical tasks also require stable performance. Here the authors used the GSM8K dataset for testing.
On GSM8K, the min-p score falls with temperature faster than it does on GPQA, but still more slowly than the top-p score.
The third type of task is creative writing. Here the requirement for stability is lower, but the model needs to be more creative.
This test used the AlpacaEval dataset, and the experimental data came from an independent evaluator in the open source community.
The results show that with temperature=1.5 and min-p=0.1, min-p performs particularly well and can generate creative writing that is difficult to produce with top-p.
With these parameters, text generated with min-p achieved a human preference rate of 58.12%, much higher than that of other methods under similar settings.
Paper address:
https://arxiv.org/abs/2407.01082
GitHub:
https://github.com/menhguin/minp_paper/
Reference link:
https://x.com/menhguin/status/1826132708508213629