10 lines of code boost large models' math performance by 20%: the "unorthodox" research is being tested by Google too, and the lead author is entirely self-taught

Aug 27, 2024 pm 03:31 PM

With fewer than 10 lines of code, the mathematical ability of large models, as measured on GSM8K, can be improved by 20%!

Several independent scholars have proposed improvements to large model sampling, which has attracted the attention of the open source community.

Currently, this method has achieved results on Mistral-7B, and testing on Llama3-70B is also ongoing.

The method is called min-p sampling, and it aims to balance the coherence and diversity of generated text.

Simply put, it lets the model behave differently in different situations: staying stable on factual questions while being creative in scenarios such as writing.

In the paper, the authors note that the method is already widely used in the open source community.

The authors also revealed that closed-source model providers such as Anthropic and Google have tested, or are testing, min-p.

Google has confirmed the news as well: Logan Kilpatrick, the developer community lead who moved from OpenAI to Google, replied "On it".

Abram Jackson, a researcher at Microsoft Copilot, said after reading the paper that it is the first improvement to token sampling at inference time that he has seen, and that there is still plenty of room for further improvement.

It is worth mentioning that Minh Nhat Nguyen, the lead author of this widely watched study, never formally studied computer science and is entirely self-taught.

Minh and the rest of the team completed the project with the help of Apart Research, an AI safety research organization.

Dynamic adjustment of the sampling threshold

min-p is a dynamic truncation sampling method, the core of which is to scale the minimum probability threshold according to the maximum probability of the token distribution at each step.

The purpose of this is mainly to balance the coherence and diversity of the generated text, especially under higher temperature conditions.

Specifically, min-p introduces a base probability threshold, p_base, which sets the minimum requirement a token must meet to enter the sampling pool.

At each generation step, min-p multiplies p_base by the largest token probability in the current distribution, p_max, to obtain a scaled absolute threshold: p_scaled = p_base × p_max.

Only tokens with probability greater than or equal to p_scaled can enter the sampling pool.
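
The thresholding rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `min_p_pool` and the example probabilities are made up for this sketch, and only the symbols p_base, p_max and p_scaled come from the article.

```python
def min_p_pool(probs, p_base=0.1):
    """Return the indices of tokens that survive min-p filtering.

    probs  : list of token probabilities (assumed to sum to 1)
    p_base : base threshold; 0.1 is only an illustrative default
    """
    p_max = max(probs)            # largest token probability at this step
    p_scaled = p_base * p_max     # scaled absolute threshold
    # Keep every token whose probability is >= p_scaled.
    return [i for i, p in enumerate(probs) if p >= p_scaled]


# Hypothetical 5-token distribution: the threshold is 0.1 * 0.5 = 0.05,
# so the token with probability 0.04 and the one with 0.01 are dropped.
print(min_p_pool([0.5, 0.3, 0.15, 0.04, 0.01]))
```

Because the token with probability p_max always passes its own threshold (as long as p_base is at most 1), the pool can never be empty.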

When the model assigns a very high probability to one token (that is, p_max is large), p_scaled is also high. The sampling pool shrinks sharply: the vast majority of low-probability tokens are filtered out, leaving only a few high-confidence candidates, which keeps the output consistent.

When the model's predicted probabilities are relatively close across tokens (p_max is low), p_scaled drops accordingly. The requirement for entering the sampling pool is relaxed and more medium-probability tokens are admitted, giving the model room to generate more diverse content.
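
To make the two cases concrete, here is a small sketch that applies the same threshold to a peaked and a flat distribution. The distributions and the helper name `describe_pool` are invented for illustration; they are not taken from the paper.

```python
def describe_pool(probs, p_base=0.1):
    """Return (p_scaled, pool) for one step of min-p filtering."""
    p_scaled = p_base * max(probs)
    pool = [i for i, p in enumerate(probs) if p >= p_scaled]
    return p_scaled, pool


# Confident step: one token dominates, so the threshold is high (about 0.09).
peaked = [0.90, 0.05, 0.03, 0.01, 0.01]
# Uncertain step: probabilities are close, so the threshold is low (about 0.025).
flat = [0.25, 0.22, 0.20, 0.18, 0.15]

for name, probs in (("peaked", peaked), ("flat", flat)):
    p_scaled, pool = describe_pool(probs)
    print(f"{name}: p_scaled={p_scaled:.3f}, pool size={len(pool)}")
```

With the same p_base, the peaked distribution keeps only its top token while the flat one keeps all five, which is the adaptive behaviour described above.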

After determining the sampling pool, min-p scales the token probability distribution by temperature.

It divides each token's log-probability by a temperature parameter τ and renormalizes, which yields the temperature-scaled probability distribution.

A τ value greater than 1 makes the probability distribution flatter, increasing the chance of low-probability tokens being selected; when τ is less than 1, it makes the distribution sharper, strengthening the advantage of high-probability tokens.
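
The effect of τ can be checked numerically. The sketch below assumes temperature is applied by dividing log-probabilities by τ and renormalizing, as the text describes; `apply_temperature` is a hypothetical helper name and the three-token distribution is made up.

```python
import math


def apply_temperature(probs, tau):
    """Divide log-probabilities by tau and renormalize (tau must be > 0)."""
    scaled = [math.log(p) / tau for p in probs]
    m = max(scaled)                               # subtract max for stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


probs = [0.7, 0.2, 0.1]
flatter = apply_temperature(probs, 2.0)   # tau > 1: top token loses mass
sharper = apply_temperature(probs, 0.5)   # tau < 1: top token gains mass
print([round(p, 3) for p in flatter])
print([round(p, 3) for p in sharper])
```

At τ = 1 the distribution is unchanged, since dividing by 1 and renormalizing returns the original probabilities.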

Finally, min-p randomly selects the next token from the scaled sampling pool according to the adjusted probability distribution.
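
Putting the steps together, one sampling step might look like the sketch below. It follows the order given in this article (filter first, then temperature, then sample); real inference libraries may order these operations differently, so treat that ordering as an assumption. The function name `min_p_sample`, its defaults and the example logits are all hypothetical.

```python
import math
import random


def min_p_sample(logits, p_base=0.1, temperature=1.0, rng=random):
    """Sample one token index with min-p filtering followed by temperature."""
    # 1. Softmax over the raw logits to get token probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2. Keep tokens with probability >= p_base * p_max.
    p_scaled = p_base * max(probs)
    pool = [i for i, p in enumerate(probs) if p >= p_scaled]

    # 3. Temperature-scale the log-probabilities of the surviving tokens.
    scaled = [math.log(probs[i]) / temperature for i in pool]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]

    # 4. Draw from the pool; random.choices normalizes the weights itself.
    return rng.choices(pool, weights=weights, k=1)[0]


rng = random.Random(0)            # fixed seed so the run is repeatable
logits = [5.0, 4.0, 0.0, -3.0]    # made-up logits for a 4-token vocabulary
print(min_p_sample(logits, p_base=0.1, temperature=3.0, rng=rng))
```

With these logits only the first two tokens pass the threshold, so even at a temperature of 3 the two unlikely tokens can never be drawn.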

Stability and creativity, "I want it all"

How well does min-p work? The authors used Mistral-7B as the base model for testing. Let's look at the results by scenario.

For the reasoning task, the authors used the GPQA dataset. At temperature 1, min-p has a slight edge over the earlier top-p method.

As temperature increases, GPQA scores trend downward overall, but min-p declines noticeably more slowly than top-p.

The decline of min-p does not become pronounced until temperature reaches 3, by which point the top-p score is close to 0.

In other words, min-p preserves the stability that reasoning tasks require better than top-p does.

Math tasks also require stable performance. Here the authors tested on the GSM8K dataset.

The min-p score falls with temperature faster than it does on GPQA, but still more slowly than the top-p score.

The third type of task is creative writing. Here stability matters less, and the model needs to be more creative.

This test used the AlpacaEval dataset, and the experimental data came from an independent evaluator in the open source community.

The results show that with temperature=1.5 and min-p=0.1, min-p performs particularly well and can produce creative writing that is difficult to obtain with top-p.

With these parameters, text generated with min-p achieved a human-judgment preference rate of 58.12%, well above that of other methods under similar settings.

Paper address:

https://arxiv.org/abs/2407.01082

GitHub:

https://github.com/menhguin/minp_paper/

Reference link:

https://x.com/menhguin/status/1826132708508213629
