Significantly Surpassing SFT: The Secret Behind o1/DeepSeek-R1 Also Works for Multimodal Large Models

Mar 12, 2025 pm 01:03 PM
Researchers from Shanghai Jiao Tong University, Shanghai AI Lab and the Chinese University of Hong Kong have released Visual-RFT (Visual Reinforcement Fine-Tuning), an open-source project that significantly improves the performance of large vision-language models (LVLMs) using only a small amount of data. Visual-RFT combines DeepSeek-R1's rule-based reinforcement learning approach with OpenAI's reinforcement fine-tuning (RFT) paradigm, extending it from the text domain to the visual domain.
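
The article does not name the reinforcement learning algorithm. DeepSeek-R1 itself was trained with GRPO (Group Relative Policy Optimization), whose key step turns rule-based rewards into advantages by comparing several responses sampled for the same prompt. The sketch below shows only that normalization step, assuming a GRPO-style setup; the function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` term are illustrative choices, and the policy-gradient update, clipping and KL penalty are omitted.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each sampled response's reward against its own group.

    Assumption: a GRPO-style setup in which several responses are sampled for
    the same prompt (e.g. one image plus question) and each is scored by a
    rule-based reward. The advantage is (reward - group mean) / group std,
    so no learned value network (critic) is needed. `rewards` must be non-empty.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers for one prompt: two judged correct (1.0), two wrong (0.0).
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

With rewards `[1.0, 0.0, 1.0, 0.0]` the advantages come out at roughly `[1, -1, 1, -1]`, so correct responses are reinforced and incorrect ones discouraged. If every sample in a group receives the same reward, all advantages are zero and that group contributes no learning signal.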

By designing task-specific, rule-based rewards for tasks such as fine-grained visual classification and object detection, Visual-RFT lifts the restriction of the DeepSeek-R1 approach to text, mathematical reasoning and similar domains, offering a new way to train LVLMs.

Advantages of Visual-RFT:

Compared with traditional supervised fine-tuning (SFT) on visual instruction data, Visual-RFT offers the following notable advantages:

  • Few-shot learning: effective fine-tuning with only 10 to 1,000 training samples.
  • Stronger generalization: better performance than SFT when data is limited.

The researchers evaluated Visual-RFT on multiple visual perception tasks (detection, classification, grounding, etc.). The results show significant performance gains, and the learned capabilities transferred readily even in open-vocabulary and few-shot settings.

The researchers designed verifiable rewards for each task: IoU-based rewards for detection and grounding, and correctness-based rewards for classification.
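
The reward rules described above can be sketched in Python. This is a minimal illustration, not the project's code: the box format `(x1, y1, x2, y2)`, the greedy best-match averaging in the hypothetical `iou_reward`, and the case-insensitive label comparison in the hypothetical `classification_reward` are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_reward(pred: List[Box], gt: List[Box]) -> float:
    """Hypothetical detection/grounding reward.

    For each ground-truth box, take the best IoU among the predicted boxes,
    sum those scores, and divide by the larger of the two box counts so that
    surplus predictions lower the reward. Greedy simplification, not a
    one-to-one matching.
    """
    if not gt and not pred:
        return 1.0  # nothing to find, nothing predicted
    if not gt or not pred:
        return 0.0
    matched = sum(max(iou(p, g) for p in pred) for g in gt)
    return matched / max(len(gt), len(pred))


def classification_reward(pred_label: str, gt_label: str) -> float:
    """Rule-based correctness reward: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if pred_label.strip().lower() == gt_label.strip().lower() else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
    print(iou_reward([(0, 0, 2, 2), (5, 5, 6, 6)], [(0, 0, 2, 2)]))  # 0.5
    print(classification_reward(" Slime ", "slime"))  # 1.0
```

Dividing by the larger box count means an extra, unmatched predicted box halves the reward in the example above. One predicted box can still be credited against several ground-truth boxes; a stricter one-to-one matching would avoid that.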

On the reasoning grounding task, Visual-RFT shows strong visual reasoning, for example accurately identifying the waterproof goggles an athlete needs to wear in a picture.

Experimental results:

Experiments with the Qwen2-VL 2B and 7B models show that Visual-RFT outperforms SFT on open-vocabulary object detection, few-shot detection, fine-grained classification and reasoning grounding. Even detecting a specific anime character (such as Slime) takes only a small amount of data with Visual-RFT.

Open source information:

The Visual-RFT project is open source and includes training and evaluation code as well as data.

Project address: http://ipnx.cn/link/ec56522bc9c2e15be17d11962eeec453
