Choosing the best embedding model for your data: a comparison of OpenAI and open-source multilingual embeddings

WBOY

Feb 26, 2024, 06:10 PM

OpenAI recently announced its latest generation of embedding models, embedding v3, which it describes as its most performant embedding models, with higher multilingual performance. The models come in two sizes: the smaller text-embedding-3-small and the larger, more powerful text-embedding-3-large.

Little information has been disclosed about how these models are designed and trained, and they are accessible only through a paid API. Meanwhile, many open-source embedding models have become available. How do these open-source models compare with OpenAI's closed-source models?

This article empirically compares the performance of these new models with open-source models. We build a data retrieval workflow in which the key task is to find the most relevant documents in a corpus given a user's query.

Our corpus is the European AI Act, which is currently in its validation phase. It is the world's first legal framework on artificial intelligence, and it has the added advantage of being available in 24 languages. This lets us compare retrieval accuracy across languages, which matters when AI applications have to work across languages and cultures.

We will generate a custom synthetic question/answer dataset from the multilingual text corpus and use it to compare the accuracy of OpenAI's embedding models with that of state-of-the-art open-source embedding models. We share the full code, since the approach can easily be adapted to other data corpora.

Generate a custom Q/A data set

We start by generating a custom question and answer (Q/A) dataset. The advantage is that the dataset cannot have been part of the training data of any embedding model, which avoids the bias that can affect reference benchmarks such as MTEB. In addition, a custom dataset lets us tailor the evaluation to a specific corpus, which can be important for applications such as retrieval-augmented generation (RAG).

We follow the simple process suggested in the Llama Index documentation. First, the corpus is split into chunks. Then, for each chunk, a large language model (LLM) generates a set of synthetic questions whose answers lie in that chunk.

Implementing this strategy with an LLM data framework such as Llama Index is straightforward, as the code below shows.

    from llama_index.readers.web import SimpleWebPageReader
    from llama_index.core.node_parser import SentenceSplitter

    language = "EN"
    url_doc = "https://eur-lex.europa.eu/legal-content/" + language + "/TXT/HTML/?uri=CELEX:52021PC0206"

    # Download the page and convert the HTML to plain text
    documents = SimpleWebPageReader(html_to_text=True).load_data([url_doc])

    # Split the document into chunks of 1000 tokens
    parser = SentenceSplitter(chunk_size=1000)
    nodes = parser.get_nodes_from_documents(documents, show_progress=True)

The corpus is the English version of the EU AI Act, fetched directly from the web at its official URL. We use the draft version from April 2021, because the final version is not yet available in all European languages. With this version, replacing the language code in the URL with any of the other 23 official EU languages retrieves the text in that language (BG for Bulgarian, ES for Spanish, CS for Czech, and so on).
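To make the language substitution concrete, here is a minimal sketch that builds the document URL for several language codes. The helper name build_doc_url and the constants are hypothetical names introduced for illustration; only the URL pattern itself comes from the loading code above.

```python
# Hypothetical helper (not part of the article's code): builds the EUR-Lex URL
# of the April 2021 draft for a given two-letter EU language code.
BASE_URL = "https://eur-lex.europa.eu/legal-content/"
DOC_SUFFIX = "/TXT/HTML/?uri=CELEX:52021PC0206"


def build_doc_url(language: str) -> str:
    """Return the document URL for a language code such as "EN" or "cs"."""
    return BASE_URL + language.upper() + DOC_SUFFIX


# One URL per language; the codes here are examples only.
urls = {lang: build_doc_url(lang) for lang in ["EN", "FR", "CS", "HU"]}

for lang, url in urls.items():
    print(lang, url)
```

Each URL could then be passed to the web page reader in place of url_doc to fetch the corresponding language version.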

A SentenceSplitter object splits the document into chunks of 1000 tokens. For English, this produces about 100 chunks. Each chunk is then passed as context to the following prompt (the default prompt suggested in the Llama Index library):

    prompts = {}
    prompts["EN"] = """\
    Context information is below.

    ---------------------
    {context_str}
    ---------------------

    Given the context information and not prior knowledge, generate only questions based on the below query.

    You are a Teacher/ Professor. Your task is to setup {num_questions_per_chunk} questions for an upcoming quiz/examination. The questions should be diverse in nature across the document. Restrict the questions to the context information provided."
    """

This prompt generates questions about the document chunk. The number of questions to generate per chunk is passed as the parameter num_questions_per_chunk, which we set to 2. The questions are then generated by calling generate_qa_embedding_pairs from the Llama Index library:

    # OpenAI import path for llama-index >= 0.10 (llama-index-llms-openai package)
    from llama_index.llms.openai import OpenAI
    from llama_index.legacy.finetuning import generate_qa_embedding_pairs

    qa_dataset = generate_qa_embedding_pairs(
        llm=OpenAI(model="gpt-3.5-turbo-0125", additional_kwargs={'seed': 42}),
        nodes=nodes,
        qa_generate_prompt_tmpl=prompts[language],
        num_questions_per_chunk=2,
    )

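We rely on OpenAI's GPT-3.5-turbo-0125 for this task. The resulting object, qa_dataset, contains the question and answer (chunk) pairs. As an example of the generated questions, here are the first two (for which the "answer" is the first chunk of the text):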

What are the main objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) according to the explanatory memorandum?

How does the proposal for a Regulation on artificial intelligence aim to address the risks associated with the use of AI while promoting the uptake of AI in the European Union, as outlined in the context information?

OpenAI embedding models

The evaluation function also follows the Llama Index documentation. First, the embeddings of all answers (document chunks) are stored in a VectorStoreIndex for efficient retrieval. The evaluation function then loops over all queries, retrieves the top k most similar documents, and assesses retrieval accuracy in terms of MRR (Mean Reciprocal Rank). The code is as follows:

    import numpy as np
    from tqdm import tqdm
    from llama_index.core import VectorStoreIndex
    from llama_index.core.schema import TextNode

    def evaluate(dataset, embed_model, insert_batch_size=1000, top_k=5):
        # Get corpus, queries, and relevant documents from the qa_dataset object
        corpus = dataset.corpus
        queries = dataset.queries
        relevant_docs = dataset.relevant_docs

        # Create TextNode objects for each document in the corpus and create a
        # VectorStoreIndex to efficiently store and retrieve embeddings
        nodes = [TextNode(id_=id_, text=text) for id_, text in corpus.items()]
        index = VectorStoreIndex(nodes, embed_model=embed_model, insert_batch_size=insert_batch_size)
        retriever = index.as_retriever(similarity_top_k=top_k)

        # Prepare to collect evaluation results
        eval_results = []

        # Iterate over each query in the dataset to evaluate retrieval performance
        for query_id, query in tqdm(queries.items()):
            # Retrieve the top_k most similar documents for the current query
            # and extract the IDs of the retrieved documents
            retrieved_nodes = retriever.retrieve(query)
            retrieved_ids = [node.node.node_id for node in retrieved_nodes]

            # Check if the expected document was among the retrieved documents
            expected_id = relevant_docs[query_id][0]
            is_hit = expected_id in retrieved_ids  # assume 1 relevant doc per query

            # Compute the reciprocal rank for this query and append it to the results
            if is_hit:
                rank = retrieved_ids.index(expected_id) + 1
                mrr = 1 / rank
            else:
                mrr = 0
            eval_results.append(mrr)

        # Return the average reciprocal rank across all queries (the MRR)
        return np.average(eval_results)
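To illustrate the metric on its own, here is a small worked example of Mean Reciprocal Rank on toy data, using only the standard library. The chunk IDs and the helper name reciprocal_rank are made up for illustration; the scoring rule (1/rank on a hit, 0 on a miss, one relevant chunk per query) mirrors the evaluation function above.

```python
def reciprocal_rank(retrieved_ids, expected_id):
    """Return 1/rank of expected_id in retrieved_ids, or 0.0 if it is absent."""
    if expected_id in retrieved_ids:
        return 1.0 / (retrieved_ids.index(expected_id) + 1)
    return 0.0


# Toy data: for each query, the top-3 retrieved chunk IDs and the expected chunk ID.
toy_results = [
    (["c1", "c7", "c3"], "c1"),  # expected chunk at rank 1 -> 1.0
    (["c4", "c2", "c9"], "c2"),  # expected chunk at rank 2 -> 0.5
    (["c5", "c6", "c8"], "c0"),  # expected chunk not retrieved -> 0.0
]

scores = [reciprocal_rank(ids, expected) for ids, expected in toy_results]
mrr = sum(scores) / len(scores)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
print(scores, mrr)
```

An MRR of 1.0 would mean that the expected chunk is always ranked first; a chunk that falls outside the top k contributes nothing to the score.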

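The embedding model is passed to the evaluation function through the embed_model argument. For OpenAI models, this is an OpenAIEmbedding object initialized with the model name and the embedding dimension.

    from llama_index.embeddings.openai import OpenAIEmbedding

    embed_model = OpenAIEmbedding(model=model_spec['model_name'], dimensions=model_spec['dimensions'])

The dimensions parameter shortens the embeddings (that is, it removes some numbers from the end of the sequence) without the embedding losing its concept-representing properties. In its announcement, OpenAI states that on the MTEB benchmark an embedding can be shortened to a size of 256 and still outperform an unshortened text-embedding-ada-002 embedding of size 1536.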
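As a rough illustration of what shortening means, the sketch below truncates a vector and rescales it to unit length. It assumes that shortening amounts to keeping the leading values and re-normalizing with the L2 norm; the function name shorten_embedding and the toy vector are hypothetical, and the article does not describe how the API performs this step internally.

```python
import math


def shorten_embedding(embedding, dimensions):
    """Keep the first `dimensions` values and rescale the result to unit L2 norm."""
    truncated = embedding[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        return truncated  # all-zero vector: nothing to rescale
    return [x / norm for x in truncated]


full = [0.6, 0.0, 0.8, 0.0]          # toy 4-dimensional unit vector
short = shorten_embedding(full, 2)   # keep 2 of the 4 dimensions
print(short)
```

Re-normalizing matters because cosine and dot-product similarity are usually computed on unit-length vectors, and a truncated vector is in general no longer unit length.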

We ran the evaluation function on four different embedding models:

Two versions of text-embedding-3-large: one with the lowest possible dimension (256) and one with the highest possible dimension (3072). They are referred to as OAI-Large-256 and OAI-Large-3072.

OAI-Small: text-embedding-3-small, with a dimension of 1536.

OAI-ada-002: the legacy text-embedding-ada-002 model, with a dimension of 1536.

Each model was evaluated on four languages: English (EN), French (FR), Czech (CS), and Hungarian (HU), covering examples of Germanic, Romance, Slavic, and Uralic languages respectively.

    import pandas as pd
    from llama_index.embeddings.openai import OpenAIEmbedding
    from llama_index.legacy.finetuning import EmbeddingQAFinetuneDataset

    embeddings_model_spec = {}

    embeddings_model_spec['OAI-Large-256'] = {'model_name': 'text-embedding-3-large', 'dimensions': 256}
    embeddings_model_spec['OAI-Large-3072'] = {'model_name': 'text-embedding-3-large', 'dimensions': 3072}
    embeddings_model_spec['OAI-Small'] = {'model_name': 'text-embedding-3-small', 'dimensions': 1536}
    embeddings_model_spec['OAI-ada-002'] = {'model_name': 'text-embedding-ada-002', 'dimensions': None}

    results = []

    languages = ["EN", "FR", "CS", "HU"]

    # Loop through all languages
    for language in languages:

        # Load dataset
        file_name = language + "_dataset.json"
        qa_dataset = EmbeddingQAFinetuneDataset.from_json(file_name)

        # Loop through all models
        for model_name, model_spec in embeddings_model_spec.items():

            # Get model
            embed_model = OpenAIEmbedding(model=model_spec['model_name'], dimensions=model_spec['dimensions'])

            # Assess embedding score (in terms of MRR)
            score = evaluate(qa_dataset, embed_model)

            results.append([language, model_name, score])

    df_results = pd.DataFrame(results, columns=["Language", "Embedding model", "MRR"])

The resulting MRR scores are as follows:

[Figure: MRR by language for the OpenAI embedding models]

The larger the embedding size, the better the performance.

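Open-source embedding models

Open-source research on embeddings is also very active, and the MTEB leaderboard hosted by Hugging Face is regularly updated with the latest embedding models.

For the comparison in this article, we selected four recently published (2024) embedding models. The selection criteria were their average score on the MTEB leaderboard and their ability to handle multilingual data. The main characteristics of the selected models are summarized below.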

e5-mistral-7b-instruct (E5-mistral-7b): this E5 embedding model from Microsoft is initialized from Mistral-7B-v0.1 and fine-tuned on a mixture of multilingual datasets. It performs best on the MTEB leaderboard but is also by far the largest (14GB).

multilingual-e5-large-instruct (ML-E5-large): another E5 model from Microsoft, intended to handle multilingual data better. It is initialized from xlm-roberta-large and trained on a mixture of multilingual datasets. It is much smaller than E5-mistral (about 10 times) and also has a much smaller context size (514).

BGE-M3: designed by the Beijing Academy of Artificial Intelligence, this is their state-of-the-art embedding model for multilingual data, supporting more than 100 working languages. As of February 22, 2024, it had not yet been benchmarked on the MTEB leaderboard.

nomic-embed-text-v1 (Nomic-Embed): designed by Nomic, this model is reported to outperform OpenAI Ada-002 and text-embedding-3-small while being only 0.55GB in size. It is the first fully reproducible and auditable model (open data and open-source training code).

The code used to evaluate these open-source models is similar to the code used for the OpenAI models. The main change lies in the model specifications:

    import time

    import pandas as pd
    import torch
    from transformers import AutoModel, AutoTokenizer
    from llama_index.legacy.finetuning import EmbeddingQAFinetuneDataset

    embeddings_model_spec = {}

    embeddings_model_spec['E5-mistral-7b'] = {
        'model_name': 'intfloat/e5-mistral-7b-instruct', 'max_length': 32768,
        'pooling_type': 'last_token', 'normalize': True, 'batch_size': 1,
        'kwargs': {'load_in_4bit': True, 'bnb_4bit_compute_dtype': torch.float16}}
    embeddings_model_spec['ML-E5-large'] = {
        'model_name': 'intfloat/multilingual-e5-large', 'max_length': 512,
        'pooling_type': 'mean', 'normalize': True, 'batch_size': 1,
        'kwargs': {'device_map': 'cuda', 'torch_dtype': torch.float16}}
    embeddings_model_spec['BGE-M3'] = {
        'model_name': 'BAAI/bge-m3', 'max_length': 8192,
        'pooling_type': 'cls', 'normalize': True, 'batch_size': 1,
        'kwargs': {'device_map': 'cuda', 'torch_dtype': torch.float16}}
    embeddings_model_spec['Nomic-Embed'] = {
        'model_name': 'nomic-ai/nomic-embed-text-v1', 'max_length': 8192,
        'pooling_type': 'mean', 'normalize': True, 'batch_size': 1,
        'kwargs': {'device_map': 'cuda', 'trust_remote_code': True}}

    results = []

    languages = ["EN", "FR", "CS", "HU"]

    # Loop through all models
    for model_name, model_spec in embeddings_model_spec.items():

        print("Processing model : " + str(model_spec))

        # Get model
        tokenizer = AutoTokenizer.from_pretrained(model_spec['model_name'])
        embed_model = AutoModel.from_pretrained(model_spec['model_name'], **model_spec['kwargs'])

        if model_name == "Nomic-Embed":
            embed_model.to('cuda')

        # Loop through all languages
        for language in languages:

            # Load dataset
            file_name = language + "_dataset.json"
            qa_dataset = EmbeddingQAFinetuneDataset.from_json(file_name)

            start_time_assessment = time.time()

            # Assess embedding score (MRR over the top 5 retrieved chunks).
            # This evaluate takes a tokenizer and pooling settings, so it is a
            # variant of the function shown earlier; its body is not reproduced here.
            score = evaluate(qa_dataset, tokenizer, embed_model, model_spec['normalize'],
                             model_spec['max_length'], model_spec['pooling_type'])

            # Get duration of score assessment
            duration_assessment = time.time() - start_time_assessment

            results.append([language, model_name, score, duration_assessment])

    df_results = pd.DataFrame(results, columns=["Language", "Embedding model", "MRR", "Duration"])
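The pooling_type values in the specifications above (mean, cls, last_token) describe how per-token vectors are collapsed into a single embedding. The body of the evaluation function used for these models is not shown in this article, so the following is only a minimal pure-Python sketch of the three strategies. The function name pool and the toy data are hypothetical, and the sketch assumes right padding, meaning padding tokens come last.

```python
def pool(token_vectors, attention_mask, pooling_type):
    """Collapse per-token vectors into one vector; padding positions have mask 0."""
    # Keep only the vectors of real (non-padding) tokens.
    real = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if pooling_type == "cls":
        return token_vectors[0]          # vector of the first token
    if pooling_type == "last_token":
        return real[-1]                  # vector of the last non-padding token
    if pooling_type == "mean":
        dim = len(real[0])
        # Component-wise average over the real tokens.
        return [sum(vec[i] for vec in real) / len(real) for i in range(dim)]
    raise ValueError(f"unknown pooling type: {pooling_type}")


# Toy sequence of three positions with 2-dimensional vectors; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 0]

for kind in ("cls", "last_token", "mean"):
    print(kind, pool(tokens, mask, kind))
```

In practice the same operations would be applied to the model's last hidden states as tensors, usually followed by L2 normalization when normalize is True.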

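The results are as follows:

[Figure: MRR by language for the open-source embedding models]

BGE-M3 performs best, followed by ML-E5-large, E5-mistral-7b, and Nomic-Embed. BGE-M3 has not yet been benchmarked on the MTEB leaderboard, and our results suggest that it could rank higher than the other models. Although BGE-M3 is optimized for multilingual data, it also performs better than the other models on English.

Because open-source models generally have to be run locally, we also recorded the processing time of each embedding model.

[Figure: processing time by embedding model]

E5-mistral-7b is more than 10 times larger than the other models, so it is unsurprising that it is the slowest.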

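Summary

We bring all the results together:

[Figure: summary of MRR results for all evaluated models]

The best performance was obtained with open-source models, with BGE-M3 performing best. That model has the same context length as the OpenAI models (8K) and a size of 2.2GB.

OpenAI's large (3072), small, and ada models perform very similarly. Reducing the embedding size of the large model to 256 degrades performance, and in our tests it then does not outperform ada, contrary to what OpenAI states.

Almost all models (ML-E5-large being the exception) perform best on English. Performance differs noticeably in languages such as Czech and Hungarian, possibly because less training data is available for them.

Should we pay for an OpenAI subscription or host an open-source embedding model?

OpenAI's recent price revision has made its API more affordable: it now costs $0.13 per million tokens. Processing one million queries per month (assuming about 1K tokens per query) would therefore cost about $130. Depending on your actual usage, you can calculate whether hosting an open-source embedding model is more cost-effective.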
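The cost estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the constant and function names are hypothetical, and the figure of 1K tokens per query is the assumption stated in the text.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.13  # API price quoted above
QUERIES_PER_MONTH = 1_000_000
TOKENS_PER_QUERY = 1_000             # assumed average query length


def monthly_cost_usd(queries, tokens_per_query, price_per_million):
    """Return the monthly API cost in USD for the given query volume."""
    total_tokens = queries * tokens_per_query
    return total_tokens / 1_000_000 * price_per_million


cost = monthly_cost_usd(QUERIES_PER_MONTH, TOKENS_PER_QUERY, PRICE_PER_MILLION_TOKENS_USD)
print(f"Estimated monthly cost: ${cost:.2f}")
```

Changing the query volume or the average number of tokens per query shows how quickly the bill scales, which is the figure to weigh against the cost of hosting a model yourself.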

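Cost-effectiveness is of course not the only consideration. Other factors, such as latency, privacy, and control over the data processing workflow, may also need to be taken into account. Open-source models offer the advantage of full control over the data, which improves privacy and allows customization.

As for latency, OpenAI's API has latency issues of its own, which sometimes lead to long response times, so it is not necessarily the fastest option.

In short, the choice between open-source models and proprietary solutions such as OpenAI's has no simple answer. Open-source embeddings are a very good option that combines performance with better control over the data. OpenAI's offering may still appeal to those who prioritize convenience, especially when privacy is a secondary concern.

Code for this article: https://github.com/Yannael/multilingual-embeddings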
