日韩亚洲欧美中文高清,性夜影院爽黄a爽在线看,国产成人精品亚洲日本在线观看

ホームページ

バックエンド開発

Python チュートリアル

【Python】Bilibiliビデオコメントとバレットチャットを処理?分析するスクリプト

Barbara Streisand

Jan 05, 2025 pm 07:54 PM

[Python] A Script for Processing and Analysing Bilibili Video Comments and Bullet Chats

免責(zé)事項: 個人的な學(xué)習(xí)および研究目的のみ。それ以外の用途での使用は固く禁止されています。

導(dǎo)入

このスクリプトは人文科學(xué)の學(xué)術(shù)目的、特にネットワークプラットフォームの談話分析の研究のために開発されました。 Bilibili の弾丸チャットとコメントを包括的に調(diào)査できます。サブカルチャーと社會問題に関連する膨大な內(nèi)容に焦點が當てられており (レビューされた資料に基づく)、徹底的な調(diào)査、分析、補足、要約が必要です。

內(nèi)容が膨大であるため、結(jié)果はリンクに表示されます。

サブカルチャーの観點からのコメントと弾丸チャットの研究:
https://nbviewer.org/github/Excalibra/scripts/blob/main/d-ipynb/サブカルチャーの視點からのレビューとブレットスクリーンリサーチ.ipynb

計畫では、「サブカルチャー」セクションと「社會問題」セクションの調(diào)査を完了してから公開する予定でした。ただし、この分野の研究者や學(xué)生のニーズを考慮して、現(xiàn)在は共有されています。

特徴と原理

スクリプトの機能:

ビデオのタイトル、著者、公開日、再生回數(shù)、お気に入り、共有、累積箇條書きチャット、コメント數(shù)、ビデオの説明、カテゴリ、ビデオのリンク、カバー畫像のリンクなどのデータを収集します。
感情スコア、品詞分析、タイムスタンプ、ユーザー ID を含む 100 件の箇條書きチャットを抽出します。
上位 20 件のコメントと、いいね!、感情スコア、トピックへの返信、メンバーシップ ID、名前、およびコメントのタイムスタンプを取得します。

強化された機能:

バレットチャット: ユーザー名、誕生日、登録日、フォロワー數(shù)、フォロー數(shù) (Cookie を使用)。
コメント: コメント投稿者の IP ロケーションを表示します (Web インターフェース経由)。
感情中央値、単語頻度統(tǒng)計、ワードクラウド、棒グラフを含むデータを Excel ファイルに出力します。

動作原理:

API を使用して JSON 情報を取得し、それを Excel ファイルに処理し、SnowNLP、ThuNLP、Jieba などの言語モデルを使用してテキストセグメンテーション、ストップワードフィルタリング、品詞分析、単語頻度統(tǒng)計を行います。 Matplotlib はグラフの生成に使用されます。

すぐに始めましょう

(Windows ユーザーは pip と python を使用できます。Mac ユーザーはデフォルトで pip3 と python3 を使用する必要があります。)

スクリプトソースコード: GitHub リポジトリ。

前提條件ライブラリ:
必要なライブラリをインストールします:

pip3 install --no-cache-dir -r https://ghproxy.com/https://github.com/Excalibra/scripts/blob/main/d-txt/requirements.txt

次に、スクリプトを?qū)g行します (オンライン):

python3 -c "$(curl -fsSL https://ghproxy.com/https://github.com/Excalibra/scripts/blob/main/d-python/get_bv_baseinfo.py)"

import json
import time
import requests
import os
from datetime import datetime
import re
from bs4 import BeautifulSoup
from openpyxl import Workbook
from openpyxl.styles import Alignment, Font
from snownlp import SnowNLP
import statistics
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import platform
import thulac
import matplotlib.font_manager as fm
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By


'''''''''

# Reference Links

## General

Regex: https://regex101.com/
Zhihu - Two ways to obtain Bilibili video bullet comments using Python: https://zhuanlan.zhihu.com/p/609154366
Juejin - Parsing Bilibili video bullet comments: https://juejin.cn/post/7137928570080329741
CSDN - Bilibili historical bullet comment crawler: https://blog.csdn.net/sinat_18665801/article/details/104519838
CSDN - How to write a Bilibili bullet comment crawler: https://blog.csdn.net/bigbigsman/article/details/78639053?utm_source=app
Bilibili - Bilibili bullet comment notes: https://www.bilibili.com/read/cv5187469/
Bilibili third-party API: https://www.bookstack.cn/read/BilibiliAPIDocs/README.md

## Reverse Lookup by UID

https://github.com/esterTion/BiliBili_crc2mid
https://github.com/cwuom/GetDanmuSender/blob/main/main.py
https://github.com/Aruelius/crc32-crack

## User Basic Information

https://api.bilibili.com/x/space/acc/info?mid=298220126
https://github.com/ria-klee/bilibili-uid
https://github.com/SocialSisterYi/bilibili-API-collect/blob/master/docs/user/space.md

## Comments

https://www.bilibili.com/read/cv10120255/
https://github.com/SocialSisterYi/bilibili-API-collect/blob/master/docs/comment/readme.md

## JSON

https://json-schema.apifox.cn
https://bbs.huaweicloud.com/blogs/279515
https://www.cnblogs.com/mashukui/p/16972826.html

## Cookie

https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Cookies

## Unpacking

https://www.cnblogs.com/will-wu/p/13251545.html
https://www.w3schools.com/python/python_tuples.asp

'''''''''''

class BilibiliAPI:
    @staticmethod
    # Parse video link basic information JSON and return it in JSON format
    def get_bv_json(video_url):
        video_id = re.findall(r'BV\w+', video_url)[0]
        api_url = f'https://api.bilibili.com/x/web-interface/view?bvid={video_id}'
        bv_json = requests.get(api_url).json()
        return bv_json

    @staticmethod
    # Parse video link bullet comments XML using the 'cid' field in JSON
    def get_danmu_xml(bv_json):
        cid = bv_json['data']["cid"]
        api_url = f'https://comment.bilibili.com/{cid}.xml'
        danmu_xml = api_url
        return danmu_xml

    @staticmethod
    # Parse video link comments JSON using the 'aid' field in JSON
    def get_comment_json(bv_json):
        aid = bv_json['data']["aid"]
        api_url = f'https://api.bilibili.com/x/v2/reply/main?next=1&type=1&oid={aid}'
        comment_json = requests.get(api_url).json()
        return comment_json

    @staticmethod
    # Enhanced parsing of video link comments JSON using the 'aid' field in JSON
    def get_comment_json_to_webui(bv_json):
        aid = bv_json['data']["aid"]
        api_url = f'https://api.bilibili.com/x/v2/reply/main?next=1&type=1&oid={aid}'

        # Determine the current operating system type
        if platform.system() == "Windows":
            # Windows platform
            driver = webdriver.Chrome()
        else:
            # Other platforms
            driver = webdriver.Chrome(ChromeDriverManager().install())

        # Provide login time
        print("Provide 45 seconds for Bilibili login")
        time.sleep(45)

        # Open the link
        driver.get(api_url)

        # Provide view effect time
        print("Provide 15 seconds to check the effects")
        time.sleep(15)

        # Find the <pre class="brush:php;toolbar:false"> element
        pre_element = driver.find_element(By.TAG_NAME, 'pre')

        # Get the text content of the element
        text_content = pre_element.text

        # Close WebDriver
        driver.quit()

        return text_content

    @staticmethod
    # Traverse user information and return basic parameters, preparing for XLSX write-in
    def get_user_card(mid, cookies):
            api_url = f'https://account.bilibili.com/api/member/getCardByMid?mid={mid}'
            try:
                response = requests.get(api_url, cookies=cookies)
                user_card_json = response.json()
            except json.JSONDecodeError:
                return {"error": "Failed to parse JSON. Ensure a good network environment. Too many API calls might trigger restrictions; try again later."}

            if 'message' in user_card_json:
                message = user_card_json['message']
                if 'request blocked' in message or 'frequent requests' in message:
                    return {"warning": "Ensure a good network environment. Too many API calls might trigger restrictions; try again later."}

            return user_card_json

class CRC32Checker:
    ''''''''''
    # CRC32 cracking
    # Source: https://github.com/Aruelius/crc32-crack
    # Author: Aruelius
    # Note: This section has been slightly adjusted and encapsulated as a class for easier use.
    '''''''''

    CRCPOLYNOMIAL = 0xEDB88320
    crctable = [0 for x in range(256)]

    def __init__(self):
        self.create_table()

    def create_table(self):
        # Create a CRC table for quick CRC value computation
        for i in range(256):
            crcreg = i
            for _ in range(8):
                if (crcreg & 1) != 0:
                    crcreg = self.CRCPOLYNOMIAL ^ (crcreg >> 1)
                else:
                    crcreg = crcreg >> 1
            self.crctable[i] = crcreg

    def crc32(self, string):
        # Compute the CRC32 value for the given string
        crcstart = 0xFFFFFFFF
        for i in range(len(str(string))):
            index = (crcstart ^ ord(str(string)[i])) & 255
            crcstart = (crcstart >> 8) ^ self.crctable[index]
        return crcstart

    def crc32_last_index(self, string):
        # Compute the last character CRC table index for a given string
        crcstart = 0xFFFFFFFF
        for i in range(len(str(string))):
            index = (crcstart ^ ord(str(string)[i])) & 255
            crcstart = (crcstart >> 8) ^ self.crctable[index]
        return index

    def get_crc_index(self, t):
        # Find the index in the CRC table corresponding to the highest byte value
        for i in range(256):
            if self.crctable[i] >> 24 == t:
                return i
        return -1

    def deep_check(self, i, index):
        # Deep check based on index and previous CRC32 values to verify the assumption
        string = ""
        tc = 0x00
        hashcode = self.crc32(i)
        tc = hashcode & 0xff ^ index[2]
        if not (tc <= 57 and tc >= 48):
            return [0]
        string += str(tc - 48)
        hashcode = self.crctable[index[2]] ^ (hashcode >> 8)
        tc = hashcode & 0xff ^ index[1]
        if not (tc <= 57 and tc >= 48):
            return [0]
        string += str(tc - 48)
        hashcode = self.crctable[index[1]] ^ (hashcode >> 8)
        tc = hashcode & 0xff ^ index[0]
        if not (tc <= 57 and tc >= 48):
            return [0]
        string += str(tc - 48)
        hashcode = self.crctable[index[0]] ^ (hashcode >> 8)
        return [1, string]

    def main(self, string):
        # Main function to compute and validate CRC32 for the given string
        index = [0 for x in range(4)]
        i = 0
        ht = int(f"0x{string}", 16) ^ 0xffffffff
        for i in range(3, -1, -1):
            index[3-i] = self.get_crc_index(ht >> (i*8))
            snum = self.crctable[index[3-i]]
            ht ^= snum >> ((3-i)*8)
        for i in range(100000000):
            lastindex = self.crc32_last_index(i)
            if lastindex == index[3]:
                deepCheckData = self.deep_check(i, index)
                if deepCheckData[0]:
                    break
        if i == 100000000:
            return -1
        return f"{i}{deepCheckData[1]}"
class Tools:
    @staticmethod
    # Get save path and format
    def get_save():
        return os.path.join(os.path.join(os.path.expanduser("~"), "Desktop"),
                            "Bilibili_Video_Analysis_{}.xlsx".format(datetime.now().strftime('%Y-%m-%d')))

    @staticmethod
    # Format timestamp
    def format_timestamp(timestamp):
        dt_object = datetime.fromtimestamp(timestamp)
        formatted_time = dt_object.strftime("%Y-%m-%d %H:%M:%S")
        return formatted_time

    @staticmethod
    # Calculate sentiment score
    def calculate_sentiment_score(text):
        s = SnowNLP(text)
        sentiment_score = s.sentiments
        return sentiment_score

    @staticmethod
    # Generate a word cloud
    def get_word_cloud(sheet_name: str, workbook: Workbook):
        sheet = workbook[sheet_name]

        # Read frequency data
        words = []
        frequencies = []
        for row in sheet.iter_rows(min_row=2, values_only=True):
            words.append(row[0])
            frequencies.append(row[1])

        system = platform.system()

        if system == 'Darwin':  # macOS
            font_path = '/System/Library/Fonts/STHeiti Light.ttc'
        elif system == 'Windows':
            font_path = 'C:/Windows/Fonts/simhei.ttf'
        else:  # Other OS
            font_path = 'simhei.ttf'

        wordcloud = WordCloud(background_color='white', max_words=100, font_path=font_path)
        word_frequency = dict(zip(words, frequencies))
        wordcloud.generate_from_frequencies(word_frequency)

        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis('off')
        plt.show()

    @staticmethod
    # Generate horizontal statistics chart
    def get_word_chart(sheet_name: str, workbook):
        sheet = workbook[sheet_name]

        words = []
        frequencies = []
        for row in sheet.iter_rows(min_row=2, values_only=True):
            words.append(row[0])
            frequencies.append(row[1])

        system = platform.system()

        if system == 'Darwin':  
            font_path = '/System/Library/Fonts/STHeiti Light.ttc'
        elif system == 'Windows':
            font_path = 'C:/Windows/Fonts/simhei.ttf'
        else:  
            font_path = 'simhei.ttf'

        custom_font = fm.FontProperties(fname=font_path)

        fig, ax = plt.subplots()
        ax.barh(words, frequencies)
        ax.set_xlabel("Frequency", fontproperties=custom_font)
        ax.set_ylabel("Words", fontproperties=custom_font)

        plt.yticks(fontproperties=custom_font)

        plt.show()

    @staticmethod
    def get_user_info_by_card(user_card_json):
        info = {
            'name': "N/A", 'birthday': "N/A", 'regtime': "N/A",
            'fans': "N/A", 'friend': "N/A"
        }

        try:
            info['name'] = user_card_json['card']['name']
            info['birthday'] = user_card_json['card']['birthday']
            info['regtime'] = Tools.format_timestamp(int(user_card_json['card']['regtime']))
            info['fans'] = user_card_json['card']['fans']
            info['friend'] = user_card_json['card']['friend']
        except KeyError:
            pass

        return tuple(info.values())

class BilibiliExcel:
    @staticmethod
    # Write video basic information
    def write_base_info(workbook, bv_json):
        sheet = workbook.create_sheet(title="Video Info")
        headers = ["Video Title", "Author", "Publish Time", "Views", "Favorites", "Shares", "Total Bullet Comments",
                   "Comments Count", "Video Description", "Category", "Video Link", "Thumbnail Link"]
        sheet.append(headers)

        data = [bv_json["data"]["title"],
                bv_json["data"]["owner"]["name"],
                Tools.format_timestamp(bv_json["data"]["pubdate"]),
                bv_json["data"]["stat"]["view"],
                bv_json["data"]["stat"]["favorite"],
                bv_json["data"]["stat"]["share"],
                bv_json["data"]["stat"]["danmaku"],
                bv_json["data"]["stat"]["reply"],
                bv_json["data"]["desc"],
                bv_json["data"]["tname"],
                video_url,
                bv_json["data"]["pic"]]

        sheet.append(data)

    @staticmethod
    def save_workbook(workbook):
        workbook.save(Tools.get_save())

class PrintInfo:
    # Print basic information
    @staticmethod
    def base_message():
        if 'Windows' == platform.system():
            os.system('cls')
        else:
            os.system('clear')

        text = '''
        ************************************

        Bilibili Video Analysis v2023.6.26
        Author: Github.com/hoochanlon
        Project URL: https://github.com/hoochanlon/scripts

        Features:
        1. Analyze and visualize Bilibili video data.

        Disclaimer: For research and learning purposes only.

        ************************************
        '''
        print(text.center(50, ' '))

if __name__ == '__main__':
    PrintInfo.base_message()

    while True:
        video_url = input("Paste the Bilibili video link: ")
        if re.match(r'.*BV\w+', video_url):
            break
        else:
            print("Invalid link format. Please re-enter.")

    bv_json = BilibiliAPI.get_bv_json(video_url)
    workbook = Workbook()
    workbook.remove(workbook.active)
    BilibiliExcel.write_base_info(workbook, bv_json)
    BilibiliExcel.save_workbook(workbook)

使用上の注意:

Cookie の入力を簡素化するには、key=value; を使用できます。「a=a;」などの形式にすると、不要な手順をスキップできます。
IP の場所を表示するには、Web ドライバー経由で Bilibili アカウントにログインする必要があります。

以上が【Python】Bilibiliビデオコメントとバレットチャットを処理?分析するスクリプトの詳細內(nèi)容です。詳細については、PHP 中國語 Web サイトの他の関連記事を參照してください。

このウェブサイトの聲明

この記事の內(nèi)容はネチズンが自主的に寄稿したものであり、著作権は原著者に帰屬します。このサイトは、それに相當する法的責(zé)任を負いません。盜作または侵害の疑いのあるコンテンツを見つけた場合は、admin@php.cn までご連絡(luò)ください。

ホットAIツール

Video Face Swap

完全無料の AI 顔交換ツールを使用して、あらゆるビデオの顔を簡単に交換できます。

ホットツール

ホットトピック

Laravel チュートリアル

1597

PHP チュートリアル

1488

Related knowledge

Pythonクラスの多型 Jul 05, 2025 am 02:58 AM

Pythonオブジェクト指向プログラミングのコアコンセプトであるPythonは、「1つのインターフェイス、複數(shù)の実裝」を指し、異なるタイプのオブジェクトの統(tǒng)一処理を可能にします。 1。多型は、メソッドの書き換えを通じて実裝されます。サブクラスは、親クラスの方法を再定義できます。たとえば、Animal ClassのSOCK（）方法は、犬と貓のサブクラスに異なる実裝を持っています。 2.多型の実用的な用途には、グラフィカルドローイングプログラムでdraw（）メソッドを均一に呼び出すなど、コード構(gòu)造を簡素化し、スケーラビリティを向上させる、ゲーム開発における異なる文字の共通の動作の処理などが含まれます。 3. Pythonの実裝多型を満たす必要があります：親クラスはメソッドを定義し、子クラスはメソッドを上書きしますが、同じ親クラスの継承は必要ありません。オブジェクトが同じ方法を?qū)g裝する限り、これは「アヒル型」と呼ばれます。 4.注意すべきことには、メンテナンスが含まれます

Pythonジェネレーターと反復(fù)器を説明します。 Jul 05, 2025 am 02:55 AM

イテレータは、__iter __（）および__next __（）メソッドを?qū)g裝するオブジェクトです。ジェネレーターは、単純化されたバージョンのイテレーターです。これは、収量キーワードを介してこれらのメソッドを自動的に実裝しています。 1. Iteratorは、次の（）を呼び出すたびに要素を返し、要素がなくなると停止例外をスローします。 2。ジェネレーターは関數(shù)定義を使用して、オンデマンドでデータを生成し、メモリを保存し、無限シーケンスをサポートします。 3。既存のセットを処理するときに反復(fù)器を使用すると、大きなファイルを読み取るときに行ごとにロードするなど、ビッグデータや怠zyな評価を動的に生成するときにジェネレーターを使用します。注：リストなどの反復(fù)オブジェクトは反復(fù)因子ではありません。イテレーターがその端に達した後、それらは再作成する必要があり、発電機はそれを一度しか通過できません。

PythonでAPI認証を処理する方法 Jul 13, 2025 am 02:22 AM

API認証を扱うための鍵は、認証方法を正しく理解して使用することです。 1。Apikeyは、通常、リクエストヘッダーまたはURLパラメーターに配置されている最も単純な認証方法です。 2。BasicAuthは、內(nèi)部システムに適したBase64エンコード送信にユーザー名とパスワードを使用します。 3。OAUTH2は、最初にclient_idとclient_secretを介してトークンを取得し、次にリクエストヘッダーにbearertokenを持ち込む必要があります。 4。トークンの有効期限に対処するために、トークン管理クラスをカプセル化し、トークンを自動的に更新できます。要するに、文書に従って適切な方法を選択し、重要な情報を安全に保存することが重要です。

一度に2つのリストを繰り返す方法Python Jul 09, 2025 am 01:13 AM

Pythonで2つのリストを同時にトラバースする一般的な方法は、Zip（）関數(shù)を使用することです。これは、複數(shù)のリストを順番にペアリングし、最短になります。リストの長さが一貫していない場合は、itertools.zip_longest（）を使用して最長になり、欠損値を入力できます。 enumerate（）と組み合わせて、同時にインデックスを取得できます。 1.Zip（）は簡潔で実用的で、ペアのデータ反復(fù)に適しています。 2.zip_longest（）は、一貫性のない長さを扱うときにデフォルト値を入力できます。 3. Enumerate（Zip（））は、トラバーサル中にインデックスを取得し、さまざまな複雑なシナリオのニーズを満たすことができます。

Python Iteratorsとは何ですか？ Jul 08, 2025 am 02:56 AM

inpython、iteratoratorSareObjectsthatallopingthroughcollectionsbyimplementing __（）and__next __（）

Pythonの主張を説明します。 Jul 07, 2025 am 12:14 AM

Assertは、Pythonでデバッグに使用されるアサーションツールであり、條件が満たされないときにアサーションエラーを投げます。その構(gòu)文は、アサート條件とオプションのエラー情報であり、パラメーターチェック、ステータス確認などの內(nèi)部ロジック検証に適していますが、セキュリティまたはユーザーの入力チェックには使用できず、明確な迅速な情報と組み合わせて使用??する必要があります。例外処理を置き換えるのではなく、開発段階での補助デバッグにのみ利用できます。

Pythonタイプのヒントとは何ですか？ Jul 07, 2025 am 02:55 AM

タイプヒントシンパソコンの問題と、ポテンシャルを使用して、dynamivitytedcodedededevelowingdeexpecifeedtypes.theyenhanceReadeadability、inableearlybugdetection、およびrequrovetoolingsusingsupport.typehintsareadddeduneadddedusingolon（:)

Python Fastapiチュートリアル Jul 12, 2025 am 02:42 AM

Pythonを使用して最新の効率的なAPIを作成するには、Fastapiをお勧めします。標準のPythonタイプのプロンプトに基づいており、優(yōu)れたパフォーマンスでドキュメントを自動的に生成できます。 FastAPIおよびASGIサーバーUVICORNをインストールした後、インターフェイスコードを記述できます。ルートを定義し、処理機能を作成し、データを返すことにより、APIをすばやく構(gòu)築できます。 Fastapiは、さまざまなHTTPメソッドをサポートし、自動的に生成されたSwaggeruiおよびRedocドキュメントシステムを提供します。 URLパラメーターはパス定義を介してキャプチャできますが、クエリパラメーターは、関數(shù)パラメーターのデフォルト値を設(shè)定することで実裝できます。 Pydanticモデルの合理的な使用は、開発の効率と精度を改善するのに役立ちます。

See all articles

亚洲国产日韩欧美一区二区三区,精品亚洲国产成人av在线,国产99视频精品免视看7,99国产精品久久久久久久成人热,欧美日韩亚洲国产综合乱

【Python】Bilibiliビデオコメントとバレットチャットを処理?分析するスクリプト

導(dǎo)入

特徴と原理

スクリプトの機能:

強化された機能:

動作原理:

すぐに始めましょう

ホットAIツール

Undress AI Tool

Undresser.AI Undress

AI Clothes Remover

Clothoff.io

Video Face Swap

人気の記事

ホットツール

メモ帳++7.3.1

SublimeText3 中國語版

ゼンドスタジオ 13.0.1

ドリームウィーバー CS6

SublimeText3 Mac版

ホットトピック