
Table of Contents
Preface
Crawling WeChat public account articles (using wechatsogou)
1. Installation
2. Usage method
Generate PDF files
1. Install wkhtmltopdf
2. Install pdfkit
3. How to use
Complete code

Crawl WeChat public account articles and save them as PDF files (Python method)

Aug 29, 2020 pm 05:14 PM

Preface

This is my first blog post. It shows how to crawl articles from a WeChat public account and save them locally as PDF files.

Crawling WeChat public account articles (using wechatsogou)

1. Installation

pip install wechatsogou --upgrade

wechatsogou is a crawler interface for WeChat public accounts, built on Sogou WeChat search.

2. Usage method

Basic usage looks like this:

import wechatsogou
# captcha_break_time is the number of retries allowed after a wrong captcha entry (default: 1)
ws_api = wechatsogou.WechatSogouAPI(captcha_break_time=3)
# Name of the public account
gzh_name = ''
# Return information on the account's 10 most recent articles as a dict
data = ws_api.get_gzh_article_by_history(gzh_name)

Structure of the returned data:

{
    'gzh': {
        'wechat_name': '',  # Name
        'wechat_id': '',  # WeChat ID
        'introduction': '',  # Introduction
        'authentication': '',  # Verification
        'headimage': ''  # Avatar
    },
    'article': [
        {
            'send_id': int,  # Mass-send ID; not unique, because one mass send can contain several messages that share the same ID
            'datetime': int,  # Mass-send time as a 10-digit timestamp
            'type': '',  # Message type; always 49 (the mobile history page has other types, but the web page of the 10 most recent messages only has 49), meaning an article with text and images
            'main': int,  # Whether this is the first message of a mass send: 1 or 0
            'title': '',  # Article title
            'abstract': '',  # Summary
            'fileid': int,  #
            'content_url': '',  # Article link
            'source_url': '',  # "Read the original" link
            'cover': '',  # Cover image
            'author': '',  # Author
            'copyright_stat': int,  # Article type, for example: original
        },
        ...
    ]
}

Only two fields are needed here: the article title (title) and the article URL (content_url).

With the article URL, the HTML page can be converted to a PDF file.
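
As a minimal sketch of that extraction step, the helper below pulls (title, URL) pairs out of a dict shaped like the structure above. The function name extract_title_url_pairs and the sample data are hypothetical and exist only for illustration; the 'article', 'title', and 'content_url' keys are the ones listed in the structure.

```python
def extract_title_url_pairs(data):
    """Return a list of (title, content_url) tuples from an article-history dict."""
    pairs = []
    for article in data.get('article', []):
        title = article.get('title', '')
        url = article.get('content_url', '')
        # Skip entries without a URL, since there is nothing to convert
        if url:
            pairs.append((title, url))
    return pairs


# Hypothetical sample data standing in for a real API response
sample = {
    'gzh': {'wechat_name': 'demo'},
    'article': [
        {'title': 'First post', 'content_url': 'https://example.com/a'},
        {'title': 'No link', 'content_url': ''},
    ],
}
print(extract_title_url_pairs(sample))
```

Using .get() with defaults keeps the loop from failing on an entry that lacks one of the keys, which may or may not happen with real responses.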

Generate PDF files

1. Install wkhtmltopdf

Download address: https://wkhtmltopdf.org/downloads.html

2. Install pdfkit

pip install pdfkit

3. How to use

import pdfkit
# Generate a PDF from a URL
pdfkit.from_url('http://baidu.com','out.pdf')
# Generate a PDF from an HTML file
pdfkit.from_file('test.html','out.pdf')
# Generate a PDF from an HTML string
pdfkit.from_string('Hello!','out.pdf')

If you generate the PDF directly from the article URL obtained above, the article's images do not appear in the PDF file.

Solution:

# This method processes the HTML fetched from the article URL so that the images display
content_info = ws_api.get_article_content(url)
# Get the HTML code (it is incomplete; head, body, and other tags must be added)
html_code = content_info['content_html']

Then build a complete HTML document around html_code and call pdfkit.from_string() to generate the PDF file. The images in the article now appear in the PDF.
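
One way to build that document is sketched below, assuming the fragment only needs a head with a UTF-8 charset declaration and a body wrapper. The name build_html_document is hypothetical, and escaping the title with html.escape is an added precaution rather than something the original workflow requires.

```python
import html


def build_html_document(title, body_html):
    """Wrap an HTML fragment in a complete document that declares UTF-8."""
    # Escape the title so characters such as < or & cannot break the markup
    safe_title = html.escape(title)
    return (
        '<!DOCTYPE html>\n'
        '<html>\n'
        '<head>\n'
        '<meta charset="UTF-8">\n'
        f'<title>{safe_title}</title>\n'
        '</head>\n'
        '<body>\n'
        f'<h2 style="text-align: center;">{safe_title}</h2>\n'
        f'{body_html}\n'
        '</body>\n'
        '</html>\n'
    )


# Hypothetical fragment standing in for content_info['content_html']
document = build_html_document('A & B', '<p>Hello</p>')
print(document)
```

The resulting string can be passed to pdfkit.from_string(document, 'out.pdf') in place of the bare fragment.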

Complete code

import os
import pdfkit
import datetime
import wechatsogou

# Initialize the API
ws_api = wechatsogou.WechatSogouAPI(captcha_break_time=3)


def url2pdf(url, title, targetPath):
    '''
    Generate a PDF file with pdfkit
    :param url: article URL
    :param title: article title
    :param targetPath: directory in which to store the PDF file
    :return: True on success, False if the article could not be fetched
    '''
    try:
        content_info = ws_api.get_article_content(url)
    except Exception:
        return False
    # Processed HTML
    content_html = content_info['content_html']
    html = f'''
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>{title}</title>
    </head>
    <body>
    <h2 style="text-align: center;font-weight: 400;">{title}</h2>
    {content_html}
    </body>
    </html>
    '''
    try:
        pdfkit.from_string(html, os.path.join(targetPath, f'{title}.pdf'))
    except Exception:
        # Some article titles contain special characters and cannot be used as file names
        filename = datetime.datetime.now().strftime('%Y%m%d%H%M%S') + '.pdf'
        pdfkit.from_string(html, os.path.join(targetPath, filename))
    return True


if __name__ == '__main__':
    # Name of the public account to crawl
    gzh_name = ''
    targetPath = os.path.join(os.getcwd(), gzh_name)
    # Create the target folder if it does not exist
    os.makedirs(targetPath, exist_ok=True)
    # Return information on the account's 10 most recent articles as a dict
    data = ws_api.get_gzh_article_by_history(gzh_name)
    article_list = data['article']
    for article in article_list:
        url = article['content_url']
        title = article['title']
        url2pdf(url, title, targetPath)
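
The fallback in url2pdf replaces an unusable title with a timestamp, which loses the article name. An alternative, sketched here as an assumption rather than as part of the original script, is to sanitize the title before building the path. safe_filename is a hypothetical helper that belongs to neither pdfkit nor wechatsogou, and its character set targets the names Windows rejects, which also covers the '/' that POSIX systems reject.

```python
import re


def safe_filename(title, fallback='untitled'):
    """Replace characters that are not allowed in Windows or POSIX file names."""
    # Characters forbidden on Windows, plus ASCII control characters
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', '_', title)
    # Windows also rejects names that end in a dot or a space
    cleaned = cleaned.strip().rstrip('. ')
    return cleaned or fallback


print(safe_filename('What is "AI"? A/B test: part 1'))
```

With this helper, the output path could be built as os.path.join(targetPath, safe_filename(title) + '.pdf'), keeping the timestamp fallback only for errors unrelated to the file name.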
