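This article presents an example of concurrently validating the addresses in a proxy pool with Python 3, shared here for reference. The script submits one check per proxy to a thread pool of 8 workers. A proxy counts as valid if a request through it to the target URL returns HTTP 200 within 3 seconds and the returned page contains a given feature string. The details are as follows: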
```python
# encoding=utf-8
# author: walker
# date: 2016-04-14
# summary: concurrently check proxy validity with coroutines/a thread pool

import os
import traceback

import requests
from concurrent import futures

cur_dir_fullpath = os.path.dirname(os.path.abspath(__file__))

Headers = {
    'Accept': '*/*',
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)',
}


# Check whether a single proxy is valid.
# Returns the proxy if it is valid; otherwise returns an empty string.
def Check(desturl, proxy, feature):
    # requests picks the proxy by URL scheme, so only http:// targets use it
    proxies = {'http': 'http://' + proxy}
    try:
        r = requests.get(url=desturl, headers=Headers, proxies=proxies, timeout=3)
    except Exception:
        # print('* ' + traceback.format_exc())  # uncomment to see why a proxy failed
        return ''
    try:
        if r.status_code != 200:
            return ''
        if r.text.find(feature) < 0:  # feature string not found on the page
            return ''
        return proxy
    finally:
        r.close()  # release the connection


# Take a raw proxy collection (set/list) and return the list of valid proxies.
def GetValidProxyPool(rawProxyPool, desturl, feature):
    validProxyList = list()  # valid proxies
    # The with-block shuts the pool down once all checks have finished
    with futures.ThreadPoolExecutor(8) as pool:
        futureList = list()
        for proxy in rawProxyPool:
            futureList.append(pool.submit(Check, desturl, proxy, feature))
        print('\n submit done, waiting for responses\n')
        for future in futures.as_completed(futureList):
            proxy = future.result()
            print('proxy:' + proxy)
            if proxy:  # valid proxy
                validProxyList.append(proxy)
    print('validProxyList size:' + str(len(validProxyList)))
    return validProxyList


# Get the raw proxy pool.
def GetRawProxyPool():
    rawProxyPool = set()
    # Obtain the raw proxy pool somehow......
    return rawProxyPool


if __name__ == "__main__":
    rawProxyPool = GetRawProxyPool()
    desturl = 'http://...'  # target URL to be accessed through the proxies
    feature = 'xxx'         # feature string that identifies the target page
    validProxyPool = GetValidProxyPool(rawProxyPool, desturl, feature)
```
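The fan-out/collect step in `GetValidProxyPool` does not depend on HTTP itself. The sketch below pulls it into a standalone function that takes the checker as a parameter, so the concurrency logic can be exercised offline. `filter_valid_proxies` and `fake_check` are hypothetical names introduced here for illustration and are not part of the original script. The checker is assumed to follow the same contract as `Check()`, returning the proxy string when usable and `''` otherwise.

```python
from concurrent import futures
from typing import Callable, Iterable, List


def filter_valid_proxies(raw_proxies: Iterable[str],
                         check: Callable[[str], str],
                         max_workers: int = 8) -> List[str]:
    """Run `check` on every proxy concurrently and keep the non-empty results.

    Hypothetical helper: `check` is assumed to return the proxy string when
    the proxy is usable and '' otherwise, like Check() in the script above.
    """
    valid: List[str] = []
    with futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Keep a future -> proxy mapping so each result can be traced back.
        future_to_proxy = {pool.submit(check, p): p for p in raw_proxies}
        for future in futures.as_completed(future_to_proxy):
            try:
                result = future.result()
            except Exception:
                # A checker that raises is treated as "proxy invalid".
                continue
            if result:
                valid.append(result)
    # as_completed yields in completion order; sort for a stable return value.
    return sorted(valid)


if __name__ == "__main__":
    # Hypothetical stand-in for Check(): treats proxies on port 8080 as valid,
    # so the pipeline can run without any network access.
    def fake_check(proxy: str) -> str:
        return proxy if proxy.endswith(":8080") else ""

    raw = {"10.0.0.1:8080", "10.0.0.2:3128", "10.0.0.3:8080"}
    print(filter_valid_proxies(raw, fake_check))
    # → ['10.0.0.1:8080', '10.0.0.3:8080']
```

In real use one might pass something like `lambda p: Check(desturl, p, feature)` as the checker. Sorting the output is a choice made in this sketch. The original keeps completion order, which is also fine if order does not matter.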