QueryList Collector Development Manual
More complex HTTP network operations
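QueryList's built-in network handling is deliberately simple, because QueryList focuses on DOM selection. For more complex network operations you can use the Request extension, which makes it easy to send cookies, spoof the referer, spoof the browser (User-Agent), and so on. If it still does not meet your needs, here are a few approaches you can refer to: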
Example:
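<?php
require 'vendor/autoload.php';

use QL\QueryList;

// Fetch a page's HTML source with cURL
function getHtml($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);     // set the Referer automatically on redirects
    curl_setopt($ch, CURLOPT_REFERER, $url);         // spoof the referer
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$rules = array(
    // scraping rules
);

// Fetch the page source
$html = getHtml('http://xxx.com');

// Scrape
$data = QueryList::Query($html, $rules)->data;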
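The features mentioned earlier (sending cookies, spoofing the referer, spoofing the browser) can also be added to a hand-written cURL helper. The following is only a minimal sketch, not part of QueryList: the function names `buildCurlOptions` and `fetchHtml`, the option keys `cookie`, `referer` and `user_agent`, and the 10-second timeout are all assumptions made for illustration. It needs PHP's cURL extension.

```php
<?php
// Hypothetical helper: builds the cURL option array for one request.
// The keys of $extra ('cookie', 'referer', 'user_agent') are illustrative names.
function buildCurlOptions(string $url, array $extra = []): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_TIMEOUT        => 10,    // assumed value, in seconds
    ];
    if (isset($extra['cookie'])) {
        $options[CURLOPT_COOKIE] = $extra['cookie'];         // e.g. "name=value; other=value"
    }
    if (isset($extra['referer'])) {
        $options[CURLOPT_REFERER] = $extra['referer'];       // spoofed referer
    }
    if (isset($extra['user_agent'])) {
        $options[CURLOPT_USERAGENT] = $extra['user_agent'];  // spoofed browser
    }
    return $options;
}

// Hypothetical helper: fetches the page source, throwing on transport errors.
function fetchHtml(string $url, array $extra = []): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url, $extra));
    $result = curl_exec($ch);
    if ($result === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);
    return $result;
}
```

The returned string can then be passed to `QueryList::Query($html, $rules)` in the same way as the output of `getHtml()`, for example `fetchHtml('http://xxx.com', ['user_agent' => 'testing/1.0'])`.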
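QueryList works seamlessly with any third-party HTTP package. The following uses the guzzlehttp/guzzle package as an example. Guzzle is a PHP HTTP client that makes it easy to send HTTP requests and to integrate with web services.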
Guzzle manual (Chinese): http://guzzle-cn.readthedocs.io/zh_CN/latest/
Installation
# Install QueryList
composer require jaeger/querylist

# Install Guzzle
composer require guzzlehttp/guzzle
Usage
<?php
require 'vendor/autoload.php';

// Create an HTTP client
$client = new GuzzleHttp\Client(['base_uri' => 'https://phphub.org']);
$jar = new \GuzzleHttp\Cookie\CookieJar();

// Send an HTTP request
$response = $client->request('GET', '/categories/6', [
    'headers' => [
        'User-Agent' => 'testing/1.0',
        'Accept'     => 'application/json',
        'X-Foo'      => ['Bar', 'Baz']
    ],
    // Form fields; form_params is normally used with POST requests
    'form_params' => [
        'foo' => 'bar',
        'baz' => ['hi', 'there!']
    ],
    // 'cookies' => $jar,
    'timeout' => 3.14,
    // 'proxy' => 'tcp://localhost:8125',
    // 'cert' => ['/path/server.pem', 'password'],
]);

$body = $response->getBody();

// The page source
$html = (string)$body;

// Scraping rules
$rules = array(
    // Article title
    'title'  => ['.media-heading a', 'text'],
    // Article link
    'link'   => ['.media-heading a', 'href'],
    // Article author name
    'author' => ['.img-thumbnail', 'alt']
);

// List (range) selector
$rang = '.topic-list>li';

// Scrape
$data = \QL\QueryList::Query($html, $rules, $rang)->data;

// Print the scraped result
print_r($data);
Result:
Array
(
    [0] => Array
        (
            [title] => 好友动态的实现原理
            [link] => https://phphub.org/topics/2750
            [author] => luo975974740
        )

    [1] => Array
        (
            [title] => 打造完美的 Ubuntu16.04 开发环境【持续更新】
            [link] => https://phphub.org/topics/2723
            [author] => liuwantao
        )

    // ... omitted ...

    [19] => Array
        (
            [title] => [Laravel 5.3 新功能] 10. 全文搜索方案 Laravel Scout 介绍
            [link] => https://phphub.org/topics/2673
            [author] => monkey
        )

)
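The usage example leaves the `cookies` option commented out. When a site needs a session to be kept across several requests, the same `CookieJar` can be passed to each request, and transport or HTTP errors can be caught instead of aborting the script. The following is a minimal sketch under those assumptions: `fetchPage` is a hypothetical helper name, and the header and timeout values are placeholders, not recommendations from the QueryList or Guzzle documentation.

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;
use GuzzleHttp\Exception\GuzzleException;

// Hypothetical helper: fetches one page and returns its source,
// or null when the request fails (connection error, timeout, 4xx/5xx).
function fetchPage(Client $client, string $uri, CookieJar $jar): ?string
{
    try {
        $response = $client->request('GET', $uri, [
            'cookies' => $jar,  // cookies set by earlier responses are sent again
            'headers' => ['User-Agent' => 'testing/1.0'],  // placeholder value
            'timeout' => 5,     // assumed value, in seconds
        ]);
    } catch (GuzzleException $e) {
        error_log('Request failed: ' . $e->getMessage());
        return null;
    }

    return (string) $response->getBody();
}
```

A possible way to use it: create one `Client` and one `CookieJar`, call `fetchPage($client, '/categories/6', $jar)` for each page, and pass every non-null result to `\QL\QueryList::Query($html, $rules, $rang)` as in the usage example.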