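Below is the simplest way to do it: first fetch the target page, then use a regular expression to match the href attribute of each a tag and pull out the hyperlinks.
The code is as follows: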
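import urllib2
import re

# Fetch the target page (this is Python 2 code: urllib2 and the print statement)
url = 'http://www.sunbloger.com/'
req = urllib2.Request(url)
con = urllib2.urlopen(req)
doc = con.read()
con.close()

# Capture absolute http:// URLs found inside double-quoted href attributes
links = re.findall(r'href\=\"(http\:\/\/[a-zA-Z0-9\.\/]+)\"', doc)
for a in links:
    print a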
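The regular expression above only catches double-quoted, absolute http:// links whose characters fall within [a-zA-Z0-9./], so https links, relative paths, and URLs containing hyphens or query strings are skipped. As a rough sketch of one alternative, not the original author's method, the same idea can be written for Python 3 with the standard library's html.parser in place of the regex. The names LinkCollector, extract_links, and fetch_links are made up for this illustration, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html):
    """Return all href values of <a> tags in an HTML string, in order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def fetch_links(url):
    """Download a page and return the links found in it."""
    with urlopen(url) as con:
        # Assumption: use UTF-8 when the response declares no charset.
        charset = con.headers.get_content_charset() or "utf-8"
        doc = con.read().decode(charset, errors="replace")
    return extract_links(doc)
```

Calling fetch_links('http://www.sunbloger.com/') and printing each item would mirror the loop in the original script. Unlike the regex, this version also returns relative and https links, so filter the result if only absolute http:// URLs are wanted.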