
Web crawler - how to cooperate with requests in python's multi-process
阿神 2017-06-22 11:52:30

This is the code for sequential execution of a single process:

import requests, time, os, random

def img_down(url):
    # Download one image and save it under a random-prefixed file name
    # so that identical basenames do not overwrite each other.
    with open("{}".format(str(random.random()) + os.path.basename(url)), "wb") as fob:
        fob.write(requests.get(url).content)

urllist = []
with open("urllist.txt", "r+") as u:
    for a in u.readlines():
        urllist.append(a.strip())

s = time.clock()  # time.clock() is deprecated since Python 3.3 and removed in 3.8
for i in range(len(urllist)):
    img_down(urllist[i])
e = time.clock()

print("time: %d" % (e - s))

This is the multi-process version:

from multiprocessing import Pool
import requests, os, time, random

def img_down(url):
    # Same download routine; it must live at module level so the
    # worker processes can import and call it.
    with open("{}".format(str(random.random()) + os.path.basename(url)), "wb") as fob:
        fob.write(requests.get(url).content)

if __name__ == "__main__":
    urllist = []
    with open("urllist.txt", "r+") as urlfob:
        for s in urlfob.readlines():
            urllist.append(s.strip())

    s = time.clock()
    p = Pool()  # no argument: the pool size defaults to os.cpu_count() or 1
    for i in range(len(urllist)):
        p.apply_async(img_down, args=(urllist[i],))
    p.close()
    p.join()
    e = time.clock()

    print("time: {}".format(e - s))

But there is almost no difference in elapsed time between the single-process and the multi-process version. My guess is that the problem is requests blocking on I/O. Is my understanding correct? How should the code be modified so that multiprocessing actually pays off?
Thanks!

Replies (2)
phpcn_u1582

The bottleneck when writing files is disk I/O, not the CPU, so parallelism does not help much here. You can try skipping the file writes and then compare the times.
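
A minimal sketch of that comparison (the time_downloads helper and its write flag are invented for illustration, and it assumes the same urllist.txt as in the question). It uses time.perf_counter() for wall-clock timing and runs the download loop twice, once discarding the response body and once writing it, so the difference shows how much of the total time is file I/O:

import time
import requests

def time_downloads(urls, write=False):
    # Fetch every URL; only touch the disk when write=True,
    # so the two runs differ only in file I/O.
    start = time.perf_counter()
    for url in urls:
        content = requests.get(url).content
        if write:
            with open(url.rsplit("/", 1)[-1], "wb") as f:
                f.write(content)
    return time.perf_counter() - start

if __name__ == "__main__":
    with open("urllist.txt") as f:
        urls = [line.strip() for line in f if line.strip()]
    print("download only: %.2fs" % time_downloads(urls, write=False))
    print("download + write: %.2fs" % time_downloads(urls, write=True))

Keep in mind the second pass may benefit from server-side or CDN caching, so the comparison is only indicative.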

劉奇

Pool() without arguments uses os.cpu_count() or 1 worker processes. On a single-core CPU, or when the core count cannot be determined, that means only one process.

That is probably the reason.
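
If that is the case, one thing to try is printing os.cpu_count() and passing an explicit worker count to Pool. Below is a minimal sketch under that assumption, with a simplified img_down and the same urllist.txt as in the question; the worker count of 8 is an arbitrary example, not a recommendation. Since the downloads are network-bound rather than CPU-bound, a count above the number of cores can still be useful:

from multiprocessing import Pool
import os
import time
import requests

def img_down(url):
    # Simplified version of the question's download routine.
    with open(os.path.basename(url), "wb") as fob:
        fob.write(requests.get(url).content)

if __name__ == "__main__":
    with open("urllist.txt") as f:
        urllist = [line.strip() for line in f if line.strip()]

    print("os.cpu_count() =", os.cpu_count())  # what a bare Pool() would use

    start = time.perf_counter()
    # Explicit worker count (8 here, chosen arbitrarily); for network-bound
    # work it can usefully exceed the number of CPU cores.
    with Pool(processes=8) as p:
        p.map(img_down, urllist)
    print("time: {:.2f}s".format(time.perf_counter() - start))

Pool.map blocks until all downloads finish, which also removes the need for the close()/join() pair used with apply_async in the question.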
