modules/useragent
LICENSE
README.md
class.txt
clear_erro.py
crawler.py
crawler_baidu.py
crawler_sougou.py
get_frame.py
rename.py
requrements.txt
zip_file.py


Crawling a website with requests + lxml

Target website

The crawler scrapes the post titles from Dong Weiming's (董伟明) blog.

The crawler consists of 6 modules

url: URL manager
download: downloader
parser: page parser
output: data exporter
crawler: crawl scheduler
useragent: User-Agent pool
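
As a rough illustration of how these six pieces could fit together, here is a minimal, standard-library-only sketch. It is not the repository's actual code: the names `UrlManager`, `UserAgentPool`, and `crawl` are hypothetical, and the downloader and parser are passed in as plain callables so the example runs offline. In the real project those two roles would presumably be filled by requests and lxml.

```python
import random
from collections import deque


class UrlManager:
    """Tracks which URLs are waiting and which were already seen (hypothetical name)."""

    def __init__(self):
        self._pending = deque()
        self._seen = set()

    def add(self, url):
        # Queue a URL only the first time it is encountered.
        if url not in self._seen:
            self._seen.add(url)
            self._pending.append(url)

    def has_next(self):
        return bool(self._pending)

    def next(self):
        return self._pending.popleft()


class UserAgentPool:
    """Hands out a random User-Agent string for each request (hypothetical name)."""

    def __init__(self, agents):
        self._agents = list(agents)

    def pick(self):
        return random.choice(self._agents)


def crawl(seed_urls, download, parse, ua_pool, max_pages=10):
    """Scheduler: wires URL manager, downloader, parser and output together.

    download(url, user_agent) -> page text
    parse(page_text) -> (titles, new_urls)
    """
    urls = UrlManager()
    for url in seed_urls:
        urls.add(url)

    results = []  # stands in for the output module
    pages = 0
    while urls.has_next() and pages < max_pages:
        url = urls.next()
        page = download(url, ua_pool.pick())
        titles, new_urls = parse(page)
        results.extend(titles)
        for new_url in new_urls:
            urls.add(new_url)
        pages += 1
    return results


# Offline demo with stand-in functions (no network access, made-up data).
fake_site = {
    "page1": (["First post"], ["page2", "page1"]),
    "page2": (["Second post"], []),
}
titles = crawl(
    ["page1"],
    download=lambda url, user_agent: url,  # pretend the page text is just its key
    parse=lambda page: fake_site[page],
    ua_pool=UserAgentPool(["ExampleAgent/1.0"]),
)
print(titles)  # ['First post', 'Second post']
```

Passing the downloader and parser in as arguments keeps the scheduling logic testable on its own. The link from `page1` back to itself is skipped because the URL manager has already seen it.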

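Usage

Running the project in an isolated virtualenv environment is recommended.
pip3 install -r requirements.txt
python crawler.py

If pip cannot find requirements.txt, note that the repository file listing spells the file name requrements.txt.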

Notes

Use lxml 3.5.0. Versions above 3.5.0 are currently incompatible.
Use Python 3.6.0.
Use pip3 10.0.1.
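
The version constraints above could be checked before a run with a small helper like the one below. This is only a sketch: `version_problems` is a hypothetical name, and comparing Python at the major.minor level (3.6 rather than exactly 3.6.0) is an assumption, not something the project specifies.

```python
import sys

# Versions named in the notes; the major.minor comparison is an assumption.
REQUIRED_PYTHON = (3, 6)
MAX_LXML = (3, 5, 0)


def version_problems(python_version, lxml_version):
    """Return a list of human-readable mismatches (empty if none)."""
    problems = []
    py = tuple(python_version[:2])
    lx = tuple(lxml_version[:3])
    if py != REQUIRED_PYTHON:
        problems.append("Python %d.%d found, 3.6 expected" % py)
    if lx > MAX_LXML:
        problems.append("lxml %d.%d.%d is newer than 3.5.0" % lx)
    return problems


# With lxml installed, its version tuple is available as lxml.etree.LXML_VERSION;
# a literal tuple is used here so the sketch runs without lxml.
print(version_problems(sys.version_info, (3, 5, 0)))
```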
