Python Monitor Water Falls (4) Crawler and Scrapy

Create a virtualenv

>python3 -m venv ./pythonenv

Use that ENV

>source ./pythonenv/bin/activate

>pip install scrapy

>pip install scrapyd

Check version

>scrapy --version

Scrapy 1.5.0 - project: scrapy_clawer

>scrapyd --version

twistd (the Twisted daemon) 17.9.0

Copyright (c) 2001-2016 Twisted Matrix Laboratories.

See LICENSE for details.

>pip install selenium

Install PhantomJS

http://phantomjs.org/download.html

Download the zip file and place it in the working directory

Check the version

>phantomjs --version

2.1.1

Warning Message:

UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead

warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless'

Mar 21, 2018: No releases

Solution:

Run headless Chrome

https://intoli.com/blog/running-selenium-with-headless-chrome/

Install on Mac

>brew install chromedriver

Check if it is there

>chromedriver

Starting ChromeDriver 2.36.540469 (1881fd7f8641508feb5166b7cae561d87723cfa8) on port 9515

Only local connections are allowed.

Change the Python code as follows, and it works again:

from selenium import webdriver

options = webdriver.ChromeOptions()

#options.binary_location = '/usr/bin/google-chrome-unstable'

options.add_argument('headless')

options.add_argument('window-size=1200x600')

browser = webdriver.Chrome(chrome_options=options)

browser.get('https://hydromet.lcra.org/riverreport')

tables = browser.find_elements_by_css_selector('table.table-condensed')

tbody = tables[5].find_element_by_tag_name("tbody")

for row in tbody.find_elements_by_tag_name("tr"):
    cells = row.find_elements_by_tag_name("td")
    if cells[0].text == 'Marble Falls (Starcke)':
        print(cells[1].text)

browser.quit()
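The table-walking logic above can be exercised without a live browser by running the same extraction over a saved HTML snippet. Below is a minimal sketch using only the standard library's html.parser; the fragment is a made-up stand-in for the LCRA river-report table, so the real page's markup may differ.

```python
from html.parser import HTMLParser

class TableCellParser(HTMLParser):
    """Collect the text of every <td> in document order."""
    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data.strip()

# Hypothetical fragment shaped like the river-report table
html = """
<table class="table-condensed"><tbody>
<tr><td>Buchanan</td><td>1018.22</td></tr>
<tr><td>Marble Falls (Starcke)</td><td>736.70</td></tr>
</tbody></table>
"""

parser = TableCellParser()
parser.feed(html)
rows = list(zip(parser.cells[0::2], parser.cells[1::2]))  # (name, value) pairs
for name, value in rows:
    if name == "Marble Falls (Starcke)":
        print(value)
```

This keeps the row/cell filtering testable offline while the Selenium part only supplies browser.page_source.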

Run everything with Scrapy and see if it is working well.

Another way to create an ENV

>virtualenv env

>. env/bin/activate

Check the version of a Python module

>pip install selenium

>pip show selenium | grep Version

Version: 3.11.0

I have a file named requirements.txt:

selenium==3.11.0

I can run:

>pip install -r requirements.txt

Here is how to generate the requirements.txt:

>pip freeze > requirements.txt
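Since pip freeze writes plain `name==version` lines, the pinned versions are easy to inspect programmatically. A quick sketch (parse_requirements is a hypothetical helper, not part of pip):

```python
def parse_requirements(text):
    """Split 'name==version' pins into a dict, skipping comments and blanks."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name] = version
    return pins

print(parse_requirements("selenium==3.11.0\nscrapy==1.5.0"))
# {'selenium': '3.11.0', 'scrapy': '1.5.0'}
```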

How to run the Spider locally

>scrapy crawl quotes

Prepare the Deployment ENV

>pip install scrapyd

>pip install scrapyd-client

Start the Server

>scrapyd

Deploy the spider

>scrapyd-deploy

List the projects and spiders, then schedule a run

curl http://localhost:6800/listprojects.json

curl http://localhost:6800/listspiders.json?project=default

curl http://localhost:6800/schedule.json -d project=default -d spider=quotes
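The same Scrapyd endpoints can be hit from Python instead of curl. A minimal sketch with the stdlib urllib; the host and port assume scrapyd's default bind address, and schedule_url_and_body is a hypothetical helper that only builds the request, so actually sending it still requires a running scrapyd:

```python
from urllib.parse import urlencode

SCRAPYD = "http://localhost:6800"  # scrapyd's default bind address

def schedule_url_and_body(project, spider):
    """Build the POST target and form body for scrapyd's schedule.json."""
    return f"{SCRAPYD}/schedule.json", urlencode({"project": project, "spider": spider})

url, body = schedule_url_and_body("default", "quotes")
print(url)
print(body)
# To actually schedule: urllib.request.urlopen(url, data=body.encode())
```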

Investigate the Requests

https://requestbin.fullcontact.com/

https://hookbin.com/

A lot of details are in the project monitor-water.

References:


https://intoli.com/blog/running-selenium-with-headless-chrome/
