Python Monitor Water Falls(4) Crawler and Scrapy
Create a virtualenv
>python3 -m venv ./pythonenv
Use that ENV
>source ./pythonenv/bin/activate
>pip install scrapy
>pip install scrapyd
Check version
>scrapy --version
Scrapy 1.5.0 - project: scrapy_clawer
>scrapyd --version
twistd (the Twisted daemon) 17.9.0
Copyright (c) 2001-2016 Twisted Matrix Laboratories.
See LICENSE for details.
>pip install selenium
Install PhantomJS
http://phantomjs.org/download.html
Download the zip file and place it in the working directory
Check the version
>phantomjs --version
2.1.1
Warning Message:
UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless'
Mar 21 2018: No releases
Solution:
Run headless Chrome
https://intoli.com/blog/running-selenium-with-headless-chrome/
Install on MAC
>brew install chromedriver
Check if it is there
>chromedriver
Starting ChromeDriver 2.36.540469 (1881fd7f8641508feb5166b7cae561d87723cfa8) on port 9515
Only local connections are allowed.
Change the Python code as follows, and it works again:
from selenium import webdriver

# run Chrome headless instead of PhantomJS
options = webdriver.ChromeOptions()
#options.binary_location = '/usr/bin/google-chrome-unstable'
options.add_argument('headless')
options.add_argument('window-size=1200x600')

browser = webdriver.Chrome(chrome_options=options)
browser.get('https://hydromet.lcra.org/riverreport')
# the 6th condensed table on the page holds the river report rows
tables = browser.find_elements_by_css_selector('table.table-condensed')
tbody = tables[5].find_element_by_tag_name("tbody")
for row in tbody.find_elements_by_tag_name("tr"):
    cells = row.find_elements_by_tag_name("td")
    if cells[0].text == 'Marble Falls (Starcke)':
        print(cells[1].text)
browser.quit()
Run everything with Scrapy and see if it is working well.
Another way to create ENV
>virtualenv env
>. env/bin/activate
Check version for PYTHON module (pip show is built into pip)
>pip install selenium
>pip show selenium | grep Version
Version: 3.11.0
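The same check can be done from inside Python without shelling out to pip, using the standard library (Python 3.8+); `get_version` is a hypothetical helper name:

```python
from importlib import metadata

def get_version(package):
    # read the installed distribution's version, or None if not installed
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print(get_version('selenium'))  # e.g. '3.11.0', or None if not installed
```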
I have a file named requirements.txt
selenium==3.11.0
I can run
>pip install -r requirements.txt
Here is how to generate the requirements.txt
>pip freeze > requirements.txt
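The pinned format that pip freeze emits is just one `name==version` per line; a quick sketch of what that encodes, using a hypothetical `parse_requirements` helper:

```python
def parse_requirements(text):
    """Turn 'name==version' lines into a dict, skipping blanks and comments."""
    reqs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        name, _, version = line.partition('==')
        reqs[name] = version
    return reqs

print(parse_requirements("selenium==3.11.0"))  # {'selenium': '3.11.0'}
```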
How to run the Spider locally
>scrapy crawl quotes
Prepare the Deployment ENV
>pip install scrapyd
>pip install scrapyd-client
Start the Server
>scrapyd
Deploy the spider
>scrapyd-deploy
List the Projects and spiders
curl http://localhost:6800/listprojects.json
curl http://localhost:6800/listspiders.json?project=default
curl http://localhost:6800/schedule.json -d project=default -d spider=quotes
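The schedule call is a plain form POST, so it can also be driven from Python with only the standard library; `schedule_spider` is a hypothetical helper, and the default scrapyd host/port is assumed:

```python
from urllib import request, parse

SCRAPYD = 'http://localhost:6800'  # assumption: scrapyd on its default port

def schedule_spider(project, spider, base=SCRAPYD):
    # same form fields as the curl -d flags above
    data = parse.urlencode({'project': project, 'spider': spider}).encode()
    return request.Request(base + '/schedule.json', data=data)

req = schedule_spider('default', 'quotes')
print(req.full_url)      # http://localhost:6800/schedule.json
print(req.get_method())  # POST (urllib switches to POST when data is set)
# request.urlopen(req) would actually fire it against a running scrapyd
```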
Investigate the Requests
https://requestbin.fullcontact.com/
https://hookbin.com/
A lot of details are in the project monitor-water
References:
mysql
haproxy
https://intoli.com/blog/running-selenium-with-headless-chrome/