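How to scrape data into MySQL with Python 3
This article shares a complete Python 3 example that scrapes Wikipedia pages and stores them in MySQL, for your reference. The details are as follows.
Here is the code: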
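#!/usr/local/bin/python3.5
# -*- coding:UTF-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import random
import pymysql

# Connect to the `scraping` database. Because unix_socket is set, PyMySQL is
# expected to connect through the local socket file rather than to `host`.
connect = pymysql.connect(host='192.168.10.142', unix_socket='/tmp/mysql.sock', user='root', passwd='1234', db='scraping', charset='utf8')
cursor = connect.cursor()
cursor.execute('USE scraping')
# Seed from the system's entropy source; newer Python versions reject a datetime seed.
random.seed()


def store(title, content):
    """Insert a page unless a row with the same title already exists."""
    # execute() returns the number of rows matched by the SELECT.
    execute = cursor.execute("select * from pages WHERE `title` = %s", (title,))
    if execute <= 0:
        cursor.execute("insert into pages(`title`, `content`) VALUES(%s, %s)", (title, content))
        cursor.connection.commit()
    else:
        print('This content already exists.')


def get_links(article_url):
    """Fetch a page, store its title and first paragraph, and return its /wiki/ links."""
    html = urlopen('http://en.wikipedia.org' + article_url)
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.h1.get_text()
    content = soup.find('div', {'id': 'mw-content-text'}).find('p').get_text()
    store(title, content)
    return soup.find('div', {'id': 'bodyContent'}).find_all('a', href=re.compile("^(/wiki/)(.)*$"))


links = get_links('')
try:
    # Random walk: follow a randomly chosen link until a page yields no links.
    while len(links) > 0:
        new_article = links[random.randint(0, len(links) - 1)].attrs['href']
        links = get_links(new_article)
        print(links)
finally:
    # Always release the cursor and the connection, even if an error occurs.
    cursor.close()
    connect.close()

That is all for this article. I hope it helps with your learning, and I hope you will continue to support 安科网.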
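The duplicate check in store() can be tried out without a running MySQL server. The sketch below is not the article's code: it swaps in the standard-library sqlite3 module with an in-memory database as a stand-in, and the `pages` schema shown is an assumption, since the article never shows the real table definition. The store() here also takes the connection as a parameter and returns a boolean instead of printing, purely for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server,
# and this `pages` schema is a guess; the article does not show the real one.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, content TEXT)')


def store(conn, title, content):
    """Insert a page unless the title is already stored; return True if inserted."""
    # sqlite3 uses ? placeholders where PyMySQL uses %s.
    row = conn.execute('SELECT 1 FROM pages WHERE title = ?', (title,)).fetchone()
    if row is not None:
        return False
    conn.execute('INSERT INTO pages (title, content) VALUES (?, ?)', (title, content))
    conn.commit()
    return True


first = store(conn, 'Python', 'Python is a programming language.')
second = store(conn, 'Python', 'A duplicate title.')
count = conn.execute('SELECT COUNT(*) FROM pages').fetchone()[0]
print(first, second, count)
```

The second call is rejected because the title already exists, so only one row is stored. A select-then-insert check like this is simple but not safe against concurrent writers; a UNIQUE constraint on `title` would be one possible alternative if that matters.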