Python Web Scraping: Basic Usage of the urllib Library
Request a URL and fetch the page source:

```python
import urllib.request

url = "http://www.baidu.com"
response = urllib.request.urlopen(url)
data = response.read()
# print(data)
# Decode the bytes returned by read() into a string
str_data = data.decode("utf-8")
print(str_data)
# Save the result to a file
with open("baidu.html", "w", encoding="utf-8") as f:
    f.write(str_data)
```
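The response object returned by urlopen() also exposes the HTTP status code, the final URL, and the response headers. A minimal sketch (these attributes come from urllib itself, not from the code above):

```python
import urllib.request

response = urllib.request.urlopen("http://www.baidu.com")
print(response.getcode())                     # HTTP status code, e.g. 200
print(response.geturl())                      # final URL after any redirects
print(response.headers.get("Content-Type"))   # e.g. "text/html"
```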
The same request can be wrapped in a function that takes the search keyword as a parameter and concatenates it onto the query string:

```python
import urllib.request

def get_method_params(wd):
    url = "http://www.baidu.com/s?wd="
    # Concatenate the keyword onto the query string
    final_url = url + wd
    # Send the request
    response = urllib.request.urlopen(final_url)
    print(response.read().decode("utf-8"))

get_method_params("美女")
```
Running this as-is raises an error. The reason is that the URL contains Chinese characters, but a URL may only contain ASCII characters, so the Chinese part has to be escaped (percent-encoded) first.
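Before fixing the full request, here is roughly what urllib.parse.quote() does to the Chinese query on its own (a minimal sketch):

```python
import urllib.parse

# Percent-encode the non-ASCII characters (UTF-8 bytes -> %XX)
print(urllib.parse.quote("美女"))   # -> %E7%BE%8E%E5%A5%B3
```

Applying the same idea to the whole request URL gives the corrected version: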
```python
import urllib.request
import urllib.parse
import string

def get_method_params(wd):
    url = "http://www.baidu.com/s?wd="
    # Concatenate the keyword onto the query string
    final_url = url + wd
    # Percent-encode the Chinese characters in the URL
    encode_new_url = urllib.parse.quote(final_url, safe=string.printable)
    # Send the request
    response = urllib.request.urlopen(encode_new_url)
    print(response.read().decode("utf-8"))

get_method_params("美女")
```
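Note the safe=string.printable argument: by default quote() treats only "/" as safe, so the ":", "?", and "=" in the URL would be escaped as well and the request would break. A quick comparison (a sketch using the same URL as above):

```python
import urllib.parse
import string

url = "http://www.baidu.com/s?wd=美女"

# Default: only "/" is left unescaped, so the URL structure is mangled
print(urllib.parse.quote(url))
# -> http%3A//www.baidu.com/s%3Fwd%3D%E7%BE%8E%E5%A5%B3

# Keep all printable ASCII, escape only the Chinese characters
print(urllib.parse.quote(url, safe=string.printable))
# -> http://www.baidu.com/s?wd=%E7%BE%8E%E5%A5%B3
```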
When there are several query parameters, urllib.parse.urlencode() builds the query string from a dict:

```python
import urllib.request
import urllib.parse
import string

def get_params():
    url = "http://www.baidu.com/s?"

    params = {
        "wd": "美女",
        "key": "zhang",
        "value": "san"
    }

    # Turn the dict into a "key=value&key=value" query string
    str_params = urllib.parse.urlencode(params)
    print(str_params)

    final_url = url + str_params
    # Escape any remaining non-ASCII characters in the URL
    encode_url = urllib.parse.quote(final_url, safe=string.printable)

    response = urllib.request.urlopen(encode_url)
    data = response.read().decode("utf-8")
    print(data)

get_params()
```
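For reference, urlencode() percent-encodes each key and value and joins the pairs with "&", so the Chinese value is already escaped before the URL is assembled. Roughly what the print(str_params) line shows (assuming Python 3.7+, where dict insertion order is preserved):

```python
import urllib.parse

params = {"wd": "美女", "key": "zhang", "value": "san"}
print(urllib.parse.urlencode(params))
# -> wd=%E7%BE%8E%E5%A5%B3&key=zhang&value=san
```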