Python Basics: Character Encoding Conversion

JakobHu

2019-11-09

Python 2

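# -*- coding: utf-8 -*-
# In Python 2, every encoding conversion goes through unicode:
# decode the source bytes to unicode, then encode unicode to the target encoding.
from __future__ import print_function  # otherwise print(a, b) prints a tuple

str_utf8 = "我就是我"  # a byte string; UTF-8 encoded if this file is saved as UTF-8
print("str_utf8 (我就是我):", str_utf8)
# Decode the UTF-8 bytes to unicode
str_utf8_to_unicode = str_utf8.decode("utf-8")
print(str_utf8_to_unicode)
# Encode unicode to GBK
str_utf8_to_unicode_to_gbk = str_utf8_to_unicode.encode("gbk")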

Whether the print calls above display correctly also depends on the encoding of the terminal the program runs in.
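
The dependence on the terminal encoding can be sketched in Python 3 by writing to an in-memory stream that stands in for the terminal. The helper name `can_display` is hypothetical and only illustrates the idea; the encoding a real terminal uses is reported by `sys.stdout.encoding`.

```python
import io
import sys

def can_display(text, encoding):
    """Hypothetical helper: True if `text` can be written to a stream using `encoding`."""
    # The in-memory stream stands in for a terminal with the given encoding.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        return False
    return True

print(sys.stdout.encoding)                # encoding of the current terminal
print(can_display("我就是我", "utf-8"))   # True
print(can_display("我就是我", "gbk"))     # True
print(can_display("我就是我", "ascii"))   # False
```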

Python 3

Converting between str and bytes

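#!/usr/bin/env python
# -*- coding: utf-8 -*-
# In Python 3 a str holds Unicode text and bytes holds encoded data.
# Name the encoding explicitly on every encode/decode call.
str1 = "我就是我"
# str -> bytes: use encode. The argument is the target encoding of the resulting bytes;
# a str has no encoding of its own that it would need to match.
str1_byte = str1.encode(encoding="utf-8")
# bytes -> str: use decode. The encoding must be the one the bytes were produced with;
# a wrong one raises UnicodeDecodeError or yields garbled text.
byte_str1 = str1_byte.decode(encoding="utf-8")
print(str1, str1_byte, byte_str1)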
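
As a minimal sketch of what happens when the wrong encoding is passed to decode, the snippet below encodes the same text two ways and then tries to read the GBK bytes as UTF-8. The helper name `safe_decode` is hypothetical and is not part of the original example.

```python
text = "我就是我"
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# The same four characters take a different number of bytes in each encoding.
print(len(text), len(utf8_bytes), len(gbk_bytes))  # 4 12 8

def safe_decode(data, encoding):
    """Hypothetical helper: return the decoded str, or None if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

print(safe_decode(gbk_bytes, "gbk"))    # 我就是我
print(safe_decode(gbk_bytes, "utf-8"))  # None, these bytes are not valid UTF-8
```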

Converting between encodings in Python 3

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# In Python 3 a str is already Unicode, so no decode step is needed first. A header such as
# -*- coding: gbk -*- only declares the encoding of the source file; string values in the program are still Unicode.
str2_utf8 = "我就是我"
# str2_utf8 is a Unicode str (str has no decode method); encode it straight to the target encoding, which yields bytes
str2_utf8_to_gbk = str2_utf8.encode(encoding="gbk")
# Print the GBK-encoded bytes of 我就是我
print(str2_utf8_to_gbk)
# Decode the GBK bytes back to a str; this works as long as the correct encoding is given
print(str2_utf8_to_gbk.decode(encoding="gbk"))
# Print the UTF-8 encoded bytes of 我就是我, obtained by encoding the str as UTF-8
str2_utf8_to_utf8_byte = str2_utf8.encode(encoding="utf-8")
print(str2_utf8_to_utf8_byte)
# Convert GBK to UTF-8. Unicode is the intermediate form: decode the GBK bytes to str, then encode that str as UTF-8.
# encode returns bytes, so the output matches str2_utf8_to_utf8_byte; decode it as UTF-8 to get a str back
print(str2_utf8_to_gbk.decode(encoding="gbk").encode(encoding="utf-8"))
# Decode the UTF-8 bytes to a str
print(str2_utf8_to_gbk.decode(encoding="gbk").encode(encoding="utf-8").decode("utf-8"))

Output:

b'\xce\xd2\xbe\xcd\xca\xc7\xce\xd2'
我就是我
b'\xe6\x88\x91\xe5\xb0\xb1\xe6\x98\xaf\xe6\x88\x91'
b'\xe6\x88\x91\xe5\xb0\xb1\xe6\x98\xaf\xe6\x88\x91'
我就是我
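
The GBK to UTF-8 step can be wrapped in a small function. This is only a sketch: the name `transcode` and its parameters are hypothetical, and it assumes the input bytes really are in the stated source encoding.

```python
def transcode(data, source_encoding, target_encoding):
    """Hypothetical helper: re-encode bytes by going through str (Unicode)."""
    return data.decode(source_encoding).encode(target_encoding)

gbk_bytes = "我就是我".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")

print(utf8_bytes)  # b'\xe6\x88\x91\xe5\xb0\xb1\xe6\x98\xaf\xe6\x88\x91'
print(transcode(utf8_bytes, "utf-8", "gbk") == gbk_bytes)  # True
```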

Key points:

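Compared with Python 2, decode in Python 3 not only converts data from its source encoding to Unicode but also turns bytes into str, and encode not only converts Unicode to the target encoding but also turns str into bytes. In Python 3, decode exists only on bytes and encode only on str.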
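
A quick way to check this split in Python 3 is to inspect the two types directly. The variable names below are illustrative only.

```python
text = "我就是我"             # str: Unicode text
data = text.encode("utf-8")   # bytes: encoded data

print(type(text).__name__, hasattr(text, "decode"))  # str False
print(type(data).__name__, hasattr(data, "encode"))  # bytes False

# Mixing the two types raises an error instead of converting implicitly.
try:
    text + data
except TypeError as exc:
    mixing_error = exc
print(type(mixing_error).__name__)  # TypeError
```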
