Python Built-in IO

zhongranxu

2019-06-28


file & open

Call signatures:

file(name[, mode[, buffering]])
open(filename[, mode[, bufsize]])

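Differences

file() is the constructor of the file type, while open() is a Python built-in function. Their parameters and behavior are largely the same.

However, file() no longer exists in Python 3, so open() is the recommended choice going forward.

Parameters:

If the file does not exist and mode is write or append, the file is created.

Adding b to mode opens the file in binary mode.

Adding + to mode allows both reading and writing on the same file.

filename: the name of the file.

mode: r (read), w (write), a (append).

buffering:

0 means unbuffered,
1 means line buffered,
any number greater than 1 is used as the buffer size.

Return value:

Both return a file object.

Since the two are used in nearly the same way, the examples below use open: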

In [15]: f1 = open('test1.txt', 'r')        # open read-only

In [16]: type(f1)
Out[16]: file
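The mode flags described above can be tried out with a short sketch. It is written for Python 3 (an assumption, since the session above is Python 2), and the scratch file name demo.txt is purely illustrative.

```python
import os
import tempfile

# Hypothetical scratch file; the name "demo.txt" is only for illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# "w" creates the file (or truncates an existing one).
with open(path, "w") as f:
    f.write("first line\n")

# "a" appends to the end instead of truncating.
with open(path, "a") as f:
    f.write("second line\n")

# "r+" opens an existing file for both reading and writing.
with open(path, "r+") as f:
    text = f.read()

# Adding "b" switches to binary mode: reads return bytes, not str.
with open(path, "rb") as f:
    raw = f.read()

print(text)
print(type(raw))
```

The with statement closes each file automatically, which avoids forgetting close().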

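Methods supported by file objects:

Method	Description
`file.close()`	Closes the file. A closed file can no longer be read or written, and its system resources are released.
`file.flush()`	Flushes the internal buffer, writing buffered data to the file immediately instead of waiting for the file to be closed.
`file.fileno()`	Returns the integer file descriptor (FD), which can be used for low-level operations such as `os.read()`.
`file.isatty()`	Returns True if the file is connected to a terminal device, otherwise False.
`file.next()`	Returns the next line of the file.
`file.read([size])`	Reads at most size bytes from the file; if size is omitted or negative, reads until EOF.
`file.readline([size])`	Reads one entire line, including the trailing "\n" character.
`file.readlines([sizehint])`	Reads all lines and returns them as a list. If sizehint > 0 is given, returns lines totalling approximately sizehint bytes; the amount actually read may be somewhat larger than sizehint.
`file.seek(offset[, whence])`	Sets the file's current position.
`file.tell()`	Returns the file's current position; at EOF this equals the file size, not -1.
`file.truncate([size])`	Truncates the file to size bytes; size defaults to the current position.
`file.write(str)`	Writes a string to the file; there is no return value.
`file.writelines(sequence)`	Writes a sequence of strings. No line separators are added, so each string must carry its own newline (LF or CRLF) if one is wanted.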

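Commonly used attributes:

Attribute	Description
`closed`	Whether the file is closed
`mode`	The mode the file was opened with
`encoding`	The encoding of the file
`newlines`	Tracks the newline types seen while reading in universal-newline mode; possible values are `None`, `'\n'`, `'\r'`, `'\r\n'`, or a tuple of the types encountered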
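To make seek(), tell(), and a few of the attributes concrete, here is a minimal Python 3 sketch (the file name seek_demo.txt is hypothetical). Binary mode is used so that offsets are plain byte counts.

```python
import os
import tempfile

# Hypothetical scratch file containing ten known bytes.
path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")
with open(path, "w") as f:
    f.write("0123456789")

f = open(path, "rb")     # binary mode so offsets are plain byte counts
f.seek(4)                # move to byte offset 4 from the start
chunk = f.read(3)        # reads the three bytes at offsets 4, 5, 6
pos = f.tell()           # position after the read
f.seek(0, os.SEEK_END)   # jump to the end of the file
end_pos = f.tell()       # equals the file size, not -1
mode_used = f.mode
was_open = not f.closed
f.close()
now_closed = f.closed

print(chunk, pos, end_pos, mode_used, was_open, now_closed)
```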

Some examples

File descriptor

#!/usr/bin/python
# -*- coding: UTF-8 -*-

# Open the file
fo = open("test1.txt", "wb")
print "File name: ", fo.name

fid = fo.fileno()
print "File descriptor: ", fid

# Close the file
fo.close()

Output:

File name:  test1.txt
File descriptor:  3
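As a rough illustration of what the descriptor is good for, the Python 3 sketch below hands it to os.read(), a low-level call that bypasses the file object's own buffering. The file name fd_demo.txt and its contents are made up for the example.

```python
import os
import tempfile

# Hypothetical scratch file with known contents.
path = os.path.join(tempfile.mkdtemp(), "fd_demo.txt")
with open(path, "w") as f:
    f.write("hello fd")

f = open(path, "rb")
fd = f.fileno()          # plain integer handle owned by the operating system
data = os.read(fd, 5)    # low-level read of up to 5 bytes via the descriptor
f.close()

print(fd, data)
```

Mixing os.read() with the file object's own read methods on the same descriptor is easy to get wrong, so this is best kept for cases that really need the low-level API.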

Terminal connection

#!/usr/bin/python
# -*- coding: UTF-8 -*-

# Open the file
fo = open("test1.txt", "wb")
print "File name: ", fo.name

ret = fo.isatty()
print "Return value: ", ret

# Close the file
fo.close()

Output:

File name:  test1.txt
Return value:  False

Iteration

#!/usr/bin/python
# -*- coding: UTF-8 -*-

# Open the file
fo = open("test1.txt", "r+")
print "File name: ", fo.name

for index in range(5):
    line = fo.next()
    print "Line %d - %s" % (index, line)

# Close the file
fo.close()

Output:

File name:  test1.txt
Line 0 - 1. Google

Line 1 - 2. Microsoft

Line 2 - 3. IBM

Line 3 - 4. Oracle

Line 4 - 5. FaceBook

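In Python 3 the file.next() method is gone; the built-in next() function and a plain for loop cover the same ground. A minimal sketch, assuming a small scratch file (iter_demo.txt is a made-up name):

```python
import os
import tempfile

# Hypothetical scratch file with three lines.
path = os.path.join(tempfile.mkdtemp(), "iter_demo.txt")
with open(path, "w") as f:
    f.write("1. Google\n2. Microsoft\n3. IBM\n")

with open(path) as f:
    first = next(f)   # built-in next() replaces the Python 2 f.next()
    # Iteration continues from where next() stopped.
    rest = [line.rstrip("\n") for line in f]

print(repr(first))
print(rest)
```

Stripping the trailing newline, as done for rest, also avoids the blank lines seen in the output above.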

Reading a given number of bytes:

#!/usr/bin/python
# -*- coding: UTF-8 -*-

# Open the file
fo = open("test1.txt", "r+")
print "File name: ", fo.name

line = fo.read(10)
print "String read: %s" % (line)

# Close the file
fo.close()

Output:

File name:  test1.txt
String read: 1. Google


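Reading in buffered chunks

When processing log files, you often run into files too large to read into memory in one go. For example, to process a 2 GB log file on a machine with 2 GB of physical memory, you might want to handle only 200 MB at a time.
In Python, the built-in file object provides readlines(sizehint) for exactly this purpose. Take the following code as an example: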

sizehint = 209715200   # 200 MB
with open('test.log', 'r') as f:
    lines = f.readlines(sizehint)
    while lines:           # an empty list means end of file
        # process the current batch of complete lines here
        lines = f.readlines(sizehint)
Each call to readlines(sizehint) returns roughly 200 MB of data, always as complete lines. In most cases the number of bytes returned is slightly larger than sizehint (except on the last call). Python may also round the sizehint value up to a multiple of its internal buffer size.
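The same idea can be wrapped in a generator so the calling code just loops over batches. This is a Python 3 sketch; iter_line_batches is a hypothetical helper name, and the tiny file and hint of 20 stand in for a real log and 200 MB.

```python
import os
import tempfile

def iter_line_batches(f, sizehint):
    """Yield lists of complete lines, each list roughly sizehint in size."""
    while True:
        lines = f.readlines(sizehint)
        if not lines:        # an empty list means end of file
            break
        yield lines

# Hypothetical tiny log file so the sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "test.log")
expected = ["line %d\n" % i for i in range(10)]
with open(path, "w") as f:
    f.writelines(expected)

with open(path) as f:
    batches = list(iter_line_batches(f, 20))   # 20 is a toy hint, not 200 MB

# Joining the batches back together recovers every line exactly once.
flattened = [line for batch in batches for line in batch]
print(len(batches), len(flattened))
```

In real use you would process each batch inside the loop rather than collect them all in a list, otherwise the memory saving is lost.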

Truncating a file

#!/usr/bin/python
# -*- coding: UTF-8 -*-

# Open the file
fo = open("test1.txt", "r+")
print "File name: ", fo.name

line = fo.readline()
print "First line read: %s" % (line)

# Truncate everything after the current position
fo.truncate()

# Try to read more data
line = fo.readline()
print "Data read: %s" % (line)

# Close the file
fo.close()

Output:

File name:  test1.txt
First line read: 1. Google

Data read: 

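truncate() requires the file object to be opened in a writable mode. Called without an argument, it discards everything after the current position, so use it with care.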
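To check what actually survives a truncate(), the Python 3 sketch below reads the file back afterwards. It uses binary mode so the current position is a plain byte offset; the file name truncate_demo.txt is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file with three lines, written as bytes.
path = os.path.join(tempfile.mkdtemp(), "truncate_demo.txt")
with open(path, "wb") as f:
    f.write(b"1. Google\n2. Microsoft\n3. IBM\n")

with open(path, "rb+") as f:
    f.readline()     # position is now just past the first line
    f.truncate()     # no argument: cut the file at the current position

with open(path, "rb") as f:
    remaining = f.read()

print(remaining)
```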

Writing multiple lines

Newlines have to be supplied explicitly.

#!/usr/bin/python
# -*- coding: UTF-8 -*-

# Open the file
fo = open("test.txt", "w")
print "File name: ", fo.name
seq = ["hello the cruel world! 1\n", "hello the cruel world! 2"]
fo.writelines(seq)

# Close the file
fo.close()

Output:

File name:  test.txt

Check the file contents:

$ cat test.txt
hello the cruel world! 1
hello the cruel world! 2
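When the strings do not already end in a newline, a generator expression can add one on the way in. A small Python 3 sketch, with a made-up file name:

```python
import os
import tempfile

# Hypothetical scratch file; the lines deliberately carry no newline.
path = os.path.join(tempfile.mkdtemp(), "writelines_demo.txt")
lines = ["hello the cruel world! 1", "hello the cruel world! 2"]

with open(path, "w") as f:
    # writelines() adds nothing itself, so append "\n" to each item.
    f.writelines(line + "\n" for line in lines)

with open(path) as f:
    written = f.read()

print(written)
```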

raw_input & input

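Python 2 provides input and raw_input for reading user input. Python 3 changed this: raw_input is gone and input behaves differently.

input([prompt]) & raw_input([prompt])

input resembles repr in that both aim to be exact (the text must be a valid Python expression); raw_input resembles str in that both aim to be readable.

Let's first look at raw_input in Python 2. It is used like this:

raw_input(prompt)

Here prompt is the text shown to the user. raw_input reads a line from the console and returns it as a string.

Some examples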

>>> name = raw_input('please enter your name: ')
please enter your name: ethan     # enter a string
>>> name
'ethan'
>>> type(name)
<type 'str'>
>>>
>>> num = raw_input('please enter your id: ')
please enter your id: 12345       # enter a number
>>> num
'12345'
>>> type(num)
<type 'str'>
>>>
>>> sum = raw_input('please enter a+b: ')
please enter a+b: 3+6             # enter an expression
>>> sum
'3+6'
>>> type(sum)
<type 'str'>

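As you can see, whether we enter a string, a number, or an expression, raw_input always returns a string.

input is used much like raw_input:

input(prompt)

In effect, input is equivalent to wrapping raw_input in eval:

def input(prompt):
    return (eval(raw_input(prompt)))

That is, calling input amounts to calling raw_input and passing the result to eval.

eval evaluates a string as a Python expression and returns the resulting value. Basic usage: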

>>> eval('1+2')
3
>>> a = 1
>>> eval('a+9')
10

This means the text entered must be a valid expression, otherwise an error is raised. Now let's see how input behaves:

>>> name = input('please input your name: ')
please input your name: ethan         # a string without quotes raises an error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'ethan' is not defined
>>>
>>> name = input('please input your name: ')
please input your name: 'ethan'       # with quotes
>>> name
'ethan'
>>>
>>> num = input('please input your id: ')
please input your id: 12345           # enter a number
>>> num                               # note: the result is a number, not a string
12345
>>> type(num)
<type 'int'>
>>>
>>> sum = input('please enter a+b: ')  # a numeric expression is evaluated
please enter a+b: 3+6
>>> sum
9
>>> type(sum)
<type 'int'>
>>>
>>> sum = input('please enter a+b: ')   # a string expression is evaluated as string concatenation
please enter a+b: '3'+'6'
>>> sum
'36'

As you can see, with input a string must be wrapped in quotes; a number comes back as a numeric type; and an expression is evaluated.

Now let's look at input in Python 3.

In fact, input in Python 3 is raw_input from Python 2: the old raw_input was simply renamed to input. And if you want the old Python 2 input behavior? You can do this:

eval(input())

In other words, add the eval call yourself.
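Because eval runs whatever expression the user types, a more cautious variant is ast.literal_eval from the standard library, which accepts only Python literals. The sketch below is Python 3; parse_literal is a hypothetical helper name, and fixed strings stand in for what input() would return so the example runs without a console.

```python
import ast

def parse_literal(text):
    """Hypothetical helper: evaluate text as a Python literal only."""
    return ast.literal_eval(text)

# Fixed strings stand in for what input() would return.
number = parse_literal("12345")      # numeric literal becomes an int
name = parse_literal("'ethan'")      # quoted text becomes a str

try:
    parse_literal("ethan")           # bare name: not a literal
    rejected = False
except ValueError:
    rejected = True

print(number, name, rejected)
```

Unlike eval, this refuses names and function calls, so typed text cannot trigger arbitrary code.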

print

Some examples

Formatted integer output

>>> strHello = "the length of (%s) is %d" %('Hello World',len('Hello World'))
>>> print strHello
the length of (Hello World) is 11

Formatted hexadecimal integer output

#%x --- hex, hexadecimal
#%d --- dec, decimal
#%o --- oct, octal

>>> nHex = 0x20
>>> print "nHex = %x,nDec = %d,nOct = %o" %(nHex,nHex,nHex)
nHex = 20,nDec = 32,nOct = 40

Formatted floating-point output (float)

>>> import math
>>> #default
>>> print "PI = %f"%math.pi
PI = 3.141593
>>> #width = 10, precision = 3, align = right
>>> print "PI = %10.3f"%math.pi
PI =      3.142
>>> #width = 10, precision = 3, align = left
>>> print "PI = %-10.3f" % math.pi
PI = 3.142     
>>> #pad with leading zeros
>>> print "PI = %06d" % int(math.pi)
PI = 000003
>>> #show the sign
>>> print '%+f'% math.pi
+3.141593
>>> print(format(math.pi,'6.2f'))
  3.14
>>> print(format(math.pi,'6f'))
3.141593
>>> print(format(math.pi,'15f'))
       3.141593
>>> print(format(math.pi,'6.0f'))
     3
>>> print(format(math.pi,'6%'))
314.159265%

Formatted string output (string)

>>> #precision = 3
>>> print "%.3s " % ("jcodeer")
jco 
>>> #precision = 4
>>> print "%.*s" % (4,"jcodeer")
jcod
>>> #width = 10, precision = 3
>>> print "%10.3s" % ("jcodeer")
       jco

Printing a list (list)

>>> l = [1,2,3,4,'jcodeer']
>>> print l
[1, 2, 3, 4, 'jcodeer']

Printing a dictionary (dictionary)

>>> d = {1:'A',2:'B',3:'C',4:'D'}
>>> print d
{1: 'A', 2: 'B', 3: 'C', 4: 'D'}

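Automatic newline in print

print automatically appends a newline at the end of the line. To suppress it, end the print statement with a comma ","; a space is then written between items instead of a newline.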

>>> for i in range(0,5):
        print i

        
0
1
2
3
4
>>> for i in range(0,5):
        print i,

        
0 1 2 3 4

Or write the output directly with the following function:

>>> import sys
>>> sys.stdout.write('Hello World')
Hello World
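In Python 3, print is a function and the trailing-comma trick no longer applies; the end keyword argument plays that role. The sketch below writes into an io.StringIO buffer (the name buf is illustrative) so the result can be inspected; dropping file=buf would print to the console instead.

```python
import io

buf = io.StringIO()   # in-memory text buffer standing in for the console
for i in range(5):
    # end=" " replaces the default newline with a space.
    print(i, end=" ", file=buf)

output = buf.getvalue()
print(repr(output))
```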

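The catch-all %r
%r is a catch-all conversion: it formats its argument with repr(), so the value is printed as it would be written in source code, which reveals its type (strings keep their quotes, for example).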

>>> formatter = "%r %r %r %r"
>>> print formatter % (1, 2, 3, 4)
1 2 3 4
>>> print formatter % ("one", "two", "three", "four")
'one' 'two' 'three' 'four'
>>> print formatter % (True, False, False, True)
True False False True
>>> print formatter % (formatter, formatter, formatter, formatter)
'%r %r %r %r' '%r %r %r %r' '%r %r %r %r' '%r %r %r %r'
>>> print formatter % (
"I had this thing.",
"That you could type up right.",
 "But it didn't sing.",
 "So I said goodnight."
 )
'I had this thing.' 'That you could type up right.' "But it didn't sing." 'So I said goodnight.'
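The difference between %r and %s comes down to repr() versus str(). A short sketch, valid in Python 2.7 and 3 apart from the print call style, with variable names chosen only for illustration:

```python
value = "it's"

as_repr = "%r" % value    # same text that repr(value) returns
as_str = "%s" % value     # same text that str(value) returns

print(as_repr)
print(as_str)
```

This also explains the mixed quoting in the last output line above: repr() switches to double quotes when the string itself contains a single quote.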

Printing strings that contain escape characters

>>> print "I'm OK"
I'm OK
>>> print 'Learn "Python" in imooc'
Learn "Python" in imooc
>>> print 'Bob said \"I\'m OK\".'
Bob said "I'm OK".
>>> s = 'Python was started in 1989 by \"Guido\".\nPython is free and easy to learn.'
>>> s
'Python was started in 1989 by "Guido".\nPython is free and easy to learn.'
>>> print s
Python was started in 1989 by "Guido".
Python is free and easy to learn.
>>> print r'''"To be, or not to be": that is the question.
Whether it's nobler in the mind to suffer.'''
"To be, or not to be": that is the question.
Whether it's nobler in the mind to suffer.

Printing strings that contain Chinese characters

>>> print u'''静夜思
... 床前明月光，
... 疑是地上霜。
... 举头望明月，
... 低头思故乡。'''
静夜思
床前明月光，
疑是地上霜。
举头望明月，
低头思故乡。

Printing values such as 10000L: repr versus str

>>> print repr('hello,world!')
'hello,world!'
>>> print repr(10000L)
10000L
>>> print str('hello,world!')
hello,world!
>>> print str(10000L)
10000

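The print function in Python 3.x

In 3.x the print statement is gone, replaced by the print() function. Python 2.6 and 2.7 partly support this form of the syntax: there, print("fish") is still the print statement applied to a parenthesized expression. In Python 2.6 and Python 2.7 the following four forms are equivalent: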

>>> print 'fish'
fish
>>> print "fish"
fish
>>> print ("fish")
fish
>>> print("fish")
fish

However, Guido has a time machine:

>>> from __future__ import print_function
>>> print("numpy","pandas",'scipy','matplotlib',sep='_')
numpy_pandas_scipy_matplotlib

format

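Some examples

format(value [, format_spec])
This function formats value according to format_spec.
How format_spec is interpreted depends on the type of value; different types interpret it differently.
When format_spec is empty, the result is usually the same as str(value).

Under the hood, a call to format(value, format_spec) is translated to
type(value).__format__(value, format_spec),
so the __format__() method is looked up on the type of value.
If no suitable method is found, a TypeError is raised.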
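The dispatch to __format__() is easiest to see with a custom class. The Python 3 sketch below is illustrative only: Temperature and its trailing "F" convention are invented for this example and are not part of any library.

```python
class Temperature:
    """Hypothetical class used only to illustrate __format__."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, format_spec):
        # Invented convention: a trailing "F" asks for Fahrenheit.
        if format_spec.endswith("F"):
            fahrenheit = self.celsius * 9 / 5 + 32
            return format(fahrenheit, format_spec[:-1] or ".1f") + "F"
        # Otherwise delegate the spec to the float formatter.
        return format(self.celsius, format_spec or ".1f") + "C"

t = Temperature(25)
default_text = format(t)               # empty spec falls back to ".1f"
fahrenheit_text = format(t, ".2fF")    # custom spec handled by the class

print(default_text)
print(fahrenheit_text)
```

The same method is what str.format() and f-strings call, so "{:.2fF}".format(t) goes through the identical code path.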

其中format_spec的编写方式如下形式：

format_spec ::= [[fill]align][sign][#][0][width][,][.precision][type]

fill ::= <any character>

align ::= "<" | ">" | "=" | "^"

sign ::= "+" | "-" | " "

width ::= integer

precision ::= integer

type ::= "b"|"c"|"d"|"e"|"E"|"f"|"F"|"g"|"G"|"n"|"o"|"s"|"x"|"X"|"%"

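fill can be any character.

align is the alignment: < left-aligns, > right-aligns, ^ centers, and = places the padding after the sign but before the digits.

sign controls the sign: + shows a sign for both positive and negative numbers, - shows it only for negative numbers (the default), and a space puts a leading space before positive numbers.

width is the minimum total field width.

precision is the number of digits after the decimal point for f and F, or the number of significant digits for g and G.

type selects how the value is presented, for example b for binary, E for exponent notation, and X for uppercase hexadecimal.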
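A few combinations of these fields, as a quick sketch. The values are arbitrary and the variable names are only for illustration; the integer and string cases behave the same in Python 2.7 and 3.

```python
padded = format(42, "*>8d")         # fill "*", right-align, width 8
centered = format("test", "^10")    # center in a field of width 10
signed = format(3.14159, "+.2f")    # force a sign, two decimal places
sign_aware = format(-42, "0=6d")    # zero padding between sign and digits
grouped = format(1234567, ",d")     # thousands separator
binary = format(10, "#b")           # binary with the 0b prefix

for text in (padded, centered, signed, sign_aware, grouped, binary):
    print(repr(text))
```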

>>> print(format(2918))
2918
>>> print(format(0x500,'X'))
500
>>> print(format(0x500,'x'))
500
>>> import math
>>> print(format(math.pi,'0=10'))
3.14159265359
>>> print(format(math.pi,'0=20'))
00000003.14159265359
>>> print(format(math.pi,'E'))
3.141593E+00
>>> print(format(math.pi,'e'))
3.141593e+00
>>> print(format(math.pi,'05.3'))
03.14
>>> print(format(math.pi,'5.3'))
 3.14
>>> print(format('test','<20'))
test
>>> print(format('test','>20'))
                test
>>> print(format('test','^20'))
        test
>>> print(format(math.pi,'0=+20'))
+0000003.14159265359
>>> print(format(math.pi,'0^+20'))
000+3.14159265359000
