A Deep Dive into bufio.SplitFunc in Go
Preface
bufio is one of the packages in the Go standard library. It implements buffered reading and writing of data. Several I/O-related standard library packages use it: net/http uses bufio to read and write network data, and archive/zip uses bufio to read and write file data.
SplitFunc, defined in the bufio package, is both important and fairly hard to understand. This article uses simple examples to explain how SplitFunc works and how to implement one of your own.
An example
The bufio package defines some commonly used tools, such as Scanner. Suppose you need to read what a user types on standard input. For example, here is an echo program that reads each line the user enters and prints it back:
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	// Wrap standard input in a Scanner that yields one line per Scan call.
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Split(bufio.ScanLines)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
}
The program is simple. os.Stdin implements the io.Reader interface, so we create a scanner from that reader, set the split function to bufio.ScanLines, and loop, printing the text each time a line is read. Small as it is, the program has all the essential parts, and it brings us to today's subject: bufio.SplitFunc. Its definition looks like this:
package bufio

type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
The official Go documentation describes it as follows:
SplitFunc is the signature of the split function used to tokenize the input. The arguments are an initial substring of the remaining unprocessed data and a flag, atEOF, that reports whether the Reader has no more data to give. The return values are the number of bytes to advance the input and the next token to return to the user, if any, plus an error, if any.
Scanning stops if the function returns an error, in which case some of the input may be discarded.
Otherwise, the Scanner advances the input. If the token is not nil, the Scanner returns it to the user. If the token is nil, the Scanner reads more data and continues scanning; if there is no more data--if atEOF was true--the Scanner returns. If the data does not yet hold a complete token, for instance if it has no newline while scanning lines, a SplitFunc can return (0, nil, nil) to signal the Scanner to read more data into the slice and try again with a longer slice starting at the same point in the input.
The function is never called with an empty data slice unless atEOF is true. If atEOF is true, however, data may be non-empty and, as always, holds unprocessed text.
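English! So many parameters! So many return values! What a pain! I don't know whether you get the same feeling when you run into documentation like this... That is exactly why I decided to write an article that explains how SplitFunc actually works, in plain terms and with concrete examples. I hope it helps.
Enough preamble. Let's get to the point.
How Scanner and SplitFunc work together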
package bufio

type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
Scanner is buffered: internally it maintains a slice that holds the data already read from the Reader. Scanner calls the SplitFunc we set, passing the unprocessed buffer contents (data) and a flag saying whether the input is exhausted (atEOF). The SplitFunc's job is to use those two arguments to return how many bytes the next Scan should advance (advance), the token it split off (token), and an error (err).
This is a two-way exchange. Scanner tells our SplitFunc what data it has scanned so far and whether it has reached the end; our SplitFunc uses that information to hand back the split result and how far the next scan should advance. An example illustrates this:
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	input := "abcdefghijkl"
	scanner := bufio.NewScanner(strings.NewReader(input))
	// This split function only logs its arguments and always asks for more data.
	split := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		fmt.Printf("%t\t%d\t%s\n", atEOF, len(data), data)
		return 0, nil, nil
	}
	scanner.Split(split)
	// Start with a 2-byte buffer that may grow up to bufio.MaxScanTokenSize.
	buf := make([]byte, 2)
	scanner.Buffer(buf, bufio.MaxScanTokenSize)
	for scanner.Scan() {
		fmt.Printf("%s\n", scanner.Text())
	}
}
Output
false 2 ab
false 4 abcd
false 8 abcdefgh
false 12 abcdefghijkl
true 12 abcdefghijkl
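Here we set the initial buffer size to 2. When the buffer is too small it doubles in size, up to the maximum passed to scanner.Buffer, which here is bufio.MaxScanTokenSize. So the first read brings in 2 bytes and the buffer is already full. The reader has not reached EOF yet, the split function runs, and it prints: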
false 2 ab
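The function then returns 0, nil, nil. This return value tells the Scanner that there is not enough data yet: the next read position advances by 0 bytes and more data must be read from the reader. Because the buffer is full at this point, its size doubles to 2 * 2 = 4. The reader still has not reached EOF, so the output is: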
false 4 abcd
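Because the split function never returns a token, scanner.Scan() never returns true and the loop body never runs. The steps above repeat until the whole input has been read, and the final call is made with atEOF set to true: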
true 12 abcdefghijkl
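Having walked through the process above, do you have a better feel for how SplitFunc works? Go back to the official Go documentation and see whether it makes a little more sense now. Below is the implementation of bufio.ScanLines; you can study for yourself how the function works.
ScanLines in the standard library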
func ScanLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// We have scanned to the end and nothing is left.
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	// Find the position of '\n'.
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		// Move the next read position forward by i + 1 bytes.
		return i + 1, dropCR(data[0:i]), nil
	}
	// The reader is fully consumed but data is not empty,
	// so return the remaining data as the last token.
	if atEOF {
		return len(data), dropCR(data), nil
	}
	// Cannot split yet; request more data from the Reader.
	return 0, nil, nil
}
Reference
In-depth introduction to bufio.Scanner in Golang