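A Deep Dive into bufio.SplitFunc in Golang
Preface
bufio is one of the packages in Golang's standard library. It implements buffered reading and writing on top of an underlying reader or writer. Several I/O-related standard library packages use it: for example, the http package uses bufio to read and write network data, and the zip package for compressed files uses bufio to read and write file data.
The SplitFunc type defined in Golang's bufio package is fairly important and also fairly hard to understand. This article uses simple examples to explain how SplitFunc works and how to implement one of your own.
An example
The bufio package defines some commonly used tools, such as Scanner. Suppose you need to read what a user types on standard input. For instance, let's build an echo program that reads each line the user enters and prints it back: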
    package main

    import (
        "bufio"
        "fmt"
        "os"
    )

    func main() {
        scanner := bufio.NewScanner(os.Stdin)
        scanner.Split(bufio.ScanLines)
        for scanner.Scan() {
            fmt.Println(scanner.Text())
        }
    }
The program is simple. os.Stdin implements the io.Reader interface; we create a scanner from this reader, set its split function to bufio.ScanLines, and then loop, printing the text each time a line is read. Small as it is, the program has everything it needs, and it brings us to today's subject: bufio.SplitFunc, which is defined like this:
    package bufio

    type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
The official Golang documentation describes it as follows:
SplitFunc is the signature of the split function used to tokenize the input. The arguments are an initial substring of the remaining unprocessed data and a flag, atEOF, that reports whether the Reader has no more data to give. The return values are the number of bytes to advance the input and the next token to return to the user, if any, plus an error, if any.
Scanning stops if the function returns an error, in which case some of the input may be discarded.
Otherwise, the Scanner advances the input. If the token is not nil, the Scanner returns it to the user. If the token is nil, the Scanner reads more data and continues scanning; if there is no more data--if atEOF was true--the Scanner returns. If the data does not yet hold a complete token, for instance if it has no newline while scanning lines, a SplitFunc can return (0, nil, nil) to signal the Scanner to read more data into the slice and try again with a longer slice starting at the same point in the input.
The function is never called with an empty data slice unless atEOF is true. If atEOF is true, however, data may be non-empty and, as always, holds unprocessed text.
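English! So many parameters! So many return values! What a pain! I wonder whether other readers feel the same way when they run into documentation like this... That is exactly why I decided to write an article on how SplitFunc actually works, explained in plain terms with concrete examples. I hope it helps.
All right, enough chatter. Let's get to the point!
How Scanner and SplitFunc work together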
    package bufio

    type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
Scanner is buffered: internally it maintains a slice that holds the data already read from the Reader but not yet consumed. Scanner calls the SplitFunc we set, passing it the buffer contents (data) and a flag saying whether the input is exhausted (atEOF). The SplitFunc's job is to use these two arguments to return how many bytes the next Scan should advance (advance), the token that was split off (token), and an error (err).
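To make the three return values concrete, here is a minimal sketch of a SplitFunc that cuts the input into fixed-width tokens. The names splitChunks, chunkSize, and collect are made up for this illustration and are not part of bufio.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// chunkSize is a hypothetical token width chosen only for illustration.
const chunkSize = 3

// splitChunks is an illustrative SplitFunc that emits fixed-width tokens.
func splitChunks(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// Enough bytes are buffered: consume chunkSize bytes and return them as a token.
	if len(data) >= chunkSize {
		return chunkSize, data[:chunkSize], nil
	}
	// Input is exhausted but some bytes remain: return them as a final short token.
	if atEOF && len(data) > 0 {
		return len(data), data, nil
	}
	// Not enough data yet (or nothing left): advance by 0 and return no token.
	return 0, nil, nil
}

// collect runs a Scanner with splitChunks over input and gathers every token.
func collect(input string) []string {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(splitChunks)
	var tokens []string
	for scanner.Scan() {
		tokens = append(tokens, scanner.Text())
	}
	return tokens
}

func main() {
	fmt.Println(collect("abcdefgh")) // prints [abc def gh]
}
```

Each returned advance moves the start of data forward for the next call, which is why the second call sees "defgh" rather than the whole input again.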
This is a two-way exchange: Scanner tells our SplitFunc what data has been scanned so far and whether the end has been reached, and our SplitFunc uses that information to hand back the split result and how far the next scan should advance. Here is an example:
    package main

    import (
        "bufio"
        "fmt"
        "strings"
    )

    func main() {
        input := "abcdefghijkl"
        scanner := bufio.NewScanner(strings.NewReader(input))
        split := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
            fmt.Printf("%t\t%d\t%s\n", atEOF, len(data), data)
            return 0, nil, nil
        }
        scanner.Split(split)
        buf := make([]byte, 2)
        scanner.Buffer(buf, bufio.MaxScanTokenSize)
        for scanner.Scan() {
            fmt.Printf("%s\n", scanner.Text())
        }
    }
Output
false 2 ab
false 4 abcd
false 8 abcdefgh
false 12 abcdefghijkl
true 12 abcdefghijkl
Here we set the buffer's initial size to 2. When the buffer is too small it doubles in size, up to a maximum of bufio.MaxScanTokenSize. So after scanning just 2 bytes the buffer is full while the reader has not yet reached EOF, and the split function runs and prints:
false 2 ab
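The function then returns 0, nil, nil. This return value tells Scanner that there is not enough data yet: the read position advances by 0, and more must be read from the reader. Because the buffer is full, its capacity grows to 2 * 2 = 4; the reader still has not reached EOF, so the output is: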
false 4 abcd
These steps repeat until all the content has been read, at which point atEOF becomes true:
true 12 abcdefghijkl
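Having walked through this process, do you have a better feel for how SplitFunc works? Go back to the official Golang documentation and see whether it makes a bit more sense now. Below is the implementation of bufio.ScanLines; readers can study for themselves how the function works.
ScanLines in the standard library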
    func ScanLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
        // We have reached the end of the input and nothing is left to scan.
        if atEOF && len(data) == 0 {
            return 0, nil, nil
        }
        // Find the position of '\n'.
        if i := bytes.IndexByte(data, '\n'); i >= 0 {
            // Move the start of the next read forward by i + 1 bytes.
            return i + 1, dropCR(data[0:i]), nil
        }
        // The reader is exhausted but data is not empty, so return what remains.
        if atEOF {
            return len(data), dropCR(data), nil
        }
        // No token can be split off yet; ask the Reader for more data.
        return 0, nil, nil
    }
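The same shape carries over to a SplitFunc of your own. As a minimal sketch, the function below splits on commas instead of newlines; scanCommas and splitCSV are hypothetical names for this illustration and are not part of bufio.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanCommas is an illustrative SplitFunc modeled on ScanLines: it returns
// the bytes before each ',' as a token.
func scanCommas(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// End of input and nothing left to scan.
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	// A comma was found: skip past it and return the bytes before it.
	if i := bytes.IndexByte(data, ','); i >= 0 {
		return i + 1, data[:i], nil
	}
	// Input is exhausted: whatever remains is the final token.
	if atEOF {
		return len(data), data, nil
	}
	// No complete token yet; request more data.
	return 0, nil, nil
}

// splitCSV gathers every token that scanCommas produces from input.
func splitCSV(input string) []string {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(scanCommas)
	var fields []string
	for scanner.Scan() {
		fields = append(fields, scanner.Text())
	}
	return fields
}

func main() {
	fmt.Printf("%q\n", splitCSV("a,b,,c")) // prints ["a" "b" "" "c"]
}
```

Two consecutive commas produce an empty token because data[:i] is an empty but non-nil slice, which Scanner still hands to the caller. A trailing comma, by contrast, yields no extra empty token in this sketch, since the first branch returns a nil token once nothing is left.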