import "regexp/syntax"
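Overview
Index
Package syntax parses regular expressions into parse trees and compiles parse trees into programs. Most clients of regular expressions will use the facilities of package regexp (such as Compile and Match) instead of this package.
The regular expression syntax understood by this package when parsing with the Perl flag is as follows. Parts of the syntax can be disabled by passing alternate flags to Parse.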
Single characters:
    .              any character, possibly including newline (flag s=true)
    [xyz]          character class
    [^xyz]         negated character class
    \d             Perl character class
    \D             negated Perl character class
    [[:alpha:]]    ASCII character class
    [[:^alpha:]]   negated ASCII character class
    \pN            Unicode character class (one-letter name)
    \p{Greek}      Unicode character class
    \PN            negated Unicode character class (one-letter name)
    \P{Greek}      negated Unicode character class
Composites:
    xy             x followed by y
    x|y            x or y (prefer x)
Repetitions:
    x*             zero or more x, prefer more
    x+             one or more x, prefer more
    x?             zero or one x, prefer one
    x{n,m}         n or n+1 or ... or m x, prefer more
    x{n,}          n or more x, prefer more
    x{n}           exactly n x
    x*?            zero or more x, prefer fewer
    x+?            one or more x, prefer fewer
    x??            zero or one x, prefer zero
    x{n,m}?        n or n+1 or ... or m x, prefer fewer
    x{n,}?         n or more x, prefer fewer
    x{n}?          exactly n x
Implementation restriction: the counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.
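The repetition limit can be observed directly through Parse. The following is a minimal sketch; the helper name repeatErrorCode is an illustrative assumption, not part of the package.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// repeatErrorCode (hypothetical helper) parses pattern with Perl flags and
// returns the ErrorCode of the parse failure, or "" if parsing succeeded.
func repeatErrorCode(pattern string) syntax.ErrorCode {
	_, err := syntax.Parse(pattern, syntax.Perl)
	if err == nil {
		return ""
	}
	if e, ok := err.(*syntax.Error); ok {
		return e.Code
	}
	return syntax.ErrInternalError
}

func main() {
	// Counts up to 1000 parse; larger counts are rejected.
	// The unbounded form a* is not affected by the limit.
	for _, p := range []string{`a{1000}`, `a{1001}`, `a{2,1001}`, `a*`} {
		fmt.Printf("%-10s -> %q\n", p, repeatErrorCode(p))
	}
}
```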
Grouping:
    (re)           numbered capturing group (submatch)
    (?P<name>re)   named & numbered capturing group (submatch)
    (?:re)         non-capturing group
    (?flags)       set flags within current group; non-capturing
    (?flags:re)    set flags during re; non-capturing
Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z). The flags are:
    i              case-insensitive (default false)
    m              multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false)
    s              let . match \n (default false)
    U              ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false)
Empty strings:
    ^              at beginning of text or line (flag m=true)
    $              at end of text (like \z not Perl's \Z) or line (flag m=true)
    \A             at beginning of text
    \b             at ASCII word boundary (\w on one side and \W, \A, or \z on the other)
    \B             not at ASCII word boundary
    \z             at end of text
Escape sequences:
    \a             bell (== \007)
    \f             form feed (== \014)
    \t             horizontal tab (== \011)
    \n             newline (== \012)
    \r             carriage return (== \015)
    \v             vertical tab character (== \013)
    \*             literal *, for any punctuation character *
    \123           octal character code (up to three digits)
    \x7F           hex character code (exactly two digits)
    \x{10FFFF}     hex character code
    \Q...\E        literal text ... even if ... has punctuation
Character class elements:
    x              single character
    A-Z            character range (inclusive)
    \d             Perl character class
    [:foo:]        ASCII character class foo
    \p{Foo}        Unicode character class Foo
    \pF            Unicode character class F (one-letter name)
Named character classes as character class elements:
    [\d]           digits (== \d)
    [^\d]          not digits (== \D)
    [\D]           not digits (== \D)
    [^\D]          not not digits (== \d)
    [[:name:]]     named ASCII class inside character class (== [:name:])
    [^[:name:]]    named ASCII class inside negated character class (== [:^name:])
    [\p{Name}]     named Unicode property inside character class (== \p{Name})
    [^\p{Name}]    named Unicode property inside negated character class (== \P{Name})
Perl character classes (all ASCII-only):
    \d             digits (== [0-9])
    \D             not digits (== [^0-9])
    \s             whitespace (== [\t\n\f\r ])
    \S             not whitespace (== [^\t\n\f\r ])
    \w             word characters (== [0-9A-Za-z_])
    \W             not word characters (== [^0-9A-Za-z_])
ASCII character classes:
    [[:alnum:]]    alphanumeric (== [0-9A-Za-z])
    [[:alpha:]]    alphabetic (== [A-Za-z])
    [[:ascii:]]    ASCII (== [\x00-\x7F])
    [[:blank:]]    blank (== [\t ])
    [[:cntrl:]]    control (== [\x00-\x1F\x7F])
    [[:digit:]]    digits (== [0-9])
    [[:graph:]]    graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
    [[:lower:]]    lower case (== [a-z])
    [[:print:]]    printable (== [ -~] == [ [:graph:]])
    [[:punct:]]    punctuation (== [!-/:-@[-`{-~])
    [[:space:]]    whitespace (== [\t\n\v\f\r ])
    [[:upper:]]    upper case (== [A-Z])
    [[:word:]]     word characters (== [0-9A-Za-z_])
    [[:xdigit:]]   hex digit (== [0-9A-Fa-f])
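A few of the syntax rules above can be tried through package regexp, which is what most clients would use. This is a small sketch; the helpers find and matches are illustrative names only.

```go
package main

import (
	"fmt"
	"regexp"
)

// find (hypothetical helper) returns the leftmost match of pattern in s,
// or "" if there is none.
func find(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

// matches (hypothetical helper) reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	fmt.Println(find(`a+`, "aaa"))               // greedy repetition
	fmt.Println(find(`a+?`, "aaa"))              // non-greedy repetition
	fmt.Println(matches(`(?i)go`, "GO"))         // flag i: case-insensitive
	fmt.Println(matches(`a.b`, "a\nb"))          // . does not match \n by default
	fmt.Println(matches(`(?s)a.b`, "a\nb"))      // flag s: . matches \n
	fmt.Println(find(`\p{Greek}+`, "abc αβγ"))   // Unicode character class
	fmt.Println(find(`[[:digit:]]+`, "id 42"))   // ASCII character class
	fmt.Println(matches(`\Qa.b\E`, "axb"))       // \Q...\E quotes the dot
}
```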
func IsWordChar(r rune) bool
type EmptyOp
func EmptyOpContext(r1, r2 rune) EmptyOp
type Error
func (e *Error) Error() string
type ErrorCode
func (e ErrorCode) String() string
type Flags
type Inst
func (i *Inst) MatchEmptyWidth(before rune, after rune) bool
func (i *Inst) MatchRune(r rune) bool
func (i *Inst) MatchRunePos(r rune) int
func (i *Inst) String() string
type InstOp
func (i InstOp) String() string
type Op
type Prog
func Compile(re *Regexp) (*Prog, error)
func (p *Prog) Prefix() (prefix string, complete bool)
func (p *Prog) StartCond() EmptyOp
func (p *Prog) String() string
type Regexp
func Parse(s string, flags Flags) (*Regexp, error)
func (re *Regexp) CapNames() []string
func (x *Regexp) Equal(y *Regexp) bool
func (re *Regexp) MaxCap() int
func (re *Regexp) Simplify() *Regexp
func (re *Regexp) String() string
compile.go doc.go parse.go perl_groups.go prog.go regexp.go simplify.go
func IsWordChar(r rune) bool
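IsWordChar reports whether r is considered a "word character" during the evaluation of the \b and \B zero-width assertions. These assertions are ASCII-only: the word characters are [A-Za-z0-9_].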
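As a quick illustration of the ASCII-only rule, the sketch below filters a string down to the runes that IsWordChar accepts. The helper name wordChars is an assumption made for this example.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// wordChars (hypothetical helper) returns the runes of s that
// syntax.IsWordChar accepts, in order.
func wordChars(s string) string {
	var out []rune
	for _, r := range s {
		if syntax.IsWordChar(r) {
			out = append(out, r)
		}
	}
	return string(out)
}

func main() {
	// Non-ASCII letters such as é and ö are not word characters here.
	fmt.Println(wordChars("héllo_wörld-42!"))
}
```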
An EmptyOp specifies a kind or mixture of zero-width assertions.
type EmptyOp uint8
const (
	EmptyBeginLine EmptyOp = 1 << iota
	EmptyEndLine
	EmptyBeginText
	EmptyEndText
	EmptyWordBoundary
	EmptyNoWordBoundary
)
func EmptyOpContext(r1, r2 rune) EmptyOp
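EmptyOpContext returns the zero-width assertions satisfied at the position between the runes r1 and r2. Passing r1 == -1 indicates that the position is at the beginning of the text. Passing r2 == -1 indicates that the position is at the end of the text.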
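The returned EmptyOp is a bit set, so individual assertions are tested with a bitwise AND. A minimal sketch follows; the three helper names are illustrative and not part of the package.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// atTextStart (hypothetical helper) reports whether the position between
// r1 and r2 satisfies the beginning-of-text assertion.
func atTextStart(r1, r2 rune) bool {
	return syntax.EmptyOpContext(r1, r2)&syntax.EmptyBeginText != 0
}

// atLineEnd (hypothetical helper) reports whether the position satisfies
// the end-of-line assertion.
func atLineEnd(r1, r2 rune) bool {
	return syntax.EmptyOpContext(r1, r2)&syntax.EmptyEndLine != 0
}

// atWordBoundary (hypothetical helper) reports whether the position is an
// ASCII word boundary.
func atWordBoundary(r1, r2 rune) bool {
	return syntax.EmptyOpContext(r1, r2)&syntax.EmptyWordBoundary != 0
}

func main() {
	fmt.Println(atTextStart(-1, 'a'))     // -1 on the left: start of text
	fmt.Println(atWordBoundary('a', ' ')) // word char next to a space
	fmt.Println(atWordBoundary('a', 'b')) // inside a word
	fmt.Println(atLineEnd('a', '\n'))     // just before a newline
}
```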
An Error describes a failure to parse a regular expression and gives the offending expression.
type Error struct {
	Code ErrorCode
	Expr string
}
func (e *Error) Error() string
An ErrorCode describes a failure to parse a regular expression.
type ErrorCode string
const (
	// Unexpected error
	ErrInternalError ErrorCode = "regexp/syntax: internal error"

	// Parse errors
	ErrInvalidCharClass      ErrorCode = "invalid character class"
	ErrInvalidCharRange      ErrorCode = "invalid character class range"
	ErrInvalidEscape         ErrorCode = "invalid escape sequence"
	ErrInvalidNamedCapture   ErrorCode = "invalid named capture"
	ErrInvalidPerlOp         ErrorCode = "invalid or unsupported Perl syntax"
	ErrInvalidRepeatOp       ErrorCode = "invalid nested repetition operator"
	ErrInvalidRepeatSize     ErrorCode = "invalid repeat count"
	ErrInvalidUTF8           ErrorCode = "invalid UTF-8"
	ErrMissingBracket        ErrorCode = "missing closing ]"
	ErrMissingParen          ErrorCode = "missing closing )"
	ErrMissingRepeatArgument ErrorCode = "missing argument to repetition operator"
	ErrTrailingBackslash     ErrorCode = "trailing backslash at end of expression"
	ErrUnexpectedParen       ErrorCode = "unexpected )"
)
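A caller that wants to react to a specific parse failure can unwrap the *Error and inspect its Code and Expr fields. The sketch below assumes a helper named describe, which is not part of the package.

```go
package main

import (
	"errors"
	"fmt"
	"regexp/syntax"
)

// describe (hypothetical helper) parses pattern with Perl flags and returns
// the error code text and the offending expression, or two empty strings
// if the pattern is valid.
func describe(pattern string) (code, expr string) {
	_, err := syntax.Parse(pattern, syntax.Perl)
	if err == nil {
		return "", ""
	}
	var perr *syntax.Error
	if errors.As(err, &perr) {
		return perr.Code.String(), perr.Expr
	}
	return err.Error(), ""
}

func main() {
	for _, p := range []string{`a(b`, `[a`, `x**`, `ok`} {
		code, expr := describe(p)
		fmt.Printf("%-5s code=%q expr=%q\n", p, code, expr)
	}
}
```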
func (e ErrorCode) String() string
Flags control the behavior of the parser and record information about regexp context.
type Flags uint16
const (
	FoldCase      Flags = 1 << iota // case-insensitive match
	Literal                         // treat pattern as literal string
	ClassNL                         // allow character classes like [^a-z] and [[:space:]] to match newline
	DotNL                           // allow . to match newline
	OneLine                         // treat ^ and $ as only matching at beginning and end of text
	NonGreedy                       // make repetition operators default to non-greedy
	PerlX                           // allow Perl extensions
	UnicodeGroups                   // allow \p{Han}, \P{Han} for Unicode group and negation
	WasDollar                       // regexp OpEndText was $, not \z
	Simple                          // regexp contains no counted repetition

	MatchNL = ClassNL | DotNL

	Perl        = ClassNL | OneLine | PerlX | UnicodeGroups // as close to Perl as possible
	POSIX Flags = 0                                         // POSIX syntax
)
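The effect of individual flags shows up in the parse tree. The sketch below checks two of them, DotNL and Literal; the helper names dotMatchesNewline, isSingleLiteral, and demo are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// dotMatchesNewline (hypothetical helper) reports whether "." parses to
// OpAnyChar, which matches newline, under the given flags.
func dotMatchesNewline(flags syntax.Flags) bool {
	re, err := syntax.Parse(".", flags)
	return err == nil && re.Op == syntax.OpAnyChar
}

// isSingleLiteral (hypothetical helper) reports whether pattern parses to
// one OpLiteral node under the given flags.
func isSingleLiteral(pattern string, flags syntax.Flags) bool {
	re, err := syntax.Parse(pattern, flags)
	return err == nil && re.Op == syntax.OpLiteral
}

// demo (hypothetical helper) collects the four checks printed by main.
func demo() [4]bool {
	return [4]bool{
		dotMatchesNewline(syntax.Perl),              // Perl alone does not set DotNL
		dotMatchesNewline(syntax.Perl | syntax.DotNL),
		isSingleLiteral("a.b", syntax.Perl),         // "." is an operator here
		isSingleLiteral("a.b", syntax.Literal),      // whole pattern taken literally
	}
}

func main() {
	fmt.Println(demo())
}
```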
An Inst is a single instruction in a regular expression program.
type Inst struct {
	Op   InstOp
	Out  uint32 // all but InstMatch, InstFail
	Arg  uint32 // InstAlt, InstAltMatch, InstCapture, InstEmptyWidth
	Rune []rune
}
func (i *Inst) MatchEmptyWidth(before rune, after rune) bool
MatchEmptyWidth reports whether the instruction matches an empty string between the runes before and after. It should only be called when i.Op == InstEmptyWidth.
func (i *Inst) MatchRune(r rune) bool
MatchRune reports whether the instruction matches (and consumes) r. It should only be called when i.Op == InstRune.
func (i *Inst) MatchRunePos(r rune) int
MatchRunePos checks whether the instruction matches (and consumes) r. If so, MatchRunePos returns the index of the matching rune pair (or, when len(i.Rune) == 1, rune singleton). If not, MatchRunePos returns -1. MatchRunePos should only be called when i.Op == InstRune.
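To see these Inst methods in action, one option is to compile a small pattern and look up the instruction with the required opcode. The sketch below uses the pattern `^[a-cx-z]`; the helpers compile, firstInst, classIndex, and caretMatches are illustrative names, and the code assumes the class compiles to a single InstRune whose Rune slice holds the pairs a-c and x-z.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// compile (hypothetical helper) parses, simplifies, and compiles pattern.
func compile(pattern string) *syntax.Prog {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		panic(err)
	}
	prog, err := syntax.Compile(re.Simplify())
	if err != nil {
		panic(err)
	}
	return prog
}

// firstInst (hypothetical helper) returns the first instruction in prog
// whose opcode is op, or nil if there is none.
func firstInst(prog *syntax.Prog, op syntax.InstOp) *syntax.Inst {
	for i := range prog.Inst {
		if prog.Inst[i].Op == op {
			return &prog.Inst[i]
		}
	}
	return nil
}

// classIndex returns MatchRunePos(r) for the character class [a-cx-z].
func classIndex(r rune) int {
	return firstInst(compile(`^[a-cx-z]`), syntax.InstRune).MatchRunePos(r)
}

// caretMatches reports whether ^ holds between the runes before and after.
func caretMatches(before, after rune) bool {
	inst := firstInst(compile(`^[a-cx-z]`), syntax.InstEmptyWidth)
	return inst.MatchEmptyWidth(before, after)
}

func main() {
	fmt.Println(classIndex('b'), classIndex('y'), classIndex('m'))
	fmt.Println(caretMatches(-1, 'a'), caretMatches('q', 'a'))
}
```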
func (i *Inst) String() string
An InstOp is an instruction opcode.
type InstOp uint8
const (
	InstAlt InstOp = iota
	InstAltMatch
	InstCapture
	InstEmptyWidth
	InstMatch
	InstFail
	InstNop
	InstRune
	InstRune1
	InstRuneAny
	InstRuneAnyNotNL
)
func (i InstOp) String() string
An Op is a single regular expression operator.
type Op uint8
const (
	OpNoMatch        Op = 1 + iota // matches no strings
	OpEmptyMatch                   // matches empty string
	OpLiteral                      // matches Runes sequence
	OpCharClass                    // matches Runes interpreted as range pair list
	OpAnyCharNotNL                 // matches any character except newline
	OpAnyChar                      // matches any character
	OpBeginLine                    // matches empty string at beginning of line
	OpEndLine                      // matches empty string at end of line
	OpBeginText                    // matches empty string at beginning of text
	OpEndText                      // matches empty string at end of text
	OpWordBoundary                 // matches word boundary `\b`
	OpNoWordBoundary               // matches word non-boundary `\B`
	OpCapture                      // capturing subexpression with index Cap, optional name Name
	OpStar                         // matches Sub[0] zero or more times
	OpPlus                         // matches Sub[0] one or more times
	OpQuest                        // matches Sub[0] zero or one times
	OpRepeat                       // matches Sub[0] at least Min times, at most Max (Max == -1 is no limit)
	OpConcat                       // matches concatenation of Subs
	OpAlternate                    // matches alternation of Subs
)
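The operators become easier to read when a parse tree is printed by walking the Sub slices. The sketch below renders the tree for `(ab)+c?`; opName, shape, and shapeOf are hypothetical helpers, and the name table covers only the operators needed for this example.

```go
package main

import (
	"fmt"
	"regexp/syntax"
	"strings"
)

// opName (hypothetical helper) maps a few operators to short names.
func opName(op syntax.Op) string {
	switch op {
	case syntax.OpLiteral:
		return "literal"
	case syntax.OpCharClass:
		return "class"
	case syntax.OpCapture:
		return "capture"
	case syntax.OpStar:
		return "star"
	case syntax.OpPlus:
		return "plus"
	case syntax.OpQuest:
		return "quest"
	case syntax.OpConcat:
		return "concat"
	case syntax.OpAlternate:
		return "alternate"
	}
	return fmt.Sprintf("op%d", op)
}

// shape (hypothetical helper) renders the operators of the tree rooted
// at re in prefix form, recursing through re.Sub.
func shape(re *syntax.Regexp) string {
	if len(re.Sub) == 0 {
		return opName(re.Op)
	}
	parts := make([]string, len(re.Sub))
	for i, sub := range re.Sub {
		parts[i] = shape(sub)
	}
	return opName(re.Op) + "(" + strings.Join(parts, " ") + ")"
}

// shapeOf parses pattern with Perl flags and returns its shape.
func shapeOf(pattern string) string {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "error: " + err.Error()
	}
	return shape(re)
}

func main() {
	fmt.Println(shapeOf(`(ab)+c?`))
}
```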
A Prog is a compiled regular expression program.
type Prog struct {
	Inst   []Inst
	Start  int // index of start instruction
	NumCap int // number of InstCapture insts in re
}
func Compile(re *Regexp) (*Prog, error)
Compile compiles the regexp into a program to be executed. The regexp should have been simplified already (returned from re.Simplify).
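The usual sequence is therefore Parse, then Simplify, then Compile. A minimal sketch, with compileProg and compiles as hypothetical helper names:

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// compileProg (hypothetical helper) runs the Parse, Simplify, Compile
// pipeline with Perl flags.
func compileProg(pattern string) (*syntax.Prog, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return nil, err
	}
	return syntax.Compile(re.Simplify())
}

// compiles (hypothetical helper) reports whether pattern compiles to a
// program that contains at least one InstMatch instruction.
func compiles(pattern string) bool {
	prog, err := compileProg(pattern)
	if err != nil {
		return false
	}
	for _, inst := range prog.Inst {
		if inst.Op == syntax.InstMatch {
			return true
		}
	}
	return false
}

func main() {
	prog, err := compileProg(`a{2,3}b`)
	if err != nil {
		fmt.Println("compile failed:", err)
		return
	}
	// Prog.String dumps the instruction list, one instruction per line.
	fmt.Print(prog.String())
}
```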
func (p *Prog) Prefix() (prefix string, complete bool)
Prefix returns a literal string that all matches for the regexp must start with. Complete is true if the prefix is the entire match.
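For example, a purely literal pattern yields itself as a complete prefix, while a pattern that continues with an operator yields an incomplete one. The helper literalPrefix below is an illustrative name.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// literalPrefix (hypothetical helper) compiles pattern with Perl flags and
// returns the result of Prog.Prefix. It panics on an invalid pattern.
func literalPrefix(pattern string) (string, bool) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		panic(err)
	}
	prog, err := syntax.Compile(re.Simplify())
	if err != nil {
		panic(err)
	}
	return prog.Prefix()
}

func main() {
	for _, p := range []string{`abc`, `abc\d+`, `\d+`} {
		prefix, complete := literalPrefix(p)
		fmt.Printf("%-8s prefix=%q complete=%v\n", p, prefix, complete)
	}
}
```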
func (p *Prog) StartCond() EmptyOp
StartCond returns the leading empty-width conditions that must be true in any match. It returns ^EmptyOp(0) if no matches are possible.
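One use of StartCond is to detect patterns that can only match at the start of the text. The sketch below assumes a helper named anchoredAtTextStart; note that it would also report true for a program that cannot match at all, since ^EmptyOp(0) has every bit set.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// anchoredAtTextStart (hypothetical helper) reports whether every match of
// pattern must begin at the start of the text, according to StartCond.
func anchoredAtTextStart(pattern string) bool {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		panic(err)
	}
	prog, err := syntax.Compile(re.Simplify())
	if err != nil {
		panic(err)
	}
	return prog.StartCond()&syntax.EmptyBeginText != 0
}

func main() {
	fmt.Println(anchoredAtTextStart(`^abc`))     // ^ is beginning of text
	fmt.Println(anchoredAtTextStart(`abc`))      // no leading assertion
	fmt.Println(anchoredAtTextStart(`(?m)^abc`)) // with flag m, ^ is beginning of line
}
```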
func (p *Prog) String() string
A Regexp is a node in a regular expression syntax tree.
type Regexp struct {
	Op       Op // operator
	Flags    Flags
	Sub      []*Regexp  // subexpressions, if any
	Sub0     [1]*Regexp // storage for short Sub
	Rune     []rune     // matched runes, for OpLiteral, OpCharClass
	Rune0    [2]rune    // storage for short Rune
	Min, Max int        // min, max for OpRepeat
	Cap      int        // capturing index, for OpCapture
	Name     string     // capturing name, for OpCapture
}
func Parse(s string, flags Flags) (*Regexp, error)
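Parse parses a regular expression string s, controlled by the specified Flags, and returns a regular expression parse tree. The syntax is described in the top-level comment.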
func (re *Regexp) CapNames() []string
CapNames walks the regexp to find the names of capturing groups.
func (x *Regexp) Equal(y *Regexp) bool
Equal returns true if x and y have identical structure.
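Structural equality is about the parse tree, not the pattern text. In the sketch below, a non-capturing group around a literal produces the same tree as the bare literal, while a change of operator or greediness does not. The helper sameStructure is an illustrative name.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// sameStructure (hypothetical helper) parses both patterns with Perl flags
// and compares the resulting trees with Regexp.Equal.
func sameStructure(a, b string) bool {
	x, err := syntax.Parse(a, syntax.Perl)
	if err != nil {
		panic(err)
	}
	y, err := syntax.Parse(b, syntax.Perl)
	if err != nil {
		panic(err)
	}
	return x.Equal(y)
}

func main() {
	fmt.Println(sameStructure(`a+b`, `a+b`))  // identical patterns
	fmt.Println(sameStructure(`(?:a)`, `a`))  // grouping only, same tree
	fmt.Println(sameStructure(`a+b`, `a*b`))  // different operator
	fmt.Println(sameStructure(`a+b`, `a+?b`)) // different greediness
}
```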
func (re *Regexp) MaxCap() int
MaxCap walks the regexp to find the maximum capture index.
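CapNames and MaxCap are typically read together right after Parse. In the sketch below, captureInfo is a hypothetical helper; index 0 of the returned names corresponds to the whole match and unnamed groups have an empty name.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// captureInfo (hypothetical helper) parses pattern with Perl flags and
// returns its maximum capture index and its capture names.
func captureInfo(pattern string) (int, []string) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		panic(err)
	}
	return re.MaxCap(), re.CapNames()
}

func main() {
	max, names := captureInfo(`(?P<year>\d+)-(\d+)`)
	fmt.Println(max)
	fmt.Printf("%q\n", names)
}
```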
func (re *Regexp) Simplify() *Regexp
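Simplify returns a regexp equivalent to re but without counted repetitions and with various other simplifications, such as rewriting /(?:a+)+/ to /a+/. The resulting regexp will execute correctly but its string representation will not produce the same parse tree, because capturing parentheses may have been duplicated or removed. For example, the simplified form for /(x){1,2}/ is /(x)(x)?/ but both parentheses capture as $1. The returned regexp may share structure with or be the original.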
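The rewrites described above can be inspected by printing the simplified tree. A minimal sketch, where simplified is a hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// simplified (hypothetical helper) parses pattern with Perl flags and
// returns the string form of its simplified tree.
func simplified(pattern string) string {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		panic(err)
	}
	return re.Simplify().String()
}

func main() {
	for _, p := range []string{`(x){1,2}`, `(?:a+)+`, `a{2}`} {
		fmt.Printf("%-10s -> %s\n", p, simplified(p))
	}
}
```

Because both copies of (x) keep capture index 1, the printed form should not be parsed again as a substitute for the simplified tree.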
func (re *Regexp) String() string