  1. @c Copyright (C) 1994--2021 Free Software Foundation, Inc.
  2. @c
  3. @c Permission is granted to copy, distribute and/or modify this document
  4. @c under the terms of the GNU Free Documentation License, Version 1.3 or
  5. @c any later version published by the Free Software Foundation; with no
  6. @c Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
  7. @c A copy of the license is included in the ``GNU Free
  8. @c Documentation License'' file as part of this distribution.
  9. @c this regular expression description is for: findutils
  10. @menu
  11. * findutils-default regular expression syntax::
  12. * emacs regular expression syntax::
  13. * gnu-awk regular expression syntax::
  14. * grep regular expression syntax::
  15. * posix-awk regular expression syntax::
  16. * awk regular expression syntax::
  17. * posix-basic regular expression syntax::
  18. * posix-egrep regular expression syntax::
  19. * egrep regular expression syntax::
  20. * posix-extended regular expression syntax::
  21. @end menu
  22. @node findutils-default regular expression syntax
  23. @subsection @samp{findutils-default} regular expression syntax
  24. The character @samp{.} matches any single character.
  25. @table @samp
  26. @item +
  27. indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
  28. @item ?
  29. indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
  30. @item \+
  31. matches a @samp{+}.
  32. @item \?
  33. matches a @samp{?}.
  34. @end table
  35. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}.
  36. GNU extensions are supported:
  37. @enumerate
  38. @item @samp{\w} matches a character within a word
  39. @item @samp{\W} matches a character which is not within a word
  40. @item @samp{\<} matches the beginning of a word
  41. @item @samp{\>} matches the end of a word
  42. @item @samp{\b} matches a word boundary
  43. @item @samp{\B} matches any position which is not a word boundary
  44. @item @samp{\`} matches the beginning of the whole input
  45. @item @samp{\'} matches the end of the whole input
  46. @end enumerate
  47. Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.
  48. The alternation operator is @samp{\|}.
  49. The character @samp{^} only represents the beginning of a string when it appears:
  50. @enumerate
  51. @item At the beginning of a regular expression
  52. @item After an open-group, signified by @samp{\(}
  53. @item After the alternation operator @samp{\|}
  54. @end enumerate
  55. The character @samp{$} only represents the end of a string when it appears:
  56. @enumerate
  57. @item At the end of a regular expression
  58. @item Before a close-group, signified by @samp{\)}
  59. @item Before the alternation operator @samp{\|}
  60. @end enumerate
  61. @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
  62. @enumerate
  63. @item At the beginning of a regular expression
  64. @item After an open-group, signified by @samp{\(}
  65. @item After the alternation operator @samp{\|}
  66. @end enumerate
  67. The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
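The rules above can be tried out with a minimal sketch like the following. It assumes GNU @command{find} and its @option{-regextype} option; the file names are invented for illustration. Note that @option{-regex} is matched against the whole path, which is why the pattern starts with @samp{.*/}.

```shell
# Sketch only: assumes GNU find; the file names are invented for illustration.
tmp=$(mktemp -d)
touch "$tmp/foo1.txt" "$tmp/bar22.txt" "$tmp/baz3.txt" "$tmp/foo.txt"

# '\(' '\|' '\)' group and alternate; the unescaped '+' means "one or more".
# -regextype has to appear before the -regex test it should affect.
default_matches=$(cd "$tmp" && find . -regextype findutils-default \
    -regex '.*/\(foo\|bar\)[0-9]+\.txt' | LC_ALL=C sort)
printf '%s\n' "$default_matches"

rm -rf "$tmp"
```

Only @file{./bar22.txt} and @file{./foo1.txt} should be listed: @file{foo.txt} has no digit and @file{baz3.txt} matches neither alternative.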
  68. @node emacs regular expression syntax
  69. @subsection @samp{emacs} regular expression syntax
  70. The character @samp{.} matches any single character except newline.
  71. @table @samp
  72. @item +
  73. indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
  74. @item ?
  75. indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
  76. @item \+
  77. matches a @samp{+}.
  78. @item \?
  79. matches a @samp{?}.
  80. @end table
  81. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are ignored. Within square brackets, @samp{\} is taken literally. Character classes are not supported, so for example you would need to use @samp{[0-9]} instead of @samp{[[:digit:]]}.
  82. GNU extensions are supported:
  83. @enumerate
  84. @item @samp{\w} matches a character within a word
  85. @item @samp{\W} matches a character which is not within a word
  86. @item @samp{\<} matches the beginning of a word
  87. @item @samp{\>} matches the end of a word
  88. @item @samp{\b} matches a word boundary
  89. @item @samp{\B} matches any position which is not a word boundary
  90. @item @samp{\`} matches the beginning of the whole input
  91. @item @samp{\'} matches the end of the whole input
  92. @end enumerate
  93. Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.
  94. The alternation operator is @samp{\|}.
  95. The character @samp{^} only represents the beginning of a string when it appears:
  96. @enumerate
  97. @item At the beginning of a regular expression
  98. @item After an open-group, signified by @samp{\(}
  99. @item After the alternation operator @samp{\|}
  100. @end enumerate
  101. The character @samp{$} only represents the end of a string when it appears:
  102. @enumerate
  103. @item At the end of a regular expression
  104. @item Before a close-group, signified by @samp{\)}
  105. @item Before the alternation operator @samp{\|}
  106. @end enumerate
  107. @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
  108. @enumerate
  109. @item At the beginning of a regular expression
  110. @item After an open-group, signified by @samp{\(}
  111. @item After the alternation operator @samp{\|}
  112. @end enumerate
  113. The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
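As a hedged illustration of back-references in this syntax, the sketch below selects files whose names begin with a doubled letter. It assumes GNU @command{find}; the file names are made up for the example.

```shell
# Sketch only: assumes GNU find; the file names are invented for illustration.
tmp=$(mktemp -d)
touch "$tmp/aab.txt" "$tmp/abb.txt" "$tmp/zz"

# '\([a-z]\)' captures one letter; '\1' must then match that same letter again.
emacs_matches=$(cd "$tmp" && find . -regextype emacs \
    -regex '.*/\([a-z]\)\1.*' | LC_ALL=C sort)
printf '%s\n' "$emacs_matches"

rm -rf "$tmp"
```

Here @file{./aab.txt} and @file{./zz} should match, while @file{./abb.txt} should not, because its first two letters differ.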
  114. @node gnu-awk regular expression syntax
  115. @subsection @samp{gnu-awk} regular expression syntax
  116. The character @samp{.} matches any single character.
  117. @table @samp
  118. @item +
  119. indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
  120. @item ?
  121. indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
  122. @item \+
  123. matches a @samp{+}.
  124. @item \?
  125. matches a @samp{?}.
  126. @end table
  127. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
  128. GNU extensions are supported:
  129. @enumerate
  130. @item @samp{\w} matches a character within a word
  131. @item @samp{\W} matches a character which is not within a word
  132. @item @samp{\<} matches the beginning of a word
  133. @item @samp{\>} matches the end of a word
  134. @item @samp{\b} matches a word boundary
  135. @item @samp{\B} matches any position which is not a word boundary
  136. @item @samp{\`} matches the beginning of the whole input
  137. @item @samp{\'} matches the end of the whole input
  138. @end enumerate
  139. Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}.
  140. The alternation operator is @samp{|}.
  141. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
  142. @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
  143. @enumerate
  144. @item At the beginning of a regular expression
  145. @item After an open-group, signified by @samp{(}
  146. @item After the alternation operator @samp{|}
  147. @end enumerate
  148. Intervals are specified by @samp{@{} and @samp{@}}.
  149. Invalid intervals are treated as literals; for example, @samp{a@{1} is treated as @samp{a\@{1}.
  150. The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
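A small sketch of unescaped grouping combined with a character class in this syntax follows. It assumes GNU @command{find}, and the file names are hypothetical.

```shell
# Sketch only: assumes GNU find; the file names are invented for illustration.
tmp=$(mktemp -d)
touch "$tmp/ab1" "$tmp/abab2" "$tmp/a3"

# '(ab)+' repeats the group without backslashes; '[[:digit:]]' is a character class.
gnu_awk_matches=$(cd "$tmp" && find . -regextype gnu-awk \
    -regex '.*/(ab)+[[:digit:]]' | LC_ALL=C sort)
printf '%s\n' "$gnu_awk_matches"

rm -rf "$tmp"
```

The file @file{a3} should be left out, since it does not contain the complete group @samp{ab}.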
  151. @node grep regular expression syntax
  152. @subsection @samp{grep} regular expression syntax
  153. The character @samp{.} matches any single character.
  154. @table @samp
  155. @item \+
  156. indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
  157. @item \?
  158. indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
  159. @item + and ?
  160. match themselves.
  161. @end table
  162. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
  163. GNU extensions are supported:
  164. @enumerate
  165. @item @samp{\w} matches a character within a word
  166. @item @samp{\W} matches a character which is not within a word
  167. @item @samp{\<} matches the beginning of a word
  168. @item @samp{\>} matches the end of a word
  169. @item @samp{\b} matches a word boundary
  170. @item @samp{\B} matches any position which is not a word boundary
  171. @item @samp{\`} matches the beginning of the whole input
  172. @item @samp{\'} matches the end of the whole input
  173. @end enumerate
  174. Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.
  175. The alternation operator is @samp{\|}.
  176. The character @samp{^} only represents the beginning of a string when it appears:
  177. @enumerate
  178. @item At the beginning of a regular expression
  179. @item After an open-group, signified by @samp{\(}
  180. @item After a newline
  181. @item After the alternation operator @samp{\|}
  182. @end enumerate
  183. The character @samp{$} only represents the end of a string when it appears:
  184. @enumerate
  185. @item At the end of a regular expression
  186. @item Before a close-group, signified by @samp{\)}
  187. @item Before a newline
  188. @item Before the alternation operator @samp{\|}
  189. @end enumerate
  190. @samp{*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except:
  191. @enumerate
  192. @item At the beginning of a regular expression
  193. @item After an open-group, signified by @samp{\(}
  194. @item After a newline
  195. @item After the alternation operator @samp{\|}
  196. @end enumerate
  197. Intervals are specified by @samp{\@{} and @samp{\@}}.
  198. Invalid intervals such as @samp{a\@{1z} are not accepted.
  199. The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
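The difference between @samp{\+} and a bare @samp{+} in this syntax can be seen in a sketch such as the one below. It assumes GNU @command{find}; the file names are invented for the example.

```shell
# Sketch only: assumes GNU find; the file names are invented for illustration.
tmp=$(mktemp -d)
touch "$tmp/1.log" "$tmp/22.log" "$tmp/x.log" "$tmp/a+b" "$tmp/aab"

# '\+' is the "one or more" operator in the grep syntax.
grep_plus=$(cd "$tmp" && find . -regextype grep \
    -regex '.*/[[:digit:]]\+\.log' | LC_ALL=C sort)

# A bare '+' matches a literal plus sign, so 'aab' is not selected.
grep_literal=$(cd "$tmp" && find . -regextype grep -regex '.*/a+b')

printf '%s\n' "$grep_plus" "$grep_literal"
rm -rf "$tmp"
```

The first command should list @file{./1.log} and @file{./22.log}; the second should list only @file{./a+b}.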
  200. @node posix-awk regular expression syntax
  201. @subsection @samp{posix-awk} regular expression syntax
  202. The character @samp{.} matches any single character except the null character.
  203. @table @samp
  204. @item +
  205. indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
  206. @item ?
  207. indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
  208. @item \+
  209. matches a @samp{+}.
  210. @item \?
  211. matches a @samp{?}.
  212. @end table
  213. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
  214. GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively.
  215. Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}.
  216. The alternation operator is @samp{|}.
  217. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
  218. @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed:
  219. @enumerate
  220. @item At the beginning of a regular expression
  221. @item After an open-group, signified by @samp{(}
  222. @item After the alternation operator @samp{|}
  223. @end enumerate
  224. Intervals are specified by @samp{@{} and @samp{@}}.
  225. Invalid intervals are treated as literals; for example, @samp{a@{1} is treated as @samp{a\@{1}.
  226. The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
  227. @node awk regular expression syntax
  228. @subsection @samp{awk} regular expression syntax
  229. The character @samp{.} matches any single character except the null character.
  230. @table @samp
  231. @item +
  232. indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
  233. @item ?
  234. indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
  235. @item \+
  236. matches a @samp{+}.
  237. @item \?
  238. matches a @samp{?}.
  239. @end table
  240. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} can be used to quote the following character. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
  241. GNU extensions are not supported and so @samp{\w}, @samp{\W}, @samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'} match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B}, @samp{`}, and @samp{'} respectively.
  242. Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit matches that digit.
  243. The alternation operator is @samp{|}.
  244. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
  245. @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except:
  246. @enumerate
  247. @item At the beginning of a regular expression
  248. @item After an open-group, signified by @samp{(}
  249. @item After the alternation operator @samp{|}
  250. @end enumerate
  251. The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
  252. @node posix-basic regular expression syntax
  253. @subsection @samp{posix-basic} regular expression syntax
  254. The character @samp{.} matches any single character except the null character.
  255. @table @samp
  256. @item \+
  257. indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
  258. @item \?
  259. indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
  260. @item + and ?
  261. match themselves.
  262. @end table
  263. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
  264. GNU extensions are supported:
  265. @enumerate
  266. @item @samp{\w} matches a character within a word
  267. @item @samp{\W} matches a character which is not within a word
  268. @item @samp{\<} matches the beginning of a word
  269. @item @samp{\>} matches the end of a word
  270. @item @samp{\b} matches a word boundary
  271. @item @samp{\B} matches any position which is not a word boundary
  272. @item @samp{\`} matches the beginning of the whole input
  273. @item @samp{\'} matches the end of the whole input
  274. @end enumerate
  275. Grouping is performed with backslashes followed by parentheses @samp{\(}, @samp{\)}. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{\(}.
  276. The alternation operator is @samp{\|}.
  277. The character @samp{^} only represents the beginning of a string when it appears:
  278. @enumerate
  279. @item At the beginning of a regular expression
  280. @item After an open-group, signified by @samp{\(}
  281. @item After the alternation operator @samp{\|}
  282. @end enumerate
  283. The character @samp{$} only represents the end of a string when it appears:
  284. @enumerate
  285. @item At the end of a regular expression
  286. @item Before a close-group, signified by @samp{\)}
  287. @item Before the alternation operator @samp{\|}
  288. @end enumerate
  289. @samp{*}, @samp{\+} and @samp{\?} are special at any point in a regular expression except:
  290. @enumerate
  291. @item At the beginning of a regular expression
  292. @item After an open-group, signified by @samp{\(}
  293. @item After the alternation operator @samp{\|}
  294. @end enumerate
  295. Intervals are specified by @samp{\@{} and @samp{\@}}.
  296. Invalid intervals such as @samp{a\@{1z} are not accepted.
  297. The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
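Backslashed intervals in this syntax might be used as in the following sketch, which assumes GNU @command{find} and uses made-up file names. The pattern asks for exactly three lower-case letters followed by exactly two digits.

```shell
# Sketch only: assumes GNU find; the file names are invented for illustration.
tmp=$(mktemp -d)
touch "$tmp/abc12" "$tmp/ab12" "$tmp/abc123"

# '\{3\}' and '\{2\}' are intervals; the braces need backslashes in this syntax.
basic_matches=$(cd "$tmp" && find . -regextype posix-basic \
    -regex '.*/[a-z]\{3\}[0-9]\{2\}')
printf '%s\n' "$basic_matches"

rm -rf "$tmp"
```

Only @file{./abc12} should be listed: @file{ab12} has too few letters, and @file{abc123} has a trailing digit that the pattern does not cover, so the whole path does not match.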
  298. @node posix-egrep regular expression syntax
  299. @subsection @samp{posix-egrep} regular expression syntax
  300. The character @samp{.} matches any single character.
  301. @table @samp
  302. @item +
  303. indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
  304. @item ?
  305. indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
  306. @item \+
  307. matches a @samp{+}.
  308. @item \?
  309. matches a @samp{?}.
  310. @end table
  311. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
  312. GNU extensions are supported:
  313. @enumerate
  314. @item @samp{\w} matches a character within a word
  315. @item @samp{\W} matches a character which is not within a word
  316. @item @samp{\<} matches the beginning of a word
  317. @item @samp{\>} matches the end of a word
  318. @item @samp{\b} matches a word boundary
  319. @item @samp{\B} matches any position which is not a word boundary
  320. @item @samp{\`} matches the beginning of the whole input
  321. @item @samp{\'} matches the end of the whole input
  322. @end enumerate
  323. Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}.
  324. The alternation operator is @samp{|}.
  325. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
  326. The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in a regular expression.
  327. Intervals are specified by @samp{@{} and @samp{@}}.
  328. Invalid intervals are treated as literals; for example, @samp{a@{1} is treated as @samp{a\@{1}.
  329. The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
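As a rough sketch of the unescaped @samp{?} operator together with alternation in this syntax, consider the following. It assumes GNU @command{find}; the file names are hypothetical.

```shell
# Sketch only: assumes GNU find; the file names are invented for illustration.
tmp=$(mktemp -d)
touch "$tmp/color.png" "$tmp/colour.jpg" "$tmp/colouur.png" "$tmp/color.gif"

# 'u?' makes the 'u' optional; '(png|jpg)' chooses between two extensions.
egrep_matches=$(cd "$tmp" && find . -regextype posix-egrep \
    -regex '.*/colou?r\.(png|jpg)' | LC_ALL=C sort)
printf '%s\n' "$egrep_matches"

rm -rf "$tmp"
```

This should select @file{./color.png} and @file{./colour.jpg} but neither @file{colouur.png} (two occurrences of @samp{u}) nor @file{color.gif} (extension not in the group).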
  330. @node egrep regular expression syntax
  331. @subsection @samp{egrep} regular expression syntax
  332. This is a synonym for @samp{posix-egrep}.
  333. @node posix-extended regular expression syntax
  334. @subsection @samp{posix-extended} regular expression syntax
  335. The character @samp{.} matches any single character except the null character.
  336. @table @samp
  337. @item +
  338. indicates that the regular expression should match one or more occurrences of the previous atom or regexp.
  339. @item ?
  340. indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.
  341. @item \+
  342. matches a @samp{+}.
  343. @item \?
  344. matches a @samp{?}.
  345. @end table
  346. Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example @samp{[z-a]}, are invalid. Within square brackets, @samp{\} is taken literally. Character classes are supported; for example @samp{[[:digit:]]} will match a single decimal digit.
  347. GNU extensions are supported:
  348. @enumerate
  349. @item @samp{\w} matches a character within a word
  350. @item @samp{\W} matches a character which is not within a word
  351. @item @samp{\<} matches the beginning of a word
  352. @item @samp{\>} matches the end of a word
  353. @item @samp{\b} matches a word boundary
  354. @item @samp{\B} matches any position which is not a word boundary
  355. @item @samp{\`} matches the beginning of the whole input
  356. @item @samp{\'} matches the end of the whole input
  357. @end enumerate
  358. Grouping is performed with parentheses @samp{()}. An unmatched @samp{)} matches just itself. A backslash followed by a digit acts as a back-reference and matches the same thing as the previous grouped expression indicated by that number. For example @samp{\2} matches the second group expression. The order of group expressions is determined by the position of their opening parenthesis @samp{(}.
  359. The alternation operator is @samp{|}.
  360. The characters @samp{^} and @samp{$} always represent the beginning and end of a string respectively, except within square brackets. Within brackets, @samp{^} can be used to invert the membership of the character class being specified.
  361. @samp{*}, @samp{+} and @samp{?} are special at any point in a regular expression except the following places, where they are not allowed:
  362. @enumerate
  363. @item At the beginning of a regular expression
  364. @item After an open-group, signified by @samp{(}
  365. @item After the alternation operator @samp{|}
  366. @end enumerate
  367. Intervals are specified by @samp{@{} and @samp{@}}.
  368. Invalid intervals such as @samp{a@{1z} are not accepted.
  369. The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.
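Unescaped intervals in this syntax can be sketched as follows, assuming GNU @command{find} and invented file names. The pattern looks for a four-digit year, a hyphen, and a two-digit month.

```shell
# Sketch only: assumes GNU find; the file names are invented for illustration.
tmp=$(mktemp -d)
touch "$tmp/2021-03.log" "$tmp/21-03.log" "$tmp/2021-3.log"

# '{4}' and '{2}' are intervals; no backslashes are needed in this syntax.
extended_matches=$(cd "$tmp" && find . -regextype posix-extended \
    -regex '.*/[0-9]{4}-[0-9]{2}\.log')
printf '%s\n' "$extended_matches"

rm -rf "$tmp"
```

Only @file{./2021-03.log} should satisfy both intervals.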