Talk:Regular expression/Archive 2

This is an archive of past discussions about Regular expression. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Likely misunderstanding of "Regular" by the uninitiated

It seems to me that novices to regexps. are likely to misinterpret "regular" to mean "occurring at fixed intervals", "evenly spaced", or similar concepts expressing repetitive similarity or even repetitive identity. Another misunderstanding could well be that "regular" means "commonplace" or "ordinary". Recalling the Spanish word "reglas" for "rules", it seems to me that "regular" in the context of this article means more "according to specific, defined rules" or such. I'd like to see a brief comment in the article explaining what seems to be this atypical use of the word "regular". Regards, Nikevich (talk) 21:05, 25 December 2010 (UTC)

History of the ^ and $ symbols?

Does anyone know where the usage of ^ for start of string and $ for end of string in regular expressions comes from? Did syntax using these symbols for start and end already exist before regular expressions? I know some syntax for newly invented things, such as CSS selectors, also uses these symbols. I'm wondering where the usage of these symbols came from because it seems like a very weird, arbitrary choice today, especially with $ looking like the S of start but meaning end instead, and there simply being no logical connection between the two symbols at all, unlike for example '>' and '<'. Maybe this could be an interesting historical note. 84.253.55.210 (talk) 22:26, 12 January 2011 (UTC)

Intro Example

A quote from the intro:

As an example of the syntax, the regular expression \bex can be used to search for all instances of the string "ex" that occur after "word boundaries" (signified by the \b). Thus \bex will find the matching string "ex" in two possible locations, (1) at the beginning of words, and (2) between two characters in a string, where one is a word character and the other is not a word character.

Isn't (1) a correct description of what \bex will match (ie ex at the beginning of words) and isn't (2) slightly incorrect since the first of the two characters that the string is between must be a non word character and the second of the two characters can be anything? — Preceding unsigned comment added by 184.158.71.3 (talk) 05:32, 4 July 2011 (UTC)

Yes, you're right. I've fixed that, but the example is so confusing and convoluted for a beginner; I'll try to think up a simpler/better example. peterl (talk) 07:42, 4 July 2011 (UTC)
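For anyone following this thread, the behaviour of \bex can be checked with a small sketch. It assumes Python's re module, and the sample sentence is made up for illustration; it is not taken from the article.

```python
import re

# Hypothetical sample text, chosen so that "ex" appears both at the start
# of words and in the middle or at the end of words.
text = "An example: expect the ex-president to flex, not to vex."

# \b is a zero-width assertion: it matches the position between a word
# character and a non-word character (or the start/end of the string).
starts = [m.start() for m in re.finditer(r"\bex", text)]

# Extending the pattern with \w* shows which words the matches belong to.
words = re.findall(r"\bex\w*", text)

print(starts)
print(words)  # ['example', 'expect', 'ex']
```

The "ex" in "flex" and "vex" is not reported, because the position before it lies between two word characters.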

Fuzzy reg exp section - misnomer?

Shouldn't the section on Fuzzy be left out? If they're fuzzy, then they're not regular. This info is well covered in Approximate string matching. I'm thinking of removing this whole section and moving any missed info to Approximate string matching. What do others think? Have I misunderstood something? 02:57, 11 July 2011 (UTC)

flags

Hi! Search "flags" at mozilla ... RegExp. I was spending too much time adapting en:user:Lunchboxhero/externISBN.js to my needs. Regards ·לערי ריינהארט·T·m:Th·T·email me· 18:51, 5 October 2011 (UTC)

Negative expression

I think negation should at least be mentioned. As I have read in other sources, it is not possible to simply invert the result (like grep -cv or something like that), so it should be mentioned so that people start to look for other solutions.

P6v53as (talk) 14:22, 8 November 2011 (UTC)
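To make the point about negation concrete, here is a minimal sketch assuming Python's re module; the sample lines are hypothetical. It shows two common workarounds: inverting the result outside the regex (roughly what grep -v does), and, in flavours that support lookahead, a negative lookahead anchored at the start of the line.

```python
import re

# Hypothetical log lines used only for illustration.
lines = ["error: disk full", "ok", "warning: low battery", "error: timeout"]

# Workaround 1: match normally and invert the result in the host language.
pattern = re.compile(r"error")
non_matching = [ln for ln in lines if not pattern.search(ln)]

# Workaround 2: a negative lookahead (an extension, not part of the classic
# regular-expression formalism) that rejects lines containing "error".
no_error = re.compile(r"^(?!.*error).*$")
via_lookahead = [ln for ln in lines if no_error.match(ln)]

print(non_matching)   # ['ok', 'warning: low battery']
print(via_lookahead)  # ['ok', 'warning: low battery']
```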

Meaning of "regular"

For readers having active minds, but with less or no formal education in computer science, formal logic, or higher math., the term "regular" might seem to be the antonym of "irregular". With a modest knowledge of Spanish, I came to realize that reglas, "rules" in Spanish, offered a very helpful insight. In this context, "Regular" implies "according to rules"; a naïve independent scholar might not realize this for a while. Perhaps a short paragraph explaining this might be helpful for reducing confusion and misunderstanding. Regards, Nikevich (talk) 08:20, 19 November 2011 (UTC)

In order to avoid original research, you might want to provide a citation. Of course, you can interpret "regular" as "according to the rules" – in the OED it says (among many other explanations) "Characterized by evenness, order, or harmony in physical form, structure, or organization; arranged in or constituting a constant or definite pattern." and "Characterized by the presence or operation of a definite rule or set of rules; marked or distinguished by evenness, order, or harmony in character or operation; steady or uniform in action, procedure, or occurrence."

Still, it's your (likely valid) interpretation of why the word was chosen as the name for "regular sets". Even then, it's still confusing, as everything in formal languages is according to certain rules. For example, the Turing machine, which existed before the invention of regular sets, is regular in the sense of "according to rules". --Zahnradzacken (talk) 11:52, 21 November 2011 (UTC)

Replace example?

None of the examples show replacing a string with another - which at least some implementations allow. Should we show at least one? Martin Packer (talk) 00:28, 28 January 2012 (UTC)

Rather than construct a random set of examples, it would be nicer to point to well thought out WP:RS. Likewise, there are thousands of poor examples on the net to which random IP's insist on linking TEDickey (talk) 00:34, 28 January 2012 (UTC)

I agree. My point is about coverage - these all seem to be checking for match or else doing extraction. That's not all regex's are used for: Replace seems to me part of what the examples - wherever they're sourced from - should cover. (And there may be other use categories I've not thought of.) — Preceding unsigned comment added by MartinPackerIBM (talk • contribs) 16:28, 28 January 2012 (UTC)
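As an illustration of the replace use case raised in this thread, the following is a small sketch assuming Python's re.sub; the input strings are invented and it is not proposed as sourced article content.

```python
import re

# Rewrite words ending in "our" to end in "or". The captured stem (\1) is
# reused in the replacement string.
text = "The colour of the harbour"
americanised = re.sub(r"\b(\w+)our\b", r"\1or", text)

# Reorder the captured parts of an ISO-style date.
date = "2012-01-28"
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)

print(americanised)  # The color of the harbor
print(reordered)     # 28/01/2012
```

Replacement syntax differs between implementations (for example \1 versus $1 for backreferences), so any example in the article would need to name the flavour it uses.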

Move formal definition to end

I think that the formal definition should be moved to the end of the article. Most of the people accessing this article want to see some examples. Therefore the article should present the examples first and the formal definition later. — Preceding unsigned comment added by 89.23.239.59 (talk) 18:19, 25 March 2012 (UTC)

Regex as C standard library

It is said in the article that: "For yet other languages, such as Object Pascal(Delphi) and C and C++, non-core libraries are available"...

But regex does actually exist as a C core library; "man regex" gives me that on BSD OS:

REGEX(3) BSD Library Functions Manual REGEX(3)

NAME

    regcomp, regerror, regexec, regfree -- regular-expression library

LIBRARY

    Standard C Library (libc, -lc)

69.80.96.81 (talk) 04:48, 28 May 2012 (UTC)

The term "core library" means whether it is part of the C language standard (or analogous POSIX standards), not whether some system happens to store the feature in libc. It is easy to find nonstandard functions in most libc's TEDickey (talk) 09:28, 28 May 2012 (UTC)

Problem with "not preceded by"?

One of the examples given is:

* the word "car" when not preceded by the word "motor"

Then it says "These examples are simple."

I don't think that's a simple example - in fact I can't think of a way to match that. Either I'm being dim (quite possible) or that example should be removed. Jj Banana (talk) 15:12, 13 February 2012 (UTC)

Doesn't "[^motor]car" work? -- — T13 ( C • M • Click to learn how to view this signature as intended ) 23:30, 17 April 2012 (UTC)

That excludes things like "rotomcar", e.g., any permutation of m/o/t/r/ TEDickey (talk) 00:12, 18 April 2012 (UTC)

You can express this using a regular expression like "([^m]otor|^otor|[^o]tor|^tor|[^t]or|^or|[^o]r|^r|[^r]|^)(car)" or "((((([^m]|^)o|[^o]|^)t|[^t]|^)o|[^o]|^)r|[^r]|^)(car)" — tedious to type, but still polynomial (the latter even linear) in length with respect to the original words ("motor" and "car"). In Perl or Java etc., you can simply write "(?<!motor)car". 90.190.113.12 (talk) 16:05, 22 February 2013 (UTC)
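The three suggestions in this thread can be compared directly. The sketch below assumes Python's re module, which supports fixed-width negative lookbehind; the list of test strings is hypothetical.

```python
import re

# Hypothetical test strings.
samples = ["car", "motorcar", "rotomcar", "a car", "racecar"]

# Negative lookbehind, as suggested for Perl or Java.
lookbehind = re.compile(r"(?<!motor)car")

# The character-class attempt: requires one preceding character that is
# not any of m, o, t, r, which is a different condition.
char_class = re.compile(r"[^motor]car")

# The lookaround-free expression quoted above (second variant).
plain = re.compile(r"((((([^m]|^)o|[^o]|^)t|[^t]|^)o|[^o]|^)r|[^r]|^)(car)")

by_lookbehind = [s for s in samples if lookbehind.search(s)]
by_char_class = [s for s in samples if char_class.search(s)]
by_plain = [s for s in samples if plain.search(s)]

print(by_lookbehind)  # ['car', 'rotomcar', 'a car', 'racecar']
print(by_char_class)  # ['a car', 'racecar']
print(by_plain)       # ['car', 'rotomcar', 'a car', 'racecar']
```

On these samples the lookaround-free expression accepts the same strings as the lookbehind, although its overall match also includes the preceding characters, whereas "[^motor]car" wrongly rejects "car" and "rotomcar".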

Citation for original popularity of regular expressions following success of 'ed' and 'grep'

The third sentence of the article is flagged {citation needed}, but I don't know how to fix this within the article.

I would like to suggest this reference:

Mastering Regular Expressions, 2nd Edition, from O'Reilly, by Jeffrey E. F. Friedl; Chapter 3, "Overview of Regular Expression Features and Flavors", page 85, under the heading "The Origins of Regular Expressions", third and fourth paragraphs:

"Although there is evidence of earlier work, the first published computational use of regular expressions I have actually been able to find is Ken Thompson's 1968 article Regular Expression Search Algorithm in which he describes a regular-expression compiler that produced IBM 7094 object code. This led to his work on qed, an editor that formed the basis for the Unix editor ed.

ed's regular expressions were not as advanced as those in qed, but they were the first to gain widespread use in non-technical fields. ed had a command to display lines of the edited file that matched a given regular expression. The command, "g/Regular Expression/p", was read "Global Regular Expression Print." This particular function was so useful that it was made into its own utility, grep (after which egrep--extended grep--was later modeled)."

--hope this helps leeeoooooo [002012-06-05] — Preceding unsigned comment added by Leeeoooooo (talk • contribs) 01:12, 6 June 2012 (UTC)

I have this book too and was actually planning to use the above quote as a citation. The book is well researched and, I would say, written by someone who knows regular expressions very well. So I am hoping no one would object to us using the above book as a citation. 24.212.138.15 (talk) That was me gathima (talk)

The document http://genius.cat-v.org/brian-kernighan/articles/beautiful suggests the same:

"Regular expressions first appeared in a program setting in Ken Thompson's version of the QED text editor in the mid-1960's. In 1967, Ken applied for a patent on a mechanism for rapid text matching based on regular expressions; it was granted in 1971, one of the very first software patents [US Patent 3,568,156, Text Matching Algorithm, March 2, 1971]. [...] Regular expressions moved from QED to the Unix editor ed, and then to the quintessential Unix tool, grep, which Ken created by performing radical surgery on ed."

Also, roughly the same text appears in the book "Beautiful Code: Leading Programmers Explain How They Think" (the above link is a draft).

70.82.120.78 (talk) 19:23, 15 August 2012 (UTC)

Which algorithm is working behind REGEX...

Which pattern-matching algorithm is actually working at the ground level: is it (1) the KMP matching technique, or (2) Rabin–Karp? The question arose because the text is converted internally into a char array and the pattern is matched over that. — Preceding unsigned comment added by 106.51.151.241 (talk) 15:16, 2 December 2012 (UTC)

Isn't javascript a regex core language too?

Read above (-: I've edited it already; please respond if I'm wrong.

It's been discussed before: in Javascript, while you can assign a literal pattern to a variable, that's as far as the integration goes. Here is some discussion. The actual manipulation (need WP:RS for counter examples) uses functions, which puts that outside the language syntax mentioned in the sentence. TEDickey (talk) 16:37, 6 April 2013 (UTC)

But it has regex? So it is a regex (core or not so core) language? — Preceding unsigned comment added by 86.90.98.163 (talk) 11:50, 7 April 2013 (UTC)

The sentence talks about syntax, not whether it is part of the language. If the sentence were reworded to address your point, the list would grow by an order of magnitude (and consequently would lose its point) TEDickey (talk) 13:15, 7 April 2013 (UTC)

Ok, I will delete the changes. By the way, can you close talk topics? (I am a beginner in wikipedia) — Preceding unsigned comment added by 86.90.98.163 (talk) 17:04, 7 April 2013 (UTC)

Topics stay on the talk-pages until someone (or something) clears away old topics into separate pages called archives (this page has two archives linked in the header area). TEDickey (talk) 17:17, 7 April 2013 (UTC)

"backslash-itus"

You mean, like some kind of disease? In which case it would be "-itis" (and probably not even hyphenated, like such: "backslashitis"). --Jerome Potts (talk) 16:03, 1 June 2013 (UTC)

And even so, –osis would be more fitting: it's not that the backslash itself is afflicted. —Tamfang (talk) 07:34, 19 July 2013 (UTC)

Generalization to include graph-based regular expression patterns

The article assumes that regular expressions apply only to strings. At some point we will have to generalize its content to include graph-based regular expression patterns that can find paths in graphs and entire subgraphs. See the paper: Alkhateeb, Faisal, Jean-François Baget, and Jérôme Euzenat. "Extending SPARQL with regular expression patterns (for querying RDF)." Web Semantics: Science, Services and Agents on the World Wide Web 7, no. 2 (2009). Gmelli (talk) 18:52, 4 June 2013 (UTC)

Fuzzy regex matching relation Levenshtein automata

The fuzzy matching section of the main article now asks for citations. I believe topics like Levenshtein automata contain a bunch of relevant research. Algorithms relevant to error-tolerant traversal of a finite-state network without using composition are described in, e.g., Oflazer's error-tolerant matching with finite-state automata. I might be able to write something up later. --Flammie (talk) 04:25, 17 July 2013 (UTC)

Cleanup needed

With all due respect to previous editors, I think the present state of this article is quite poor. I remember coming to this article several years ago, before I knew regexes, and leaving even more confused. It could almost do with a total rewrite in many places. Rather than trying to cover all possible aspects of regex syntax, it should present more information about regexes and defer to (say) Wikibooks (to which some of this content should be moved, if appropriate).

I might have a go at some stage, but the door is wide open for anyone keen for a spot of pruning and rewriting here. — This, that and the other (talk) 11:12, 19 July 2013 (UTC)

History

When did Ken Thompson build Kleene's notation into the editor QED? In other words, when were Regular Expressions first used in software? The QED page is more vague about that. Sam Tomato (talk) 20:33, 30 September 2013 (UTC)

I found this link: http://cm.bell-labs.com/who/dmr/qed.html The historical survey is written by Dennis Ritchie, a long-time colleague of Thompson. There it is noted that Thompson published the idea of converting regular expressions into IBM 7094 machine code (i.e., code corresponding to a nondeterministic finite automaton) in the CACM paper "Programming Techniques: Regular expression search algorithm", which appeared in June 1968. http://dl.acm.org/citation.cfm?doid=363347.363387

Hermel (talk) 19:45, 9 October 2013 (UTC)

Well, that makes three questions: (1) when did Thompson use regex's in QED, (2) when were regex's first used in software, and (3) does the link have anything to do with either of the first two questions (I don't see "QED" or "editor", in the paper) TEDickey (talk) 20:32, 9 October 2013 (UTC)

Thomson-kleene-star.svg is wrong

The file Thomson-kleene-star.svg is wrong. There is a transition missing between the state to the right of q and the state to the left of f. Also, this could be written as a deterministic finite automaton with two states. Using an NFA is unnecessary and confusing to new readers.

Here is the offending file:

The Kleene star: "zero or more".

Rekahsoft (talk) 05:41, 30 May 2014 (UTC)

Thanks. Any chance of a fix? Or can you link to a webpage that has a correct state diagram? I have not thought about the issue, but in general you are correct that the article should have a simple and comprehensible diagram rather than some kind of optimisation. A complication is that the diagram is also used at Thompson's construction algorithm and that article is written confidently. Johnuniq (talk) 06:28, 30 May 2014 (UTC)

I agree the picture is difficult to understand; its caption is too brief, and I couldn't find an explaining reference to it in the running text. However, after reading Thompson's construction algorithm, I think the automaton is not wrong. The oval labelled "N(s)" is supposed to denote the subautomaton corresponding to the regular expression s, when the whole picture shows an NFA for s^*, i.e., for "zero or more of s". So if s is just a reg.exp. consisting of a single alphabet letter (e.g. "a"), then N(s) would contain the transition (labelled "a") you missed. If s is e.g. "a|b", then N(s) is more complex. - Jochen Burghardt (talk) 07:44, 30 May 2014 (UTC)

I colored the subautomata in the pictures from Thompson's construction algorithm, and rephrased the lead of Regular expression accordingly. Hope it's better now. - Jochen Burghardt (talk) 10:18, 30 May 2014 (UTC)

Regular Expression development tools

I didn't see a section about regular expression tools. I recently found this interesting Windows program that will allow a person to click on fields to create an expression. http://www.ultrapico.com/Expresso.htm Does anyone know of other similar tools? • Sbmeirow • Talk • 16:12, 1 July 2014 (UTC)

pcre syntax highlighting lost

Please see Talk:Perl Compatible Regular Expressions#pcre syntax highlighting lost. John Vandenberg ^(chat) 06:48, 18 July 2015 (UTC)

Examples

Why are all the examples in PHP? Shouldn't they be in pseudocode or in a language with a more common syntax? I find that the $ in the variable names can be confused with the regex syntax. 186.136.108.233 (talk) 13:52, 11 February 2014 (UTC)

I'm adding links to search online regular expression testers and to one specific example, which are an excellent way to explore regular expressions with sufficiently equipped browsers, but require Wikipedia:EL#Rich_media extensions. - Tatzelbrumm (talk) 10:52, 22 May 2014 (UTC)

The image example is awful, as it is showing an exact opposite of a regular expression. Positive look-behinds are NOT regular. --141.89.226.146 (talk) 23:30, 13 November 2015 (UTC)

Lookaround is used, but not explained

Hello,

In the example at the top of the page the lookaround "(?<=)" and "(?=)" groups are used but not explained in the text. I've added a link at the bottom to a Quick Start guide with at least a description. Can someone add the operators to some list in the page? The page indeed needs cleanup. As a computer scientist I can say that the page is written from the point of view of a theoretical computer scientist, not the point of view of an average Wikipedia user.

Thanks!

Jgamleus (talk) 17:41, 17 July 2015 (UTC)

The "average Wikipedia user" will never use a regex in their life, but I agree that examples shouldn't contain syntax which isn't covered in the text. – Smyth\^talk 13:53, 13 April 2016 (UTC)

There are additional similar functions which are powerful (they can help reduce developer logic code and make use of the more efficient regexp engine) but are not even mentioned. For a reference, here's the Mozilla Developer Network link ( https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions ). The following 3 examples are not Mozilla-centric, and can be found in e.g. PCRE docs. However, the Mozilla-centric nature does omit the "lookaround" "(?=)" and "(?<=)" functionality, and unfortunately I am not familiar with them. Is there a corresponding "(?>=)" as well? They should all have proper examples given, but the only "lookaround" examples given are tacked on to "Word Boundaries" in the "ASCII" examples column, which is inadequate and inappropriate. I believe all of these belong in the Meta-Character(s) table, directly below "( )" "Groups a series" ...

• "non-capturing parentheses" "(?:x)", which is handy if you use parentheses for grouping, for example if you optionally want to match something which may or may not be present, but wish to discard the match, as in "discard_this match_everything_else": /(?:|discard_this )match_everything_else/ also matches "match_everything_else" by itself

• "lookahead" "x(?=y)", which allows you to match something ONLY if followed by something else, "match_this but_only_if_that" matches /match_this(?= but_only_if_that)/ but "match_this and_not_anything_else" doesn't.

• "negated lookahead" "x(?!y)" which inverts the lookahead logic, so "match_this but_only_if_that" /match_this(?! but_only_if_that)/ does NOT match, but "match_this and_not_anything_else" DOES match.

P.S. As an "average" WikiPedia user, and a hobby developer, I came here rather than trying to find all the various application-centric pages of each implementation.

P.P.S. You can use my examples, or modify them, for the Wiki page, if someone else can also verify their technical correctness.

Warp9pnt9 (talk) 16:15, 4 June 2016 (UTC)
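As a partial check of the three examples above, here is a minimal sketch assuming Python's re module, whose syntax for these constructs matches the PCRE-style notation used in the comment. The test strings are the ones given above; the variable names are only illustrative.

```python
import re

with_prefix = "discard_this match_everything_else"
without_prefix = "match_everything_else"
followed = "match_this but_only_if_that"
not_followed = "match_this and_not_anything_else"

# Non-capturing group: groups the alternation without creating a capture.
non_capturing = re.compile(r"(?:|discard_this )match_everything_else")
nc_results = [bool(non_capturing.fullmatch(s)) for s in (with_prefix, without_prefix)]
nc_group_count = non_capturing.groups  # number of capturing groups

# Lookahead: "match_this" only if followed by " but_only_if_that".
lookahead = re.compile(r"match_this(?= but_only_if_that)")
la_hit = lookahead.search(followed)
la_miss = lookahead.search(not_followed)

# Negated lookahead: "match_this" only if NOT followed by that text.
negated = re.compile(r"match_this(?! but_only_if_that)")
neg_miss = negated.search(followed)
neg_hit = negated.search(not_followed)

print(nc_results, nc_group_count)  # [True, True] 0
print(la_hit.group(0), la_miss)    # match_this None
print(neg_miss, neg_hit.group(0))  # None match_this
```

Note that the lookahead text is not part of the match: in both successful cases the matched text is just "match_this".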

Which: "?" or "="

I'm not an expert on regular expressions so don't want to edit myself. In the table of metacharacters there are two instances of "?". Should the second be "=" ? SolarMcPanel (talk) 11:12, 2 August 2016 (UTC)

Are you referring to the table at Regular expression#Examples? It has two rows starting:

? Matches the preceding pattern element zero or one time.

? Modifies the *, +, ? or {M,N}'d regex that comes before to match as few times as possible.

They are correct. The first gives H.?e as an example regex where the ? makes the . optional (zero or one occurrences matches). The second regex uses l.+?o where the ? modifies what the preceding + means. On its own, + matches the previous item one or more times, as many as possible. In combination, +? matches the previous item one or more times, as few as possible. Johnuniq (talk) 11:41, 2 August 2016 (UTC)
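The two uses of ? can be seen side by side in a short sketch. It assumes Python's re module and uses "Hello World" as the test string, which is an assumption on my part about what the table's examples are run against.

```python
import re

text = "Hello World"

# First meaning: ? makes the preceding element optional (zero or one time).
optional = re.search(r"H.?e", text).group(0)

# Without the second ?, + is greedy: as many characters as possible.
greedy = re.search(r"l.+o", text).group(0)

# Second meaning: ? after + makes it lazy: as few characters as possible.
lazy = re.search(r"l.+?o", text).group(0)

print(optional)  # He
print(greedy)    # llo Wo
print(lazy)      # llo
```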

Pronunciation of regex

I've always heard and said rejex (I don't speak phonetics) and I'd be very surprised if anyone used a hard g, although it could be argued as logical. Andthepharaohs (talk) 19:12, 9 September 2016 (UTC)

Where did you first hear that? I've always heard/said it with a hard 'g', which as you point out is logical. FusionDude (talk) 20:33, 28 October 2016 (UTC)

Maybe it's a locale thing? In the UK, rejex was the norm (but I did retire 10 years ago and it may have suddenly changed). We might ask Ken Thompson, who wrote the original Unix code, but the usage is in the public domain now. A Google search reveals no consensus. So I would go with Larry Wall, the inventor of Perl, who reckoned, "There's always more than one way to do it". Javalava101 (talk) 12:58, 14 December 2016 (UTC)

Here in Germany in a multilingual work environment I've never heard "rejex" at all. It still sounds most peculiar to me ;-) --Alfe (talk) 13:53, 15 March 2017 (UTC)

DTDs are not regular

The blurb about DTDs is way off base. DTDs are not (in general) regular; they're more like CFGs. DanConnolly (talk) 02:43, 3 October 2017 (UTC)

Too complex

As a PhD in molecular biology, I know about explaining tech stuff (trust me). This article lacks a simple example at the start, and lacks a clear sentence in the intro. Please add simple stuff (if the proverbial mom or dad can't get it, it ain't simple enough).

Something like: "Regex refers both to a theory about how to find certain patterns, and to programs that look for certain patterns. E.g., suppose we want to look for the word "serialize" and some common misspellings, and find "serialize" when it is between 1 and 20 characters from the word "journal"; we would use... blah blah", or something like that. — Preceding unsigned comment added by 64.130.228.122 (talk) 18:06, 29 December 2017 (UTC)

Lazy matching

However, this does not ensure that not the whole sentence is matched in some contexts. The question-mark operator does not change the meaning of the dot operator, so this still can match the quotes in the input. A pattern like ".*?" EOF will still match the whole input if this is the string

"Ganymede," he continued, "is the largest moon in the Solar System." EOF

"However, this does not ensure that not the whole sentence is matched" is either incomprehensible or very poorly phrased (double negation). In any case, ".*" EOF will match this part:

Ganymede," he continued, "is the largest moon in the Solar System.

Whereas ".*?" EOF will match the same thing (lazy/minimal/reluctant matching makes no difference here because there's only one possible match). That is to say NOT the whole input. Urhixidur (talk) 19:40, 22 January 2018 (UTC)
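The claim that lazy matching makes no difference in this case can be checked with a small sketch, assuming Python's re module. The capture groups are added here only to show what the dot part consumes; they are not in the article's pattern.

```python
import re

s = '"Ganymede," he continued, "is the largest moon in the Solar System." EOF'

# With the trailing ' EOF', only the last quote can close the match, so
# greedy and lazy quantifiers end up consuming the same text.
greedy = re.search(r'"(.*)" EOF', s)
lazy = re.search(r'"(.*?)" EOF', s)

# Without the trailing ' EOF', the two quantifiers do differ.
greedy_short = re.search(r'".*"', s).group(0)
lazy_short = re.search(r'".*?"', s).group(0)

print(greedy.group(1) == lazy.group(1))  # True
print(lazy.group(1))
print(lazy_short)  # "Ganymede,"
```

The part consumed by the dot is the text between the first and the last quotation mark, and it contains the inner quotes in both cases.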

poor examples

I came here to get some regex examples and these are too ambiguous https://en.wikipedia.org/wiki/Regular_expression#Formal_definition Examples:

a|b* denotes {ε, "a", "b", "bb", "bbb", …}

(a|b)* denotes the set of all strings with no symbols other than "a" and "b", including the empty string: {ε, "a", "b", "aa", "ab", "ba", "bb", "aaa", …}

Using short strings and not actual words isn't too comprehensible. If I were a computer, examples of how to search binary numbers might be fine, but these examples don't teach the concept. — Preceding unsigned comment added by Jawz101 (talk • contribs) 16:54, 2 April 2018 (UTC)
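For readers who find the set notation opaque, the two languages can be enumerated mechanically. This is a sketch assuming Python's re module; the helper name language and the length bound are hypothetical choices made for the illustration.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=2):
    """List every string over `alphabet` up to `max_len` symbols that the
    pattern matches in full (hypothetical helper for this illustration)."""
    regex = re.compile(pattern)
    candidates = (
        "".join(symbols)
        for n in range(max_len + 1)
        for symbols in product(alphabet, repeat=n)
    )
    return [w for w in candidates if regex.fullmatch(w)]

# a|b*: either a single "a", or any number of "b"s (including none).
print(language(r"a|b*"))    # ['', 'a', 'b', 'bb']

# (a|b)*: any string made only of "a"s and "b"s.
print(language(r"(a|b)*"))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The empty string '' plays the role of ε in the formal notation.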

Mistake in the regex for binary multiples of three

I tried to test the regex for binary multiples of three, but it seems to only provide either 0 or binary numbers that have a 1 at the beginning and the end. This can not be right, since e.g. 1100 is binary for 12 and does not end in a 1. — Preceding unsigned comment added by 2A02:8108:1BF:704E:58DB:9A23:AB51:9406 (talk) 22:28, 3 June 2018 (UTC)

The regular expression for binary multiples of three given is (0|(1(01*0)*1))* (note the asterisk at the end), and that certainly matches "1100": the leading "11" is matched by the 1(01*0)*1 part of the alternative, and the zeroes are matched by the 0 part of the alternative, twice. – Tea2min (talk) 06:30, 4 June 2018 (UTC)
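The expression can also be tested exhaustively over a small range. The sketch below assumes Python's re module and uses fullmatch, since a search for a partial match would succeed trivially on the empty string; the range 0 to 255 is an arbitrary choice.

```python
import re

multiple_of_three = re.compile(r"(0|(1(01*0)*1))*")

def matches(n):
    """True if the binary numeral of n is matched in full."""
    return multiple_of_three.fullmatch(format(n, "b")) is not None

# Compare the regex against ordinary arithmetic for 0..255.
disagreements = [n for n in range(256) if matches(n) != (n % 3 == 0)]

print(matches(12))     # True, since 12 is 1100 in binary
print(disagreements)   # []
```

The original report may have come from testing with a partial-match search rather than a whole-string match, but that is only a guess.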

Typos in the formal definition, or a significant omission of English articles?

(Kleene star) R* denotes the smallest superset of set described by R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating any finite number (including zero) of strings from set described by R. 11:03, 17 October 2018 (UTC)

Since R is subject to change, the bolded material above should read a set described by R, no?
Please inform (and forgive) me if this usage was intentional.

Mczuba (talk) 11:03, 17 October 2018 (UTC)

You are right, there should be an article. Since R describes exactly one set, it should even be "the". Thanks for noticing! - Jochen Burghardt (talk) 15:09, 17 October 2018 (UTC)

Bad choice of example image/regex

The example image's regex uses lookaheads/lookbehinds without them being defined anywhere in the article!

I realise this is a result of edits to the image's original caption over time, but the image should probably be removed, or definitions added. — Preceding unsigned comment added by Swith22 (talk • contribs) 22:16, 2 May 2018 (UTC)

The regex "[^"]*+" with possessive matching does not give a different result than "[^"]*" with lazy matching. Tejasvi Singh Tomar (talk) 13:40, 4 June 2019 (UTC)

"Regex" isn't a universally agreed on distinguishing name

The text suggests that "regular expressions" in the modern software sense aren't actual "regular expressions" in the mathematical sense (this is demonstrably true) and that they are actually "regexes". This is misleading. There are those who use the term "regex" to mean "modern software regular-expression-like engines" (this was first clearly articulated in the early 2000s as far as I know, by Larry Wall when developing Perl 6 Rules and has gained some traction). But it's trivial to find counter-examples in the literature.

Here is a researcher referring to mathematical regular expressions as "regexes":

Yang, Yi-Hua E., and Viktor K. Prasanna. "Space-time tradeoff in regular expression matching with semi-deterministic finite automata." 2011 Proceedings IEEE INFOCOM. IEEE, 2011.

And here is a Google patent that refers to software regular expressions with seemingly arbitrary alternation between the two terms:

Chen, Jian, and Xinyu Hu. "Regular expression matching method and system." U.S. Patent No. 8,756,170. 17 Jun. 2014.

You can see that there's just no consensus in Google Scholar search (results above from page 2) and Wikipedia really should not be used to try to assert one where it does not exist...

-Miskaton (talk) 20:32, 7 August 2019 (UTC)

promotional edits for the RE2 library

See the WP:MOS for guidance on See also. This is a topic on regular expressions, which is not the same as a list of applications which implement regular expressions. TEDickey (talk) 11:10, 28 November 2019 (UTC)

What authority has standardized regular expressions

The Patterns section says standard textual syntax. I know that there are definitions of regular expressions within the POSIX standard but that is for within POSIX. Where is the authority saying that the definition within the POSIX standard applies outside of POSIX? Sam Tomato (talk) 21:40, 19 March 2020 (UTC)

Perl and Java are agnostic on encodings

@Irontitan76: Your edit (diff) inserted "of" in the following:

In contrast, Perl and Java are agnostic on encodings, instead of operating on decoded characters internally.

That totally changes the meaning. Are you sure about that? The original seems much more likely to me. Johnuniq (talk) 06:33, 29 October 2020 (UTC)

@Johnuniq: You're completely right and I completely made a mistake. I reverted the change. Irontitan76 (talk) 14:13, 29 October 2020 (UTC)

Not operator !

There is no discussion here of the 'not' operator. It is fleetingly shown in the mention of assertions. I would expect it to be in the metacharacter list as well. Neils51 (talk) 01:23, 26 November 2020 (UTC)
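For reference, a minimal sketch of the two places where negation usually shows up in regex syntax, assuming the comment above means the negated character class `[^...]` and the negative lookahead `(?!...)`. The sample strings are made up for illustration, and Python's `re` module stands in for whichever flavour the article describes.

```python
import re

# Negated character class: [^0-9] matches any single character that is not a digit.
non_digit_runs = re.findall(r"[^0-9]+", "ab12cd")

# Negative lookahead: q(?!u) matches a "q" only when the next character is not "u".
# The lookahead itself consumes nothing; it only checks what follows.
q_without_u_positions = [m.start() for m in re.finditer(r"q(?!u)", "Iraq quit")]

print(non_digit_runs)
print(q_without_u_positions)
```

Note that `!` has no meaning on its own in these flavours; it only acts as "not" inside the lookaround constructs `(?!...)` and `(?<!...)`.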

Pronunciation of the word Regex

I think it's worth touching on the pronunciation of the word, as it seems that some people pronounce it "redge-ex" and others pronounce it "regg-ex". The former seems to be linguistically more natural, since there is no glottal stop in the middle of the word, and seems to be used more commonly; however, some sources suggest it is pronounced the way indicated in the latter example.

Thoughts? Nabeel_co (talk) 05:15, 25 February 2021 (UTC)

I think there are many such technical constructed words where pronunciations vary, and I don't think we need to cover that. Unless there's a good source (preferably even one discussing the pronunciation rather than merely stating an opinion on it), I don't think we can cover that.--Nø (talk) 09:45, 25 February 2021 (UTC)

most people I know pronounce it "redge-ex" OsamaBinLogin (talk) 06:15, 9 January 2022 (UTC)

there's no definitive answer and no practical way of getting a majority opinion, so best not mentioned. (I and most people I know use "reggex", the logic is that it's the first two syllables from "regular expression") — Preceding unsigned comment added by Mhkay (talk • contribs) 10:53, 31 January 2022 (UTC)

infinite number of equivalent regexes

The text states, without citation: "In most formalisms, if there exists at least one regular expression that matches a particular set then there exists an infinite number of other regular expressions that also match it—the specification is not unique." Well yes, X|X matches the same set of strings as X, as does X|X|X etc. But is this worth saying, does it matter, and if it is worth saying, shouldn't there be a citation? 82.152.109.221 (talk) 10:35, 31 January 2022 (UTC)

Agreed this is a bit weird. I edited to clarify the offending sentence. I don't think it needs an inline citation as it is pretty self-evident. Caleb Stanford (talk) 22:45, 31 January 2022 (UTC)
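A small sketch of the X|X observation, for anyone who wants to check it mechanically. It compares the strings matched by a pattern and by its repeated alternations over a bounded set of candidates, so it illustrates the claim rather than proving it. The helper name `matched_set`, the pattern `a+b`, and the length bound are all hypothetical choices made for this example.

```python
import re
from itertools import product


def matched_set(pattern, alphabet="ab", max_len=4):
    """Return every string over `alphabet` up to `max_len` that `pattern` fully matches."""
    compiled = re.compile(pattern)
    candidates = (
        "".join(chars)
        for length in range(max_len + 1)
        for chars in product(alphabet, repeat=length)
    )
    return {s for s in candidates if compiled.fullmatch(s)}


base = "a+b"
# X, X|X, X|X|X: alternation of a pattern with itself adds no new strings.
variants = [base, f"{base}|{base}", f"{base}|{base}|{base}"]
sets = [matched_set(p) for p in variants]
all_equal = all(s == sets[0] for s in sets)
print(all_equal, sorted(sets[0]))
```

Since the alternation can be repeated any number of times, the same construction yields arbitrarily many distinct expressions for one language.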

find and replace redirects here

Find and replace is a related, but quite different, concept. It should be a separate article, discussing the usage of find and replace in UIs and text editors. Caleb Stanford (talk) 00:57, 9 December 2021 (UTC)

For what it's worth, Find and replace was turned into a redirect to Regular expression in 2012, following a regular discussion, see Wikipedia:Articles for deletion/Find and replace. The last version before it was turned into a redirect is here. —Tea2min (talk) 07:58, 9 December 2021 (UTC)

For what it's worth, quite a lot I think, Find and replace is now (since sometime in March 2022) a disambiguation page. I was unfortunate to read the linked Afd page and was wondering at the sheer _stupidity_ expressed there, before I found out that brighter minds had already fixed the problem (if only fairly recently.) No, I can't be bothered to log in, already wasted enough time on WP today. 5.186.55.135 (talk) 15:36, 10 May 2022 (UTC)

importance

@Tacsipacsi: Re Special:Diff/1092712430/1092785488 -- makes sense to me! Thanks Caleb Stanford (talk) 22:02, 12 June 2022 (UTC)

I don't think I agree though; WP Computing has a broader scope than WP Computer Science. Wikipedia:WikiProject Computing/Assessment § Importance says that Top-importance is for "Essential network technology and protocols (...)". Regular expressions are used a lot but I wouldn't say that they are essential to computing in general. ―Jochem van Hees (talk) 22:41, 14 June 2022 (UTC)

Perl as Regex Pioneer

I was *really* surprised that the list of programming languages supporting regular expressions does not include Perl, considering that several of the languages listed use PCRE (Perl Compatible Regular Expressions) under the hood and that most regex packages these days follow the regex syntax conventions pioneered by Larry Wall in Perl. Perl is one of the pioneers of regexes in programming languages (along with Awk) and the 800-pound gorilla among regex-supporting programming languages. It is really strange to see it not listed explicitly. I am going to add it. — Preceding unsigned comment added by 121.7.90.69 (talk) 04:07, 4 February 2023 (UTC)

Bad writing style in Unicode>Normalization

The paragraph waits until the end to tell what normalization is, when it should be put in easy words at the beginning of the text.

I tried to understand the meaning of Normalization in the context of RegEx. But when I read the paragraph at Regular_expression#Unicode at the point Normalization it was telling me about Unicode and some Typewriter history just to end with the final words [...] is normalization.

Better writing style would be: Normalization means something something. And then go into examples and history lessons. GavriilaDmitriev (talk • they/them) 03:10, 24 April 2023 (UTC)

I need a regular expression for this

15. 12. 1983 is the original, but I want it like this: 15.12.1983. In the AWB advanced settings, I am putting (\d{1,2}.\s\d{1,2}.\s\d{4}) in the find box and (\d{1,2}.\d{1,2}.\d{4}) in the replace box, but it's not working.--Tmamatha (talk) 07:11, 22 June 2023 (UTC)

Please ask perhaps somewhere linked from WP:AWB. Or, try WP:VPT. Johnuniq (talk) 07:22, 22 June 2023 (UTC)

@Tmamatha: . is a metacharacter, so needs to be escaped with a \; try (\d{1,2}\.)\s+(\d{1,2}\.)\s+(\d{4})

Try [1] for a regular expression tester with explanation. Bazza (talk) 08:54, 22 June 2023 (UTC)
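As a rough illustration of the suggested pattern, here is a sketch in Python rather than AWB. The pattern is the one given above; the helper name `compact_date` and the sample text are assumptions for the example. The replacement side has to refer to the captured groups rather than repeat the pattern. Python writes those references as `\1\2\3`, while the AWB replace box likely expects a different notation for the same groups, which is worth checking in its documentation.

```python
import re

# Three capture groups: day with its dot, month with its dot, four-digit year.
# The dots are escaped so they match a literal "." rather than any character.
DATE_WITH_SPACES = re.compile(r"(\d{1,2}\.)\s+(\d{1,2}\.)\s+(\d{4})")


def compact_date(text: str) -> str:
    """Rejoin the captured groups without the whitespace between them."""
    return DATE_WITH_SPACES.sub(r"\1\2\3", text)


print(compact_date("Born 15. 12. 1983 in Prague"))
```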