Wikipedia:Reference desk/Archives/Computing/2020 October 12

Computing desk
< October 11 << Sep | October | Nov >> October 13 >
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


October 12

edit

Java regex help

edit

I am making a Java program that will parse PGN files. I need to make a regex that will identify a valid move in Algebraic notation. The way I understand it, it will need to match Strings that meet the following criteria in this order:

  1. (Optional) One of the following uppercase letters: K, Q, R, B, N
  2. (Optional) One of the following lowercase letters: a, b, c, d, e, f, g, h
  3. (Optional) The lowercase letter x
  4. A valid location on a chess board, such as e4 or c7. This is not optional
  5. (Optional) An equals sign (=) followed by one of the following uppercase letters: Q, R, B, N

I am new to Java regex and I'm having some difficulty making this. I would appreciate if someone can point me in the right direction. Keep in mind, I intend to first trim the following trailing symbols from each token: !, ?, +, #, =, and others, so there is no need to test for those. --PuzzledvegetableIs it teatime already? 01:24, 12 October 2020 (UTC) + minor edit --PuzzledvegetableIs it teatime already? 14:38, 12 October 2020 (UTC)[reply]

I don't know anything about Java, but I don't think you should use a regex for this
Maybe you should use a separate function instead? (deleted lots of text here, about examples: "e8, xe5, R1c4, Qa1xc3")
I found something from google that might help you: https://stackoverflow.com/questions/40007937/regex-help-for-chess-moves-san
 AltoStev Talk 06:20, 12 October 2020 (UTC)[reply]
You could, but it's not LL(1) because of the disambiguation part (#2), so a recursive-descent parser would need lookahead or backtracking, or an ugly conversion. fiveby(zero) 06:49, 12 October 2020 (UTC)[reply]
How about:
(K|Q|R|B|N)?((?:[a-h](?:[1-8])?)|[1-8])?(x)?([a-h][1-8])(=(?:Q|R|B|N))?
or
(K|Q|R|B|N)?((?:[a-h](?:[1-8])?)|[1-8])?(x)?([a-h][1-8])(=(?:Q|R|B|N))?(\+|#)?(!!|\?\?|!\?|\?!|!|\?)?
if decide not to trim the trailing chars. Paste into https://regexr.com to test and mouse over for explanation. You can click on details and enter moves into the text box to see what the capturing groups are doing. That's my reading of a SAN move from the PGN reference, added rank and file+rank disambiguation to your description. Didn't see '=' as a valid move suffix, and there isn't a stalemate indicator. Depending how you parsing you might want separate regex's for castles (O-O-O) rather than extend the above. By the way someone already converted the formal grammar to ANTLR. fiveby(zero) 06:30, 12 October 2020 (UTC)[reply]
Thanks for all your help. While I have you here, how are people learning regex? It seems so complicated. Where should I start if I want to learn this? I can't just post a question here every time I need to use one. --PuzzledvegetableIs it teatime already? 14:37, 12 October 2020 (UTC)[reply]
The reference for java regex is in the javadoc for java.util.regex.Pattern. That site https://regexr.com/ is great for testing and experimenting with regular expressions. I suspect the above would look pretty straightforward if not for the capturing groups. Start with a simple expression such as [a-h][1-8], then try it out in the expression field at regexr.com and look at the explanation provided. Then gradually add more to the expression: (x)?([a-h][1-8]), and try it out. You usually don't need the full power of regular expressions to accomplish what you need, the simple constructs go a long way. By all means, feel free to ask more questions. fiveby(zero) 15:25, 12 October 2020 (UTC)[reply]
Regexes are great, but they're not the solution to every text processing problem. You state that you're writing a parser. Regexes are not what you use to parse a language. You need to read a theoretical computer science book or two on formal languages and parsers. This is assuming you're doing this as self-teaching. If you just have a problem you want to solve, then what you want is to use existing code (there are numerous software packages for handling PGN, many of which are linked in the article), or feed the formal language definition to a parser generator like ANTLR (as mentioned) and have it generate one for you. --05:46, 16 October 2020 (UTC)