Wikipedia:Reference desk/Archives/Computing/2022 December 24

Computing desk
< December 23 << Nov | December | Jan >> Current desk >
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


December 24

edit

Four letter word game

edit

(NB not four-letter word, and not Wordle, but possibly related.) ==Background:== Some time ago on a now-defunct web forum I used to play a word game along the lines of "change 'cat' into 'dog' in five moves" (eg cat-bat-bag-bog-dog), but with four letters. I became interested in the underlying mechanics of four-word lists, since there are distinct sets of words that can only be changed into other words within that set. The majority of words (I seem to remember) are capable of being changed into many other words, but others are only capable of a few changes, and perhaps? some only in a straight line (eg A→only B→only C→only D).
Back then I set a challenge (obviously not as a cheat for the game) to come up with a computing technique to produce statistics of the distribution, and results about the sub-sets, answers to eg "By changing a single letter, which word/s have the most direct connections to any other word?" and "are there any words which cannot be changed into any other word?", or "Are there any pairs of words which only have mutual (ie one-to-one) interchangeability?" I got a number of really interested responses from some talented coders, but no-one would say how they achieved their results. I obviously had to be very specific in my questions to get the right information. There are around 4,000 to 5,000 4-letter words in English, depending on proper names etc. Scrabble list (3,997), here's one with 5,272.

So, my quer(ies): How might this task be best achieved - C++, SQL, Unix shell programming (grep etc.)? What sort of pseudo-code might it entail? Has this sort of thing been tried much previously? Does it have any wider application? What might it reveal about language (or anything else) in general? MinorProphet (talk) 15:58, 24 December 2022 (UTC)[reply]

Some if not most of these I’d program in Python. Given a four-letter word, say upon, you can make 25 different words by changing the first letter (giving apon to zpon), the second letter (uaon to uzon), and likewise for the third and fourth letters, giving 100 words in total. Using random-access lists, you can look each up in a dictionary of all (valid) four-letter words to see whether it is a valid English word, and count the number of hits, a number that must lie in the range from 0 to 100. BTW, I think upon scores 0; it is isolated. Viewing the words as the vertices of a graph whose edges represent one-letter changes, you can use graph-theoretical libraries for many questions, such as the number of connected components, the largest connected component, the pair of connected vertices with the greatest distance between them, and the longest (simple) path.  --Lambiam 20:38, 24 December 2022 (UTC)[reply]
@MinorProphet: You could adapt this Python script (it needs a text file english.txt with one word per line, in uppercase or lowercase). Good luck! cmɢʟeeτaʟκ 13:36, 27 December 2022 (UTC)[reply]
#!/usr/bin/env python

import collections

path_input = 'english.txt'
n_char_max = 15

class Word_ladder:
  def __init__(self, path_input):
    self.wordss = [[] for n_char in range(n_char_max + 1)] ## [n_char][i_word]
    with open(path_input) as f:
      for line in f.readlines():
        word = line.strip().lower()
        self.wordss[len(word)].append(word)

  def find_ladders(self, word_source, word_target=None):
    n_char = len(word_source)
    if (not word_target is None):
      if (    word_target  == word_source       ): return [word_source]
      if (len(word_target) != n_char            ): return 'words have different length'
      if (not word_target in self.wordss[n_char]): return '%s is not in word list' % (word_target)
      if (not word_source in self.wordss[n_char]): return '%s is not in word list' % (word_source)
    queue = collections.deque()
    queue.append(word_source)
    ladderss = { word_source:[] }
    is_done  = False
    while len(queue) > 0:
      word_queue = queue.popleft()
      ladders    = ladderss[word_queue]
      for word in self.wordss[n_char]:
        if (not word in ladderss and self.is_rung(word_queue, word)):
          queue.append(word)
          ladderss[word] = ladders + [word]
          if (word == word_target): is_done = True; break
      if (is_done): break
    if (not word_target is None):
      if (word_target in ladderss): return [word_source] + ladderss[word_target]
      else:                         return 'no solution was found'
    else:
      return [[word] + (ladderss[word][-2::-1] + [word_source] if (word in ladderss) else [])
              for word in self.wordss[n_char]]

  @staticmethod
  def is_rung(word_source, word_target):
    n_char_diff = 0
    for (i_char, char_source) in enumerate(word_source):
        if (word_target[i_char] != char_source): n_char_diff += 1
    return (n_char_diff == 1)

word_ladder = Word_ladder(path_input)
print(word_ladder.find_ladders('head', 'tail' ))
print(word_ladder.find_ladders('cold', 'warm' ))
Thanks all for your interesting[1] and otherwise helpful replies, I best brush up my Python skills. MinorProphet (talk) 08:58, 28 December 2022 (UTC)[reply]

Spell checking

edit

Spell check in, say, e-mail or Word, etc. is great, but it's always a bit limited. My typing is atrocious, so I'm constantly getting words like "teh" and "ti" in my text. Spellcheck programs, including the one I'm using right now in Firefox, only recognize one of those two as an error though, because "ti" is a real word - a real word that gets used in a few very specific use cases and kind of nowhere else besides the text of bad typists. Are there more sophisticated spellcheck dictionaries that know or that could be tweaked to know that, for most people at most times, "ti" is an error just the same as "teh"? I hate errors in my writing, so I try to check my writing carefully, but it would be great if Outlook or whatever knew that some words, although not wrong, are rarely used and therefore possibly errors. Are those options out there? Like, instead of highlighting in red for an error, they highlight in yellow as a rare or specialized word? Matt Deres (talk) 19:11, 24 December 2022 (UTC)[reply]

I don't know any, but here is something I witnessed the other day. Someone had written “there parents”. When their attention was drawn to the fact this was not the correct spelling, they promptly corrected it to “they're parents”. But in the context, “[the children and] th*r* parents”, the correct spelling was “their”.  --Lambiam 20:53, 24 December 2022 (UTC)[reply]
FYI:
their
there
they're
This user thinks that there are too many people who don’t know that they're worse than their own children at spelling!
Martin of Sheffield (talk) 22:23, 24 December 2022 (UTC)[reply]
A good spellchecker will allow you to delete unwanted words from its dictionary as well as add new ones like proper nouns. I haven't figured out how to do this in MS Word, but it certainly allows you to autocorrect words like ti and teh as you type. Checkout File > Options > Proofing > AutoCorrect options. Shantavira|feed me 09:16, 25 December 2022 (UTC)[reply]
A grammar checker is what you need. MS Word seems to have it built-in, but you may need to enable it:
https://support.microsoft.com/en-us/office/check-spelling-and-grammar-in-office-5cdeced7-d81d-47de-9096-efd0ee909227
It's rather pedantic, though, and flags non-standard or informal language. cmɢʟeeτaʟκ 08:38, 26 December 2022 (UTC)[reply]
I tried a grammar checker for some time but turned it off, as it was only producing false positives.  --Lambiam 11:03, 26 December 2022 (UTC)[reply]
I doubt a grammar checker would work, unless they've gotten a lot smarter recently. It and ti both function as nouns, so it would have to be very smart to catch when one is used in place of the other. Matt Deres (talk) 02:05, 30 December 2022 (UTC)[reply]