Wikipedia talk:WikiProject Chemistry/CAS validation/CASCommons

The CAS registry number validation results are based on the following idea:

  • CAS numbers are validated against the 2021 CAS Common Chemistry database, based on an InChIKey match
  • validated numbers are detected on Wikipedia using a substring match
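
The two steps above can be sketched as follows. This is only a minimal illustration of one reading of the idea, not the actual validation pipeline: the `COMMON_CHEMISTRY` dictionary, the function names and the sample wikitext are all assumptions made for the example.

```python
# Hypothetical stand-in for CAS Common Chemistry: CAS RN -> InChIKey.
COMMON_CHEMISTRY = {
    "7732-18-5": "XLYOFNOQVPJJNP-UHFFFAOYSA-N",  # water
    "64-17-5": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",    # ethanol
}


def is_validated(cas, inchikey, reference=COMMON_CHEMISTRY):
    """Step 1: the CAS RN counts as validated if the reference data
    lists the same InChIKey for it (assumed matching rule)."""
    return reference.get(cas) == inchikey


def found_on_page(cas, wikitext):
    """Step 2: plain substring match of the CAS RN in the page text."""
    return cas in wikitext


def classify(cas, inchikey, wikitext):
    """Combine both steps into one of three outcomes."""
    if not is_validated(cas, inchikey):
        return "not validated"
    return "matching" if found_on_page(cas, wikitext) else "not matching"


print(classify("7732-18-5", "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "| CASNo = 7732-18-5"))
```

Because the second step is a plain substring test, a number such as 64-17-5 would also be found inside a longer string like 164-17-5, so a hit is a strong hint rather than proof.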

Therefore, a missing CAS number can have multiple causes. The "category" pages will likely have to be resolved; there, the CAS number missing from Wikipedia indicates that the Wikidata item and the Wikipedia page describe different things: in particular, the sitelink is likely wrong.

Curation Feedback

The expectations below may be incorrect. If you have analyzed the situation and the given CAS number is not correct for the Wikipedia entry, one of the following could be the case:

  1. the Wikipedia-Wikidata sitelink is incorrect (e.g. stereochemistry mismatch)
  2. the InChIKey used for the match is incorrect.
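
One way to spot the first case is to compare two InChIKeys block by block. The first 14 characters encode the connectivity, so keys that agree there but differ in the second block typically describe the same skeleton with different stereochemical (or isotopic) information. A minimal sketch, in which the function name is hypothetical and the alanine keys are used purely for illustration:

```python
def compare_inchikeys(key_a, key_b):
    """Classify how two InChIKeys relate, using only their hyphen-separated blocks."""
    if key_a == key_b:
        return "identical"
    # First block (14 characters): hash of the connectivity layer.
    if key_a.split("-")[0] == key_b.split("-")[0]:
        return "same skeleton, different stereo/isotope layer"
    return "different compounds"


L_ALANINE = "QNAYBMKLOCPYGJ-REOHZRGXSA-N"
D_ALANINE = "QNAYBMKLOCPYGJ-UWTATZPHSA-N"
WATER = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"

print(compare_inchikeys(L_ALANINE, D_ALANINE))
print(compare_inchikeys(L_ALANINE, WATER))
```

A "same skeleton" result for a Wikipedia page and its Wikidata item would suggest checking whether the sitelink points to the wrong stereoisomer.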

Please report such cases below under 'Curation notes', which I will monitor.

2021-05-14

The Wikipedia ChemBox has been updated to make use of verified CAS RNs in Wikidata. The "Not matching" list below should get a lot shorter.

2021-04-18

About 13 thousand validated CAS registry numbers in Wikidata linked to Wikipedia pages (https://w.wiki/3A88).

12291 CAS registry numbers in Wikipedia match Wikidata and CAS Common Chemistry. 595 do not, below.

Not matching

Curation notes

  • Bismuth oxynitrate is about a family of related chemicals. The commonchemistry entry is for a substance that was believed to exist in olden times, but probably does not exist. Probably we need new wikidata items for each substance and one for the group of things with the same name. Graeme Bartlett (talk) 23:01, 7 May 2021 (UTC)
Done. --Egon Willighagen (talk) 21:50, 14 May 2021 (UTC)
  • Wikidata entry for Polystyrene sulfonate is clearly wrong, having ids for the monomer, but the linked articles are all about the polymer. 9002-23-7 is not checkable based on inchi key or name, but might be right. Graeme Bartlett (talk) 00:34, 8 May 2021 (UTC)
Being fixed in Wikidata. --Egon Willighagen (talk) 21:15, 14 May 2021 (UTC)
Done. --Egon Willighagen (talk) 21:50, 14 May 2021 (UTC)
  • Nebivolol CAS entry, 118457-14-0, has a specific stereoisomer specified, which agrees with KEGG and chembl. 99200-09-6 has no stereocentres specified, but matches wikidata, drugbank, chemspider, pubchem and chebi. 118457-14-0 matches the name on common chemistry, unlike the so-called validated 99200-09-6. So we have to decide if the drug is one particular stereoisomer or not. Graeme Bartlett (talk) 23:26, 14 May 2021 (UTC)

2021-04-11

About 13 thousand validated CAS registry numbers in Wikidata linked to Wikipedia pages (https://w.wiki/3A88). 12885 CAS registry numbers in Wikipedia match Wikidata and CAS Common Chemistry. 612 do not, below. Therefore, the current estimate is that at least 95.3% are correct.

2021-04-10

89022 validated CAS registry numbers in Wikidata linked to Wikipedia pages (https://w.wiki/3A88). 8115 CAS registry numbers in Wikipedia match Wikidata and CAS Common Chemistry. 430 do not, below. Therefore, the current estimate is that at least 95.0% are correct.

2021-04-08

22143 validated CAS registry numbers in Wikidata linked to Wikipedia pages. 1978 CAS registry numbers in Wikipedia match Wikidata and CAS Common Chemistry. 86 do not, below. Therefore, the current estimate is that at least 95.8% are correct.
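
The estimate appears to be the share of matching numbers among all numbers that were compared. A short sketch of that arithmetic, assuming this reading is right (the function name is made up for the example):

```python
def estimate_correct(matching, not_matching):
    """Percentage of compared CAS numbers that match, rounded to one decimal."""
    return round(100 * matching / (matching + not_matching), 1)


print(estimate_correct(1978, 86))   # 2021-04-08 figures -> 95.8
print(estimate_correct(8115, 430))  # 2021-04-10 figures -> 95.0
```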

2021-04-06

5678 validated CAS registry numbers in Wikidata linked to Wikipedia pages. 552 CAS registry numbers in Wikipedia match Wikidata and CAS Common Chemistry. 21 do not, below.