Template:Did you know nominations/Frequent subtree mining

The following is an archived discussion of the DYK nomination of the article below. Please do not modify this page. Subsequent comments should be made on the appropriate discussion page (such as this nomination's talk page, the article's talk page or Wikipedia talk:Did you know), unless there is consensus to re-open the discussion at this page. No further edits should be made to this page.

The result was: promoted by Oceanh (talk) 19:46, 4 July 2014 (UTC)

Frequent subtree mining

Created by APerson (talk). Self nominated at 22:57, 18 June 2014 (UTC).


General: Article is new enough and long enough

Policy compliance:

  • Adequate sourcing: No - each paragraph is indeed cited. However, there are several issues, at least to me; please let me know if these aren't issues for Wikipedia:
    • citations appear to be to academic journals and conferences. Is this ok? I understand that these kinds of sources are generally considered primary and to be avoided. There are no secondary sources such as textbooks.
    • clicking on the link to the first citation (Chi) results in a malicious site warning. And the citation info is wrong: the article is from 2005, not 2001, volume 66, not 21, page 161, not 1001. However, the actual article (I went and got it from the journal website, not the bad link) does correctly include the quote as written.
    • all links are to personal copies of articles rather than the journal/conference website so it is not clear these are the actual published versions
    • only year and page numbers are provided, but in most cases the volume number is also needed to find the article.
  • Neutral: Yes
  • Free of copyright violations, plagiarism, and close paraphrasing: No - ?
    • for citation 2 (Deepak), I can see where it says frequent subtree is better than a maximum agreement subtree but not where it is a subset, as the article says.
    • citation 7 (Xiao) I am not sure says what the article claims for it. Xiao: "The applications of tree mining arise from Web usage mining, mining semi-structured data, and bioinformatics", the article: "Other domains in which frequent subtree mining is useful include ... bioinformatics.".
    • citation 8 (Zou) is a close paraphrase. Zou: "However, because of the combinatorial explosion, mining all frequent subtree patterns becomes infeasible for a large and dense tree database.", the article: "Furthermore, due to combinatorial explosion, frequent subtree mining is often impractical with large, dense tree databases".

Hook eligibility:

  • Cited: No - I am not sure if the hook is backed up by the citations. The source for the RNA claim just seems to say that RNA structures are essentially trees, not that frequent subtree mining is used to analyze them. You might be better off using citation 5 (Chi-Canonical) and its biology information? The web history mining is mentioned in citation 1 (Chi).
  • Interesting: Yes
QPQ: Done.

Overall: Trying out this new template - I'm sorry about all the red X's! I can't see how to remove them. The "y" or "?" are the real evaluations. 142.150.38.155 (talk) 17:00, 2 July 2014 (UTC)

  • I fixed up the template a little. Currently fixing some of the problems with the article. APerson (talk!) 14:42, 3 July 2014 (UTC)
All that I think I need to fix right now is the plagiarism issue. I'm doing that now. APerson (talk!) 15:17, 3 July 2014 (UTC)
Responding to your 3 citation issues:
  • Deepak: The paper states that "This is because an MAST is, by definition, also an FST, because it is supported by every input tree."
  • Xiao: I don't get what is unclear about this. The paper says that the applications of tree mining arise from bioinformatics. Isn't that equivalent to saying that bioinformatics is a domain in which frequent subtree mining is applicable?
  • Zou: I changed the statement in the article to a direct quotation.
So, I think that all the issues are fixed now. APerson (talk!) 15:29, 3 July 2014 (UTC)
  • Above review by the IP, and confirmation of full correction by the nominator, are both taken on trust, especially regarding the specialised subject-matter. I have double-checked the easy bits (newness, length, policy, copyvio etc) and all appears to be OK. It is a pity that a sample Graph isomorphism pattern for this cannot be shown, but it is not necessary for DYK. Grey tick for AGF due to my own subject-ignorance, even though sources are online. Good to go. --Storye book (talk) 17:15, 3 July 2014 (UTC)