Data mining


Please provide a source for your changes. The given (well-sourced) definition of "data mining" even contains the word "databases", and is distinct from machine learning (which is why machine learning is included in "not to be confused with"). The fields are related, but not the same (otherwise, we wouldn't need a separate article!)

http://www.sigkdd.org/curriculum.php

"Database and Data Management Issues" is even the first component listed in the "data mining curriculum" proposal. Also note the role of "machine learning".

Please provide appropriate sources for your changes. --Chire (talk) 09:20, 27 October 2011 (UTC)

You are still mixing things up. Please try to separate them instead: what is machine learning, what is data mining, and what is the difference?
Obviously (and this was already in the article) data mining is closely related to machine learning, and they often share the same methods (which mostly come from AI and statistics, actually).
Also check out the references that are already in there! Fayyad essentially has defined the term. And check the definition used by Encyclopedia Britannica: [1]:
"The field combines tools from statistics and artificial intelligence (such as neural networks and machine learning) with database management"
Are you really sure that you are using the term properly, and not just because "data mining" sounds cooler than "machine learning" or "artificial intelligence", in particular to business users?
Also, putting backpropagation into the introduction of data mining, for example, does not make the article clearer or more readable. It may be an interesting topic for you, but it is primarily an artificial intelligence topic and should be discussed there. Instead of throwing everything related to data mining at the reader, we should try to help them understand when it is better to call things machine learning, artificial intelligence or (in a database context) data mining.
P.S. Check your reference syntax. There is an error reported from the references you added! --Chire (talk) 11:35, 27 October 2011 (UTC)
Neural networks are a prime example of a technique that clearly is AI, and is just being used in ML and DM. But for actual data mining, there are a lot of extra requirements. This is why true data mining techniques have strong ties to database management for efficiency, and AI and ML are usually listed as a kind of "preexisting technique". PAMI is indeed an authority in the "pure AI" field, but for the data mining area, SIGKDD, ICDE, TKDE and VLDB are the top authorities. AI people tend to downplay (in an arrogant and often insulting way) the efforts of data miners. But in fact, it is a different (although obviously not disjoint) community. Some examples that support this view by listing the fields in parallel:
* Computer Science conference ranking (note that they have: Databases, Data Mining, AI (including ML), NLP, IR + others such as web)
* Microsoft academic search has computer science subdomains differentiating between "AI", "Databases", "Data mining", "Machine learning and pattern recognition", "Information Retrieval".
* Arnetminer: KDD, ICDM, PAKDD, SDM.
* You will find plenty of formulations such as "Machine Learning and Data Mining" (so they are not redundant)
Only when you leave out the "databases" component of data mining does it essentially become statistics + AI + ML.
P.S.: See also Microsoft's data mining conference ranking and data mining journal ranking. --Chire (talk) 07:07, 28 October 2011 (UTC)

re: by analogy


Well, this seems to be your opinion. I've given numerous sources, including Encyclopedia Britannica, Microsoft Academic, ACM's SIGKDD, conference rankings, and IEEE. And Wikipedia is about reliable sources. And even your AAAI source says literally: "Data mining is an AI powered tool that can discover useful information within a database". So: using AI and ML within a database. This is the key difference between AI/ML and data mining: DM uses essentially the same techniques, but tries to make them work within a database context, for example by exploiting the indexes that are already there for better performance. If you don't do this, you are doing regular AI/ML (and why call it by a different name then, instead of calling it AI and ML?) --Chire (talk) 10:24, 1 November 2011 (UTC)

P.S. The IEEE journal for data mining is IEEE TKDE: "Specific topics include, but are not limited to: (a) knowledge discovery and data mining, (b) data modeling and management, (c) underlying computational platforms for knowledge and data engineering tools, techniques and systems, and (d) emerging knowledge and data engineering applications". If you look at the IEEE web page [2], TKDE sits in parallel to PAMI, which does not mention "data mining" in its "scope" document. --Chire (talk) 10:38, 1 November 2011 (UTC)

Your recent edits


Hello. In case you didn't know, when you add content to talk pages and Wikipedia pages that have open discussion, you should sign your posts by typing four tildes ( ~~~~ ) at the end of your comment. You could also click on the signature button located above the edit window. This will automatically insert a signature with your username or IP address and the time you posted the comment. This information is useful because other editors will be able to tell who said what, and when. Thank you. --SineBot (talk) 01:18, 28 October 2011 (UTC)