Talk:Convolutional neural network

This is the talk page for discussing improvements to the Convolutional neural network article.
This is not a forum for general discussion of the article's subject.

A fact from this article appeared on Wikipedia's Main Page in the "Did you know?" column on December 9, 2013.

The text of the entry was: Did you know ... that convolutional neural networks have achieved performance double that of humans on some image recognition problems?

WikiProject Computing (Mid-importance)

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. This article has been rated as Mid-importance on the project's importance scale.

Feature Maps

Need to introduce what feature maps are for nontechnical readers. — Preceding unsigned comment added by Shsh16 (talk • contribs) 18:24, 15 February 2017 (UTC)

Non-linear Pooling

It says in the article: "Another important concept of CNNs is pooling, which is a form of non-linear down-sampling."

I don't think this is correct. Some pooling techniques, such as average pooling, which is mentioned in this same section, are forms of linear down-sampling. I would remove the "non-linear." 194.117.26.63 (talk) 15:06, 13 May 2016 (UTC)
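
To make the point above concrete, here is a minimal sketch in plain Python. The function names (avg_pool, max_pool, add) and the sample inputs are illustrative assumptions, not taken from the article. A linear map f satisfies f(a + b) = f(a) + f(b); on this small 1-D example, average pooling passes that check and max pooling does not.

```python
def avg_pool(xs, size=2):
    """Average pooling over non-overlapping 1-D windows of length `size`."""
    return [sum(xs[i:i + size]) / size
            for i in range(0, len(xs) - size + 1, size)]


def max_pool(xs, size=2):
    """Max pooling over non-overlapping 1-D windows of length `size`."""
    return [max(xs[i:i + size])
            for i in range(0, len(xs) - size + 1, size)]


def add(a, b):
    """Element-wise sum of two equal-length lists."""
    return [x + y for x, y in zip(a, b)]


# Hypothetical inputs chosen so that the maxima sit at different positions.
a = [1.0, -2.0, 3.0, 0.0]
b = [-1.0, 4.0, -3.0, 2.0]

# Additivity test: does pooling the sum equal the sum of the pooled inputs?
avg_is_additive = avg_pool(add(a, b)) == add(avg_pool(a), avg_pool(b))
max_is_additive = max_pool(add(a, b)) == add(max_pool(a), max_pool(b))
```

A single counterexample is enough to show that max pooling is not linear; the passing check for average pooling is only an illustration, although the general property follows from the linearity of the mean.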

Plagiarism in "Layer patterns"

The text seems to be copied from https://cs231n.github.io/convolutional-networks/#layerpat without any attribution. — Preceding unsigned comment added by Jkoab (talk • contribs) 01:41, 8 June 2016 (UTC)

Indeed. Deleted copyvio text, see below. Maproom (talk) 09:55, 8 June 2016 (UTC)

Copyright problem removed


Prior content in this article duplicated one or more previously published sources. The material was copied from: https://cs231n.github.io/convolutional-networks/#layerpat. Copied or closely paraphrased material has been rewritten or removed and must not be restored, unless it is duly released under a compatible license. (For more information, please see "using copyrighted works from others" if you are not the copyright holder of this material, or "donating copyrighted materials" if you are.)

For legal reasons, we cannot accept copyrighted text or images borrowed from other web sites or published material; such additions will be deleted. Contributors may use copyrighted publications as a source of information, and, if allowed under fair use, may copy sentences and phrases, provided they are included in quotation marks and referenced properly. The material may also be rewritten, providing it does not infringe on the copyright of the original or plagiarize from that source. Therefore, such paraphrased portions must provide their source. Please see our guideline on non-free text for how to properly implement limited quotations of copyrighted text. Wikipedia takes copyright violations very seriously, and persistent violators will be blocked from editing. While we appreciate contributions, we must require all contributors to understand and comply with these policies. Thank you. Maproom (talk) 09:55, 8 June 2016 (UTC)

Suggestion: Move the section "Regularization methods" to a new page

The methods listed in that section apply to deep learning in general, not only to CNNs. The section should be moved to a new page. OhadRubin (talk) 06:38, 27 November 2018 (UTC)

Parameter Sharing Clarifications

The "Parameter sharing" section uses the phrase "relax the parameter sharing scheme", but it is unclear what this actually means. — Preceding unsigned comment added by Ephsc (talk • contribs) 16:22, 27 September 2019 (UTC)

What is convolutional about a convolutional neural network?

The article fails to explain in any meaningful way what the connection between CNNs and convolutions is. In particular, convolutions don't act on vectors; they act on functions. Comparing with the equation on the page for convolutions, there's obviously something analogous. --Stellaathena (talk) 16:51, 14 December 2020 (UTC)

It's actually the DSP version of a cross-correlation, not a convolution. It's a misnomer to call it convolution. -AS

Inaccurate information about Convolutional layers

Convolutional layers do not do convolutions. They do what is called "cross-correlation" in DSP, which is different from the statistics definition of cross-correlation. https://en.wikipedia.org/wiki/Cross-correlation

This article says multiple times that the convolution operation is being done, and it links to the convolution article https://en.wikipedia.org/wiki/Convolution

This is misleading because the layers do not perform the operation described in the linked convolution article. They perform the operation described in the cross-correlation article. -AS
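
For readers following this thread, the distinction can be sketched in a few lines of plain Python. The function names and the sample signal and kernel are illustrative assumptions, and only the 1-D "valid" case (no padding) is shown. Cross-correlation slides the kernel over the signal as it is, while convolution in the mathematical sense reverses the kernel first.

```python
def cross_correlate(signal, kernel):
    """Slide the kernel over the signal without flipping it (valid positions only)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def convolve(signal, kernel):
    """Discrete convolution (valid positions only): cross-correlation with a reversed kernel."""
    return cross_correlate(signal, kernel[::-1])


# Hypothetical data: an asymmetric kernel makes the two operations differ.
signal = [1, 2, 3, 4, 5]
asymmetric = [1, 0, -1]
symmetric = [1, 2, 1]

corr = cross_correlate(signal, asymmetric)  # [-2, -2, -2]
conv = convolve(signal, asymmetric)         # [2, 2, 2]

# With a symmetric kernel, reversing changes nothing, so the results agree.
same_for_symmetric = cross_correlate(signal, symmetric) == convolve(signal, symmetric)
```

Because the kernel weights in a CNN are learned, a layer that uses cross-correlation can learn the reversed version of any kernel that a true convolution would use, which is probably why the two terms are often used interchangeably in the machine learning literature.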

Inaccurate information: Convolutional models are not regularized versions of fully connected neural networks

In the second paragraph of the introduction, it is mentioned that "CNNs are regularized versions of multilayer perceptrons." I think this is inaccurate. The entire paragraph describes convolutional models as regularized versions of fully connected models, and I don't think that is a good description. I think the idea of inductive bias would be better than that of regularization to explain convolutions.

I would also suggest merging the section "Definition" into the introduction. The definition section is only two sentences, and it feels like it would be better placed in the introduction.

Misleading use of the term tensor

The article uses the term tensor in the sense of a multi-dimensional array, but the link redirects to the article [1] with the mathematical definition. The term means completely different things in computer science (for example, in the TensorFlow library) and in mathematics. At a minimum, the link should be changed to [2], but it would be better to avoid the ambiguous use of mathematical terminology altogether.

Max 88.201.254.120 (talk) 22:39, 10 April 2022 (UTC)

Merge Architecture and Building Blocks sections

Much overlap with no clear distinction. Lfstevens (talk) 00:36, 7 February 2023 (UTC)

Acronym ANN

The use of the acronym ANN for artificial neural networks is novel to me, and I wonder whether it needlessly clutters the opening sentence. Have others worked in areas where ANN is common? Babajobu (talk) 04:55, 24 March 2023 (UTC)

Article is incomprehensible to the intelligent layman

No blame, it's an excellent start, but I think we can write this so that it's more easily parsed by an intelligent person outside the field who is willing to put in some mental work. Babajobu (talk) 04:57, 24 March 2023 (UTC)

No kidding. Whoever wrote this seemed in a hurry to jump right into how CNNs work and what the technical differences are between CNNs and other machine learning architectures, with numerical examples.

That information does belong here, but further down in the article. This whole thing needs to be rearranged by an Expert who is also a good Explainer, to lead off with answers to simple questions.

What is a CNN?

What problems can it solve that other approaches cannot, or that it solves more efficiently?

Is CNN an example of a wider family of architectures? If so, compare and contrast with its relatives in that family tree.

Some of these answers may already be embedded in the article, but the article makes the reader work too hard to find them.

You gotta tell people where you are taking them, and WHY, before you start describing, in detail, the steps you take to get there. 2601:283:4F81:4B00:35A1:9FF5:C8CF:11AF (talk) 21:10, 28 October 2023 (UTC)

Hyperparameters

I have a question about, or a problem with, the explanation of hyperparameters.

1. Hyperparameters are first explained in the Spatial arrangement subsection of Convolutional layer. Three hyperparameters that affect the output size are listed. I believe kernel size K is missing from this list, even though it is mentioned in the very next paragraph.

2. The Hyperparameters section lists both kernel size and filter size. As I understand it, these two parameters should be the same thing. Additionally, the description of the number of filters uses depth to mean the number of convolutional+pooling layers, whereas the Spatial arrangement subsection (my previous point) uses depth to mean the number of filters. En odveč (talk) 12:32, 30 March 2023 (UTC)
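
To illustrate why kernel size belongs with the hyperparameters that affect output size, here is a minimal sketch of the output-size formula commonly given for convolutional layers, (W - K + 2P) / S + 1, where W is the input width, K the kernel size, P the zero-padding and S the stride. The function name is hypothetical, and rounding down when the stride does not divide evenly is an assumption made here, not something taken from the article.

```python
def conv_output_size(w, k, s=1, p=0):
    """Output width along one spatial dimension.

    w: input width, k: kernel size, s: stride, p: zero-padding per side.
    Assumption: rounds down when (w - k + 2p) is not a multiple of s.
    """
    return (w - k + 2 * p) // s + 1


# Changing only the kernel size changes the output size.
out_k3 = conv_output_size(7, 3)        # 5
out_k5 = conv_output_size(7, 5)        # 3

# Padding of (k - 1) / 2 with stride 1 preserves the input width.
out_same = conv_output_size(32, 5, s=1, p=2)  # 32
```

Note that the number of filters does not appear in the formula at all: it sets the depth of the output volume, not its width or height.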

Incorrect description of feed-forward neural network under "Architecture"

In the "Architecture" section, the article states: "In any feed-forward neural network, any middle layers are called hidden because their inputs and outputs are masked by the activation function and final convolution."

This is not correct:

- There is not a final convolution in all feed-forward neural networks.

- The middle layers are called hidden, but not "because their inputs and outputs are masked by the activation function and final convolution." They are called hidden because they are not "externally visible".

Rfk732 (talk) 15:48, 8 April 2023 (UTC)

I have removed the sentence. Rfk732 (talk) 10:38, 13 April 2023 (UTC)

Empirical and explicit regularization?

The section Regularization methods has two different subsections: Empirical and Explicit. What do we mean by empirical? And what do we mean by explicit? —Kri (talk) 12:43, 20 November 2023 (UTC)

Introduction

"only 25 neurons are required to process 5x5-sized tiles". Shouldn't that be "weights" and not "neurons"? Earlier it said "10,000 weights would be required for processing an image sized 100 × 100 pixels". Ulatekh (talk) 15:53, 19 March 2024 (UTC)

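The arithmetic behind the two quoted figures can be sketched as follows. This is a rough illustration that assumes a single-channel image, no bias terms, stride 1 and no padding; the function names are hypothetical. It suggests that 25 is a count of shared weights, while the number of output units (neurons) that reuse those weights is much larger.

```python
def dense_weights_per_neuron(height, width):
    """Weights needed by one fully connected neuron that sees every pixel."""
    return height * width


def shared_kernel_weights(kernel_size):
    """Weights in one square kernel, shared across all positions."""
    return kernel_size * kernel_size


def output_positions(height, width, kernel_size):
    """Number of positions where the kernel fits (stride 1, no padding)."""
    return (height - kernel_size + 1) * (width - kernel_size + 1)


dense = dense_weights_per_neuron(100, 100)  # 10000 weights per neuron
shared = shared_kernel_weights(5)           # 25 weights in total
units = output_positions(100, 100, 5)       # 9216 output units sharing them
```
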