Talk:Gradient boosting

This is the talk page for discussing improvements to the Gradient boosting article.
This is not a forum for general discussion of the article's subject.

Computing Low‑importance

	This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-importance on the project's importance scale.

Statistics Low‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-importance on the importance scale.

The URL for Reference 6 appears to have changed

The URL for Reference 6 returns Error 404. I've found a new URL: cran.r-project.org/web/packages/gbm/gbm.pdf

69.116.252.118 (talk) 18:37, 29 March 2013 (UTC)

I've updated the reference; thanks for bringing this to attention! Sophus Bie (talk) 18:42, 29 March 2013 (UTC)

Gradient Tree Boosting

The following equation in the Gradient Tree Boosting section is confusing:

\gamma _{jm}={\underset {\gamma }{\operatorname {arg\,min} }}\sum _{x_{i}\in R_{jm}}L(y_{i},F_{m-1}(x_{i})+\gamma h_{m}(x_{i})).

In the preceding formula, $h_{m}$ was written out explicitly as the indicator function of the region $R_{jm}$, so the equation could be written more simply as:

\gamma _{jm}={\underset {\gamma }{\operatorname {arg\,min} }}\sum _{x_{i}\in R_{jm}}L(y_{i},F_{m-1}(x_{i})+\gamma ).

This also corresponds to step (c) of Algorithm 10.3 in 'The Elements of Statistical Learning'. Can anyone confirm or object?

Andre.holzner (talk) 11:29, 19 August 2013 (UTC)
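
The simplification proposed above can be checked numerically. The following is only a minimal sketch, assuming squared-error loss and assuming, as the comment states, that $h_{m}$ equals 1 on every point of $R_{jm}$; the data values and the helper name `leaf_loss` are hypothetical and are not taken from the article.

```python
import numpy as np

def leaf_loss(gamma, y, f_prev, h):
    """Sum over one leaf of the squared-error loss L(y_i, F_{m-1}(x_i) + gamma * h_i)."""
    return float(np.sum((y - (f_prev + gamma * h)) ** 2))

# Hypothetical targets and previous-stage predictions for the points in one region R_jm.
y = np.array([3.0, 5.0, 4.0])
f_prev = np.array([2.0, 2.5, 3.5])

# Assumption from the comment: h_m is the indicator of R_jm, so it is 1 on these points.
h_indicator = np.ones_like(y)

gammas = np.linspace(-5.0, 5.0, 2001)  # grid with spacing 0.005
with_h = np.array([leaf_loss(g, y, f_prev, h_indicator) for g in gammas])
without_h = np.array([leaf_loss(g, y, f_prev, 1.0) for g in gammas])

# Grid minimiser of the objective, and the closed form for squared error
# (the mean residual inside the leaf).
gamma_grid = float(gammas[np.argmin(with_h)])
gamma_closed_form = float(np.mean(y - f_prev))

print(np.allclose(with_h, without_h), gamma_grid, gamma_closed_form)
```

Under these assumptions the two objectives agree at every grid point, so writing $\gamma h_{m}(x_{i})$ or just $\gamma$ inside the sum gives the same minimiser. If $h_{m}$ instead takes some other constant value on the leaf, the two forms differ only by a rescaling of $\gamma$.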

Grammar issue

"the goal is to learn a model "? This makes no sense. Can someone correct this? I took a stab at it, but it might need more work. Preceding unsigned comment added by 128.210.106.76 (talk) 16:08, 8 December 2016 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified 2 external links on Gradient boosting. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers. InternetArchiveBot (Report bug) 02:24, 22 October 2017 (UTC)

Names reference to MART trademark

The current final line of the Names section claims that Salford Systems has trademarked "MART". I can't see how this is plausible, given that the term was coined by Friedman & Meulman in 2006 (http://www.datatheory.nl/pages/cdc7.pdf). The claim is not supported by a citation, so I propose removing the reference to the trademark: it may be incorrect, and it seems unlikely to be of much interest or relevance to the typical reader of this page. I haven't edited Wikipedia much, so I am reluctant to make the change first and justify it second! Preceding unsigned comment added by SimonDedman (talk • contribs) 23:02, 31 August 2018 (UTC)

math error (inaccuracy)

I wanted to point out that the formulae

F_{0}(x)={\underset {\gamma }{\arg \min }}{\sum _{i=1}^{n}{L(y_{i},\gamma )}},

F_{m}(x)=F_{m-1}(x)+{\underset {h_{m}\in {\mathcal {H}}}{\operatorname {arg\,min} }}\left[{\sum _{i=1}^{n}{L(y_{i},F_{m-1}(x_{i})+h_{m}(x_{i}))}}\right],

should probably read something like

F_{0}(x)={\underset {\gamma }{\arg \min }}\left[{\sum _{i=1}^{n}{L(y_{i},\gamma )}}\right](x),

F_{m}(x)=F_{m-1}(x)+{\underset {h_{m}\in {\mathcal {H}}}{\operatorname {arg\,min} }}\left[{\sum _{i=1}^{n}{L(y_{i},F_{m-1}(x_{i})+h_{m}(x_{i}))}}\right](x),

or

F_{0}={\underset {\gamma }{\arg \min }}{\sum _{i=1}^{n}{L(y_{i},\gamma )}},

F_{m}=F_{m-1}+{\underset {h_{m}\in {\mathcal {H}}}{\operatorname {arg\,min} }}{\sum _{i=1}^{n}{L(y_{i},F_{m-1}(x_{i})+h_{m}(x_{i}))}}.

Do you agree? Since I'm new here, I haven't edited the main page myself. Preceding unsigned comment added by Toedtli (talk • contribs) 12:03, 26 April 2019 (UTC)

What is the point of omitting the (x)? Preceding unsigned comment added by Zjplab (talk • contribs) 23:08, 21 October 2019 (UTC)
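
One way to read the notational point is as a type distinction: an arg min over $\gamma$ yields a number, whereas a model such as $F_{0}$ or $h_{m}$ is a function that must be applied to $x$ to yield a number. The sketch below illustrates only that distinction, assuming squared-error loss (for which the minimising constant is the mean of the targets); the helper names `initial_constant` and `make_constant_model` are hypothetical.

```python
import numpy as np

def initial_constant(y):
    """Return the number gamma minimising sum_i (y_i - gamma)^2, i.e. the mean of y."""
    return float(np.mean(y))

def make_constant_model(gamma):
    """Wrap the number gamma as a function F_0 that maps inputs x to predictions."""
    def F0(x):
        x = np.asarray(x)
        # One prediction per input row, all equal to gamma.
        return np.full(x.shape[0], gamma)
    return F0

y = np.array([1.0, 2.0, 6.0])        # hypothetical targets
gamma0 = initial_constant(y)          # a number: the result of the arg min over gamma
F0 = make_constant_model(gamma0)      # a function: the model itself
preds = F0(np.array([[0.1], [0.7]]))  # numbers again: the model evaluated at two inputs

print(type(gamma0).__name__, callable(F0), preds)
```

On this reading, an equation should either equate functions on both sides ($F_{m}=F_{m-1}+\ldots$) or equate values on both sides (with the arg min result applied to $x$); mixing a value on the left with a function on the right is what the comment above objects to.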

Should published references show authored date or published date?

Newbie here, so apologies if this is general knowledge a Wikipedian ought to have.

I'm confused about which date should be reported for a published reference: the date it first appeared, or the date it was published?

For example, consider two references by Friedman.

The first, "Greedy Function Approximation: A Gradient Boosting Machine", is reported as February 1999, which is the date listed in a PDF freely available at a URL that may belong to Friedman. However, the paper was eventually published in Annals of Statistics in October 2001.

Similarly, "Stochastic Gradient Boosting" is reported as March 1999, which appears to be the date it was given as a lecture (available for free on the Stanford statistics domain), although it was published in Computational Statistics & Data Analysis in February 2002. Rmwenz (talk) 15:47, 26 August 2022 (UTC)

Potential changes to history section, which is very brief and may contain historical inaccuracies.

As it stands, the history section is very brief, and probably due for expansion, and possibly a few corrections.

As it's worded, in my view, there is ambiguity about the contributions of Friedman and Mason et al. I read this section as implying that Friedman merely adapted boosting to the regression context by constructing some specific algorithms, while Mason et al are responsible for the connection between boosting and gradient descent in function space, which doesn't appear historically accurate.

In my opinion, this section doesn't adequately recognize the primacy of Friedman's contribution, and is potentially misleading in crediting Mason et al with the functional gradient descent view instead. Moreover, it doesn't adequately convey the breakthrough importance of Friedman's work. Here's the current wording:

"Explicit regression gradient boosting algorithms were subsequently developed, by Jerome H. Friedman, simultaneously with the more general functional gradient boosting perspective of Llew Mason, Jonathan Baxter, Peter Bartlett and Marcus Frean. The latter two papers introduced the view of boosting algorithms as iterative functional gradient descent algorithms."

It's somewhat difficult to determine who first made the observation that boosting algorithms perform optimization by gradient descent in function space. Friedman makes the observation in Greedy Function Approximation: A Gradient Boosting Machine (GFA). From the abstract:

"Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for additive expansions based on any fitting criterion."

Mason et al made a more abstract, mathematically general version of this observation in Boosting Algorithms as Gradient Descent (BAGD). From the abstract:

"We provide an abstract characterization of boosting algorithms as gradient descent on cost-functionals in an inner-product function space."

Assigning primacy or priority to the observation is potentially confounded by the difference between first appearance, authorship and publication dates. On closer inspection, GFA appears to have been first authored in February 1999, but not published until 2001 in Annals of Statistics, while BAGD first appeared at the NeurIPS conference (November 29 to December 4, 1999) and was later published in its proceedings.

While it may not be strictly necessary to assign primacy for this observation, a perusal of the literature indicates that the connection is of great theoretical significance. The literature is full of references to the Friedman paper, which at the time of writing has a citation count of 18687, while the paper by Mason et al has a current citation count of 1257. Some of the difference in citation counts may be due to the sub-discipline (statistical learning for Friedman versus machine learning for Mason et al), but Friedman's paper seems clearly more influential.

Friedman's paper is published in the prestigious Annals of Statistics, and it appears responsible for coining the term "Gradient Boosted Machines", while the Mason et al paper is published in the NeurIPS conference proceedings. I could find plenty of references crediting Friedman with gradient boosting, including the scikit-learn documentation, textbooks by Murphy (p. 642) and Zhou (p. 35), and well-cited blog articles. I could find no mentions of the paper by Mason et al in my reading of the wider literature.

I believe this section should be edited to reflect the significance of Friedman's contribution and clarify the potential confusion. The section could also stand to flesh out the work of Breiman on ARCing classifiers, which appears influential in directing the attention of the statistics community to AdaBoost specifically and boosting in general.

I'm willing to make these changes myself, but since I'm new to page editing, I thought I'd start with the talk page. Rmwenz (talk) 17:04, 26 August 2022 (UTC)

Control, protect, and secure your account, all in one place. Your Google Account gives you quick access to settings and tools that let you safeguard your data and protect your privacy. 212.237.121.31 (talk) 21:30, 7 March 2023 (UTC)