Great Papers in Statistics & Machine Learning
This is a list of statistics and machine learning research papers that I found interesting or consider significant.
Most of these papers had a great impact, or discussed concepts that did, in statistics and machine learning. In fact, many of them are regarded as major historical advancements in their respective fields.
The number of citations for each entry is only a rough guide, since it was correct only at the time I first added the entry. I might add brief thoughts and comments on individual papers over time.
I hope you have as much fun as I did exploring them!
Statistics
Classic Papers

Statistical Modeling: The Two Cultures  Leo Breiman  2001  4,049 citations

Computer-Intensive Methods in Statistics  Persi Diaconis and Bradley Efron  1983  1,581 citations

Stein’s Paradox in Statistics  Bradley Efron and Carl Morris  1977  823 citations
Others

Surprised by the Gambler’s and Hot Hand Fallacies?  Joshua Miller and Adam Sanjurjo  2015  151 citations

To Explain or to Predict?  Galit Shmueli  2010  2,322 citations
Bootstrapping

The Jackknife, A Review  Rupert Miller  1974  2,352 citations

Bootstrap Methods: Another Look at the Jackknife  Bradley Efron  1979  22,433 citations

The Bayesian Bootstrap  Donald Rubin  1981  1,125 citations
Machine Learning
No Free Lunch Theorems

No Free Lunch Theorems for Optimization  David Wolpert and William Macready  1997  11,162 citations

Coevolutionary Free Lunches  David Wolpert and William Macready  2005  261 citations

What is important about the No Free Lunch theorems?  David Wolpert  2020  5 citations

A Conservation Law for Generalization Performance  Cullen Schaffer  1994  519 citations

The Lack of A Priori Distinctions Between Learning Algorithms  David Wolpert  1996  1,767 citations

No Free Lunch Theorem: A Review  Adam et al  2019  222 citations
DBSCAN

A Density-Based Algorithm for Discovering Clusters  Ester, Kriegel, Sander and Xu  1996  23,405 citations

Why and How You Should (Still) Use DBSCAN  Schubert, Ester, Kriegel, Sander and Xu  2017  890 citations
Naive Bayes Classifier

The Optimality of Naive Bayes  Harry Zhang  2004  2,180 citations

Tackling the Poor Assumptions of Naive Bayes Text Classifiers  Rennie et al  2003  1,445 citations

Estimating Continuous Distributions in Bayesian Classifiers  John and Langley  1995  4,629 citations
Bias-Variance Tradeoff

Reconciling modern machine-learning practice and the classical bias-variance trade-off

A Modern Take on the Bias-Variance Tradeoff in Neural Networks  Brady Neal et al  2018  85 citations

Neural Networks and the Bias/Variance Dilemma  Geman, Bienenstock and Doursat  1992  4,495 citations
Support Vector Machines

A training algorithm for optimal margin classifiers  Boser, Guyon and Vapnik  1992  14,145 citations

Support-vector networks  Corinna Cortes and Vladimir Vapnik  1995  51,803 citations
Word2vec

Word2vec Explained  Yoav Goldberg and Omer Levy  2014  1,538 citations

Efficient Estimation of Word Representations in Vector Space  Mikolov et al  2013  26,104 citations
Neural Networks

Who Invented the Reverse Mode of Differentiation?  Andreas Griewank  2010  81 citations

Automatic Differentiation in Machine Learning: a Survey  Baydin et al  2015  1,339 citations

Learning Representations by Back-Propagating Errors  Rumelhart et al  1986  27,188 citations
Tree Algorithms, Bagging, Boosting

Fifty Years of Classification and Regression Trees  Wei-Yin Loh  2014  493 citations

Bagging Predictors  Leo Breiman  1996  29,038 citations

Random Forests  Leo Breiman  2001  84,549 citations

A Decision-Theoretic Generalization of On-Line Learning  Freund and Schapire  1997  23,098 citations

A Short Introduction to Boosting  Freund and Schapire  1999  4,273 citations

XGBoost: A Scalable Tree Boosting System  Chen and Guestrin  2016  15,135 citations
Others

Very Simple Classification Rules Perform Well on Most Commonly Used Datasets  Robert Holte  1993  2,538 citations

Top 10 Algorithms in Data Mining  X Wu et al  2008  5,982 citations

MapReduce: Simplified Data Processing on Large Clusters  Dean and Ghemawat  2008  22,857 citations

A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection  Ron Kohavi  1995  14,423 citations