mpalenciaolivar/Awesome-Decision-Science

Awesome Decision Science

An ever-growing, professionally curated list of resources on everything decision-making: videos, tutorials, books, papers, theses, articles, datasets, and open-source libraries.

🍔Click on the hamburger next to the file name for a better browsing experience.

🤖 Artificial Intelligence, Computational Intelligence, and Machine Learning

Books

Computational Intelligence

  • Engelbrecht, Andries P. Computational intelligence: an introduction. John Wiley & Sons, 2007. [Link]

Deep Learning

  • Bishop, Christopher M., and Hugh Bishop. "Deep learning: foundations and concepts." Springer, 2024. [Link]
  • Deisenroth, Marc Peter, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for machine learning. Cambridge University Press, 2020. [Link]
  • Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016. [Link]
  • Grohs, Philipp, and Gitta Kutyniok, eds. Mathematical aspects of deep learning. Cambridge University Press, 2022. [Link]
  • Prince, Simon JD. Understanding Deep Learning. MIT Press, 2023. [Link]
  • Zhang, Aston, et al. Dive into deep learning. Cambridge University Press, 2023. [Link]

Explainable AI

  • Biecek, Przemyslaw, and Tomasz Burzykowski. Explanatory model analysis: explore, explain, and examine predictive models. CRC Press, 2021. [Link]
  • Hall, Patrick, James Curtis, and Parul Pandey. Machine Learning for High-Risk Applications. O'Reilly, 2023. [Link]
  • Molnar, Christoph. Interpretable machine learning. Lulu.com, 2020. [Link]

Machine Learning

  • Bishop, Christopher M. Pattern recognition and machine learning. New York: Springer, 2006. [Link]
  • Deisenroth, Marc Peter, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for machine learning. Cambridge University Press, 2020. [Link]
  • Efron, Bradley, and Trevor Hastie. Computer age statistical inference, student edition: algorithms, evidence, and data science. Vol. 6. Cambridge University Press, 2021. [Link]
  • Hastie, Trevor, Robert Tibshirani, and Martin Wainwright. Statistical learning with sparsity: the lasso and generalizations. CRC Press, 2015. [Link]
  • Huber, Martin. Causal analysis: Impact evaluation and Causal Machine Learning with applications in R. MIT Press, 2023. [Link]
  • James, G., Witten, D., Hastie, T., Tibshirani, R., Taylor, J. An Introduction to Statistical Learning: With Applications in Python; Springer: Berlin/Heidelberg, Germany, 2023. [Link]
  • Katsov, Ilya. Introduction to algorithmic marketing: Artificial intelligence for marketing operations. Grid Dynamics, 2017. [Link]
  • MacKay, David JC. Information theory, inference and learning algorithms. Cambridge University Press, 2003. [Link]
  • Murphy, Kevin P. Probabilistic machine learning: Advanced topics. MIT Press, 2023. [Link]
  • Murphy, Kevin P. Probabilistic machine learning: an introduction. MIT Press, 2022. [Link]
  • Siddiqi, Naeem. Intelligent credit scoring: Building and implementing better credit risk scorecards. John Wiley & Sons, 2017. [Link]

Courses and lecture notes, posts

Deep Learning

  • Lippe, Phillip. UvA Deep Learning Tutorials. 2022. [Link]
  • Ollion, Charles, and Olivier Grisel. Deep Learning course: lecture slides and lab notebooks. Institut Polytechnique de Paris, 2017. [Link]

Explainable AI

  • Galli, Soledad. Interpreting Machine Learning Models [Link]
  • Lakkaraju, Hima, et al. Explainable Artificial Intelligence: From Simple Predictors to Complex Generative Models. Harvard University, 2023. [Link]

Machine Learning

  • Christensen, Henrik I. Support Vector Machines - SVM & RVM. Georgia Institute of Technology. [Link]
  • Inria. Machine learning in Python with scikit-learn. FUN, 2023. [Link]
  • MLU-Explain Team. MLU-Explain. Amazon (2021). [Link]

Reinforcement Learning and Control Theory

  • Dimitri Bertsekas. Reinforcement Learning and Optimal Control. [Link]
  • Elad Hazan, Karan Singh. Introduction to Online Nonstochastic Control. [Link]

Datasets

  • Andreas Luttens, et al. Large-scale Docking Datasets for Machine Learning. 2, Zenodo, 8 May 2023. [Link]
  • Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, and Jason H. Moore (2017). PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10, page 36. [Paper] [Code]

Packages

Data loading

  • mlx-data. Efficient framework-agnostic data loading. Apple, 2023. [Link]

Explainable AI

  • Alibi explain. Open-source interpretability library supporting black box, white box, global and local interpretability methods. [Link]
  • Dalex. Responsible Machine Learning in Python. [Link]
  • Scikit-explain. User-friendly Python module for machine learning explainability with a comprehensive toolset of interpretability methods. [Link]
  • Shapash. Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models. MAIF, 2021. [Link]
  • Sudjianto, Agus, et al. "PiML Toolbox for Interpretable Machine Learning Model Development and Validation." arXiv preprint arXiv:2305.04214

Feature Engineering

  • Feature_engine. Feature engineering package with sklearn like functionality. [Link]

Hyperparameter optimization

  • Optuna. A hyperparameter optimization framework. [Link]

Machine Learning techniques

  • Catboost. A fast, scalable, high-performance Gradient Boosting on Decision Trees library used for ranking, classification, regression, and other machine learning tasks for Python, R, Java, and C++. Supports computation on CPU and GPU. [Link]
  • Khuat, Thanh Tung, and Bogdan Gabrys. "hyperbox-brain: A Toolbox for Hyperbox-based Machine Learning Algorithms." arXiv preprint arXiv:2210.02704 (2022). [Link]
  • quantile-forest. Quantile Regression Forests compatible with scikit-learn. [Link]

Papers

Deep Learning

Bayesian approaches
  • Arbel, Julyan, et al. A Primer on Bayesian Neural Networks: Review and Debates. arXiv preprint arXiv:2309.16314 (2023). [Link]
  • Hellström, Fredrik, et al. Generalization bounds: perspectives from information theory and PAC-Bayes. arXiv preprint arXiv:2309.04381 (2023). [Link]
  • Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013). [Link]
  • Nalisnick, Eric, and Padhraic Smyth. "Stick-breaking variational autoencoders." arXiv preprint arXiv:1605.06197 (2016). [Link]
Generative aspects
  • Coste, Simon. Diffusion. University of Paris, 2023. [Link]
  • Galerne, Bruno, and Valentin De Bortoli. Generative Modelling. ENS Paris-Saclay, 2023. [Link]
Mathematical aspects: approximation and generalization
  • Bartlett, Peter L., Andrea Montanari, and Alexander Rakhlin. Deep learning: a statistical viewpoint. Acta numerica 30 (2021): 87-201. [Link]
  • Berner, Julius, et al. The modern mathematics of deep learning. arXiv preprint arXiv:2105.04026 (2021): 86-114. [Link]
  • DeVore, Ronald, Boris Hanin, and Guergana Petrova. Neural network approximation. Acta Numerica 30 (2021): 327-444. [Link]
  • Jacot, Arthur, Franck Gabriel, and Clément Hongler. "Neural tangent kernel: Convergence and generalization in neural networks." Advances in neural information processing systems 31 (2018). [Link]
  • Hornik, Kurt. "Approximation capabilities of multilayer feedforward networks." Neural networks 4.2 (1991): 251-257. [Link]
  • Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. "Multilayer feedforward networks are universal approximators." Neural networks 2.5 (1989): 359-366. [Link]
  • Petersen, Philipp Christian. Neural network theory. University of Vienna 535 (2020). [Link]
Mathematical aspects: optimization
  • Khaled, Ahmed, and Peter Richtárik. "Better theory for SGD in the nonconvex world." arXiv preprint arXiv:2002.03329 (2020). [Link]
  • Sun, Ruoyu. Optimization for deep learning: theory and algorithms. arXiv preprint arXiv:1912.08957 (2019). [Link]

Machine Learning

Conformal Prediction
  • Angelopoulos, Anastasios N., and Stephen Bates. "A gentle introduction to conformal prediction and distribution-free uncertainty quantification." arXiv preprint arXiv:2107.07511 (2021). [Link]
  • Fontana, Matteo, Gianluca Zeni, and Simone Vantini. "Conformal prediction: a unified review of theory and new challenges." arXiv preprint arXiv:2005.07972 (2020). [Link]
  • Manokhin, Valery. (2022). Awesome Conformal Prediction (v1.0.0). Zenodo. [Link]
Explainable AI
  • Bilodeau, Blair, et al. "Impossibility theorems for feature attribution." Proceedings of the National Academy of Sciences 121.2 (2024): e2304406120. [Link]
  • Ibrahim Amoukou, Salim. Trustworthy machine learning: explainability and distribution-free uncertainty quantification. Diss. université Paris-Saclay, 2023. [Link]
  • Huang, Xuanxiang, and Joao Marques-Silva. "The inadequacy of Shapley values for explainability." arXiv preprint arXiv:2302.08160 (2023). [Link]
Fuzzy sets
  • Khuat, Thanh Tung, Dymitr Ruta, and Bogdan Gabrys. "Hyperbox-based machine learning algorithms: a comprehensive survey." Soft Computing 25.2 (2021): 1325-1363. [Link]
Imbalanced data problems
  • Elor, Yotam, and Hadar Averbuch-Elor. "To SMOTE, or not to SMOTE?." arXiv preprint arXiv:2201.08528 (2022). [Link]
  • van den Goorbergh, Ruben, et al. "The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression." Journal of the American Medical Informatics Association 29.9 (2022): 1525-1534. [Link]
Training ML models
  • Mirzasoleiman, Baharan, Jeff Bilmes, and Jure Leskovec. "Coresets for data-efficient training of machine learning models." International Conference on Machine Learning. PMLR, 2020. [Link]

Posts and threads

Explainable AI (XAI)

  • Of Models and Meanings. SHAP is the Blockchain of xAI. Of Models and Meanings, 2022. [Link]
  • Of Models and Meanings. What You Could Do with the Shapley Computation. Of Models and Meanings, 2022. [Link]

Imbalanced data problems

  • Mougan, Carl. Why SMOTE is not used in prize-winning Kaggle solutions?. Data Science, 2021. [Link]

Talks, conferences, and videos

  • Dieng, Adji B. Learning From Data: The Two Cultures. Association for Computing Machinery, 2021. [Link]
  • Rich, DJ. Mutual Information. True Theta LLC, 2020. [Link]

📊 Business Intelligence, Data Visualization, Communicating and Reporting

Books

  • Duarte, Nancy. Resonate: Present visual stories that transform audiences. John Wiley & Sons, 2013. [Link]
  • Duarte, Nancy. slide:ology: The art and science of creating great presentations. Vol. 1. Sebastopol: O'Reilly Media, 2008. [Link]
  • Knaflic, Cole Nussbaumer. Storytelling with data: A data visualization guide for business professionals. John Wiley & Sons, 2015. [Link]
  • Knaflic, Cole Nussbaumer. Storytelling with data: let's practice!. John Wiley & Sons, 2019. [Link]
  • Wexler, Steve, Jeffrey Shaffer, and Andy Cotgreave. The big book of dashboards: visualizing your data using real-world business scenarios. John Wiley & Sons, 2017. [Link]
  • Wilke, Claus O. Fundamentals of data visualization: a primer on making informative and compelling figures. O'Reilly Media, 2019. [Link]

Courses and lecture notes, posts

Datasets

Packages

Data structures

Python
  • Polars. Dataframes powered by a multithreaded, vectorized query engine, written in Rust. [Link]

Data Visualization and Reporting

Julia
  • Genie. 🧞The highly productive Julia web framework. [Link]
Python
  • Marimo. marimo is an open-source reactive notebook for Python — reproducible, git-friendly, executable as a script, and shareable as an app. [Link]
  • PyGWalker. Turn your pandas dataframe into an interactive UI for visual analysis. [Link]
  • Streamlit. A faster way to build and share data apps. [Link]
  • Vizro. Vizro is a toolkit for creating modular data visualization applications. [Link]

Papers

Posts and threads

Talks, conferences, and videos

💻 Computer Science and Software Engineering

Books

Algorithmics, data structures, and programming languages

  • Downey, Allen. Think complexity: complexity science and computational modeling. O'Reilly Media, Inc., 2018. [Link]
  • Downey, Allen. Think data structures: algorithms and information retrieval in Java. O'Reilly Media, Inc., 2017. [Link]
  • Downey, Allen. Think Python. O'Reilly Media, Inc., 2012. [Link]
  • Johnston, Nathaniel, and Dave Greene. Conway's Game of Life: Mathematics and Construction. Self-published, 2022. [Link]
  • Miller, Brad, and David Ranum. Problem-solving with algorithms and data structures. University of Auckland, 2013. [Link] [Website]
  • Nipkow, Tobias. "Functional Data Structures and Algorithms A Proof Assistant Approach." (2023). [Link]

Scientific programming

  • Blondel, Mathieu, and Vincent Roulet. "The Elements of Differentiable Programming." arXiv preprint arXiv:2403.14606 (2024). [Link]

Software development

  • Chacon, Scott, and Ben Straub. Pro git. Springer Nature, 2014. [Link]

Databases

  • Petrov, Alex. Database Internals: A deep dive into how distributed data systems work. O'Reilly Media, 2019. [Link]

Courses and lecture notes, posts

Algorithms

  • Roughgarden, Tim. Lecture Notes. Columbia University. [Link]

Scientific programming

  • Raschka, Sebastian. Scientific Computing in Python: Introduction to NumPy and Matplotlib. sebastianraschka.com, 2020. [Link]

Software engineering

  • Atlassian. Gitflow workflow. [Link]
  • Atlassian. Trunk-based development. [Link]
  • Shvets, Alexander. Refactoring Guru. 2014. [Link]

Packages

Python

Data processing
  • Bytewax. Python Stream Processing. [Link]
GUI
  • Textual. The lean application framework for Python. Build sophisticated user interfaces with a simple Python API. Run your apps in the terminal and a web browser. [Link]

Papers

Posts and threads

Talks, conferences, and videos

🗺️ Geospatial Analysis

Books

  • Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. Geocomputation with R. CRC Press, 2019. [Link]
  • Moraga, Paula. Geospatial health data: Modeling and visualization with R-INLA and shiny. CRC Press, 2019. [Link]
  • Moraga, Paula. Spatial Statistics for Data Science: Theory and Practice with R. CRC Press, 2023. [Link]

Courses and lecture notes, posts

Datasets

Packages

Papers

Posts and threads

Talks, conferences, and videos

👩‍🔬 Mathematics, Operations Research, Game Theory, and Simulations

Books

Algebra

  • Axler, Sheldon. Linear algebra done right. Springer Nature, 2023. [Link]

Applied Mathematics

  • Isoz, Vincent. Opera Magistris (Elements of Applied Mathematics). Sciences.ch, 2016. [Link]

Game Theory and Simulations

  • Downey, Allen B. Modeling and Simulation in Python: An Introduction for Scientists and Engineers. No Starch Press, 2023. [Link]

Graph Theory

  • McNulty, Keith. Handbook of graphs and networks in people analytics: with examples in R and Python. CRC Press, 2022. [Link]
  • Sargent, Thomas J., and John Stachurski. Economic Networks: Theory and Computation. QuantEcon, 2022. [Link]

Optimization

  • Boumal, Nicolas. An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, 2023. [Link]
  • Boyd, Stephen P., and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004. [Link]
  • Kwon, Changhyun. Julia Programming for Operations Research. Changhyun Kwon, 2019. [Link]
  • Martins, J. R. R. A. and Ning, A., Engineering Design Optimization, Cambridge University Press, 2022. [Link]
  • Nesterov, Yurii. Lectures on convex optimization. Vol. 137. Berlin: Springer, 2018. [Link]
  • Sargent, Thomas J., and John Stachurski. Dynamic Programming Volume 1. QuantEcon, 2023. [Link]

Sequential Problems

  • Powell, Warren B. Sequential decision analytics and modeling: modeling with Python. Now, 2022. [Link]

Courses and lecture notes, posts

Mathematical Finance

  • Kempthorne, Peter, et al. "Topics in mathematics with applications in finance." Massachusetts Institute of Technology: MIT OpenCourseWare, 2013. [Link]
  • Roncalli, Thierry, Course 2023-2024 in Portfolio Allocation and Asset Management. SSRN, 2024. [Link]

Probability

  • Arya, Nisha. Learn Probability in Computer Science with Stanford University for FREE. KDNuggets, 2023. [Link]

Datasets

Packages

Optimization

  • Diamond, Steven, and Stephen Boyd. "CVXPY: A Python-embedded modeling language for convex optimization." Journal of Machine Learning Research 17.83 (2016): 1-5. [Link to the paper] [Link to the package]
  • PyPortfolioOpt. Financial portfolio optimisation in python, including classical efficient frontier, Black-Litterman, Hierarchical Risk Parity. [Link]
  • scikit-portfolio. A portfolio optimization tool with scikit-learn interface. Hyperparameters selection and easy plotting of efficient frontiers. [Link]

Sensitivity analysis

  • SALib. Sensitivity Analysis Library in Python. Contains Sobol, Morris, FAST, and other methods. [Link]

Papers

Posts and threads

Optimization

  • Jones, Andy. Natural gradients. Andy Jones. [Link]

Talks, conferences, and videos

  • MATLAB. Why Padé Approximations Are Great! | Control Systems in Practice. YouTube, 2022. [Link]

🤯 Methodology, interactions, and philosophical aspects of Science

Building theories

  • Jaccard, James, and Jacob Jacoby. Theory construction and model-building skills: A practical guide for social scientists. Guilford Publications, 2019. [Link] [Website]

Computational Science

  • Judd, Kenneth. The Potential Partnership Between Economics and Computational Science. PyData Chicago, 2021. [Link]

Machine Learning and Statistics

  • Breiman, Leo. "Statistical modeling: The two cultures (with comments and a rejoinder by the author)." Statistical science 16.3 (2001): 199-231. [Link]
  • Harrell, Frank. "Classification vs. Prediction". Statistical Thinking, 2017. [Link]

Mathematics

  • Polya, George. How to solve it: A new aspect of mathematical method. Vol. 85. Princeton University Press, 2004. [Link]

Scientific approaches

  • Wolfram, Stephen. A new kind of science. Vol. 5. Champaign, IL: Wolfram Media, 2002. [Link]

📈 Statistics, Econometrics, and Data Mining

Books

Clustering

  • Govaert, Gérard, and Mohamed Nadif. Co-clustering: models, algorithms and applications. John Wiley & Sons, 2013. [Link]
  • Scrucca, Luca, et al. Model-Based Clustering, Classification, and Density Estimation Using mclust in R. Chapman and Hall/CRC, 2023. [Link]

Econometrics

  • Ding, Peng. "Linear Model and Extensions." arXiv preprint arXiv:2401.00649 (2024). [Link]
  • Evans, Richard W., Computational Methods for Economists using Python, Open access Jupyter Book, v#.#.#, 2023. [Link]
  • Wooldridge, Jeffrey M.. Introductory Econometrics: A Modern Approach. Brésil, Cengage Learning, 2020. [Link]

Statistics

Bayesian Statistics
  • Martin, Osvaldo A., Ravin Kumar, and Junpeng Lao. Bayesian modeling and computation in Python. CRC Press, 2021. [Link]
  • McElreath, Richard. Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC, 2020. [Link]
Exponential family
  • Agresti, Alan. Categorical data analysis. Vol. 792. John Wiley & Sons, 2012. [Link]
  • Efron, Bradley. Exponential families in theory and practice. Cambridge University Press, 2022. [Link]
Historical aspects
  • Fischer, Hans. A history of the central limit theorem: from classical to modern probability theory. Vol. 4. New York: Springer, 2011. [Link]
Inference and mathematical aspects
  • Soch, Joram, et al. StatProofBook/StatProofBook.Github.Io: StatProofBook 2021. 2021, Zenodo, 2022. [Link]
  • Wasserman, Larry. All of nonparametric statistics. Springer Science & Business Media, 2006. [Link]
  • Wasserman, Larry. All of statistics: a concise course in statistical inference. Vol. 26. New York: Springer, 2004. [Link]
Missing data
  • Van Buuren, Stef. Flexible imputation of missing data. CRC Press, 2018. [Link]
Regression modeling
  • McNulty, Keith. Handbook of regression modeling in people analytics: with examples in R and Python. CRC Press, 2021. [Link]
Statistical software
  • Kuhn, Max, and Julia Silge. Tidy modeling with R. O'Reilly Media, Inc., 2022. [Link]
  • Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for data science. O'Reilly Media, Inc. [Link]

Time Series

  • Cochrane, John H. "Time series for macroeconomics and finance." (1997). [Link]
  • Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. [Link]
  • Neusser, Klaus. Time series econometrics. Springer publication, 2016. [Link]

Courses and lecture notes, posts

Causal Inference

  • Cunningham, Scott et al. Mixtape Sessions: Causal Inference. 2022. [Link]
  • Ding, Peng. "A First Course in Causal Inference." arXiv preprint arXiv:2305.18793 (2023). [Link]

Econometrics

  • Canay, Ivan. Econ 480-3 - Introduction to Econometrics. Northwestern University, 2021. [Link]
  • De Haan, Monique. ECON4150 - Introductory Econometrics. University of Oslo, 2018. [Link]

Statistics & Probability

  • Dunn, Peter K. The Theory of Distributions, 2023. [Link]
  • Kozyrkov, Cassie. Statistical Thinking. YouTube, 2019. [Link]
  • Kunin, Daniel, et al. Seeing Theory. Brown University, 2016. [Link]

Forecasting

  • Manani, Galli. Feature Engineering for Time Series Forecasting, 2022. [Link]

Datasets

Forecasting

  • Godahewa, Rakshitha, et al. "Monash time series forecasting archive." arXiv preprint arXiv:2105.06643 (2021). [Link]
  • Lotsa Data. Salesforce, Hugging Face (2024). [Link]

Marketing applications

  • "6 Free, High-Quality, Marketing Mix Modeling Datasets | Forecastegy." Web. 10/14/2023 [Link]
  • Gaël Bernard and Periklis Andritsos. Datasets Simulating Customer Journeys. [Link]

Packages

Python

Time Series
  • Alexandrov, Alexander, et al. "Gluonts: Probabilistic and neural time series modeling in python." The Journal of Machine Learning Research 21.1 (2020): 4629-4634. [Link]
  • Salvador, Stan, and Philip Chan. "Toward accurate dynamic time warping in linear time and space." Intelligent Data Analysis 11.5 (2007): 561-580. [Link]
  • Fold. Fast Adaptive Time Series ML Engine. [Link]
  • Functime. Time-series machine learning at scale. Built on Polars for embarrassingly parallel feature engineering and forecasts. [Link]
  • HierarchicalForecast. Probabilistic Hierarchical forecasting 👑 with statistical and econometric methods. [Link]
  • MFLES. A Specific implementation from ThymeBoost written with the help of Numba. [Link]
  • mlforecast. Scalable machine 🤖 learning for time series forecasting. [Link]
  • NeuralForecast. Scalable and user-friendly neural 🧠 forecasting algorithms. [Link]
  • SKForecast. Simplifies using sklearn models to do single and multistep forecasting and backtesting. [Link]
  • StatsForecast. Lightning ⚡️ fast forecasting with statistical and econometric models. [Link]
  • ThymeBoost. Forecasting with Gradient Boosted Time Series Decomposition. [Link]
  • vectorbt. Find your trading edge, using the fastest engine for backtesting, algorithmic trading, and research. [Link]

R

  • Ross, Gordon J., and Dean Markwick. "dirichletprocess: An R package for fitting complex Bayesian nonparametric models." (2018). [Link]
  • van Buuren, S., and K. Groothuis-Oudshoorn. “Mice: Multivariate Imputation by Chained Equations in R”. Journal of Statistical Software, vol. 45, no. 3, Dec. 2011, pp. 1-67, doi:10.18637/jss.v045.i03. [Paper] [Package]

Papers

Clustering

  • Keribin, Christine, Gilles Celeux, and Valérie Robert. "The latent block model: a useful model for high dimensional data." ISI 2017-61st world statistics congress. 2017. [Link]
  • Pham, Tung, et al. "Fast support vector clustering." Vietnam Journal of Computer Science 4 (2017): 13-21. [Link]
  • Pham, Tung, Trung Le, and Hang Dang. "Scalable support vector clustering using budget." arXiv preprint arXiv:1709.06444 (2017).

Probabilistic Graphical Models and associated optimization techniques

  • Blei, David M. Build, compute, critique, repeat: Data analysis with latent variable models. Annual Review of Statistics and Its Application 1 (2014): 203-232. [Link]
  • Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. "Variational inference: A review for statisticians." Journal of the American Statistical Association 112.518 (2017): 859-877. [Link]
  • Dieng, Adji Bousso. Deep Probabilistic Graphical Modeling. Columbia University, 2020. [Link]
  • Figurnov, Mikhail, Shakir Mohamed, and Andriy Mnih. "Implicit reparameterization gradients." Advances in neural information processing systems 31 (2018). [Link]
  • Gelman, Andrew, Xiao-Li Meng, and Hal Stern. "Posterior predictive assessment of model fitness via realized discrepancies." Statistica sinica (1996): 733-760. [Link]
  • Kim, Kyurae, et al. "Black-Box Variational Inference Converges." arXiv preprint arXiv:2305.15349 (2023). [Link]

Statistics

Bayesian Statistics
  • Clarke, Bertrand, and Yuling Yao. "A Cheat Sheet for Bayesian Prediction." arXiv preprint arXiv:2304.12218 (2023). [Link]
Causality
  • Assaad, Charles K., Emilie Devijver, and Eric Gaussier. "Survey and evaluation of causal discovery methods for time series." Journal of Artificial Intelligence Research 73 (2022): 767-819. [Link]
Distributions
  • Leemis, Lawrence M., and Jacquelyn T. McQueston. "Univariate distribution relationships." The American Statistician 62.1 (2008): 45-53. [Paper] [Website].
  • Olszewski, Adrian. Challenging the cult of the prevalent normal distribution in nature. 2KMM, 2023. [Link]
Statistical hypothesis testing (NHST)
  • Gelman, Andrew. “Commentary: P Values and Statistical Practice.” Epidemiology, vol. 24, no. 1, 2013, pp. 69–72. JSTOR. Accessed 10 Dec. 2023. [Link]
  • Greenland, Sander et al. “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations.” European journal of epidemiology vol. 31,4 (2016): 337-50. doi:10.1007/s10654-016-0149-3 [Link]
  • Lakens, Daniël. “Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses.” Social psychological and personality science vol. 8,4 (2017): 355-362. doi:10.1177/1948550617697177 [Link]
  • Lin, Mingfeng, et al. “Research Commentary: Too Big to Fail: Large Samples and the p-Value Problem.” Information Systems Research, vol. 24, no. 4, 2013, pp. 906–17. JSTOR. Accessed 10 Dec. 2023. [Link]
  • Lumley, Thomas et al. “The importance of the normality assumption in large public health data sets.” Annual review of public health vol. 23 (2002): 151-69. doi:10.1146/annurev.publhealth.23.100901.140546 [Link]
  • Mohd Razali, Nornadiah, and Bee Yap. ‘Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests’. J. Stat. Model. Analytics, vol. 2, 01 2011. [Link]
  • Morey, Richard D et al. “The fallacy of placing confidence in confidence intervals.” Psychonomic bulletin & review vol. 23,1 (2016): 103-23. doi:10.3758/s13423-015-0947-8 [Link]
  • Olszewski, Adrian. Mann-Whitney (Wilcoxon) and Kruskal-Wallis FAIL to compare medians in general. Quantile regression should be used to compare medians instead. [Link]
  • Olszewski, Adrian. On the p-values - links library significance ditching. Adrian Olszewski, 2022. [Link]
  • Olszewski, Adrian. Testing hypotheses through statistical models opens a universe of new possibilities. Learn how to improve your daily work with this approach. [Link]
  • Pernet, Cyril. “Null hypothesis significance testing: a short tutorial.” F1000Research vol. 4 621. 25 Aug. 2015, doi:10.12688/f1000research.6963.3 [Link]
  • Serdar, Ceyhan Ceran et al. “Sample size, power and effect size revisited: simplified and practical approaches in pre-clinical, clinical and laboratory studies.” Biochemia medica vol. 31,1 (2021): 010502. doi:10.11613/BM.2021.010502 [Link]
  • The American Statistician, Volume 73, Issue sup1 (2019) [Link]
  • Verhagen, Arianne P., et al. ‘Is the p Value Really so Significant?*’. Australian Journal of Physiotherapy, vol. 50, no. 4, 2004, pp. 261–262. [Link]

Posts and threads

Bayesian Statistics

  • Camara-Escudero, Mauro. Variational Auto-Encoders and the Expectation-Maximization Algorithm. Mauro Camara-Escudero, 2020. [Link]
  • Patacchiola, Massimiliano. Evidence, KL-divergence, and ELBO. Massimiliano Patacchiola, 2021. [Link]
  • Yao, Yuling. Bayes is guaranteed to overfit, for any model, any prior, and every data point. Yuling Yao, 2023. [Link]

General topics

  • Harrell, Frank. Classification vs. Prediction. Statistical Thinking, 2017. [Link]

Variable selection / Feature selection

Talks, conferences, and videos

Bayesian Statistics

  • Chopin, Nicolas, et al. "Bayesian Causal Inference for Real World Interactive Systems." Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021. [Link]
  • Jordan, Michael. Nonparametric Bayesian Methods: Models, Algorithms, and Applications II. UC Berkeley, 2017 [Link]
  • Maxim Kochurov. State of Bayes Lecture Series. PyMC Labs, 2023. [Link]
  • Pragmatic Data Scientists. Making Informed Decisions with Bayesianism: A Conversation with Kenneth, Statistician at Meta. Pragmatic Data Scientist, 2023. [Link]

Stochastic Processes

  • Hakenes, Hendrik. Ito's Lemma -- Some intuitive explanations on the solution of stochastic differential equations. University of Bonn, 2021. [Link]

📄 Text Mining and Natural Language Processing

Books

  • Silge, Julia, and David Robinson. Text mining with R: A tidy approach. O'Reilly Media, Inc., 2017. [Link]

Courses and lecture notes, posts

Datasets

  • Horwood, Graham V. Humanitarian Assistance and Disaster Relief (HA/DR) Articles and Lexicon. V1, Harvard Dataverse, 2017, doi:10.7910/DVN/TGOPRU. [Link]

Packages

Papers

  • Goldberg, Yoav. "A primer on neural network models for natural language processing." Journal of Artificial Intelligence Research 57 (2016): 345-420. [Link]
  • Minaee, Shervin, et al. "Large Language Models: A Survey." arXiv preprint arXiv:2402.06196 (2024). [Link]

Posts and threads

Talks, conferences, and videos
