{ "cells": [ { "cell_type": "markdown", "source": [ "# Statistics" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Python can draw random numbers and work with probability distributions. It can also compute statistics. Many of these functions live in `scipy.stats`, but some are also available in `numpy`.\n", "\n", "Let's take the normal distribution as an example." ], "metadata": {} }, { "cell_type": "code", "execution_count": 7, "source": [ "from scipy.stats import norm\n", "import numpy as np" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "The `norm` object exposes many attributes and methods:" ], "metadata": {} }, { "cell_type": "code", "execution_count": 4, "source": [ "dir(norm)" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['__call__',\n", " '__class__',\n", " '__delattr__',\n", " '__dict__',\n", " '__dir__',\n", " '__doc__',\n", " '__eq__',\n", " '__format__',\n", " '__ge__',\n", " '__getattribute__',\n", " '__getstate__',\n", " '__gt__',\n", " '__hash__',\n", " '__init__',\n", " '__init_subclass__',\n", " '__le__',\n", " '__lt__',\n", " '__module__',\n", " '__ne__',\n", " '__new__',\n", " '__reduce__',\n", " '__reduce_ex__',\n", " '__repr__',\n", " '__setattr__',\n", " '__setstate__',\n", " '__sizeof__',\n", " '__str__',\n", " '__subclasshook__',\n", " '__weakref__',\n", " '_argcheck',\n", " '_argcheck_rvs',\n", " '_cdf',\n", " '_cdf_single',\n", " '_cdfvec',\n", " '_construct_argparser',\n", " '_construct_default_doc',\n", " '_construct_doc',\n", " '_ctor_param',\n", " '_entropy',\n", " '_fit_loc_scale_support',\n", " '_fitstart',\n", " '_get_support',\n", " '_isf',\n", " '_logcdf',\n", " '_logpdf',\n", " '_logsf',\n", " '_mom0_sc',\n", " '_mom1_sc',\n", " '_mom_integ0',\n", " '_mom_integ1',\n", " '_munp',\n", " '_nnlf',\n", " '_nnlf_and_penalty',\n", " '_open_support_mask',\n", " '_parse_args',\n", " '_parse_args_rvs',\n", " '_parse_args_stats',\n", 
" '_pdf',\n", " '_penalized_nnlf',\n", " '_ppf',\n", " '_ppf_single',\n", " '_ppf_to_solve',\n", " '_ppfvec',\n", " '_random_state',\n", " '_reduce_func',\n", " '_rvs',\n", " '_rvs_size_warned',\n", " '_rvs_uses_size_attribute',\n", " '_sf',\n", " '_stats',\n", " '_stats_has_moments',\n", " '_support_mask',\n", " '_unpack_loc_scale',\n", " '_updated_ctor_param',\n", " 'a',\n", " 'b',\n", " 'badvalue',\n", " 'cdf',\n", " 'entropy',\n", " 'expect',\n", " 'extradoc',\n", " 'fit',\n", " 'fit_loc_scale',\n", " 'freeze',\n", " 'generic_moment',\n", " 'interval',\n", " 'isf',\n", " 'logcdf',\n", " 'logpdf',\n", " 'logsf',\n", " 'mean',\n", " 'median',\n", " 'moment',\n", " 'moment_type',\n", " 'name',\n", " 'nnlf',\n", " 'numargs',\n", " 'pdf',\n", " 'ppf',\n", " 'random_state',\n", " 'rvs',\n", " 'sf',\n", " 'shapes',\n", " 'stats',\n", " 'std',\n", " 'support',\n", " 'var',\n", " 'vecentropy',\n", " 'xtol']" ] }, "metadata": {}, "execution_count": 4 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Let's create a NumPy vector of 1000 random values drawn from the standard normal distribution." ], "metadata": {} }, { "cell_type": "code", "execution_count": 6, "source": [ "# norm(0, 1) is the normal distribution with mean 0 and standard deviation 1\n", "xs = norm(0, 1).rvs(1000)" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": 9, "source": [ "# The sample mean and standard deviation should be close to 0 and 1\n", "np.mean(xs), np.std(xs)" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(-0.007637470068240482, 1.0197635816492145)" ] }, "metadata": {}, "execution_count": 9 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "We may also want to evaluate the CDF, or the density (PDF), at a given value of x:" ], "metadata": {} }, { "cell_type": "code", "execution_count": 10, "source": [ "norm(0, 1).cdf(0)" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0.5" ] }, "metadata": {}, "execution_count": 10 } ], "metadata": {} }, { "cell_type": "code", "execution_count": 11, "source": [ "norm(0, 1).pdf(0)" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0.3989422804014327" ] }, "metadata": {}, "execution_count": 11 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "or the inverse CDF (quantile function):" ], "metadata": {} }, { "cell_type": "code", "execution_count": 12, "source": [ "norm(0, 1).ppf(0.5)" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0.0" ] }, "metadata": {}, "execution_count": 12 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "A useful trick is to simplify the notation with aliases. For example, if we know we are working with the standard normal distribution N(0, 1), we can write:" ], "metadata": {} }, { "cell_type": "code", "execution_count": 13, "source": [ "# Bind the distribution once, then alias its methods\n", "std_norm = norm(0, 1)\n", "phi = std_norm.pdf      # density\n", "Phi = std_norm.cdf      # cumulative distribution function\n", "invPhi = std_norm.ppf   # quantile function (inverse CDF)\n", "\n", "phi(0), Phi(0), invPhi(0.5)" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(0.3989422804014327, 0.5, 0.0)" ] }, "metadata": {}, "execution_count": 13 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Nonparametric estimation, statistical tests, and econometrics are also possible. We will come back to them in later lessons." ], "metadata": {} } ], "metadata": { "orig_nbformat": 4, "language_info": { "name": "python", "version": "3.8.5", "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py" }, "kernelspec": { "name": "python3", "display_name": "Python 3.8.5 64-bit ('base': conda)" }, "interpreter": { "hash": "ba2340ab882356406e091df0706039b4b3cc5191eef6c073d3fb97005dbe0324" } }, "nbformat": 4, "nbformat_minor": 2 }