{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data\n",
    "\n",
    "Working with data is fairly easy in Python. The `pandas` module makes it straightforward to manipulate large datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dictionaries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose we have two lists. The first contains country names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "countries = ['Canada','United States','Germany','France','Italy']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The second contains the population of each country, in millions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "pop = [38.068,332.915,83.9,64.426,60.367]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We would like to work with these data, for example to look up a country's population by its name. A pandas dataset, that is, a DataFrame, can be thought of as an object built around a Python data type, the dictionary. Let's see what a dictionary is by building one from the two lists, paired together with `zip`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "map_pop = dict(zip(countries,pop))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now suppose we want the population of Germany. We simply look it up by its key:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "83.9"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "map_pop['Germany']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A dictionary is made up of keys and the values associated with them; each key-value pair is called an item. Let's take a look:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['Canada', 'United States', 'Germany', 'France', 'Italy'])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "map_pop.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_values([38.068, 332.915, 83.9, 64.426, 60.367])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "map_pop.values()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_items([('Canada', 38.068), ('United States', 332.915), ('Germany', 83.9), ('France', 64.426), ('Italy', 60.367)])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "map_pop.items()"
   ]
  },
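  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small illustration of the key-value pairing, the sketch below loops over `items()` and uses `get` for a key that may be missing. The dictionary is rebuilt so that the cell runs on its own; the `'Spain'` key is only an illustrative example of a missing entry."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rebuild the dictionary so this cell is self-contained\n",
    "countries = ['Canada', 'United States', 'Germany', 'France', 'Italy']\n",
    "pop = [38.068, 332.915, 83.9, 64.426, 60.367]\n",
    "map_pop = dict(zip(countries, pop))\n",
    "\n",
    "# Each item is a (key, value) pair that can be unpacked in a loop\n",
    "for country, population in map_pop.items():\n",
    "    print(f'{country}: {population} million')\n",
    "\n",
    "# map_pop['Spain'] would raise a KeyError; get() returns None instead\n",
    "print(map_pop.get('Spain'))"
   ]
  },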
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dictionary can also be written out directly, as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "map_pop2 = {'Canada':38.068,'United States':332.915,'Germany':83.9,'France':64.426,'Italy':60.367}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "83.9"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "map_pop2['Germany']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A DataFrame is built around a dictionary of lists (or arrays). For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({'country':countries,'population':pop})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Canada</td>\n",
       "      <td>38.068</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>United States</td>\n",
       "      <td>332.915</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Germany</td>\n",
       "      <td>83.900</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>France</td>\n",
       "      <td>64.426</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Italy</td>\n",
       "      <td>60.367</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         country  population\n",
       "0         Canada      38.068\n",
       "1  United States     332.915\n",
       "2        Germany      83.900\n",
       "3         France      64.426\n",
       "4          Italy      60.367"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also create one by passing the index, the column names, and the data explicitly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2 = pd.DataFrame(index=countries,columns=['population'],data=pop)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <td>38.068</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>332.915</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>83.900</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>France</th>\n",
       "      <td>64.426</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Italy</th>\n",
       "      <td>60.367</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               population\n",
       "Canada             38.068\n",
       "United States     332.915\n",
       "Germany            83.900\n",
       "France             64.426\n",
       "Italy              60.367"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two DataFrames differ. The first has two columns, `country` and `population`. It also has a bold column that starts at zero and appears to give the observation number. The second has no `country` variable; instead, its bold column holds the country names. This bold column is the index. The advantage of the index is that it lets us retrieve the value for a country more easily than if we had used the `country` column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "83.9"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.loc['Germany','population']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is only one element, so the result is a scalar."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose I want two countries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Germany    83.900\n",
       "Italy      60.367\n",
       "Name: population, dtype: float64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.loc[['Germany','Italy'],'population']"
   ]
  },
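  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the advantage of the index concrete, the sketch below retrieves Germany's population from both layouts. The frames are rebuilt under the hypothetical names `by_position` and `by_name` so that the cell runs on its own and leaves `df` and `df2` untouched."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "countries = ['Canada', 'United States', 'Germany', 'France', 'Italy']\n",
    "pop = [38.068, 332.915, 83.9, 64.426, 60.367]\n",
    "by_position = pd.DataFrame({'country': countries, 'population': pop})\n",
    "by_name = pd.DataFrame(index=countries, columns=['population'], data=pop)\n",
    "\n",
    "# With the default integer index, rows are first filtered with a boolean mask\n",
    "mask = by_position['country'] == 'Germany'\n",
    "print(by_position.loc[mask, 'population'].iloc[0])\n",
    "\n",
    "# With country names as the index, a label lookup is enough\n",
    "print(by_name.loc['Germany', 'population'])\n",
    "\n",
    "# set_index converts the first layout into the second\n",
    "print(by_position.set_index('country').loc['Germany', 'population'])"
   ]
  },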
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result is not a scalar but what pandas calls a `Series`, since it is a single column."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I can sort on the index. Note that `sort_index` returns a sorted copy and leaves `df2` itself unchanged."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>population</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <td>38.068</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>France</th>\n",
       "      <td>64.426</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>83.900</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Italy</th>\n",
       "      <td>60.367</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>332.915</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               population\n",
       "Canada             38.068\n",
       "France             64.426\n",
       "Germany            83.900\n",
       "Italy              60.367\n",
       "United States     332.915"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.sort_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I can add a variable, the GDP of each country, in billions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "gdp = [1711.39,20494.1,4000.39,2775.25,2072.2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2['gdp'] = gdp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>population</th>\n",
       "      <th>gdp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <td>38.068</td>\n",
       "      <td>1711.39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>332.915</td>\n",
       "      <td>20494.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>83.900</td>\n",
       "      <td>4000.39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>France</th>\n",
       "      <td>64.426</td>\n",
       "      <td>2775.25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Italy</th>\n",
       "      <td>60.367</td>\n",
       "      <td>2072.20</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               population       gdp\n",
       "Canada             38.068   1711.39\n",
       "United States     332.915  20494.10\n",
       "Germany            83.900   4000.39\n",
       "France             64.426   2775.25\n",
       "Italy              60.367   2072.20"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A number of summary statistics are available in pandas:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "population     115.9352\n",
       "gdp           6210.6660\n",
       "dtype: float64"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "population      579.676\n",
       "gdp           31053.330\n",
       "dtype: float64"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>population</th>\n",
       "      <th>gdp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>5.000000</td>\n",
       "      <td>5.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>115.935200</td>\n",
       "      <td>6210.666000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>122.383425</td>\n",
       "      <td>8032.345163</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>38.068000</td>\n",
       "      <td>1711.390000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>60.367000</td>\n",
       "      <td>2072.200000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>64.426000</td>\n",
       "      <td>2775.250000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>83.900000</td>\n",
       "      <td>4000.390000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>332.915000</td>\n",
       "      <td>20494.100000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       population           gdp\n",
       "count    5.000000      5.000000\n",
       "mean   115.935200   6210.666000\n",
       "std    122.383425   8032.345163\n",
       "min     38.068000   1711.390000\n",
       "25%     60.367000   2072.200000\n",
       "50%     64.426000   2775.250000\n",
       "75%     83.900000   4000.390000\n",
       "max    332.915000  20494.100000"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can transpose the last table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>population</th>\n",
       "      <td>5.0</td>\n",
       "      <td>115.9352</td>\n",
       "      <td>122.383425</td>\n",
       "      <td>38.068</td>\n",
       "      <td>60.367</td>\n",
       "      <td>64.426</td>\n",
       "      <td>83.90</td>\n",
       "      <td>332.915</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gdp</th>\n",
       "      <td>5.0</td>\n",
       "      <td>6210.6660</td>\n",
       "      <td>8032.345163</td>\n",
       "      <td>1711.390</td>\n",
       "      <td>2072.200</td>\n",
       "      <td>2775.250</td>\n",
       "      <td>4000.39</td>\n",
       "      <td>20494.100</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            count       mean          std       min       25%       50%  \\\n",
       "population    5.0   115.9352   122.383425    38.068    60.367    64.426   \n",
       "gdp           5.0  6210.6660  8032.345163  1711.390  2072.200  2775.250   \n",
       "\n",
       "                75%        max  \n",
       "population    83.90    332.915  \n",
       "gdp         4000.39  20494.100  "
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.describe().transpose()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Column calculations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The column names are stored in the `columns` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['population', 'gdp'], dtype='object')"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.columns"
   ]
  },
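  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's compute GDP per capita. GDP is in billions and population is in millions, so multiplying by `1e3` gives a per-person figure:"
   ]
  },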
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2['gdp_per_cap'] = df2['gdp']*1e3/df2['population']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>population</th>\n",
       "      <th>gdp</th>\n",
       "      <th>gdp_per_cap</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <td>38.068</td>\n",
       "      <td>1711.39</td>\n",
       "      <td>44956.131134</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>332.915</td>\n",
       "      <td>20494.10</td>\n",
       "      <td>61559.557244</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>83.900</td>\n",
       "      <td>4000.39</td>\n",
       "      <td>47680.452920</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>France</th>\n",
       "      <td>64.426</td>\n",
       "      <td>2775.25</td>\n",
       "      <td>43076.552944</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Italy</th>\n",
       "      <td>60.367</td>\n",
       "      <td>2072.20</td>\n",
       "      <td>34326.701675</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               population       gdp   gdp_per_cap\n",
       "Canada             38.068   1711.39  44956.131134\n",
       "United States     332.915  20494.10  61559.557244\n",
       "Germany            83.900   4000.39  47680.452920\n",
       "France             64.426   2775.25  43076.552944\n",
       "Italy              60.367   2072.20  34326.701675"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can sort the rows by GDP per capita:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>population</th>\n",
       "      <th>gdp</th>\n",
       "      <th>gdp_per_cap</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>332.915</td>\n",
       "      <td>20494.10</td>\n",
       "      <td>61559.557244</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>83.900</td>\n",
       "      <td>4000.39</td>\n",
       "      <td>47680.452920</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <td>38.068</td>\n",
       "      <td>1711.39</td>\n",
       "      <td>44956.131134</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>France</th>\n",
       "      <td>64.426</td>\n",
       "      <td>2775.25</td>\n",
       "      <td>43076.552944</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Italy</th>\n",
       "      <td>60.367</td>\n",
       "      <td>2072.20</td>\n",
       "      <td>34326.701675</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               population       gdp   gdp_per_cap\n",
       "United States     332.915  20494.10  61559.557244\n",
       "Germany            83.900   4000.39  47680.452920\n",
       "Canada             38.068   1711.39  44956.131134\n",
       "France             64.426   2775.25  43076.552944\n",
       "Italy              60.367   2072.20  34326.701675"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.sort_values(by='gdp_per_cap',ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Columns can be referred to using two notations, attribute access and brackets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Canada           44956.131134\n",
       "United States    61559.557244\n",
       "Germany          47680.452920\n",
       "France           43076.552944\n",
       "Italy            34326.701675\n",
       "Name: gdp_per_cap, dtype: float64"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.gdp_per_cap"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Canada           44956.131134\n",
       "United States    61559.557244\n",
       "Germany          47680.452920\n",
       "France           43076.552944\n",
       "Italy            34326.701675\n",
       "Name: gdp_per_cap, dtype: float64"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2['gdp_per_cap']"
   ]
  },
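  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A new column must be created with the latter (bracket) notation. Assigning to a new name with attribute notation sets a plain Python attribute on the object instead of adding a column."
   ]
  },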
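  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sketch below illustrates the point about creating columns on a small stand-alone frame (the name `demo` and the `ratio` column are hypothetical). Attribute assignment leaves the columns unchanged, and pandas emits a `UserWarning`, which is silenced here only to keep the output short."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "import pandas as pd\n",
    "\n",
    "demo = pd.DataFrame({'gdp': [1711.39, 2775.25], 'population': [38.068, 64.426]},\n",
    "                    index=['Canada', 'France'])\n",
    "\n",
    "with warnings.catch_warnings():\n",
    "    warnings.simplefilter('ignore')\n",
    "    # Attribute assignment sets a Python attribute, not a column\n",
    "    demo.ratio = demo['gdp'] / demo['population']\n",
    "print('ratio' in demo.columns)  # False\n",
    "\n",
    "# Bracket assignment creates the column\n",
    "demo['ratio'] = demo['gdp'] / demo['population']\n",
    "print('ratio' in demo.columns)  # True"
   ]
  },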
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Merge"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose another dataset contains the continent of each country:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [],
   "source": [
    "df3 = pd.DataFrame(index=countries,columns=['continent'],data=['North America','North America','Europe','Europe','Europe'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>continent</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <td>North America</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>North America</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>France</th>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Italy</th>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   continent\n",
       "Canada         North America\n",
       "United States  North America\n",
       "Germany               Europe\n",
       "France                Europe\n",
       "Italy                 Europe"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can join these data to our existing DataFrame, matching rows on the index of each:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [],
   "source": [
    "df4 = df2.merge(df3,left_index=True,right_index=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>population</th>\n",
       "      <th>gdp</th>\n",
       "      <th>gdp_per_cap</th>\n",
       "      <th>continent</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <td>38.068</td>\n",
       "      <td>1711.39</td>\n",
       "      <td>44956.131134</td>\n",
       "      <td>North America</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>332.915</td>\n",
       "      <td>20494.10</td>\n",
       "      <td>61559.557244</td>\n",
       "      <td>North America</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>83.900</td>\n",
       "      <td>4000.39</td>\n",
       "      <td>47680.452920</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>France</th>\n",
       "      <td>64.426</td>\n",
       "      <td>2775.25</td>\n",
       "      <td>43076.552944</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Italy</th>\n",
       "      <td>60.367</td>\n",
       "      <td>2072.20</td>\n",
       "      <td>34326.701675</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               population       gdp   gdp_per_cap      continent\n",
       "Canada             38.068   1711.39  44956.131134  North America\n",
       "United States     332.915  20494.10  61559.557244  North America\n",
       "Germany            83.900   4000.39  47680.452920         Europe\n",
       "France             64.426   2775.25  43076.552944         Europe\n",
       "Italy              60.367   2072.20  34326.701675         Europe"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4"
   ]
  },
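  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The merge above relies on the default `how='inner'`, which keeps only the index labels found in both tables. The sketch below uses two small hypothetical frames (`left` and `right`, not the notebook's `df2` and `df3`) to show what changes when a country is missing from one side. It also shows that `join` performs a left join on the index by default."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy frames; France is deliberately absent from `right`\n",
    "left = pd.DataFrame({'population': [38.068, 83.9, 64.426]},\n",
    "                    index=['Canada', 'Germany', 'France'])\n",
    "right = pd.DataFrame({'continent': ['North America', 'Europe']},\n",
    "                     index=['Canada', 'Germany'])\n",
    "\n",
    "# Default how='inner': only labels present in both frames survive\n",
    "inner = left.merge(right, left_index=True, right_index=True)\n",
    "\n",
    "# how='left': every row of `left` is kept, unmatched rows get NaN\n",
    "left_merged = left.merge(right, left_index=True, right_index=True, how='left')\n",
    "\n",
    "# join aligns on the index and defaults to how='left'\n",
    "joined = left.join(right)"
   ]
  },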
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plots\n",
    "\n",
    "We can draw plots directly from a pandas object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEyCAYAAAD6Lqe7AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAccElEQVR4nO3dfdhc9V3n8feHpFBsG8pDQEwowZJ1C9gnItKyl7ZEJd3WBhWuTa0lanazVqqturpQrdhqXPBaxbIWuljaBsSFLLUlWtEitHatCE1aWuRJYqEQoSTyVGyFNvjdP87vbiY3d5KZJNznDvN+XddcM/Odc4bvDEk+c87v/M5JVSFJ0j59NyBJmhkMBEkSYCBIkhoDQZIEGAiSpMZAkCQBMLvvBnbVIYccUgsWLOi7DUnaq6xfv/6fq2ruVK/ttYGwYMEC1q1b13cbkrRXSfLl7b3mLiNJEmAgSJIaA0GSBBgIkqTGQJAkAQaCJKkxECRJgIEgSWr22olpe8KCsz7edwvcc+7r+25BkgC3ECRJzVCBkOSFSa5KckeS25O8KslBSa5Ncle7P3Bg+bOTbEhyZ5JTBurHJ7mlvXZBkrT6fkmubPUbkyzY459UkrRDw24hvBf4i6r698DLgNuBs4DrqmohcF17TpJjgGXAscAS4MIks9r7XASsBBa225JWXwE8UlVHA+cD5+3m55IkjWingZBkDvB9wCUAVfWNqnoUWAqsboutBk5tj5cCV1TVk1V1N7ABOCHJ4cCcqrqhqgq4dNI6E+91FbB4YutBkjQ9htlC+E5gM/ChJJ9P8oEkzwMOq6oHANr9oW35ecB9A+tvbLV57fHk+jbrVNUW4DHg4MmNJFmZZF2SdZs3bx7yI0qShjFMIMwGXglcVFWvAL5G2z20HVP9sq8d1He0zraFqouralFVLZo7d8rTeUuSdtEwgbAR2FhVN7bnV9EFxINtNxDtftPA8kcMrD8fuL/V509R32adJLOBA4CHR/0wkqRdt9NAqKqvAPcl+a5WWgzcBqwFlrfacuDq9ngtsKwdOXQU3eDxTW230uNJTmzjA2dMWmfivU4Drm/jDJKkaTLsxLSfAy5Psi/wJeCn6MJkTZIVwL3A6QBVdWuSNXShsQU4s6qeau/zVuDDwP7ANe0G3YD1ZUk20G0ZLNvNzyVJGtFQgVBVNwOLpnhp8XaWXwWsmqK+DjhuivoTtECRJPXDmcqSJMBAkCQ1BoIkCTAQJEmNgSBJAgwESVJjIEiSAANBktQYCJIkwECQJDUGgiQJMBAkSY2BIEkCDARJUmMgSJIAA0GS1BgIkiTAQJAkNQaCJAkwECRJjYEgSQIMBElSYyBIkgADQZLUDBUISe5JckuSm5Osa7WDklyb5K52f+DA8mcn2ZDkziSnDNSPb++zIckFSdLq+yW5stVvTLJgD39OSdJOjLKF8NqqenlVLWrPzwKuq6qFwHXtOUmOAZYBxwJLgAuTzGrrXASsBBa225JWXwE8UlVHA+cD5+36R5Ik7Yrd2WW0FFjdHq8GTh2oX1FVT1bV3cAG4IQkhwNzquqGqirg0knrTLzXVcDiia0HSdL0GDYQCvhEkvVJVrbaYVX1AEC7P7TV5wH3Day7sdXmtceT69usU1VbgMeAg0f7KJKk3TF7yOVOqqr7kxwKXJvkjh0sO9Uv+9pBfUfrbPvGXRitBHjRi160444lSSMZaguhqu5v95uAjwInAA+23UC0+01t8Y3AEQOrzwfub/X5U9S3WSfJbOAA4OEp+ri4qhZV1aK5c+cO07okaUg7DYQkz0vygonHwA8Bfw+sBZa3xZYDV7fHa4Fl7ciho+gGj29qu5UeT3JiGx84Y9I6E+91GnB9G2eQJE2TYXYZHQZ8tI3xzgb+uKr+IslngTVJVgD3AqcDVNWtSdYAtwFbgDOr6qn2Xm8FPgzsD1zTbgCXAJcl2UC3ZbBsD3w2SdIIdhoIVfUl4GVT
1B8CFm9nnVXAqinq64Djpqg/QQsUSVI/nKksSQIMBElSYyBIkoDh5yHoWW7BWR/vuwXuOff1fbcgjTW3ECRJgIEgSWoMBEkSYCBIkhoDQZIEGAiSpMZAkCQBBoIkqXFimjSJk/Q0rtxCkCQBBoIkqTEQJEmAgSBJagwESRLgUUaSdsAjrsaLWwiSJMBAkCQ1BoIkCTAQJEmNgSBJAgwESVIzdCAkmZXk80n+rD0/KMm1Se5q9wcOLHt2kg1J7kxyykD9+CS3tNcuSJJW3y/Jla1+Y5IFe/AzSpKGMMoWwtuB2weenwVcV1ULgevac5IcAywDjgWWABcmmdXWuQhYCSxstyWtvgJ4pKqOBs4HztulTyNJ2mVDBUKS+cDrgQ8MlJcCq9vj1cCpA/UrqurJqrob2ACckORwYE5V3VBVBVw6aZ2J97oKWDyx9SBJmh7DbiH8PvArwL8N1A6rqgcA2v2hrT4PuG9guY2tNq89nlzfZp2q2gI8Bhw87IeQJO2+nQZCkjcAm6pq/ZDvOdUv+9pBfUfrTO5lZZJ1SdZt3rx5yHYkScMYZgvhJOCNSe4BrgBOTvJHwINtNxDtflNbfiNwxMD684H7W33+FPVt1kkyGzgAeHhyI1V1cVUtqqpFc+fOHeoDSpKGs9NAqKqzq2p+VS2gGyy+vqp+AlgLLG+LLQeubo/XAsvakUNH0Q0e39R2Kz2e5MQ2PnDGpHUm3uu09t942haCJOmZsztnOz0XWJNkBXAvcDpAVd2aZA1wG7AFOLOqnmrrvBX4MLA/cE27AVwCXJZkA92WwbLd6EuStAtGCoSq+hTwqfb4IWDxdpZbBayaor4OOG6K+hO0QJEk9cOZypIkwECQJDUGgiQJMBAkSY2BIEkCDARJUmMgSJIAA0GS1BgIkiTAQJAkNQaCJAkwECRJjYEgSQIMBElSszvXQ5CksbHgrI/33QL3nPv6Z/T93UKQJAEGgiSpMRAkSYCBIElqDARJEmAgSJIaA0GSBBgIkqTGQJAkAQaCJKnZaSAkeW6Sm5J8IcmtSd7d6gcluTbJXe3+wIF1zk6yIcmdSU4ZqB+f5Jb22gVJ0ur7Jbmy1W9MsuAZ+KySpB0YZgvhSeDkqnoZ8HJgSZITgbOA66pqIXBde06SY4BlwLHAEuDCJLPae10ErAQWttuSVl8BPFJVRwPnA+ft/keTJI1ip4FQnX9pT5/TbgUsBVa3+mrg1PZ4KXBFVT1ZVXcDG4ATkhwOzKmqG6qqgEsnrTPxXlcBiye2HiRJ02OoMYQks5LcDGwCrq2qG4HDquoBgHZ/aFt8HnDfwOobW21eezy5vs06VbUFeAw4eIo+ViZZl2Td5s2bh/qAkqThDBUIVfVUVb0cmE/3a/+4HSw+1S/72kF9R+tM7uPiqlpUVYvmzp27k64lSaMY6SijqnoU+BTdvv8H224g2v2mtthG4IiB1eYD97f6/Cnq26yTZDZwAPDwKL1JknbPMEcZzU3ywvZ4f+AHgDuAtcDytthy4Or2eC2wrB05dBTd4PFNbbfS40lObOMDZ0xaZ+K9TgOub+MMkqRpMswV0w4HVrcjhfYB1lTVnyW5AViTZAVwL3A6QFXdmmQNcBuwBTizqp5q7/VW4MPA/sA17QZwCXBZkg10WwbL9sSHkyQNb6eBUFVfBF4xRf0hYPF21lkFrJqivg542vhDVT1BCxRJUj+cqSxJAgwESVJjIEiSAANBktQYCJIkwECQJDUGgiQJMBAkSY2BIEkCDARJUmMgSJIAA0GS1BgIkiTAQJAkNQaCJAkwECRJjYEgSQIMBElSYyBIkgADQZLUGAiSJMBAkCQ1BoIkCTAQJEnNTgMhyRFJPpnk9iS3Jnl7qx+U5Nokd7X7AwfWOTvJhiR3JjlloH58klvaaxckSavvl+TKVr8xyYJn4LNKknZgmC2ELcAvVdVLgBOBM5McA5wFXFdVC4Hr2nPaa8uAY4ElwIVJZrX3ughYCSxstyWtvgJ4
pKqOBs4HztsDn02SNIKdBkJVPVBVn2uPHwduB+YBS4HVbbHVwKnt8VLgiqp6sqruBjYAJyQ5HJhTVTdUVQGXTlpn4r2uAhZPbD1IkqbHSGMIbVfOK4AbgcOq6gHoQgM4tC02D7hvYLWNrTavPZ5c32adqtoCPAYcPEpvkqTdM3QgJHk+8BHgHVX11R0tOkWtdlDf0TqTe1iZZF2SdZs3b95Zy5KkEQwVCEmeQxcGl1fVn7Tyg203EO1+U6tvBI4YWH0+cH+rz5+ivs06SWYDBwAPT+6jqi6uqkVVtWju3LnDtC5JGtIwRxkFuAS4vap+b+CltcDy9ng5cPVAfVk7cugousHjm9pupceTnNje84xJ60y812nA9W2cQZI0TWYPscxJwFuAW5Lc3GrvBM4F1iRZAdwLnA5QVbcmWQPcRneE0plV9VRb763Ah4H9gWvaDbrAuSzJBrotg2W797EkSaPaaSBU1d8w9T5+gMXbWWcVsGqK+jrguCnqT9ACRZLUD2cqS5IAA0GS1BgIkiTAQJAkNQaCJAkwECRJjYEgSQIMBElSYyBIkgADQZLUGAiSJMBAkCQ1BoIkCTAQJEmNgSBJAgwESVJjIEiSAANBktQYCJIkwECQJDUGgiQJMBAkSY2BIEkCDARJUmMgSJKAIQIhyQeTbEry9wO1g5Jcm+Sudn/gwGtnJ9mQ5M4kpwzUj09yS3vtgiRp9f2SXNnqNyZZsIc/oyRpCMNsIXwYWDKpdhZwXVUtBK5rz0lyDLAMOLatc2GSWW2di4CVwMJ2m3jPFcAjVXU0cD5w3q5+GEnSrttpIFTVp4GHJ5WXAqvb49XAqQP1K6rqyaq6G9gAnJDkcGBOVd1QVQVcOmmdife6Clg8sfUgSZo+uzqGcFhVPQDQ7g9t9XnAfQPLbWy1ee3x5Po261TVFuAx4OBd7EuStIv29KDyVL/sawf1Ha3z9DdPViZZl2Td5s2bd7FFSdJUdjUQHmy7gWj3m1p9I3DEwHLzgftbff4U9W3WSTIbOICn76ICoKourqpFVbVo7ty5u9i6JGkquxoIa4Hl7fFy4OqB+rJ25NBRdIPHN7XdSo8nObGND5wxaZ2J9zoNuL6NM0iSptHsnS2Q5P8ArwEOSbIROAc4F1iTZAVwL3A6QFXdmmQNcBuwBTizqp5qb/VWuiOW9geuaTeAS4DLkmyg2zJYtkc+mSRpJDsNhKp603ZeWryd5VcBq6aorwOOm6L+BC1QJEn9caayJAkwECRJjYEgSQIMBElSYyBIkgADQZLUGAiSJMBAkCQ1BoIkCTAQJEmNgSBJAgwESVJjIEiSAANBktQYCJIkwECQJDUGgiQJMBAkSY2BIEkCDARJUmMgSJIAA0GS1BgIkiTAQJAkNQaCJAmYQYGQZEmSO5NsSHJW3/1I0riZEYGQZBbwPuB1wDHAm5Ic029XkjReZkQgACcAG6rqS1X1DeAKYGnPPUnSWElV9d0DSU4DllTVf27P3wJ8b1W9bdJyK4GV7el3AXdOa6NTOwT4576bmCH8Ljp+D1v5XWw1U76LI6tq7lQvzJ7uTrYjU9SellRVdTFw8TPfzvCSrKuqRX33MRP4XXT8Hrbyu9hqb/guZsouo43AEQPP5wP399SLJI2lmRIInwUWJjkqyb7AMmBtzz1J0liZEbuMqmpLkrcBfwnMAj5YVbf23NawZtQurJ75XXT8Hrbyu9hqxn8XM2JQWZLUv5myy0iS1DMDQZIEGAi7Jck+Seb03Yck7QkGwoiS/HGSOUmeB9wG3Jnkl/vuS9LMk+S4vnsYhYEwumOq6qvAqcCfAy8C3tJrRz1J8rYkB/bdx0yS5NAkL5q49d1PH5J8W5J3JfnD9nxhkjf03VdP3p/kpiQ/m+SFfTezMwbC6J6T5Dl0gXB1VX2TKWZVj4lvBz6bZE07W+1UM87HQpI3JrkLuBv4a+Ae4Jpem+rPh4AngVe15xuB3+qvnf5U1X8A3kw38XZd28Pwgz23
tV0Gwuj+N91f9ucBn05yJPDVXjvqSVX9GrAQuAT4SeCuJL+d5MW9NtaP3wROBP6hqo4CFgOf6bel3ry4qn4H+CZAVf0rU5+eZixU1V3ArwH/Hfh+4IIkdyT50X47ezoDYURVdUFVzauq/1idLwOv7buvvlQ3keUr7bYFOBC4Ksnv9NrY9PtmVT0E7JNkn6r6JPDynnvqyzeS7E/bcm4/EJ7st6V+JHlpkvOB24GTgR+uqpe0x+f32twUZsRM5b1JksOA3wa+o6pe167b8Cq6X8ljJcnPA8vpzuD4AeCXq+qbSfYB7gJ+pc/+ptmjSZ4PfBq4PMkmuoAcR+cAfwEckeRy4CS6Lchx9Ad0fzfe2baUAKiq+5P8Wn9tTc2ZyiNKcg3dPtJfraqXJZkNfL6qvrvn1qZdkvcAl7StpMmvvaSqbu+hrV60o86eoNs18mbgAODyttUwdpIcTLcLLcDfVdVMOO2zdsJAGFGSz1bV9yT5fFW9otVurqqX99xaL9rV7g5jYGuzqu7tryP1LcmPANdX1WPt+QuB11TVx/rsazoluYWpDzYJ3Z7Wl05zS0Nxl9HovtZ+/UzsHz0ReKzflvrRTkj4G8CDwL+1cgEz8g/7MyHJ4+zgKLOqGseJi+dU1UcnnlTVo0nOAT7WX0vTbq88zNZAGN0v0p2a+8VJPgPMBU7vt6XevAP4rnHdLQJQVS+Ab+0++wpwGVt3G72gx9b6NNXBKmP1b81Uu1H3Bu4yGlGS/YCn6C7hGbrLeO5TVWN3FEWSTwI/WFXjOnj6LUlurKrv3VltHCT5IPAo8D66raefAw6sqp/ssa1etD0I/wt4CbAv3en9vzZTtxzHKrX3kBuq6pXAt67XkORzwCv7a6k3XwI+leTjDBxWWFW/119LvXkqyZuBK+j+EXwT3Q+HcfRzwLuAK+l+NH0COLPXjvrzB3QX/Pq/wCLgDODoXjvaAQNhSEm+HZgH7J/kFWydaDMH+LbeGuvXve22b7uNsx8H3ttuRTcp7cd77agnVfU14Ky++5gpqmpDkllV9RTwoSR/23dP22MgDO8UumOp5wODv4AfB97ZR0N9q6p3993DTFFV9wBL++5jJkjy74D/Bixg26PPTu6rpx59vV0W+OY2WfMBurMczEiOIYwoyY9V1Uf67mMmSDKXbvLZscBzJ+rj+Bc/yXOBFTz9u/jp3prqSZIvAO8H1jOw26yq1vfWVE/aqW0epNuC/gW6+Snvq6p/7LWx7XALYURV9ZEkr+fpf/Hf019Xvbmcbj/xG4CfoZu1vLnXjvpzGXAH3Zbke+iOMhqbiXmTbKmqi/puYoY4tareSzdp8d0ASd5Ot2txxvFcRiNK8n7gP9ENnIXukNMje22qPwdX1SV05/H56/Zr+MS+m+rJ0VX1LrojSFYDrwfGbvZ686ftdM+HJzlo4tZ3Uz1ZPkXtJ6e7iWG5hTC6V1fVS5N8sareneR3gT/pu6mefLPdP9C2mu6nG2MZRxPfxaPtoihfoduHPo4m/hEcvHBUAd/ZQy+9SPImuoMKjkqyduClFwAzdt6OgTC6iRNUfT3Jd9D9zz2qx3769FtJDgB+ie5Y6zl0+0nH0cXtYkHvopu4+Hzg1/ttqR/t9N/j7m/pBpAPAX53oP448MVeOhqCg8ojSvIuun/8FrN14s0H2u4CSXzr0pHHsO0426X9daRhGAgjSrLfxKzkNmv5ucATYzpT+Si6sZQFbHt44Rv76qkv7c/Cj/H072LsDjZo5y16DV0g/DnwOuBvquq0PvuaTjs4x9XEye2cqfwscQNtVnILgSfHeKbyx+iuA/GnbD253bi6mu4kh+sZ04vBDDgNeBndaeF/ql1D5AM99zStJs5xtbcxEIbkTOUpPVFVF/TdxAwxv6qW9N3EDPGvVfVvSbYkmQNsYowGlPdmBsLwBmcq/y5bA2FsZyoD7227Bz7Btucy+lx/LfXmb5N8d1Xd0ncjM8C6dg2EP6TbYvoX4KZeO9JQ
HEMYkTOVt0ryP4C3AP/IwPUQxnSm8m10Jy27my4cZ/SFUJ4pSUK3tXRfe74AmFNVM/bIGm1lIAwpyQ8DX5w4z3mSX6cbRPwy8PaqurvP/vqQ5A7gpVX1jb576Vs7RcHT7K3nxd8dSdZX1fF996HROVN5eKtop2VI8gbgJ4Cfpjvm/P099tWnLwAv7LuJmaCqvtz+8f9XuqNLJm7j6O+SfE/fTWh0jiEMr6rq6+3xj9JdXH49sD7Jz/bYV58OA+5I8lm2HUMYx8NO30g3tvQddIOoR9Kdy+jYPvvqyWuBn0lyD/A1xnT32d7IQBhekjwf+DrdpLQLB1577tSrPOud03cDM8hv0p3H6a+q6hVJXkt3kZyxkeRFVXUv3bwD7YUMhOH9PnAz8FXg9qpaB9AOQX2gv7b6kWQfutP4Htd3LzPEN6vqoST7JNmnqj6Z5Ly+m5pmHwNeWVVfTvKRqvqxvhvSaAyEIVXVB5P8JXAo3b7zCV8BfqqfrvrTjjP/wsCvwnH3aNuC/DRweZJNwLhdazoDj513sBcyEEZQVf8E/NOk2thtHQw4HLg1yU10+4qB8RpDSHI03VjKUroB5V+guxbCkXSn9RgntZ3H2kt42Kl2WZLvn6peVX893b30JcmfAe+cfJx9kkXAOVX1w/10Nv2SPMXWQeT96cbbYIafv0dbGQjaLe34+4VV9VdJvg2YVVWP993XdEny99sbR0lyS1WN60VytBdyl9GQdnbFp6p6eLp6mSmS/BdgJXAQ8GK6cz29n+4orHGxoyPM9p+2LqQ9wIlpw1sPrGv3m4F/AO5qj8fu4uHNmcBJdEdeUVV30Q26j5PPtmDcRpIVjO+fC+2l3EIY0sRVoNo1lddW1Z+3568DfqDP3nr0ZFV9ozt9DSSZzfgNJr4D+GiSN7M1ABYB+wI/0ldT0q5wDGFEU52nJcm6qlrUV099SfI7wKPAGXRH1PwscFtV/WqfffWhTUSbGEu4taqu77MfaVcYCCNqcxH+H/BHdL+GfwL4vqo6pdfGetAmp60AfqiV/rKqxupCKNKziYEwoja4fA7wfXSB8GngPeM0qJxkKd0pjt/Xnt8EzKX7Pn6lqq7qsz9Ju8ZA2EVJnl9V/9J3H31I8hlg2cA5728GTgaeD3yoqsbpKCPpWcOjjEaU5NXtYii3tecvS3LhTlZ7ttl3Igyav6mqh9spLJ7XV1OSdo+BMLrz6S6n+RBAVX2BbvfRODlw8ElVvW3g6dxp7kXSHmIg7IJJv44Bnuqlkf7cuJ1j7/8rXjtX2ms5D2F09yV5NVBJ9gV+nu5CKOPkF4CPJflx4HOtdjywH3BqX01J2j0OKo8oySHAe+kmowX4BPDz43SU0YQkJ7P1imAeey/t5QyEESU5qao+s7OaJO1tDIQRJflcVb1yZzVJ2ts4hjCkJK8CXg3MTfKLAy/NAWb105Uk7TkGwvD2pZt4NRt4wUD9q8BpvXQkSXuQu4xGlOTIqvpy331I0p5mIAwpye9X1TuS/ClTnOJ5nK4jLOnZyV1Gw7us3f/PXruQpGeIWwiSJMAthJElOQn4DeBIuu8vQFXVd/bZlyTtLrcQRpTkDrpTN6xn4BxGVfVQb01J0h7gFsLoHquqa/puQpL2NLcQRpTkXLqJaH8CPDlRr6rPbXclSdoLGAgjSvLJKcpVVSdPezOStAcZCJIkwDGEoU06fxF0k9P+me7ykXf30JIk7VFeMW14L5h0mwMsAq5JsqzPxiRpT3CX0W5KchDwV57+WtLezi2E3dSulJa++5Ck3WUg7KZ2GclH+u5DknaXg8pDSnILTz/L6UHA/cAZ09+RJO1ZjiEMKcmRk0oFPFRVX+ujH0na0wwESRLgGIIkqTEQJEmAgSBJagwESRJgIEiSmv8Pkvq8csYEl84AAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "df2['gdp_per_cap'].sort_values(ascending=False).plot.bar()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Saving and exporting\n",
    "\n",
    "Several formats are available, including Excel, Stata, and LaTeX. Python's native serialization format, pickle (`.pkl`), is very efficient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2.to_excel('countries.xlsx')"
   ]
  },
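  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the pickle format mentioned above, the cell below writes a small hypothetical DataFrame (`df_demo`) to a temporary file and reads it back. The file name and the temporary directory are assumptions made for illustration. Pickle files should only be loaded from sources you trust."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import tempfile\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical frame standing in for the notebook's data\n",
    "df_demo = pd.DataFrame({'population': [38.068, 83.9]},\n",
    "                       index=['Canada', 'Germany'])\n",
    "\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    path = os.path.join(tmp, 'countries.pkl')\n",
    "    df_demo.to_pickle(path)            # write the frame, dtypes and index included\n",
    "    restored = pd.read_pickle(path)    # read it back while the file still exists"
   ]
  },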
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading data\n",
    "\n",
    "We can load data of several types directly, even from the web. Consider this well-known example of a movie dataset (Kaggle IMDB Scores data):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "url = 'https://dq-blog-files.s3.amazonaws.com/movies.xls'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_excel(url)"
   ]
  },
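  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`read_excel` is one of several readers. `read_csv`, for instance, accepts a path, a URL, or any file-like object. The sketch below reads a small CSV held in memory, so it runs without a network connection. The contents of `csv_text` are invented for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical CSV content, invented for illustration\n",
    "csv_text = '''country,population\n",
    "Canada,38.068\n",
    "Germany,83.9\n",
    "'''\n",
    "\n",
    "# StringIO wraps the text as a file-like object; index_col sets the row labels\n",
    "df_csv = pd.read_csv(io.StringIO(csv_text), index_col='country')"
   ]
  },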
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are a lot of movies..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1338"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at the first few movies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>Year</th>\n",
       "      <th>Genres</th>\n",
       "      <th>Language</th>\n",
       "      <th>Country</th>\n",
       "      <th>Content Rating</th>\n",
       "      <th>Duration</th>\n",
       "      <th>Aspect Ratio</th>\n",
       "      <th>Budget</th>\n",
       "      <th>Gross Earnings</th>\n",
       "      <th>...</th>\n",
       "      <th>Facebook Likes - Actor 1</th>\n",
       "      <th>Facebook Likes - Actor 2</th>\n",
       "      <th>Facebook Likes - Actor 3</th>\n",
       "      <th>Facebook Likes - cast Total</th>\n",
       "      <th>Facebook likes - Movie</th>\n",
       "      <th>Facenumber in posters</th>\n",
       "      <th>User Votes</th>\n",
       "      <th>Reviews by Users</th>\n",
       "      <th>Reviews by Crtiics</th>\n",
       "      <th>IMDB Score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Intolerance: Love's Struggle Throughout the Ages</td>\n",
       "      <td>1916</td>\n",
       "      <td>Drama|History|War</td>\n",
       "      <td>NaN</td>\n",
       "      <td>USA</td>\n",
       "      <td>Not Rated</td>\n",
       "      <td>123</td>\n",
       "      <td>1.33</td>\n",
       "      <td>385907.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>436</td>\n",
       "      <td>22</td>\n",
       "      <td>9.0</td>\n",
       "      <td>481</td>\n",
       "      <td>691</td>\n",
       "      <td>1</td>\n",
       "      <td>10718</td>\n",
       "      <td>88</td>\n",
       "      <td>69.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Over the Hill to the Poorhouse</td>\n",
       "      <td>1920</td>\n",
       "      <td>Crime|Drama</td>\n",
       "      <td>NaN</td>\n",
       "      <td>USA</td>\n",
       "      <td>NaN</td>\n",
       "      <td>110</td>\n",
       "      <td>1.33</td>\n",
       "      <td>100000.0</td>\n",
       "      <td>3000000.0</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0.0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>4.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>The Big Parade</td>\n",
       "      <td>1925</td>\n",
       "      <td>Drama|Romance|War</td>\n",
       "      <td>NaN</td>\n",
       "      <td>USA</td>\n",
       "      <td>Not Rated</td>\n",
       "      <td>151</td>\n",
       "      <td>1.33</td>\n",
       "      <td>245000.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>81</td>\n",
       "      <td>12</td>\n",
       "      <td>6.0</td>\n",
       "      <td>108</td>\n",
       "      <td>226</td>\n",
       "      <td>0</td>\n",
       "      <td>4849</td>\n",
       "      <td>45</td>\n",
       "      <td>48.0</td>\n",
       "      <td>8.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Metropolis</td>\n",
       "      <td>1927</td>\n",
       "      <td>Drama|Sci-Fi</td>\n",
       "      <td>German</td>\n",
       "      <td>Germany</td>\n",
       "      <td>Not Rated</td>\n",
       "      <td>145</td>\n",
       "      <td>1.33</td>\n",
       "      <td>6000000.0</td>\n",
       "      <td>26435.0</td>\n",
       "      <td>...</td>\n",
       "      <td>136</td>\n",
       "      <td>23</td>\n",
       "      <td>18.0</td>\n",
       "      <td>203</td>\n",
       "      <td>12000</td>\n",
       "      <td>1</td>\n",
       "      <td>111841</td>\n",
       "      <td>413</td>\n",
       "      <td>260.0</td>\n",
       "      <td>8.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Pandora's Box</td>\n",
       "      <td>1929</td>\n",
       "      <td>Crime|Drama|Romance</td>\n",
       "      <td>German</td>\n",
       "      <td>Germany</td>\n",
       "      <td>Not Rated</td>\n",
       "      <td>110</td>\n",
       "      <td>1.33</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9950.0</td>\n",
       "      <td>...</td>\n",
       "      <td>426</td>\n",
       "      <td>20</td>\n",
       "      <td>3.0</td>\n",
       "      <td>455</td>\n",
       "      <td>926</td>\n",
       "      <td>1</td>\n",
       "      <td>7431</td>\n",
       "      <td>84</td>\n",
       "      <td>71.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 25 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               Title  Year  \\\n",
       "0  Intolerance: Love's Struggle Throughout the Ages   1916   \n",
       "1                    Over the Hill to the Poorhouse   1920   \n",
       "2                                    The Big Parade   1925   \n",
       "3                                        Metropolis   1927   \n",
       "4                                     Pandora's Box   1929   \n",
       "\n",
       "                Genres Language  Country Content Rating  Duration  \\\n",
       "0    Drama|History|War      NaN      USA      Not Rated       123   \n",
       "1          Crime|Drama      NaN      USA            NaN       110   \n",
       "2    Drama|Romance|War      NaN      USA      Not Rated       151   \n",
       "3         Drama|Sci-Fi   German  Germany      Not Rated       145   \n",
       "4  Crime|Drama|Romance   German  Germany      Not Rated       110   \n",
       "\n",
       "   Aspect Ratio     Budget  Gross Earnings  ... Facebook Likes - Actor 1  \\\n",
       "0          1.33   385907.0             NaN  ...                      436   \n",
       "1          1.33   100000.0       3000000.0  ...                        2   \n",
       "2          1.33   245000.0             NaN  ...                       81   \n",
       "3          1.33  6000000.0         26435.0  ...                      136   \n",
       "4          1.33        NaN          9950.0  ...                      426   \n",
       "\n",
       "  Facebook Likes - Actor 2 Facebook Likes - Actor 3  \\\n",
       "0                       22                      9.0   \n",
       "1                        2                      0.0   \n",
       "2                       12                      6.0   \n",
       "3                       23                     18.0   \n",
       "4                       20                      3.0   \n",
       "\n",
       "  Facebook Likes - cast Total  Facebook likes - Movie  Facenumber in posters  \\\n",
       "0                         481                     691                      1   \n",
       "1                           4                       0                      1   \n",
       "2                         108                     226                      0   \n",
       "3                         203                   12000                      1   \n",
       "4                         455                     926                      1   \n",
       "\n",
       "   User Votes  Reviews by Users  Reviews by Crtiics  IMDB Score  \n",
       "0       10718                88                69.0         8.0  \n",
       "1           5                 1                 1.0         4.8  \n",
       "2        4849                45                48.0         8.3  \n",
       "3      111841               413               260.0         8.3  \n",
       "4        7431                84                71.0         8.0  \n",
       "\n",
       "[5 rows x 25 columns]"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Grouping\n",
    "\n",
    "Consider a harder calculation, such as the number of titles per genre (each distinct value of the `Genres` column) for the 10 genres with the most titles. This is where pandas works its magic, using the `groupby` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Genres\n",
       "Drama                        62\n",
       "Comedy|Drama                 51\n",
       "Comedy                       48\n",
       "Comedy|Drama|Romance         46\n",
       "Drama|Romance                39\n",
       "Action|Adventure|Thriller    28\n",
       "Crime|Drama                  27\n",
       "Comedy|Romance               23\n",
       "Crime|Drama|Thriller         20\n",
       "Comedy|Crime                 17\n",
       "Name: Title, dtype: int64"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Genres').count()['Title'].sort_values(ascending=False).head(10)"
   ]
  },
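  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The count above treats a combined label such as `Comedy|Drama` as its own genre. The sketch below, on a small hypothetical table (`movies`), compares three ways of counting: `groupby` with `size`, the `value_counts` shortcut, and a count of individual genres after splitting the labels on `|`. Note that `size` counts rows, whereas `count()['Title']` counts non-missing titles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data mimicking the 'Genres' column\n",
    "movies = pd.DataFrame({\n",
    "    'Title': ['A', 'B', 'C', 'D'],\n",
    "    'Genres': ['Drama', 'Comedy|Drama', 'Drama', 'Comedy'],\n",
    "})\n",
    "\n",
    "# Rows per distinct value of 'Genres', largest first\n",
    "by_combo = movies.groupby('Genres').size().sort_values(ascending=False)\n",
    "\n",
    "# value_counts returns the same counts, already sorted in descending order\n",
    "by_combo_vc = movies['Genres'].value_counts()\n",
    "\n",
    "# Split each label on '|', give every genre its own row, then count\n",
    "by_genre = movies['Genres'].str.split('|').explode().value_counts()"
   ]
  },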
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looking by country, the overwhelming majority come from the United States."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country\n",
       "USA             1073\n",
       "UK               130\n",
       "France            26\n",
       "Canada            25\n",
       "Australia         18\n",
       "Germany           12\n",
       "Italy             11\n",
       "Japan              9\n",
       "Spain              4\n",
       "West Germany       3\n",
       "Name: Title, dtype: int64"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Country').count()['Title'].sort_values(ascending=False).head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How large is that share? About 82% of the titles from the ten leading countries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [],
   "source": [
    "co_lead = df.groupby('Country').count()['Title'].sort_values(ascending=False).head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country\n",
       "USA             0.818459\n",
       "UK              0.099161\n",
       "France          0.019832\n",
       "Canada          0.019069\n",
       "Australia       0.013730\n",
       "Germany         0.009153\n",
       "Italy           0.008391\n",
       "Japan           0.006865\n",
       "Spain           0.003051\n",
       "West Germany    0.002288\n",
       "Name: Title, dtype: float64"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "co_lead = co_lead/co_lead.sum()\n",
    "co_lead"
   ]
  },
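  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `co_lead` holds only the ten leading countries, dividing by `co_lead.sum()` gives shares within that group rather than shares of the whole dataset. The sketch below shows the difference on a small hypothetical series (`countries_demo`), using `value_counts(normalize=True)` for shares of all rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical data: 10 movies from 4 countries\n",
    "countries_demo = pd.Series(['USA'] * 6 + ['UK'] * 2 + ['France', 'Canada'],\n",
    "                           name='Country')\n",
    "\n",
    "# Shares within the two leading countries only\n",
    "top2 = countries_demo.value_counts().head(2)\n",
    "share_within_top = top2 / top2.sum()\n",
    "\n",
    "# Shares of all rows, then keep the two leading countries\n",
    "share_overall = countries_demo.value_counts(normalize=True).head(2)"
   ]
  },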
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "ba2340ab882356406e091df0706039b4b3cc5191eef6c073d3fb97005dbe0324"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}