We will use pmf (probability mass function) and rvs (random variates) of scipy.stats.
#Import numerical libraries
import numpy as np
import scipy as sp
import scipy.stats #Make sure sp.stats is loaded on older SciPy versions
import pandas as pd
from pandas import Series, DataFrame
#Import visualization library
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
%matplotlib inline
#Japanese display module of matplotlib
!pip install japanize-matplotlib
import japanize_matplotlib
x = np.array([0,0,1,1,0,1,0,0])
#Calculate the probability distribution
p = len(x[x==1]) / len(x)
pmf_bernoulli = sp.stats.bernoulli.pmf(x, p)
#Visualization
plt.vlines(x, 0, pmf_bernoulli,
colors='blue', lw=50)
plt.xticks([0,1])
plt.xlim([0 - 0.5, 1 + 0.5])
plt.grid(True)

| Event (0 or 1) | Probability |
|---|---|
| 0 | 0.625 |
| 1 | 0.375 |
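The table values follow from the Bernoulli formula P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}. As a rough cross-check that does not need SciPy, the sketch below recomputes them from the same sample; `bernoulli_pmf` is a hypothetical helper name introduced only for this illustration.

```python
# Sample used above: 1 marks the event occurring, 0 marks it not occurring
x = [0, 0, 1, 1, 0, 1, 0, 0]

# Estimate p as the fraction of ones in the sample
p = sum(x) / len(x)


def bernoulli_pmf(value, p):
    """P(X = value) = p**value * (1 - p)**(1 - value) for value in {0, 1}."""
    return p**value * (1 - p) ** (1 - value)


# Probability of each of the two possible events
table = {value: bernoulli_pmf(value, p) for value in (0, 1)}
print(table)
```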
Use binom.pmf to find the probability that a coin that lands heads with probability p = 50% shows heads exactly 2 times in 5 tosses.
sp.stats.binom.pmf(n=5, p=0.5, k=2)
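The result of binom.pmf(n=5, p=0.5, k=2) can be cross-checked against the closed-form binomial formula C(n, k) p^k (1 - p)^(n - k). The following is a minimal sketch using only the standard library; `binom_pmf` is a hypothetical helper, not part of SciPy.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# 2 heads in 5 tosses of a fair coin: C(5, 2) / 2**5 = 10 / 32
prob = binom_pmf(k=2, n=5, p=0.5)
print(prob)  # 0.3125
```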

Use binom.rvs to generate pseudo-random numbers that follow a binomial distribution. Then use binom.pmf to calculate the probability distribution of the number of heads when a coin that lands heads with probability p = 20% is tossed 10 times, and compare it with the histogram of the pseudo-random numbers.
#Generate pseudo-random numbers
np.random.seed(1)
rvs_binom = sp.stats.binom.rvs(n=10, p=0.2, size=10000)
#Get the probability distribution
m = np.arange(0, 10+1, 1)
pmf_binom = sp.stats.binom.pmf(n=10, p=0.2, k=m)
#Visualization
#histplot replaces the deprecated distplot; discrete=True centers each bar on its integer
sns.histplot(rvs_binom, discrete=True,
stat='probability', label='rvs')
plt.plot(m, pmf_binom, label='pmf')
plt.xticks(m)
plt.legend()
plt.grid()

| Number of heads | Probability |
|---|---|
| 0 | 0.107374182 |
| 1 | 0.268435456 |
| 2 | 0.301989888 |
| 3 | 0.201326592 |
| 4 | 0.088080384 |
| 5 | 0.026424115 |
| 6 | 0.005505024 |
| 7 | 0.000786432 |
| 8 | 0.000073728 |
| 9 | 0.000004096 |
| 10 | 0.000000102 |
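The agreement between the histogram and the pmf can also be checked numerically instead of by eye. The sketch below is one possible way to do it, under the assumption that NumPy's `default_rng` generator is an acceptable stand-in for `np.random.seed` plus `binom.rvs`; the sample size and variable names are illustrative choices.

```python
import numpy as np
from scipy import stats

# Assumption: NumPy's Generator API is used here in place of binom.rvs
rng = np.random.default_rng(1)
n, p, size = 10, 0.2, 100_000
samples = rng.binomial(n=n, p=p, size=size)

# Relative frequency of each outcome 0..n in the simulated data
k = np.arange(n + 1)
empirical = np.bincount(samples, minlength=n + 1) / size

# Theoretical probabilities from the binomial pmf
theoretical = stats.binom.pmf(k, n, p)

# Largest absolute gap between simulation and theory
max_gap = np.abs(empirical - theoretical).max()
print(max_gap)
```

The gap shrinks as `size` grows, which is the numerical counterpart of the histogram hugging the pmf curve.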
Use poisson.pmf to find the probability that an event that occurs 5 times on average in a given period occurs exactly 2 times in that period.
sp.stats.poisson.pmf(k=2, mu=5)
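The Poisson pmf has the closed form P(X = k) = e^(-mu) mu^k / k!, so the value of poisson.pmf(k=2, mu=5) can be reproduced by hand. A minimal standard-library sketch follows; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """Probability of exactly k occurrences when the mean count is mu."""
    return exp(-mu) * mu**k / factorial(k)


# Mean of 5 occurrences, exactly 2 observed: e**-5 * 25 / 2
prob = poisson_pmf(k=2, mu=5)
print(round(prob, 4))  # 0.0842
```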

Use poisson.rvs to generate pseudo-random numbers that follow a Poisson distribution. Then use poisson.pmf to calculate the probability distribution for a mean number of occurrences mu = 2 (the mean n * p of the binomial example above, with n = 10 and p = 20%), and compare it with the histogram of the pseudo-random numbers.
#Generate pseudo-random numbers
np.random.seed(1)
rvs_poisson = sp.stats.poisson.rvs(mu=2, size=10000)
#Get the probability distribution
m = np.arange(0, 10+1, 1)
pmf_poisson = sp.stats.poisson.pmf(mu=2, k=m)
#Visualization
#histplot replaces the deprecated distplot; discrete=True centers each bar on its integer
sns.histplot(rvs_poisson, discrete=True,
stat='probability', label='rvs')
plt.plot(m, pmf_poisson, label='pmf')
plt.xticks(m)
plt.legend()
plt.grid()

| Number of occurrences | probability |
|---|---|
| 0 | 0.135335283 |
| 1 | 0.270670566 |
| 2 | 0.270670566 |
| 3 | 0.180447044 |
| 4 | 0.090223522 |
| 5 | 0.036089409 |
| 6 | 0.012029803 |
| 7 | 0.003437087 |
| 8 | 0.000859272 |
| 9 | 0.000190949 |
| 10 | 0.000038190 |
#Specify parameters (n * p = 2, the same mean as the Poisson distribution above)
n = 100000000
p = 0.00000002
#Calculate the probability distribution of the binomial distribution
num = np.arange(0, 10+1, 1)
pmf_binom_2 = sp.stats.binom.pmf(n=n, p=p, k=num)
#Visualization
plt.plot(num, pmf_poisson,
color='lightgray', lw=10, label='poisson')
plt.plot(num, pmf_binom_2,
color='black', linestyle='dotted', label='binomial')
plt.xticks(num)
plt.legend()
plt.grid()
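The plot suggests that a binomial distribution with a very large n and a very small p, with n * p held at 2, is practically indistinguishable from a Poisson distribution with mu = 2. As a hedged numerical illustration, the sketch below measures the largest gap between the two pmfs for a few values of n; the intermediate values of n and the `gaps` dictionary are illustrative choices, not part of the original code.

```python
import numpy as np
from scipy import stats

k = np.arange(0, 11)
mu = 2.0

# Hold the mean n * p fixed at mu while n grows
gaps = {}
for n in (10, 1_000, 100_000_000):
    p = mu / n
    binom_probs = stats.binom.pmf(k, n, p)
    poisson_probs = stats.poisson.pmf(k, mu)
    gaps[n] = np.abs(binom_probs - poisson_probs).max()
    print(n, gaps[n])
```

With n = 10 the two distributions still differ visibly, while with n = 100000000 the difference is far too small to show up in the plot.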

Use geom.pmf in scipy.stats to get the probability of rolling a die once and getting a 1 on that first roll.
%precision 3
sp.stats.geom.pmf(k=1, p=1/6)

#Specify the number of trials
num = np.arange(1, 11, 1)
#Calculate the probability distribution
prob = []
for i in num:
    #Probability that the first 1 appears on roll i
    value = sp.stats.geom.pmf(k=i, p=1/6)
    prob.append(value)
#Visualization
plt.bar(num, prob)
plt.xticks(num)
plt.xlabel('Number of times until 1 appears for the first time')
plt.ylabel('probability')
plt.show()

| Number of trials | Probability | Formula |
|---|---|---|
| 1 | 0.167 | ⅙ |
| 2 | 0.139 | ⅚ ・ ⅙ |
| 3 | 0.116 | ⅚ ・ ⅚ ・ ⅙ |
| 4 | 0.096 | ⅚ ・ ⅚ ・ ⅚ ・ ⅙ |
| 5 | 0.080 | ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅙ |
| 6 | 0.067 | ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅙ |
| 7 | 0.056 | ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅙ |
| 8 | 0.047 | ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅙ |
| 9 | 0.039 | ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅙ |
| 10 | 0.032 | ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅚ ・ ⅙ |
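The formula column is the geometric pmf P(X = k) = (1 - p)^(k - 1) p written out term by term: k - 1 failures followed by one success. The sketch below, which uses plain Python and assumes nothing beyond that formula, reproduces the probability column of the table to three decimals.

```python
p = 1 / 6  # probability of rolling a 1 on a single roll

# First 1 appears on roll k: (k - 1) misses, then one hit
probs = [(1 - p) ** (k - 1) * p for k in range(1, 11)]

rounded = [round(value, 3) for value in probs]
print(rounded)
```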
#Specify all events
num = np.arange(1, 7, 1)
#Calculate the probability distribution
prob = []
for i in num:
    #Every face is equally likely
    value = 1 / len(num)
    prob.append(value)
#Visualization
plt.bar(num, prob)
plt.xticks(num)
plt.xlabel('Dice roll')
plt.ylabel('probability')
plt.show()

| Dice roll | probability |
|---|---|
| 1 | 0.167 |
| 2 | 0.167 |
| 3 | 0.167 |
| 4 | 0.167 |
| 5 | 0.167 |
| 6 | 0.167 |
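The code above builds the probabilities by hand. SciPy also ships a discrete uniform distribution under the name `scipy.stats.randint`, whose upper bound is exclusive. The following is a small sketch of how the same table could be obtained that way; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Faces of a six-sided die
faces = np.arange(1, 7)

# randint(low, high) is uniform on the integers low, ..., high - 1
die = stats.randint(1, 7)
pmf_die = die.pmf(faces)
print(pmf_die)
```

Values outside the range, such as `die.pmf(7)`, have probability 0.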
#Specify parameters
M = 20 #Total number of items
n = 7 #Number of hits among all items
N = 12 #Number of items drawn
#Create a random variable
k = np.arange(0, n+1)
#Create a model
hgeom = sp.stats.hypergeom(M, n, N)
#Calculate the probability distribution
pmf_hgeom = hgeom.pmf(k)
#Visualization
plt.bar(k, pmf_hgeom)
plt.xticks(k)
plt.xlabel('Number of hits')
plt.ylabel('probability')
plt.show()

| Number of hits | probability |
|---|---|
| 0 | 0.00010 |
| 1 | 0.00433 |
| 2 | 0.04768 |
| 3 | 0.19866 |
| 4 | 0.35759 |
| 5 | 0.28607 |
| 6 | 0.09536 |
| 7 | 0.01022 |
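These values follow from the hypergeometric formula P(X = k) = C(n, k) C(M - n, N - k) / C(M, N), using the same M, n, N convention as scipy.stats.hypergeom. A minimal standard-library cross-check is sketched below; `hypergeom_pmf` is a hypothetical helper name.

```python
from math import comb

M = 20  # total number of items
n = 7   # number of hits among all items
N = 12  # number of items drawn


def hypergeom_pmf(k, M, n, N):
    """Probability of drawing exactly k hits in N draws without replacement."""
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)


pmf = [hypergeom_pmf(k, M, n, N) for k in range(n + 1)]
print([round(value, 5) for value in pmf])
```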
#Specify parameters
N = 12 #Number of failure counts to plot (0 to N-1)
p = 0.5 #Probability of success
k = 3 #Number of successes
#Calculate the probability distribution
pmf_nbinom = sp.stats.nbinom.pmf(range(N), k, p)
#Visualization
plt.bar(range(N), pmf_nbinom)
plt.xlabel('Number of failures')
plt.ylabel('probability')
plt.xticks(range(N))
plt.show()

| Number of failures | probability |
|---|---|
| 0 | 0.125 |
| 1 | 0.188 |
| 2 | 0.188 |
| 3 | 0.156 |
| 4 | 0.117 |
| 5 | 0.082 |
| 6 | 0.055 |
| 7 | 0.035 |
| 8 | 0.022 |
| 9 | 0.013 |
| 10 | 0.008 |
| 11 | 0.005 |
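In scipy.stats.nbinom the first argument of pmf is the number of failures observed before the required number of successes is reached. The closed form is P(X = f) = C(f + r - 1, f) p^r (1 - p)^f for r successes. The sketch below recomputes the table under that formula; `nbinom_pmf`, `failures`, and `successes` are names chosen for this illustration.

```python
from math import comb


def nbinom_pmf(failures, successes, p):
    """Probability of seeing `failures` failures before the `successes`-th success."""
    return comb(failures + successes - 1, failures) * p**successes * (1 - p) ** failures


# Same setting as above: 3 successes, success probability 0.5, 0 to 11 failures
pmf = [nbinom_pmf(f, successes=3, p=0.5) for f in range(12)]
print([round(value, 3) for value in pmf])
```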
We have now looked at the discrete probability distributions. The list below summarizes them with a focus on what the random variable is, in other words, what goes on the x-axis.
| | Type of probability distribution | Random variable | Parameters |
|---|---|---|---|
| ⑴ | Bernoulli distribution | Event 0, 1 | Probability of occurrence p |
| ⑵ | Binomial distribution | Number of occurrences k | Probability of occurrence p, Number of trials n |
| ⑶ | Poisson distribution | Number of occurrences k | Average number of occurrences mu |
| ⑷ | Geometric distribution | Number of trials k | Probability of success p |
| ⑸ | Discrete uniform distribution | Event type | Note: scipy.stats.uniform is continuous only; scipy.stats.randint covers the discrete case |
| ⑹ | Hypergeometric distribution | Number of successes k | Total number M, Number of successes in the whole n, Number of selections N |
| ⑺ | Negative binomial distribution | Number of failures | Probability of success p, Number of successes k |