Given the probability distribution of the population, calculate the sampling distribution. This time we will deal with discrete distributions.
--Use the statistical package scipy.stats of scipy, which is a collection of scientific and technological calculation packages of python, and numpy of the numerical calculation library. --Calculate in the following 3 steps.
scipy.stats.rv_discrete The following is an excerpt from the scipy documentation. You can find it in the Help Example section, which you can read as follows:
from scipy import stats
help(stats.rv_discrete)
Just pass the discrete random variable in $ xk $, the probability corresponding to $ pk $, and pass it to stats.rv_discrete
.
This is an example of a random variable that takes 7 discrete values.
from scipy import stats
xk = np.arange(7)
pk = (0.1, 0.2, 0.3, 0.1, 0.1, 0.0, 0.2)
custm = stats.rv_discrete(name='custm', values=(xk, pk))
To visualize the probability mass function (pmf) of a discrete distribution, do the following:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
ax.plot(xk, custm.pmf(xk), 'ro', ms=12, mec='r')
ax.vlines(xk, 0, custm.pmf(xk), colors='r', lw=4)
plt.show()
Sampling is performed from the created discrete distribution. For example, if you want to randomly sample 1000 times, do as follows. (rvs: random variables) If you want to fix the seed of the random number to ensure reproducibility, uncomment the first line.
import numpy as np
# np.random.seed(0)
sampled_rvs = custm.rvs(size=1000)
Calculate using the histogram function of numpy. Be careful not to forget to add +1 to bin.
f = np.histogram(sampled_rvs, bins = np.arange(7 + 1), density=True)[0]
The above contents are summarized.
Up until now, we have dealt with the case of one-dimensional probability distributions, but so that we can handle multidimensional joint probability distributions as well.
First, crush it to one dimension with p.ravel ()
, then calculate the sample distribution, reshape it, and then return it.
Pass the argument p as an array of nupmy.
import numpy as np
from scipy import stats
def sampling_dist(p, size, seed = 0):
xk = np.arange(p.size)
pk = p.ravel()
true_dist = stats.rv_discrete(name='true_dist', values=(xk, pk))
np.random.seed(seed)
sampled_rvs = true_dist.rvs(size=size)
f = np.histogram(sampled_rvs, bins = np.arange(p.size + 1), density=True)[0]
f.reshape(p.shape)
return(f.reshape(p.shape))
p = np.array([[0.10,0.10],[0.10,0.15],[0.15,0.10],[0.10,0.20]])
p_est = sampling_dist(p, 1000)
print(p)
print(p_est)
Recommended Posts