FindingData
PMF, PDF & CDF
Nov 29, 2020

Before getting to the probability density function or the cumulative distribution function, we first need to understand random variables, because these functions apply to different types of random variables.
Random Variables
- A random variable is a variable whose value depends on the outcome of an experiment.
- For example, when rolling a die we don't know the outcome in advance (it may be any one of 1, 2, 3, 4, 5, 6), so the value of the variable depends on the outcome of the experiment.
Random variables are of two types:
- Continuous random variables
- Discrete random variables
Continuous random variables
- Variables that can take any value within a range or interval, so there are infinitely many possible values in that interval.
- For example: the weights of 100 randomly chosen people.
Discrete random variables
- Variables whose values are obtained by counting.
- For example: the number of students present in a class.
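To make the difference concrete, here is a small sketch that simulates both kinds of variable. The sample size, the seed, and the weight parameters (mean 70 kg, spread 10 kg) are made-up values for illustration only.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Discrete: each roll of a die is one of six countable values.
rolls = [random.randint(1, 6) for _ in range(100)]

# Continuous: simulated weights in kg (mean 70 and spread 10 are made-up values).
weights = [random.gauss(70, 10) for _ in range(100)]

# Die values repeat a lot; the simulated weights practically never repeat exactly.
print("distinct die values:   ", sorted(set(rolls)))
print("distinct weight values:", len(set(weights)), "out of", len(weights))
```

The die can only land on a handful of values, while the weights can fall anywhere in an interval.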
Probability Mass Function (PMF)
The probability mass function is a statistical term that describes the probability distribution of discrete random variables. It gives the probability that the variable takes each specific value.
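As a minimal sketch of a PMF, the snippet below works out the probability of each possible sum when two fair dice are rolled. The two-dice setup is only an illustrative choice, and exact fractions are used so that the probabilities add up to exactly 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two fair six-sided dice.
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]

# PMF: probability of each sum = (number of ways to get it) / (total outcomes).
counts = Counter(outcomes)
pmf = {total: Fraction(n, len(outcomes)) for total, n in sorted(counts.items())}

for total, p in pmf.items():
    print(total, p)

# The probabilities of all possible values add up to 1.
print("total probability:", sum(pmf.values()))
```

Each value of the variable gets its own probability, which is what distinguishes a PMF from a PDF.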
Probability Density Function (PDF)
The probability density function is a statistical term that describes the probability distribution of continuous random variables.
- If the data points of a given dataset follow a Gaussian distribution, then the PDF will also have the Gaussian (bell) shape.
- On a PDF, the probability of any single outcome is always zero, because a single point corresponds to a line that covers no area under the curve. Probabilities come from the area under the curve over an interval.
A PDF does not have to be normal, but many datasets are approximately normally distributed, so their PDF looks like a bell-shaped curve.
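The "area under the curve" idea can be checked numerically. The sketch below assumes a standard normal distribution (mean 0, standard deviation 1) and uses a simple trapezoidal rule. The helper names `normal_pdf` and `area_under_pdf` are made up for this example.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def area_under_pdf(a, b, n=10_001):
    """Approximate P(a <= X <= b) with the trapezoidal rule on n grid points."""
    x = np.linspace(a, b, n)
    y = normal_pdf(x)
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

print(area_under_pdf(-1, 1))     # roughly 0.68: within one standard deviation
print(area_under_pdf(-10, 10))   # very close to 1: almost the whole curve
print(area_under_pdf(0.5, 0.5))  # 0.0: a single point has no width, so no area
```

The last line shows why the probability of one exact value is zero: the interval has no width, so there is no area to measure.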

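# iris_data is assumed to be a pandas DataFrame holding the Iris dataset; np is numpy
counts, bin_edges = np.histogram(iris_data['PetalLengthCm'], bins=10, density=True)
# normalize so the bin values sum to 1 (fraction of data in each bin)
pdf = counts / sum(counts)

Cumulative distribution function (CDF)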
As we have seen, the PDF describes the probability distribution of continuous random variables and the PMF describes the probability distribution of discrete random variables.
The cumulative distribution function (often loosely called the cumulative density function) describes the probability distribution of continuous as well as discrete random variables: F(x) is the probability that the variable takes a value less than or equal to x.
- The CDF always lies between 0 and 1 and never decreases as x increases.
- The fraction of data in any range can be read from the CDF: the share of values between a and b is F(b) - F(a).
For example, if X is the height of a person selected at random, then F(x) is the chance that the person is no taller than x. If F(180 cm) = 0.8, then there is an 80% chance that a person selected at random will be 180 cm or shorter (equivalently, a 20% chance that they will be taller than 180 cm).
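The height example can be sketched with an empirical CDF. The heights here are simulated, with an assumed mean of 170 cm and standard deviation of 10 cm, so the numbers will not match the 0.8 used in the text. The helper `ecdf` is a made-up name for this illustration.

```python
import numpy as np

# Hypothetical sample: 10,000 simulated heights in cm (mean 170, std 10 are assumed values).
rng = np.random.default_rng(seed=0)
heights = rng.normal(loc=170, scale=10, size=10_000)

def ecdf(data, x):
    """Empirical CDF: fraction of observations less than or equal to x."""
    return float(np.mean(np.asarray(data) <= x))

f_160 = ecdf(heights, 160)
f_180 = ecdf(heights, 180)

print("P(X <= 180)       =", f_180)
print("P(X > 180)        =", 1 - f_180)
print("P(160 < X <= 180) =", f_180 - f_160)
```

Subtracting two CDF values gives the share of people whose height falls between the two points, which is how the fraction of data in a range is read off the curve.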
cdf = np.cumsum(pdf)

Visuals
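import numpy as np
import matplotlib.pyplot as plt

# iris_data is assumed to be a pandas DataFrame holding the Iris dataset
counts, bin_edges = np.histogram(iris_data['PetalLengthCm'], bins=10, density=True)
pdf = counts / sum(counts)  # fraction of data in each bin
cdf = np.cumsum(pdf)        # running total of those fractions
# plot both curves against the right edge of each bin
plt.plot(bin_edges[1:], pdf)
plt.plot(bin_edges[1:], cdf)
plt.gca().legend(('Pdf', 'Cdf'))
plt.title('PDF and CDF For iris_setosa')
plt.xlabel("Petal length")
plt.ylabel("Probability")
plt.show()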