Probability Densities,
A Measure of Existence.

After seeing that many of the disciplines of logic have a common graphical representation, as seen here in the graphical representation of logic, we now turn to probability density functions. This article is a brief introduction to these functions, which give information about the existence of samples at various locations in sample space.

You may link to this page and print this page and use it as you wish as long as the URL, Webpage address, is also included.

If you find errors or if you know of any brief way of making this information more understandable, please don't hesitate to email me with your suggestions at email.

If you wish a more mathematically oriented in depth introduction to this subject, go here.

Now, a mathematical description of samples in regions would require that a grid of a coordinate system be imposed on the entire sample space. The regions would need to be described in this coordinate system by mathematical functions. And the existence of samples at various locations must also be expressed by mathematical functions. For example, Fig_39 below shows a sample space on which an x-y coordinate system is imposed. From here on, the discussion becomes increasingly mathematical.

The sample space itself is labeled "S" on the left. The coordinate system that is imposed over it is labeled "T" on the right. A property or event is described in sample space on the left by a circular region which encloses the samples. When described in a coordinate system on the right, the region is a circle specified with the equation $x^2 + y^2 = r^2$, where r is about 1.9. In the sample space on the left the individual samples are labeled and the distinction between size and color is obvious. However, in the coordinate system on the right, the location of each sample is specified by its (x,y) coordinates. It is not necessary to show the different sizes and colors of the samples in the coordinate system; a simple dot for each sample would have served just as well.

What is important is that there be a mathematical function that describes a location for each sample. The function giving the location of each sample is called a density function, for by specifying the location of each sample, it shows how densely packed the samples are. When the density function gives the probability of a sample at a given location, then it is called a probability density function. In this example, if it is assumed that the odds of selecting any one sample are the same as for any other, then the probability density function, p(x,y), is:

p(x,y)=0.02857 for,  x = integer from -3 to 3, and y = integer from -2 to 2.
         =0.0        otherwise.

which means that as the values of x and y vary continuously within the grid, the value of p(x,y) is zero except where x and y are whole numbers such that $-3 \le x \le 3$ and $-2 \le y \le 2$, in which case p(x,y)=1/35. The samples are said to be a discrete distribution since only at certain discrete values of x and y does p(x,y) have a nonzero value which indicates the existence of samples.

The probability of the event A would then be the sum of the probabilities of all the samples within the circular region, or

$$P(A) = p(-1,-1) + p(-1,0) + p(-1,1) + p(0,-1) + p(0,0) + p(0,1) + p(1,-1) + p(1,0) + p(1,1).$$

Notation for this sum can be simplified using the sigma symbol, $\Sigma$, as follows:

$$P(A) = \sum_{\mathbf{x} \in A} p(\mathbf{x}),$$

where the bold face $\mathbf{x}$ is vector notation representing the point (x,y),
and where the $\mathbf{x} \in A$ underneath the sigma means that p(x,y) is added once for each location of (x,y) that is an element of the region A for which p(x,y) has a value.

Or, upon substituting values in this example, we get

$$P(A) = 9 \times \frac{1}{35} = 0.2571 = 25.71\%.$$

Notice in this example that $0 \le p(x,y) \le 1$ for every point in the sample space and that the summation of all the probabilities for all samples in the sample space adds up to 1. For there must be a 100% chance of choosing some sample from among all the possible samples.
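The arithmetic for the uniform case can be checked with a few lines of Python. This is only a sketch: the names `grid`, `p`, and `in_event_a` are illustrative, and the circle is assumed to be centered at the origin with r = 1.9, as the equation in the text suggests.

```python
from fractions import Fraction

# Sample locations: x = -3..3 and y = -2..2, which gives 35 grid points.
grid = [(x, y) for x in range(-3, 4) for y in range(-2, 3)]

def p(x, y):
    """Uniform discrete density: every sample is equally likely."""
    return Fraction(1, len(grid))

def in_event_a(x, y, r=1.9):
    """True when (x, y) lies inside the circle of radius r (assumed centered at the origin)."""
    return x * x + y * y <= r * r

# P(A): add p(x, y) once for each sample location inside the region A.
prob_a = sum(p(x, y) for (x, y) in grid if in_event_a(x, y))

# Probability of the whole sample space.
total = sum(p(x, y) for (x, y) in grid)

print(prob_a, float(prob_a))
print(total)
```

Exact fractions are used so that the result comes out as 9/35 (about 25.71%) and the total as exactly 1, with no rounding.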

The probability density function, p(x,y), does not need to be a constant for all values of x and y as used above. Referring to Fig_39 again, if the probability increases downward with size, and if the probability also increases to the right with color, then a probability density function would be:

p(x,y) = (x + 4)(3 - y)/420,  where x and y are integers in the intervals $-3 \le x \le 3$ and $-2 \le y \le 2$.

The probability of the sample at (-3, 2) is p(-3, 2) = 0.238%, the sample at (-3, -2) is p(-3, -2) = 1.19%, and p(3,-2) = 8.33%. The probability of the sample at (0,0) is p(0,0) = 2.857%, same as before. And the probability of the event A is 

P(A) = 25.71%.
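The values quoted above can be reproduced with a short script. It is a sketch under the same assumptions as the text (integer grid points, circle of radius 1.9 assumed centered at the origin); the variable names are illustrative.

```python
from fractions import Fraction

def p(x, y):
    """Non-uniform discrete density from the text; zero off the integer grid."""
    if x in range(-3, 4) and y in range(-2, 3):
        return Fraction((x + 4) * (3 - y), 420)
    return Fraction(0)

grid = [(x, y) for x in range(-3, 4) for y in range(-2, 3)]

# The probabilities of all samples must add up to 1.
total = sum(p(x, y) for (x, y) in grid)

# P(A): sum over the grid points inside the circle of radius 1.9.
prob_a = sum(p(x, y) for (x, y) in grid if x * x + y * y <= 1.9 ** 2)

for point in [(-3, 2), (-3, -2), (3, -2), (0, 0)]:
    print(point, f"{float(p(*point)):.3%}")
print("P(A) =", f"{float(prob_a):.2%}")
print("total =", total)
```

The divisor 420 is what makes the total come out to 1: the factors (x + 4) sum to 28 over the seven columns and the factors (3 - y) sum to 15 over the five rows, and 28 times 15 is 420.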

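This is the same value as before only because the region A is symmetric about the origin, where this density happens to equal 1/35; a different region would generally give a different value. And there can be probability density functions described with other mathematical functions. But the properties that a probability density function must have are: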

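a) $0 \le p(x,y) \le 1$ for every point (x,y),

b) $\sum_{\mathbf{x} \in T} p(\mathbf{x}) = 1$,

c) $P(A) = \sum_{\mathbf{x} \in A} p(\mathbf{x})$,

where $\sum_{\mathbf{x} \in T}$ means to include in the sum the probability calculated at each (x,y) in the coordinate system that covers the entire sample space.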

The properties a) and b) listed above are chosen for convenience and are commonly used in the study of probabilities. However, I suppose no loss of information would occur if a different origin and scale were used so that properties a) and b) did not hold.

The properties above apply to discrete distributions of samples, but there are also continuous probability distributions. The density function, p(x,y), for a continuous distribution would not have non-zero values only at specific coordinates; it would have gradual changes in value as x and y change.

A continuous distribution can be understood by imagining a sample space with millions of billions of trillions of quadrillions of samples scattered throughout. Some places may have more densely packed samples than others. But there are no abrupt, instantaneous changes in the number of samples per region from one point to the next. There are always intermediate values at some level between the two. And since the probability of the entire sample space that equals 1 must be divided among such a great number of samples, the probability for any one sample is infinitesimally small. No point has a probability in a continuous distribution, only regions have a probability. There is no probability per sample, only a probability per region. For regions do contain a proportion of the entire sample space, but points cannot. So no equation can be given to specify the probability at a given point. Instead, the continuous probability density function specifies the probability per unit region. And the only way to calculate the probability for the event of a large region is to add up the probabilities of many smaller regions within it.

For example, Fig_40 below is an attempt to show what Fig_39 would look like if the samples were distributed continuously. This is only an approximation. The abrupt changes in color and shading are obvious with close inspection. Try to imagine that there are intermediate levels between these changes so that colors and shading change in a smooth, continuous fashion. It helps to look at it from a distance. The picture to the right is a close up view of the small portion of A to the left.

 

Suppose that the probability per unit area increases downward with darker shades and increases to the right with less red and more violet. Then a continuous probability density function can be written for this sample space as:

  

where T is the coordinate system defined for and

To calculate the probability of the event A, the region is divided up into a great number of little square sections like the one shown inside the circle. A magnified view of this is shown to the right. The square has a width measuring dx and a height measuring dy. The lower case letter "d" in dx and dy means a "small difference" in x and y. The probability density may vary somewhat from one side of the small square to the other. But as the square gets smaller and smaller, the probability density anywhere within the square approaches the value calculated for (x,y) inside it. The probability of the small square is the probability per unit area calculated at (x,y) multiplied by the area of the square, dx times dy. This, then, is equal to a small portion of probability called dP, or:

dP = p(x,y)dxdy .

In order to then get the total probability for an event, we have to sum up all these small contributions. The sigma symbol, $\Sigma$, is used for summing up discrete functions. But for continuous functions, the integral sign, $\int$, is used when summing up an ever increasing number of ever smaller portions. Thus,

$$P(A) = \int_A dP = \int_A p(x,y)\,dx\,dy.$$

The A under the integral sign indicates that we are summing up only those portions that are within the region of the event A. Altogether, this represents the process of integration studied in the mathematical subject of calculus. 
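The summing of small contributions dP = p(x,y)dxdy can be imitated numerically. The sketch below uses an assumed stand-in density, (x + 4)(3 - y)/576 on the rectangle from -4 to 4 in x and from -3 to 3 in y, which is not necessarily the density used in this article, so the number it prints for the circle need not match the value quoted in the text. The function name `riemann_sum` and the cell count `n` are illustrative.

```python
import math

# Assumed stand-in density (not taken from the article): bilinear on the
# rectangle T = [-4, 4] x [-3, 3], divided by 576 so it integrates to 1 over T.
X_MIN, X_MAX, Y_MIN, Y_MAX = -4.0, 4.0, -3.0, 3.0

def p(x, y):
    return (x + 4.0) * (3.0 - y) / 576.0

def riemann_sum(inside, n=400):
    """Add up dP = p(x, y) dx dy over the n*n small cells whose centers satisfy inside(x, y)."""
    dx = (X_MAX - X_MIN) / n
    dy = (Y_MAX - Y_MIN) / n
    total = 0.0
    for i in range(n):
        x = X_MIN + (i + 0.5) * dx      # center of the cell in x
        for j in range(n):
            y = Y_MIN + (j + 0.5) * dy  # center of the cell in y
            if inside(x, y):
                total += p(x, y) * dx * dy
    return total

R = 1.919  # radius of the circular region A, assumed centered at the origin
prob_a = riemann_sum(lambda x, y: x * x + y * y <= R * R)
prob_t = riemann_sum(lambda x, y: True)

print("P(A) is approximately", prob_a)
print("P(T) is approximately", prob_t)
```

Making `n` larger corresponds to letting dx and dy shrink, which is the limiting process that the integral sign stands for. For this particular stand-in density the exact value over the circle works out to pi R^2 / 48, which the sum approaches as the cells get smaller.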

When applied to our example above:

When the mathematical procedure for calculating this integral is done, assuming the region A is a circle of radius 1.919, the probability of the event A is seen to be 25.71%. When the function, p(x,y), is integrated throughout the entire region of the sample space, T, which is a rectangle for      

Thus, the probability of the total sample space is:

$$P(T) = \int_T p(x,y)\,dx\,dy = 1.$$

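And there can be probability density functions described with other mathematical functions as well. But the properties that a continuous probability density function must have are:

d) $p(x,y) \ge 0$ for every point (x,y),

e) $\int_T p(x,y)\,dx\,dy = 1$,

f) $P(A) = \int_A p(x,y)\,dx\,dy$.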

It is possible, however, to write a discrete probability function in terms of a continuous probability function. This is done with the use of a special function called the Dirac delta function, symbolized as $\delta(x)$. This is also called the impulse function. The important properties of the impulse function in one dimension are:

$$\delta(x - a) = 0 \quad \text{for } x \ne a,$$

and

$$\int_A \delta(x - a)\,dx = 1,$$

where A is any region that contains the point a, no matter how small the region A is as long as it contains the point a.

With the above properties we can prove a very useful feature of the impulse function. Since $\delta(x - a) = 0$ for all values of x that are not equal to a,

$$\int_A f(x)\,\delta(x - a)\,dx = \int_{a-r}^{a+r} f(x)\,\delta(x - a)\,dx,$$

where the region specified by $a \pm r$ under the right integral is a small line segment of length 2r centered at a. The above equation is true even when the value of r is allowed to get smaller and smaller, as long as $r \ne 0$. As r approaches zero, the value of f(x) will approach the value of f(a), provided f is continuous at a, so that it can be considered a constant and taken out of the integral. The remaining integral is simply the integral of the impulse function which equals 1, or,

$$\int_A f(x)\,\delta(x - a)\,dx = f(a)\int_{a-r}^{a+r} \delta(x - a)\,dx = f(a).$$

In other words, the integral of a function multiplied by an impulse function is just the value of the function at the location of the impulse. The impulse function is defined similarly for a function of many variables. For example, in a 2 dimensional x-y coordinate system,

$$\int_A f(\mathbf{x})\,\delta(\mathbf{x} - \mathbf{a})\,dx\,dy = f(\mathbf{a}),$$

and

$$\delta(\mathbf{x}) = \delta(x,y) = 0 \quad \text{for } (x,y) \ne (0,0),$$

where $\mathbf{x}$ represents any arbitrary point (x,y), where $\mathbf{a}$ represents the specific location where the impulse occurs, and where A is any region that contains $\mathbf{a}$.
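The statement that integrating a function against an impulse returns the function's value at the impulse can be illustrated numerically in one dimension. A true impulse is not an ordinary function, so the sketch below assumes a stand-in: a narrow box of width 2r and height 1/(2r), whose integral is 1. The names `box_impulse` and `sift` are illustrative.

```python
import math

def box_impulse(x, a, r):
    """Box approximation to the impulse at a: height 1/(2r) on [a - r, a + r], zero elsewhere."""
    return 1.0 / (2.0 * r) if abs(x - a) <= r else 0.0

def sift(f, a, r, n=1000):
    """Midpoint-rule integral of f(x) * box_impulse(x, a, r) over [a - r, a + r]."""
    dx = 2.0 * r / n
    total = 0.0
    for i in range(n):
        x = a - r + (i + 0.5) * dx  # center of the i-th small segment
        total += f(x) * box_impulse(x, a, r) * dx
    return total

a = 2.0
for r in (1.0, 0.1, 0.01, 0.001):
    # As r shrinks, the integral should approach f(a) = cos(2).
    print(r, sift(math.cos, a, r), math.cos(a))
```

The printed integrals move toward cos(2) as r gets smaller, which mirrors the argument that f(x) can be treated as the constant f(a) once the segment around a is small enough.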

This can be used to turn a continuous probability density function into a discrete density. For instance, in the previous example of a discrete probability density function, we had,

p(x,y) = (x + 4)(3 - y)/420, 

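where x and y were only allowed to be integers in the intervals $-3 \le x \le 3$ and $-2 \le y \le 2$. We can relax that restriction and let x and y take on any real number value between -4<x<4 and -3<y<3 if we also multiply that function by

$$\sum_n \delta(\mathbf{x} - \mathbf{x}_n),$$

where $\mathbf{x}_n$ represents every point (x,y) that is inside the region of A for which x and y are integers. Then applying the integral formula f) above for continuous probability densities will result in the formula c) above for discrete distributions, as shown below.

$$P(A) = \int_A p(\mathbf{x}) \sum_n \delta(\mathbf{x} - \mathbf{x}_n)\,dx\,dy = \sum_n \int_A p(\mathbf{x})\,\delta(\mathbf{x} - \mathbf{x}_n)\,dx\,dy = \sum_n p(\mathbf{x}_n) = \sum_{\mathbf{x} \in A} p(\mathbf{x}).$$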

 

All the examples above were derived using a two dimensional x-y coordinate system. This is because it was easy to show the intersection and union of regions on a two dimensional screen. But the concept of regions can apply to any dimension. For example, Fig_41 shows overlapping regions in one dimension,

where region A is the green number line segment from 2 to 7, and the region B is the purple line segment from 5 to 10. The union of the two regions is a line from 2 to 10, and the intersection of the two regions is a line from 5 to 7. We could just as easily define a probability function for values of x along the x axis above as we did for (x,y) points in the two dimensional graphs. The density function, p(x), would be a function of the single variable x alone. It would specify the probability per unit length, and then multiplying this by a small length, dx, would give, p(x)dx, the probability of the small line segment around the point x. To find the probability of any event, we would sum up all of these smaller sub-regions in the larger region of that event. 
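The one dimensional case can be tried out with a few lines of code. The density here is an assumption made only for illustration: the sample space is taken to be the segment from 0 to 12 with a uniform density of 1/12 per unit length. The function name `probability` is likewise illustrative.

```python
# Assumed for illustration: the sample space S is the segment [0, 12] and the
# density is uniform on it, so p(x) = 1/12 per unit length.
S_MIN, S_MAX = 0.0, 12.0

def p(x):
    return 1.0 / (S_MAX - S_MIN) if S_MIN <= x <= S_MAX else 0.0

def probability(lo, hi, n=10_000):
    """Add up p(x) dx over n small segments covering [lo, hi] (midpoint rule)."""
    dx = (hi - lo) / n
    return sum(p(lo + (i + 0.5) * dx) * dx for i in range(n))

prob_a = probability(2.0, 7.0)        # region A: from 2 to 7
prob_b = probability(5.0, 10.0)       # region B: from 5 to 10
prob_both = probability(5.0, 7.0)     # intersection: from 5 to 7
prob_either = probability(2.0, 10.0)  # union: from 2 to 10

print(prob_a, prob_b, prob_both, prob_either)
```

With this assumed density the union comes out equal to P(A) + P(B) minus the probability of the intersection, because the overlap from 5 to 7 would otherwise be counted twice.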

Fig_42 below shows overlapping regions in three dimensions. The region A is the sphere on the left; region B is the sphere on the right. The remaining labeled region is where the regions A and B overlap, that is, their intersection.

 

The probability density function in three dimensions, p(x,y,z), would be a function of the variables, x, y, and z. It would specify the probability per unit volume, and then multiplying this by a small volume, dV=dxdydz, would give, p(x,y,z)dxdydz, the probability of the small volume around the point (x,y,z). To find the probability of any event, we would sum up all of these smaller sub-regions in the larger region of that event.    

In one dimension, the probability density is a function of one variable, p(x). The coordinate system imposed on the sample space is a line. The variable x can be any real number on this line segment. If we let R represent the set of all possible real numbers, then $x \in R$. And if we let S represent the portion of the real numbers on the line segment that covers the sample space, then S is a subset of R, written $S \subset R$. The probability of some event, A, in one dimensional probability space is

$$P(A) = \int_A p(x)\,dx,$$

where A is a one dimensional line segment within S.

In two dimensions, the probability density is a function of two variables, p(x,y). The coordinate system imposed on the sample space is an area which we may call S. The point (x,y) can be any point in this area. The values of x and y are chosen independently from one another, with $x \in R$, and also $y \in R$. This is also written as $(x,y) \in R^2$. Note that $S \subset R^2$. The probability of some event, A, in two dimensional probability space is

$$P(A) = \int_A p(x,y)\,dx\,dy,$$

where A is a two dimensional region within S.

In three dimensions, the probability density is a function of three variables, p(x,y,z). The coordinate system imposed on the sample space is a volume which we may call S. The point (x,y,z) can be any point in the volume of S. The values of x, y, and z are chosen independently from each other, with $x \in R$, $y \in R$, and $z \in R$, written as $(x,y,z) \in R^3$, and $S \subset R^3$. The probability of some event, A, in three dimensional probability space is

$$P(A) = \int_A p(x,y,z)\,dx\,dy\,dz.$$

We can generalize this approach to higher dimensions. For example, in four dimensions the density function is a function of 4 variables, p(x,y,z,w). The coordinate system imposed on the sample space is not a volume in the traditional 3-D sense. But for lack of a better term, we can call this a generalized volume, in this example a 4-D volume. Each of x or y or z or w is an element of the set of real numbers, R. Or, if we let $\mathbf{x}$ be the vector (x,y,z,w), then $\mathbf{x} \in R^4$. The probability of some event, A, in four dimensional probability space is

$$P(A) = \int_A p(x,y,z,w)\,dx\,dy\,dz\,dw.$$

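And to be completely general, let $\mathbf{x}$ be the vector $(x_1, x_2, \ldots, x_n)$ in "n" dimensional space. Then the n-dimensional density function can be written as $p(\mathbf{x})$, with $\mathbf{x} \in R^n$. And,

$$P(A) = \int_A p(\mathbf{x})\,d\mathbf{x},$$

where

$$d\mathbf{x} = dx_1\,dx_2 \cdots dx_n.$$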

The vector, $\mathbf{x}$, is also called a random vector and each of the $x_i$, whether it be $x_1$ or $x_2$ or $x_3$ or $x_n$, is called a random variable because we are free to independently choose at random the value of each. An event is occupied by a range of values for each of the $x_i$. And there is a probability associated with any region enclosing the point $\mathbf{x}$.
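The same summing of small sub-regions can be written once for any number of dimensions. The sketch below is a plain midpoint Riemann sum over a box; the example density (uniform on the unit cube in three dimensions) and the event (a ball of radius 0.5 centered in the cube) are assumptions chosen only for illustration, as is the function name `probability`.

```python
import itertools
import math

def probability(p, inside, bounds, n=40):
    """Midpoint Riemann sum of p(x) dx1...dxn over the part of the box where inside(x) is True.

    bounds is a list of (low, high) pairs, one per dimension; each axis is cut
    into n equal pieces, so n**len(bounds) small cells are visited.
    """
    steps = [(hi - lo) / n for (lo, hi) in bounds]
    cell = math.prod(steps)  # generalized volume dx1 * dx2 * ... * dxn
    axes = [[lo + (i + 0.5) * h for i in range(n)]
            for (lo, _), h in zip(bounds, steps)]
    return sum(p(x) * cell for x in itertools.product(*axes) if inside(x))

# Assumed example: uniform density on the unit cube in 3 dimensions,
# with the event A being a ball of radius 0.5 centered in the cube.
bounds = [(0.0, 1.0)] * 3
uniform = lambda x: 1.0
in_ball = lambda x: sum((xi - 0.5) ** 2 for xi in x) <= 0.25

prob_a = probability(uniform, in_ball, bounds)
prob_s = probability(uniform, lambda x: True, bounds)

print(prob_a, prob_s)
```

For this assumed example the exact answer is the volume of the ball, pi/6, and the sum over the whole cube is 1. The number of cells grows as n to the power of the dimension, so a direct sum like this is only practical in a few dimensions.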

In the most general case, it can be assumed that a probability density function is for a continuous distribution. If it turns out that the distribution is discrete, then the density function will contain some impulse functions. 

And various disciplines of logic use the information contained in the probability density function to varying degrees. The study of Probabilities uses all the information contained in the density function in the various regions of sample space. Predicate Logic is only concerned with whether the density function is zero or non-zero in the regions of interest. And Propositional Logic is only concerned with the relationship between regions in sample space.