    Asymmetry (skewness) and kurtosis (excess) of a random variable. Calculating the asymmetry and kurtosis of an empirical distribution in Excel. Mathematical expectation and variance of the number of occurrences of an event in independent trials

    58. Coefficients of asymmetry and kurtosis

    Central moments of a distribution

    To study the nature of variation further, one uses the mean values of various powers of the deviations of individual values of a characteristic from its arithmetic mean. These indicators are called central moments of the distribution, of the order equal to the power to which the deviations are raised, or simply moments.

    Indicators of distribution shape

    Asymmetry of a distribution

    The Pearson indicator depends on the degree of asymmetry in the central part of the distribution series, whereas the asymmetry indicator based on the third-order moment depends on the extreme values of the characteristic.

    Assessing the significance of asymmetry

    To assess whether the asymmetry is significant, the standard error of the asymmetry coefficient is calculated.

    If the ratio of the absolute value of the asymmetry coefficient to its standard error exceeds 2, the asymmetry is considered significant.

    Kurtosis of a distribution

    The kurtosis indicator

    Kurtosis is the deviation of the peak of the empirical distribution upward or downward ("steepness") from the peak of the normal distribution curve. However, the graph of a distribution may look arbitrarily steep or flat depending on the strength of variation of the characteristic: the weaker the variation, the steeper the distribution curve at a given scale. Moreover, by changing the scale along the abscissa and ordinate axes, any distribution can be made to look artificially "steep" or "flat". To show what the kurtosis of a distribution consists of, and to interpret it correctly, one must compare series with the same strength of variation (the same value of σ) and different kurtosis indicators. So as not to confuse kurtosis with asymmetry, all the series compared should be symmetric. Such a comparison is shown in Fig.

    Since the ratio of the fourth-order central moment to the fourth power of σ equals 3 for the normal distribution, the kurtosis indicator is calculated by the formula

    $E_x = \frac{\mu_4}{\sigma^4} - 3$

    Assessing the significance of kurtosis

    To assess whether the kurtosis is significant, its standard error is calculated.

    If the ratio of the absolute value of the kurtosis indicator to its standard error exceeds 3, the kurtosis is considered significant.

    Definition. The mode $M_0$ of a discrete random variable is its most probable value. For a continuous random variable, the mode is the value at which the distribution density has a maximum.

    If the distribution polygon of a discrete random variable or the distribution curve of a continuous random variable has two or more maxima, the distribution is called bimodal or multimodal.

    If a distribution has a minimum but no maximum, it is called antimodal.

    Definition. The median $M_D$ of a random variable $X$ is the value relative to which the random variable is equally likely to take a larger or a smaller value: $P(X < M_D) = P(X > M_D)$.

    Geometrically, the median is the abscissa of the point at which the area bounded by the distribution curve is divided in half.

    Note that if the distribution is unimodal and symmetric, the mode and the median coincide with the mathematical expectation.

    Definition. The initial moment of order $k$ of a random variable $X$ is the mathematical expectation of $X^k$: $\alpha_k = M(X^k)$.

    For a discrete random variable: $\alpha_k = \sum_{i=1}^{n} x_i^k p_i$.

    For a continuous random variable: $\alpha_k = \int_{-\infty}^{\infty} x^k f(x)\,dx$.

    The initial moment of the first order is equal to the mathematical expectation.

    Definition. The central moment of order $k$ of a random variable $X$ is the mathematical expectation of the quantity $(X - M(X))^k$: $\mu_k = M\big[(X - M(X))^k\big]$.

    For a discrete random variable: $\mu_k = \sum_{i=1}^{n} (x_i - M(X))^k p_i$.

    For a continuous random variable: $\mu_k = \int_{-\infty}^{\infty} (x - M(X))^k f(x)\,dx$.

    The central moment of the first order is always equal to zero, and the central moment of the second order is equal to the variance. The central moment of the third order characterizes the asymmetry of the distribution.

    Definition. The ratio of the third-order central moment to the cube of the standard deviation is called the asymmetry coefficient: $A_s = \frac{\mu_3}{\sigma^3}$.

    Definition. To characterize how peaked or flat-topped a distribution is, a quantity called the kurtosis is used: $E_x = \frac{\mu_4}{\sigma^4} - 3$.
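
    The moment definitions above can be checked numerically. The following is a minimal Python sketch for a discrete random variable, taking the kurtosis as the fourth central moment divided by σ⁴ minus 3 (the convention stated later in this text). The function names and the fair-die example are illustrative assumptions, not part of the original material.

```python
from math import sqrt


def initial_moment(values, probs, k):
    """Initial moment of order k: sum of x_i**k * p_i."""
    return sum(x ** k * p for x, p in zip(values, probs))


def central_moment(values, probs, k):
    """Central moment of order k: sum of (x_i - M(X))**k * p_i."""
    mean = initial_moment(values, probs, 1)
    return sum((x - mean) ** k * p for x, p in zip(values, probs))


def asymmetry(values, probs):
    """Third central moment divided by the cube of the standard deviation."""
    sigma = sqrt(central_moment(values, probs, 2))
    return central_moment(values, probs, 3) / sigma ** 3


def kurtosis(values, probs):
    """Fourth central moment divided by sigma**4, minus 3."""
    variance = central_moment(values, probs, 2)
    return central_moment(values, probs, 4) / variance ** 2 - 3


# Hypothetical example: a fair six-sided die.
die_values = [1, 2, 3, 4, 5, 6]
die_probs = [1 / 6] * 6

print(initial_moment(die_values, die_probs, 1))  # mathematical expectation
print(central_moment(die_values, die_probs, 2))  # variance
print(asymmetry(die_values, die_probs))          # symmetric, so close to 0
print(kurtosis(die_values, die_probs))           # negative: flatter than normal
```

    The first central moment comes out as zero and the second as the variance, in line with the statements above.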

    In addition to the quantities considered above, the so-called absolute moments are also used:

    Absolute initial moment: $\beta_k = M(|X|^k)$.

    Absolute central moment: $\nu_k = M(|X - M(X)|^k)$.

    The quantile corresponding to a given probability level $p$ is the value $x_p$ at which the distribution function takes the value $p$: $F(x_p) = p$.

    In other words, the quantile is the value of the random variable for which $P(X < x_p) = p$.

    The probability $p$, expressed as a percentage, gives its name to the corresponding quantile; for example, $x_{0.4}$ is called the 40% quantile.
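
    As an illustration of the quantile definition, the short Python sketch below finds the 40% quantile of a distribution by inverting its distribution function. The choice of the standard normal distribution is an assumption made only for the example.

```python
from statistics import NormalDist

# Assumption: X follows the standard normal distribution.
dist = NormalDist(mu=0.0, sigma=1.0)

p = 0.40
x_p = dist.inv_cdf(p)  # quantile: the value at which F(x_p) = p

print(x_p)            # the 40% quantile
print(dist.cdf(x_p))  # recovers p up to rounding
```

    Substituting the quantile back into the distribution function returns the original probability level, which is exactly the defining property.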

    20. Mathematical expectation and variance of the number of occurrences of an event in independent trials

    Definition. The mathematical expectation of a continuous random variable $X$ whose possible values belong to the segment $[a, b]$ is the definite integral

    $M(X) = \int_a^b x f(x)\,dx$

    If the possible values of the random variable cover the entire number axis, the mathematical expectation is given by the formula

    $M(X) = \int_{-\infty}^{\infty} x f(x)\,dx$

    Here it is assumed, of course, that the improper integral converges.

    The mathematical expectation of a discrete random variable is the sum of the products of its possible values and the corresponding probabilities:

    $M(X) = x_1 p_1 + x_2 p_2 + \dots + x_n p_n$. (7.1)

    If the number of possible values of the random variable is infinite, then $M(X) = \sum_{i=1}^{\infty} x_i p_i$, provided the resulting series converges absolutely.

    Note 1. The mathematical expectation is sometimes called the weighted average, since it is approximately equal to the arithmetic mean of the observed values of the random variable over a large number of experiments.

    Note 2. It follows from the definition of the mathematical expectation that its value is not less than the smallest possible value of the random variable and not greater than the largest.

    Note 3. The mathematical expectation of a discrete random variable is a non-random (constant) quantity. We will see later that the same holds for continuous random variables.
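
    Notes 1 and 2 can be illustrated with a small simulation. This is a sketch under stated assumptions: the distribution series (a fair die), the seed and the sample size are hypothetical choices made for the example.

```python
import random

# Hypothetical distribution series: possible values and their probabilities.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# Formula (7.1): sum of the products of values and probabilities.
expectation = sum(x * p for x, p in zip(values, probs))

# Note 2: the expectation lies between the smallest and largest values.
assert min(values) <= expectation <= max(values)

# Note 1: the arithmetic mean of many observations is close to M(X).
rng = random.Random(0)  # fixed seed so the run is reproducible
sample = rng.choices(values, weights=probs, k=100_000)
sample_mean = sum(sample) / len(sample)

print(expectation)
print(sample_mean)
```

    The expectation itself is a fixed number (Note 3), while the sample mean varies from run to run when the seed is changed.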

    Properties of mathematical expectation.

      The mathematical expectation of a constant is equal to the constant itself:

    $M(C) = C$. (7.2)

    Proof. If $C$ is regarded as a discrete random variable that takes the single value $C$ with probability $p = 1$, then $M(C) = C \cdot 1 = C$.

      A constant factor can be taken outside the sign of the mathematical expectation:

    $M(CX) = C\,M(X)$. (7.3)

    Proof. If the random variable $X$ is given by the distribution series

    $x_i$    $x_1$    $x_2$    …    $x_n$
    $p_i$    $p_1$    $p_2$    …    $p_n$

    then the distribution series for $CX$ has the form:

    $Cx_i$   $Cx_1$   $Cx_2$   …    $Cx_n$
    $p_i$    $p_1$    $p_2$    …    $p_n$

    Then $M(CX) = Cx_1 p_1 + Cx_2 p_2 + \dots + Cx_n p_n = C(x_1 p_1 + x_2 p_2 + \dots + x_n p_n) = C\,M(X)$.
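
    The two properties can be verified exactly with rational arithmetic. In the sketch below the distribution series and the constant are hypothetical values chosen for illustration.

```python
from fractions import Fraction


def expectation(series):
    """M(X) for a distribution series given as (value, probability) pairs."""
    return sum(x * p for x, p in series)


# Hypothetical distribution series for X; the probabilities sum to 1.
series_x = [(1, Fraction(1, 2)), (2, Fraction(1, 3)), (6, Fraction(1, 6))]
C = 5

# Property (7.2): a constant takes the single value C with probability 1.
m_const = expectation([(C, Fraction(1))])

# Property (7.3): CX takes the values C * x_i with the same probabilities p_i.
series_cx = [(C * x, p) for x, p in series_x]
m_x = expectation(series_x)
m_cx = expectation(series_cx)

print(m_const)    # equals C
print(m_x, m_cx)  # m_cx equals C * m_x
```

    Using Fraction avoids rounding error, so the equalities hold exactly rather than approximately.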

    The mathematical expectation of a continuous random variable is

    $M(X) = \int_{-\infty}^{\infty} x f(x)\,dx$. (7.13)

    Note 1. The general definition of variance for a continuous random variable is the same as for a discrete one (Def. 7.5), and the formula for calculating it has the form:

    $D(X) = \int_{-\infty}^{\infty} (x - M(X))^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - M^2(X)$. (7.14)

    The standard deviation is calculated by formula (7.12).

    Note 2. If all possible values of the continuous random variable lie within the interval $[a, b]$, the integrals in formulas (7.13) and (7.14) are taken over these limits.

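    Theorem. The variance of the number of occurrences of an event in $n$ independent trials is equal to the product of the number of trials and the probabilities of occurrence and non-occurrence of the event in a single trial: $D(X) = npq$.

    Proof. Let $X$ be the number of occurrences of the event in $n$ independent trials. It is equal to the sum of the numbers of occurrences of the event in the individual trials: $X = X_1 + X_2 + \dots + X_n$. Since the trials are independent, the random variables $X_1, X_2, \dots, X_n$ are independent, and therefore $D(X) = D(X_1) + D(X_2) + \dots + D(X_n)$.

    As shown above, $M(X_i) = p$, and $M(X_i^2) = 1^2 \cdot p + 0^2 \cdot q = p$.

    Then $D(X_i) = M(X_i^2) - M^2(X_i) = p - p^2 = pq$, and $D(X) = npq$.

    In this case, as mentioned earlier, the standard deviation is $\sigma = \sqrt{npq}$.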
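
    The theorem can be checked numerically by building the distribution of the number of occurrences directly from the binomial probabilities and computing its variance by definition. The values of n and p below are hypothetical.

```python
from math import comb, sqrt

# Hypothetical parameters: n independent trials, probability p of the event.
n, p = 10, 0.3
q = 1 - p

# Distribution of the number of occurrences X of the event in n trials.
pmf = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(mean, n * p)                      # both approximately 3.0
print(variance, n * p * q)              # both approximately 2.1
print(sqrt(variance), sqrt(n * p * q))  # standard deviation, two ways
```

    The variance computed from the definition agrees with the product npq up to floating-point rounding.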

    When analyzing a distribution of numbers, it is of considerable interest to assess how far the distribution deviates from a symmetric one, or, in other words, its skewness. The degree of skewness (asymmetry) is one of the most important properties of a distribution of numbers. A number of statistical indicators are available for calculating asymmetry. All of them meet at least two requirements that apply to any skewness indicator: it must be dimensionless, and it must equal zero if the distribution is symmetric.

    Fig. 2a, b shows the curves of two asymmetric distributions of numbers, one skewed to the left and the other to the right. The relative positions of the mode, median and mean are shown qualitatively. It can be seen that one possible skewness indicator could be based on the distance between the mean and the mode. However, given the difficulty of determining the mode from empirical data, on the one hand, and the known relation (3) between the mode, median and mean, on the other, the following formula was proposed for calculating the asymmetry indicator:

    $A_s = \frac{3(\bar{x} - M_D)}{s}$ (13)

    It follows from this formula that distributions skewed to the left (peak shifted to the left, longer tail on the right) have positive asymmetry, and those skewed to the right have negative asymmetry. Naturally, for symmetric distributions, in which the mean and median coincide, the asymmetry is zero.

    Let us calculate the asymmetry indicators for the data given in Tables 1 and 2. For the distribution of cardiac cycle durations we have:

    Thus, this distribution has a slight left-sided skew. The value obtained for the asymmetry is approximate rather than exact, since quantities calculated by the simplified method were used in computing it.

    For the distribution of serum sulfhydryl groups we have:

    Thus, this distribution has negative asymmetry, i.e. it is skewed to the right.

    It has been shown theoretically that the value determined by formula (13) lies within ±3. In practice, however, it very rarely reaches these limits, and for moderately asymmetric unimodal distributions its absolute value is usually less than one.
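
    A minimal Python sketch of this indicator follows. It assumes that formula (13) is the median-based coefficient 3(mean − median)/s, which matches the description above (built from the mean and median, bounded by 3 in absolute value), and that s is the sample standard deviation. The data sets are hypothetical.

```python
from statistics import mean, median, stdev


def median_asymmetry(data):
    """Asymmetry indicator 3 * (mean - median) / s.

    Assumption: s is the sample standard deviation (n - 1 denominator).
    """
    return 3 * (mean(data) - median(data)) / stdev(data)


# Hypothetical data with a long right tail.
right_tail = [1, 2, 2, 3, 3, 3, 4, 4, 10]
# Hypothetical symmetric data.
symmetric = [1, 2, 3, 4, 5]

print(median_asymmetry(right_tail))  # positive
print(median_asymmetry(symmetric))   # zero: mean and median coincide
```

    Because only the mean, median and standard deviation are needed, this indicator is convenient for quick hand calculations.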

    The asymmetry indicator can be used not only for a formal description of a distribution of numbers but also for a meaningful interpretation of the data obtained.

    Indeed, if the characteristic we observe is formed under the influence of a large number of mutually independent causes, each of which makes a relatively small contribution to its value, then, in accordance with certain theoretical premises discussed in the section on probability theory, we are entitled to expect that the distribution of numbers obtained in the experiment will be symmetric. However, if the experimental data show a considerable asymmetry (the absolute value of $A_s$ is of the order of several tenths), it can be assumed that the conditions mentioned above are not satisfied.

    In this case it makes sense either to assume the existence of one or two factors whose contribution to the observed value is substantially greater than that of the others, or to postulate a special mechanism different from the independent action of many causes on the value of the observed characteristic.

    For example, if the changes in the quantity of interest produced by some factor are proportional to the quantity itself and to the intensity of the cause, the resulting distribution will always be skewed to the left, i.e. have positive asymmetry. Biologists encounter such a mechanism, for example, when evaluating quantities associated with the growth of plants and animals.

    Another way of assessing asymmetry is based on the method of moments, which will be discussed in chapter 44. In this method, asymmetry is calculated from the sum of the deviations of all values of the data series from the mean, raised to the third power, i.e.:

    $A_s = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$ (14)

    The third power ensures that the numerator of this expression is zero for symmetric distributions, since in that case the sums of the cubed deviations above and below the mean are equal in magnitude and opposite in sign. Division by $s^3$ makes the asymmetry indicator dimensionless.

    Formula (14) can be transformed as follows. In the previous section the standardized values were introduced:

    $z_i = \frac{x_i - \bar{x}}{s}$

    Hence

    $A_s = \frac{1}{n} \sum_{i=1}^{n} z_i^3$ (15)

    Thus, the measure of asymmetry is the mean of the cubed standardized data.

    For the same data for which asymmetry was calculated by formula (13), let us find the indicator by formula (15). We have:

    Naturally, the asymmetry indicators calculated by the different formulas differ in magnitude, but they indicate the same direction of skew. Statistical software packages use formula (15) to calculate asymmetry because it gives more accurate values. Formula (13) can be used for preliminary calculations on the simplest calculators.
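
    The equivalence of the two moment-based forms can be confirmed numerically. The sketch below assumes that formula (14) is the sum of cubed deviations divided by n·s³ and that formula (15) is the mean of the cubed standardized values, with s computed using the divisor n. The data are hypothetical.

```python
from math import sqrt


def mean_and_s(data):
    """Arithmetic mean and standard deviation (divisor n, an assumption)."""
    n = len(data)
    m = sum(data) / n
    s = sqrt(sum((x - m) ** 2 for x in data) / n)
    return m, s


def asymmetry_14(data):
    """Sum of cubed deviations from the mean, divided by n * s**3."""
    m, s = mean_and_s(data)
    return sum((x - m) ** 3 for x in data) / (len(data) * s ** 3)


def asymmetry_15(data):
    """Mean of the cubed standardized values z_i = (x_i - mean) / s."""
    m, s = mean_and_s(data)
    z = [(x - m) / s for x in data]
    return sum(v ** 3 for v in z) / len(z)


# Hypothetical data with a long right tail, and its mirror image.
right_tail = [1, 2, 2, 3, 3, 3, 4, 4, 10]
mirrored = [-x for x in right_tail]

print(asymmetry_14(right_tail), asymmetry_15(right_tail))  # equal, positive
print(asymmetry_15(mirrored))                              # same size, negative
```

    Mirroring the data changes only the sign of the result, which reflects the role of the third power described above.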

    Kurtosis. So far we have considered three of the four groups of indicators used to describe distributions of numbers. The last is the group of peakedness indicators, or kurtosis (from the Greek for "humped"). One possible kurtosis indicator is calculated by the formula:

    $E = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4}$

    Using the same approach that was applied in transforming the asymmetry formula (14), it is easy to show that:

    $E = \frac{1}{n} \sum_{i=1}^{n} z_i^4$

    It has been shown theoretically that for the normal (Gaussian) distribution curve, which plays a major role in statistics as well as in probability theory, this value is numerically equal to 3. For a variety of reasons, the peakedness of this curve is taken as the standard, and therefore the quantity used as the kurtosis indicator is:

    $E_x = E - 3$

    Let us find the kurtosis for the data given in Table 1. We have:

    Thus, the distribution of cardiac cycle durations is flattened compared with the normal curve, for which $E_x = 0$.

    Table 3 shows the distribution of the number of marginal (ray) flowers in one species of chrysanthemum. For this distribution

    Kurtosis can take very large values, as the above example shows, but $E$ cannot be less than one. It turns out that if the distribution is two-peaked (bimodal), $E$ approaches its lower bound, so that $E_x$ tends to -2. Thus, if the calculation gives a value of $E_x$ below about -1 to -1.4, one can be confident that the distribution of numbers at hand is at least bimodal. This is especially important to bear in mind when the experimental data, bypassing the pre-processing stage, are analyzed on a computer and the researcher has no direct graphical image of the distribution to look at.

    The bimodality of a distribution curve of experimental data can arise for many reasons. In particular, such a distribution can result from combining two heterogeneous data sets into one. To illustrate this, we artificially combined data on the shell width of two species of fossil mollusks into one set (Table 4, Fig. 3).

    The figure clearly shows two modes, since two data sets from different populations have been mixed. The calculation gives $E = 1.74$ and hence $E_x = -1.26$. Thus, the calculated value of the peakedness indicator shows, in line with the statement made earlier, that the distribution has two peaks.

    A word of caution is needed here. Indeed, whenever a distribution of numbers has two maxima, the value of $E$ will be close to one. However, one cannot automatically conclude from this that the data set analyzed is a mixture of two heterogeneous samples. First, such a mixture, depending on the sizes of its component populations, may not have two peaks, and $E$ will then be much greater than one. Second, a homogeneous sample may also have two modes if, for example, the requirements for selecting experimental data have been violated. Thus, in this case, as indeed in others, the formal calculation of statistics should be followed by a thorough professional analysis that allows the data to be interpreted meaningfully.
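
    The effect of mixing two heterogeneous groups can be reproduced on a toy example. The sketch below is not the mollusk data from Table 4: the two groups are hypothetical numbers chosen so that the combined set has two well-separated peaks, and the indicator is computed as the mean fourth power of the standardized values minus 3.

```python
def kurtosis_indicator(data):
    """Fourth central moment over squared variance (divisor n), minus 3."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3


# Two hypothetical homogeneous groups with clearly different centers.
group_a = [9, 10, 10, 10, 11]
group_b = [19, 20, 20, 20, 21]
mixture = group_a + group_b

print(kurtosis_indicator(group_a))  # each group alone: about -0.5
print(kurtosis_indicator(mixture))  # bimodal mixture: about -1.94
```

    The mixture falls well below the -1.4 threshold mentioned above, while each group taken separately does not. As the text warns, a low value signals bimodality but does not by itself prove that the sample is a mixture.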

    To get an approximate idea of the shape of the distribution of a random variable, one plots its distribution series (polygon and histogram), its distribution function, or its distribution density. In the practice of statistical research one encounters a great variety of distributions. Homogeneous populations are characterized, as a rule, by unimodal distributions. Multimodality indicates that the population under study is heterogeneous. In that case the data need to be regrouped in order to isolate more homogeneous groups.

    Establishing the general character of the distribution of a random variable involves assessing its degree of homogeneity and calculating the asymmetry and kurtosis indicators. In a symmetric distribution, where the mathematical expectation equals the median, i.e. $M(X) = M_D$, asymmetry can be considered absent. The more noticeable the asymmetry, the greater the deviation between the two characteristics of the distribution center, the mathematical expectation and the median.

    The simplest asymmetry coefficient of the distribution of a random variable can be taken to be $A_s = \frac{M(X) - M_D}{\sigma}$, where $M(X)$ is the mathematical expectation, $M_D$ is the median, and $\sigma$ is the standard deviation of the random variable.

    In the case of right-sided asymmetry $A_s > 0$; in the case of left-sided asymmetry $A_s < 0$. Depending on the absolute value of $A_s$, the asymmetry is regarded as low, moderate, or high. A geometric illustration of right-sided and left-sided asymmetry is given in the figure below, which shows the distribution density graphs of the corresponding types of continuous random variables.

    Figure. Illustration of right-sided and left-sided asymmetry on the distribution density graphs of continuous random variables.

    There is another asymmetry coefficient for the distribution of a random variable. It can be proved that a nonzero central moment of odd order indicates asymmetry of the distribution. In the previous indicator we used an expression analogous to a first-order moment. Usually, however, this other asymmetry coefficient uses the third-order central moment $\mu_3$, and to make the coefficient dimensionless it is divided by the cube of the standard deviation. The result is the asymmetry coefficient $A_s = \frac{\mu_3}{\sigma^3}$. For this coefficient, as for the first, $A_s > 0$ in the case of right-sided asymmetry and $A_s < 0$ in the case of left-sided asymmetry.

    Kurtosis of a random variable

    The kurtosis of the distribution of a random variable characterizes the degree to which its values are concentrated near the distribution center: the higher the concentration, the higher and narrower the graph of its distribution density. The kurtosis (peakedness) indicator is calculated by the formula $E_x = \frac{\mu_4}{\sigma^4} - 3$, where $\mu_4$ is the fourth-order central moment and $\sigma^4$ is the standard deviation raised to the fourth power. Since the numerator and the denominator have the same degree, kurtosis is a dimensionless quantity. By convention, the normal distribution is taken as the standard of zero kurtosis. But it can be proved that for the normal distribution $\frac{\mu_4}{\sigma^4} = 3$. That is why the number 3 is subtracted from this fraction in the formula for kurtosis.

    Thus, for the normal distribution the kurtosis is zero: $E_x = 0$. If the kurtosis is greater than zero, i.e. $E_x > 0$, the distribution is more peaked than the normal one. If the kurtosis is less than zero, i.e. $E_x < 0$, the distribution is less peaked than the normal one. The limiting value of negative kurtosis is $E_x = -2$; positive kurtosis can be arbitrarily large. The figure shows what the graphs of peaked and flat-topped distribution densities look like in comparison with the normal distribution.

    Figure. Illustration of peaked and flat-topped distribution densities of random variables in comparison with the normal distribution.
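
    The limiting value -2 and the absence of an upper limit can be seen on two small discrete distributions. Both distributions in this Python sketch are hypothetical examples chosen for illustration.

```python
def kurtosis(values, probs):
    """Fourth central moment divided by the squared variance, minus 3."""
    mean = sum(x * p for x, p in zip(values, probs))
    m2 = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    m4 = sum((x - mean) ** 4 * p for x, p in zip(values, probs))
    return m4 / m2 ** 2 - 3


# Two equally likely values: the most flat-topped case possible.
two_point = kurtosis([-1, 1], [0.5, 0.5])

# Most of the probability at the center, a little in distant tails.
heavy_tails = kurtosis([-3, 0, 3], [0.05, 0.9, 0.05])

print(two_point)    # -2.0, the lower limit
print(heavy_tails)  # about 7, strongly peaked
```

    Moving even more probability to the center while keeping small distant tails makes the second value grow without bound.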

    The asymmetry and kurtosis of the distribution of a random variable show how far it deviates from the normal law. When asymmetry and kurtosis are large, calculation formulas derived for the normal distribution should not be applied. What levels of asymmetry and kurtosis are acceptable for using normal-distribution formulas when analyzing the data of a particular random variable is for the researcher to decide on the basis of knowledge and experience.

    Asymmetry is calculated by the SKEW function (СКОС in Russian-language Excel). Its argument is the range of cells containing the data, for example =SKEW(A1:A100) if the data are in cells A1 through A100.

    Kurtosis is calculated by the KURT function (ЭКСЦЕСС in Russian-language Excel), whose argument is numeric data, usually specified as a cell range, for example: =KURT(A1:A100).
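
    Excel's two functions apply small-sample corrections, so their results differ slightly from the plain moment ratios given earlier. The Python sketch below reimplements the formulas as they are given in Microsoft's function documentation, to the best of our understanding. The function names excel_skew and excel_kurt and the sample data are assumptions made for this example.

```python
from math import sqrt


def _mean_and_sample_std(data):
    """Arithmetic mean and sample standard deviation (n - 1 divisor)."""
    n = len(data)
    m = sum(data) / n
    s = sqrt(sum((x - m) ** 2 for x in data) / (n - 1))
    return m, s


def excel_skew(data):
    """Sample-adjusted asymmetry, as documented for SKEW (needs n >= 3)."""
    n = len(data)
    m, s = _mean_and_sample_std(data)
    total = sum(((x - m) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * total


def excel_kurt(data):
    """Sample-adjusted kurtosis, as documented for KURT (needs n >= 4)."""
    n = len(data)
    m, s = _mean_and_sample_std(data)
    total = sum(((x - m) / s) ** 4 for x in data)
    factor = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return factor * total - correction


# Hypothetical sample of ten observations.
sample = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]
print(excel_skew(sample))  # slightly positive
print(excel_kurt(sample))  # slightly negative
```

    For large samples the corrections become negligible and both values approach the moment-based coefficients.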

    §2.3. The Descriptive Statistics analysis tool

    In Excel, all the point characteristics of a sample can be calculated at once with the Descriptive Statistics analysis tool, which is part of the Analysis ToolPak.

    Descriptive Statistics creates a table of the main statistical characteristics of a data set. The table contains the following characteristics: mean, standard error, variance, standard deviation, mode, median, range, maximum and minimum values, asymmetry, kurtosis, count, sum of all elements, and confidence interval (confidence level). The tool greatly simplifies statistical analysis because there is no need to call a separate function for each characteristic.

    To call Descriptive Statistics:

    1) on the Tools menu, select the Data Analysis command;

    2) in the Analysis Tools list of the Data Analysis dialog box, select Descriptive Statistics and click OK.

    In the Descriptive Statistics window:

    · in the Input group, specify in the Input Range field the range of cells containing the data;

    · if the first row of the input range contains a column heading, tick the Labels in First Row check box;

    · in the Output options group, tick the Summary statistics check box if the full list of characteristics is needed;

    · tick the Confidence Level for Mean check box and specify the confidence level in % if the confidence interval is to be calculated (the default is 95%). Click OK.

    The result is a table with the calculated values of the statistical characteristics listed above. Immediately, without deselecting the table, run the command Format → Column → AutoFit Selection.

    The Descriptive Statistics dialog box:

    Practical tasks

    2.1. Calculating the basic point statistical characteristics with standard Excel functions

    The voltage across a section of a circuit was measured 25 times with the same voltmeter. The following voltage values, in volts, were obtained:

    32, 32, 35, 37, 35, 38, 32, 33, 34, 37, 32, 32, 35,

    34, 32, 34, 35, 39, 34, 38, 36, 30, 37, 28, 30.

    Find the mean, the sample and corrected variance, the standard deviation, the range, the mode, and the median. Check the deviation from the normal distribution by calculating the asymmetry and kurtosis.

    To complete the task, do the following.

    1. Enter the results of the experiment in column A.

    2. In cell B1 type "Mean", in B2 "Sample variance", in B3 "Standard deviation", in B4 "Corrected variance", in B5 "Corrected standard deviation", in B6 "Maximum", in B7 "Minimum", in B8 "Range", in B9 "Mode", in B10 "Median", in B11 "Asymmetry", in B12 "Kurtosis".

    3. Adjust the width of this column using AutoFit.

    4. Select cell C1 and click the "=" button in the formula bar. Using the Function Wizard, find the AVERAGE function (СРЗНАЧ) in the Statistical category, then select the range of data cells and click OK.

    5. Select cell C2 and click the "=" sign in the formula bar. Using the Function Wizard, find the VARP function (ДИСПР) in the Statistical category, then select the range of data cells and click OK.

    6. Carry out similar steps on your own to calculate the remaining characteristics.

    7. To calculate the range, enter the formula =C6-C7 in cell C8.

    8. Insert a row above the table and enter the column headings in it: "Characteristic" and "Value".
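
    The same characteristics can be cross-checked outside Excel. The following Python sketch uses only the standard library. The asymmetry and kurtosis here are the plain moment ratios (divisor n), an assumption made for simplicity, so they will differ slightly from the sample-adjusted values Excel returns.

```python
from statistics import mean, median, mode, pstdev, pvariance, stdev, variance

voltages = [32, 32, 35, 37, 35, 38, 32, 33, 34, 37, 32, 32, 35,
            34, 32, 34, 35, 39, 34, 38, 36, 30, 37, 28, 30]

avg = mean(voltages)
sigma = pstdev(voltages)  # standard deviation with divisor n

# Moment-based asymmetry and kurtosis (not Excel's adjusted versions).
z = [(x - avg) / sigma for x in voltages]
asymmetry = sum(v ** 3 for v in z) / len(z)
kurtosis = sum(v ** 4 for v in z) / len(z) - 3

results = {
    "Mean": avg,
    "Sample variance (divisor n)": pvariance(voltages),
    "Standard deviation": sigma,
    "Corrected variance (divisor n - 1)": variance(voltages),
    "Corrected standard deviation": stdev(voltages),
    "Maximum": max(voltages),
    "Minimum": min(voltages),
    "Range": max(voltages) - min(voltages),
    "Mode": mode(voltages),
    "Median": median(voltages),
    "Asymmetry": asymmetry,
    "Kurtosis": kurtosis,
}

for name, value in results.items():
    print(f"{name}: {value}")
```

    The dictionary keys mirror the labels typed into column B, so the printed list can be compared with the worksheet row by row.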