    Parabolic regression. Parabolic regression equation. Correlation analysis in Excel

    The following data are available for a number of countries on the retail food price index (x) and the industrial production index (y).

    No. | Retail food price index (x) | Industrial production index (y)
    1   | 100 | 70
    2   | 105 | 79
    3   | 108 | 85
    4   | 113 | 84
    5   | 118 | 85
    6   | 118 | 85
    7   | 110 | 96
    8   | 115 | 99
    9   | 119 | 100
    10  | 118 | 98
    11  | 120 | 99
    12  | 124 | 102
    13  | 129 | 105
    14  | 132 | 112

    Required:

    1. To characterize the dependence of y on x, calculate the parameters of the following functions:

    A) linear;

    B) power-law;

    C) equilateral hyperbola.

    3. Assess the statistical significance of the regression and correlation parameters.

    4. Forecast the value of the industrial production index y for a predicted value of the retail food price index x = 138.

    Solution:

    1. To calculate the parameters of the linear regression y = a + bx, we solve the system of normal equations for a and b:
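    In standard notation, the least-squares normal equations for y = a + bx are:

    \[
      \sum y = n\,a + b\sum x, \qquad
      \sum xy = a\sum x + b\sum x^2 .
    \]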

    Let's build a table of calculated data, as shown in table 1.

    Table 1 Calculated data for evaluating linear regression

    No.   | x        | y        | xy       | x²       | y²       | ŷx       | |y − ŷx| / y
    1     | 100      | 70       | 7000     | 10000    | 4900     | 74.26340 | 0.060906
    2     | 105      | 79       | 8295     | 11025    | 6241     | 79.92527 | 0.011712
    3     | 108      | 85       | 9180     | 11664    | 7225     | 83.32238 | 0.019737
    4     | 113      | 84       | 9492     | 12769    | 7056     | 88.98425 | 0.059336
    5     | 118      | 85       | 10030    | 13924    | 7225     | 94.64611 | 0.113484
    6     | 118      | 85       | 10030    | 13924    | 7225     | 94.64611 | 0.113484
    7     | 110      | 96       | 10560    | 12100    | 9216     | 85.58713 | 0.108467
    8     | 115      | 99       | 11385    | 13225    | 9801     | 91.24900 | 0.078293
    9     | 119      | 100      | 11900    | 14161    | 10000    | 95.77849 | 0.042215
    10    | 118      | 98       | 11564    | 13924    | 9604     | 94.64611 | 0.034223
    11    | 120      | 99       | 11880    | 14400    | 9801     | 96.91086 | 0.021102
    12    | 124      | 102      | 12648    | 15376    | 10404    | 101.4404 | 0.005487
    13    | 129      | 105      | 13545    | 16641    | 11025    | 107.1022 | 0.020021
    14    | 132      | 112      | 14784    | 17424    | 12544    | 110.4993 | 0.013399
    Total | 1629     | 1299     | 152293   | 190557   | 122267   | 1299.001 | 0.701866
    Mean  | 116.3571 | 92.78571 | 10878.07 | 13611.21 | 8733.357 | —        | —
    σ     | 8.4988   | 11.1431  | —        | —        | —        | —        | —
    σ²    | 72.23    | 124.17   | —        | —        | —        | —        | —

    The average value is determined by the formula:

    The mean square deviation is calculated by the formula:

    and enter the result into table 1.

    Squaring the resulting value, we get the variance:

    Equation parameters can also be determined by the formulas:
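    The formulas referred to in the last few steps, in standard notation, are:

    \[
      \bar{x} = \frac{\sum x_i}{n}, \qquad
      \sigma_x = \sqrt{\overline{x^2} - \bar{x}^{\,2}}, \qquad
      b = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x^2}, \qquad
      a = \bar{y} - b\,\bar{x} .
    \]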

    So the regression equation is:

    Consequently, with an increase in the retail food price index by 1, the industrial production index increases by an average of 1.13.

    Let's calculate the linear pair correlation coefficient:

    The connection is direct, rather close.

    Let's define the coefficient of determination:

    74.59% of the variation in the result is explained by the variation in the factor x.

    Substituting the actual values ​​of x into the regression equation, we determine the theoretical (calculated) values.

    The sum of the calculated values coincides with the sum of the actual ones (1299.001 ≈ 1299); therefore, the parameters of the equation are determined correctly.

    Let's calculate the average approximation error - the average deviation of the calculated values ​​from the actual ones:

    On average, the calculated values ​​deviate from the actual ones by 5.01%.

    We will evaluate the quality of the regression equation using the F-test.

    The F-test consists in testing the hypothesis H0 that the regression equation and the indicator of closeness of the relationship are statistically insignificant. For this, the actual value F_fact is compared with the critical (tabular) value F_table of Fisher's F-test.

    F fact is determined by the formula:

    where n is the number of units in the population;

    m is the number of parameters for variables x.
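    In the form standard for paired regression (here m = 1), the statistic is:

    \[
      F_{\text{fact}} = \frac{r^2}{1 - r^2}\cdot\frac{n - m - 1}{m} .
    \]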

    The obtained estimates of the regression equation allow us to use it for forecasting.

    If the predicted value of the retail price index for food is x = 138, then the predicted value of the industrial production index will be:
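    The linear-regression part of the solution, including the forecast just mentioned, can be cross-checked with a short NumPy sketch; the printed values should agree with Table 1 up to rounding.

```python
import numpy as np

# Retail food price index (x) and industrial production index (y) from Table 1.
x = np.array([100, 105, 108, 113, 118, 118, 110, 115, 119, 118, 120, 124, 129, 132], dtype=float)
y = np.array([70, 79, 85, 84, 85, 85, 96, 99, 100, 98, 99, 102, 105, 112], dtype=float)
n = len(x)

# Parameters of the paired linear regression y = a + b*x (least squares).
b = (np.mean(x * y) - x.mean() * y.mean()) / np.var(x)   # np.var is the population variance here
a = y.mean() - b * x.mean()

y_hat = a + b * x
r = np.corrcoef(x, y)[0, 1]          # linear pair correlation coefficient
r2 = r ** 2                          # coefficient of determination
A = np.mean(np.abs(y - y_hat) / y)   # average approximation error
m = 1                                # number of parameters for variables x
F = (r2 / (1 - r2)) * (n - m - 1) / m

print(f"a = {a:.2f}, b = {b:.3f}")        # expect roughly a ≈ -39.0, b ≈ 1.13
print(f"r = {r:.4f}, R^2 = {r2:.4f}")     # expect R^2 ≈ 0.746
print(f"A = {A:.2%}")                     # expect ≈ 5.0%
print(f"F_fact = {F:.1f}")
print(f"forecast at x = 138: {a + b * 138:.1f}")
```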

    2. The power regression equation has the form y = a·x^b.

    To determine its parameters, the power function is linearized by taking logarithms: lg y = lg a + b·lg x.

    The parameters of this logarithmic function are found from the system of normal equations obtained by the method of least squares:
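    A sketch of that system in the usual notation (base-10 logarithms, as in Table 2):

    \[
      \sum \lg y = n\,\lg a + b \sum \lg x, \qquad
      \sum (\lg x \cdot \lg y) = \lg a \sum \lg x + b \sum (\lg x)^2 .
    \]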

    Let us build a table of calculated data, as shown in Table 2.

    Table 2 Calculated data for assessing power regression

    No.   | x        | y        | lg x     | lg y     | lg x · lg y | (lg x)²  | (lg y)²
    1     | 100      | 70       | 2.000000 | 1.845098 | 3.690196    | 4.000000 | 3.404387
    2     | 105      | 79       | 2.021189 | 1.897627 | 3.835464    | 4.085206 | 3.600989
    3     | 108      | 85       | 2.033424 | 1.929419 | 3.923326    | 4.134812 | 3.722657
    4     | 113      | 84       | 2.053078 | 1.924279 | 3.950696    | 4.215131 | 3.702851
    5     | 118      | 85       | 2.071882 | 1.929419 | 3.997528    | 4.292695 | 3.722657
    6     | 118      | 85       | 2.071882 | 1.929419 | 3.997528    | 4.292695 | 3.722657
    7     | 110      | 96       | 2.041393 | 1.982271 | 4.046594    | 4.167284 | 3.929399
    8     | 115      | 99       | 2.060698 | 1.995635 | 4.112401    | 4.246476 | 3.982560
    9     | 119      | 100      | 2.075547 | 2.000000 | 4.151094    | 4.307895 | 4.000000
    10    | 118      | 98       | 2.071882 | 1.991226 | 4.125585    | 4.292695 | 3.964981
    11    | 120      | 99       | 2.079181 | 1.995635 | 4.149287    | 4.322995 | 3.982560
    12    | 124      | 102      | 2.093422 | 2.008600 | 4.204847    | 4.382414 | 4.034475
    13    | 129      | 105      | 2.110590 | 2.021189 | 4.265901    | 4.454589 | 4.085206
    14    | 132      | 112      | 2.120574 | 2.049218 | 4.345518    | 4.496834 | 4.199295
    Total | 1629     | 1299     | 28.90474 | 27.49904 | 56.79597    | 59.69172 | 54.05467
    Mean  | 116.3571 | 92.78571 | 2.064624 | 1.964217 | 4.056855    | 4.263694 | 3.861048
    σ     | 8.4988   | 11.1431  | 0.031945 | 0.053853 | —           | —        | —
    σ²    | 72.23    | 124.17   | 0.001021 | 0.0029   | —           | —        | —

    Continuation of table 2 Calculated data for assessing power regression

    No.   | x        | y        | ŷx       | (y − ŷx)² | |y − ŷx| / y | (y − ȳ)²
    1     | 100      | 70       | 74.16448 | 17.34292  | 0.059493     | 519.1886
    2     | 105      | 79       | 79.62057 | 0.385112  | 0.007855     | 190.0458
    3     | 108      | 85       | 82.95180 | 4.195133  | 0.024096     | 60.61728
    4     | 113      | 84       | 88.59768 | 21.13866  | 0.054734     | 77.1887
    5     | 118      | 85       | 94.35840 | 87.57961  | 0.110099     | 60.61728
    6     | 118      | 85       | 94.35840 | 87.57961  | 0.110099     | 60.61728
    7     | 110      | 96       | 85.19619 | 116.7223  | 0.11254      | 10.33166
    8     | 115      | 99       | 90.88834 | 65.79901  | 0.081936     | 38.6174
    9     | 119      | 100      | 95.52408 | 20.03384  | 0.044759     | 52.04598
    10    | 118      | 98       | 94.35840 | 13.26127  | 0.037159     | 27.18882
    11    | 120      | 99       | 96.69423 | 5.316563  | 0.023291     | 38.6174
    12    | 124      | 102      | 101.4191 | 0.337467  | 0.005695     | 84.90314
    13    | 129      | 105      | 107.4232 | 5.872099  | 0.023078     | 149.1889
    14    | 132      | 112      | 111.0772 | 0.85163   | 0.00824      | 369.1889
    Total | 1629     | 1299     | 1296.632 | 446.4152  | 0.703074     | 1738.357
    Mean  | 116.3571 | 92.78571 | —        | —         | —            | —
    σ     | 8.4988   | 11.1431  | —        | —         | —            | —
    σ²    | 72.23    | 124.17   | —        | —         | —            | —

    Solving the system of normal equations, we determine the parameters of the logarithmic function.

    We get a linear equation:

    Taking antilogarithms (potentiating), we get:

    Substituting the actual values ​​of x into this equation, we obtain the theoretical values ​​of the result. Based on them, we will calculate the indicators: the tightness of the connection - the correlation index and the average approximation error.

    The connection is quite close.

    On average, the calculated values ​​deviate from the actual ones by 5.02%.

    Thus, H0, the hypothesis about the random nature of the estimated characteristics, is rejected, and their statistical significance and reliability are recognized.

    The obtained estimates of the regression equation allow us to use it for forecasting. If the predicted value of the retail price index for food is x = 138, then the predicted value of the industrial production index will be:
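    A NumPy sketch of the same computation, using base-10 logarithms as in Table 2; the correlation index, average approximation error and forecast at x = 138 are computed from the fitted values.

```python
import numpy as np

x = np.array([100, 105, 108, 113, 118, 118, 110, 115, 119, 118, 120, 124, 129, 132], dtype=float)
y = np.array([70, 79, 85, 84, 85, 85, 96, 99, 100, 98, 99, 102, 105, 112], dtype=float)

# Linearize y = a * x**b by taking logarithms: lg y = lg a + b * lg x,
# then solve the ordinary least-squares problem in the logarithms.
lx, ly = np.log10(x), np.log10(y)
b = (np.mean(lx * ly) - lx.mean() * ly.mean()) / np.var(lx)
lg_a = ly.mean() - b * lx.mean()
a = 10 ** lg_a

y_hat = a * x ** b
index_R = np.sqrt(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))  # correlation index
A = np.mean(np.abs(y - y_hat) / y)                                             # average approximation error

print(f"y = {a:.4f} * x^{b:.4f}")
print(f"R = {index_R:.4f}, A = {A:.2%}, forecast at x = 138: {a * 138 ** b:.1f}")
```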

    3. The equilateral hyperbola has the form y = a + b/x. To determine the parameters of this equation, the following system of normal equations is used:

    Let us make the change of variable z = 1/x

    and we get the following system of normal equations:
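    After the substitution, the system takes the standard linear form in z:

    \[
      \sum y = n\,a + b\sum z, \qquad
      \sum yz = a\sum z + b\sum z^2, \qquad z = \tfrac{1}{x} .
    \]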

    Solving the system of normal equations, we determine the parameters of the hyperbola.

    Let's compose a table of calculated data, as shown in table 3.

    Table 3 Calculated data for assessing hyperbolic dependence

    No.   | x        | y        | z = 1/x     | yz       | z²        | y²
    1     | 100      | 70       | 0.010000000 | 0.700000 | 0.0001000 | 4900
    2     | 105      | 79       | 0.009523810 | 0.752381 | 0.0000907 | 6241
    3     | 108      | 85       | 0.009259259 | 0.787037 | 0.0000857 | 7225
    4     | 113      | 84       | 0.008849558 | 0.743363 | 0.0000783 | 7056
    5     | 118      | 85       | 0.008474576 | 0.720339 | 0.0000718 | 7225
    6     | 118      | 85       | 0.008474576 | 0.720339 | 0.0000718 | 7225
    7     | 110      | 96       | 0.009090909 | 0.872727 | 0.0000826 | 9216
    8     | 115      | 99       | 0.008695652 | 0.860870 | 0.0000756 | 9801
    9     | 119      | 100      | 0.008403361 | 0.840336 | 0.0000706 | 10000
    10    | 118      | 98       | 0.008474576 | 0.830508 | 0.0000718 | 9604
    11    | 120      | 99       | 0.008333333 | 0.825000 | 0.0000694 | 9801
    12    | 124      | 102      | 0.008064516 | 0.822581 | 0.0000650 | 10404
    13    | 129      | 105      | 0.007751938 | 0.813953 | 0.0000601 | 11025
    14    | 132      | 112      | 0.007575758 | 0.848485 | 0.0000574 | 12544
    Total | 1629     | 1299     | 0.120971823 | 11.13792 | 0.0010510 | 122267
    Mean  | 116.3571 | 92.78571 | 0.008640844 | 0.795566 | 0.0000751 | 8733.357
    σ     | 8.4988   | 11.1431  | 0.000640820 | —        | —         | —
    σ²    | 72.23    | 124.17   | 0.000000411 | —        | —         | —

    Continuation of table 3 Calculated data for the assessment of hyperbolic dependence
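    The remaining calculations for the hyperbola follow the same pattern as for the previous models; a NumPy sketch is given below, with the fitted coefficients, correlation index, average approximation error and the forecast at x = 138 printed rather than asserted here.

```python
import numpy as np

x = np.array([100, 105, 108, 113, 118, 118, 110, 115, 119, 118, 120, 124, 129, 132], dtype=float)
y = np.array([70, 79, 85, 84, 85, 85, 96, 99, 100, 98, 99, 102, 105, 112], dtype=float)

# Equilateral hyperbola y = a + b/x: substitute z = 1/x and fit a straight line in z.
z = 1.0 / x
b = (np.mean(z * y) - z.mean() * y.mean()) / np.var(z)
a = y.mean() - b * z.mean()

y_hat = a + b / x
index_R = np.sqrt(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))
A = np.mean(np.abs(y - y_hat) / y)

print(f"y = {a:.2f} + {b:.2f}/x")
print(f"R = {index_R:.4f}, A = {A:.2%}, forecast at x = 138: {a + b / 138:.1f}")
```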

    1. Which of the following measurements belong to the class of names of measuring scales:
    a) numbers encoding temperament;


    d) telephone numbers.

    2. Which of the following measurements are in the scale order class:

    b) academic rank as a measure of career advancement;
    c) metric distance measurement system;
    d) telephone numbers.

    3. Which of the following measurements belongs to the class of ratios of measuring scales:
    a) numbers encoding temperament;
    b) academic rank as a measure of career advancement;
    c) metric distance measurement system;
    d) telephone numbers.

    4. Which of the following features are quantitative types:

    b) family ties of family members;
    c) gender and age of the person;
    d) the social status of the depositor;
    e) the number of children in the family;
    f) retail trade turnover of trade enterprises.

    5. Which of the following features belong to qualitative species:
    a) the number of employees in the firm;
    b) family ties of family members;
    c) gender and age of a person;
    d) the social status of the depositor;
    e) the number of children in the family;
    f) retail trade turnover of trade enterprises.

    6. What scale is used to measure the level of human intelligence:
    a) names;
    b) ordinal;
    c) interval;
    d) relationships.

    7. The standard deviation is:
    a) the square of the range of the variation series;
    b) the square root of the variance;
    c) the square of the coefficient of variation;
    d) the square root of the value of the variation range.

    8. The coefficient of variation of the series is determined by the ratio:
    a) the standard deviation to the arithmetic mean of the series;
    b) variance to the median of the series;
    c) variance to the maximum value of the series;
    d) the absolute indicator of variation to the arithmetic mean of the series.

    9. The mode of this variation series

    x 10 15 35
    n 1 2 3

    this is:
    a) 20;
    b) 16;
    c) 3;
    d) 35.

    10. The arithmetic mean of the population is:
    a) the value of the feature in the middle of the variation series;
    b) half-difference of the maximum and minimum values ​​of the variation series;
    c) the half-sum of the maximum and minimum values ​​of the variation series;
    d) the ratio of the sum of all values ​​of the population to their total number.

    11. Known data on the length of service of seven shop assistants: 2; 3; 2; 5; 10; 7; 1 (years). Find the average value of their work experience.
    a) 4.3 years;
    b) 5 years;
    c) 3 years;
    d) 3.8 years.

    12. A distribution series is:
    a) sequence of sample data;
    b) the ordered arrangement of data on a quantitative basis;
    c) numerical sequence of data;
    d) a sequence of values, sorted by quality.

    13. The frequency of the variants of the variation series is called:
    a) the size of the sample;
    b) the value of the variants of the variation series;
    c) the number of individual variants or groups of the variation series;
    d) the number of groups of the variational series.

    14. The mode is:
    a) the maximum value of the attribute of the population;
    b) the most frequently occurring value of the attribute;
    c) the arithmetic mean of the population.

    15. Known data on the length of service of shop assistants: 2; 3; 2; 5; 10; 7; 1. Find the median of their work experience:
    a) 4.5 years;
    b) 4.3 years;
    c) 3 years;
    d) 5 years.

    16. The variation range of this variation series:
    x 10 15 20 30
    n 1 2 3 2

    this is:
    a) 15;
    b) 10;
    c) 30;
    d) 20.

    17. The ordered series is divided in half by:
    a) the mode;
    b) arithmetic mean;
    c) average harmonic;
    d) median.

    18. Statistical grouping is:
    a) consolidation or separation of data on essential grounds;
    b) scientific organization of statistical observation;
    c) types of reporting;
    d) direct collection of bulk data.

    19. The oscillation coefficient is:
    a) absolute indicator;
    b) the average;
    c) the relative indicator of variation.

    20. The variance of the variation series characterizes:
    a) the average value of individual characteristics;
    b) scattering of individual values ​​of attributes from the average value;
    c) standard deviation.

    21. The equation of the straight-line regression function reflects the dynamics of development:
    a) with variable acceleration;

    c) uniform;
    d) uniformly accelerated.

    22. If the value of the correlation coefficient is 0.6, then on the Chaddock scale:
    a) there is practically no connection;
    b) the connection is weak;
    c) the connection is moderate;
    d) the connection is strong.

    23. The data represent the estimates of adults in the Stanford-Binet IQ test 104, 87, 101, 130, 148, 92, 97, 105, 134, 121. Find the range of variation:
    a) 61;
    b) 60;
    c) 75.

    24. Find the weighted arithmetic mean for the following interval series:

    li ni
    10-14 1
    15-19 1
    20-24 4
    25-29 2
    30-34 4

    a) 24;
    b) 24.92;
    c) 25.38.

    25. Calculate the median of the following series: 2.1; 1.5; 1.6; 2.1; 2.4:
    a) 2;
    b) 1.5;
    c) 2.1.

    26. Calculate the mode of the following interval series

    interval  | 5-7 | 8-10 | 11-13 | 14-16
    frequency | 4   | 7    | 26    | 41

    a) 14;
    b) 14.54;
    c) 15.23;

    27. Which of the following measurements belong to the class of names of measuring scales:
    a) the diagnosis of the patient;
    b) license plates;
    c) the hardness of the mineral;
    d) calendar time;
    e) the weight of the person.

    28. Which of the following measurements belong to the class of ordinal measuring scales:
    a) the diagnosis of the patient;
    b) license plates;
    c) the hardness of the mineral;
    d) calendar time;
    e) the weight of the person.

    29. Which of the following measurements belong to the class of interval measuring scales:
    a) the diagnosis of the patient;
    b) license plates;
    c) the hardness of the mineral;
    d) calendar time;
    e) the weight of the person.
    30. Which of the following measurements belong to the class of ratios of measuring scales:
    a) the diagnosis of the patient;
    b) license plates;
    c) the hardness of the mineral;
    d) calendar time;
    e) the weight of the person.

    31. What scale is used when measuring time:
    a) interval;
    b) relationships;
    c) Chaddock.

    32. The following characteristics are referred to quantitative types:
    a) human height;
    b) awards for merit;
    c) eye color;
    d) car numbers.

    33. Qualitative types include the following characteristics:
    a) human height;
    b) awards for merit;
    c) eye color;
    d) car numbers

    34. Calculate the mode

    xi 5 8 10 13 14
    ni 7 4 5 9 1

    a) 10;
    b) 11;
    c) 13

    35. In classes with a large number of students, progress in acquiring knowledge over a quarter is lower than in small classes. What is the effective (resultant) trait?
    a) the number of students in the class;
    b) success in acquiring knowledge,
    c) the number of students with success in acquiring knowledge.

    36. The length of an interval in an interval series is:
    a) the range of variation divided by the arithmetic mean;
    b) the range of variation divided by the number of groups;
    c) variance divided by sample size.

    37. An example of paired correlation: Students who learn to read earlier than others tend to perform better. Which of these signs: ability to read early or high academic performance of a student is a factor?
    a) the ability to read early;
    b) high academic performance;
    c) none of them.

    38. Which of the following methods can be used when comparing the means of three or more samples:
    a) Student's test;
    b) Fisher's test;
    c) analysis of variance.

    39. Sample size of the variation series

    xi 10 15 20 30
    ni 1 2 3 2

    a) 5;
    b) 8;
    c) 12;
    d) 30.

    40. The mode of the variation series

    xi 10 15 20 25
    ni 1 5 4 3

    a) 15;
    b) 5;
    c) 23;
    d) 3.

    41. The equation of the parabolic regression function reflects the dynamics of development:
    a) with variable acceleration;
    b) with a slowdown in growth at the end of the period;
    c) uniform;
    d) uniformly accelerated.

    42. The regression coefficient B shows:
    a) the expected value of the dependent variable at a zero value of the predictor
    b) the expected value of the dependent variable when the predictor changes by one
    c) the probability of a regression error
    d) this issue has not yet been finally resolved

    43. Sample is:
    a) the whole set of objects about which the researcher's reasoning is built;
    b) many objects available for empirical research;
    c) all possible variance values;
    d) the same as randomization.

    44. Which of the following correlation coefficients demonstrates the greatest relationship of variables:
    a) -0.90;
    b) 0;
    c) 0.07;
    d) 0.01.

    45. The general population is:
    a) the whole set of objects about which the researcher's reasoning is built;
    b) many objects available for empirical research;
    c) all possible values ​​of the mathematical expectation;
    d) normal distribution.

    46. ​​How do the sample and population sizes compare?
    a) the sample is usually much smaller than the general population;
    b) the general population is always smaller than the sample;
    c) the sample and the general population almost always coincide;
    d) there is no correct answer.

    47. The point-biserial correlation coefficient is a special case of the correlation coefficient:
    a) Spearman;
    b) Pearson;
    c) Kendall;
    d) all answers are correct.

    48. At what minimum significance level is it customary to reject the null hypothesis?
    a) 5% level
    b) 7% level
    c) 9% level
    d) 10% level

    49. Which of the following methods is usually used when comparing means in two normal samples:
    a) Student's test;
    b) Fisher's test;
    c) univariate analysis of variance;
    d) correlation analysis.

    50. With the help of what are statistical hypotheses tested:
    a) statistics;
    b) parameters;
    c) experiments;
    d) observations.

    51. Which of the following values ​​of the correlation coefficient is impossible:
    a) -0.54;
    b) 2.18;
    c) 0;
    d) 1.

    52. What transformation needs to be done when comparing two correlation coefficients:
    a) Student;
    b) Fisher;
    c) Pearson;
    d) Spearman.

    53. What is the median of distribution:
    a) the same as the bisector;
    b) the same as the mode;
    c) arithmetic mean;
    d) 50% distribution quantile;
    e) there is no correct answer.

    54. The point-biserial correlation coefficient is a special case of the correlation coefficient:
    a) Spearman;
    b) Pearson;
    c) Kendall;
    d) all answers are correct.

    55. Which of the following variables is discrete:
    a) type of temperament;
    b) the level of intelligence;
    c) reaction time;
    d) all answers are correct.

    56. In what range can the correlation coefficient change:
    a) from –1 to 1;
    b) from 0 to 1;
    c) from 0 to 100;
    d) in any.

    57. About what are statistical hypotheses put forward:
    a) concepts;
    b) statistics;
    c) samples;
    d) parameters.

    58. What is the name of the nonparametric analogue of analysis of variance:
    a) Student's test;
    b) the Kruskal-Wallis method;
    c) Wilcoxon test;
    d) Mann-Whitney test.

    59. The concept of the correlation coefficient was first developed in the works:
    a) Fisher;
    b) Student;
    c) Pearson;
    d) Spearman.

    60. Which of the following statistics is an unbiased estimate of the mathematical expectation:
    a) arithmetic mean;
    b) the mode;
    c) median;
    d) all answers are correct.

    61. How the correlation coefficients of Pearson and Spearman are related:
    a) Pearson's coefficient is a special case of Spearman;
    b) Spearman's coefficient is a special case of Pearson;
    c) these coefficients have different construction logic;
    d) they are one and the same.

    62. According to the theoretical assumptions of analysis of variance, the F-ratio cannot be:
    a) is equal to 1;
    b) more than 1;
    c) less than 1;
    d) there is no correct answer.

    Purpose of the service. Using this online calculator, you can find the parameters of a nonlinear regression equation (exponential y = a·e^(bx), power, equilateral hyperbola, logarithmic, exponential y = a·b^x) (see example).

    Instructions. Specify the number of source data points. The resulting solution is saved in a Word file. A solution template in Excel is also generated automatically. Note: if you need to determine the parameters of the parabolic dependence (y = ax² + bx + c), you can use the Analytical alignment service.
    It is possible to restrict the set to homogeneous units by eliminating anomalous observations using the Irwin method or the three-sigma rule (eliminating those units for which the value of the explanatory factor deviates from the mean by more than three standard deviations).

    Types of nonlinear regression

    Here ε is a random error (deviation, perturbation) reflecting the influence of all unaccounted for factors.

    A first-order regression equation is the paired linear regression equation.

    A second-order regression equation is a second-order polynomial regression equation: y = a + bx + cx².

    A third-order regression equation is, correspondingly, a third-order polynomial regression equation: y = a + bx + cx² + dx³.

    To bring nonlinear dependencies to linear, linearization methods are used (see the alignment method):

    1. Change of variables.
    2. Logarithm of both sides of the equation.
    3. Combined.
    y = f(x)               | Transformation            | Linearization method
    y = b·x^a              | Y = ln(y); X = ln(x)      | Logarithm
    y = b·e^(ax)           | Y = ln(y); X = x          | Combined
    y = 1/(ax + b)         | Y = 1/y; X = x            | Change of variables
    y = x/(ax + b)         | Y = x/y; X = x            | Change of variables. Example
    y = a·ln(x) + b        | Y = y; X = ln(x)          | Combined
    y = a + bx + cx²       | x1 = x; x2 = x²           | Change of variables
    y = a + bx + cx² + dx³ | x1 = x; x2 = x²; x3 = x³  | Change of variables
    y = a + b/x            | x1 = 1/x                  | Change of variables
    y = a + b·sqrt(x)      | x1 = sqrt(x)              | Change of variables
    An example. According to the data taken from the corresponding table, proceed as follows:
    1. Construct a correlation field and formulate a hypothesis about the form of communication.
    2. Calculate the parameters of linear, power, exponential, semi-logarithmic, inverse, hyperbolic pair regression equations.
    3. Assess the closeness of the relationship using the indicators of correlation and determination.
    4. Using the average (general) coefficient of elasticity, give a comparative assessment of the strength of the relationship between the factor and the result.
    5. Evaluate the quality of the equations using the average approximation error.
    6. Evaluate the statistical reliability of the results of regression modeling using Fisher's F-test. Based on the values of the characteristics calculated in items 4, 5 and in this item, choose the best regression equation and justify the choice.
    7. Calculate the predicted value of the result if the predicted value of the factor increases by 15% from its average level. Determine the confidence interval of the forecast for the significance level α = 0.05.
    8. Assess the results obtained, draw up conclusions in an analytical note.
    Year | Actual final household consumption (in current prices), billion rubles (1995: trillion rubles), y | Average per capita monetary income (per month), rubles (1995: thousand rubles), x
    1995 | 872   | 515.9
    2000 | 3813  | 2281.1
    2001 | 5014  | 3062
    2002 | 6400  | 3947.2
    2003 | 7708  | 5170.4
    2004 | 9848  | 6410.3
    2005 | 12455 | 8111.9
    2006 | 15284 | 10196
    2007 | 18928 | 12602.7
    2008 | 23695 | 14940.6
    2009 | 25151 | 16856.9

    Solution. In the calculator, we successively select the types of nonlinear regression. We obtain results of the following form.
    The exponential regression equation is y = a·e^(bx).
    After linearization we get: ln(y) = ln(a) + bx.
    Empirical regression coefficients: b = 0.000162, a = 7.8132.
    Regression equation: y = e^7.81321500 · e^(0.000162x) = 2473.06858 · e^(0.000162x).

    The power regression equation is y = a·x^b.
    After linearization we get: ln(y) = ln(a) + b·ln(x).
    Empirical regression coefficients: b = 0.9626, a = 0.7714.
    Regression equation: y = e^0.77143204 · x^0.9626 = 2.16286 · x^0.9626.

    The hyperbolic regression equation has the form y = b/x + a + ε.
    After the change of variable z = 1/x we get the linear form y = b·z + a.
    Empirical regression coefficients: b = 21089190.1984, a = 4585.5706.
    Empirical regression equation: y = 21089190.1984/x + 4585.5706.

    The logarithmic regression equation is y = b·ln(x) + a + ε.
    Empirical regression coefficients: b = 7142.4505, a = -49694.9535.
    Regression equation: y = 7142.4505·ln(x) − 49694.9535.

    The exponential regression equation has the form y = a·b^x + ε.
    After linearization we get: ln(y) = ln(a) + x·ln(b).
    Empirical regression coefficients: b = 0.000162, a = 7.8132.
    y = e^7.8132 · e^(0.000162x) = 2473.06858 · 1.00016^x.

    x       | y     | 1/x      | ln(x) | ln(y)
    515.9   | 872   | 0.00194  | 6.25  | 6.77
    2281.1  | 3813  | 0.000438 | 7.73  | 8.25
    3062    | 5014  | 0.000327 | 8.03  | 8.52
    3947.2  | 6400  | 0.000253 | 8.28  | 8.76
    5170.4  | 7708  | 0.000193 | 8.55  | 8.95
    6410.3  | 9848  | 0.000156 | 8.77  | 9.2
    8111.9  | 12455 | 0.000123 | 9     | 9.43
    10196   | 15284 | 9.8E-5   | 9.23  | 9.63
    12602.7 | 18928 | 7.9E-5   | 9.44  | 9.85
    14940.6 | 23695 | 6.7E-5   | 9.61  | 10.07
    16856.9 | 25151 | 5.9E-5   | 9.73  | 10.13
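    The calculator output above can be reproduced with a short NumPy sketch based on the same linearizations; the printed coefficients should match the values quoted above up to rounding.

```python
import numpy as np

# Average per capita income (x) and household final consumption (y) from the table above.
x = np.array([515.9, 2281.1, 3062, 3947.2, 5170.4, 6410.3, 8111.9, 10196, 12602.7, 14940.6, 16856.9])
y = np.array([872, 3813, 5014, 6400, 7708, 9848, 12455, 15284, 18928, 23695, 25151], dtype=float)

def linfit(u, v):
    """Least-squares slope and intercept for v = slope*u + intercept."""
    slope = (np.mean(u * v) - u.mean() * v.mean()) / np.var(u)
    return slope, v.mean() - slope * u.mean()

# Exponential y = a*e^(b*x): linearized as ln y = ln a + b*x
b_exp, lna_exp = linfit(x, np.log(y))
# Power y = a*x^b: linearized as ln y = ln a + b*ln x
b_pow, lna_pow = linfit(np.log(x), np.log(y))
# Logarithmic y = b*ln x + a
b_log, a_log = linfit(np.log(x), y)
# Hyperbolic y = b/x + a
b_hyp, a_hyp = linfit(1.0 / x, y)

print(f"exponential: y = {np.exp(lna_exp):.5f} * exp({b_exp:.6f} x)")
print(f"power:       y = {np.exp(lna_pow):.5f} * x^{b_pow:.4f}")
print(f"logarithmic: y = {b_log:.4f} ln(x) + {a_log:.4f}")
print(f"hyperbolic:  y = {b_hyp:.4f}/x + {a_hyp:.4f}")
```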

    Another type of one-factor regression is approximation by power polynomials of the form y = b0 + b1·x + b2·x² + … + bk·x^k.   (5.5.1)

    Naturally, there is a desire to obtain as simple a dependence as possible, so we restrict ourselves to power polynomials of the second degree, i.e. the parabolic dependence:

    y = b0 + b1·x + b2·x²   (5.5.2)

    Let us calculate the partial derivatives with respect to the coefficients b 0 , b 1 and b 2 :



    (5.5.3)

    Equating the derivatives to zero, we obtain the normal system of equations:

    (5.5.4)
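    In the standard least-squares form, the normal system for the parabola (what (5.5.4) denotes) is:

    \[
      \begin{aligned}
      \sum y_i &= n b_0 + b_1 \sum x_i + b_2 \sum x_i^2,\\
      \sum x_i y_i &= b_0 \sum x_i + b_1 \sum x_i^2 + b_2 \sum x_i^3,\\
      \sum x_i^2 y_i &= b_0 \sum x_i^2 + b_1 \sum x_i^3 + b_2 \sum x_i^4 .
      \end{aligned}
    \]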

    Solving the system of normal equations (5.5.4) for the specific values xᵢ*, yᵢ*,
    we obtain the optimal values of b0, b1 and b2. For approximation by dependence (5.5.2), and even more so by (5.5.1), simple closed-form expressions for the coefficients have not been obtained; as a rule, they are calculated by standard procedures in matrix form:

    (5.5.5)
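    A sketch of the matrix form meant by (5.5.5), under the usual least-squares assumptions (X is the design matrix with rows (1, xᵢ, xᵢ²)):

    \[
      \mathbf{b} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} \mathbf{y} .
    \]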

    Figure 5.5.1 shows a typical example of approximation by a parabolic dependence:

    Figure 5.5.1. Coordinates of the experimental points and the parabolic dependence approximating them (the plotted points range from (1; 1) to (5; 9)).

    Example 5.1. Approximate the experimental results given in Table 5.1.1 with a linear regression equation.

    Table 5.1.1

    Let's construct the experimental points according to the coordinates indicated in Table 5.1.1 on the graph shown in Figure 5.1.1.

    Figure 5.1.1. Experimental points plotted from the coordinates in Table 5.1.1.

    From Fig. 5.1.1, on which we draw a straight line for a preliminary assessment, we conclude that there is a noticeable nonlinearity in the arrangement of the experimental points, but it is not very significant, and therefore it makes sense to approximate them with a linear dependence. Note that in order to obtain a correct mathematical conclusion, the straight line must be constructed using the least squares method.

    Before carrying out the regression analysis, it is advisable to calculate the linear correlation coefficient between the variables x and y:

    The significance of the correlation is determined by the critical value of the linear correlation coefficient, calculated by the formula:

    The critical value of Student's criterion t_crit is found from statistical tables for the recommended significance level α = 0.05 and n − 2 degrees of freedom. If the calculated value r_xy is not less than the critical value r_crit, then the correlation between the variables x and y is considered real. Let us perform the calculations:










    Since the calculated value is not less than the critical one,
    we conclude that the correlation between the variables x and y is significant and may be linear.

    Let's calculate the coefficients of the regression equation:

    Thus, we got a linear regression equation:

    Let us draw a straight line according to the regression equation in Figure 5.1.2.

    Figure 5.1.2. Coordinates of the experimental points and the linear dependence approximating them (the fitted line passes through (0; −0.2) and (5; 9.8)).

    Using the regression equation, we calculate the values ​​of the function by the experimental points of Table 5.1.1 and the difference between the experimental and calculated values ​​of the function, which are presented in Table 5.1.2.

    Table 5.1.2


    Let's calculate the mean square error and its ratio to the mean value:

    The ratio of the standard error to the mean gives an unsatisfactory result, since it exceeds the recommended value of 0.05.

    Let us assess the significance level of the coefficients of the regression equation according to the Student's criterion:


    From the statistical table for 3 degrees of freedom, we write out the rows with the significance levels and the corresponding values of Student's criterion t into Table 5.1.3.

    Table 5.1.3

    Significance level of the coefficients of the regression equation:


    Note that, in terms of the significance level, a satisfactory result was obtained for the slope coefficient, while for the free term the result was unsatisfactory.

    Let us assess the quality of the obtained regression equation by indicators calculated on the basis of analysis of variance:

    Examination:

    The result of the check is positive, which indicates the correctness of the calculations.

    Let's calculate the Fisher criterion:

    with two degrees of freedom:

    According to the statistical tables, we find the critical values ​​of the Fisher criterion for the two recommended gradations of the significance level:


    Since the calculated value of the Fisher criterion exceeds the critical value for the significance level of 0.01, we will assume that the significance level according to the Fisher criterion is less than 0.01, which will be considered satisfactory.

    Let's calculate the coefficient of multiple determination:

    for two degrees of freedom

    According to the statistical table for the recommended significance level of 0.05 and the two degrees of freedom found, we find the critical value of the multiple determination coefficient:

    Since the calculated value of the multiple determination coefficient exceeds the critical value for the significance level of 0.05, the significance level according to the multiple determination coefficient is below 0.05, and the result obtained for this indicator will be considered satisfactory.

    Thus, the ratio of the standard error to the mean value and the significance level according to Student's criterion are unsatisfactory, so it is advisable to choose a different approximating dependence.
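    Table 5.1.1 itself is not reproduced above, so the sketch below uses hypothetical points consistent with Figure 5.1.1 (x = 1…5, y ranging roughly from 1 to 9), purely to illustrate the workflow of Example 5.1: fit a line, compute r, the standard-error-to-mean ratio, the t-statistics of the coefficients and the F-criterion.

```python
import numpy as np
from scipy import stats

# Hypothetical data consistent with Figure 5.1.1 (the original Table 5.1.1 is not reproduced here).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.5, 4.0, 7.5, 9.0])
n = len(x)

res = stats.linregress(x, y)                  # least-squares line y = intercept + slope*x
y_hat = res.intercept + res.slope * x
resid = y - y_hat

s = np.sqrt(np.sum(resid**2) / (n - 2))       # standard error of the approximation
t_slope = res.slope / res.stderr              # Student's t for the slope
t_intercept = res.intercept / res.intercept_stderr
F = (res.rvalue**2 / (1 - res.rvalue**2)) * (n - 2)

print(f"y = {res.intercept:.3f} + {res.slope:.3f} x, r = {res.rvalue:.3f}")
print(f"s / mean(y) = {s / y.mean():.3f}  (compare with the recommended 0.05)")
print(f"t_slope = {t_slope:.2f}, t_intercept = {t_intercept:.2f}, F = {F:.1f}")
```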

    Example 5.2. Approximation of the experimental distribution of random numbers by a linear dependence without a free term

    The experimental distribution of random numbers given in Table 5.1.1, when approximated by a linear dependence, did not lead to a satisfactory result, in particular because of the insignificance of the free-term coefficient of the regression equation; therefore, to improve the quality of the approximation, we will try a linear dependence without a free term:

    Let's calculate the value of the coefficient of the regression equation:
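    For a line through the origin, y = b·x, the least-squares coefficient has the standard closed form:

    \[
      b = \frac{\sum x_i y_i}{\sum x_i^2} .
    \]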

    Thus, we got the regression equation:

    Using the obtained regression equation, we calculate the values ​​of the function and the difference between the experimental and calculated values ​​of the function, which we present in the form of table 5.2.1.

    Table 5.2.1

    xᵢ

    We draw the straight line of the regression equation in Figure 5.2.1.

    Fig. 5.2.1. Coordinates of the experimental points and the linear dependence approximating them (the fitted line passes through (0; 0) and (5; 9.73)).

    To assess the quality of the approximation, we will calculate the quality indicators in the same way as in Example 5.1.

    (remains the same);

    with 4 degrees of freedom;

    for

    Based on the results of the approximation, we note that the significance level of the regression coefficient is now satisfactory; the ratio of the standard error to the mean has improved, but is still above the recommended value of 0.05, so it is advisable to repeat the approximation with a more complex mathematical dependence.

    Example 5.3. To improve on the quality of approximation of Examples 5.1 and 5.2, we carry out a nonlinear approximation by the dependence
    ... To do this, we first perform intermediate calculations and place their results in Table 5.3.1.

    The values

    Table 5.3.1

    X² | (ln X)² | ln X · ln Y

    Additionally, we calculate:

    Let us make an approximation by the dependence
    ... Using formulas (5.3.7) and (5.3.8), we calculate the coefficients b0 and b1:

    Using formulas (5.3.11), we calculate the coefficients A0 and A1:


    To calculate the standard error, intermediate calculations were performed, presented in Table 5.3.2.

    Table 5.3.2 (columns Yᵢ and yᵢ; total: 7.5968)

    The standard error of approximation turned out to be much larger than in the two previous examples, therefore, the approximation results are considered unsuitable.

    Example 5.4. Let us try to approximate with one more nonlinear dependence
    ... Using formulas (5.3.9) and (5.3.10) and the data of Table 5.3.1, we calculate the coefficients b0 and b1:

    We got an intermediate dependence:

    Using formulas (5.3.13), we calculate the coefficients C0 and C1:


    We got the final dependency:

    To calculate the standard error, we will carry out intermediate calculations and place them in Table 5.4.1.

    Table 5.4.1 (columns Yᵢ and yᵢ; total: 21.83152)

    Let's calculate the standard error:

    The standard error of approximation turned out to be much larger than in the previous example, therefore, the approximation results are considered unsuitable.

    Example 5.5. Approximation of the experimental distribution of random numbers by the mathematical dependence y = b·ln x

    The initial data, as in the previous examples, are shown in table 5.4.1 and figure 5.4.1.

    Table 5.4.1

    Based on the analysis of Fig. 5.4.1 and Table 5.4.1, we note that for smaller values of the argument (at the beginning of the table) the function changes more than for large values (at the end of the table); therefore, it seems appropriate to change the scale of the argument by introducing its logarithm into the regression equation, and to carry out the approximation with the following mathematical dependence:

    ... Using formula (5.4.3), we calculate the coefficient b:

    To assess the quality of the approximation, we will carry out intermediate calculations presented in Table 5.4.2, according to which we calculate the magnitude of the error and the ratio of the standard error to the mean value.

    Table 5.4.2


    Since the recommended value of 0.05 has been exceeded by the ratio of the standard error to the mean, the result will be considered unsatisfactory. In particular, note that the largest deviation occurs at x = 1, since at this value ln x = 0. Therefore, we will carry out an approximation by the dependence y = b0 + b1·ln x.

    Auxiliary calculations are presented in the form of table 5.4.3.

    Table 5.4.3

    Using formulas (5.4.6) and (5.4.7), we calculate the coefficients b0 and b1:

    Figure: coordinates of the experimental points and the approximating dependence y = b0 + b1·ln x (the fitted curve passes approximately through (1; 0.93) and (5; 9.12)).

    To assess the quality of the approximation, we will carry out auxiliary calculations and determine the significance level of the found coefficients and the ratio of the standard error to the mean value.

    The significance level is slightly above the recommended value of 0.05.


    In view of the fact that for the main indicator, the ratio of the standard error to the mean, an almost two-fold excess of the recommended level of 0.05 was obtained, the results will be considered acceptable. Note that the calculated value of Student's criterion t_b0 = 2.922 differs from the critical value
    by a relatively small amount.

    Example 5.6. Let us approximate the experimental data of Example 5.1 with a hyperbolic dependence
    ... To calculate the coefficients b0 and b1, we carry out the preliminary calculations shown in Table 5.6.1.

    Table 5.6.1

    Columns: Xᵢ | xᵢ = 1/Xᵢ | xᵢ² | xᵢ·yᵢ

    Based on the results of Table 5.6.1, using formulas (5.4.8) and (5.4.9), we calculate the coefficients b0 and b1:

    Thus, we have obtained the hyperbolic regression equation.

    The results of auxiliary calculations to assess the quality of the approximation are shown in Table 5.6.2.

    Table 5.6.2

    Xᵢ

    Based on the results of Table 5.6.2, we calculate the standard error and the ratio of the standard error to the mean:


    In view of the fact that the ratio of the standard error to the mean value exceeds the recommended value of 0.05, we conclude that the approximation results are unsuitable.
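    The recurring acceptance criterion in Examples 5.2–5.6 is the ratio of the standard error of approximation to the mean of y, compared with 0.05. A sketch of how that comparison can be automated for several candidate dependences is given below; the data are hypothetical, since Table 5.1.1 is not reproduced here.

```python
import numpy as np

# Hypothetical data (the original Table 5.1.1 is not reproduced); used only to
# illustrate the model-selection criterion of Examples 5.2-5.6: the ratio of the
# standard error of approximation to the mean of y, compared with 0.05.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.5, 4.0, 7.5, 9.0])

def se_ratio(y, y_hat, n_params):
    s = np.sqrt(np.sum((y - y_hat) ** 2) / (len(y) - n_params))
    return s / y.mean()

# Candidate dependences fitted by least squares (via linearization where needed).
b_prop = np.sum(x * y) / np.sum(x ** 2)                 # y = b*x
c_log = np.polyfit(np.log(x), y, 1)                     # y = b0 + b1*ln(x)
c_hyp = np.polyfit(1.0 / x, y, 1)                       # y = b0 + b1/x

models = {
    "y = b*x":           (b_prop * x, 1),
    "y = b0 + b1*ln(x)": (np.polyval(c_log, np.log(x)), 2),
    "y = b0 + b1/x":     (np.polyval(c_hyp, 1.0 / x), 2),
}
for name, (y_hat, k) in models.items():
    print(f"{name:20s} s/mean = {se_ratio(y, y_hat, k):.3f}")
```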

    Example 5.7.

    To calculate specific values of income from the operation of jib cranes as a function of maintenance time, it is required to obtain a parabolic dependence.

    Let us calculate the coefficients of this dependence, b0, b1 and b11, in matrix form according to the formula:
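    A sketch of that matrix computation in NumPy, using hypothetical maintenance-time/income data only to show the mechanics (the actual values of Table 5.7.1 are not reproduced here):

```python
import numpy as np

# Hypothetical maintenance time t (hours) and specific income y (rubles); for illustration only.
t = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
y = np.array([120.0, 180.0, 210.0, 225.0, 220.0, 200.0])

# Design matrix for the parabola y = b0 + b1*t + b11*t^2.
X = np.column_stack([np.ones_like(t), t, t ** 2])

# Least-squares coefficients in matrix form: b = (X^T X)^(-1) X^T y.
b = np.linalg.solve(X.T @ X, X.T @ y)
b0, b1, b11 = b
print(f"y = {b0:.3f} + {b1:.3f} t + {b11:.4f} t^2")
```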

    Nonlinear regression equations connecting the effective indicator with the optimal values of the maintenance work of tower cranes were obtained using the multiple regression procedure of the Statistica 6.0 application package. Below we present the results of the regression analysis for the effective performance indicator according to Table 5.7.1.

    Table 5.7.1

    Table 5.7.2 shows the non-linear regression results for the effective performance indicator and Table 5.7.3 shows the results of the residual analysis.

    Table 5.7.2

    Table 5.7.3

    Fig. 3.7.36. Residual analysis.

    Thus, we have obtained the multiple regression equation for the variable:

    Standard error to mean ratio:

    14780 / 1017890 = 0.0145 < 0.05.

    Since the ratio of the standard error to the mean value does not exceed the recommended value of 0.05, the approximation results can be considered acceptable. As a drawback, it should be noted that in Table 5.7.2 the significance levels of all the calculated coefficients exceed the recommended value of 0.05.

    Linear regression

    The linear regression equation is an equation of a straight line that approximates (approximately describes) the relationship between random variables X and Y.

    Consider a two-dimensional random variable (X, Y), where X and Y are dependent random variables. We represent one of the quantities as a function of the other, restricting ourselves to an approximate representation of Y in the form of a linear function of X:

    where a and b are the parameters to be determined. This can be done in various ways, the most common of which is the method of least squares. The function g(x) is called the root-mean-square regression of Y on X.

    where F is the total squared deviation (the sum of the squared deviations).

    Let's choose a and b so that the sum of the squares of the deviations is minimal. In order to find the coefficients a and b at which F reaches its minimum value, we equate the partial derivatives to zero:

    Find a and b. After performing elementary transformations, we obtain a system of two linear equations for a and b:

    where n is the sample size.
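    One standard way to write such a system for the line y = ax + b (the coefficient notation A, B, C, D used below may group these sums differently) is:

    \[
      a\sum x_i^2 + b\sum x_i = \sum x_i y_i, \qquad
      a\sum x_i + b\,n = \sum y_i .
    \]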

    In our case, A = 3888; B = 549; C = 8224; D = 1182; N = 100.

    Solving this linear system for a and b, we obtain a stationary point at a = 1.9884, b = 0.8981.

    Therefore, the equation will take the form:

    y = 1.9884x + 0.8981


    Fig. 10

    Parabolic regression

    Let us find, from the observational data, the sample equation of the root-mean-square regression curve (parabolic in our case). We use the method of least squares to determine p, q and r.

    We restrict ourselves to representing the quantity Y as a parabolic function of the quantity X:

    where p, q, and r are parameters to be determined. This can be done using the least squares method.

    Let us choose the parameters p, q and r so that the sum of the squares of the deviations is minimal. Since each deviation depends on the sought parameters, then the sum of the squares of the deviations is a function F of these parameters:

    To find the minimum, we equate the corresponding partial derivatives to zero:

    Find p, q and r. After performing elementary transformations, we obtain a system of three linear equations for p, q and r:

    Solving this system by the inverse matrix method, we get: p = -0.0085; q = 2.0761; r = 0.7462.

    Therefore, the parabolic regression equation will take the form:

    y = -0.0085x² + 2.0761x + 0.7462
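    As an illustration, a parabola of this kind can be fitted with NumPy's polyfit; the data below are hypothetical and only the workflow mirrors this section (fit a parabola, fit a line, and compare them on one plot).

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical (x, y) observations; only the workflow mirrors the section above.
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 100)
y = -0.0085 * x**2 + 2.0761 * x + 0.7462 + rng.normal(0, 5, x.size)

lin = np.polyfit(x, y, 1)      # linear regression coefficients
par = np.polyfit(x, y, 2)      # parabolic regression coefficients (p, q, r)
print("linear:   y = %.4fx + %.4f" % tuple(lin))
print("parabola: y = %.4fx^2 + %.4fx + %.4f" % tuple(par))

xs = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y, s=10, alpha=0.5)
plt.plot(xs, np.polyval(lin, xs), "r", label="linear")
plt.plot(xs, np.polyval(par, xs), "b", label="parabolic")
plt.legend()
plt.show()
```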

    Let us plot the parabolic regression graph. For ease of comparison, the regression curve is drawn against the background of the scatter plot (see Figure 13).


    Fig. 13

    Now let's draw the lines of linear regression and parabolic regression on one diagram, for visual comparison (see Figure 14).


    Fig. 14

    Linear regression is shown in red and parabolic regression in blue. The diagram shows that the difference in this case is greater than when comparing two linear regression lines. Further research is required to determine which regression better expresses the relationship between x and y, that is, what type of relationship exists between x and y.