MIL-HDBK-17-1F Volume 1, Chapter 8 Statistical Methods

CHAPTER 8 STATISTICAL METHODS

8.1 INTRODUCTION

Variability in composite material property data may result from a number of sources including run-to-run variability in fabrication, batch-to-batch variability of raw materials, testing variability, and variability intrinsic to the material. It is important to acknowledge this variability when designing with composites and to incorporate it in design values of material properties. Procedures for calculating statistically-based material properties are provided in this chapter. With a properly designed test program (Chapter 2), these statistical procedures can account for some, but not all, of these sources for variability. A fundamental assumption is that one is measuring the desired properties. If this is not the case, then no statistical procedure is sufficient to account for other technical inadequacies.

Section 8.2 provides introductory material and guidance for the methods used in the remainder of the chapter. Readers unfamiliar with the statistical methods in the chapter should read Section 8.2 before the remainder of the chapter; more experienced readers may find it useful as a reference. Section 8.3 provides methods for evaluating data and calculating statistically-based properties. Section 8.4 contains other statistical methods, including methods for confidence intervals for a coefficient of variation, stress-strain curves, quality control, and alternate material evaluation. Section 8.5 contains statistical tables and approximate formulas.

8.1.1 Overview of methods for calculating statistically-based properties

Section 8.3 describes computational methods for obtaining A- and B-basis values from composite material data. Different approaches are used depending on whether the data can be grouped in a natural way (for example, because of batches or differences in environmental conditions). Data sets which either cannot be grouped, or for which there are negligible differences among such groups, are called unstructured. Otherwise, the data are said to be structured. The statistical methods in Section 8.3.2, which examine if the differences among groups of data are negligible, are useful for determining whether the data should be treated as structured or unstructured. Unstructured data are modeled using a Weibull, normal, or lognormal distribution, using the methods in Section 8.3.4. If none of these are acceptable, nonparametric basis values are determined. Structured data are modeled using linear statistical models, including regression and the analysis of variance (ANOVA), using the methods in Section 8.3.5.

8.1.2 Computer software

Non-proprietary computer software useful for analyzing material property data is available. STAT17, available from the MIL-HDBK-17 Secretariat upon request (see page ii), performs the calculations in the flowchart in Figure 8.3.1 with the exception of linear regression. RECIPE (REgression Confidence Intervals on PErcentiles), available from the National Institute of Standards and Technology, performs calculations that find material basis values from linear models including regression and analysis of variance. RECIPE can be obtained by anonymous ftp from 'ftp.nist.gov', directory 'recipe'. A non-proprietary general statistical analysis and graphics package DATAPLOT is also available from NIST by anonymous ftp from 'scf.nist.gov', directory 'pubs/dataplot'.¹

¹ Contact Stefan Leigh, Statistical Engineering Division, NIST, Gaithersburg, MD, 20899-0001, email: stefan.leigh@nist.gov

8.1.3 Symbols

The symbols that are used in Chapter 8 and not commonly used throughout the remainder of this handbook are listed below, each with its definition and the section in which it is first used.
SYMBOL                      DEFINITION                                                              SECTION
A                           A-basis value                                                           -
a                           distribution limit                                                      8.1.4
ADC                         critical value of ADK                                                   8.3.2.2
ADK                         k-sample Anderson-Darling statistic                                     8.3.2.2
B                           B-basis value                                                           8.2.5.1
b                           distribution limit                                                      8.1.4
C                           critical value                                                          8.3.3.1
CV                          coefficient of variation                                                8.2.5.2
e                           error, residual                                                         8.3.5.1
F                           F-statistic                                                             8.3.5.2.2
F(x)                        cumulative distribution function                                        8.1.4
f(x)                        probability density function                                            8.1.4
$F_0$                       standard normal distribution function                                   8.3.4.3.2
IQ                          informative quantile function                                           8.3.6.2
J                           number of specimens per batch                                           8.2.5.3
k                           number of batches                                                       8.2.3
$k_A$                       (1) one-sided tolerance limit factor, A-basis                           8.3.4.3.3
                            (2) Hanson-Koopmans coefficient, A-basis                                8.3.4.5.2
$k_B$                       (1) one-sided tolerance limit factor, B-basis                           8.3.4.3.3
                            (2) Hanson-Koopmans coefficient, B-basis                                8.3.4.5.2
MNR                         maximum normed residual test statistic                                  8.3.3.1
MSB                         between-batch/group mean square                                         8.3.5.2.5
MSE                         within-batch/group mean square                                          8.3.5.2.5
n                           number of observations in a data set                                    8.1.4
$n'$                        effective sample size                                                   8.3.5.2.6
$\tilde{n}$                 number of specimens required for comparable reproducibility             8.2.5.3
$n^*$                       see Equation 8.3.5.2.6(b)                                               8.3.5.2.6
$n_i$                       number of observations in batch/group i                                 8.3.2.1
OSL                         observed significance level                                             8.3.1
p(s)                        fixed condition                                                         8.3.5.1
Q                           quantile function                                                       8.3.6.1
$\hat{Q}$                   quantile function estimate                                              8.3.6.1
r                           rank of observation                                                     8.3.4.5.1
RME                         relative magnitude of error                                             8.5
s                           sample standard deviation                                               8.1.4
$s^2$                       sample variance                                                         8.1.4
$s_L$                       standard deviation of log values                                        8.3.4.4
$s_y$                       estimated standard deviation of errors from the regression line         8.3.5.3
SSB                         between-batch/group sum of squares                                      8.3.5.2.3
SSE                         within-batch/group sum of squares                                       8.3.5.2.3
SST                         total sum of squares                                                    8.3.5.2.3
T                           tolerance limit factor                                                  8.3.5.2.7
t                           quantile of the t-distribution                                          8.3.3.1
$T_i$                       temperature at condition i                                              8.3.5.1
$t_{\gamma,0.95}(\delta)$   0.95 quantile of the non-central t-distribution with non-centrality     8.3.5.3
                            parameter $\delta$ and degrees of freedom $\gamma$
TIQ                         truncated informative quantile function                                 8.3.6.2
u                           (1) ratio of mean squares                                               8.3.5.2.7
                            (2) batch                                                               8.3.5.1
$V_A$                       one-sided tolerance limit factor for the Weibull distribution, A-basis  8.3.4.2.3
$V_B$                       one-sided tolerance limit factor for the Weibull distribution, B-basis  8.3.4.2.3
$w_{ij}$                    transformed data                                                        8.3.5.2.1
$\bar{x}$                   sample mean, overall mean                                               8.1.4
$x_i$                       observation i in a sample                                               8.1.4
$\tilde{x}_i$               median of x values                                                      8.3.5.2.1
$x_{ij}$                    jth observation in batch/group i                                        8.3.2.1
$x_{ijk}$                   kth observation in batch j at condition i                               8.2.3
$\bar{x}_L$                 mean of log values                                                      8.3.4.4
$x_{(r)}$                   rth observation, sorted in ascending order; observation of rank r       8.3.4.5.1
$z_{0.10}$                  tenth percentile of the underlying population distribution              8.2.2
$z_{(i)}$                   ranked independent values                                               8.3.2.1
$z_{p(s),u}$                regression constants                                                    8.3.5.1
$\alpha$                    (1) significance level                                                  8.3.3.1
                            (2) scale parameter of Weibull distribution                             8.1.4
$\hat{\alpha}$              estimate of $\alpha$                                                    8.3.4.2.1
$\beta$                     shape parameter of Weibull distribution                                 8.1.4
$\hat{\beta}$               estimate of $\beta$                                                     8.3.4.2.1
$\beta_i$                   regression parameters                                                   8.3.5.3
$\hat{\beta}_i$             least squares estimate of $\beta_i$                                     8.3.5.3
$\gamma$                    degrees of freedom                                                      8.3.5.3
$\delta$                    noncentrality parameter                                                 8.3.5.3
$\theta_i$                  regression parameters                                                   8.3.5.1
$\mu$                       population mean                                                         8.1.4
$\mu_i$                     mean at condition i                                                     8.2.3
$\rho$                      correlation between any two measurements in the same batch              8.2.5.3
$\sigma$                    population standard deviation                                           8.1.4
$\sigma^2$                  population variance                                                     8.1.4
$\sigma_b^2$                population between-batch variance                                       8.2.3
$\sigma_e^2$                population within-batch variance                                        8.2.3

8.1.4 Statistical terms

Definitions of the most often used statistical terms in this handbook are provided in this section. This list is certainly not complete; the user of this document with little or no background in statistical methods should also consult an elementary text on statistical methods such as Reference 8.1.4. Definitions for additional statistical terms are included in Section 1.7.
Population -- The set of measurements about which inferences are to be made or the totality of possible measurements which might be obtained in a given testing situation. For example, "all possible ultimate tensile strength measurements for Composite Material A, conditioned at 95% relative humidity and room temperature". In order to make inferences about a population, it is often necessary to make assumptions about its distributional form. The assumed distributional form may also be referred to as the population.

Sample -- The collection of measurements (sometimes referred to as observations) taken from a specified population.

Sample size -- The number of measurements in a sample.

A-basis Value -- A statistically-based material property; a 95% lower confidence bound on the first percentile of a specified population of measurements. Also a 95% lower tolerance bound for the upper 99% of a specified population.

B-basis Value -- A statistically-based material property; a 95% lower confidence bound on the tenth percentile of a specified population of measurements. Also a 95% lower tolerance bound for the upper 90% of a specified population.

Compatible -- Descriptive term referring to different groups or subpopulations which may be treated as coming from the same population.

Structured data -- Data for which natural groupings exist, or for which responses of interest could vary systematically with respect to known factors. For example, measurements made from each of several batches could reasonably be grouped according to batch, and measurements made at various known temperatures could be modeled using linear regression (Section 8.3.5.2); hence both can be regarded as structured data.

Unstructured data -- Data for which all relevant information is contained in the response measurements themselves. This could be because these measurements are all that is known, or else because one is able to ignore potential structure in the data. For example, data measurements that have been grouped by batch and demonstrated to have negligible batch-to-batch variability (using the subsample compatibility methods of Section 8.3.2) may be considered unstructured.

Location parameters and statistics:

Population mean -- The average of all potential measurements in a given population weighted by their relative frequencies in the population. The population mean is the limit of the sample mean as the sample size increases.

Sample mean -- The average of all observations in a sample and an estimate of the population mean. If the notation $x_1, x_2, \ldots, x_n$ is used to denote the n observations in a sample, then the sample mean is defined by:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$    8.1.4(a)

or

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$    8.1.4(b)

Sample median -- After ordering the observations in a sample from least to greatest, the sample median is the value of the middle-most observation if the sample size is odd and the average of the two middle-most observations if the sample size is even. If the population is symmetric about its mean, the sample median is also a satisfactory estimator of the population mean.
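The location statistics defined above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the strength values are hypothetical, and the code is not taken from the handbook, STAT17, or RECIPE. It evaluates the sample mean as in Equation 8.1.4(b) and applies the odd/even rule for the sample median, then cross-checks both against Python's standard `statistics` module.

```python
from statistics import mean, median


def sample_mean(xs):
    """Sample mean as in Equation 8.1.4(b): (1/n) * sum of the x_i."""
    n = len(xs)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    return sum(xs) / n


def sample_median(xs):
    """Sample median: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(xs)  # order from least to greatest
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical strength measurements, for illustration only.
odd_sample = [71.2, 68.5, 74.9, 70.1, 69.3]
even_sample = odd_sample + [90.0]  # one unusually high value added

print("mean:", sample_mean(odd_sample), "median:", sample_median(odd_sample))
print("mean:", sample_mean(even_sample), "median:", sample_median(even_sample))

# Cross-check against the standard library implementations.
print("stdlib:", mean(even_sample), median(even_sample))
```

In the six-value sample the single high observation pulls the mean upward much more than the median, which is why the median is described as a satisfactory estimator of the population mean only when the population is symmetric about its mean.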
Dispersion statistics:

Sample variance -- The sum of the squared deviations from the sample mean, divided by n-1, where n denotes the sample size. The sample variance is defined by:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$    8.1.4(c)

or

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} x_i^2 - \frac{n}{n-1} \bar{x}^2$$    8.1.4(d)

Sample standard deviation -- The square root of the sample variance. The sample standard deviation is denoted by s.

Probability distribution terms:

Probability distribution -- A formula which gives the probability that a value will fall within prescribed limits. When the word distribution is used in this chapter, it should be interpreted to mean probability distribution.

Normal Distribution -- A two-parameter ($\mu, \sigma$) family of probability distributions for which the probability that an observation will fall between a and b is given by the area under the curve

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-(x-\mu)^2 / 2\sigma^2}$$    8.1.4(e)

between a and b. A normal distribution with parameters ($\mu, \sigma$) has population mean $\mu$ and variance $\sigma^2$.

Lognormal Distribution -- A probability distribution for which the probability that an observation selected at random from this population falls between a and b ($0 < a < b < \infty$) is given by the area under the normal distribution between ln(a) and ln(b).

Two-Parameter Weibull Distribution -- A probability distribution for which the probability that a randomly selected observation from this population lies between a and b ($0 < a < b < \infty$) is given by

$$e^{-(a/\alpha)^{\beta}} - e^{-(b/\alpha)^{\beta}}$$    8.1.4(f)

where $\alpha$ is called the scale parameter and $\beta$ is called the shape parameter.

Probability function terms:

Cumulative Distribution Function -- A function, usually denoted by F(x), which gives the probability that a random variable lies between any prescribed pair of numbers, that is

$$\Pr(a < x \le b) = F(b) - F(a)$$    8.1.4(g)

Such functions are non-decreasing and satisfy

$$\lim_{x \to +\infty} F(x) = 1$$    8.1.4(h)
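The dispersion statistics and interval probabilities defined above can be illustrated with a short Python sketch. It is not the handbook's software: all function names and the data values are hypothetical. The normal interval probability is evaluated as F(b) - F(a), as in Equation 8.1.4(g), using the standard error-function identity for the normal cumulative distribution function (an identity assumed here, not stated in the text). The Weibull probability is a direct transcription of Equation 8.1.4(f).

```python
import math


def sample_variance(xs):
    """Sample variance as in Equation 8.1.4(c)."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Algebraically equivalent form, Equation 8.1.4(d)."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    xbar = sum(xs) / n
    return sum(x * x for x in xs) / (n - 1) - n * xbar ** 2 / (n - 1)


def sample_std(xs):
    """Sample standard deviation s: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


def normal_cdf(x, mu, sigma):
    """Normal CDF via the error function (standard identity, assumed here)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_prob(a, b, mu, sigma):
    """Area under the curve of Equation 8.1.4(e) between a and b, as F(b) - F(a)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


def lognormal_prob(a, b, mu, sigma):
    """Lognormal case: normal area between ln(a) and ln(b), for 0 < a < b.

    Here mu and sigma are taken to be the parameters of the normal
    distribution of the log values (an assumption of this sketch).
    """
    return normal_prob(math.log(a), math.log(b), mu, sigma)


def weibull_prob(a, b, alpha, beta):
    """Equation 8.1.4(f): alpha is the scale parameter, beta the shape parameter."""
    return math.exp(-((a / alpha) ** beta)) - math.exp(-((b / alpha) ** beta))


# Hypothetical strength measurements, for illustration only.
data = [71.2, 68.5, 74.9, 70.1, 69.3]
print("s^2 by 8.1.4(c):", sample_variance(data))
print("s^2 by 8.1.4(d):", sample_variance_shortcut(data))
print("s:", sample_std(data))
print("normal, within one sigma of the mean:", normal_prob(-1.0, 1.0, 0.0, 1.0))
print("Weibull, between 1 and 2 (alpha=1, beta=1):", weibull_prob(1.0, 2.0, 1.0, 1.0))
```

The two variance forms agree algebraically, but Equation 8.1.4(d) subtracts two large, nearly equal numbers, so in floating-point arithmetic the form in Equation 8.1.4(c) is usually the safer choice when the mean is large relative to the spread.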