Business 112 (Managerial Statistics) Study Notes

Define briefly the following terminologies

Population

(1) set of all possible observations for a variable to study; (2) denoted as N; (3) example: all men above 50

Sample

(1) representation of the population; (2) denoted as n; (3) example: only the men of San Beda

Statistics

(1) a branch of applied mathematics; (2) science of collecting, organizing, analyzing, and interpreting data; (3) the goal of statistics is to provide a meaningful picture of the focus of the study

Data

(1) information gained from experiments; (2) examples: age of subjects, weight gain or loss

Variable

(1) any characteristic whose value may change from object to object in a population; (2) example: age, sex

Observation

(1) active acquisition of information from a primary source; (2) employs the senses or records data via instruments; (3) any data collected during a scientific activity

Nominal Scale

(1) numbers are assigned to objects where different numbers indicate different objects – the numbers have no real meaning other than differentiating between objects; (2) example: male = 1, female = 2; (3) mathematical operation used: equal or not equal

Ordinal Scale

(1) numbers are assigned to objects like nominal, but the numbers also have meaningful order; (2) example: place finished in the race = 1st, 2nd, 3rd ; (3) mathematical operation used: equal or not equal, greater than or less than

Interval Scale

(1) numbers have order like ordinal, but there are also equal intervals between adjacent categories; (2) example: the difference between 78 and 79 degrees Fahrenheit is the same as the difference between 46 and 45 degrees Fahrenheit; (3) mathematical operation used: equal or not equal, greater than or less than, plus or minus

Ratio Scale

(1) differences between numbers are meaningful like interval, but zeroes are also meaningful; (2) example: age, height, sales; (3) mathematical operation used: equal or not equal, greater than or less than, plus or minus, multiply or divide

Categorical data

(1) data that describe the characteristics of an individual or place it in a category; (2) only some summary measures apply to categorical data: the mode always applies, and the median applies when the categories are ordered; (3) examples of categorical data: red, green, ID number, Toyota

Quantitative data

(1) data whose variables can be expressed in numerical terms; (2) examples of quantitative data: price, income, weight

Qualitative data

(1) data whose variables cannot be measured in numerical terms; (2) examples of qualitative data: taste, happiness

Cross-sectional data

(1) type of data collected by observing many subjects such as individuals, firms, countries, or regions at the same point in time, or without regard to differences in time; (2) provides a snapshot of the data at the given time; (3) example: expenditures of different chemical plants in the Philippines in 2017

Time-series data

(1) sequence of data; (2) records a variable at a specific, equally spaced frequency; (3) recorded over time; (4) example: expenditures of a chemical plant in the Philippines from 2015 to 2018

Descriptive statistics

(1) branch of statistics that consists of gathering, sorting, and summarizing data; (2) quantitatively describe or summarize features of a collection of information; (3) aim to summarize a sample rather than use the data to learn about the population that the sample of data is sought to represent

Statistical inference

(1) act of using observed data to infer unknown properties and characteristics of the probability distribution from which observed data have been generated; (2) involves using data from a sample to draw conclusions about a wider population

Example of use of statistics in a functional department

(1) Research and Development Department: uses descriptive statistics to describe samples; uses statistical test to confirm and validate hypotheses regarding newly created products; (2) Supply Chain: uses descriptive statistics to describe raw materials and finished goods inventory; uses different kinds of statistical methods for trend projection

class interval

(1) data groupings; (2) non-overlapping grouping of data; (3) example: 1-5, 6-10, 11-15
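
Grouping data into class intervals can be sketched in a few lines; the data set and interval boundaries below are invented for illustration.

```python
# Count how many observations fall into each non-overlapping class interval.
data = [2, 4, 5, 7, 8, 9, 11, 12, 14, 15]
intervals = [(1, 5), (6, 10), (11, 15)]

freq = {f"{lo}-{hi}": sum(lo <= x <= hi for x in data) for lo, hi in intervals}
print(freq)  # {'1-5': 3, '6-10': 3, '11-15': 4}
```

The resulting frequency table is the starting point for a histogram or ogive.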

histogram

(1) interval is drawn as a bar bounded or defined by the class boundaries and the corresponding frequencies; (2) properties: uses quantitative data, no gaps, bar width is equal to the class size

frequency polygon

(1) uses class midpoints to represent intervals; (2) drawn over a histogram

ogive chart

(1) chart generated by graphing class boundaries versus cumulative frequency; (2) allows quick estimation of the number of observations that are less than or equal to a particular value; (3) the slope is never negative, since cumulative frequency never decreases; (4) shows growth across the classes

pie chart

(1) graphical representation of individual contribution; (2) commonly used to compare composition percentages

Rating scale

(1) scale used to record rank-based statements; (2) assigns a numerical value depending on severity or extremity; (3) examples: Likert, Mohs’ scale for hardness

Likert scale

(1) scale that represents items about the level of agreement; (2) commonly represented as: strongly agree, agree, neutral, disagree, strongly disagree

Mean

(1) sum of all the values divided by the total number of values (the mean commonly used in statistics is the arithmetic mean; there are other forms of mean such as the geometric and harmonic means); (2) a measure of central tendency; (3) the value obtained by “smoothing” or “flattening” all the different data values into one consistent value

Median

(1) middle value in an arrayed data; (2) a central tendency

Mode

(1) value which occurs the greatest number of times in a data set; (2) a measure of central tendency, and the only one that applies to nominal data
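
The three measures of central tendency above can be computed with Python's standard-library statistics module; the data set here is invented.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

print(statistics.mean(data))    # (2+3+3+5+7+10)/6 = 5
print(statistics.median(data))  # average of the two middle values: (3+5)/2 = 4
print(statistics.mode(data))    # 3, the most frequent value
```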

Percentiles

(1) indicates the location of a score in a distribution; (2) ranges from 1 to 99; (3) indicates the percentage of scores that fall at or below a given value

Quartiles

(1) divides the variates into four equal parts; (2) Q1 is the lower quartile, Q2 is the middle quartile, and Q3 is the upper quartile
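A minimal sketch of the quartiles using the standard-library statistics.quantiles; the data set is invented. Passing n=4 divides the data into four equal parts and returns the three cut points Q1, Q2, Q3.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]
q1, q2, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
print(q1, q2, q3)
```

Note that Q2, the middle quartile, is always the median.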

Range

(1) difference between the highest and the lowest values in a data set; (2) describes the distance or width between the two extreme observed values in data

Variance

(1) mean of the squared deviations; (2) a measure of dispersion – the higher the variance, the more dispersed the data; (3) denoted by σ²

Standard deviation

(1) measures how much variation exists in a distribution; (2) the lower the standard deviation, the closer the data are to the mean; (3) square root of the variance

Coefficient of variation

(1) standard deviation divided by the mean; (2) used to compare variation between two sets of data with different means; (3) the higher the coefficient of variation, the higher the variability
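
A hedged sketch of variance, standard deviation, and coefficient of variation for an invented data set, using the standard-library statistics module.

```python
import statistics

data = [4, 8, 6, 5, 3, 7]
mean = statistics.mean(data)      # 5.5
var = statistics.pvariance(data)  # population variance: mean squared deviation
sd = statistics.pstdev(data)      # standard deviation: square root of variance
cv = sd / mean                    # coefficient of variation

print(var, sd, round(cv, 3))
```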

Z-score

(1) measures how many standard deviations an observation lies above or below the mean; (2) a high z-score means the observation is far from the mean; (3) a measure of relative position within a distribution; (4) z-scores are unit-free
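
A hedged sketch of the z-score for an invented data set: the observation's distance from the mean, expressed in standard-deviation units.

```python
import statistics

scores = [60, 70, 80, 90, 100]
mu = statistics.mean(scores)       # 80
sigma = statistics.pstdev(scores)  # about 14.14

def z_score(x):
    # distance from the mean in standard-deviation units (unit-free)
    return (x - mu) / sigma

print(z_score(80))   # 0.0 - exactly at the mean
print(z_score(100))  # positive - above the mean
```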

Box plot

(1) displays the range and distribution of data along a number line; (2) used to summarize large amounts of data; (3) show lower extreme, lower quartile, median, upper quartile, and upper extreme; (4) displays bunching or spreading of data

Covariance

(1) one of a family of statistical measures used to analyze the linear relationship between two variables; (2) a positive value indicates a direct or increasing relationship; (3) a negative value indicates a decreasing relationship

Correlation coefficient

(1) measures the strength or quality of the linear relationship between two variables; (2) includes Pearson product moment correlation coefficient

Skewness

(1) measure of the symmetry of a distribution; (2) in skewed data, one tail is longer than the other; (3) the peak is not in the center

Kurtosis

(1) measure of the shape of the curve; (2) measures whether the bell of the curve is normal, flat, or peaked; (3) a positive value of kurtosis means the curve is peaked; zero means normal; a negative value means flat; (4) the fourth central moment of a distribution

Combination

(1) the number of different ways that a certain number of objects as a group can be selected from a larger number of objects; (2) order does not matter

Permutation

(1) the number of different ways that a certain number of objects can be arranged in order from a larger number of objects; (2) order matters
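
The distinction between the two counts above can be shown with the standard-library math module: choosing 3 objects out of 5.

```python
import math

print(math.comb(5, 3))  # 10 - combinations: order does not matter
print(math.perm(5, 3))  # 60 - permutations: order matters
```

Each combination of 3 objects can be ordered in 3! = 6 ways, which is why the permutation count is six times the combination count.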

Inferential statistics

(1) branch of statistics that uses probability to determine whether it is likely that a particular sample or test outcome is representative of the population; (2) making inferences about the population beyond the data; (3) used to test validity of hypotheses

p-value

(1) a measure of the strength of the evidence against the null hypothesis; (2) the probability that, if the null hypothesis were true, sampling variation would produce an estimate further away from the hypothesized value than the estimate observed in the data; (3) a small p-value indicates strong evidence against the null hypothesis

level of significance

(1) the probability that the test statistic will fall in the critical region when the null hypothesis is actually true; (2) probability of making a type I error; (3) selected cut-off point that determines whether one should consider a p-value acceptably high or low; (4) denoted by the Greek letter alpha; (5) a smaller alpha makes it more difficult to reject the null hypothesis, but makes type II errors more common

alpha error

(1) also known as type I error; (2) the error of rejecting the null hypothesis when in fact, it is true; (3) the probability of committing an alpha error is called the significance level, denoted by the Greek letter alpha

beta error

(1) also known as type II error; (2) the error of accepting the null hypothesis when in fact, it is false; (3) beta value depends on a number of factors including the choice of alpha, the sample size, and the true value of the parameter

probability

(1) number that estimates the chances that the event will happen; (2) frequency at which some event happens out of a greater number of outcomes

random sampling

(1) involves selecting the sample at random from the sampling frame, e.g. by drawing lots, by computer, or with any random number generator; (2) no selection bias; (3) every member has an equal chance of being selected

purposive sampling

(1) also known as judgmental sampling; (2) use of personal judgment to select samples; (3) often used when working with very small samples such as in case study research

mutually exclusive events

(1) events that cannot occur together – only one outcome is possible; (2) occurrence of one event precludes the occurrence of the other; (3) example: a single coin toss – the result is either heads or tails, but never both

conditional probability

(1) probability that is conditional on other events; (2) the probability of an event given that a previous outcome has occurred; (3) often analyzed using tree diagrams or Venn diagrams

normal probability distribution

(1) also known as normal distribution or Gaussian distribution; (2) a continuous probability distribution; (3) characteristics: single-peak, symmetrical, tapered; (4) completely defined by mean and standard deviation; (5) used to find the area under the function as to calculate the probability of a specified range of distribution
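
Finding the area under the curve for a specified range can be sketched with the standard-library NormalDist; the mean and standard deviation below are invented. Within one standard deviation of the mean the area is about 68%.

```python
from statistics import NormalDist

d = NormalDist(mu=100, sigma=15)

# P(85 <= X <= 115): area between one standard deviation below and above the mean
p = d.cdf(115) - d.cdf(85)
print(round(p, 3))  # about 0.683
```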

normal curve

(1) symmetrical bell-shaped curve of a normal distribution; (2) distribution of data with most of the scores clustered around the middle

Hypothesis testing

(1) statistical method that uses sample data to evaluate a hypothesis about a population; (2) lets us use a sample to decide between two statements made about a population characteristic – these two statements are called the null hypothesis and the alternative hypothesis

Null hypothesis

(1) expresses equality with the predetermined standard or required value of the parameter; (2) states that the sample mean is the same as the population mean

Alternative hypothesis

(1) states the deviation from the required value which is of concern to the researcher; (2) states that the sample mean is not the same as the population mean

Type I error

(1) also known as alpha error; (2) the error of rejecting the null hypothesis when in fact, it is true; (3) the probability of committing an alpha error is called the significance level, denoted by the Greek letter alpha

Type II error

(1) also known as beta error; (2) the error of accepting the null hypothesis when in fact, it is false; (3) beta value depends on a number of factors including the choice of alpha, the sample size, and the true value of the parameter

Two-tailed test

(1) test to check if calculated value is either above or below where it is expected to be; (2) does not specify direction (e.g. greater than or less than), only equal or not equal

One-tailed test

(1) test to check if calculated value is above or below where it is expected to be; (2) specifies direction (e.g. greater than or less than)

Give 5 Steps in hypothesis testing

(1) define null and alternative hypothesis, (2) state level of significance, (3) state decision rule, (4) calculate test statistic, (5) state conclusion
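
The five steps can be walked through with a one-sample, two-tailed z-test; all numbers below are invented (claimed mean 50, known sigma 10, n = 36).

```python
from statistics import NormalDist
import math

# Step 1: define hypotheses. H0: mu = 50, Ha: mu != 50 (two-tailed)
mu0, sigma, n = 50, 10, 36
sample_mean = 53.2

# Step 2: state the level of significance
alpha = 0.05

# Step 3: state the decision rule - reject H0 if |z| exceeds the critical value
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

# Step 4: calculate the test statistic
z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # 3.2 / (10/6) = 1.92

# Step 5: state the conclusion
reject = abs(z) > z_crit
print(round(z, 2), reject)  # 1.92 < 1.96, so we fail to reject H0
```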

Parametric test

(1) hypothesis test that makes generalizations about population parameters such as the mean; (2) assumes a normal distribution (e.g. z-test, t-test, ANOVA); (3) suitable when data are interval or ratio scaled

Non-parametric test

(1) does not assume a normal distribution (e.g. Kolmogorov-Smirnov test); (2) suitable for any continuous data; (3) used when there is no knowledge about the population or its parameters

Level of confidence

(1) the percentage of results that would be correct over repeated sampling – the probability that the estimation procedure captures the true population value; (2) a higher level of confidence means greater certainty; (3) the most commonly used confidence level is 95%

Degrees of freedom

(1) the number of values in the final calculation of a statistic that are free to vary; (2) the number of independent ways by which a dynamic system can move without violating any constraint imposed on it; (3) the higher the degrees of freedom, the more closely the t-distribution resembles the normal distribution; (4) often computed as the number of observations less the number of groups

Critical value

(1) the value of the random variable at the boundary between the acceptance region and the rejection region in the testing of a hypothesis; (2) depends upon test statistic

Cronbach’s alpha

(1) commonly used to assess the internal consistency of a questionnaire or survey; (2) the minimum accepted value for Cronbach’s alpha is 0.7 – below this, internal consistency is considered low; the maximum is 0.9 – beyond this, items are perceived as redundant or duplicated; (3) generally increases when the correlations between the items increase

Test of Normality

(1) test if set of data is normally distributed or not; (2) examples include using histogram, skewness, kurtosis, chi-square, and Shapiro-Wilk’s test

Shapiro-Wilk’s Test

(1) a test for normality; (2) the null hypothesis is that the sample comes from a normal distribution; (3) test statistic is denoted by W; (4) if the p-value is less than the chosen significance level, the null hypothesis is rejected and the distribution is judged not normal

Collinearity

(1) in multiple regression, a linear relationship among the independent variables; (2) occurs when two independent variables in a multiple regression are highly correlated with each other

Multi-collinearity

(1) in multiple regression, the inter-correlation of an independent variable with the other independent variables; (2) occurs when two or more independent variables are inter-correlated

Variance Inflation Factor (VIF)

(1) used in testing for multi-collinearity; (2) quantifies the severity of collinearity; (3) ratio of variance with collinearity and variance without collinearity

Durbin-Watson Test

(1) a test for autocorrelation; (2) test statistic used to detect the presence of a relationship between values separated from each other by a given time lag in the prediction errors of a regression analysis

Homoscedasticity

(1) literally means “same variance”; (2) one of the assumptions used in regression; (3) variance around the regression line is the same for all values of the predictor variable (x)

Bartlett’s Test

(1) a test for homogeneity of variances; (2) used to test whether or not samples have equal variances; (3) similar to the two-sample F test but allows comparison of variances across multiple groups; (4) more powerful than Levene’s test when data are normal, but more sensitive to departures from normality

Box M Test

(1) a test for homogeneity of covariance matrices; (2) not designed for the general linear model context – commonly used with MANOVA and discriminant analysis; (3) a non-significant result of Box’s M test indicates that the group covariance matrices do not differ significantly, so the assumption is met

Koenker-Bassett Test

(1) a test for heteroscedasticity; (2) similar to Breusch-Pagan test, except the residuals are made robust to outliers/non-normality; (3) if result of Koenker-Bassett test is significant, it indicates that one of the variables may be a strong predictor in some areas, but weak in the others

Breusch-Pagan Test

(1) a test for heteroscedasticity; (2) regresses the squared residuals on the explanatory variables to determine whether there is non-constant variance in the errors; (3) if the result of the Breusch-Pagan test is significant, it indicates that one of the variables may be a strong predictor in some areas, but weak in others

Tolerance

(1) limits that create an interval bounding a specified percentage of the population at a given level of confidence; (2) often used to demonstrate compliance with a set of requirements or specification limits; (3) a low tolerance level means more errors are allowed to be incorporated

Sphericity

(1) an assumption in repeated measure ANOVA; (2) accounts for random variation and error associated with measurement in inferential statistics; (3) condition where the variances of the differences between all combinations of related groups are equal

Mauchly’s test

(1) a test for sphericity; (2) tests the null hypothesis that the variances of the differences are equal; (3) if test is statistically significant, then, the variances of the differences are not equal

Ordinal Regression

(1) denotes a family of statistical learning methods in which the goal is to predict a variable which is discrete and ordered; (2) suitable when outcome is ordinal (e.g. mild, moderate, severe); (3) requires assuming that the effect of the independents is the same for each level of the dependent

Binary Logistic Regression

(1) a type of regression analysis where the dependent variable is a dummy variable (represented as 0,1); (2) models the relationship between a set of predictors and a binary response variable (e.g. win or lose); (3) used to understand how changes in the predictor values are associated with changes in the probability of an event occurring

Canonical Correlation

(1) a correlation between two sets of variables; (2) seeks the weighted linear composite for each set of dependent variable or independent variable to maximize the overlap in their distribution; (3) goal is to maximize correlation; (4) assumes linear relationship between any two variables and between variates (dependent or independent variable)

Factor Analysis

(1) a correlational method used to find and describe the underlying factors driving data values for a large set of variables; (2) identifies correlations between and among variables to bind them into one underlying factor driving their values; (3) large number of variables may be reduced to only several factors

Eigenvalues

(1) scalar numbers, usually real numbers; (2) represent the stretching factors by which a matrix A stretches the length of a particular vector (its eigenvector); (3) in statistics, eigenvalues are the variances of the factors in factor analysis

Cluster analysis

(1) process of grouping observations of similar kinds into smaller groups within the larger population; (2) like simple classification but the class label of each observation is not known; (3) often used in conjunction with discriminant analysis

Multiple Regression

(1) an extension of simple linear regression; (2) the relationship is many-to-one: two or more independent variables to one dependent variable; (3) two or more independent variables are used to predict the variance in one dependent variable; (4) each coefficient is interpreted as the estimated change in y corresponding to a one-unit change in a variable, when all other variables are held constant

MANOVA

(1) stands for multivariate analysis of variance; (2) a multivariate procedure; (3) assumes individual group covariance matrices are equal; (4) compares two or more dependent variables across two or more groups

Discriminant Analysis

(1) a multivariate procedure; (2) assumes individual group covariance matrices are equal; (3) statistical analysis to predict a categorical dependent variable (called a grouping variable) by one or more continuous or binary independent variables (called predictor variables)

Structural Equation Modeling

(1) a multivariate procedure; (2) general statistical modeling techniques used to establish relationship among variables; (3) a confirmatory technique – tests if the theory fits the data

Correspondence Analysis

(1) a multivariate procedure; (2) allows examination of the relationship between two nominal variables graphically in a multidimensional space; (3) a technique for graphically displaying a two-way table by calculating coordinates representing its rows and columns

Conjoint Analysis

(1) also known as the multi-attribute composition model; (2) factors and their values are defined by the researcher in advance, and the various combinations of the factor values yield fictive results; (3) popularly used in market research; (4) dependent variable: preferences; independent variables: object attributes

Spearman’s rank correlation

(1) a non-parametric test of relationship; (2) measures the monotonic relationship between two ranked variables; (3) used to measure the relationship between two ordinal variables, or two variables that are related but not linearly; (4) ranges from -1.0 to 1.0; (5) larger absolute values mean a stronger monotonic association between the rankings

Kendall’s Tau

(1) a coefficient that represents the degree of concordance between two columns of ranked data; (2) the greater the number of inversions, the smaller the coefficient; (3) ranges from -1.0 to 1.0; (4) larger absolute values mean stronger agreement between the rankings

Wilcoxon Rank Sum Test

(1) a non-parametric test; (2) tests which of two variables is more likely to exceed the other; (3) equivalent to the Mann-Whitney U test for two independent samples – the related Wilcoxon signed-rank test is the version used for paired data

Mann-Whitney U Test

(1) a non-parametric test; (2) alternative to the t-test that can be applied when populations are not normal, particularly for small samples; (3) a test that compares two independent groups; (4) a test of both location and shape; (5) tests whether one variable tends to have higher values than the other

Partial Least Squares

(1) a form of Structural Equation Modeling; (2) a dimension reduction technique; (3) method for modeling relations between sets of observed variables by means of latent variables; (4) method for constructing predictive models when there are many highly collinear factors

Moderation

(1) occurs when a third variable affects the strength of relationship between two other variables; (2) example: relationship between X1 and Y is strong, especially if X2 is strong

Mediation

(1) occurs when a third variable acts as generative mechanism between two other variables; (2) the mechanism by which a predictor causes or explains the outcome; (3) example: X1 causes X2 which cause Y

Partial Correlation

(1) relationship between two variables after removing the overlap of the third; (2) examines the relationship between the scores of two variables, while controlling for a third, fourth, or fifth variable; (3) a method for figuring out if there is a relationship between two variables using linear regression; (4) produces an equation similar to simple linear regression

Beta (in multiple regression)

(1) the standardized coefficient in an equation generated by multiple regression; (2) expressed in standard-deviation units, so betas can be compared across independent variables; (3) treated as an absolute value; (4) a higher beta means a stronger relationship of the independent variable to the dependent variable

B (in multiple regression)

(1) the unstandardized coefficient that appears in an estimated multiple regression equation; (2) interpreted as the estimated change in y for a one-unit change in a variable, when all other variables are held constant; (3) expressed in the original units of the variables; (4) B is used to understand the practical effect of an independent variable (e.g. when variables are measured in units like pesos, years, percentages)

Chi-square

(1) an inferential statistical test used to test difference or relationship; (2) compares categorical variables; (3) can test relationship and variance; (4) used with counts of nominal or ordinal categories (interval and ratio data must first be grouped into classes); (5) uses chi-square critical values; (6) a non-parametric test

ANOVA

(1) stands for analysis of variance; (2) compares the means of more than two sets of data; (3) a special case of MANOVA; (4) only one dependent variable is analyzed; (5) the observed variance in a particular variable is partitioned into components attributable to different sources of variation

t-test

(1) a parametric test; (2) checks whether the means of two groups are reliably different; (3) inferential in nature; (4) standard test that the population means of two non-paired samples are equal; (5) used when fewer than 30 observations are made; (6) used when the population variance is not known

Z-test

(1) a parametric test; (2) a test used to compare means; (3) used when 30 or more observations are made; (4) used when the population variance is known

Confidence intervals

(1) a range of values that is likely to contain the value of an unknown population parameter, such as the mean, with a specified degree of confidence; (2) communicates how precise the estimate is likely to be; (3) a narrower confidence interval means a more precise estimate – for the same data, a 99% confidence interval is wider than a 95% confidence interval
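
A hedged sketch of a 95% confidence interval for a mean with known sigma, using the standard-library NormalDist; all numbers are invented (sample mean 100, sigma 15, n = 25).

```python
from statistics import NormalDist
import math

mean, sigma, n = 100, 15, 25
z = NormalDist().inv_cdf(0.975)    # about 1.96 for 95% confidence
margin = z * sigma / math.sqrt(n)  # margin of error, about 5.88

lower, upper = mean - margin, mean + margin
print(round(lower, 2), round(upper, 2))  # roughly 94.12 and 105.88
```

Raising the confidence level to 99% replaces z with about 2.576, producing a wider (less precise) interval, which is the trade-off described above.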

Business 112. (2019, Jan 19). Retrieved September 24, 2020, from https://midwestcri.org/business-112/