
Business 112 (Managerial Statistics) Study Notes
Define briefly the following terminologies
Population
(1) set of all possible observations for a variable to study; (2) denoted as N; (3) example: all men above 50
Sample
(1) representation of the population; (2) denoted as n; (3) example: only the men of San Beda
Statistics
(1) a branch of applied mathematics; (2) science of collecting, organizing, analyzing, and interpreting data; (3) the goal of statistics is to provide a meaningful picture of the focus of the study
Data
(1) information gained from experiments; (2) examples: age of subjects, weight gain or loss
Variable
(1) any characteristic whose value may change from object to object in a population; (2) example: age, sex
Observation
(1) active acquisition of information from a primary source; (2) employs the senses and records data via the use of instruments; (3) any data collected during a scientific activity
Nominal Scale
(1) numbers are assigned to objects where different numbers indicate different objects – the numbers have no real meaning other than differentiating between objects; (2) example: male = 1, female = 2; (3) mathematical operation used: equal or not equal
Ordinal Scale
(1) numbers are assigned to objects like nominal, but the numbers also have meaningful order; (2) example: place finished in the race = 1st, 2nd, 3rd ; (3) mathematical operation used: equal or not equal, greater than or less than
Interval Scale
(1) numbers have order like ordinal, but there are also equal intervals between adjacent categories; (2) example: the difference between 78 and 79 degrees Fahrenheit is the same as the difference between 46 and 45 degrees Fahrenheit; (3) mathematical operation used: equal or not equal, greater than or less than, plus or minus
Ratio Scale
(1) differences between numbers are meaningful like interval, but zeroes are also meaningful; (2) example: age, height, sales; (3) mathematical operation used: equal or not equal, greater than or less than, plus or minus, multiply or divide
Categorical data
(1) data that describe the characteristics of an individual or category; (2) the mode is the main measure of center that applies to categorical data (the median also applies when the categories are ordered); (3) examples of categorical data: red, green, ID number, Toyota
Quantitative data
(1) data whose variables can be expressed in numerical terms; (2) examples of quantitative data: price, income, weight
Qualitative data
(1) data whose variables cannot be measured in numerical terms; (2) examples of qualitative data: taste, happiness
Cross-sectional data
(1) type of data collected by observing many subjects such as individuals, firms, countries, or regions at the same point in time or without regard to differences in time; (2) provides a snapshot of the data at the given time; (3) example: expenditures of different chemical plants in the Philippines in 2017
Time-series data
(1) sequence of data points; (2) records a variable at equally spaced intervals; (3) recorded over time; (4) example: expenditures of a chemical plant in the Philippines from 2015 to 2018
Descriptive statistics
(1) branch of statistics that consists of gathering, sorting, and summarizing data; (2) quantitatively describe or summarize features of a collection of information; (3) aim to summarize a sample rather than use the data to learn about the population that the sample of data is sought to represent
Statistical inference
(1) act of using observed data to infer unknown properties and characteristics of the probability distribution from which observed data have been generated; (2) involves using data from a sample to draw conclusions about a wider population
Example of use of statistics in a functional department
(1) Research and Development Department: uses descriptive statistics to describe samples; uses statistical tests to confirm and validate hypotheses regarding newly created products; (2) Supply Chain: uses descriptive statistics to describe raw materials and finished goods inventory; uses different kinds of statistical methods for trend projection
class interval
(1) data groupings; (2) non-overlapping grouping of data; (3) example: 1-5, 6-10, 11-15
histogram
(1) interval is drawn as a bar bounded or defined by the class boundaries and the corresponding frequencies; (2) properties: uses quantitative data, no gaps, bar width is equal to the class size
frequency polygon
(1) uses class midpoints to represent intervals; (2) drawn over a histogram
ogive chart
(1) chart generated by graphing class boundaries versus cumulative frequency; (2) allows quick estimation of the number of observations that are less than or equal to a particular value; (3) slope is never negative because cumulative frequency is non-decreasing; (4) shows cumulative growth
pie chart
(1) graphical representation of individual contribution; (2) commonly used to compare composition percentages
Rating scale
(1) scale used to record rank-based statements; (2) assigns a numerical value depending on severity or extremity; (3) examples: Likert scale, Mohs scale of hardness
Likert scale
(1) scale that represents items about the level of agreement; (2) commonly represented as: strongly agree, agree, neutral, disagree, strongly disagree
Mean
(1) sum of all the values divided by the total number of values (the mean commonly used in statistics is the arithmetic mean; there are other forms, such as the geometric and harmonic means); (2) a measure of central tendency; (3) value obtained by “smoothing” or “flattening” all the different data values into one consistent value
Median
(1) middle value in an arrayed data; (2) a central tendency
Mode
(1) value which occurs the greatest number of times in a data set; (2) the only measure of central tendency that applies to nominal data
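The three measures of central tendency above can be computed with Python's standard statistics module; a minimal sketch using hypothetical data:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical observations

mean_value = statistics.mean(data)      # sum / count = 30 / 6 = 5
median_value = statistics.median(data)  # middle of the arrayed data = (3 + 5) / 2
mode_value = statistics.mode(data)      # most frequently occurring value

print(mean_value, median_value, mode_value)  # 5 4.0 3
```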
Percentiles
(1) indicates the location of a score in a distribution; (2) ranges from 1 to 99; (3) indicates the percentage of scores that fall at or below a given value
Quartiles
(1) divides the variates into four equal parts; (2) Q1 is the lower quartile, Q2 is the middle quartile, and Q3 is the upper quartile
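Quartiles can be obtained with `statistics.quantiles`, which splits arrayed data into n equal parts; a sketch with hypothetical data (using the module's default exclusive method):

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical arrayed data

# n=4 divides the variates into four equal parts and
# returns the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)  # 2.0 4.0 6.0
```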
Range
(1) difference between the highest and the lowest values in a data set; (2) describes the distance or width between the two extreme observed values in data
Variance
(1) mean of the squared deviations; (2) a measure of dispersion – the higher the variance, the more dispersed the data; (3) denoted by σ²
Standard deviation
(1) measures how much variation exists in a distribution; (2) the lower the standard deviation, the closer the data are to the mean; (3) square root of the variance
Coefficient of variation
(1) standard deviation divided by the mean; (2) used to compare variation between two different sets of data with different means; (3) the higher the coefficient of variation, the higher the variability
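Because the coefficient of variation is a unit-free ratio, it can compare variability across data sets with very different means; a minimal sketch with hypothetical fill weights from two machines:

```python
import statistics

machine_a = [10, 12, 11, 13, 14]     # hypothetical fill weights
machine_b = [100, 104, 98, 102, 96]  # hypothetical fill weights

def coefficient_of_variation(data):
    # CV = standard deviation / mean, so data sets with
    # different means can be compared on relative variability
    return statistics.stdev(data) / statistics.mean(data)

cv_a = coefficient_of_variation(machine_a)
cv_b = coefficient_of_variation(machine_b)
print(cv_a > cv_b)  # True: machine A has higher relative variability
```

Machine B has the larger absolute standard deviation, yet machine A has the higher coefficient of variation, which is exactly the comparison CV is designed for.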
Z-score
(1) measures how many standard deviations an observation lies above or below the mean; (2) a high z-score means the observation is farther from the mean; (3) a measure of variability; (4) z-scores ignore measurement units
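A z-score is just the distance from the mean expressed in standard-deviation units; a sketch with hypothetical exam scores:

```python
import statistics

scores = [70, 75, 80, 85, 90]  # hypothetical exam scores
x = 88                         # the observation of interest

mean = statistics.mean(scores)   # 80
sd = statistics.pstdev(scores)   # population standard deviation

# z-score: how many standard deviations x lies from the mean
z = (x - mean) / sd
print(z > 0)  # True: 88 lies above the mean
```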
Box plot
(1) displays the range and distribution of data along a number line; (2) used to summarize large amounts of data; (3) show lower extreme, lower quartile, median, upper quartile, and upper extreme; (4) displays bunching or spreading of data
Covariance
(1) one of a family of statistical measures used to analyze the linear relationship between two variables; (2) a positive value indicates a direct or increasing relationship; (3) a negative value indicates a decreasing relationship
Correlation coefficient
(1) measures the strength or quality of the linear relationship between two variables; (2) includes Pearson product moment correlation coefficient
Skewness
(1) measure of the asymmetry of a distribution; (2) in skewed data, one tail is longer than the other; (3) the peak is not in the center
Kurtosis
(1) measure of the shape of the curve; (2) measures whether the bell of the curve is normal, flat, or peaked; (3) a positive value of kurtosis means the curve is peaked; zero means normal; a negative value means flat; (4) the fourth central moment of the distribution
Combination
(1) the number of different ways that a certain number of objects as a group can be selected from a larger number of objects; (2) order does not matter
Permutation
(1) the number of different ways that a certain number of objects can be arranged in order from a larger number of objects; (2) order matters
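Python's math module computes both counts directly; a sketch with a hypothetical group of 10 people:

```python
import math

# combination: choose a committee of 3 from 10 people (order does not matter)
committees = math.comb(10, 3)
print(committees)  # 120

# permutation: fill president, VP, and secretary from 10 people (order matters)
officers = math.perm(10, 3)
print(officers)  # 720 = 10 * 9 * 8
```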
Inferential statistics
(1) branch of statistics that uses probability to determine whether it is likely that a particular sample or test outcome is representative of the population; (2) making inferences about the population beyond the data; (3) used to test validity of hypotheses
p-value
(1) a measure of the strength of the evidence against the null hypothesis; (2) the probability that, if the null hypothesis were true, sampling variation would produce an estimate further away from the hypothesized value than the estimate obtained from the data; (3) a small p-value indicates strong evidence against the null hypothesis
level of significance
(1) the probability that the test statistic will fall in the critical region when the null hypothesis is actually true; (2) probability of making a type I error; (3) selected cut-off point that determines whether one should consider a p-value acceptably high or low; (4) denoted by the Greek letter alpha; (5) a smaller value of alpha makes it more difficult to reject the null hypothesis, but type II errors become more common
alpha error
(1) also known as type I error; (2) the error of rejecting the null hypothesis when in fact, it is true; (3) the probability of committing an alpha error is called the significance level, denoted by the Greek letter alpha
beta error
(1) also known as type II error; (2) the error of accepting the null hypothesis when in fact, it is false; (3) beta value depends on a number of factors including the choice of alpha, the sample size, and the true value of the parameter
probability
(1) number that estimates the chances that the event will happen; (2) frequency at which some event happens out of a greater number of outcomes
random sampling
(1) involves selecting the sample at random from the sampling frame, e.g. by drawing lots or using a computer or any random number generator; (2) no bias; (3) every member has an equal chance of being selected
purposive sampling
(1) also known as judgmental sampling; (2) use of personal judgment to select samples; (3) often used when working with very small samples such as in case study research
mutually exclusive events
(1) events that cannot occur at the same time; (2) occurrence of one event precludes the occurrence of the other; (3) example: a coin toss – the result will be either heads or tails, but not both
conditional probability
(1) probability that is conditional on other events; (2) the probability of an event based on the occurrence of a previous outcome; (3) conditional probability is usually analyzed using a Venn diagram
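The defining formula P(A | B) = P(A and B) / P(B) is easy to apply to counts; a sketch with hypothetical survey numbers:

```python
# hypothetical survey of 100 customers
total = 100
bought_b = 40          # customers who bought product B
bought_a_and_b = 20    # customers who bought both A and B

p_b = bought_b / total
p_a_and_b = bought_a_and_b / total

# conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 0.5
```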
normal probability distribution
(1) also known as normal distribution or Gaussian distribution; (2) a continuous probability distribution; (3) characteristics: single-peaked, symmetrical, tapered; (4) completely defined by mean and standard deviation; (5) the area under the curve is used to calculate the probability of a specified range of values
normal curve
(1) symmetrical bell-shaped curve of a normal distribution; (2) distribution of data with most of the scores clustered around the middle
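Areas under the normal curve can be computed with `statistics.NormalDist`; a sketch using hypothetical exam scores with mean 75 and standard deviation 10:

```python
from statistics import NormalDist

# hypothetical exam scores: mean 75, standard deviation 10
dist = NormalDist(mu=75, sigma=10)

# probability a score falls at or below 85 (one s.d. above the mean)
p_below_85 = dist.cdf(85)

# probability a score falls within one s.d. of the mean (65 to 85)
p_within_1sd = dist.cdf(85) - dist.cdf(65)

print(round(p_below_85, 3), round(p_within_1sd, 3))  # 0.841 0.683
```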
Hypothesis testing
(1) statistical method that uses sample data to evaluate a hypothesis about a population; (2) allows one to use a sample to decide between two statements made about a population characteristic – these two statements are called the null hypothesis and the alternative hypothesis
Null hypothesis
(1) expresses equality with the predetermined standard or required value of the parameter; (2) states that the sample mean is the same as the population mean
Alternative hypothesis
(1) states the deviation from the required value which is of concern to the researcher; (2) the sample is not the same as the population
Type I error
(1) also known as alpha error; (2) the error of rejecting the null hypothesis when in fact, it is true; (3) the probability of committing an alpha error is called the significance level, denoted by the Greek letter alpha
Type II error
(1) also known as beta error; (2) the error of accepting the null hypothesis when in fact, it is false; (3) beta value depends on a number of factors including the choice of alpha, the sample size, and the true value of the parameter
Two-tailed test
(1) test to check if calculated value is either above or below where it is expected to be; (2) does not specify direction (e.g. greater than or less than), only equal or not equal
One-tailed test
(1) test to check if calculated value is above or below where it is expected to be; (2) specifies direction (e.g. greater than or less than)
Give 5 Steps in hypothesis testing
(1) define null and alternative hypothesis, (2) state level of significance, (3) state decision rule, (4) calculate test statistic, (5) state conclusion
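The five steps can be sketched end to end; this hypothetical one-sample z-test (claimed mean fill volume 500 ml, sigma assumed known) is one possible illustration, not the only procedure:

```python
from statistics import NormalDist

# Step 1: define null and alternative hypotheses
#   H0: mu = 500 vs. Ha: mu != 500 (two-tailed)
# Step 2: state the level of significance
alpha = 0.05
# Step 3: state the decision rule — reject H0 if the p-value is below alpha
# Step 4: calculate the test statistic (hypothetical sample values)
sample_mean, sigma, n = 505.0, 10.0, 25
z = (sample_mean - 500) / (sigma / n ** 0.5)   # = 2.5
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
# Step 5: state the conclusion
conclusion = "reject H0" if p_value < alpha else "fail to reject H0"
print(z, round(p_value, 4), conclusion)
```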
Parametric test
(1) hypothesis test that provides generalizations for making statements about the mean of the population; (2) assumes normal distribution (e.g. z-test, t-test, ANOVA); (3) suitable when data are interval or ratio scaled
Non-parametric test
(1) does not use normal distribution (e.g. Kolmogorov-Smirnov test); (2) suitable for any continuous data; (3) used when there is no knowledge about the population or parameters
Level of confidence
(1) the percentage of the time that a statistical result would be correct; (2) a higher level of confidence means greater certainty; (3) the most commonly used confidence level is 95%
Degrees of freedom
(1) the number of values in the final calculation of a statistic that are free to vary; (2) the number of independent ways by which a dynamic system can move without violating any constraint imposed on it; (3) the higher the degrees of freedom, the more closely the t-distribution resembles the normal distribution; (4) for some tests, computed as the number of observations minus the number of groups
Critical value
(1) the value of the random variable at the boundary between the acceptance region and the rejection region in the testing of a hypothesis; (2) depends upon test statistic
Cronbach’s alpha
(1) commonly used to assess the internal consistency of a questionnaire or survey; (2) the minimum accepted value for Cronbach’s alpha is 0.7 – below this, internal consistency is considered low; the maximum accepted value is about 0.9 – beyond this, there is perceived redundancy or duplication among items; (3) generally increases when the correlations between the items increase
Test of Normality
(1) test if set of data is normally distributed or not; (2) examples include using histogram, skewness, kurtosis, chi-square, and Shapiro-Wilk’s test
Shapiro-Wilk’s Test
(1) a test for normality; (2) the null hypothesis is that the sample comes from a normal distribution; (3) test statistic is denoted by W; (4) if the p-value is less than the chosen significance level, the null hypothesis is rejected and the distribution is considered not normal
Collinearity
(1) in multiple regression, a linear relationship between independent variables; (2) occurs when two independent variables in a multiple regression are highly correlated with each other
Multi-collinearity
(1) in multiple regression, the inter-relationship among independent variables; (2) occurs when two or more independent variables are inter-correlated
Variance Inflation Factor (VIF)
(1) used in testing for multi-collinearity; (2) quantifies the severity of collinearity; (3) ratio of the variance of a coefficient with collinearity to its variance without collinearity
Durbin-Watson Test
(1) a test for autocorrelation; (2) test statistic used to detect the presence of a relationship between values separated from each other by a given time lag in the prediction errors of a regression analysis
Homoscedasticity
(1) literally means “same variance”; (2) one of the assumptions used in regression; (3) variance around the regression line is the same for all values of the predictor variable (x)
Bartlett’s Test
(1) a test for homogeneity of variances; (2) used to test whether or not samples have equal variances; (3) similar to the two-sample F test but allows comparison of variances across multiple groups; (4) more powerful than Levene’s test when the data are normal, but more sensitive to departures from normality
Box M Test
(1) a test for homogeneity of covariance matrices; (2) not designed for use in a linear model context; (3) an insignificant value of Box’s M test shows that the groups do not differ from each other, meeting the assumption
Koenker-Bassett Test
(1) a test for heteroscedasticity; (2) similar to Breusch-Pagan test, except the residuals are made robust to outliers/non-normality; (3) if result of Koenker-Bassett test is significant, it indicates that one of the variables may be a strong predictor in some areas, but weak in the others
Breusch-Pagan Test
(1) a test for heteroscedasticity; (2) regresses the squared residuals on the explanatory variables to determine whether there is non-constant variance in the errors; (3) if the result of the Breusch-Pagan test is significant, it indicates that one of the variables may be a strong predictor in some areas, but weak in others
Tolerance
(1) limits that create an interval bounding a specified percentage of the population at a given level of confidence; (2) often used to demonstrate compliance with a set of requirements or specification limits; (3) tighter tolerance limits leave less room for error
Sphericity
(1) an assumption in repeated measure ANOVA; (2) accounts for random variation and error associated with measurement in inferential statistics; (3) condition where the variances of the differences between all combinations of related groups are equal
Mauchly’s test
(1) a test for sphericity; (2) tests the null hypothesis that the variances of the differences are equal; (3) if test is statistically significant, then, the variances of the differences are not equal
Ordinal Regression
(1) denotes a family of statistical learning methods in which the goal is to predict a variable which is discrete and ordered; (2) suitable when outcome is ordinal (e.g. mild, moderate, severe); (3) requires assuming that the effect of the independents is the same for each level of the dependent
Binary Logistic Regression
(1) a type of regression analysis where the dependent variable is a dummy variable (represented as 0,1); (2) models the relationship between a set of predictors and a binary response variable (e.g. win or lose); (3) used to understand how changes in the predictor values are associated with changes in the probability of an event occurring
Canonical Correlation
(1) a correlation between two sets of variables; (2) seeks the weighted linear composite for each set of dependent variable or independent variable to maximize the overlap in their distribution; (3) goal is to maximize correlation; (4) assumes linear relationship between any two variables and between variates (dependent or independent variable)
Factor Analysis
(1) a correlational method used to find and describe the underlying factors driving data values for a large set of variables; (2) identifies correlations between and among variables to bind them into one underlying factor driving their values; (3) large number of variables may be reduced to only several factors
Eigenvalues
(1) scalar numbers, usually real numbers; (2) stretching factors by which a matrix A stretches the length of a particular eigenvector (Av = λv); (3) in statistics, the eigenvalues are the variances of the factors in factor analysis
Cluster analysis
(1) process of grouping observations of similar kinds into smaller groups within the larger population; (2) like simple classification but the class label of each observation is not known; (3) often used in conjunction with discriminant analysis
Multiple Regression
(1) an extension of simple linear regression; (2) relationship is many one: one dependent variable to two or more independent variable; (3) two or more independent variables are used to predict the variance in one dependent variable; (4) each coefficient is interpreted as the estimated change in y corresponding to a one-unit change in a variable, when all other variables are held constant
MANOVA
(1) a multivariate procedure; (2) assumes individual group covariance matrices are equal; (3) stands for multivariate analysis of variance; (4) compares one or more dependent variables across two or more groups
Discriminant Analysis
(1) a multivariate procedure; (2) assumes individual group covariance matrices are equal; (3) statistical analysis to predict a categorical dependent variable (called a grouping variable) by one or more continuous or binary independent variables (called predictor variables)
Structural Equation Modeling
(1) a multivariate procedure; (2) general statistical modeling techniques used to establish relationship among variables; (3) a confirmatory technique – tests if the theory fits the data
Correspondence Analysis
(1) a multivariate procedure; (2) allows examination of the relationship between two nominal variables graphically in a multidimensional space; (3) a technique for graphically displaying a two-way table by calculating coordinates representing its rows and columns
Conjoint Analysis
(1) also known as the multi-attribute composition model; (2) factors and their values are defined by the researcher in advance, and the various combinations of the factor values yield fictive product profiles; (3) popularly used in market research; (4) dependent variable: preferences; independent variables: object attributes
Spearman’s rank correlation
(1) a test of relationship; (2) a non-parametric counterpart of the Pearson correlation; (3) used to measure the relationship between two ordinal variables, or two variables that are related but not linearly; (4) ranges from -1.0 to 1.0; (5) larger absolute values mean stronger monotonic association between the rankings
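For untied ranks, Spearman's coefficient has a simple closed form, 1 − 6Σd²/(n(n² − 1)), where d is the rank difference per pair; a minimal sketch with hypothetical judge scores (the helper names are illustrative, not a library API):

```python
def spearman_rho(x, y):
    # rank each list (1 = smallest); assumes no tied values for simplicity
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Spearman's formula for untied ranks
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# hypothetical scores from two judges for five contestants
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(rho)  # close to 1 → strong agreement between the rankings
```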
Kendall’s Tau
(1) a coefficient that represents the degree of concordance between two columns of ranked data; (2) in Kendall’s Tau, the greater the number of inversions, the smaller the coefficient will be; (3) ranges from -1.0 to 1.0; (4) larger absolute values mean stronger association between the rankings
Wilcoxon Rank Sum Test
(1) a non-parametric test; (2) test used to know which of two variables is more likely to exceed the other; (3) equivalent to the Mann-Whitney U test for two independent samples; the related Wilcoxon signed-rank test is the version for paired data
Mann-Whitney U Test
(1) a non-parametric test; (2) alternative to the t-test that can be applied when populations are not normal, particularly for small samples; (3) a test to compare two independent groups; (4) a test of both location and shape; (5) tests whether one variable tends to have higher values than the other
Partial Least Squares
(1) a form of Structural Equation Modeling; (2) a dimension reduction technique; (3) method for modeling relations between sets of observed variables by means of latent variables; (4) method for constructing predictive models when there are many highly collinear factors
Moderation
(1) occurs when a third variable affects the strength of the relationship between two other variables; (2) example: the relationship between X1 and Y is stronger when X2 is high
Mediation
(1) occurs when a third variable acts as generative mechanism between two other variables; (2) the mechanism by which a predictor causes or explains the outcome; (3) example: X1 causes X2 which cause Y
Partial Correlation
(1) relationship between two variables after removing the overlap of the third; (2) examines the relationship between the scores of two variables, while controlling for a third, fourth, or fifth variable; (3) a method for figuring out if there is a relationship between two variables using linear regression; (4) produces an equation similar to simple linear regression
Beta (in multiple regression)
(1) the standardized coefficient in the equation generated in a multiple regression; (2) expressed in standard-deviation units, so betas can be compared across independent variables; (3) treated as an absolute value when comparing; (4) a higher beta means a stronger relationship of the independent variable to the dependent variable
B (in multiple regression)
(1) the unstandardized coefficient that appears in an estimated multiple regression equation; (2) interpreted as the estimated change in y for a one-unit change in a variable, holding all other variables constant; (3) expressed in the variable’s original units; (4) B is used when variables are measured in meaningful units (e.g. pesos, years, percentages)
Chi-square
(1) an inferential statistical test which is used to test difference or relationship; (2) compares categorical variables; (3) can test relationship and variance; (4) typically applied to nominal or ordinal data (interval and ratio data must first be grouped into categories); (5) uses chi-square critical values; (6) a non-parametric test
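The chi-square statistic itself is just the sum of (observed − expected)² / expected over the categories; a goodness-of-fit sketch with hypothetical die-roll counts:

```python
# hypothetical: 60 rolls of a die, observed counts for faces 1-6
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6    # H0: the die is fair

# chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)  # small statistic (≈ 1.0) → consistent with a fair die
```

The statistic is then compared against the chi-square critical value, here with 6 − 1 = 5 degrees of freedom.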
ANOVA
(1) stands for analysis of variance; (2) compares the means of three or more sets of data (with two groups it reduces to the t-test); (3) a special case of MANOVA; (4) only one dependent variable is analyzed; (5) the observed variance in a particular variable is partitioned into components attributable to different sources of variation
t-test
(1) a parametric test; (2) checks whether the means of two groups are reliably different; (3) inferential in nature; (4) standard test of whether the population means of two non-paired samples are equal; (5) used when fewer than 30 observations are made; (6) used when the variance is not known
Z-test
(1) a parametric test; (2) a test used to compare means; (3) used when 30 or more observations are made; (4) used when the variance is known
Confidence intervals
(1) a range of values that is likely to contain the value of an unknown population parameter, such as the mean, with a specified degree of confidence; (2) communicates how precise the estimate is likely to be; (3) a narrower confidence interval means a more precise estimate – for the same data, a 99% confidence interval is wider than a 95% confidence interval
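The trade-off between confidence and width can be seen directly with a z-based interval, mean ± z·σ/√n; a sketch with hypothetical sample values (sigma assumed known):

```python
from statistics import NormalDist

# hypothetical: sample mean 50, known sigma 8, sample size 64
mean, sigma, n = 50.0, 8.0, 64

def z_interval(confidence):
    # z-based confidence interval: mean ± z * sigma / sqrt(n)
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z * sigma / n ** 0.5
    return mean - margin, mean + margin

lo95, hi95 = z_interval(0.95)
lo99, hi99 = z_interval(0.99)
# the 99% interval is wider: higher confidence trades away precision
print(hi95 - lo95 < hi99 - lo99)  # True
```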
