Descriptive
Past data only; Not predicting or optimizing
Predictive
Past to predict the future; Predicting, no optimizing
Prescriptive
Past to predict the future and optimizing
Omission
Missing information
Out of Range
Doesn’t match the data or not true
Reliable
Consistent and repeatable. A measure of the instrument
Valid
Measures what is intended to be measured
Measurement bias
Includes representative sample, random, large enough sample
Information bias
Ignore the purpose of the information collection; not truthful answers
Big Data
Both structured and unstructured; too large to process using traditional database and software techniques
Data mining
Process of discovering patterns in large data sets
Why collect big data?
Used to encourage buying behavior
Analytics
The extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and add value
Variable
An expression that can be assigned to data
Continuous data
Data that can lie at any point in a range (e.g., age 22.6 years)
Discrete data
Whole values only and clear boundaries
Nominal data
Categorical data used to label subjects in a study; discrete (male female)
Ordinal data
Allows you to place objects in some kind of order according to some quality; discrete (e.g., a 3rd-degree black belt ranks higher than a 1st-degree)
Interval data
Has order; values are equal intervals apart; no natural zero point, so zero does not represent the absence of the property measured; continuous (time, date, temperature)
Ratio data
Has a unique zero point – numbers can be compared as multiples of one another, continuous (income, stock, repeat customers)
Observational studies
Used when it is impractical or impossible to control the conditions of the study
Prospective cohort study
Observe people going forward in time from the time of their entry into the study
Experimental studies
All variable measurements and manipulations are under researcher’s control.
Experimental studies: Experimental units
Subjects or objects under observations
Experimental studies: Experimental treatments
Procedure applied to each subject
Experimental studies: Responses
Effects of the experimental treatments
1st step of statistical experiment
Identify the experimental units from which you want to measure something
2nd step of statistical experiment
Identify the treatments you will apply and the controls you will use for the control group
3rd step of statistical experiment
Generate a testable hypothesis
construct validity
The study actually measures what is being investigated
content validity
Construct measures what it claims to measure
Internal validity
Biases may have entered the study
Blind study
participants are not told if they are in treatment group or control group
Double blind
neither treatment allocator nor participant know which group participant is in
Triple blind
participant, allocator, and response gatherer do not know which group the participant is in
Random errors
occur because of random and inherently unpredictable events in the measurement process
Correlation
extent or degree of statistical associations among two or more variables
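One common way to quantify the degree of association is the Pearson correlation coefficient. The following is a minimal sketch, not a definitive implementation; the function name `pearson_r` and the sample data are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: advertising spend vs. sales
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
r = pearson_r(ad_spend, sales)  # perfectly linear data, so r is 1 up to rounding
```

Values near +1 or -1 indicate a strong linear association; values near 0 indicate a weak one.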
systematic errors
errors in measurement that are constant within a data set, sometimes caused by faulty equipment or bias
Skewness
a measure of the degree to which data “leans” toward one side
Not a truly representative sample
sample not representative of entire population
Response bias
respondents say what they believe the questioner wants to hear
Conscious Bias
When surveyor actively seeks a response; Researcher manipulates phrasing of question
Missing data and refusals
Sample gets lost or subjects refuse to contribute; distorts survey data when a demographic is missing, leading to false conclusions
Association and Causation
the mistaken assumption that because two events seem to occur together, one causes the other
Training and test data
Data used to form a hypothesis is used again to test the hypothesis
Unfounded assumptions
Assumption is made that has not been proven
Faulty operationalization
Occurs when the development of specific research procedures that allow for observation and measurement of abstract concepts is flawed.
Lack of blinding
When researchers fail to place barriers between themselves and subjects and so influence behavior
Confidence interval
Range around a sample mean that has a specific probability of containing the true population mean
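As a rough sketch of how such a range can be computed, the example below builds an approximate 95% confidence interval for a mean. It assumes the normal approximation is acceptable (for small samples a t-distribution is usually preferred); the function name and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for the population mean using the normal distribution."""
    n = len(sample)
    center = mean(sample)
    # Standard error of the mean from the sample standard deviation
    std_err = stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * std_err, center + z * std_err

# Hypothetical sample of daily call counts
sample = [102, 98, 110, 95, 105, 99, 101, 108, 97, 104]
low, high = mean_confidence_interval(sample)
```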
Probability
The chance of an event occurring
Venn diagram
Shows mathematical sets or events visually
inferential statistics
making predictions and testing theories about a population from a sample
Descriptive statistics
Statistics that are used to describe a population from the whole population
Mean
sum of all numbers divided by how many numbers
Median
middle number
Mode
most common number
Deviation score
score minus the mean
Variance
Statistical measure of the spread of a set of data. Find the difference between each data point and the mean, square each difference, and average the squares
Range
Take largest number from sample and subtract smallest
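The measures defined above can be tried out with Python's standard `statistics` module. This is only a sketch: the variable names are arbitrary, and it assumes the population form of variance (dividing by n rather than n - 1).

```python
from statistics import mean, median, mode, pvariance

data = [9, 12, 14, 10, 8, 11, 12]

avg = mean(data)                       # sum of values divided by the count
mid = median(data)                     # middle value after sorting
most_common = mode(data)               # most frequent value
deviations = [x - avg for x in data]   # deviation scores (score minus the mean)
variance = pvariance(data)             # average of the squared deviations
value_range = max(data) - min(data)    # largest value minus smallest
```

For this data set the median is 11, the mode is 12, and the range is 6.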
Empirical Rule
68.3% of data points will be w/i 1 standard deviation of the mean; 95.4% w/i 2 standard deviations; and 99.7% w/i 3 standard deviations
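The percentages in the empirical rule can be checked against the normal distribution. A minimal sketch using Python's standard library follows; the helper name `share_within` is an assumption made for this example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

shares = {k: round(share_within(k) * 100, 1) for k in (1, 2, 3)}
# shares maps 1, 2, 3 standard deviations to roughly 68.3, 95.4, and 99.7 percent
```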
z-score
a measure of how many standard deviations you are away from the norm (average or mean)
z-score formula
z = (x - mean) / standard deviation, where x is the individual data point
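A minimal Python sketch of the z-score calculation (the score minus the mean, divided by the standard deviation). The function name and the exam score of 87 are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Hypothetical exam: mean 70, standard deviation 8.5
z = z_score(87, 70, 8.5)   # 2.0: two standard deviations above the mean
```

A negative result means the data point lies below the mean.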
Quartile
each of four equal groups into which a population can be divided
inter-quartile range
measures difference between the third and the first quartile
Outlier
an observation point that is significantly distant from the other observations in the data set
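Quartiles, the inter-quartile range, and outliers can be tied together in a short sketch. The 1.5 x IQR cutoff used below is a widely used convention, not something these cards specify, and the data are hypothetical.

```python
from statistics import quantiles

# Hypothetical data with one unusually large value
data = [8, 9, 10, 11, 12, 12, 14, 40]

q1, q2, q3 = quantiles(data, n=4)   # three cut points that split the data into quartiles
iqr = q3 - q1                        # inter-quartile range

# Assumed convention: flag points more than 1.5 * IQR beyond Q1 or Q3
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

With this data only the value 40 is flagged.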
Histogram
A graph of vertical bars that shows counts or numbers in each range; used for continuous data.
Bar chart
Measures distribution of discrete data
bivariate charts
have vertical y axis and horizontal x axis that measures 2 variables; independent variable is on the x-axis
hypothesis
Statement or claim about a given population
Null hypothesis
Argument there is no difference between two samples or that a sample has not changed over time
Alternative hypothesis
argument that states that a sample is not equal to the hypothesized null sample
Statistically significant
A result is unlikely to be caused by random variation or error
T-test
a statistical test used to evaluate the size and significance of the difference between two means
one sample t test
Used to determine if a single sample mean is different from a known population mean
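The test statistic for a one-sample t test can be sketched as follows. The function name and the fill-volume data are hypothetical; in practice the resulting t value is compared with a critical value for n - 1 degrees of freedom, which this sketch does not look up.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, population_mean):
    """t statistic for testing whether the sample mean differs from population_mean."""
    n = len(sample)
    std_err = stdev(sample) / math.sqrt(n)   # standard error of the mean
    return (mean(sample) - population_mean) / std_err

# Hypothetical fill volumes (ml) tested against a known mean of 500 ml
sample = [498, 502, 497, 495, 499, 496, 501, 494]
t_stat = one_sample_t(sample, 500)
degrees_of_freedom = len(sample) - 1
```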
Chi-squared test
A hypothesis test that is used to examine the distribution of categorical data
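A sketch of the chi-squared statistic for a goodness-of-fit comparison, assuming hypothetical observed counts tested against an even split. The function name is illustrative only.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of customers choosing each of three package designs,
# compared with an even split of 100 customers
observed = [45, 30, 25]
expected = [100 / 3] * 3
chi2 = chi_squared_statistic(observed, expected)   # about 6.5
```

A larger statistic means the observed distribution departs further from the expected one.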
ANOVA
used to determine if there is a significant difference among three or more means
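To illustrate, a one-way ANOVA F statistic can be computed by hand as the ratio of between-group to within-group variance. This is a sketch under simple assumptions; the function name and donation amounts are hypothetical.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic: between-group variance over within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n = len(all_values)        # total observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical donation amounts for three email messages
groups = [[20, 22, 19, 21], [25, 27, 26, 24], [30, 29, 31, 32]]
f_stat = anova_f(groups)   # about 60: group means differ far more than values within groups
```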
Regression analysis
A method of predicting sales based on finding a relationship between past sales and one or more independent variables, such as population or income
Time Series Analysis
A forecasting method that uses historical sales data to discover patterns in the firm’s sales over time and generally involves trend, cycle, seasonal, and random factor analyses
Cluster analysis
The process of arranging terms or values based on different variables into “natural” groups
Decision analysis
Weighing all outcomes of a decision to determine the best course of action
Decision tree
a graph of decisions and their possible consequences; it is used to create a plan to reach a goal
Expected value
the sum of each possible outcome of a future event, weighted by its probability of occurring
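A small sketch of the expected value calculation, using a hypothetical project with two possible outcomes. The function name and figures are assumptions for illustration.

```python
def expected_value(outcomes):
    """Sum of payoff * probability over all (payoff, probability) pairs."""
    return sum(payoff * prob for payoff, prob in outcomes)

# Hypothetical project: 60% chance of gaining 100,000 and 40% chance of losing 50,000
project = [(100_000, 0.6), (-50_000, 0.4)]
ev = expected_value(project)   # about 40,000
```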
Dependent variable
Value depends on other variables in the equation
Independent variable
Variable presumed to influence the dependent variable
Auto correlation
the correlation of current demand values with past demand values
Homoscedasticity
A regression in which the variances in y for the values of x are equal or close to equal
Heteroscedasticity
A regression in which the variances in y for the values of x are not equal
Linear programming
mathematical technique used to find a maximum or minimum of equations; used for time, money, space
Crossover analysis
Allows a decision maker to identify the crossover point, which represents the point at which we are indifferent between the plans.
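Assuming each plan's total cost is a fixed cost plus a per-unit cost times volume (a simplifying assumption not stated on the card), the crossover point can be sketched as below. The function name and cost figures are hypothetical.

```python
def crossover_point(fixed_a, variable_a, fixed_b, variable_b):
    """Volume at which plans A and B have equal total cost (fixed + variable * volume)."""
    return (fixed_b - fixed_a) / (variable_a - variable_b)

# Hypothetical plans: A has a low fixed cost and high unit cost, B the reverse
volume = crossover_point(fixed_a=1_000, variable_a=5.0, fixed_b=4_000, variable_b=2.0)
# At 1,000 units both plans cost 6,000; below that A is cheaper, above it B is cheaper
```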
Trend
General slope upward or downward over long period
Cyclicality
Repetition of up or down movements that follow or counteract a business cycle that can last several years
Seasonality
Regular pattern of volatility, usually within a single year.
irregularity
One-time deviations unforeseen
Random variation
Variability of a process caused by irregular fluctuations that cannot be anticipated, detected, or eliminated.
Quality Management Principle
Business paradigm that focuses on production / service quality and the means to achieve it
Plan-Do-Check-Act
Four-step method for testing hypotheses and solving problems
PDCA: Plan
Step 1 id problem and develop plans to solve
PDCA: Do
Step 2 Run an experiment to see if plans will work on small scale before implementing
PDCA: Check
Step 3 Analyze the results, make improvements
PDCA: Act
Step 4 Enact change on large scale in normal operations
Quality control
Process, such as statistical sampling, that monitors the quality of operations
Quality assurance
The function responsible for providing assurances that products or services are consistently maintained at a high level of quality.
SIPOC
Suppliers, Inputs, Process, Outputs, Customers
SIPOC diagram
A diagram that defines the boundaries of a process and shows how its Suppliers, Inputs, Processes, Outputs, and Customers affect process quality.
Statistical process control
methods that rely on statistics and measurements to monitor work and analyze improvements to processes.
Metrics
Measurements that allow teams to gauge results objectively.
Attribute data
Data that shows whether a result meets a requirement or not (yes/no, pass/fail).
Variable data
Data that shows how well a result meets a requirement, often shown on a scale or as a rating.
Common cause variations
Variations in quality that arise from random natural differences users will tolerate
Special Cause Variation
Abnormal variation that is not a natural part of a process.
Control limits
The area composed of three standard deviations on either side of the centerline or mean of a normal distribution of data plotted on a control chart, which reflects the expected variation in the data.
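A sketch of three-sigma control limits, assuming the standard deviation is estimated directly from hypothetical baseline measurements (real control charts often estimate it from subgroup ranges instead). All names and values are illustrative.

```python
from statistics import mean, pstdev

# Hypothetical in-control measurements used to set up the chart
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]

center_line = mean(baseline)
sigma = pstdev(baseline)
upper_control_limit = center_line + 3 * sigma
lower_control_limit = center_line - 3 * sigma

def out_of_control(x):
    """True when a new measurement falls outside the control limits."""
    return x > upper_control_limit or x < lower_control_limit
```

Here a new reading of 10.5 would be flagged, while 10.1 would fall within the expected variation.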
The run chart
tool for tracking results over a period of time, uncover trends or aberrations
The control chart
A modified run chart that also provides upper and/or lower limits that a process should not exceed.
Cause & effect diagram
A decomposition technique that helps trace an undesirable effect back to its root cause.
Flowchart
Graphic representation of the steps that make up a process (redundancies & problems)
The Check Sheet
Structured form used to count how many times an event or problem happens
Pareto Chart
a bar graph whose bars are drawn in decreasing order of frequency or relative frequency
Lean
Eliminate anything that does not add value for customers or satisfy their needs; View is from customer
Six Sigma
Application of metrics and statistics to evaluate and control the variation found
Lean six-sigma
Combines Lean’s emphasis on customer value with Six Sigma’s optimization of work
ISO Certification
Internationally recognized standards that ensure a company’s goods, services, and operations meet established quality levels and its operations minimize harm to the environment.
Index numbers
measure the change in quantity or price over time for a good or a number of goods and services
Consumer price index
an index of the cost of all goods and services to a typical consumer
Base period
A period in time used as a point of reference when being compared to other time periods.
Simple index number
shows the change in price or quantity of a single good or service over time
Simple index formula
Simple index = (value in the current period / value in the base period) × 100
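A simple index number expresses the current value as a percentage of the base-period value. A minimal sketch with hypothetical prices follows; the function name is an assumption.

```python
def simple_index(current_value, base_value):
    """Simple index number: the current value as a percentage of the base-period value."""
    return current_value / base_value * 100

# Hypothetical prices: 2.00 in the base period, 2.50 now
index = simple_index(2.50, 2.00)   # 125.0, i.e. a 25% increase over the base period
```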
Simple composite index
created when a researcher gathers data from many different sources without weighing any data more significantly than any other data
Epidemiology
the study of the incidence, distribution, and possible control of diseases
rate
measure of an event occurring over a period of time
Prevalence
fraction of a population having a specific disease at a given time
Incidence
The number or rate of new cases of a particular condition during a specific time.
cumulative incidence
The incidence calculated using a period of time during which all of the individuals in the population are considered to be at risk for the outcome
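The difference between prevalence and cumulative incidence can be sketched with a hypothetical population. The function names and case counts below are assumptions made for illustration.

```python
def prevalence(existing_cases, population):
    """Fraction of the population that has the disease at a given time."""
    return existing_cases / population

def cumulative_incidence(new_cases, population_at_risk):
    """Fraction of the at-risk population that develops the disease during the period."""
    return new_cases / population_at_risk

# Hypothetical town of 10,000 people: 200 already have the disease,
# and 98 of the remaining 9,800 develop it during the year
p = prevalence(200, 10_000)                   # about 0.02
ci = cumulative_incidence(98, 10_000 - 200)   # about 0.01
```

Note that people who already have the disease are excluded from the at-risk denominator.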
Incidence rate
number of people contracting a disease during a time period
Net promoter score
A management tool designed to collect data indicating the relative loyalty of customers and their willingness to recommend a company’s products or services.
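The score is conventionally computed from 0-10 "how likely are you to recommend us?" ratings as the percentage of promoters (9 or 10) minus the percentage of detractors (0 through 6). That scoring convention is not stated on the card, and the ratings below are hypothetical.

```python
def net_promoter_score(ratings):
    """NPS from 0-10 'would you recommend us?' ratings."""
    promoters = sum(1 for r in ratings if r >= 9)    # ratings of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)   # ratings of 0 through 6
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical survey responses: 5 promoters, 2 detractors, 3 passives
ratings = [10, 9, 9, 8, 7, 6, 10, 3, 9, 8]
nps = net_promoter_score(ratings)   # 30.0
```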
R-squared
goodness of fit
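As an illustration of goodness of fit, the sketch below fits a least-squares line and computes R-squared as the share of variation in y that the line explains. The function names and the study-hours data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for y = intercept + slope * x."""
    mean_x, mean_y = mean(xs), mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """Share of the variation in y explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = mean(ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical data: hours studied vs. test score
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
r2 = r_squared(hours, scores)   # about 0.96
```

An R-squared close to 1 means the line accounts for most of the variation in the scores.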
Analytics is a broad term that refers to a variety of tools that inform managerial decisions. Which term can be used to describe managerial decisions?
Prescriptive
What are two reasons for increasing use of analytics in organizational decision-making? Choose 2 Answers
Relatively lower cost of computer storage
Higher computer processing power
How does probability theory inform decision-making for managers?
By quantifying risk
Which type of data are the Olympic medals of gold, silver, and bronze examples of?
Ordinal data
What are two aspects of data quality management? Choose 2 answers
It reduces the amount of incomplete data
It cleans data
Which two attributes indicate potential data quality issues when evaluating a set of nominal data? Choose 2 answers
Missing data
Misspelled data
When conducting a study that measures an individual’s weight, all scales are calibrated prior to use in measurement. Which type of error should this procedure significantly reduce?
Systematic error
An advertising manager creates a research study by presenting low, medium, or high frequency of the same ad in matched markets. The manager then reports on sales in each market location. What is the term for the different sales in this study?
Response variable
A healthcare study follows a particular sample over time to identify how the health habits of teenagers impact their likelihood of acquiring various diseases later in their life. The healthcare organization hopes this data will allow them to create early prevention programs. Which type of research design does this study describe?
Cohort study
Several missing values in a particular field in a dataset were observed. The likelihood of a record having a missing value is correlated to another variable in the dataset. Which two types of error might be introduced into the dataset if the removed record included missing values? Choose 2 answers
Systematic error
Omission error
A company is trying to increase its online sales revenue by improving its email advertising campaign for repeat customers as well as new customers. Which two variables would be used in determining a campaign to maximize revenue gain? Choose 2 answers
Number of purchases per email
Spending per purchase
A company that raises turkeys is hoping to increase the rate of growth of the turkeys while controlling the cost of feeding them. It has determined that feeds containing both nutrients and proteins can be used. Which decision-making technique is most appropriate for this company to minimize the cost of feeding the birds?
Linear programming
Match each action with the appropriate statistical procedure
Answer options may be used more than once
ANOVA- Compare outcomes of different drug testing results
Correlation- Understand effective marketing to spend advertising dollars efficiently
Control Charts- Monitor production process
A researcher concludes that bananas create healthy children because the researcher’s notes indicate that children who eat a banana every day are healthy.
What misuse of statistics would this study be an example of if these children also eat a balanced diet and exercise daily?
Confusion of association and causality
In the month of December, there is a strong positive correlation between airline ticket sales and retail sales. Which question should the researcher consider before concluding that the correlation statistic implies that airline sales drive retail sales?
Does a causal relationship truly exist?
A medical care provider determines the probability that a patient needs treatment for a broken bone, the probability that a patient needs treatment for a concussion, and the probability that a patient needs treatment for both a broken bone and concussion. Which two techniques apply in finding the probability that a patient needs treatment for a broken bone and a concussion? Choose 2 answers
Intersection
Multiplication principle
A hardware store has found that there is a 0.25 probability that a customer buys an electrical product, a 0.45 probability that a customer buys a plumbing product, and a 0.10 probability that a customer buys an electrical product if they purchase a plumbing product. Which statistical rule can be used to determine the probability that a customer buys a plumbing product given that this customer has purchased an electrical product?
Bayes’ theorem
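Applying the rule to the probabilities given in the question can be sketched as follows. The function and variable names are illustrative only.

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_electrical = 0.25
p_plumbing = 0.45
p_electrical_given_plumbing = 0.10

# P(plumbing | electrical) = 0.10 * 0.45 / 0.25, which is about 0.18
p_plumbing_given_electrical = bayes(p_electrical_given_plumbing, p_plumbing, p_electrical)
```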
Given the following data set:
9,12,14,10,8,11,12
What is the mode?
12
Which graphical tool is used to illustrate a possible relationship or correlation between two variables?
Scatterplot
What does it mean when an individual data point has a z-score of -2?
The data point is two standard deviations less than the mean of the data set
A research study examined the impact of product advertisement exposure on the product’s brand awareness. The appropriate statistical test indicates that the null hypothesis should be rejected at the 5% significance level. What can be concluded from this study?
The advertisement was effective in building brand awareness
A manager analyzes a data set that includes information on individual incomes. The manager knows that the data set is fairly representative of the general population and includes several millionaires. Based on this data set, which measure of central tendency best represents the middle of the distribution?
Median
After evaluating manufacturing times for a particular product, a manager determines that the times are spread out across the distribution. The manager has been asked to determine how far, on average, the time is from the mean. Which statistic roughly measures the average distance of a data point from the mean of the distribution?
Standard Deviation
A nonprofit organization ran an email campaign with three different messages to solicit additional donations. What should the nonprofit organization use to determine if the average donation differs for the different messages?
ANOVA
A manager of a call center is in charge of creating a staffing plan. The number of calls received per day is normally distributed. Which two statistics would be needed to estimate the number of calls that would be received 95% of the time? Choose 2 answers
Standard Deviation
Mean
A manager uses a linear regression to examine how the store’s retail sales are predicted by advertising expenditures. Which type of variable do retail sales represent in this regression?
Dependent Variable
An analyst used multiple linear regression to explore how Store A’s sales (y) are predicted by Store A’s advertising expenditure dollars (variable x1) and the advertising expenditure dollars of Store A’s competitor (variable x2).
The estimated regression is y= 532 + 80.5 x1 – 35.6 x2.
How much sales would be predicted if x1 is $1,000 and x2 is also $1,000?
$45,432
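The arithmetic behind this answer can be checked by plugging both values into the estimated regression. The function name below is hypothetical.

```python
def predicted_sales(x1, x2):
    """Estimated regression from the question: y = 532 + 80.5*x1 - 35.6*x2."""
    return 532 + 80.5 * x1 - 35.6 * x2

# 532 + 80,500 - 35,600, which comes to about 45,432
y = predicted_sales(1_000, 1_000)
```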
A company decides to auction excess inventory on an auction website. The company has performed a regression analysis to identify how the length of the auction impacts the final price. Which statistic indicates the strength of the relationship between the length of the auction and final price?
R-squared
A researcher looks at moving average data on store sales and wishes to perform a multiple regression of interest rates and disposable income. What is a particular concern when performing time series multiple regression?
Autocorrelation
A retail store notices a spike in turkey sales every November. Which time series pattern are turkey sales likely to exhibit?
Seasonal
A researcher wants to predict student test scores based on hours spent studying. Which type of regression would be more appropriate?
Ordinary least squares regression
A doctoral student is surveying chief executive officers (CEOs) to understand their relationships with their governing boards. The student receives responses to a survey with 10 questions that rate the respective relationships. Why would the student measure the standard deviation of responses?
To measure the spread of the data
A doctoral student surveys chief executive officers (CEOs) to understand their relationships with their governing boards. The student determines the years of business experience for each CEO as well as their rating, on a 10-point scale, of satisfaction with the governing board. Which statistical approach should be used to display the data for the analysis?
Scatterplot
Which two decision considerations describe fact-based decision-making, according to the quality management principles? Choose 2 answers
Decisions reduce external bias
Decisions foster trust in plans
Which two statements describe how the dedication of leadership and a focus on process enhance quality? Choose 2 answers
It ensures clearly aligned goals
It makes results easier to manage and achieve
An organization is concerned about whether quality control standards are being met and develops a strategy to test quality control metrics. Which step does this represent in the plan-do-check-act cycle?
Plan
Which phrase is a description of quality assurance?
Focuses on training
A joint venture is established between two firms to use their core competencies to increase their market share. How would a SIPOC (supplier, inputs, process, outputs, and customers) diagram benefit the joint venture?
Provides a holistic view of the entire operation
In a statistical process control analysis, sample data are collected from an assembly line and measured to see if they fall within a tolerated measurement range. If an observation does fall within the range, a “yes” is recorded. If it falls outside the range, a “no” is recorded. Which kind of data do the “yes” and “no” represent?
Attribute data
A soft-drink manufacturer performs a control chart analysis and the results indicate that the soft-drink bottles are consistently under-filled by a large amount according to specifications. The system was evaluated three months prior and was determined to be stable and filling bottles within accepted limits.
What is this consistent under-filling an example of?
Special cause variation
Which of Ishikawa’s seven basic tools of quality is used to illustrate performance measurements over time?
Run Chart
A college is reviewing statistics concerning student retention. The college would like to determine the most important factors that cause students to leave. The college asks a researcher to display this information using one of the seven basic tools of quality. Which tool should be used in this case?
Pareto chart
A check sheet indicates that of 100 returned items, 50 were damaged upon delivery, 30 were the wrong size, 10 were poor quality, 5 were mistaken orders, and 5 were returned because the customer no longer wanted the item. What is the appropriate way to represent these data?
Construct a Pareto chart
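The data preparation behind such a chart can be sketched as below: sort the categories by count, largest first, and track a running percentage. The cumulative-percentage column is a common addition to Pareto charts rather than something the question requires, and the variable names are illustrative.

```python
# Return reasons and counts from the check sheet
returns = {
    "Damaged upon delivery": 50,
    "Wrong size": 30,
    "Poor quality": 10,
    "Mistaken order": 5,
    "No longer wanted": 5,
}

# Pareto ordering: largest count first
ordered = sorted(returns.items(), key=lambda item: item[1], reverse=True)

total = sum(returns.values())
running = 0
pareto_table = []
for reason, count in ordered:
    running += count
    # (reason, count, cumulative percentage of all returns)
    pareto_table.append((reason, count, 100 * running / total))
```

The first two reasons alone account for 80% of the returns, which is the kind of concentration a Pareto chart is meant to expose.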
A hotel chain is interested in improving its customer service by reducing the amount of time it takes customers to check in. Which analysis technique should they use?
Flowchart
Which approach focuses on eliminating activities that fail to add value or satisfy customers?
Lean Operations
An organization develops a new strategic plan and seeks ways to measure its performance over a specific period of time. Which system enables the organization to measure performance based upon established global standards?
ISO
Which approach uses financial, customer, internal business processes, and innovation/learning measures?
Balanced scorecard
Which two items represent steps of results-based management?
Define Resources
Study the long term effects of the output
What is a description of data mining?
It discovers patterns or trends in large data sets
Which common analytic measure is used to improve business performance?
Simple composite index
A national manufacturer is building an overseas factory to be closer to one of its largest markets. Which analysis technique would be appropriate when evaluating location options under risk in order to strengthen the firm’s competitiveness in the new market?
Develop a decision tree
What is the most important analytic to determine the success or failure of a particular year’s flu vaccine?
Incidence accounting for all new cases
The management for Hospital A conducted a survey of its patients’ opinions including gathering demographic data to determine which programs should be pursued during the upcoming fiscal year. Upon analyzing the response on the need for a cosmetic surgery program for the hospital, it was found that there was an r score of 0.75, and a p of 0.03 between a patient’s income level and support for the creation of a cosmetic surgery department. What is an effective strategic decision based on an analysis of the given information?
There is a strong positive relationship for a cosmetic surgery program as a function of income, and this program should be developed in high income areas
What does True Score Theory state about a test without systematic error?
The observed score is the true score plus random error
The average test score of students taking an exam was 70% with standard deviation of 8.5%. Which tool should be used to compare a student’s score to the group average?
Z-score
What is true of cost-effectiveness analysis?
It analyzes the cost of achieving a quantifiable goal
A county government must reduce spending. It wishes to eliminate incremental budgeting and align budget allocations with each of its units’ strategic business objectives. How can the county apply data analytic approaches to attain this goal?
By benchmarking like strategies of successful counties.
A county government wishes to convert government vehicles from gasoline to natural gas. How can the county evaluate the effectiveness of the conversion?
By estimating the payback period
How can organizations implement an effective performance management evaluation system?
By establishing a balanced scorecard
Which key performance indicator can a hotel chain use to measure its ability to meet client tastes and preferences?
Survey customer satisfaction upon checkout
How do balanced scorecards differ from KPI dashboards?
KPI dashboards provide visual representation of KPIs, such as charts and graphs
What is a significant disadvantage of KPIs?
They require significant ongoing maintenance
What is a disadvantage of a balanced scorecard?
It requires significant time and effort to establish a meaningful scorecard.
In what ways are KPI dashboards useful in performance assessment? Choose 2 answers
They are visual representations of key areas of focus
They show trends that represent organizational results over time
A professional services firm is undergoing a business process improvement exercise to improve productivity, staff morale, and client satisfaction. Which technique should the firm use to evaluate the strength of customer service relations?
Net promoter score
A professional services firm wants to track and monitor important financial performance measures of the company (e.g., year-over-year change in revenues and profits). Which performance approach would meet the company’s objective?
Results-based management