ISYE 6414 Regression Analysis: Midterm 2 Exam Questions and Answers (2022-2024)


What is Cook's distance used for?
It measures how much all of the fitted values in the regression model change when the ith observation is removed. Basically, it's a test for outliers/influential observations.

Rule of thumb: let D denote Cook's distance. If D > 4/n, or D > 1, or D is large relative to the other observations, the point may be an outlier and should be investigated (e.g., refit the model with and without it).
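The rule of thumb above can be sketched from scratch for simple linear regression. This is a minimal illustration with hypothetical data (one planted outlier), not the course's R workflow (`cooks.distance`):

```python
# Minimal sketch: Cook's distance for simple linear regression,
# computed by hand on hypothetical data with one planted outlier.
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 12.0]   # last point is the outlier

n = len(x)
p_params = 2                            # intercept + slope
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in resid) / (n - p_params)

def cooks_d(i):
    h = 1 / n + (x[i] - xbar) ** 2 / sxx           # leverage of observation i
    return (resid[i] ** 2 / (p_params * mse)) * h / ((1 - h) ** 2)

D = [cooks_d(i) for i in range(n)]
flagged = [i for i, d in enumerate(D) if d > 4 / n]   # rule-of-thumb cutoff
```

Here only the planted outlier (index 5) exceeds the 4/n cutoff, and its D also exceeds 1.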

If the normality assumption does not hold, we can pursue a transformation in the response variable. T/F
True

If the linearity assumption does not hold, we can pursue a transformation in the response variable. T/F
False, we pursue a transformation in the predictor variables.

R^2 will always increase if we add more predicting variables. T/F
True

If we want to compare models with different numbers of predicting variables, what statistic should we use?
Adjusted R^2 because it adjusts for the number of predicting variables. It doesn’t increase when we add more predicting variables.
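The penalty for extra predictors can be seen directly from the adjusted R^2 formula. A small sketch with hypothetical numbers (n = 50, raw R^2 held fixed at 0.60):

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).
# Holding raw R^2 fixed, more predictors -> larger penalty -> lower adjusted R^2.
def adj_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n, r2 = 50, 0.60          # hypothetical sample size and raw R^2
a5 = adj_r2(r2, n, 5)     # model with 5 predictors
a10 = adj_r2(r2, n, 10)   # model with 10 predictors, same raw R^2
```

The 10-predictor model scores lower than the 5-predictor model even though both have the same raw R^2, which is why adjusted R^2 is the right statistic for comparing models of different sizes.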

A statistic that effectively summarizes how well the X’s are linearly related to Y is the correlation coefficient. T/F
True

T/F – The correlation coefficient cannot be used to evaluate the correlation between the predicting variables for detecting (near) linear dependence among the variables (or multicollinearity)
False, it CAN

How do you diagnose multicollinearity?
Calculate the VIF (variance inflation factor) for each predicting variable

VIF_j = 1 / (1 - R^2_j), where R^2_j is the R^2 from regressing the j-th predictor on the other predictors.

If VIF_j > max(10, 1 / (1 - R^2)), where R^2 is from the full model, then multicollinearity is a problem.

If predictors are correlated but multicollinearity is not detected, is this a problem?
Not necessarily.

What does the VIF measure?
The VIF measures the proportional increase in the variance of the estimated coefficient β̂_j compared to what it would have been if the predicting variables had been completely uncorrelated.
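The VIF computation reduces to one auxiliary regression per predictor. A minimal sketch with two hypothetical predictors, where x2 is nearly a linear function of x1 (strong collinearity), so the VIF blows up:

```python
# Sketch: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on the other predictors. Hypothetical, nearly collinear data.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly 2 * x1

n = len(x1)
m1 = sum(x1) / n
m2 = sum(x2) / n
b = sum((a - m1) * (c - m2) for a, c in zip(x1, x2)) / sum((a - m1) ** 2 for a in x1)
a0 = m2 - b * m1                                   # regress x2 on x1
sse = sum((c - (a0 + b * a)) ** 2 for a, c in zip(x1, x2))
sst = sum((c - m2) ** 2 for c in x2)
r2_j = 1 - sse / sst        # R^2 of predictor j on the other predictor(s)
vif = 1 / (1 - r2_j)        # far above the rule-of-thumb cutoff of 10
```

With R^2_j above 0.99, the VIF is in the hundreds, flagging the collinearity immediately.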

True/False: The response variable in logistic regression is a binary response?
True

True/False: In logistic regression, we model the probability of a success given the predicting variables, not the response itself.
True

What are the assumptions for logistic regression?
Linearity assumption (the logit of the success probability is linear in the predictors)
Independence assumption
Link function assumption: the link function g is the logit

What is the logit function?
The log of the ratio of the probability of success to the probability of failure: logit(p) = ln(p / (1 - p)).

What is the interpretation of the logistic regression coefficient?
The log of the odds ratio for an increase of one unit in the predicting variable. We do not interpret beta with respect to the response variable but with respect to the odds of success.
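This odds-ratio interpretation is easy to verify numerically. A sketch with hypothetical coefficients b0 and b1 (not from any fitted model): the odds ratio for a one-unit increase in x equals exp(b1), no matter where x starts.

```python
import math

# Hypothetical logistic regression coefficients.
b0, b1 = -1.0, 0.7

def odds(x):
    p = 1 / (1 + math.exp(-(b0 + b1 * x)))   # P(success | x)
    return p / (1 - p)                       # odds of success at x

ratio = odds(3.0) / odds(2.0)   # odds ratio for x -> x + 1
# ratio equals exp(b1): the coefficient acts on the odds, not the response
```

Since odds(x) = exp(b0 + b1*x), the ratio collapses algebraically to exp(b1), which is exactly the "log of the odds ratio" interpretation above.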

How many regression coefficients are there for logistic regression?
Since there is no error term, there are p + 1 regression coefficients, including the intercept (and no variance parameter to estimate).

Logistic regression is different from standard linear regression in that
a) It does not have an error term
b) The response variable is not normally distributed.
c) It models probability of a response and not the expectation of the response.
d) All of the above.
d) all of the above

Which one is correct?
a) The logit link function is the only link function that can be used for modeling binary response data.
b) Logistic regression models the probability of a success given a set of predicting variables.
c) The interpretation of the regression coefficients in logistic regression is the same as for standard linear regression assuming normality.
d) None of the above.
b) Logistic regression models the probability of a success given a set of predicting variables.

In logistic regression,
a) The estimation of the regression coefficients is based on maximum likelihood estimation.
b) We can derive exact (closed-form expression) estimates for the regression coefficients.
c) The estimations of the regression coefficients is based on minimizing the sum of least squares.
d) All of the above.
a) The estimation of the regression coefficients is based on maximum likelihood estimation.

Using the R statistical software to fit a logistic regression,
a) We can use the lm() command.
b) The input of the response variable is exactly the same if the binary response data are with or without replications.
c) We can obtain both the estimates and the standard deviations of the estimates for the regression coefficients.
d) None of the above.
c) We can obtain both the estimates and the standard deviations of the estimates for the regression coefficients.

The maximum likelihood estimator can be approximated by the normal distribution. T/F
True

What is the test for statistical significance of a coefficient given all predictors in a logistic regression model?
Z-test (Wald test)

Statistical inference is reliable on logistic regression models with a small N. T/F
False

How do you calculate the deviance statistic for a model with a subset of predictors in logistic regression?
The deviance is minus twice the difference between the log-likelihood under the reduced model and the log-likelihood under the full model: D = 2[log L(full) - log L(reduced)].

The distribution of the deviance test statistic is a chi-squared distribution. T/F
True

How many degrees of freedom does the deviance test statistic have?
q degrees of freedom where ‘q’ is the number of regression coefficients discarded from the full model to get the reduced model.

The hypothesis test for a subset of coefficients in logistic regression is not approximate. T/F
False, it is approximate.

Can you use the hypothesis test typically used for subsets of coefficients in logistic regression models to simply test the whole model for significance?
Yes.

Logistic regression is different from standard linear regression in that

a) The sampling distribution of the regression coefficient is approximate.
b) A large sample data is required for making accurate statistical inferences.
c) A normal sampling distribution is used instead of a t-distribution for statistical inference.
d) All of the above.
d) All of the above.

In logistic regression,

a) The hypothesis test for subsets of coefficients is a goodness of fit test.
b) The hypothesis test for subsets of coefficients is approximate; it relies on large sample size.
c) We can use the partial F test for testing whether a subset of coefficients are all zero.
d) None of the above
b) The hypothesis test for subsets of coefficients is approximate; it relies on large sample size.

T/F We can define residuals for residual analysis in logistic regression for binary data without replications.
False, with replications

T/F Both the Pearson residual and the deviance residual in logistic regression residual analysis have a normal distribution.
True if the model is a good fit

What are the visual analytic methods for testing goodness of fit for a logistic regression model?
A QQ plot or histogram of the Pearson residuals will show whether they are approximately normal and thus whether the model is a good fit.

Residuals vs. predictors checks the linearity and independence assumptions.

Logit of the success rate vs. predictors checks the linearity assumption.

In a hypothesis test for goodness of fit for logistic regression, what is the null hypothesis?
The model fits well.

What are some approaches to take if the logistic regression model is deemed not a good fit?
1) Add more predictors and/or transform the predicting variables to improve linearity.
2) Identify outliers and fit the model with and without them.
3) If the binomial assumption isn't appropriate, correct for overdispersion (e.g., by estimating a dispersion parameter) or try a link function other than the logit.

What is overdispersion?
The variability of the response is larger than would be implied by binomial random variables. Common causes:
-correlation in the observed responses
-heterogeneity in the success probabilities that hasn't been modeled

What is Simpson's paradox?
A reversal of the direction of an association when looking at a marginal relationship versus a conditional relationship.

In logistic regression,

a) We can perform residual analysis for response data with or without replications.
b) Residuals are derived as the fitted values minus the observed responses.
c) The sampling distribution of the residual is approximately normal distribution if the model is a good fit.
d) All of the above.
c) The sampling distribution of the residual is approximately normal distribution if the model is a good fit.

Which one is correct?

a) We can evaluate the goodness of fit of a model using the testing procedure of the overall regression.
b) In applying the deviance test for goodness of fit in logistic regression, we seek large p-values, that is, not reject the null hypothesis.
c) There is no error term in logistic regression and thus we cannot perform a goodness of fit assessment.
d) None of the above.
b) In applying the deviance test for goodness of fit in logistic regression, we seek large p-values, that is, not reject the null hypothesis.

Which is correct?

a) Prediction translates into classification of a future binary response in logistic regression.
b) In order to perform classification in logistic regression, we need to first define a classifier for the classification error rate.
c) One common approach to estimate the classification error is cross-validation.
d) All of the above.
d) All of the above.

Comparing cross-validation methods,

a) The random sampling approach is more computationally efficient than leave-one-out cross validation.
b) In K-fold cross-validation, the larger K is, the higher the variability in the estimation of the classification error is.
c) Leave-one-out cross validation is a particular case of the random sampling cross-validation.
d) None of the above.
b) In K-fold cross-validation, the larger K is, the higher the variability in the estimation of the classification error is.

What is the canonical link function used in modeling data with a Poisson distribution?
The log function.

Does a Poisson regression model have constant variance?
No; the variance equals the mean, so it is not constant.

When the number of units differs across the observed responses in a Poisson regression, what do you add to the model to account for this?
offset(log(exposure))
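The effect of the exposure offset can be sketched numerically: with log(mu) = log(exposure) + b0 + b1*x, the expected count scales linearly in exposure at fixed x. The coefficients below are hypothetical, chosen only for illustration:

```python
import math

# Sketch of a Poisson rate model with an exposure offset:
# log(mu) = log(exposure) + b0 + b1 * x  (hypothetical coefficients).
b0, b1 = -2.0, 0.3

def expected_count(x, exposure):
    return exposure * math.exp(b0 + b1 * x)

# Doubling the exposure doubles the expected count at the same x,
# which is exactly what the fixed-coefficient-of-1 offset enforces.
ratio = expected_count(1.0, 200.0) / expected_count(1.0, 100.0)
```

This is why the offset enters with a coefficient fixed at 1 rather than being estimated.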

Poisson regression can be used:
To model count data.
To model rate response data.
To model response data with a Poisson distribution.
All of the above.
All of the above.

Which one is correct?

a) The standard normal regression, the logistic regression and the Poisson regression all fall under the generalized linear model framework.
b) If we were to apply a standard normal regression to response data with a Poisson distribution, the constant variance assumption would not hold.
c) The link function for the Poisson regression is the log function.
d) All of the above.
d) All of the above.

In Poisson regression,

a) We model the log of the expected response variable not the expected log response variable.
b) We use the ordinary least squares to fit the model.
c) There is an error term.
d) None of the above.
a) We model the log of the expected response variable not the expected log response variable.

Which one is correct?

a) The estimated regression coefficients and their standard deviations are approximate not exact in Poisson regression.
b) We use the glm() R command to fit a Poisson linear regression.
c) The interpretation of the estimated regression coefficients is in terms of the ratio of the response rates.
d) All of the above.
d) All of the above.

Is inference for a Poisson regression based on the normal distribution or the t-distribution?
normal

In Poisson regression,

a) We make inference using t-intervals for the regression coefficients.
b) Statistical inference relies on exact sampling distribution of the regression coefficients.
c) Statistical inference is reliable for small sample data.
d) None of the above.
d) None of the above.

Which one is correct?

a) We use a chi-square testing procedure to test whether a subset of regression coefficients are zero in Poisson regression.
b) The test for subsets of regression coefficients is a goodness of fit test.
c) The test for subsets of regression coefficients is reliable for small sample data in Poisson regression.
d) None of the above.
a) We use a chi-square testing procedure to test whether a subset of regression coefficients are zero in Poisson regression.

What are the assumptions for the Poisson regression model?
Linearity (the log of the expected rate is linear in the predictors), independence, and the Poisson distribution assumption with the log link function (so the variance equals the mean and is not constant).

Residual analysis in Poisson regression can be used

a) To evaluate goodness of fit of the model.
b) To evaluate whether the relationship between the log of the expected response and the predicting variables is linear.
c) To evaluate whether the data are uncorrelated.
d) All of the above.
d) All of the above.

When we do not have a good fit in generalized linear models, it may be that

a) We need to transform some of the predicting variables or to include other variables.
b) The variability of the expected rate is higher than estimated.
c) There may be leverage points that need to be explored further.
d) All of the above.
d) All of the above.

For multiple linear regression, how many degrees of freedom are there?
n - p - 1 (for a model with p predictors and an intercept)

(multiple linear regression) The objective of multiple linear regression is:
a) To predict future new responses
B) To model the association of explanatory variables to a response variable accounting for controlling factors.
C) To test hypothesis using statistical inference on the model.
D) All of the above
D) all of the above

(multiple linear regression) Which one is correct?
A) A multiple linear regression model with p predicting variables but no intercept has p model parameters.
B) The interpretation of the regression coefficients is the same whether or not interaction terms are included in the model.
C) Multiple linear regression is a general model encompassing both ANOVA and simple linear regression.
D) None of the above
C) Multiple linear regression is a general model encompassing both ANOVA and simple linear regression.

(multiple linear regression) Which one is correct?

A) The regression coefficients can be estimated only if the predicting variables are not linearly dependent.
B) The estimated regression coefficient β̂_i is interpreted as the change in the response variable associated with one unit of change in the i-th predicting variable.
C) The estimated regression coefficients will be the same under marginal and conditional model, only their interpretation is not.
D) Causality is the same as association in interpreting the relationship between the response and the predicting variables.
A) The regression coefficients can be estimated only if the predicting variables are not linearly dependent.

(multiple linear regression) Which one correctly characterizes the sampling distribution of the estimated variance?

A) The estimated variance of the error term has a χ^2 distribution regardless of the distribution assumption of the error terms.
B) The number of degrees of freedom for the χ^2 distribution of the estimated variance is n-p-1 for a model without intercept.
C) The sampling distribution of the mean squared error is different of that of the estimated variance.
D) None of the above
none of the above

(multiple linear regression) What is the model interpretation for β̂_i?
β̂_i is the estimated expected change in the response variable associated with one unit of change in the i-th predicting variable, holding fixed all other predictors in the model, for all i = 1, …, p.

What is the difference between a marginal versus conditional relationship?
Marginal: Simple linear regression captures the association of a predicting variable to the response variable marginally, i.e. without consideration of other factors.
Conditional: Multiple linear regression captures the association of a predicting variable to the response variable, conditional on the other predicting variables in the model.
Generally, the estimated regression coefficients for the conditional and marginal relationships can be different not only in magnitude but also in sign or direction of the relationship.

(multiple linear regression) What are the different roles of predicting variables?
Controlling – to control for selection bias in the sample.
Explanatory – to explain variability in the response variable.
Predictive – to best predict variability in the response regardless of their explanatory power.

(multiple linear regression) What is the process for testing subsets of coefficients?
First measure the sum of squares explained by using only X1 to predict Y, then calculate the extra sum of squares explained by adding X2, then X3, and so on until the last predictor; a partial F test then assesses whether the extra sum of squares is significant.
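The resulting partial F statistic is simple arithmetic once the sums of squared errors are in hand. A sketch with hypothetical SSE values and sample sizes (not from any real dataset):

```python
# Partial F test for a subset of coefficients, hypothetical numbers.
sse_reduced = 120.0   # SSE of the model with X1 only
sse_full = 80.0       # SSE of the model with X1, X2, X3
n, p_full = 30, 3     # sample size, predictors in the full model
q = 2                 # number of coefficients being tested (X2, X3)

# F = [ (SSE_reduced - SSE_full) / q ] / [ SSE_full / (n - p_full - 1) ]
f_stat = ((sse_reduced - sse_full) / q) / (sse_full / (n - p_full - 1))
# Compare f_stat against an F(q, n - p_full - 1) critical value.
```

A large F (here 6.5) means the extra sum of squares explained by X2 and X3 is big relative to the full model's error, so the subset is significant.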

(multiple linear regression) The sampling distribution of the estimated regression coefficients is

A) Centered at the true regression parameters.
B) The t-distribution assuming that the variance of the error term is unknown and replaced by its estimate.
C) Dependent on the design matrix
D) All of the above
D) All of the above

(multiple linear regression) The estimators for the regression coefficients are:

A) Biased but with small variance
B) Unbiased under normality assumptions but biased otherwise.
C) Biased regardless of the distribution of the data.
D) Unbiased regardless of the distribution of the data.
D) Unbiased regardless of the distribution of the data.

(multiple linear regression) We can test for a subset of regression coefficients

A) Using the F statistic test of the overall regression.
B) Only if we are interested in whether additional explanatory variables should be considered in addition to the controlling variables.
C) To evaluate whether all regression coefficients corresponding to the predicting variables excluded from the reduced model are statistically significant.
D) None of the above
D) None of the above

(multiple linear regression) What are the sources of prediction uncertainty when using a regression line to predict future responses?

1. Due to the new observations
2. Due to the parameter estimates of the betas

T/F In multiple linear regression, we need the linearity assumption to hold for at least one of the predicting variables
FALSE

T/F Multicollinearity in the predicting variables will impact the standard deviations of the estimated coefficients.
TRUE

T/F The presence of certain types of outliers can impact the statistical significance of some of the regression coefficients.
TRUE

T/F When making a prediction for predicting variables on the “edge” of the space of predicting variables, then its uncertainty level is high.
TRUE

T/F The prediction of the response variable and the estimation of the mean response have the same interpretation.
FALSE

T/F In multiple linear regression, a VIF value of 6 for a predictor means that 80% of the variation in that predictor can be modeled by the other predictors.
FALSE
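The statement above is false by direct computation: inverting VIF_j = 1 / (1 - R^2_j) gives R^2_j = 1 - 1/VIF_j.

```python
# A VIF of 6 implies R^2_j = 1 - 1/6 ≈ 0.833, i.e. about 83.3% of that
# predictor's variation is explained by the other predictors, not 80%.
vif = 6.0
r2_j = 1 - 1 / vif
```

(A VIF of exactly 5 would correspond to the 80% figure in the question.)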

T/F We can use a t-test to test for the statistical significance of a coefficient given all predicting variables in a multiple linear regression model.
TRUE

T/F Multicollinearity can lead to less accurate statistical significance of some of the regression coefficients.
TRUE

T/F The estimator of the mean response is unbiased.
TRUE

T/F The sampling distribution of the prediction of the response variable is a χ 2(chi-squared) distribution.
FALSE

T/F Multicollinearity in multiple linear regression means that the rows in the design matrix are (nearly) linearly dependent.
FALSE

T/F A linear regression model has high predictive power if the coefficient of determination is close to 1.
TRUE

T/F In multiple linear regression, if the coefficient of a quantitative predicting variable is negative, that means the response variable will decrease as this predicting variable increases.
FALSE

T/F Cook’s distance measures how much the fitted values (response) in the multiple linear regression model change when the ith observation is removed.
TRUE

T/F The prediction of the response variable has the same levels of uncertainty compared with the estimation of the mean response.
FALSE

T/F The coefficient of variation is used to evaluate goodness-of-fit.
FALSE

T/F Influential points in multiple linear regression are outliers.
TRUE

T/F We could diagnose the normality assumption using the normal probability plot.
TRUE

T/F If the VIF for each predicting variable is smaller than a certain threshold, then we can say that multicollinearity does not exist in this model.
FALSE

T/F If the residuals are not normally distributed, then we can model instead the transformed response variable where the common transformation for normality is the Box-Cox transformation.
TRUE

T/F If the constant variance assumption does not hold in multiple linear regression, we apply a transformation to the predicting variables.
FALSE

T/F Multicollinearity in multiple linear regression means that the columns in the design matrix are (nearly) linearly dependent.
TRUE

T/F In logistic regression, R2 could be used as a measure of explained variation in the response variable.
FALSE

T/F The interpretation of the regression coefficients is the same for both Logistic and Poisson regression.
FALSE

T/F We estimate the regression coefficients in Poisson regression using the maximum likelihood estimation approach.
TRUE

T/F The F-test can be used to test for the overall regression in Poisson regression.
FALSE

T/F A logistic regression model may not be a good fit if the responses are correlated or if there is heterogeneity in the success probabilities that hasn’t been modeled.
TRUE

T/F Trying all three link functions for a logistic regression model (complementary log-log, probit, logit) will produce models with the same goodness of fit for a dataset.
FALSE

T/F A Poisson regression model fit to a dataset with a small sample size will have a hypothesis testing procedure with more Type I errors than expected.
TRUE

T/F If a Poisson regression model does not have a good fit, the relationship between the log of the expected rate and the predicting variables might be not linear.
TRUE

T/F If a logistic regression model provides accurate classification, then we can conclude that it is a good fit for the data.
False

T/F The logit function is the log of the ratio of the probability of success to the probability of failure. It is also known as the log odds function.
True

T/F We interpret logistic regression coefficients with respect to the response variable.
False

T/F The log-likelihood function is a linear function with a closed-form solution.
False

T/F In logistic regression, there is not a linear relationship between the probability of success and the predicting variables.
True

T/F We can use a Z-test to test for the statistical significance of a coefficient given all predicting variables in a Poisson regression model.
True

T/F The number of parameters that need to be estimated in a logistic regression model with 5 predicting variables and an intercept is the same as the number of parameters that need to be estimated in a standard linear regression model with an intercept and same predicting variables.
False

T/F Although there are no error terms in a logistic regression model using binary data with replications, we can still perform residual analysis.
True

T/F A goodness-of-fit test should always be conducted after fitting a logistic regression model without replications.
False

T/F For a classification model, training error tends to underestimate the true classification error rate of the model.
True

T/F The binary response variable in logistic regression has a Bernoulli distribution.
True

T/F For logistic regression, if the p-value of the deviance test for goodness-of-fit is large, then it is an indication that the model is a good fit.
True

T/F The error term in logistic regression has a normal distribution.
False

T/F The estimated regression coefficients in Poisson regression are approximate.
True

T/F In Poisson regression, there is a linear relationship between the log rate and the predicting variables.
True

T/F Under logistic regression, the sampling distribution used for a coefficient estimator is a chi-square distribution.
False

T/F An overdispersion parameter close to 1 indicates that the variability of the response is close to the variability estimated by the model.
True

T/F When testing a subset of coefficients, deviance follows a chi-square distribution with q degrees of freedom, where q is the number of regression coefficients in the reduced model.
False

T/F For both logistic and Poisson regression, both the Pearson and deviance residuals should approximately follow the standard normal distribution if the model is a good fit for the data.
True

T/F The logit link function is the best link function to model binary response data because the models produced always fit the data better than other link functions.
False

sample residuals
The ___ for multiple linear regression do not have constant variance.

error terms
The __ for multiple linear regression have constant variance.

normality
QQ plot and histogram are used to assess what in MLR?

linearity
Residuals vs predictor are used to assess what in MLR

constant variance and independence
Residuals vs fitted values are used to assess what in MLR

leverage points
Points that are far from the mean of the x’s are called

Influential points
Points that are far from the mean of the x’s and y’s are called

outliers
It is good practice to perform regression analysis with and without what?

Cook’s distance
What is used to quantify outliers?

Cook’s distance
How much all the values in the model change when the ith value is removed is known as what?

D_i > 4/n, or D_i > 1, or a large D_i
Cook's distance values that should be investigated.

R-Squared
The proportion of variability in Y that can be explained by the predictor variables.

Controlling factors
Model variables used to account for selection bias.

Indicator variable
Continuous variables are converted to __ when there is a distinct gap in a variable distribution.

n-p-1
The number of degrees of freedom for a T test for the statistical significance of a MLR coefficient?

At least one variable has explanatory power on the response variable.
When the regression model has a high F value/low p-value.

order matters
When testing subsets of coefficients using the anova command.

If the variable is not very granular
When can “year” be used as a qualitative variable?

Pearson Chi-squared test
Used to evaluate the relationship between any two qualitative variables.

table
A command needed prior to running a Pearson chi-squared test of qualitative variables.

Reduce the dummy variables into groups
What should you do when you have a high number of predicting variables due to a large number of categorical variables resulting in numerous dummy variables.

The first
Which category does R choose as the baseline label when creating dummy variables with as.factor()

No baseline for comparison
If you use a model without an intercept, how will interpreting coefficients be different?

independent
The statistical significance of a predicting variable in the marginal and conditional models is _ .

cross validation
No analysis of prediction is complete without evaluating the performance of the model using this technique.

less bias
Higher number of folds in k-fold cross validation means what?

large samples
For logistic regression, the statistical inference based on the normal distribution applies only under what?

Model is a good fit
In goodness of fit tests, what is the null hypothesis?

True
In MLR, the F test is used to evaluate the overall regression.

True
In MLR, the coefficient of variation is interpreted as the percentage of variability in the response variable explained by the model.

False
Residual analysis is used to measure the predictive value of a model.

False
In the presence of multicollinearity, the coefficient of variation decreases.

False
In the presence of multicollinearity, the regression coefficients will tend to be identified as statistically significant even if they are not.

False
In the presence of multicollinearity, the prediction will not be impacted.

True
If the linearity assumption with respect to one or more predictors does not hold, then we use transformations of the corresponding predictors to improve on this assumption.

False
If the normality assumption does not hold, we transform the predictive variables, commonly using the Box-Cox transformation.

True
If the constant variance assumption does not hold, we transform the response variable.

False
The residuals have constant variance for the multiple linear regression model.

False
The residuals vs fitted can be used to assess the assumption of independence.

False
The residuals have a t-distribution if the error term is assumed to have a normal distribution.

True
The logistic regression does not have an error term.

False
The logistic regression response variable is normally distributed.

True
The logistic regression model models the probability of a response and not the expectation of a response.

False
The logit function is the only link function used in logistic regression.

ln(p / (1 - p))
What is the formula for the logit function?

False
The interpretation of the regression coefficients in logistic regression is the same as for standard linear regression assuming normality.

maximum likelihood estimation
The estimation of the regression coefficients is based on what?

False
We can derive exact (closed-form expression) estimates for the regression coefficients.

False
In logistic regression, the estimation of the regression coefficients is based on minimizing the sum of least squares.

glm(…, family=”binomial”)
The function to perform logistic regression in R is what?

False
In R, the input of the response variable is exactly the same if the binary response data are with or without replications.

True
In R, we can obtain both the estimates and the standard deviations of the estimates for the regression coefficients for a logistic regression model.

True
In logistic regression, the sampling distribution of the regression coefficient is approximate.

True
Logistic regression requires large sample data for making accurate statistical inferences.

True
In logistic regression, a normal sampling distribution is used instead of a t-distribution for statistical inference.

False
In logistic regression, the hypothesis test for subsets of coefficients is a goodness of fit test.

True
In logistic regression, the hypothesis test for subsets of coefficients is approximate; it relies on large sample size.

False
In logistic regression, we can use the partial F test for testing whether a subset of coefficients are all zero.

Data with replications
What is required, in logistic regression, to perform residual analysis?

False
In logistic regression, residuals are derived as the fitted values minus the observed responses.

True
In logistic regression, the sampling distribution of the residual is approximately normal if the model is a good fit.

False
In logistic regression, we can evaluate the goodness of fit of a model using the testing procedure of the overall regression.

Large
In applying the deviance test for goodness of fit in logistic regression, we seek __ p-values.

The model is a good fit
What is the null hypothesis for a logistic regression goodness of fit test?

False
In logistic regression, because there is no error term, we cannot perform a goodness of fit assessment.

True
In logistic regression, prediction translates into classification of a future binary response.

True
In order to perform classification in logistic regression, we need to first define a classifier for the classification error rate.

True
One common approach to estimate the classification error is cross-validation.

False
In logistic regression, the random sampling approach is more computationally efficient than leave-one-out cross validation.

True
In K-fold cross-validation, the larger K is, the higher the variability in the estimation of the classification error is.

False
Leave-one-out cross validation is a particular case of the random sampling cross-validation.
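
The cross-validation ideas above can be sketched in a few lines. A minimal pure-Python illustration with toy data and a hypothetical threshold classifier (both assumed only for this example); leave-one-out CV is just K-fold with K = n:

```python
import random

def kfold_error(xs, ys, k, seed=0):
    """Estimate classification error by K-fold cross-validation.
    Leave-one-out CV is the special case k == len(xs)."""
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)        # fixed seed for reproducibility
    folds = [idx[i::k] for i in range(k)]   # k roughly equal folds
    errors = 0
    for fold in folds:
        train = [i for i in idx if i not in fold]
        # hypothetical classifier: threshold at the mean of the training x's
        threshold = sum(xs[i] for i in train) / len(train)
        for i in fold:
            errors += (1 if xs[i] > threshold else 0) != ys[i]
    return errors / n

xs = [0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(kfold_error(xs, ys, k=4))        # 4-fold estimate
print(kfold_error(xs, ys, k=len(xs)))  # leave-one-out estimate
```

Note that larger K means more model fits (n of them for leave-one-out), which is why random sampling with a few splits is cheaper.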

Poisson regression
Count data can be modeled with what?

Poisson regression
Rate data can be modeled with what?

True
Poisson regression can be used to model response data with a Poisson distribution.

True
The standard normal regression, the logistic regression and the Poisson regression all fall under the generalized linear model framework.

True
If we were to apply a standard normal regression to response data with a Poisson distribution, the constant variance assumption would not hold.

True
The link function for the Poisson regression is the log function.

True
We model the log of the expected response variable not the expected log response variable.

False
In Poisson regression, we use the ordinary least squares to fit the model.

False
There is an error term in Poisson regression.

True
The estimated regression coefficients and their standard deviations are approximate not exact in Poisson regression.

glm(..., family = "poisson")
We use the _ R command to fit a Poisson linear regression.
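
For illustration only, a minimal pure-Python sketch of the Newton-Raphson iteration that a glm-style Poisson fit performs, shown for the intercept-only model log(mu) = b0, whose MLE is known to be log(mean(y)):

```python
import math

def fit_poisson_intercept(y, iters=50):
    """Newton-Raphson for the intercept-only Poisson model
    log(mu) = b0; the MLE is log(mean(y))."""
    n, b0 = len(y), 0.0
    for _ in range(iters):
        mu = math.exp(b0)
        score = sum(y) - n * mu   # d(loglik)/d(b0)
        info = n * mu             # Fisher information
        b0 += score / info        # Newton step
    return b0

y = [1, 2, 4, 8]                          # toy count data
b0_hat = fit_poisson_intercept(y)
print(b0_hat, math.log(sum(y) / len(y)))  # both ≈ log(3.75)
```

R's glm does the same kind of iteratively reweighted fit, generalized to a full predictor matrix.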

True
The interpretation of the estimated regression coefficients is in terms of the ratio of the response rates.
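
A quick numeric check of this rate-ratio interpretation (the coefficient values below are assumed, for illustration only):

```python
import math

# Hypothetical fitted coefficients of a Poisson model (assumed values):
b0, b1 = 0.5, 0.25

def rate(x):
    return math.exp(b0 + b1 * x)   # expected rate at x

# A one-unit increase in x multiplies the expected rate by exp(b1):
print(rate(3) / rate(2), math.exp(b1))   # the two numbers agree
```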

False
In logistic regression, we make inference using t-intervals for the regression coefficients.

False
Statistical inference relies on exact sampling distribution of the regression coefficients.

False
In Poisson regression, statistical inference is reliable for small sample data.

chi-squared
We use a __ testing procedure to test whether a subset of regression coefficients are zero in Poisson regression.
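
A sketch of that chi-squared (deviance-difference) test in pure Python, using the closed-form chi-squared survival function that exists for even degrees of freedom; the deviance values are assumed, for illustration only:

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function P(X > x) for a chi-squared with EVEN df
    (closed form; enough for this illustration)."""
    assert df % 2 == 0
    k = df // 2
    term, total = 1.0, 0.0
    for j in range(k):
        if j > 0:
            term *= (x / 2) / j   # (x/2)^j / j! built up incrementally
        total += term
    return math.exp(-x / 2) * total

# Hypothetical deviances of a reduced and a full model (assumed numbers);
# q is the number of coefficients set to zero under the null:
dev_reduced, dev_full, q = 112.4, 104.6, 2
p_value = chi2_sf_even_df(dev_reduced - dev_full, q)
print(round(p_value, 4))   # small p-value: reject that the subset is all zero
```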

False
In Poisson regression, the test for subsets of regression coefficients is a goodness of fit test.

False
In Poisson regression, the test for subsets of regression coefficients is reliable for small sample data.

Residual analysis
To evaluate goodness of fit of a Poisson model, we use _.

True
Residual analysis for Poisson regression can be used to evaluate whether the relationship between the log of the expected response and the predicting variables is linear.

True
Residual analysis of Poisson models can be used to evaluate whether the data are uncorrelated.

True
When we don't have goodness of fit in a Poisson model, we may need to transform some of the predicting variables or to include other variables.

True
If the variability of the expected rate is higher than estimated, the model might not be a good fit.

True
Leverage points may affect the goodness of fit of a Poisson model.

Type I errors
Small sample sizes can lead to what in logistic regression?

p (Number of predictors)
In an overall test of a logistic regression model, the distribution used is a chi-squared with how many degrees of freedom?

q (Number of removed predictors)
A partial test of coefficients in a logistic regression model uses a chi-squared test with how many degrees of freedom?

Simpson’s Paradox
An example of _ is when there is a reversal of association when looking at the marginal vs conditional models.

Log odds
ln(p/(1-p)) is referred to as what?
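
A minimal sketch of the log odds and its inverse (the logistic function):

```python
import math

def logit(p):
    return math.log(p / (1 - p))    # log odds

def inv_logit(z):
    return 1 / (1 + math.exp(-z))   # logistic function: back to a probability

z = logit(0.8)
print(z, inv_logit(z))   # log odds of p = 0.8, then round-tripped back to 0.8
```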

False
In multiple linear regression, we need the linearity assumption to hold for at least one of the predicting variables.

True
Multicollinearity in the predicting variables will impact the standard deviations of the estimated coefficients

True
The presence of certain types of outliers can impact the statistical significance of some of the regression coefficients.

True
When making a prediction for predicting variables on the "edge" of the space of predicting variables, the uncertainty level of the prediction is high.

False
The prediction of the response variable and the estimation of the mean response have the same interpretation.

True
In multiple linear regression, the VIF formula is VIF_i = 1/(1 - R^2_i).
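
For example, given (assumed) R^2_j values obtained by regressing each predictor on all the other predictors:

```python
# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j
# on all the other predictors. The R^2_j values below are assumed:
r2 = {"x1": 0.10, "x2": 0.90, "x3": 0.99}
vif = {name: 1 / (1 - r) for name, r in r2.items()}
print(vif)   # x2 and x3 show serious inflation (about 10 and 100)
```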

True
We can use a t-test to test for the statistical significance of a coefficient given all predicting variables in a multiple linear regression model.

True
Multicollinearity can lead to less accurate statistical significance of some of the regression coefficients.

True
The estimator of the mean response is unbiased.

Chi-squared with n-p-1 degrees of freedom
In MLR, sigma hat squared, scaled by (n-p-1)/sigma squared, is distributed

False
Multicollinearity in multiple linear regression means that the rows in the design matrix are nearly linearly dependent.

True
A linear regression model has high predictive power if the coefficient of determination is close to 1.

False
In multiple linear regression, if the coefficient of a quantitative predicting variable is negative, that means that the response variable will decrease as this predicting variable increases.

True
Cook's distance measures how much the fitted values (response) in the MLR model change when the i-th observation is removed.
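
A minimal pure-Python sketch for the simple-linear-regression case, where the leverages have a closed form (toy data; the last point is placed so that it is influential):

```python
def cooks_distance_slr(x, y):
    """Cook's distance for each point of a simple linear regression,
    D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2 with p = 2 parameters."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - 2)           # MSE
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # leverages h_ii
    return [e ** 2 * h / (2 * s2 * (1 - h) ** 2)
            for e, h in zip(resid, lev)]

# Toy data: the last point is far out in x and off the trend in y:
d = cooks_distance_slr([1, 2, 3, 4, 10], [1, 2, 3, 4, 0])
print([round(di, 3) for di in d])   # the last D is by far the largest
```

The last point's D exceeds both rules of thumb from the notes (D > 4/n and D > 1), flagging it as a possible outlier.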

False
The prediction of the response variable has the same levels of uncertainty compared with the estimation of the mean response.

False
The coefficient of variation is used to evaluate goodness of fit

True
Influential points in multiple linear regression are outliers.

True
We could diagnose the normality assumption using the normal probability plot.

False
If the VIF for each predicting variable is smaller than a certain threshold, then we can say that multicollinearity does not exist in this model.

True
If the residuals are not normally distributed, then we can model instead the transformed response variable, where the common transformation for normality is the Box-Cox transformation.
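
A minimal sketch of the Box-Cox family (lambda = 0 reduces to the log transform):

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive response value y."""
    if lam == 0:
        return math.log(y)           # lambda = 0 is the log transform
    return (y ** lam - 1) / lam

print(box_cox(4.0, 0.5))   # (sqrt(4) - 1) / 0.5 = 2.0
print(box_cox(4.0, 0.0))   # log(4)
```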
