HCAD 750 EXAM 5; Louisiana State University/Questions & Answers

the process of finding correlations or patterns among the data

  • facilitates data exploration
  • extracts useful knowledge hidden in the data
    data mining

using patient data for any purpose beyond providing care to the individual patient raises tricky privacy issues and the risk of the information falling into the wrong hands. There are significant legal issues related to the use of patient data in data mining efforts, specifically related to the de-identification, aggregation, and storage of the data. Failing to take the appropriate steps when using personal health data as a tool for population health could lead to serious consequences
HIPAA in relation to data mining

-perform induction on the current data in order to make predictions.
Predictive Data Mining

-the ability of a device, machine, etc. to take in numerous types of data and learn from the data in order to produce knowledge.
Meta-learning

  • investigates how computers can learn based on data
  • automatically learn to recognize complex patterns and make intelligent decisions on their own based on the data
    Machine Learning

refers to the process of reducing the inputs for processing and analysis, or finding the most meaningful inputs.
Feature Selection
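
As an illustration of one simple filter-style approach to feature selection, the sketch below drops near-constant columns; the data, column names, and variance threshold are all invented for the example:

```python
# Minimal feature-selection sketch: drop near-constant (low-variance) columns,
# a simple "filter" approach to keeping only the most meaningful inputs.
# The data and threshold below are illustrative, not from the course material.

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def select_features(rows, names, threshold=0.01):
    """Keep only the columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))
    return [name for name, col in zip(names, columns) if variance(col) > threshold]

# Column "flag" is constant, so it carries no information and is dropped.
data = [(5.1, 1.0, 0), (4.9, 2.0, 0), (6.2, 1.5, 0), (5.8, 3.0, 0)]
kept = select_features(data, ["height", "score", "flag"])
```

Real pipelines use more sophisticated criteria (correlation with the target, wrapper methods), but the goal is the same: fewer, more informative inputs.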

  • can be applied to obtain a compressed representation of the data set that is much smaller in volume, yet maintains the integrity of the original data
  • used when the selected data are too complex or too large
    Data reduction
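
A minimal sketch of numerosity reduction, one common data-reduction technique: many raw values are collapsed into a few bin means that preserve the overall shape of the data. The readings are made-up numbers:

```python
# Data-reduction sketch: replace each bin of raw values with its mean
# (numerosity reduction by binning). The readings are invented.

def reduce_by_bin(values, bin_size):
    """Collapse consecutive bins of raw values into their means."""
    return [
        sum(values[i:i + bin_size]) / len(values[i:i + bin_size])
        for i in range(0, len(values), bin_size)
    ]

readings = [10, 12, 11, 50, 52, 48, 30, 31, 29]   # 9 raw values
compressed = reduce_by_bin(readings, 3)            # 3 representative values
```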

to request or seek out additional information on a specific subject. Makes the data more detailed.
drill down
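
The idea can be sketched in code: summarize at a coarse level, then drill down into one category for more detail. The records below are hypothetical:

```python
# Drill-down sketch: start from a high-level summary, then request more
# detail on one specific category. Sample records are invented.

records = [
    {"region": "South", "city": "Baton Rouge", "visits": 120},
    {"region": "South", "city": "New Orleans", "visits": 200},
    {"region": "North", "city": "Monroe", "visits": 80},
]

def summarize(rows, key):
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row["visits"]
    return totals

by_region = summarize(records, "region")              # coarse view
south_detail = summarize(
    [r for r in records if r["region"] == "South"], "city"
)                                                     # drilled-down view
```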

  • is an ensemble of models combined sequentially.
  • can be used to classify data
  • the base learners are not combined by voting; instead a meta-learner takes their outputs as inputs and produces the final prediction
    Stacking
  • each of the data classifications is weighted.
  • once the system learns, it continuously updates which classifications are incorrect, and the weights shift to reflect accuracy
    Boosting
  • method used to increase accuracy with data mining
  • majority vote: the more times a classification is picked, the more reliable it is considered
  • algorithm creates an ensemble of models for learning scheme where each model gives an equally weighted prediction
    Bagging (Bootstrap Aggregating)
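
The three ensemble methods above can be illustrated with bagging, the simplest of them: train copies of a base learner on bootstrap resamples, then take an equally weighted majority vote. The sketch uses a 1-nearest-neighbor base learner purely to stay self-contained; real bagging wraps any learning scheme:

```python
# Bagging sketch: each model is trained on a bootstrap resample (sampling
# with replacement), and predictions are combined by majority vote.
import random

def fit_1nn(train):
    """Toy base learner: predict the label of the nearest training value."""
    def predict(x):
        return min(train, key=lambda pair: abs(pair[0] - x))[1]
    return predict

def bagging(train, n_models, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        resample = [rng.choice(train) for _ in train]  # bootstrap sample
        models.append(fit_1nn(resample))
    def predict(x):
        votes = [m(x) for m in models]           # one equally weighted vote each
        return max(set(votes), key=votes.count)  # majority vote
    return predict

train = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
ensemble = bagging(train, n_models=5)
```

Boosting would instead train the models sequentially and reweight misclassified examples; stacking would replace the vote with a meta-learner.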

the five steps of the Six Sigma improvement methodology: define, measure, analyze, improve, and control
DMAIC

  • can explain why data behaves a certain way
  • not necessarily a data mining technique, but a model used to give more of an answer to “why” and “how” in regard to data information.
  • adds additional steps to mining that yields better results
    Six Sigma

is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
Big Data

how we make sense of the data by converting them from their raw form to a more informative one

  • sometimes known as model building or pattern identification
  • yields a highly predictive, consistent pattern-identifying model
  • pattern discovery is a complex phase of data mining
    Exploratory data analysis (EDA)

due to a need for standardized data mining techniques, this concept and tool was developed.
Sample – selecting the data
Explore – looking for relationships between variables in the data
Modify – selecting, creating, and transforming variables in preparation for data modeling
Model – applying various modeling techniques to gain the desired outcome
Assess – evaluating the reliability and usefulness of the findings
SEMMA

Cross Industry Standard Process for Data Mining
six steps: business understanding, data understanding, data preparation, modeling, evaluation, and deployment
most projects move back and forth between steps as necessary
CRISP-DM

“this data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals.”
big data

producing a solution that generates useful forecasting:

  1. problem identification
  2. exploration of the data
  3. pattern discovery
  4. knowledge deployment – application to new data to forecast predictions
    4 phases of data mining

transforms repositories of big data into comprehensible knowledge that is useful for guiding practice and facilitating interdisciplinary research
Knowledge Discovery and Research

  • data mining method for analyzing outcomes and service use
  • used to classify and predict an outcome
    Classification and Regression Trees (CART)
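
A sketch of the core CART step: choosing the binary split of a numeric attribute that minimizes Gini impurity. The ages and labels below are invented for illustration:

```python
# CART-style sketch: pick the threshold on one numeric attribute that gives
# the lowest weighted Gini impurity across the two resulting branches.

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(points):
    """points: list of (value, label). Return the best threshold."""
    best = None
    values = sorted(v for v, _ in points)
    for left_v, right_v in zip(values, values[1:]):
        threshold = (left_v + right_v) / 2
        left = [c for v, c in points if v <= threshold]
        right = [c for v, c in points if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(points)
        if best is None or score < best[0]:
            best = (score, threshold)
    return best[1]

# Ages below ~50 are "no" and above are "yes", so the best split
# lands between 45 and 60.
data = [(30, "no"), (40, "no"), (45, "no"), (60, "yes"), (70, "yes")]
```

A full CART implementation applies this step recursively to grow the tree, then prunes it.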
  1. enhance business aspects
  2. help to improve patient care
    Benefits of KDD
  1. dependent on the use of private health information
  2. ensure data are de-identified and confidentiality maintained
  3. follow changes and specific requirements for compliance with HIPAA laws
    ethics of data mining

thoughtful, planned activity that expands or refines knowledge. the purpose of research is to create generalized knowledge.
research

  1. manipulation of treatment
  2. random assignment to the group
    difference between quasi-experimental research and experimental research
  • the statistical analysis of a large collection of results from individual studies for the purpose of integrating findings
  • the integrative analysis of findings from many studies that examined the same question
    meta-analysis
  • set of connected input/output units in which each connection has a weight associated with it
  • AKA connectionist learning – the connections between units
    neural networks
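
A single unit of such a network can be sketched directly: a weighted sum of the inputs plus a bias, passed through a step activation. The weights below are hand-picked for illustration (training would normally learn them):

```python
# Neural-network sketch: one connectionist "unit" computes a weighted sum of
# its inputs and fires if the sum clears zero. The weight on each connection
# is exactly what training adjusts.

def unit(inputs, weights, bias):
    """A single unit with a step activation: returns 1 (fires) or 0."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# With these hand-picked weights the unit behaves like a logical AND gate.
weights, bias = [1.0, 1.0], -1.5
```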

a flowchart-like structure and a decision support tool that uses a model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.

Consists of three types of nodes: decision nodes, chance nodes, end nodes
decision trees
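
The comparison made at a decision node can be sketched as follows: each branch's value is the probability-weighted sum over its chance node's outcomes. The probabilities and payoffs are invented for illustration:

```python
# Decision-tree sketch: a decision node picks the branch with the higher
# expected value, where a chance node weights each outcome by probability.

def expected_value(chance_node):
    """chance_node: list of (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in chance_node)

# Decision: treat now (certain modest benefit) vs. wait (uncertain outcome).
treat_now = [(1.0, 60)]
wait = [(0.5, 100), (0.5, 10)]

choice = "treat_now" if expected_value(treat_now) > expected_value(wait) else "wait"
```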

identifies patterns expressed as if/then statements. Statistical significance tests are used on the data to validate the rules
Rule Induction

a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.
Algorithm

classifiers that use distance-based comparisons, which intrinsically assign equal weight to each attribute
nearest neighbor
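
A minimal 1-nearest-neighbor classifier under Euclidean distance, which, as the definition notes, weights every attribute equally; the training points are hypothetical:

```python
# Nearest-neighbor sketch: classify a query point by the label of its
# closest training example. Euclidean distance treats all attributes equally,
# which is why scaling attributes matters in practice.
import math

def nearest_neighbor(train, query):
    """train: list of (features, label); return the closest example's label."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    closest = min(train, key=lambda pair: distance(pair[0], query))
    return closest[1]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B")]
```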

discovery by computer of new, previously unknown information, by automatically extracting information from written resources
text mining
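
A toy sketch of the extraction step: pull frequent terms out of free text after removing common stopwords. The clinical note and the stopword list are illustrative only:

```python
# Text-mining sketch: automatically extract candidate key terms from written
# text by counting word frequencies after dropping common stopwords.
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "to", "in", "with", "was", "for"}

def key_terms(text, n=3):
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]

note = ("Patient reports chest pain. Chest pain worsens with exertion. "
        "Pain relieved by rest.")
```

Real text-mining systems go much further (stemming, phrase detection, entity recognition), but frequency counting is the usual starting point.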

A method of querying and reporting that takes data from standard relational databases, calculates and summarizes the data, and then stores the data in a special database called a data cube.
Online Analytical Processing (OLAP)
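
A tiny version of the cube-building step can be sketched with plain dictionaries, using "ALL" to stand for a rolled-up dimension; the fact rows are invented:

```python
# OLAP sketch: precompute a small "data cube" -- aggregate counts for every
# combination of two dimensions, including rolled-up "ALL" totals -- so that
# later queries become simple lookups.
from itertools import product

facts = [
    ("cardiology", "Jan", 40), ("cardiology", "Feb", 35),
    ("oncology",   "Jan", 25), ("oncology",   "Feb", 30),
]

def build_cube(rows):
    """Sum the measure for each (dept, month) cell, with 'ALL' roll-ups."""
    cube = {}
    for dept, month, count in rows:
        for d, m in product((dept, "ALL"), (month, "ALL")):
            cube[(d, m)] = cube.get((d, m), 0) + count
    return cube

cube = build_cube(facts)
```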

selecting specific on-screen data points to identify their characteristics or to examine their effects on relations between variables
-used during EDA
brushing

analysis of original research data by the researchers who collected them
primary analysis

the process of detecting, diagnosing, and editing faulty data
data cleansing
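
A minimal sketch of the detect-and-edit step: flag records whose value is missing or outside a plausible range. The field name and valid range are assumptions for the example:

```python
# Data-cleansing sketch: detect faulty records (missing or out-of-range
# values) so they can be edited or excluded before analysis.

def clean(records, field, low, high):
    """Split records into (good, faulty) by validating one field."""
    good, faulty = [], []
    for rec in records:
        value = rec.get(field)
        if value is None or not (low <= value <= high):
            faulty.append(rec)
        else:
            good.append(rec)
    return good, faulty

patients = [
    {"id": 1, "age": 42},
    {"id": 2, "age": None},   # missing value -> faulty
    {"id": 3, "age": 208},    # impossible value -> faulty
    {"id": 4, "age": 67},
]
good, faulty = clean(patients, "age", 0, 120)
```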

refers to the ability to access and extract data from any data source

  • access to data depends on the type of data and their location
  • can range from totally uncontrolled to highly protected
    Data Access

pertains to the handling and maintenance of data so that the data are not divulged to others without the research participant's permission
data confidentiality

the analysis of the original work of another person or organization
secondary analysis

pertains to data that have no identifiers linked to them and cannot be traced back to the research participant
data anonymity
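
A sketch of the de-identification step that anonymity depends on: strip direct identifiers from each record. The identifier list here is illustrative and far shorter than the 18 identifiers in the HIPAA Safe Harbor method:

```python
# De-identification sketch: remove direct identifiers so a record cannot be
# traced back to the participant. This identifier set is a small
# illustrative subset, not the full HIPAA Safe Harbor list.

IDENTIFIERS = {"name", "ssn", "address", "phone"}

def de_identify(record):
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in IDENTIFIERS}

row = {"name": "J. Doe", "ssn": "000-00-0000", "age": 54, "diagnosis": "I10"}
safe = de_identify(row)
```

Note that dropping fields alone does not guarantee anonymity; rare combinations of remaining attributes can still re-identify someone, which is why expert-determination methods also exist.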

US federal policy that specifies ethics regulations for human subjects research
Common Rule

data mining
the process of analyzing data to extract information not offered by the raw data alone

  • facilitates data exploration
  • looks at the data from different vantage points
  • brings new insights to the data set

Predictive Data Mining
is data mining that is done for the purpose of using business intelligence or other data to forecast or predict trends. This type of data mining can help business leaders make better decisions and can add value to the efforts of the analytics team.

Meta-learning
A subfield of machine learning where automatic learning algorithms are applied to metadata about machine learning experiments. As of 2017, the term had not found a standard interpretation; however, the main goal is to use such metadata to understand how automatic learning can become flexible in solving learning problems

Machine Learning
is a branch of artificial intelligence devoted to systems that learn from data rather than from explicit programming. Scientists and engineers hope machine learning will eventually help machines make unguided choices by independently interpreting input from the world around them.

Feature Selection
refers to the process of reducing the inputs for processing and analysis, or of finding the most meaningful inputs. A related term, feature engineering (or feature extraction), refers to the process of extracting useful information or features from existing data.

drill down
to request — or seek out — additional information on a specific subject. In a GUI-environment, drilling down may involve clicking on a link or other representation to reveal more detail

Stacking
is an ensemble of models combined sequentially. The base learners are not combined by voting; instead a meta-learner, another learning scheme, combines the outputs of the base learners into a final prediction.

Boosting
refers to a family of algorithms which convert weak learners into strong learners. It is an ensemble method for improving the model predictions of any given learning algorithm. The idea of boosting is to train weak learners sequentially, each trying to correct its predecessor

Bagging (Bootstrap Aggregating)
is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting.

Six Sigma
A disciplined, data-driven approach and methodology for eliminating defects (driving toward six standard deviations between the mean and the nearest specification limit) in any process – from manufacturing to transactional and from product to service.

SEMMA
An alternative process for data mining projects proposed by the SAS Institute. It stands for “sample, explore, modify, model, and assess.”

CRISP-DM
Cross Industry Standard Process for Data Mining
most comprehensive, common, and standardized data mining process

characteristics of theory

  1. simplify the situation
  2. explain the most facts in the broadest range of circumstances
  3. accurately predict behavior

advantages of models

  1. portray theories with objects, smaller scaled versions, or graphic representations.
  2. aid in comprehension of a theory
  3. include all of a theory’s known properties

inductive reasoning
involves drawing conclusions based on a limited number of observations

deductive reasoning
involves drawing conclusions based on generalizations

7 basic steps of research

  1. defining the problem
  2. performing a literature review
  3. determining a research method
  4. selecting an instrument
  5. gathering data
  6. analyzing the data
  7. presenting results

5 characteristics of a well-developed research question

  1. clearly and exactly stated
  2. has theoretical significance
  3. has obvious links to a larger body of knowledge
  4. results advance knowledge in a definable way
  5. answer to the question is worthwhile

3 sources of meaningful research questions

  1. research models-show all the factors and relationships in a theory
  2. recommendations of earlier researchers
  3. gaps in the body of knowledge

historical research
understand past events

  • case study
  • biography

descriptive research
describe current status

  • survey
  • observation

correlational research
determine existence and degree of relationship

  • survey
  • secondary analysis

evaluation research
evaluate effectiveness

  • case study

experimental research
establish cause and effect. key defining characteristic is control.

  • clinical trial
  • pretest and posttest control group method

causal comparative research
detect causal relationships

  • one shot case study
  • static group comparison
  • nonparticipant observation

what determines a researchers choice of research design?
depends on the purpose of the research

independent variable
factors that researchers manipulate directly

dependent variable
factors that are measured, whose values depend on the independent variables
