HCAD 750 EXAM 5; Louisiana State University/Questions & Answers

the process of finding correlations or patterns among the data

  • facilitates data exploration
  • extracts useful knowledge hidden in the data
    data mining

using patient data for any purpose beyond providing care to the individual patient raises tricky privacy issues and the risk of the information falling into the wrong hands. There are significant legal issues related to the use of patient data in data mining efforts, specifically related to the de-identification, aggregation, and storage of the data. Failing to take the appropriate steps when using personal health data as a tool for population health could lead to serious consequences
HIPAA in relation to data mining

-perform induction on the current data in order to make predictions.
Predictive Data Mining

-the ability of a device, machine, etc. to take in numerous types of data and learn from the data in order to produce knowledge.
Meta-learning

  • investigates how computers can learn based on data
  • automatically learn to recognize complex patterns and make intelligent decisions on their own based on the data
    Machine Learning

refers to the process of reducing the inputs for processing and analysis, or finding the most meaningful inputs.
Feature Selection
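
As an illustration of one simple filter-style approach to feature selection, the sketch below drops near-constant columns; the data, column names, and variance threshold are all invented for the example:

```python
# Minimal feature-selection sketch: drop near-constant (low-variance) columns,
# a simple "filter" approach to keeping only the most meaningful inputs.
# The data and threshold below are illustrative, not from the course material.

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def select_features(rows, names, threshold=0.01):
    """Keep only the columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))
    return [name for name, col in zip(names, columns) if variance(col) > threshold]

# Column "flag" is constant, so it carries no information and is dropped.
data = [(5.1, 1.0, 0), (4.9, 2.0, 0), (6.2, 1.5, 0), (5.8, 3.0, 0)]
kept = select_features(data, ["height", "score", "flag"])
```

Real pipelines use more sophisticated criteria (correlation with the target, wrapper methods), but the goal is the same: fewer, more informative inputs.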

  • can be applied to obtain a compressed representation of the data set that is much smaller in volume, yet maintains the integrity of the original data
  • used when the selected data are too complex or too large
    Data reduction
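
A minimal sketch of numerosity reduction, one common data-reduction technique: many raw values are collapsed into a few bin means that preserve the overall shape of the data. The readings are made-up numbers:

```python
# Data-reduction sketch: replace each bin of raw values with its mean
# (numerosity reduction by binning). The readings are invented.

def reduce_by_bin(values, bin_size):
    """Collapse consecutive bins of raw values into their means."""
    return [
        sum(values[i:i + bin_size]) / len(values[i:i + bin_size])
        for i in range(0, len(values), bin_size)
    ]

readings = [10, 12, 11, 50, 52, 48, 30, 31, 29]   # 9 raw values
compressed = reduce_by_bin(readings, 3)            # 3 representative values
```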

to request or seek out additional information on a specific subject. Makes the data more detailed.
drill down
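
The idea can be sketched in code: summarize at a coarse level, then drill down into one category for more detail. The records below are hypothetical:

```python
# Drill-down sketch: start from a high-level summary, then request more
# detail on one specific category. Sample records are invented.

records = [
    {"region": "South", "city": "Baton Rouge", "visits": 120},
    {"region": "South", "city": "New Orleans", "visits": 200},
    {"region": "North", "city": "Monroe", "visits": 80},
]

def summarize(rows, key):
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row["visits"]
    return totals

by_region = summarize(records, "region")              # coarse view
south_detail = summarize(
    [r for r in records if r["region"] == "South"], "city"
)                                                     # drilled-down view
```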

  • is an ensemble of models combined sequentially.
  • can be used to classify data
  • the base learners are not combined by voting; instead a meta-learner takes their outputs as inputs and produces the final prediction
    Stacking
  • each of the data classifications is weighted.
  • once the system learns, it continuously updates which classifications are incorrect, and the weights shift to reflect accuracy
    Boosting
  • method used to increase accuracy with data mining
  • majority vote: the more times a classification is picked, the more reliable it is considered
  • algorithm creates an ensemble of models for learning scheme where each model gives an equally weighted prediction
    Bagging (Bootstrap Aggregating)
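
The three ensemble methods above can be illustrated with bagging, the simplest of them: train copies of a base learner on bootstrap resamples, then take an equally weighted majority vote. The sketch uses a 1-nearest-neighbor base learner purely to stay self-contained; real bagging wraps any learning scheme:

```python
# Bagging sketch: each model is trained on a bootstrap resample (sampling
# with replacement), and predictions are combined by majority vote.
import random

def fit_1nn(train):
    """Toy base learner: predict the label of the nearest training value."""
    def predict(x):
        return min(train, key=lambda pair: abs(pair[0] - x))[1]
    return predict

def bagging(train, n_models, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        resample = [rng.choice(train) for _ in train]  # bootstrap sample
        models.append(fit_1nn(resample))
    def predict(x):
        votes = [m(x) for m in models]           # one equally weighted vote each
        return max(set(votes), key=votes.count)  # majority vote
    return predict

train = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
ensemble = bagging(train, n_models=5)
```

Boosting would instead train the models sequentially and reweight misclassified examples; stacking would replace the vote with a meta-learner.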

the five steps of the Six Sigma improvement methodology: define, measure, analyze, improve, and control
DMAIC

  • can explain why data behaves a certain way
  • not necessarily a data mining technique, but a model used to give more of an answer to “why” and “how” in regard to data information.
  • adds additional steps to mining that yields better results
    Six Sigma

is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
Big Data

how we make sense of the data by converting them from their raw form to a more informative one

  • sometimes known as model building or pattern identification
  • yields a highly predictive, consistent pattern-identifying model
  • pattern discovery is a complex phase of data mining
    Exploratory data analysis (EDA)

due to a need for standardized data mining techniques, this concept and tool was developed.
Sample – selecting the data
Explore – looking for relationships between variables in the data
Modify – selecting, creating, and transforming variables in preparation for data modeling
Model – applying various modeling techniques to gain the desired outcome
Assess – evaluating the reliability and usefulness of the findings
SEMMA

Cross Industry Standard Process for Data Mining
six steps: business understanding, data understanding, data preparation, modeling, evaluation, and deployment
most projects move back and forth between steps as necessary
CRISP-DM

“this data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals.”
big data

producing a solution that generates useful forecasting:

  1. problem identification
  2. exploration of the data
  3. pattern discovery
  4. knowledge deployment – application to new data to forecast predictions
    4 phases of data mining

transforms repositories of big data into comprehensible knowledge that is useful for guiding practice and facilitating interdisciplinary research
Knowledge Discovery and Research

  • data mining method for analyzing outcomes and service use
  • used to classify and predict an outcome
    Classification and Regression Trees (CART)
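
A sketch of the core CART step: choosing the binary split of a numeric attribute that minimizes Gini impurity. The ages and labels below are invented for illustration:

```python
# CART-style sketch: pick the threshold on one numeric attribute that gives
# the lowest weighted Gini impurity across the two resulting branches.

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(points):
    """points: list of (value, label). Return the best threshold."""
    best = None
    values = sorted(v for v, _ in points)
    for left_v, right_v in zip(values, values[1:]):
        threshold = (left_v + right_v) / 2
        left = [c for v, c in points if v <= threshold]
        right = [c for v, c in points if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(points)
        if best is None or score < best[0]:
            best = (score, threshold)
    return best[1]

# Ages below ~50 are "no" and above are "yes", so the best split
# lands between 45 and 60.
data = [(30, "no"), (40, "no"), (45, "no"), (60, "yes"), (70, "yes")]
```

A full CART implementation applies this step recursively to grow the tree, then prunes it.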
  1. enhance business aspects
  2. help to improve patient care
    Benefits of KDD
  1. dependent on the use of private health information
  2. ensure data are de-identified and confidentiality maintained
  3. follow changes and specific requirements for compliance with HIPAA laws
    ethics of data mining

thoughtful, planned activity that expands or refines knowledge. the purpose of research is to create generalized knowledge.
research

  1. manipulation of treatment
  2. random assignment to the group
    difference between quasi-experimental research and experimental research
  • the statistical analysis of a large collection of results from individual studies for the purpose of integrating findings
  • the integrative analysis of findings from many studies that examined the same question
    meta-analysis
  • set of connected input/output units in which each connection has a weight associated with it
  • AKA connectionist learning – the connections between units
    neural networks
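
A single unit of such a network can be sketched directly: a weighted sum of the inputs plus a bias, passed through a step activation. The weights below are hand-picked for illustration (training would normally learn them):

```python
# Neural-network sketch: one connectionist "unit" computes a weighted sum of
# its inputs and fires if the sum clears zero. The weight on each connection
# is exactly what training adjusts.

def unit(inputs, weights, bias):
    """A single unit with a step activation: returns 1 (fires) or 0."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# With these hand-picked weights the unit behaves like a logical AND gate.
weights, bias = [1.0, 1.0], -1.5
```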

a flowchart-like structure and a decision support tool that uses a model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.

Consists of three types of nodes: decision nodes, chance nodes, end nodes
decision trees
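
The comparison made at a decision node can be sketched as follows: each branch's value is the probability-weighted sum over its chance node's outcomes. The probabilities and payoffs are invented for illustration:

```python
# Decision-tree sketch: a decision node picks the branch with the higher
# expected value, where a chance node weights each outcome by probability.

def expected_value(chance_node):
    """chance_node: list of (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in chance_node)

# Decision: treat now (certain modest benefit) vs. wait (uncertain outcome).
treat_now = [(1.0, 60)]
wait = [(0.5, 100), (0.5, 10)]

choice = "treat_now" if expected_value(treat_now) > expected_value(wait) else "wait"
```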

identifies patterns expressed as if/then statements. Statistical significance tests are used on the data to validate the rules
Rule Induction

a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.
Algorithm

classifiers that use distance-based comparisons, which intrinsically assign equal weight to each attribute
nearest neighbor
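
A minimal 1-nearest-neighbor classifier under Euclidean distance, which, as the definition notes, weights every attribute equally; the training points are hypothetical:

```python
# Nearest-neighbor sketch: classify a query point by the label of its
# closest training example. Euclidean distance treats all attributes equally,
# which is why scaling attributes matters in practice.
import math

def nearest_neighbor(train, query):
    """train: list of (features, label); return the closest example's label."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    closest = min(train, key=lambda pair: distance(pair[0], query))
    return closest[1]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B")]
```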

discovery by computer of new, previously unknown information, by automatically extracting information from written resources
text mining
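
A toy sketch of the extraction step: pull frequent terms out of free text after removing common stopwords. The clinical note and the stopword list are illustrative only:

```python
# Text-mining sketch: automatically extract candidate key terms from written
# text by counting word frequencies after dropping common stopwords.
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "to", "in", "with", "was", "for"}

def key_terms(text, n=3):
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]

note = ("Patient reports chest pain. Chest pain worsens with exertion. "
        "Pain relieved by rest.")
```

Real text-mining systems go much further (stemming, phrase detection, entity recognition), but frequency counting is the usual starting point.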

A method of querying and reporting that takes data from standard relational databases, calculates and summarizes the data, and then stores the data in a special database called a data cube.
Online Analytical Processing (OLAP)
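
A tiny version of the cube-building step can be sketched with plain dictionaries, using "ALL" to stand for a rolled-up dimension; the fact rows are invented:

```python
# OLAP sketch: precompute a small "data cube" -- aggregate counts for every
# combination of two dimensions, including rolled-up "ALL" totals -- so that
# later queries become simple lookups.
from itertools import product

facts = [
    ("cardiology", "Jan", 40), ("cardiology", "Feb", 35),
    ("oncology",   "Jan", 25), ("oncology",   "Feb", 30),
]

def build_cube(rows):
    """Sum the measure for each (dept, month) cell, with 'ALL' roll-ups."""
    cube = {}
    for dept, month, count in rows:
        for d, m in product((dept, "ALL"), (month, "ALL")):
            cube[(d, m)] = cube.get((d, m), 0) + count
    return cube

cube = build_cube(facts)
```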

selecting specific on-screen data points to identify their characteristics or to examine their effects on relations between variables
-used during EDA
brushing

analysis of original research data by the researchers who collected them
primary analysis

the process of detecting, diagnosing, and editing faulty data
data cleansing
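
A minimal sketch of the detect-and-edit step: flag records whose value is missing or outside a plausible range. The field name and valid range are assumptions for the example:

```python
# Data-cleansing sketch: detect faulty records (missing or out-of-range
# values) so they can be edited or excluded before analysis.

def clean(records, field, low, high):
    """Split records into (good, faulty) by validating one field."""
    good, faulty = [], []
    for rec in records:
        value = rec.get(field)
        if value is None or not (low <= value <= high):
            faulty.append(rec)
        else:
            good.append(rec)
    return good, faulty

patients = [
    {"id": 1, "age": 42},
    {"id": 2, "age": None},   # missing value -> faulty
    {"id": 3, "age": 208},    # impossible value -> faulty
    {"id": 4, "age": 67},
]
good, faulty = clean(patients, "age", 0, 120)
```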

refers to the ability to access and extract data from any data source

  • access to data depends on the type of data and their location
  • can range from totally uncontrolled to highly protected
    Data Access

pertains to the handling and maintenance of data so that the data are not divulged to others without the research participant's permission
data confidentiality

the analysis of the original work of another person or organization
secondary analysis

pertains to data that have no identifiers linked to them and cannot be traced back to the research participant
data anonymity
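
A sketch of the de-identification step that anonymity depends on: strip direct identifiers from each record. The identifier list here is illustrative and far shorter than the 18 identifiers in the HIPAA Safe Harbor method:

```python
# De-identification sketch: remove direct identifiers so a record cannot be
# traced back to the participant. This identifier set is a small
# illustrative subset, not the full HIPAA Safe Harbor list.

IDENTIFIERS = {"name", "ssn", "address", "phone"}

def de_identify(record):
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in IDENTIFIERS}

row = {"name": "J. Doe", "ssn": "000-00-0000", "age": 54, "diagnosis": "I10"}
safe = de_identify(row)
```

Note that dropping fields alone does not guarantee anonymity; rare combinations of remaining attributes can still re-identify someone, which is why expert-determination methods also exist.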

US federal policy that specifies ethics regulations for human subjects research
Common Rule

data mining
the process of analyzing data to extract information not offered by the raw data alone

  • facilitates data exploration
  • looks at the data from different vantage points
  • brings new insights to the data set

Predictive Data Mining
is data mining that is done for the purpose of using business intelligence or other data to forecast or predict trends. This type of data mining can help business leaders make better decisions and can add value to the efforts of the analytics team.

Meta-learning
A subfield of machine learning where automatic learning algorithms are applied to metadata about machine learning experiments. As of 2017, the term had not found a standard interpretation; however, the main goal is to use such metadata to understand how automatic learning can become flexible in solving learning problems

Machine Learning
is a branch of artificial intelligence devoted to systems that learn from data rather than from explicit programming. Scientists and engineers hope machine learning will eventually help machines make unguided choices by independently interpreting input from the world around them.

Feature Selection
refers to the process of reducing the inputs for processing and analysis, or of finding the most meaningful inputs. A related term, feature engineering (or feature extraction), refers to the process of extracting useful information or features from existing data.

drill down
to request — or seek out — additional information on a specific subject. In a GUI-environment, drilling down may involve clicking on a link or other representation to reveal more detail

Stacking
is an ensemble of models combined sequentially. The base learners are not combined by voting; instead a meta-learner, another learning scheme, combines the outputs of the base learners into a final prediction.

Boosting
refers to a family of algorithms which convert weak learners into strong learners. It is an ensemble method for improving the model predictions of any given learning algorithm. The idea of boosting is to train weak learners sequentially, each trying to correct its predecessor

Bagging (Bootstrap Aggregating)
is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting.

Six Sigma
A disciplined, data-driven approach and methodology for eliminating defects (driving toward six standard deviations between the mean and the nearest specification limit) in any process – from manufacturing to transactional and from product to service.

SEMMA
An alternative process for data mining projects proposed by the SAS Institute. It stands for “sample, explore, modify, model, and assess.”

CRISP-DM
Cross Industry Standard Process for Data Mining
most comprehensive, common, and standardized data mining process

characteristics of theory

  1. simplify the situation
  2. explain the most facts in the broadest range of circumstances
  3. accurately predict behavior

advantages of models

  1. portray theories with objects, smaller scaled versions, or graphic representations.
  2. aid in comprehension of a theory
  3. include all of a theory’s known properties

inductive reasoning
involves drawing conclusions based on a limited number of observations

deductive reasoning
involves drawing conclusions based on generalizations

7 basic steps of research

  1. defining the problem
  2. performing a literature review
  3. determining a research method
  4. selecting an instrument
  5. gathering data
  6. analyzing the data
  7. presenting results

5 characteristics of a well-developed research question

  1. clearly and exactly stated
  2. has theoretical significance
  3. has obvious links to a larger body of knowledge
  4. results advance knowledge in a definable way
  5. answer to the question is worthwhile

3 sources of meaningful research questions

  1. research models-show all the factors and relationships in a theory
  2. recommendations of earlier researchers
  3. gaps in the body of knowledge

historical research
understand past events

  • case study
  • biography

descriptive research
describe current status

  • survey
  • observation

correlational research
determine existence and degree of relationship

  • survey
  • secondary analysis

evaluation research
evaluate effectiveness

  • case study

experimental research
establish cause and effect. key defining characteristic is control.

  • clinical trial
  • pretest and posttest control group method

causal comparative research
detect causal relationships

  • one shot case study
  • static group comparison
  • nonparticipant observation

what determines a researchers choice of research design?
depends on the purpose of the research

independent variable
factors that researchers manipulate directly

dependent variable
factors that are measured, whose values depend on the independent variables
