
1.
1.) True positives 2.) True negatives 3.) False positives 4.) False negatives
True positives (TP): These are cases in which we predicted yes and the category is yes. True negatives (TN): These are cases in which we predicted no and the category is no. False positives (FP): These are cases in which we predicted yes and the category is no. False negatives (FN): These are cases in which we predicted no and the category is yes.
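These four counts can be tallied directly from paired predictions and true labels; a minimal Python sketch, assuming binary 0/1 labels (the example data is illustrative, not from the cards):

```python
def confusion_counts(predicted, actual):
    """Return (TP, TN, FP, FN) for binary 0/1 predictions vs. true labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return tp, tn, fp, fn

print(confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))  # (2, 2, 1, 1)
```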
2.
3 V's
1.) Volume: terabytes and up. 2.) Velocity: from streaming data. 3.) Variety: numeric, video, sensor, unstructured text...
3.
Basic Performance Measure - Classification
Classification Accuracy (1 - Error Rate) - Proportion of test cases classified correctly
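As a one-line Python sketch (example labels are hypothetical):

```python
def accuracy(predicted, actual):
    """Proportion of test cases classified correctly; error rate is 1 - accuracy."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

print(accuracy([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.75 (error rate 0.25)
```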
4.
Basic Performance Measure - Regression
Root Mean Square Error (error = Prediction - Actual)
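The same definition in a short Python sketch (the sample values are made up for illustration):

```python
import math

def rmse(predictions, actuals):
    """Root Mean Square Error, where error = Prediction - Actual."""
    errors = [p - a for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

print(rmse([3.0, 5.0], [1.0, 5.0]))  # sqrt(2), about 1.414
```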
5.
Bayes' Rule
P(b | a) = P(a | b) P(b) / P(a) - Follows from the Product Rule - Allows us to reason about causes when we have observed an effect
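The rule is a one-line computation; a sketch with hypothetical numbers for the cavity/toothache example (P(toothache | cavity) = 0.9, P(cavity) = 0.1, P(toothache) = 0.2 are assumptions, not card values):

```python
def bayes(p_a_given_b, p_b, p_a):
    """P(b | a) = P(a | b) * P(b) / P(a)."""
    return p_a_given_b * p_b / p_a

print(bayes(0.9, 0.1, 0.2))  # about 0.45 == P(cavity | toothache)
```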
6.
Big Data
Data sets of a scale and complexity such that they are difficult to process using current standard methods (standard DB tools and data management apps) - A moving target - The 3 V's: Volume, Velocity, Variety
7.
Conditional Probability
Also known as posterior probability - The probability is conditioned on other evidence - P(cavity | toothache) = 0.8: the probability of a cavity is 0.8, given that all you know is that you have a toothache
8.
Confusion Matrix
a table that is often used to describe the performance of a classification model (or classifier) on a set of test data for which the true values are known
9.
Curse of Dimensionality
The curse of dimensionality refers to how certain learning algorithms may perform poorly on high-dimensional data. First, it is very easy to overfit the training data, since we can have a lot of assumptions that describe the target label (in the case of supervised learning); in other words, we can easily express the target using the dimensions that we have. Second, we may need to increase the amount of training data exponentially to overcome the curse of dimensionality, and that may not be feasible. Third, in ML algorithms that depend on distance, like k-means for clustering or k-nearest neighbours, everything can become far from everything else, and it is difficult to interpret the distance between the data points.
10.
Data Mining
Extracting interesting knowledge from large, unstructured data sets - non-obvious, comprehensible, meaningful, useful
Machine Learning & Data Mining - Study online at quizlet.com/_40q9sb
11.
Eager learning
When given training data, construct model for future use in prediction that summarises the data - Analogy: compilation in programming language - Slow in model construction, quicker in subsequent use - Model itself may be useful/informative
12.
Entropy
a measure of the uncertainty of a random variable (acquisition of information corresponds to a reduction of entropy)
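A minimal Python sketch of this measure, assuming the distribution is given as a list of probabilities:

```python
import math

def entropy(probs):
    """Entropy in bits of a discrete distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 - maximum uncertainty for two outcomes
```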
13.
Explain the Data Mining process
CRISP-DM (Cross Industry Standard Process for Data Mining): Problem Definition - Data Exploration - Data Preparation - Modelling - Evaluation - Deployment
14.
Explain what Supervised Learning is
Refers to the fact that we give the algorithm a data set in which the right answers are given. Regression: predict a continuous-valued output. Classification: predict a discrete-valued output (0 or 1).
15.
Gradient Descent
is an algorithm that minimizes functions. Given a function defined by a set of parameters, gradient descent starts with an initial set of parameter values and iteratively moves toward a set of parameter values that minimize the function. This iterative minimization is achieved using calculus, taking steps in the negative direction of the function gradient.
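The iteration described above can be sketched in a few lines of Python; the objective f(x) = (x - 3)^2 is a made-up example with a known minimiser:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step in the negative gradient direction."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimise f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 6))  # 3.0 - converges to the minimiser
```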
16.
Inductive Learning of a Decision Tree
Step 1 - For all the attributes that have not yet been used in the tree, calculate their entropy and information gain values for the training samples. Step 2 - Select the attribute that has the highest information gain. Step 3 - Make a tree node containing that attribute. Step 4 - This node partitions the data: apply the algorithm recursively to each partition.
17.
Information Gain
the reduction in entropy from partitioning the data according to that attribute
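A Python sketch of the calculation for one split; the helper names and the tiny 0/1 label lists are illustrative:

```python
import math

def label_entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(labels, partitions):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    remainder = sum(len(part) / n * label_entropy(part) for part in partitions)
    return label_entropy(labels) - remainder

# A perfectly separating attribute recovers the full 1 bit of entropy:
print(information_gain([1, 1, 0, 0], [[1, 1], [0, 0]]))  # 1.0
```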
18.
In order to solve a given problem of supervised learning, one has to perform the following steps:
1.) Determine the type of training examples. Before doing anything else, the user should decide what kind of data is to be used as a training set. In the case of handwriting analysis, for example, this might be a single handwritten character, an entire handwritten word, or an entire line of handwriting.
2.) Gather a training set. The training set needs to be representative of the real-world use of the function. Thus, a set of input objects is gathered and corresponding outputs are also gathered, either from human experts or from measurements.
3.) Determine the input feature representation of the learned function. The accuracy of the learned function depends strongly on how the input object is represented. Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The number of features should not be too large, because of the curse of dimensionality, but should contain enough information to accurately predict the output.
4.) Determine the structure of the learned function and corresponding learning algorithm. For example, the engineer may choose to use support vector machines or decision trees.
5.) Complete the design. Run the learning algorithm on the gathered training set. Some supervised learning algorithms require the user to determine certain control parameters. These parameters may be adjusted by optimizing performance on a subset (called a validation set) of the training set, or via cross-validation.
6.) Evaluate the accuracy of the learned function. After parameter adjustment and learning, the performance of the resulting function should be measured on a test set that is separate from the training set.
19.
Key insights to kNN
- Each sample can be considered to be a point in sample space - if two samples are close to each other in space, they should be close to each other in their target values
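These two insights can be turned directly into a classifier: measure distance in sample space, then vote among the nearest samples. A minimal Python sketch with made-up 2-D points:

```python
import math

def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to the query point.

    train: list of (point, label) pairs; points are coordinate tuples.
    """
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.5, 0.5)))  # "a" - the three nearest neighbours are all "a"
```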
20.
Lazy Learning
No explicit model constructed - Calculations deferred until new case to be classified
22.
Logistic Regression
predicts probabilities, and is therefore a regression algorithm. However, it is commonly described as a classification method in the machine learning literature, because it can be (and often is) used to make classifiers. There are also true classification algorithms, such as SVM, which only predict an outcome and do not provide a probability.
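The probability-then-threshold pattern can be sketched in Python; the weights and inputs below are hypothetical, chosen only to show the decision rule:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Predicted probability of the positive class (a regression output)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def classify(weights, bias, x, threshold=0.5):
    """Turn the probability into a class decision via a threshold."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0

print(classify([1.0], 0.0, [2.0]))   # 1 (probability about 0.88)
print(classify([1.0], 0.0, [-2.0]))  # 0 (probability about 0.12)
```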
23.
Main symptom of overfitting
Much better performance on the training data than on independent test data
24.
ML tasks entail: Classification
Predicting a discrete-valued class label for each instance (e.g. 0 or 1)
25.
ML tasks entail: Clustering
Grouping instances so that similar ones fall in the same group, without predefined labels (unsupervised)
26.
ML tasks entail: Regression
Predicting a continuous-valued output for each instance (e.g. rainfall amount)
27.
Noise
Imprecise or incorrect attribute values or labels - Can't always quantify it, but should know from situation if it is present - E.g. labels may require subjective judgement or values may come from imprecise measurements
28.
Pre-processing
is the initial manipulation of your data for your learner
29.
Q learning
a type of Reinforcement Learning that improves the behaviour of a system through trial and error - Updates its policy (state-action mapping) based on a reward
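The core of the method is a single update of the state-action value table; a Python sketch with a hypothetical two-state example (the states, actions, and learning-rate values are assumptions for illustration):

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward reward + gamma * max Q(s', a')."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

q = {"s0": {"go": 0.0}, "s1": {"go": 10.0}}
q_update(q, "s0", "go", reward=1.0, next_state="s1")
print(q["s0"]["go"])  # 5.0 - half-way toward 1.0 + 0.9 * 10.0
```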
30.
Receiver Operating Characteristic
The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier - It is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings
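One point on the curve corresponds to one threshold; a Python sketch (the classifier scores and labels are made up for illustration):

```python
def roc_point(scores, actual, threshold):
    """(TPR, FPR) for one threshold; sweeping thresholds traces the ROC curve."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and a == 1 for p, a in zip(pred, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(pred, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(pred, actual))
    tn = sum(p == 0 and a == 0 for p, a in zip(pred, actual))
    return tp / (tp + fn), fp / (fp + tn)

print(roc_point([0.9, 0.7, 0.4, 0.2], [1, 0, 1, 0], threshold=0.5))  # (0.5, 0.5)
```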
31.
Regression Models
predict a continuous variable, such as rainfall amount or sunlight intensity. They can also predict probabilities, such as the probability that an image contains a cat. A probability-predicting regression model can be used as part of a classifier by imposing a decision rule - for example, if the probability is 50% or more, decide it's a cat.
32.
Reinforcement Learning
is the process of learning by interacting with an environment through feedback (rewards)
33.
Relationship of Data Mining to Machine Learning
Data Mining applies Machine Learning algorithms to extract interesting knowledge from large data sets; ML supplies the learning methods, while data mining focuses on the discovery process.
34.
Sources of uncertainty:
- Incomplete knowledge: lack of relevant facts, partial observations, inaccurate measurements, incomplete domain theory - Inability to process: too complex to use all possible relevant data in computations, or to consider all possible exceptions and qualifications
35.
Supervised Learning: Task Definition
Given examples, return a function h (hypothesis) that approximates some 'true' function f that (hypothetically) generated the labels for the examples
36.
Test sets
used to evaluate/compare hypotheses
37.
Text Classification
where a document is classified into one or more existing classes. Typically words are used as features (attributes). Issues: 1.) Lots of attributes 2.) Large number of rarely used attributes, such as neologisms or antiquated words 3.) Large number of frequently used words that contain no useful info 4.) All of these can be mitigated with pre-processing
38.
Training Set Quality - MAR (Missing At Random)
When missing data is not random but can be totally related to a variable for which there is complete information. Example - Men not reporting depression
39.
Training Set Quality - MCAR (Missing Completely At Random)
The presence/absence of data is completely independent of observable variables
40.
Training Set Quality - MNAR (Missing Not At Random)
When the missing values are neither MCAR nor MAR. Example - People w/ depression not reporting it.
41.
Training sets
used to construct hypotheses
42.
Validation set
Randomly sample a validation set and hide it - Finally, unlock the drawer with the validation set; evaluate (objectively) using it, and publish these results
43.
What is Machine Learning?
"Field of study that gives computers the ability to learn without being explicitly programmed" - Samuel, 1959. "Learning is changing behavior in a way that makes performance better in the future" - Witten & Frank, 1999. Improvement with experience at some task; a well-defined ML problem: improve at task T, with regard to performance measure P, based on experience E - Mitchell, 1997.
44.
Wolpert's No Free Lunch theorem
There are no hard-and-fast rules for which algorithm will work well for your data - Different algorithms make different assumptions that are either well suited or poorly suited to the particular dataset
