Machine Learning & Data Mining
Study online at quizlet.com/_40q9sb

1. Outcomes of a binary classifier: 1.) true positives, 2.) true negatives, 3.) false positives, 4.) false negatives.
True positives (TP): cases in which we predicted yes and the category is yes.
True negatives (TN): cases in which we predicted no and the category is no.
False positives (FP): cases in which we predicted yes and the category is no.
False negatives (FN): cases in which we predicted no and the category is yes.
(See the confusion-matrix sketch below.)

2. 3 V's:
1.) Volume: terabytes and up.
2.) Velocity: from streaming data.
3.) Variety: numeric, video, sensor, unstructured text...

3. Basic performance measure - classification: classification accuracy (1 - error rate), the proportion of test cases classified correctly.

4. Basic performance measure - regression: root mean square error (error = prediction - actual).

5. Bayes' rule: P(b | a) = P(a | b) P(b) / P(a). Follows from the product rule; allows us to reason about causes when we have observed an effect. (See the Bayes'-rule sketch below.)

6. Big Data: data sets of a scale and complexity such that they can be difficult to process using current standard methods (standard DB tools and data-management apps). A moving target, characterised by the 3 V's: Volume, Velocity, Variety.

7. Conditional probability: also known as posterior probability - a probability conditioned on other evidence. P(cavity | toothache) = 0.8: the probability of a cavity is 0.8, given that all you know is that you have a toothache.

8. Confusion matrix: a table often used to describe the performance of a classification model (or classifier) on a set of test data for which the true values are known.

9. Curse of dimensionality: refers to how certain learning algorithms may perform poorly on high-dimensional data. First, it is very easy to overfit the training data, since we can have many assumptions that describe the target label (in the case of supervised learning); in other words, we can easily express the target using the dimensions we have. Second, we may need to increase the amount of training data exponentially to overcome the curse, and that may not be feasible. Third, in ML algorithms that depend on distance, like k-means for clustering or k-nearest neighbours, everything can become far from everything else, and it is difficult to interpret the distance between data points.

10. Data mining: extracting interesting knowledge from large unstructured data sets - non-obvious, comprehensible, meaningful, useful.

11. Eager learning: when given training data, construct a model for future use in prediction that summarises the data. Analogy: compilation in a programming language. Slow in model construction, quicker in subsequent use; the model itself may be useful/informative.

12. Entropy: a measure of the uncertainty of a random variable (acquisition of information corresponds to a reduction of entropy).

13. The data mining process (CRISP-DM, the Cross Industry Standard Process for Data Mining): problem definition - data exploration - data preparation - modelling - evaluation - deployment.

14. Supervised learning: refers to the fact that we give the algorithm a data set in which the right answers are given. Regression: predict a continuous-valued output. Classification: predict a discrete-valued output (e.g. 0 or 1).
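The counting behind items 1, 3, 4 and 8 is mechanical enough to show in code. Below is a minimal Python sketch (not part of the original cards; all names and the toy data are illustrative) that tallies the four confusion-matrix cells and computes classification accuracy and RMSE.

```python
import math

def confusion_counts(predicted, actual):
    """Count TP/TN/FP/FN for binary yes/no labels (True = yes)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    return tp, tn, fp, fn

def accuracy(predicted, actual):
    """Proportion of test cases classified correctly (1 - error rate)."""
    tp, tn, fp, fn = confusion_counts(predicted, actual)
    return (tp + tn) / (tp + tn + fp + fn)

def rmse(predictions, actuals):
    """Root mean square error, with error = prediction - actual."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predictions, actuals))
                     / len(predictions))

preds = [True, True, False, False]
truth = [True, False, False, True]
print(confusion_counts(preds, truth))  # (1, 1, 1, 1)
print(accuracy(preds, truth))          # 0.5
```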
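Item 5's formula in executable form. A small sketch; the probability values are assumptions chosen so the posterior reproduces the 0.8 from item 7, not numbers taken from the cards.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Posterior P(b|a) via Bayes' rule: P(b|a) = P(a|b) P(b) / P(a)."""
    return p_a_given_b * p_b / p_a

# Reasoning from an observed effect (toothache) back to a cause (cavity):
p_toothache_given_cavity = 0.9   # assumed likelihood
p_cavity = 0.1                   # assumed prior
p_toothache = 0.1125             # assumed evidence probability
print(bayes(p_toothache_given_cavity, p_cavity, p_toothache))  # -> 0.8 (approx.)
```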
15. Gradient descent: an algorithm that minimizes functions. Given a function defined by a set of parameters, gradient descent starts with an initial set of parameter values and iteratively moves toward a set of parameter values that minimize the function. This iterative minimization is achieved using calculus, taking steps in the negative direction of the function's gradient. (See the gradient-descent sketch below.)

16. Inductive learning of a decision tree:
Step 1 - For all attributes not yet used in the tree, calculate their entropy and information-gain values for the training samples.
Step 2 - Select the attribute with the highest information gain.
Step 3 - Make a tree node containing that attribute.
Step 4 - This node partitions the data: apply the algorithm recursively to each partition.

17. Information gain: of an attribute - the reduction in entropy from partitioning the data according to that attribute. (See the entropy sketch below.)

18. In order to solve a given problem of supervised learning, one has to perform the following steps:
- Determine the type of training examples. Before doing anything else, the user should decide what kind of data is to be used as a training set. In the case of handwriting analysis, for example, this might be a single handwritten character, an entire handwritten word, or an entire line of handwriting.
- Gather a training set. The training set needs to be representative of the real-world use of the function. Thus, a set of input objects is gathered and corresponding outputs are also gathered, either from human experts or from measurements.
- Determine the input feature representation of the learned function. The accuracy of the learned function depends strongly on how the input object is represented. Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The number of features should not be too large, because of the curse of dimensionality, but should contain enough information to accurately predict the output.
- Determine the structure of the learned function and the corresponding learning algorithm. For example, the engineer may choose to use support vector machines or decision trees.
- Complete the design. Run the learning algorithm on the gathered training set. Some supervised learning algorithms require the user to determine certain control parameters. These parameters may be adjusted by optimizing performance on a subset of the training set (called a validation set), or via cross-validation.
- Evaluate the accuracy of the learned function. After parameter adjustment and learning, the performance of the resulting function should be measured on a test set that is separate from the training set.

19. Key insights behind kNN: each sample can be considered a point in sample space; if two samples are close to each other in that space, they should be close to each other in their target values. (See the kNN sketch below.)

20. Lazy learning: no explicit model is constructed; calculations are deferred until a new case is to be classified.

21. Lazy learning: ...

22. Logistic regression: predicts probabilities, and is therefore a regression algorithm. However, it is commonly described as a classification method in the machine learning literature, because it can be (and often is) used to make classifiers. There are also true classification algorithms, such as SVM, which only predict an outcome and do not provide a probability.

23. Main symptom of overfitting: much better performance on the training data than on independent test data.

24. ML task - classification: predict a discrete-valued output (a class label) for each input.

25. ML task - clustering: group samples so that similar samples fall in the same group, without using labels (e.g. k-means).

26. ML task - regression: predict a continuous-valued output for each input.
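Item 15's description maps directly onto a few lines of code. A minimal sketch, assuming an illustrative one-dimensional loss f(x) = (x - 3)^2 and a hand-picked learning rate; neither comes from the cards.

```python
# Gradient descent on f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).

def gradient(x):
    return 2 * (x - 3)

x = 0.0             # initial parameter value
learning_rate = 0.1  # illustrative step size
for _ in range(100):
    x -= learning_rate * gradient(x)  # step in the negative gradient direction

print(x)  # converges toward the minimiser x = 3
```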
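Items 12, 16 and 17 rest on entropy and information gain; here is a hedged sketch of both for discrete labels. The helper names and the example data are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in label entropy from partitioning by an attribute's values."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(part) / n * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

# Attribute that perfectly separates the labels -> gain equals the full entropy.
print(information_gain(['a', 'a', 'b', 'b'], [0, 0, 1, 1]))  # 1.0
```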
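The kNN insight from item 19 ("close in sample space implies close in target value") as code: a minimal 1-nearest-neighbour sketch over numeric feature vectors. All names and the toy points are assumptions.

```python
def euclidean(p, q):
    """Straight-line distance between two points in sample space."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def nearest_neighbour_predict(train_points, train_labels, query):
    """Return the label of the training point closest to the query (k = 1)."""
    distances = [euclidean(p, query) for p in train_points]
    return train_labels[distances.index(min(distances))]

points = [(0, 0), (0, 1), (5, 5), (6, 5)]
labels = ['no', 'no', 'yes', 'yes']
print(nearest_neighbour_predict(points, labels, (5, 4)))  # 'yes'
```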
27. Noise: imprecise or incorrect attribute values or labels. We can't always quantify it, but should know from the situation whether it is present; e.g. labels may require subjective judgement, or values may come from imprecise measurements.

28. Pre-processing: the initial manipulation of your data for your learner.

29. Q-learning: a type of reinforcement learning that optimizes the behaviour of a system through trial and error, updating its policy (state-action mapping) based on a reward. (See the Q-learning sketch below.)

30. Receiver Operating Characteristic: the ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier. It is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. (See the ROC sketch below.)

31. Regression models: predict a continuous variable, such as rainfall amount or sunlight intensity. They can also predict probabilities, such as the probability that an image contains a cat. A probability-predicting regression model can be used as part of a classifier by imposing a decision rule - for example, if the probability is 50% or more, decide it's a cat.

32. Reinforcement learning: the process of learning by interacting with an environment through reward feedback.

33. Relationship of data mining to machine learning: data mining applies machine-learning algorithms (among other techniques) to extract knowledge from large data sets; the two fields overlap heavily.

34. Sources of uncertainty:
- Incomplete knowledge: lack of relevant facts, partial observations, inaccurate measurements, incomplete domain theory.
- Inability to process: too complex to use all possibly relevant data in computations, or to consider all possible exceptions and qualifications.

35. Supervised learning - task definition: given examples, return a function h (hypothesis) that approximates some 'true' function f that (hypothetically) generated the labels for the examples.

36. Test sets: used to evaluate/compare hypotheses.

37. Text classification: a document is classified into one or more existing classes. Typically words are used as features (attributes). Issues:
1.) Lots of attributes.
2.) A large number of rarely used attributes, such as neologisms or antiquated words.
3.) A large number of frequently used words that contain no useful information.
4.) All of these can be mitigated with pre-processing.

38. Training set quality - MAR (missing at random): the missing data is not random but is fully related to a variable for which there is complete information. Example: men not reporting depression.

39. Training set quality - MCAR (missing completely at random): the presence/absence of data is completely independent of observable variables.

40. Training set quality - MNAR (missing not at random): the missing values are neither MCAR nor MAR; e.g. people with depression not reporting it.

41. Training sets: used to construct hypotheses.

42. Validation set: randomly sample a validation set and hide it away while models are developed; finally, unlock the drawer with the validation set, evaluate (objectively) using it, and publish those results. (See the hold-out sketch below.)

43. What is machine learning? "Field of study that gives computers the ability to learn without being explicitly programmed" - Samuel, 1959. "Learning is changing behaviour in a way that makes performance better in the future" - Witten & Frank, 1999. Improvement with experience at some task; a well-defined ML problem: improve at task T, with respect to performance measure P, based on experience E - Mitchell, 1997.

44. Wolpert's No Free Lunch theorem: there are no hard-and-fast rules for which algorithm will work well for your data. Different algorithms make different assumptions that are either well suited or poorly suited to the particular dataset.
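The heart of Q-learning (item 29) is a one-line update to the state-action table; a minimal tabular sketch follows. The learning rate, discount factor and the experience tuple are assumed values, not from the cards.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed)
GAMMA = 0.9   # discount factor (assumed)

q = defaultdict(float)  # maps (state, action) -> estimated value

def q_update(state, action, reward, next_state, actions):
    """One trial-and-error update of the policy's state-action values."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])

# One illustrative experience tuple:
q_update(state='s0', action='right', reward=1.0, next_state='s1',
         actions=['left', 'right'])
print(q[('s0', 'right')])  # 0.1
```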
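A sketch of where the ROC points in item 30 come from: sweep a decision threshold over the classifier's scores and record an (FPR, TPR) pair at each setting. The scores, labels and thresholds below are illustrative.

```python
def roc_points(scores, actuals, thresholds):
    """(FPR, TPR) at each threshold; actuals are True for the positive class."""
    pos = sum(actuals)
    neg = len(actuals) - pos
    points = []
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(p and a for p, a in zip(predicted, actuals))
        fp = sum(p and not a for p, a in zip(predicted, actuals))
        points.append((fp / neg, tp / pos))
    return points

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]           # illustrative classifier scores
actuals = [True, True, False, True, False, False]
print(roc_points(scores, actuals, [0.0, 0.5, 1.0]))
# [(1.0, 1.0), (0.333..., 0.666...), (0.0, 0.0)]
```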
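Items 36, 41 and 42 describe the hold-out discipline; the sketch below performs a random train/validation/test split. The split proportions, seed and function name are assumptions.

```python
import random

def three_way_split(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, carve off validation and test sets, train on the rest."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test

train, val, test = three_way_split(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```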