Machine Learning
Study online at quizlet.com/_49727f
Transcript

1.
Actor-Critic Algorithm
1. Produce action, a, for the current state, s 2. Observe the next state, s', and the reward, r 3. Update the utility of state s (critic) 4. Update the probability of the action
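The four steps above can be sketched as a minimal tabular actor-critic update. The state/action spaces, learning rates, and softmax policy below are illustrative assumptions, not part of the card:

```python
import math
import random

# Hypothetical tiny problem: 3 states, 2 actions (illustrative only).
states, actions = range(3), range(2)
V = {s: 0.0 for s in states}                        # critic: state utilities
H = {(s, a): 0.0 for s in states for a in actions}  # actor: action preferences
alpha, beta, gamma = 0.1, 0.1, 0.9                  # assumed learning rates / discount

def policy(s):
    # Step 1: produce action a for current state s (softmax over preferences).
    prefs = [math.exp(H[(s, a)]) for a in actions]
    total = sum(prefs)
    return random.choices(list(actions), weights=[p / total for p in prefs])[0]

def update(s, a, r, s_next):
    # Steps 2-4: having observed s' and r, update critic then actor.
    delta = r + gamma * V[s_next] - V[s]   # TD error
    V[s] += alpha * delta                  # step 3: update utility of s (critic)
    H[(s, a)] += beta * delta              # step 4: update action preference (actor)

update(0, 1, 1.0, 1)   # one learning step from state 0, action 1, reward 1.0
```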
2.
Actor-Critic Methods
A meta-architecture that splits learning into two elements: - Actor: A function that selects actions from states - Critic: A function that provides a value estimate for a state and action.
3.
Actor Only methods
i) Search in policy space - Run n policies for M episodes - Keep the best one - Evolve ii) Ignore state-action value information iii) Potentially less efficient
4.
Addition Law of probability
$P(A \cup B) = P(A) + P(B) - P(A \land B)$ For example, drawing an ace or a diamond: $P(A) = \frac{4}{52}$ $P(B) = \frac{13}{52}$ $P(A \land B) = \frac{1}{52}$ Therefore: $\frac{4+13-1}{52} = \frac{4}{13}$
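The ace-or-diamond example can be checked numerically with exact fractions:

```python
from fractions import Fraction

# P(A or B) = P(A) + P(B) - P(A and B) for one card drawn from a 52-card deck
p_a = Fraction(4, 52)    # P(ace): four aces
p_b = Fraction(13, 52)   # P(diamond): thirteen diamonds
p_ab = Fraction(1, 52)   # P(ace and diamond): the ace of diamonds
p_union = p_a + p_b - p_ab
print(p_union)           # 4/13
```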
5.
Advantages of Temporal Difference Learning
i) Do not need a model of the environment ii) Bootstrap - Learn a guess from a guess iii) Work in an on-line fashion - No need to wait for the terminal state - Update every time step
6.
Auto-encoder
Trained with a standard training algorithm and learns to map input back onto itself.
7.
Bayes Rule Formula
Given clusters A and B: $P(A|X) = \frac{P(X|A)P(A)}{P(X)}$ where $P(X) = P(X,A) + P(X,B)$, $P(X,A) = P(A)P(X|A)$ and $P(X,B) = P(B)P(X|B)$. For a cluster $a$ given the data point $x_{i}$ it is: $p(a|x_{i}) = \frac{p(x_{i}|a)p(a)}{p(x_{i}|a)p(a) + p(x_{i}|b)p(b)}$
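The two-cluster posterior can be computed directly; the likelihoods and priors below are made-up numbers for illustration:

```python
# Posterior p(a | x_i) for two clusters a and b (illustrative values).
p_x_given_a, p_a = 0.6, 0.5   # likelihood and prior for cluster a
p_x_given_b, p_b = 0.2, 0.5   # likelihood and prior for cluster b

# p(x_i) expands over both clusters: p(x_i|a)p(a) + p(x_i|b)p(b)
p_x = p_x_given_a * p_a + p_x_given_b * p_b
posterior_a = p_x_given_a * p_a / p_x
posterior_b = p_x_given_b * p_b / p_x
print(posterior_a)            # 0.75: cluster a is three times as likely
```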
8.
Belief state
Because the state, s, cannot be accessed directly, you introduce a belief state. It is essentially a probability distribution over possible states.
9.
Belief Updates
We can update the estimated belief state using the probabilities in the observation and transfer functions - Point-based value iteration
10.
Bootstrap
i) Good method for small datasets. ii) Creates a training set by sampling with replacement iii) Samples a dataset of N instances N times to create a new training set of N instances - Test on all instances
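Sampling N times with replacement can be sketched as follows (the function name is an assumption for illustration):

```python
import random

def bootstrap_sample(data, rng=random):
    # Draw len(data) instances with replacement to form a new training set.
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

random.seed(0)
train = bootstrap_sample(list(range(10)))
# Instances never drawn can serve as test data ("out-of-bag" instances).
out_of_bag = [x for x in range(10) if x not in train]
```

On average about $(1 - 1/N)^N \approx e^{-1} \approx 36.8\%$ of instances miss the sample, which is why the next card calls the estimate pessimistic.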
11.
Bootstrap properties
i) Pessimistic due to the high probability of single instances not making it into the training set ii) Combine with the re-substitution error for a realistic error estimate.
12.
Calculating variance
$\frac{1}{n} \sum_{i=1}^{n} (x_{i}- \mu)^2 $
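As a quick check of the formula (population variance, dividing by n):

```python
def variance(xs):
    # Population variance: mean of squared deviations from the mean mu.
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

variance([2, 4, 4, 4, 5, 5, 7, 9])   # mean is 5, variance is 4
```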
13.
Classification Using SOMs
i) Map training data classes to winning nodes ii) Find the winning node for new test data iii) Classify according to the node's majority class from the training data
14.
Clustering
Idea of grouping patterns so that their vectors are similar to one another within the same region.
All require a way to determine similarity
15.
Constructing a decision tree
The dataset is split recursively into subsets based on a single input variable until: - All subsets have the same class - Splitting no longer improves prediction - Subsets are too small - Top-down induction of decision trees - Greedy
16.
Constructivist approach
Doesn't require access to the state space or transition functions.
17.
Cost function for a data point
$e_{i} = y_{i} - (mx_{i} + c) $
18.
Cost function over a data set
$e = \sum_{i=1}^{n} (y_{i}-(mx_{i}+c))^{2}$
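The summed squared error for a candidate line can be evaluated directly (function name and data points are illustrative):

```python
def sse(points, m, c):
    # e = sum over i of (y_i - (m*x_i + c))^2 for the line y = m*x + c.
    return sum((y - (m * x + c)) ** 2 for x, y in points)

pts = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # points on the line y = 2x + 1
sse(pts, 2.0, 1.0)   # perfect fit: cost 0.0
sse(pts, 1.0, 1.0)   # worse fit: 0^2 + 1^2 + 2^2 = 5.0
```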
19.
Covariance
Measures the strength of the linear relationship between 2 variables It is given by: $E[(x - \mu_{x})(y - \mu_{y})]$
20.
Critic Only methods
i) Value function estimation ii) Trivial action selection - TD-learning, Q-learning
21.
Decision tree
i) A set of ordering rules for classifying data ii) Each node addresses an input variable iii) Leaves assign labels or values to the data
22.
Decision tree advantages
i) Simple and intuitive ii) Good computational performance iii) Possible to validate a model using statistical tests
23.
Decision tree Limitations
i) Greedy algorithms can get stuck in local optimality. ii) A simple decision structure does not represent all problems effectively - XOR parity - Producing large trees iii) Can produce overly complex trees that do not generalise well - Overfitting
24.
Deep Learning
i) Using a neural network with several layers of nodes between the input and output. ii) The series of layers between the input and output do feature detection and processing. iii) Models the human visual system
25.
Dimensionality Reduction
i) Inputs are of dimension, i - Number of input features ii) Outputs are of dimension 2 - can be more - SOM is an example of vector quantization - 2 < i
26.
Discounted Future Reward
All sequence information is stored Can reconstruct all sequences and calculate the discounted reward for all bottom-level o/a/r tuples
27.
Distance Equations for clustering
Euclidean Distance: $d_{ij} = \sqrt{(x_{i}-c_{j})^{T} (x_{i}-c_{j})}$ Mahalanobis distance: $(x-y)^{T} \Sigma^{-1} (x-y)$ - This accounts for the fact that the variances in each direction are different - Accounts for covariance between variables - Rescales correlated variables to uncorrelated ones with unit variance
28.
Eligibility Traces
i) A record of the most recently experienced states, actions, and rewards - A sequence of tuples <s,a,r> in chronological order - Updated on each step of the algorithm ii) Improved mechanism for temporal credit assignment - Spreading new information about state-action values through the value function
29.
Error rate
Fraction of misclassified data points $1-p_{m}$
30.
Evaluating a decision tree
i) Error rate - Proportion of errors across all instances ii) Resubstitution error - Error rate on the training data - Tends to be optimistic due to overfitting iii) Test set error - Performance on a previously unseen set of test data iv) Holdout - Reserve some data, often 20%, for testing - N-fold cross validation
31.
Function approximation
Approximates a function from a subset of inputs and outputs
32.
Gini Impurity
$ Gini(S) = 1-\sum_{c=1}^{l}p_{c}^{2}$
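The formula can be computed over a list of class labels like this (function name is illustrative):

```python
def gini(labels):
    # Gini impurity: 1 - sum over classes c of p_c^2,
    # where p_c is the fraction of the subset with class c.
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())

gini(["a", "a", "a", "a"])   # pure subset: impurity 0.0
gini(["a", "a", "b", "b"])   # even two-class split: impurity 0.5
```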
33.
Greedy layer-wise training
Steps for training 1) Train the first layer unsupervised, without labels 2) Use abundant unlabelled data which is not part of the training set 3) Freeze the first layer parameters and start training the second layer, using the output of the first layer as unsupervised input to the second layer 4) Repeat 5) Unfreeze all weights and fine tune the network
34.
Greedy layer-wise training advantages
i) Avoids many problems of training deep networks in a supervised fashion ii) Each layer gets full learning focus since it is the top layer iii) Takes advantage of unlabelled data iv) Helps with the problem of ineffective early layer learning v) Deep net local minima
35.
Hidden state Identification
The match between STM and LTM provides an estimate of how close traces in LTM are to STM
36.
Hierarchical Clustering
i) Breaks down the dataset into a series of nested clusters ii) A single cluster at the top with all the data iii) N clusters at the bottom, one for each data point iv) Can be displayed as a dendrogram
37.
Hierarchical encoding of sequence data
i) Encode all observation-action-reward tuples in a SOM ii) Record STM as decaying activation of augmented SOM nodes iii) Use another SOM to learn the sequence information in the node activations iv) Auto-encoding
38.
Improvement in decision trees
Improvement $I(S_{i1},S_{i2})$ is the difference in quality between the original subset $S_{i}$ and the joint quality of the two new subsets $S_{i1},S_{i2}$: $I(S_{i1},S_{i2}) = Q(S_{i}) - \sum_{n=1}^{2}\frac{|S_{in}|}{|S_{i}|} Q(S_{in})$ - Gini gain - Improvement based on Gini impurity - Information gain - based on entropy
39.
Instance-Based Methods
i) Doesn't construct a theory ii) Does classification/regression from the raw data each time iii) Complexity is limited to the size of the training set iv) Nearest sequence memory - Match the current episode against past episodes
40.
Instance-Based solutions
i) Nearest sequence memory - Keep the current episode in STM - Remember the last N episodes ii) Can learn very quickly - Potentially one-shot iii) Fixed memory requirements - Can forget solutions
41.
Interpreting Covariance
if cov(x,y) > 0 => x,y are positively correlated if cov(x,y) < 0 => x,y are inversely correlated if cov(x,y) = 0 => x,y are uncorrelated (zero covariance does not by itself imply independence)
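The sign rules can be verified on small examples (function name is illustrative; population covariance, dividing by n):

```python
def cov(xs, ys):
    # Population covariance: mean of (x - mu_x)(y - mu_y).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

cov([1, 2, 3], [2, 4, 6])   # positive: y rises with x
cov([1, 2, 3], [6, 4, 2])   # negative: y falls as x rises
cov([1, 2, 3], [5, 5, 5])   # zero: y does not vary with x
```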
42.
Issues with stopping criteria
i) Decide on a stopping criterion a) Too specific and it will produce small, underfitted trees - local minima b) Loose criteria produce large, overfitted trees
43.
Kernel Trick
i) Transforming some datasets can make them linearly separable. ii) This can be done more rigorously using a mapping function (kernel) which transforms one given space into another iii) Transformation of functions for easier-to-solve problems
44.
K-fold cross validation variants
K is commonly set to 5 or 10 based on experience - Stratification - Each subset has the same proportion of data from each class - Leave-one-out CV - Fold size 1 - Best use of data and most expensive
45.
K-means clustering
K - the number of clusters, set by hand 1. Set a suitable number of K clusters 2. Assign the first K data points to be centroids of the K clusters 3. Loop 3.1. Assign each data point to the closest cluster centre 3.2. Re-compute cluster centres as the mean of assigned points 3.3. Terminate the loop if no change 4. Repeat while assignments change
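The loop above can be sketched in one dimension (the initialisation from the first K points follows step 2; the data are made up):

```python
def kmeans_1d(data, k, iters=100):
    # Step 2: the first k points become the initial centroids.
    centroids = list(data[:k])
    for _ in range(iters):
        # Step 3.1: assign each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[j].append(x)
        # Step 3.2: recompute centroids as the mean of assigned points.
        new = [sum(c) / len(c) if c else centroids[j]
               for j, c in enumerate(clusters)]
        # Step 3.3: terminate if nothing changed.
        if new == centroids:
            break
        centroids = new
    return centroids

kmeans_1d([1.0, 1.1, 0.9, 9.0, 9.1, 8.9], k=2)   # centroids near 1.0 and 9.0
```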
46.
K-Means Formula
Selecting which of 2 classes to allocate vector $x_{i}$. Calculate the point's distance to the mean of the clusters: $d_{i1} = \sqrt{(x_{i}-c_{1})^{T} (x_{i}-c_{1})}$ $d_{i2} = \sqrt{(x_{i}-c_{2})^{T} (x_{i}-c_{2})}$ $d_{i1} < d_{i2}$: Allocate $x_{i}$ to $c_{1}$ $d_{i1} > d_{i2}$: Allocate $x_{i}$ to $c_{2}$
47.
K-Nearest Neighbour Learning
i) Select the K points in the training set that are nearest to the data point you need to classify - Requires a distance metric for deciding how close points are - Euclidean distance $d(p,q) = \sqrt{\sum_{i=1}^{n}(q_{i} - p_{i})^{2}}$
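A minimal sketch of KNN with the Euclidean metric above (the helper name and training data are made up for illustration):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    # train: list of (point, label) pairs; point is a tuple of features.
    def dist(p, q):
        # Euclidean distance d(p, q) = sqrt(sum_i (q_i - p_i)^2).
        return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))
    # Keep the k training points nearest to the query...
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # ...and return the majority class among them.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
knn_classify(train, (0.5, 0.5), k=3)   # "a": all 3 nearest neighbours are class a
```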
48.
KNN Sample Space
i) Distance from the distance metric ii) Classification is the majority class of the nearest neighbours iii) Can weight the neighbours by their proximity
49.
Kohonen Self-Organizing Maps
i) Preserves similarity between examples - examples that are close in n-dimensional space will be close in m-dimensional space Structure: $X_{i}$ - input vector features $N_{j}$ - Nodes / Map $W_{i,j}$ - The input vector is fully connected to each node Node weights are similar to K-means centroids
50.
Limitations of Back-propagation
i) Multiple hidden layers ii) Gets stuck in local optima - Start weights from random positions iii) Slow convergence to the optimum - Large training set needed iv) Only uses labelled data v) Error attenuation with deep nets
51.
Linear Decision Boundary
Partitions the input space into discrete regions - Each region is associated with a class label i) 2D is a line ii) 3D is a plane iii) n-D is a hyperplane $g(x) = w^{T}x+b$ - Where w is the weight vector and b is a constant For boundary classification: if $g(x)>0$ then class 1 if $g(x)<0$ then class 0
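The classification rule $g(x) = w^{T}x + b$ in code, with a hypothetical 2-D boundary $x_1 + x_2 = 1.5$:

```python
def classify(w, b, x):
    # g(x) = w^T x + b; class 1 if g(x) > 0, class 0 otherwise.
    g = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if g > 0 else 0

w, b = [1.0, 1.0], -1.5        # assumed boundary: the line x1 + x2 = 1.5
classify(w, b, [1.0, 1.0])     # 1: g = 0.5 > 0, above the line
classify(w, b, [0.0, 0.0])     # 0: g = -1.5 < 0, below the line
```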
52.
Markov decision problem consists of
S - Set of States A - Set of Actions T - Transition Function R - Reward Function
53.
Model Based Algorithms
i) Explicitly estimates T ii) Use T to improve policy
54.
Model Free Methods
i) Approximate value functions from experience ii) Do not model the transition function explicitly
55.
Modelling a 1D Gaussian
Given parameters mean $\mu$ and variance $\sigma^{2}$ we can compute the probability of a data point under the Gaussian model. The probability of data point $x_{i}$ for model $a$ is: $P(x_{i} | a) = \frac{1}{\sqrt{2 \pi \sigma_{a}^{2}}} \exp\left(- \frac{(x_{i} - \mu_{a})^{2}}{2 \sigma_{a}^{2}}\right)$
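The density formula translates directly to code (function name is illustrative):

```python
import math

def gaussian_pdf(x, mu, var):
    # Density of x under a 1-D Gaussian with mean mu and variance var.
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

gaussian_pdf(0.0, 0.0, 1.0)   # peak of the standard normal, 1/sqrt(2*pi)
```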
56.
Models
i) Anything that can be used to predict the results of actions - Simulated experience ii) Distribution models - Provide probability distributions over successor states iii) Sample models - Return a single successor state drawn from the appropriate probability distribution
57.
Monte-Carlo Methods
Assume we don't know the transition function, T - Model free Evaluate the policy based on experience - Obtain episodes by following policy $\pi$ - Start at state $s$, take actions until the goal is reached Use the average discounted reward
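Averaging discounted returns over episodes can be sketched as follows; the reward sequences are made-up examples:

```python
def discounted_return(rewards, gamma=0.9):
    # G = r_0 + gamma*r_1 + gamma^2*r_2 + ... for one episode's rewards.
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# Episodes gathered by following policy pi (illustrative reward sequences):
episodes = [[0, 0, 1], [0, 1], [1]]
avg = sum(discounted_return(e) for e in episodes) / len(episodes)
```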
