Big Data Stats II                                                                                                 Name:__________                       
STAT4650 Sample Final Exam Solution  
Spring 2020  
Instructions  
1. Make sure to read the entire exam.
2. Open book and open note.  
3. You are allowed to use a scientific calculator.  
4. Exiting the exam under any circumstance is final, and you will NOT be allowed a
second attempt.
5. If any two or more of you submit identical answers to essay questions, it will result in a score  
of zero for that question.  
6. If you copy the content from the textbook/handout/solution, it will result in a score of zero for  
that question.  
7. You have 120 minutes to complete the test.  
8. Make sure you hit the ‘Submit’ button once you are done with your exam.  
9. If you have any questions, simply enter my online meeting room via zoom and I will help you  
with your questions.  
10. Don't panic.  
Students in my class are required to adhere to the standards of conduct set by Clark University  
and GSOM. Please sign the following Honor pledge that signifies your understanding of the  
rules set by the code of conduct.  
“I pledge my honor that I have not violated Clark University's Code of Conduct during this  
examination.”  
Please sign here to acknowledge_____________________________  
1. This question is about trees and random forests.  
(a) [5 pts] Sketch the tree corresponding to the partition of the predictor space illustrated in the  
following Figure. The Ri inside the boxes indicate region i.   
Sol:  
(b) [5 pts] Create a partition, using the tree illustrated in the following figure. Drag items onto  
the image.  
Solution:  
(c) [5 pts] What is a Bootstrap Aggregation of Decision Trees? Explain (2-3 sentences).   
Sol:  
We first generate B different bootstrapped training datasets and construct a decision tree on
each of the B datasets. To predict, we average the predictions from all B regression trees; in
the case of a classification problem, we take a majority vote among the B trees.
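The procedure described above can be sketched in a few lines. This is a minimal illustration, not the method used in any particular exam question; the dataset is synthetic and purely for demonstration.

```python
# A minimal sketch of bagging (bootstrap aggregation) for regression.
# The dataset here is synthetic and purely illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

B = 50  # number of bootstrapped training sets
trees = []
for _ in range(B):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample (with replacement)
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# Bagged prediction = average of the B individual tree predictions.
# (For classification we would take a majority vote instead of the mean.)
X_new = np.array([[2.5], [7.0]])
y_hat = np.mean([t.predict(X_new) for t in trees], axis=0)
print(y_hat.shape)  # one averaged prediction per new point
```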
(d) [5 pts] How does a Random Forest differ from a Bootstrap Aggregation of Decision Trees?  
Explain (2-3 sentences).  
Solution:  
Build a number of decision trees on bootstrapped training samples, but when building these trees,
each time a split in a tree is considered, a random sample of m predictors is chosen as split
candidates from the full set of p predictors (usually m = √p).
2. This question relates to Support Vector Machines and uses the data below.  
(a) [5 pts] We are given n = 7 observations in p = 2 dimensions. The horizontal axis corresponds
to X1 and the vertical axis corresponds to X2. For each observation, there is an associated class
label. Sketch the observations in the coordinate grid.
(b) [5 pts] Provide β0, β1, β2 for the maximum margin separating hyperplane defined by
β0 + β1X1 + β2X2 = 0.
(c) [5 pts] Indicate the support vectors for the maximal margin classifier (you may answer  
this question by writing down the coordinates of the support vectors).  
(d) [5 pts] Argue that a slight movement of the seventh observation would not affect the  
maximal margin hyperplane.  
Obs.  X1  X2  Y
1     3   4   Red
2     2   2   Red
3     4   4   Red
4     1   4   Red
5     2   1   Blue
6     4   3   Blue
7     4   1   Blue
Solution: (a)  
(b) 0.5 − X1 + X2 = 0 (Note: any equation close to this one is okay)
If a point falls above the line, meaning that 0.5 − X1 + X2 > 0, we classify it as
“red”, while if it falls below the line, meaning that 0.5 − X1 + X2 < 0, we classify it as
“blue”.
(c) The support vectors are the four points that pass through the gray margin lines.
These points are (2, 1), (2, 2), (4, 3), (4, 4).
(d) The seventh point is located at (4, 1), which is far from the separating hyperplane and not
close to any of the support vectors that determine it. As such, small
movements in its location won’t change the separating hyperplane.
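The hand-derived answer can be checked numerically. The sketch below fits a linear SVC with a very large C (approximating the hard-margin maximal margin classifier) on the seven observations from the table; the approximate coefficients noted in the comments follow from rescaling 0.5 − X1 + X2 = 0.

```python
# Checking the maximal margin classifier numerically with scikit-learn.
# A linear SVC with a very large C approximates the hard-margin classifier.
import numpy as np
from sklearn.svm import SVC

X = np.array([[3, 4], [2, 2], [4, 4], [1, 4],   # Red observations 1-4
              [2, 1], [4, 3], [4, 1]])          # Blue observations 5-7
y = ["Red"] * 4 + ["Blue"] * 3

clf = SVC(kernel="linear", C=1e6).fit(X, y)
b0 = clf.intercept_[0]
b1, b2 = clf.coef_[0]

# Rescaling by b2 should recover the hyperplane 0.5 - X1 + X2 = 0
print(b0 / b2, b1 / b2)                          # ≈ 0.5, -1.0
print(sorted(map(tuple, clf.support_vectors_)))  # the four support vectors
```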
3. (a) [5 pts] Considering the two methods “k-means clustering” and “k-nearest neighbors”,  
which is a supervised learning algorithm and which is an unsupervised learning algorithm?  
Solution:  
“k-means clustering” is an unsupervised learning algorithm, while “k-nearest neighbors” is a
supervised learning algorithm.
(b) [5 pts] What quantity does PCA minimize when it generates each principal component?
Solution: the sum of squared perpendicular distances from the points to the component line.
(c) [5 pts] What is the optimal number of principal components in the figure below?
Solution:  
We can see in the figure that about 30 components capture nearly all of the
variance with the lowest number of components, so 30 is the optimal number.
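This kind of figure is produced by plotting the cumulative explained variance ratio against the number of components. A hedged sketch, using synthetic data built to have 30 informative directions (the figure's actual data is not available):

```python
# Reading off the optimal number of principal components: plot the cumulative
# explained variance ratio and look for the elbow. The data below is synthetic,
# constructed with 30 true underlying factors embedded in 100 dimensions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 30))                   # 30 true factors
W = rng.normal(size=(30, 100))
X = latent @ W + 0.01 * rng.normal(size=(500, 100))   # weak noise in 100 dims

pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)
# The first 30 components capture essentially all the variance; the curve
# flattens after that, which is the elbow one reads off the plot.
print(round(cumvar[29], 3))
```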
4. Suppose that we have four observations, for which we compute a dissimilarity matrix, given  
by  
For instance, the dissimilarity between the first and second observations is 0.3, and the  
dissimilarity between the second and fourth observations is 0.8.  
(a) [5 pts] On the basis of this dissimilarity matrix, sketch the dendrogram that results from  
hierarchically clustering these four observations using complete linkage. Be sure to indicate on  
the plot the height at which each fusion occurs, as well as the observations corresponding to each  
leaf in the dendrogram. Drag items   
(b) [5 pts] Repeat (a), this time using single linkage clustering.  
(c) [5 pts] Suppose that we cut the dendrogram obtained in (a) such that two clusters result.  
Which observations are in each cluster?  
(d) [5 pts] Suppose that we cut the dendrogram obtained in (b) such that two clusters result.  
Which observations are in each cluster?  
(e) [5 pts] It is mentioned in the chapter that at each fusion in the dendrogram, the position of the  
two clusters being fused can be swapped without changing the meaning of the dendrogram.  
Draw a dendrogram that is equivalent to the dendrogram in (a), for which two or more of the  
leaves are repositioned, but for which the meaning of the dendrogram is the same.  
Solution:  
(a)  
(b)  
(c)  
(1,2), (3,4)  
(d)  
(1, 2, 3), (4)  
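Parts (a)-(d) can be verified with scipy. Only two entries of the dissimilarity matrix are stated in the question (d12 = 0.3, d24 = 0.8); the remaining values in the matrix below are hypothetical placeholders chosen to be consistent with the answers given, since the full matrix is not reproduced in the text.

```python
# Verifying the complete- and single-linkage answers with scipy. The matrix D
# is hypothetical apart from d12 = 0.3 and d24 = 0.8, which the question states.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

D = np.array([[0.0,  0.3,  0.4,  0.7],
              [0.3,  0.0,  0.5,  0.8],
              [0.4,  0.5,  0.0,  0.45],
              [0.7,  0.8,  0.45, 0.0]])
d = squareform(D)  # condensed form required by linkage()

complete = fcluster(linkage(d, method="complete"), t=2, criterion="maxclust")
single = fcluster(linkage(d, method="single"), t=2, criterion="maxclust")
print(complete)  # obs 1,2 in one cluster; obs 3,4 in the other
print(single)    # obs 1,2,3 in one cluster; obs 4 alone
```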
(e)  
5. Time Series Forecasting  
(a) [10 pts] What are the differences between autoregressive and moving average models?  
Solution: Autoregressive models specify the current value of a series yt as a function of its
previous p values and the current value of an error term, ut, while moving average models
specify the current value of a series yt as a function of the current and previous q values
of an error term, ut. AR and MA models have different characteristics in terms of the
length of their “memories”, which has implications for the time it takes shocks to yt to die
away, and for the shapes of their autocorrelation and partial autocorrelation functions.
An autoregressive process has a geometrically decaying acf and a number of non-zero
pacf points equal to the AR order. A moving average process has a number of non-zero
acf points equal to the MA order and a geometrically decaying pacf.
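The contrast in acf shapes has closed-form expressions in the simplest case, which can be computed directly; the parameter values φ = 0.6 and θ = 0.5 below are illustrative choices, not taken from the exam.

```python
# Theoretical autocorrelations illustrating the contrast described above:
# an AR(1) acf decays geometrically, while an MA(1) acf cuts off after lag 1.
phi, theta = 0.6, 0.5  # illustrative parameter choices

# AR(1): y_t = phi*y_{t-1} + u_t  ->  acf(k) = phi**k
ar_acf = [phi**k for k in range(1, 6)]

# MA(1): y_t = u_t + theta*u_{t-1}  ->  acf(1) = theta/(1+theta^2), acf(k>=2) = 0
ma_acf = [theta / (1 + theta**2)] + [0.0] * 4

print([round(a, 3) for a in ar_acf])  # geometric decay: 0.6, 0.36, 0.216, ...
print([round(a, 3) for a in ma_acf])  # cutoff after lag 1: 0.4, then zeros
```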
(b) [10 pts] A researcher wants to test the order of integration of some time series data. He  
decides to use the DF test. He estimates a regression of the form  
Δyt = α + ψyt-1 + ut
and obtains the estimate ψ̂ = −0.023 with standard error SE(ψ̂) = 0.009. What are the null and
alternative hypotheses for this test? Given the data, and a critical value of −2.86, perform  
the test. What is the conclusion from this test and what should be the next step?  
Solution: The null hypothesis is of a unit root against a one-sided stationary alternative, i.e. we
have
H0: yt is a non-stationary process
H1: yt is a stationary process
which is also equivalent to
H0: ψ = 0
H1: ψ < 0
The test statistic is given by ψ̂/SE(ψ̂), which equals −0.023 / 0.009 = −2.556. Since this is not
more negative than the appropriate critical value, we do not reject the null hypothesis.
We therefore conclude that there is at least one unit root in the series (there could be 1, 2, 3 or
more). What we would do now is to regress Δ²yt on Δyt-1 and test if there is a further unit root.
The null and alternative hypotheses would now be:
H0: Δyt ~ I(1), i.e. yt ~ I(2)
H1: Δyt ~ I(0), i.e. yt ~ I(1)
If we rejected the null hypothesis, we would therefore conclude that the first differences are  
stationary, and hence the original series was I(1). If we did not reject at this stage, we would  
conclude that yt must be at least I(2), and we would have to test again until we rejected.
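The arithmetic of the test decision above is a one-liner, using only the numbers stated in the question:

```python
# The Dickey-Fuller test decision computed from the numbers in the question.
psi_hat, se = -0.023, 0.009
critical_value = -2.86

t_stat = psi_hat / se
print(round(t_stat, 3))         # -2.556
print(t_stat < critical_value)  # False -> fail to reject H0 (unit root present)
```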
