Week 6 Review: 7/9 - 7/13

I modified the PCA method from week 5 and used it to select the features for the remainder of the project. Five prominent features are now attributed to each subset. After selecting the features, I normalized the data using MinMaxScaler. Each subset was then grouped into 30-minute windows for clustering. I ran k-means on each window separately and saved lists of the centroid locations and group labels.
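The normalization and windowing steps can be sketched roughly as follows. This is a minimal standard-library illustration, not the project code: the function names (`min_max_scale`, `window_start`, `group_into_windows`) and the sample values are hypothetical, and `min_max_scale` only mimics the default [0, 1] mapping of MinMaxScaler, with constant columns mapped to 0.0 as a convention. The rows of each window would then be handed to k-means.

```python
from collections import defaultdict
from datetime import datetime

WINDOW_MINUTES = 30  # window length used in this post


def min_max_scale(rows):
    """Rescale every column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def window_start(ts):
    """Floor a timestamp to the start of its 30-minute window."""
    minute = (ts.minute // WINDOW_MINUTES) * WINDOW_MINUTES
    return ts.replace(minute=minute, second=0, microsecond=0)


def group_into_windows(timestamps, rows):
    """Map each window start time to the rows that fall inside that window."""
    windows = defaultdict(list)
    for ts, row in zip(timestamps, rows):
        windows[window_start(ts)].append(row)
    return dict(sorted(windows.items()))


# Made-up sample: three records with two features each.
timestamps = [
    datetime(2017, 5, 16, 14, 5),
    datetime(2017, 5, 16, 14, 20),
    datetime(2017, 5, 16, 14, 40),
]
scaled = min_max_scale([[10, 1], [20, 3], [30, 2]])
windows = group_into_windows(timestamps, scaled)
```

Here the first two records land in the 14:00 window and the third in the 14:30 window.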

With the help of Alina, I was able to compute and list the degree changes of centroids between windows. Alex Sim, a mentor of mine in the SDM group, has published several papers using this idea of degree change. The degree change is defined as the sum of the distances each centroid moves from one window to the next. The following visual helps demonstrate the proposed formula. The previous window is on the left and the current window is on the right. The diagonal black line segments show the distance each centroid has moved from the previous window to the current window. Centroids are paired from past to present under the assumption that each current centroid is the result of moving exactly one past centroid, with no past centroid used twice. Of all possible pairings, the one chosen is the pairing that yields the smallest sum of distances. It should be noted that this approach does not take the size or shape of the clusters into account when quantifying the change between windows.
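The definition above can be sketched in a few lines. This is a brute-force illustration under my own assumptions, not necessarily how the published work computes it: the function name `degree_change` is hypothetical, distances are Euclidean, and both windows are assumed to have the same number of centroids. Trying every pairing is only practical for a small number of clusters.

```python
from itertools import permutations
from math import dist


def degree_change(prev_centroids, curr_centroids):
    """Smallest total distance over all one-to-one pairings of centroids."""
    if len(prev_centroids) != len(curr_centroids):
        raise ValueError("both windows need the same number of centroids")
    k = len(curr_centroids)
    # Each permutation assigns past centroid i to current centroid perm[i].
    return min(
        sum(dist(prev_centroids[i], curr_centroids[j])
            for i, j in enumerate(perm))
        for perm in permutations(range(k))
    )


# Made-up example with two centroids in two dimensions.
previous = [(0.0, 0.0), (1.0, 1.0)]
current = [(1.0, 1.0), (0.0, 1.0)]
change = degree_change(previous, current)
```

In this toy case the best pairing keeps the centroid at (1, 1) in place and moves (0, 0) to (0, 1), so the degree change is 1.0 rather than the larger sum the other pairing would give.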

Example of Windows described by degree change

After building the list of degree changes, I found a sequence of six consecutive zeroes. This would suggest that nothing changed across seven consecutive windows, which obviously raises some questions. The source of the error turned out to be a lapse in data from all nodes between 2:40 pm and 6:19 pm on 5/16/2017. With no data in the 30-minute windows of this period, the function used to run k-means and save cluster centers simply re-saved the cluster centers from the last non-empty window before the gap. Because this produced seven identical windows, the six changes between them were zero. When determining the relation between throughput and degree change between windows, this period will not be considered because it contains no data at all.
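One way to keep such gaps from showing up as false zeroes is to record an empty window explicitly and skip any comparison that touches it. The sketch below is an assumption about how that could look, not the fix used in the project: `changes_skipping_gaps` is a hypothetical helper, `None` is my chosen marker for a window without data, and the one-centroid `single_centroid_change` function stands in for the real degree-change computation.

```python
from math import dist


def changes_skipping_gaps(window_centroids, change_fn):
    """Compute changes between consecutive windows, using None for gaps.

    window_centroids: list where each entry is a list of centroids,
    or None when the window contained no data.
    """
    changes = []
    for prev, curr in zip(window_centroids, window_centroids[1:]):
        if prev is None or curr is None:
            changes.append(None)  # no data on one side, so no comparison
        else:
            changes.append(change_fn(prev, curr))
    return changes


def single_centroid_change(prev, curr):
    """Stand-in change function for windows with exactly one centroid."""
    return dist(prev[0], curr[0])


# Made-up sequence: data, two empty windows, then two identical windows.
windows = [[(0.0, 0.0)], None, None, [(3.0, 4.0)], [(3.0, 4.0)]]
result = changes_skipping_gaps(windows, single_centroid_change)
```

With this convention a genuine zero (the last two windows really are identical) stays distinguishable from a missing comparison, which can then be dropped before relating degree change to throughput.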
