Research Update 2 ~ Alex

We've gotten a lot done this week! On Tuesday we were granted access to the data for our project, and since then we've been looking it over and putting it into the right format. The data come from a survey conducted in the Bay Area that records each participant's demographic information (number of people in the household, marital status, etc.) and travel information at different ages.

Because of the way the survey data are aligned by age, there are large gaps. For example, a participant who is only 30 obviously can't tell us how they travel at 50. Gaps like these are inevitable in longitudinal data, so understanding how they affect the experiments we run is crucial to any longitudinal analysis. We can't see how household size affects public transportation use until we've filled the gaps and have comparable sequences. Right now we are filling the gaps with existing methods such as Optimal Matching, making minor adjustments to the substitution costs to see which settings give the most authentic and accurate representation of the data. As the gaps are filled and the data become easier to work with, we've been able to run a few experiments. We're beginning to get interpretable graphs and have even run some preliminary clustering!
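For readers who haven't met Optimal Matching: it measures how different two sequences are as the cheapest way to turn one into the other using insertions/deletions ("indels") and substitutions, where each substitution has a cost. The post doesn't say which software we're using, so this is only a from-scratch Python sketch of the standard dynamic-programming version. The state codes, the "*" gap marker, the indel cost of 1, and every substitution cost below are made-up assumptions for illustration, not our actual settings.

```python
# Hypothetical sketch: Optimal Matching (OM) distance between two state sequences.
# Assumptions not taken from the post: one character per age, "*" marks a gap,
# a constant indel cost, and a made-up substitution-cost table.

def om_distance(a, b, sub_cost, indel=1.0, default_sub=2.0):
    """Minimum total cost of turning sequence a into sequence b using
    insertions/deletions (cost `indel`) and substitutions (cost from `sub_cost`)."""
    n, m = len(a), len(b)
    # dp[i][j] holds the cheapest alignment of a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel          # delete everything in a[:i]
    for j in range(1, m + 1):
        dp[0][j] = j * indel          # insert everything in b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            if x == y:
                sub = 0.0
            else:
                # symmetric lookup; unlisted pairs fall back to default_sub
                sub = sub_cost.get((x, y), sub_cost.get((y, x), default_sub))
            dp[i][j] = min(dp[i - 1][j] + indel,      # delete x
                           dp[i][j - 1] + indel,      # insert y
                           dp[i - 1][j - 1] + sub)    # substitute x with y
    return dp[n][m]


# Made-up household-size states "1", "2", "3" (people in household)
costs = {("1", "2"): 1.0, ("2", "3"): 1.0, ("1", "3"): 2.0}
# Variant where a gap is cheap to substitute with any observed state
gap_costs = {**costs, **{("*", s): 1.0 for s in "123"}}

print(om_distance("1122", "1123", costs))        # 1.0
print(om_distance("112**", "11222", costs))      # 4.0
print(om_distance("112**", "11222", gap_costs))  # 2.0
```

Changing only the cost of substituting a gap halves the distance between a partly observed sequence and a fully observed one in this toy example, which is why the substitution-cost choice matters so much when gaps are present.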

This graph shows the survey participants along the y-axis and their age along the x-axis, so each thin line represents one participant's sequence. The gray area toward the right is the gaps we have to fill (they're more obvious in the next few graphs), and we can see that about half of our participants lived in a one-person household in their twenties.

 
This is a graph after clustering. (We picked an arbitrary number of clusters; we're currently testing to find the optimal number.) Clusters 2 and 3 are younger participants with quite a few gaps, and Clusters 1 and 4 are older participants. We can also tell that Clusters 2 and 4 tend to be people who chose to have smaller families, while Clusters 1 and 3 tend to have larger families.
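The post doesn't say how the clusters are built or which criterion we'll use to pick their number. One common criterion is the average silhouette width, which compares each item's mean distance to its own cluster with its mean distance to the nearest other cluster (higher is better). The sketch below assumes a precomputed distance matrix (for example, OM distances) and uses brute-force k-medoids, which is only feasible for toy data; real data would call for PAM or hierarchical clustering. The distance matrix is made up.

```python
from itertools import combinations

# Hypothetical sketch: compare cluster counts with the average silhouette width.
# Assumptions: a precomputed pairwise distance matrix (e.g. OM distances) and
# brute-force k-medoids, which is only feasible for toy-sized data.

def assign(dist, medoids):
    """Label each item with the index of its nearest medoid (ties go to the first)."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]

def k_medoids_bruteforce(dist, k):
    """Try every set of k medoids and keep the labeling with the lowest total distance."""
    best_labels, best_cost = None, float("inf")
    for medoids in combinations(range(len(dist)), k):
        labels = assign(dist, medoids)
        cost = sum(dist[i][medoids[labels[i]]] for i in range(len(dist)))
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels

def mean_silhouette(dist, labels):
    """Average silhouette width; items in singleton clusters count as 0."""
    n = len(dist)
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(dist[i][j] for j in own) / len(own)    # mean distance within own cluster
        b = min(sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[i])   # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

# Made-up distances: items 0-2 and 3-5 form two tight groups
dist = [[0 if i == j else (1 if (i < 3) == (j < 3) else 5) for j in range(6)]
        for i in range(6)]
for k in (2, 3):
    print(k, round(mean_silhouette(dist, k_medoids_bruteforce(dist, k)), 2))
# prints "2 0.8", then "3 0.4"
```

On this toy matrix, two clusters score higher than three, matching the two groups built into the data.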



If you look at the 40 to 50 age range of the two younger clusters, this graph shows how erratic the gaps can make things. It also shows more clearly what we assumed earlier about the family size of each cluster.
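The post doesn't say what drives the erratic stretch at ages 40 to 50. One thing that may be worth checking is how many participants are actually observed at each age once gap cells are set aside, because a share computed from only a handful of people can swing sharply. This is a minimal sketch under assumptions: sequences are strings with one character per age step, "*" is a hypothetical gap marker, and the data are invented.

```python
from collections import Counter

GAP = "*"  # hypothetical marker for an age at which a participant was not observed

def state_distribution(sequences, age_index):
    """Share of each observed state at one age, and how many participants it rests on."""
    observed = [seq[age_index] for seq in sequences
                if age_index < len(seq) and seq[age_index] != GAP]
    n = len(observed)
    if n == 0:
        return {}, 0
    return {state: count / n for state, count in Counter(observed).items()}, n

# Made-up household-size sequences, one character per age step
seqs = ["1122", "1233", "12**", "11**", "1***"]
for age_index in range(4):
    print(age_index, state_distribution(seqs, age_index))
```

In this toy data, the 50/50 split at the last two positions rests on just two participants, so one person changing state would move the shares by 50 points.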

This is a graphical representation of the ten most frequent sequences in each cluster. These show most clearly what we observed in the previous graphs: each cluster's typical age and family size.
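Behind a "most frequent sequences" graph is just a count of identical whole sequences within each cluster. Here is a minimal Python sketch, assuming sequences are strings and cluster labels come from some earlier clustering step; the data and the "*" gap marker are made up.

```python
from collections import Counter, defaultdict

def top_sequences(sequences, labels, n=10):
    """For each cluster label, return its n most frequent whole sequences
    together with each one's share of the cluster."""
    by_cluster = defaultdict(list)
    for seq, label in zip(sequences, labels):
        by_cluster[label].append(seq)
    return {label: [(seq, count / len(members))
                    for seq, count in Counter(members).most_common(n)]
            for label, members in sorted(by_cluster.items())}

# Made-up sequences ("*" = gap) and cluster labels
seqs = ["1122", "1122", "1123", "11**", "11**", "12**"]
labels = [1, 1, 1, 2, 2, 2]
for label, top in top_sequences(seqs, labels, n=2).items():
    print(label, top)
```

Because gap cells are part of each sequence here, "11**" and "1122" count as different sequences, so in a cluster with many gaps the top-ten list can be dominated by partly observed sequences.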

We have these graphs for all 14 demographic and travel variables, and next we're looking to start experimenting with gap treatment and comparing the travel and demographic variables.

Thanks for reading! I'll keep you updated.

