Research Update 5 ~Alex
The plots work! It took all of Monday and most of Tuesday, but we
finally found the problems in the code. Ironically, as we were trying to
fix the plots, they progressed from not clustering at all, to clustering
completely, and finally to what they should look like. Once those problems
were figured out, we were able to make slight alterations and try different
information until we were left with a robust graphical interpretation
of the optimal NA cost. Here are a couple of the graphs:
These two are colored by age group. For example, the red points in the top plot are the people born before 1983 and the green points are those born after. The clusters tell us that age plays a significant role in where individuals are placed, and that it is what creates the clusters. Because age is our alignment variable, different age groups have different levels of missing values, so a higher cost for replacing missing values affects the younger survey participants much more than the older ones. This is why artificial clusters form by age as the cost of replacing missing values is increased.
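To make that reasoning concrete, here is a minimal toy sketch in Python. It is not our actual code: the sequences, the `*` marker for a missing value, and the simple position-by-position distance in `substitution_distance` are all made up for illustration. It only shows how raising the NA cost pushes the two age groups apart while leaving distances within each group unchanged.

```python
NA = "*"  # hypothetical marker for a missing state


def substitution_distance(a, b, na_cost, sub_cost=1.0):
    """Position-by-position distance between two equal-length sequences.

    Swapping one observed state for another costs `sub_cost`;
    swapping a missing value for an observed state costs `na_cost`.
    Identical positions (including NA vs NA) cost nothing.
    """
    total = 0.0
    for x, y in zip(a, b):
        if x == y:
            continue
        total += na_cost if NA in (x, y) else sub_cost
    return total


# Toy sequences aligned by age: the older participants are fully observed,
# the younger ones have not reached the later ages yet, so those are missing.
older_1 = list("AABBBB")
older_2 = list("ABBBBB")
younger_1 = list("AAB***")
younger_2 = list("ABB***")

for na_cost in (0.5, 1.0, 2.0, 4.0):
    within_old = substitution_distance(older_1, older_2, na_cost)
    within_young = substitution_distance(younger_1, younger_2, na_cost)
    between = substitution_distance(older_1, younger_1, na_cost)
    print(f"NA cost {na_cost}: within older={within_old}, "
          f"within younger={within_young}, between groups={between}")

# Dropping the later positions removes the missing values,
# and with them the gap between the two groups.
truncated = substitution_distance(older_1[:3], younger_1[:3], na_cost=4.0)
print("between groups after truncating to 3 positions:", truncated)
```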
The clusters also disappear as the later years of the sequence are removed. We also have plots stepping back the ending year from 50 to 35, but they are super tiny when posted here, so I have not included them. In addition to experiments with t-SNE and finishing up the next few sections of my paper, we have started working on removing the noise from our newly created variables. I'm a little unsure of how to adapt the variables all at once, so I plan to do that Monday morning when Alina returns.
Overall this was a pretty good week! It was mostly meeting-free, so we were able to focus on work. Hopefully the productivity continues into next week, but it does look meeting-heavy.