Posts

Showing posts from June, 2018

Research Update 4 ~Alex

After the craziness of week 3, week 4 has seemed relatively calm. At the beginning of the week I was simply finishing up the work that last week's breakthrough made possible. The new variables are fully created, and we are in the process of comparing them to the variables from the survey. We are also starting to focus on how we will visualize our data. Because of the longitudinal and high-dimensional nature of our dataset, it is almost impossible to visualize without manipulation. To reduce the number of dimensions, we decided to employ t-SNE (a dimensionality reduction algorithm). t-SNE can take hundreds of thousands of dimensions down to 2, which is much easier to visualize. t-SNE can be a little touchy though, so right now we are carefully tuning the parameters, especially perplexity, a number that tells the algorithm whether to prioritize the global or the local aspects of the data (basically a guess at how many neighbors each point has). A graph with the data sets we're focused o
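To make the "guess at how many neighbors each point has" reading of perplexity concrete, here is a minimal sketch in plain Python. It is not our actual pipeline; the function names and the toy distances are made up for illustration. It builds the Gaussian-kernel neighbor probabilities that t-SNE uses for a single point and reports their perplexity, defined as 2 raised to the Shannon entropy in bits.

```python
import math


def conditional_probs(sq_dists, sigma):
    """Gaussian-kernel neighbor probabilities for one point.

    sq_dists: squared distances from the point to every other point.
    sigma:    kernel bandwidth (t-SNE searches for the sigma that
              produces the requested perplexity).
    """
    # Shifting by the smallest distance does not change the normalized
    # result, but it keeps exp() from underflowing for every neighbor.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity = 2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy


# Toy squared distances from one point to five others (hypothetical values).
toy_sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]

for sigma in (0.5, 2.0, 50.0):
    probs = conditional_probs(toy_sq_dists, sigma)
    print(f"sigma={sigma}: effective neighbors ~ {perplexity(probs):.2f}")
```

A narrow kernel puts nearly all the weight on the closest point, so the perplexity sits near 1 (very local); a wide kernel spreads weight over every point, so the perplexity approaches the number of other points (more global). Setting the perplexity parameter picks a spot on that scale.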

Third weekend in California

Saturday was another fantastic day in San Francisco! It was a very full day. I got up early and tried an adorable little Asian bakery in Berkeley, got breakfast there, and ate on Berkeley's campus. The three of us met up at 11 and took BART to San Francisco. It was Pride weekend, so on the way we got to see the amazing spirit and pride of partygoers also heading to San Francisco. The station we got out at was right in the middle of the festivities. There were booths selling merchandise and live music. It was amazing to be at such a momentous event. We got hungry, so we set off toward a Syrian place Alina recommended. It was amazing! I got falafel and baba ganoush. It was my first time trying baba ganoush and it was amazing! Even Alina, who has had it before, said this one was especially good. The pita was warm when it came out and everything went together so perfectly. I even got to steal some of Tyler's hummus, and it didn't even taste like store-bought hummus. It was so goo

Weekend 3: Second Trip to San Francisco

This weekend I got to do something that I had known I needed to do ever since planning my trip: visit the location of the picnic in the opening of Full House. It was a lot different than I thought it would be. For one, I always thought the hill went up toward the buildings when it really goes down. Also, the streets out of frame aren't really where I thought they would be in relation to the camera. Because of these preconceived notions, it felt like I wasn't even in the same place, though it was obvious I was. Regardless, I couldn't help but whistle the Full House theme while I was there. According to the show, the Tanners lived in one of the Painted Ladies. But in reality, the house always shown as the Tanner household was about a 20-minute walk away. I've never been a huge fan of the show, but I do know that you don't see the front of the house nearly as much as you see the picnic scene, so this was only a little less amazing. Finally, we sp

Week 3 Review: 6/18 - 6/22

This week I've taken a closer look at my data and at which measurements I may be studying more than others. Because so many variables are measured in the Tstat data, there will certainly be some that are much more relevant than others. I have consulted papers on similar studies to help determine which variables aren't of any use so I can focus on the ones that are relevant. These studies also included some experiments. Throughout the week, I have tried to replicate two of these experiments, one from each of two separate papers. One uses Tstat data like my own and the other uses data retrieved and processed here at Lawrence Berkeley National Laboratory in the NERSC center. Because Alex Sim, one of my mentors, was an author of the second paper, he gave me access to the exact same data from the paper to try to recreate the experiment. For the first experiment, I analyzed a 41.4 MB zip file using the same methods of cleaning, column calculation, and clustering as the paper. The paper menti

Research update 3 ~Alex

I was looking over what I posted last time and actually couldn't believe how much we have accomplished in just one week. I also realized I never explained what Optimal Matching is. Optimal Matching is a non-Euclidean distance metric which can be used on large sequences of data. It works through a series of substitutions; the more substitutions it takes to match two series, the "further apart" they are. Determining the substitution cost for missing values can be tricky, especially for missing values due to the alignment of data. Too high a cost can cause artificial clusters to form by the alignment variable, and too low a cost can underrepresent possible similarities in the missing data. So determining the optimal cost for missing values, the "NA cost", will be an important part of our analysis. Since I last posted there have been good times and bad times. Monday and Tuesday were good days. We worked through the methods used in the last publication on our data set to ge
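To show how the "NA cost" enters an Optimal Matching distance, here is a rough sketch in plain Python. It is not the code we are running; the function name `om_distance`, the `NA` placeholder, the default costs, and the insertion/deletion cost (a standard part of Optimal Matching that I did not describe above) are all assumptions for illustration. It is the usual edit-distance dynamic program, with a separate price for substitutions that involve a missing value.

```python
NA = None  # placeholder for a missing observation (assumption of this sketch)


def om_distance(a, b, sub_cost=2.0, indel_cost=1.0, na_cost=1.0):
    """Cheapest total cost of edits that turn sequence a into sequence b.

    sub_cost:   cost of swapping one observed state for another
    indel_cost: cost of inserting or deleting one element
    na_cost:    cost of a substitution where one side is missing (NA)
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the cheapest cost of turning a[:i] into b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel_cost
    for j in range(1, m + 1):
        dp[0][j] = j * indel_cost

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            if x == y:
                step = 0.0          # identical states (including NA vs NA) are free
            elif x is NA or y is NA:
                step = na_cost      # one side is missing
            else:
                step = sub_cost     # two different observed states
            dp[i][j] = min(
                dp[i - 1][j - 1] + step,     # match or substitute
                dp[i - 1][j] + indel_cost,   # delete from a
                dp[i][j - 1] + indel_cost,   # insert into a
            )
    return dp[n][m]


# Hypothetical toy sequences: the second observation is missing in one of them.
observed = ["A", "A", "B"]
with_gap = ["A", NA, "B"]

for cost in (0.5, 1.0, 5.0):
    print(f"na_cost={cost}: distance={om_distance(with_gap, observed, na_cost=cost)}")
```

In this toy version, treating NA versus NA as a free match is a design choice and may not be right for every data set. Note also that once the NA cost exceeds the price of a delete plus an insert, the algorithm routes around the substitution, so raising the NA cost further stops changing the distance.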