Photography by Alejandro Escamilla, via Unsplash.com

I learned programming in R during my time in graduate school (see below), and ever since, big data has captured my big interest. I enjoy wrangling data, and I try to maintain this interest with side projects, small and large. You can find me getting my feet wet with the plyr package some time ago. I like data that means something to me, either personally or because it relates to a topic of interest. From a personal point of view, watching Hurricane Sandy graze by Toronto using R was fun (and informative). More recently, I looked at how much I saved by using a bike rental program (or, as the Brits call it, “a cycle hire scheme”) in London. Soon I will be working on a bigger, and very interesting, project, so stay tuned!

Data Science in my Ph.D. Project

My research has been characterized by projects that are technically challenging. Each project has been interesting because nobody had really attempted it before, perhaps in part because of those challenges. Most of the time, innovative ideas or lucky collaborations have led to solutions that made these projects possible.
Discovering R is part of one of these stories of nearly impossible projects. I was lucky to have access to tracking software developed by James McCrae for use in our lab. The software could easily track one or multiple zebrafish in a tank. However, the output data is high resolution (30 frames per second), and even my relatively short trials resulted in data files containing 14,400 observations each. I generated more than 3,000 trials. In simple words, that is a lot of data, and processing it, summarizing it, and calculating the statistics would have been prohibitively time-consuming in Excel. Luckily, just as I generated the data, I discovered R.

One course and many months of study later, I became proficient at programming in R.

The result of these months of work is a library of R scripts to process and analyze high-resolution positional data for behavioral research.