# Analysis of High Intensity Interval Training with R

High Intensity Interval Training has gained great prominence in recent years. This type of training intersperses short periods of very intense exercise with periods of more gentle exercise to recover. Rachel is a fan and uses a version where she does 6 one minute all out sprints separated by 5 one minute walks. The running app we use (Runkeeper) produces a graph of speed (or pace) vs time but smooths the graph to a sinusoidal curve rather than something more akin to a square wave. She asked me if I could do some analysis. In particular she wanted to know whether her first minute run, when she was fresh, was faster than the rest.
Fortunately it is very easy to download the gpx file for the run from the Runkeeper website. Reading this in to R is straightforward using readGPX from the plotKML package. A bit of processing gives a data frame of longitude, latitude, elevation and time. Adding the lagged data for longitude, latitude and time gives me everything I need in each row of the data frame to calculate the speed between that time and the previous time.
Plotting the speed vs time has the expected pattern with the periods of running and walking very clear. There is a fair amount of scatter, presumably because the gps is not that accurate over short times and distances – hence the smoothing in the first place.
I then used an if_else to add a new variable of “R” if the speed was greater than 9 kph and “W” if it was less. I also removed the final anomalous point. Grouping by this  Run/Walk variable gave me:
```# A tibble: 2 x 3
st    meanspeed meanpace
1 R         12.8      4.81
2 W          6.05    10.5
```
The challenge then is to look at each 1 minute section separately. To do this I introduced a new variable which is 1 if “R” has changed to “W” (or vice versa) or 0 otherwise. The final new variable is a cumulative total of these 1’s and 0’s. Now we are in business! Summarising on this final variable gives me the speed for each of the 11 stages:
```# A tibble: 11 x 4
Stage Type  meanspeed meanpace

1     1 R         14.4      4.29
2     2 W          6.35     9.92
3     3 R         12.3      4.93
4     4 W          5.73    11.4
5     5 R         12.5      4.90
6     6 W          6.38     9.87
7     7 R         12.6      4.83
8     8 W          6.03    10.8
9     9 R         12.1      4.99
10    10 W          5.66    11.0
11    11 R         12.6      4.93
```
So I’ve answered Rachel’s question. She was quite a lot faster in her first sprint and then incredibly consistent for the remaining 5 sprints. Putting the average data on to the plot was more difficult than I thought it would be. The obvious solution is to draw line segments for each of the 11 stages which requires the starting and finishing x and y values which I put into a separate data frame. The elegant solution (see https://stackoverflow.com/questions/32504313/how-to-add-horizontal-lines-showing-means-for-all-groups-in-ggplot2) is to draw a trend line using a linear model but force it to ignore the value of time. If you run a linear model and say that the x variable has no power to explain anything it will just give you the mean of the y value because in the absence of anything else that is the best guess.

### Notes

It’s not perfect. Some of the points were captured at the transition between running and walking and I also ignored the few seconds between each section. Calculating a mean speed for a stage by averaging the instantaneous speeds is only strictly accurate if each point is recorded at regular time intervals. Doing the calculation by summing all the distances and dividing by the total time was very close (less than 5% difference) to the more straightforward calculation of the mean.
I also read the gpx file into the excellent gps track editor (http://www.gpstrackeditor.com/) to check that the distances between each point were correct and the route plotted on the map was reasonably accurate.
Here is the complete code. It assumes you have downloaded your gpx file (HIIT.gpx) into your working directory. It’s entirely possible that the gpx files created by other apps have a slightly different structure so you may need to adapt the code accordingly. The distance is calculated from the Haversine formula  https://rosettacode.org/wiki/Haversine_formula#R.
```library(tidyverse)
library(plotKML)
library(lubridate)

great_circle_distance <- function(lat1, long1, lat2, long2) {
a <- sin(0.5 * (lat2 - lat1)/360*2*pi)
b <- sin(0.5 * (long2 - long1)/360*2*pi)
12742 * asin(sqrt(a * a + cos(lat1/360*2*pi) *
cos(lat2/360*2*pi) * b * b))
}

df <- as.data.frame(gpx.raw[])
colnames(df) <- c("lon","lat","ele","time")

df2 <- df %>% mutate(ele = as.numeric(ele),
time  = ymd_hms(time),
lon2 = lag(lon),
time2 = lag(time),
deltat = time - time2,
lat2 = lag(lat),
dist = great_circle_distance(lat, lon,lat2, lon2),
speed = dist/as.numeric(deltat)*3600,
pace = as.numeric(deltat)/dist/60,
st = if_else(speed >= 9, "R", "W"),
change = if_else(st == lag(st), 0 ,1)) %>%
drop_na() %>%
mutate(Stage = cumsum(change))

df2 %>% ggplot(aes(time, speed)) + geom_line(colour = "grey") + geom_point()

df2 %>% filter(pace < 22) %>% group_by(st) %>% summarise(meanspeed = mean(speed), meanpace = mean(pace))

df2 %>%  filter(pace < 22) %>% group_by(Stage) %>% summarise(Type = first(st),  meanspeed = mean(speed), meanpace = mean(pace))

df2 %>% filter(pace < 22) %>%
ggplot(aes(time, speed, colour = st, group = Stage)) +
geom_line(aes(time, speed),colour = "grey") +
geom_point() +
stat_smooth(method="lm", formula=y~1, se=F, colour = "black") +
labs(title = "Rachel's HIIT 22/4/20", y = "Speed in kmh")
```