1. Interquartile range (IQR) for anomaly detection
In time series analysis, the interquartile range (IQR) offers a robust way to detect anomalies, or outliers, in a dataset. The IQR is the distance between the first quartile (Q1) and the third quartile (Q3) of the data distribution, so it spans the middle 50% of the observations. Anomalies are identified by setting thresholds at a multiple of the IQR beyond the quartiles, commonly 1.5 × IQR below Q1 and above Q3, and flagging any data point that falls outside them. Because the quartiles are barely affected by extreme values, the IQR is a resilient measure of spread, which makes the technique well suited to skewed or non-normally distributed data. Applied to time series data, the IQR method identifies unusual patterns or fluctuations that deviate from expected behavior. This is valuable in fields such as finance, healthcare, and retail, where anomalous events can significantly affect decisions and call for timely intervention.
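The thresholding rule can be sketched in a few lines of base R. This is a minimal illustration, not the code used later in this post: the function name `flag_iqr_outliers`, the multiplier argument `k`, and the toy vector are all made up for the example.

```r
# Minimal sketch of IQR-based outlier flagging (base R only).
# `flag_iqr_outliers` and `k` are hypothetical names used for illustration.
flag_iqr_outliers <- function(x, k = 1.5) {
  # First and third quartiles (R's default quantile type 7)
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]

  # Lower and upper thresholds at k * IQR beyond the quartiles
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr

  # TRUE where a value falls outside the thresholds
  x < lower | x > upper
}

# Toy data: one value is far from the rest
x <- c(10, 12, 11, 13, 12, 11, 95, 12, 10, 13)
which(flag_iqr_outliers(x))
```

A larger `k` widens the thresholds and flags fewer points, so the multiplier is the main tuning knob of the method.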
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(timetk)
walmart_sales_weekly
# A tibble: 1,001 × 17
   id    Store  Dept Date       Weekly_Sales IsHoliday Type    Size Temperature
   <fct> <dbl> <dbl> <date>            <dbl> <lgl>     <chr>  <dbl>       <dbl>
 1 1_1       1     1 2010-02-05       24924. FALSE     A     151315        42.3
 2 1_1       1     1 2010-02-12       46039. TRUE      A     151315        38.5
 3 1_1       1     1 2010-02-19       41596. FALSE     A     151315        39.9
 4 1_1       1     1 2010-02-26       19404. FALSE     A     151315        46.6
 5 1_1       1     1 2010-03-05       21828. FALSE     A     151315        46.5
 6 1_1       1     1 2010-03-12       21043. FALSE     A     151315        57.8
 7 1_1       1     1 2010-03-19       22137. FALSE     A     151315        54.6
 8 1_1       1     1 2010-03-26       26229. FALSE     A     151315        51.4
 9 1_1       1     1 2010-04-02       57258. FALSE     A     151315        62.3
10 1_1       1     1 2010-04-09       42961. FALSE     A     151315        65.9
# ℹ 991 more rows
# ℹ 8 more variables: Fuel_Price <dbl>, MarkDown1 <dbl>, MarkDown2 <dbl>,
#   MarkDown3 <dbl>, MarkDown4 <dbl>, MarkDown5 <dbl>, CPI <dbl>,
#   Unemployment <dbl>
frequency = 13 observations per 1 quarter
trend = 52 observations per 1 year
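Frequency and trend messages of this form are what timetk prints when its anomaly diagnostics pick a seasonal frequency and trend window automatically for each group. The call that produced them is not shown here, so the following is only a sketch of how such diagnostics might be run. It assumes `tk_anomaly_diagnostics()` with default settings and grouping by `id`; the object name `anomalies` is made up for the example.

```r
library(dplyr)
library(timetk)

# Assumed workflow, not the original call: decompose each store-department
# series and flag anomalous remainders. Grouping by `id` and relying on the
# default arguments are assumptions.
anomalies <- walmart_sales_weekly %>%
  group_by(id) %>%
  tk_anomaly_diagnostics(
    .date_var = Date,
    .value    = Weekly_Sales,
    .message  = FALSE  # suppress the per-group frequency/trend messages
  )

# Keep only the observations flagged as anomalous
anomalies %>%
  filter(anomaly == "Yes") %>%
  select(id, Date, observed, remainder)
```

According to the timetk documentation, the function removes seasonality and trend first and then applies IQR-based limits to the remainder, which is why a frequency and a trend are chosen for every series. `plot_anomaly_diagnostics()` takes the same date and value columns and draws the flagged points for each group.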