Interpretable Time Series Autoregression
Quantifying periodicity and seasonality of time series with sparse autoregression. Sparse autoregression is posed as an optimization problem that identifies the dominant, positive auto-correlations of a time series (e.g., human mobility and climate variables).
(Updated on August 9, 2025)
In this post, we intend to explain the essential ideas of our research work:
- Xinyu Chen, Vassilis Digalakis Jr, Lijun Ding, Dingyi Zhuang, Jinhua Zhao (2025). Interpretable time series autoregression for periodicity quantification. arXiv preprint arXiv:2506.22895.
- Xinyu Chen, Qi Wang, Yunhan Zheng, Nina Cao, HanQin Cai, Jinhua Zhao (2025). Data-driven discovery of mobility periodicity for understanding urban transportation systems. arXiv preprint arXiv:2508.03747.
Content:
In Part I of this series, we introduce the essential idea of time series autoregression in statistics.
I. Univariate Autoregression
Time series autoregression is a statistical model used to analyze and forecast time series data. The class of autoregression models is widely used in the fields of economics, finance, weather forecasting, and signal processing. Exploring auto-correlations from univariate autoregression is meaningful for understanding time series.
I-A. Definition of Autoregression
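The essential idea of time series autoregression is that a given data point of a time series is linearly dependent on the previous data points. Mathematically, the $d$th-order univariate autoregression of time series $\boldsymbol{x}=(x_1,x_2,\ldots,x_T)^\top\in\mathbb{R}^{T}$ can be written as follows,
$$x_t=\sum_{k=1}^{d}w_kx_{t-k}+\epsilon_t$$
for all $t\in\{d+1,d+2,\ldots,T\}$. The integer $d\in\mathbb{Z}^{+}$ is the order. Here, $x_t$ is the value of the time series at time $t$. The vector $\boldsymbol{w}=(w_1,w_2,\ldots,w_d)^\top\in\mathbb{R}^{d}$ represents the autoregressive coefficients. The random error $\epsilon_t$ is assumed to be normally distributed with zero mean and constant variance.
There is a closed-form solution to the coefficient vector from the optimization problem such that
$$\boldsymbol{w}:=\arg\min_{\boldsymbol{w}}\ \sum_{t=d+1}^{T}\Bigl(x_t-\sum_{k=1}^{d}w_kx_{t-k}\Bigr)^2$$
which is equivalent to
$$\boldsymbol{w}:=\arg\min_{\boldsymbol{w}}\ \|\tilde{\boldsymbol{x}}-\boldsymbol{A}\boldsymbol{w}\|_2^2=\boldsymbol{A}^{\dagger}\tilde{\boldsymbol{x}}$$
where $\|\cdot\|_2$ denotes the $\ell_2$-norm. The symbol $\cdot^{\dagger}$ is the Moore–Penrose inverse of a matrix. In this $\ell_2$-norm form, the vector $\tilde{\boldsymbol{x}}$ consists of the last $T-d$ entries in the time series vector $\boldsymbol{x}$, i.e.,
$$\tilde{\boldsymbol{x}}=(x_{d+1},x_{d+2},\ldots,x_T)^\top\in\mathbb{R}^{T-d}$$
The matrix $\boldsymbol{A}$ is also composed of the entries in the time series vector $\boldsymbol{x}$, which is given by
$$\boldsymbol{A}=\begin{bmatrix} x_{d} & x_{d-1} & \cdots & x_{1} \\ x_{d+1} & x_{d} & \cdots & x_{2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{T-1} & x_{T-2} & \cdots & x_{T-d} \end{bmatrix}\in\mathbb{R}^{(T-d)\times d}$$
In essence, given the data pair $(\tilde{\boldsymbol{x}},\boldsymbol{A})$ constructed from the time series $\boldsymbol{x}$, the univariate autoregression can be easily converted into a linear regression formula. Thus, the closed-form solution is the least-squares solution.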
Consider one quick example:
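As a minimal sketch, assuming NumPy and a synthetic series, the following builds the target vector and the lagged data matrix from a time series and solves the least-squares problem with the Moore–Penrose inverse. The helper names `ar_design` and `ar_least_squares`, the coefficient values, and the noise level are illustrative assumptions, not taken from the papers.

```python
import numpy as np

def ar_design(x, d):
    """Return the target vector and lagged data matrix of an order-d autoregression."""
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    # Column k (k = 1, ..., d) stacks the lag-k values x_{t-k} for t = d+1, ..., T.
    A = np.column_stack([x[d - k : T - k] for k in range(1, d + 1)])
    x_tilde = x[d:]  # the last T - d entries of the series
    return x_tilde, A

def ar_least_squares(x, d):
    """Estimate the coefficient vector as pinv(A) @ x_tilde."""
    x_tilde, A = ar_design(x, d)
    return np.linalg.pinv(A) @ x_tilde

# Synthetic third-order series (assumed coefficients, chosen to be stationary).
rng = np.random.default_rng(0)
w_true = np.array([0.5, 0.0, 0.3])
T = 2000
x = np.zeros(T)
x[:3] = rng.standard_normal(3)
for t in range(3, T):
    lagged = x[t - 3 : t][::-1]  # (x_{t-1}, x_{t-2}, x_{t-3})
    x[t] = w_true @ lagged + 0.1 * rng.standard_normal()

w_hat = ar_least_squares(x, d=3)
print(np.round(w_hat, 2))
```

With enough data, the estimate should land close to the assumed coefficients. Note that the second coefficient is small but generally not exactly zero, which is one reason an explicit sparsity constraint is of interest.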
I-B. Motivation of Sparse Autoregression
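However, challenges arise if there is a sparsity constraint in the form of the $\ell_0$-norm, for instance,
$$\min_{\boldsymbol{w}}\ \|\tilde{\boldsymbol{x}}-\boldsymbol{A}\boldsymbol{w}\|_2^2\quad\text{s.t.}\quad\|\boldsymbol{w}\|_0\leq\tau$$
where the upper bound of the constraint is an integer $\tau\in\mathbb{Z}^{+}$, which is supposed to be no greater than the order $d$. In the constraint, $\|\boldsymbol{w}\|_0$ counts the number of nonzero entries in the vector $\boldsymbol{w}$, and $\tau$ is the sparsity level. Because this constraint is nonconvex, the least-squares closed form no longer applies.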
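To make the combinatorial nature of the constraint concrete, here is a minimal sketch that solves the sparsity-constrained least-squares problem by exhaustive enumeration of supports. This brute-force search is only an illustration for small orders, not the mixed-integer or semidefinite programming approaches named in the section headings below. The function name `sparse_ar_exhaustive` and the synthetic weekly-like series (lags 1 and 7) are assumptions made for the example.

```python
from itertools import combinations

import numpy as np

def sparse_ar_exhaustive(x, d, tau):
    """Minimize ||x_tilde - A w||_2^2 subject to ||w||_0 <= tau by trying every support."""
    if not 1 <= tau <= d:
        raise ValueError("tau must satisfy 1 <= tau <= d")
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    A = np.column_stack([x[d - k : T - k] for k in range(1, d + 1)])
    x_tilde = x[d:]
    best_w, best_loss = np.zeros(d), float(x_tilde @ x_tilde)
    # Supports of size exactly tau suffice: adding a column never increases the loss.
    for support in combinations(range(d), tau):
        cols = list(support)
        coef = np.linalg.lstsq(A[:, cols], x_tilde, rcond=None)[0]
        loss = float(np.sum((x_tilde - A[:, cols] @ coef) ** 2))
        if loss < best_loss:
            best_w = np.zeros(d)
            best_w[cols] = coef
            best_loss = loss
    return best_w, best_loss

# Synthetic series driven by lags 1 and 7 (assumed values for illustration).
rng = np.random.default_rng(1)
T = 3000
x = np.zeros(T)
x[:7] = rng.standard_normal(7)
for t in range(7, T):
    x[t] = 0.3 * x[t - 1] + 0.6 * x[t - 7] + rng.standard_normal()

w_hat, loss = sparse_ar_exhaustive(x, d=8, tau=2)
print(np.flatnonzero(w_hat) + 1)  # lags with nonzero coefficients
```

The number of candidate supports grows as $\binom{d}{\tau}$, so enumeration quickly becomes impractical as the order increases, which motivates the optimization formulations in the next part.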
II. Sparse Autoregression
II-A. Mixed-Integer Programming
II-B. Semidefinite Programming
III. Time-Varying Sparse Autoregression
The optimization problem is formulated as follows,
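Example 1. For any vectors $\boldsymbol{x},\boldsymbol{y}\in\mathbb{R}^{n}$, verify that
$$\langle\boldsymbol{x},\boldsymbol{y}\rangle=\operatorname{tr}(\boldsymbol{x}\boldsymbol{y}^\top).$$
According to the definition of inner product, we have $\langle\boldsymbol{x},\boldsymbol{y}\rangle=\sum_{i=1}^{n}x_iy_i$. In contrast, the outer product between $\boldsymbol{x}$ and $\boldsymbol{y}$ is given by
$$\boldsymbol{x}\boldsymbol{y}^\top=\begin{bmatrix} x_1y_1 & x_1y_2 & \cdots & x_1y_n \\ x_2y_1 & x_2y_2 & \cdots & x_2y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_ny_1 & x_ny_2 & \cdots & x_ny_n \end{bmatrix}\in\mathbb{R}^{n\times n}$$
Since the trace of a square matrix is the sum of its diagonal entries, we have
$$\operatorname{tr}(\boldsymbol{x}\boldsymbol{y}^\top)=\sum_{i=1}^{n}x_iy_i=\langle\boldsymbol{x},\boldsymbol{y}\rangle$$
as claimed.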
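The identity in Example 1, that the inner product of two vectors equals the trace of their outer product, can also be checked numerically. The sketch below assumes NumPy and uses random vectors of an arbitrary length.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

inner = x @ y                    # sum of x_i * y_i
outer = np.outer(x, y)           # 5-by-5 matrix with entries x_i * y_j
trace_of_outer = np.trace(outer) # sum of the diagonal entries x_i * y_i

print(np.isclose(inner, trace_of_outer))
```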
III-A. Ridesharing Data
III-B. Formulating Time-Varying Systems
III-C. Solving the Optimization Problem
IV. Periodicity of Hangzhou Metro Passenger Flow
IV-A. Data Description
IV-B. Periodicity Analysis
IV-C. Spatially-Varying Systems
(Posted by Xinyu Chen on February 15, 2025)