Time Series Convolution
A convolutional kernel approach for reinforcing the modeling of time series trends and interpreting temporal patterns, allowing one to leverage Fourier transforms and learn sparse representations. Interpretable machine learning models such as sparse regression unlock opportunities to better capture the long-term changes and temporal patterns of real-world time series.
(Updated on January 22, 2025)
In this post, we intend to explain the essential ideas of our research work:
- Xinyu Chen, Zhanhong Cheng, HanQin Cai, Nicolas Saunier, Lijun Sun (2024). Laplacian convolutional representation for traffic time series imputation. IEEE Transactions on Knowledge and Data Engineering. 36 (11): 6490-6502.
- Xinyu Chen, Xi-Le Zhao, Chun Cheng (2024). Forecasting urban traffic states with sparse data using Hankel temporal matrix factorization. INFORMS Journal on Computing.
- Xinyu Chen, HanQin Cai, Fuqiang Liu, Jinhua Zhao (2024). Correlating time series with interpretable convolutional kernels. arXiv:2409.01362.
Content:
In Part I of this series, we introduce the motivations of time series modeling with global and local trends. These time series trends are important for improving the performance of time series imputation. With an appropriate interpretable machine learning model, it is also possible to quantify the periodicity of time series. Following the basic motivations, we elaborate on several key concepts such as circular convolution, convolution matrix, circulant matrix, and discrete Fourier transform in Part II.
Part III and Part IV give the modeling ideas of circulant matrix nuclear norm minimization and Laplacian convolutional representation, addressing the critical challenges in time series imputation tasks. The optimization algorithms of both models make use of the fast Fourier transform, with log-linear time complexity. Part V presents an interpretable convolutional kernel method in which the sparsity of convolutional kernels is modeled by $\ell_0$-norm induced sparsity constraints.
I. Motivation
The development of machine learning models in the past decade is truly remarkable. Convolution is one of the most commonly used operations in applied mathematics and signal processing, and it has been widely applied to several machine learning problems. This post aims to revisit the essential ideas of circular convolution and to lay an insightful foundation for modeling time series data.
Nowadays, although we have quite a lot of machine learning algorithms on hand, it is still necessary to rethink the following perspectives in time series modeling:
- How to characterize global time series trends?
- How to characterize local time series trends?
- How to learn interpretable local and nonlocal patterns as convolutional kernels?
Time series often exhibit complicated trends when they are not stationary, for example:
(a) Global trends (e.g., long-term daily/weekly periodicity)
(b) Local trends (e.g., short-term time series trends)
Figure 1. Illustration of time series trends.
II. Preliminaries
In this study, we build the modeling concepts of Laplacian convolutional representation (LCR) upon several essential ideas from the field of signal processing, including circular convolution, discrete Fourier transform, and fast Fourier transform. In the following sections, we will discuss:
- What are circular convolution, convolution matrix, and circulant matrix?
- What is the convolution theorem?
- How can fast Fourier transform be used to compute the circular convolution?
II-A. Circular Convolution
Convolution is a powerful operation in many classical deep learning frameworks, such as convolutional neural networks (CNNs). In the context of discrete sequences (typically vectors), circular convolution refers to the convolution of two discrete sequences of data, and it plays an important role in maximizing the efficiency of certain common filtering operations (see circular convolution on Wikipedia).
By definition, for any vectors $\boldsymbol{x}=(x_1,x_2,\ldots,x_T)^\top\in\mathbb{R}^{T}$ and $\boldsymbol{y}=(y_1,y_2,\ldots,y_{\tau})^\top\in\mathbb{R}^{\tau}$ with $\tau\leq T$, the circular convolution (denoted by the symbol $\star$) of these two vectors is formulated as follows,

$$\boldsymbol{z}=\boldsymbol{x}\star\boldsymbol{y}\in\mathbb{R}^{T}.$$

Element-wise, we have

$$z_{t}=\sum_{k=1}^{\tau}x_{t-k+1}y_{k},\quad\forall t\in\{1,2,\ldots,T\},$$

where $z_{t}$ represents the $t$-th entry of $\boldsymbol{z}=(z_1,z_2,\ldots,z_T)^\top$. For a cyclical operation, it takes $x_{t-k+1}=x_{t-k+1+T}$ when $t+1\leq k$. While the definition of circular convolution might seem complicated for beginners, it becomes much clearer when you consider the concept of a convolution or circulant matrix, as demonstrated in the examples provided below.
As mentioned above, the vector $\boldsymbol{x}$ has a length of $T$, and the vector $\boldsymbol{y}$ has a length of $\tau$. According to the definition of circular convolution, the resulting vector $\boldsymbol{z}$ will also have a length of $T$, matching the length of the original vector $\boldsymbol{x}$.
Given any vectors $\boldsymbol{x}=(x_1,x_2,x_3,x_4)^\top$ and $\boldsymbol{y}=(y_1,y_2,y_3)^\top$, the circular convolution between them can be expressed as follows,

$$\boldsymbol{z}=\boldsymbol{x}\star\boldsymbol{y}=\begin{bmatrix} x_1y_1+x_4y_2+x_3y_3 \\ x_2y_1+x_1y_2+x_4y_3 \\ x_3y_1+x_2y_2+x_1y_3 \\ x_4y_1+x_3y_2+x_2y_3 \end{bmatrix},$$

where $T=4$ and $\tau=3$ according to the definition.
In this case, each entry of the resulting vector $\boldsymbol{z}$ is computed as the inner product between the vector $\boldsymbol{y}$ and a reversed and truncated version of the vector $\boldsymbol{x}$. Specifically, the entry $z_1$ is obtained by computing the inner product between $\boldsymbol{y}=(y_1,y_2,y_3)^\top$ and $(x_1,x_4,x_3)^\top$, i.e.,

$$z_1=x_1y_1+x_4y_2+x_3y_3.$$

For subsequent entries $z_t$ with $t=2,3,4$, the vector $\boldsymbol{x}$ is cyclically shifted in reverse and truncated (only preserving the first $\tau=3$ entries), and the inner product with $\boldsymbol{y}$ is calculated. Figure 2 illustrates the basic steps for computing each entry of the circular convolution.
Figure 2. Illustration of the circular convolution between $\boldsymbol{x}=(x_1,x_2,x_3,x_4)^\top$ and $\boldsymbol{y}=(y_1,y_2,y_3)^\top$. (a) Computing the entry $z_1$ involves $(x_1,x_4,x_3)^\top$ and $\boldsymbol{y}$. (b) Computing the entry $z_2$ involves $(x_2,x_1,x_4)^\top$ and $\boldsymbol{y}$. The figure is inspired by Prince (2023).
As can be seen, circular convolution between two vectors can essentially be viewed as a linear operation. This perspective allows one to reformulate the circular convolution as a linear transformation using a convolution matrix (or a circulant matrix when these two vectors have the same length).
Example 1. Given vectors $\boldsymbol{x}=(0,1,2,3,4)^\top$ and $\boldsymbol{y}=(2,-1,3)^\top$, the circular convolution $\boldsymbol{z}=\boldsymbol{x}\star\boldsymbol{y}$ has the following entries:

$$\begin{cases} z_1=x_1y_1+x_5y_2+x_4y_3=0\times 2+4\times(-1)+3\times 3=5, \\ z_2=x_2y_1+x_1y_2+x_5y_3=1\times 2+0\times(-1)+4\times 3=14, \\ z_3=x_3y_1+x_2y_2+x_1y_3=2\times 2+1\times(-1)+0\times 3=3, \\ z_4=x_4y_1+x_3y_2+x_2y_3=3\times 2+2\times(-1)+1\times 3=7, \\ z_5=x_5y_1+x_4y_2+x_3y_3=4\times 2+3\times(-1)+2\times 3=11, \end{cases}$$

where $T=5$ and $\tau=3$ according to the definition. As a result, we have $\boldsymbol{z}=(5,14,3,7,11)^\top$.
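To make the element-wise formula concrete, the following minimal sketch (the function name circ_conv is our own) computes the circular convolution directly from the definition and reproduces the entries above.

import numpy as np

def circ_conv(x, y):
    # Direct implementation of z_t = sum_k x_{t-k+1} y_k with cyclical indexing
    T, tau = x.shape[0], y.shape[0]
    z = np.zeros(T)
    for t in range(T):
        for k in range(tau):
            z[t] += x[(t - k) % T] * y[k]
    return z

x = np.array([0, 1, 2, 3, 4])
y = np.array([2, -1, 3])
print(circ_conv(x, y))  # expected: [ 5. 14.  3.  7. 11.]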
II-B. Convolution Matrix
Using the notations above, for any vectors $\boldsymbol{x}\in\mathbb{R}^{T}$ and $\boldsymbol{y}\in\mathbb{R}^{\tau}$ with $\tau\leq T$, the circular convolution can be expressed as a linear transformation:

$$\boldsymbol{z}=\boldsymbol{x}\star\boldsymbol{y}=\mathcal{C}_{\tau}(\boldsymbol{x})\,\boldsymbol{y},$$

where $\mathcal{C}_{\tau}:\mathbb{R}^{T}\to\mathbb{R}^{T\times\tau}$ denotes the convolution operator. The convolution matrix can be represented as follows,

$$\mathcal{C}_{\tau}(\boldsymbol{x})=\begin{bmatrix} x_1 & x_T & \cdots & x_{T-\tau+2} \\ x_2 & x_1 & \cdots & x_{T-\tau+3} \\ \vdots & \vdots & \ddots & \vdots \\ x_T & x_{T-1} & \cdots & x_{T-\tau+1} \end{bmatrix}\in\mathbb{R}^{T\times\tau}.$$
In the field of signal processing, this linear transformation is a fundamental property of circular convolution, highlighting its role in efficiently implementing filtering operations.
Figure 3. Illustration of circular convolution as the linear transformation with a convolution matrix.
Example 2. Given vectors $\boldsymbol{x}=(0,1,2,3,4)^\top$ and $\boldsymbol{y}=(2,-1,3)^\top$, the circular convolution can be expressed as:

$$\boldsymbol{z}=\boldsymbol{x}\star\boldsymbol{y}=\mathcal{C}_{3}(\boldsymbol{x})\,\boldsymbol{y},$$

where $\mathcal{C}_{3}(\boldsymbol{x})$ is the convolution matrix with $\tau=3$ columns. Specifically, the convolution matrix is structured as follows,

$$\mathcal{C}_{3}(\boldsymbol{x})=\begin{bmatrix} 0 & 4 & 3 \\ 1 & 0 & 4 \\ 2 & 1 & 0 \\ 3 & 2 & 1 \\ 4 & 3 & 2 \end{bmatrix}.$$

As a result, it gives

$$\boldsymbol{z}=\mathcal{C}_{3}(\boldsymbol{x})\,\boldsymbol{y}=(5,14,3,7,11)^\top.$$
This representation shows that circular convolution is equivalent to a matrix-vector multiplication, making it easier to understand, especially in signal processing applications.
In this post, we aim to make the concepts clear and accessible by incorporating programming code in Python, intuitive illustrations, and detailed explanations of the formulas. To demonstrate how circular convolution can be computed, we will use Python's numpy library. First, we construct the convolution matrix on the vector $\boldsymbol{x}$ and then perform the circular convolution as follows.
import numpy as np

def conv_mat(vec, tau):
    # Build the T-by-tau convolution matrix whose i-th column is the
    # input vector cyclically shifted down by i positions
    n = vec.shape[0]
    mat = np.zeros((n, tau))
    mat[:, 0] = vec
    for i in range(1, tau):
        mat[:, i] = np.append(vec[-i :], vec[: n - i], axis = 0)
    return mat

x = np.array([0, 1, 2, 3, 4])
y = np.array([2, -1, 3])
mat = conv_mat(x, y.shape[0])
print('Convolution matrix of x with 3 columns:')
print(mat)
print()
z = mat @ y
print('Circular convolution of x and y:')
print(z)
II-C. Circulant Matrix
Recall that the convolution matrix $\mathcal{C}_{\tau}(\boldsymbol{x})$ is specified with $\tau$ columns, corresponding to the length of the vector $\boldsymbol{y}$. In the case of $\tau=T$, i.e., when both vectors have the same length, the convolution matrix becomes a square matrix, known as a circulant matrix. In this study, we emphasize the importance of circulant matrices and their properties, such as their strong connection with circular convolution and discrete Fourier transform, even though we do not work directly with circulant matrices.
For any vectors $\boldsymbol{x},\boldsymbol{y}\in\mathbb{R}^{T}$, the circular convolution can be expressed as a linear transformation such that

$$\boldsymbol{z}=\boldsymbol{x}\star\boldsymbol{y}=\mathcal{C}(\boldsymbol{x})\,\boldsymbol{y},$$

where $\mathcal{C}:\mathbb{R}^{T}\to\mathbb{R}^{T\times T}$ denotes the circulant operator. The circulant matrix is defined as:

$$\mathcal{C}(\boldsymbol{x})=\begin{bmatrix} x_1 & x_T & x_{T-1} & \cdots & x_2 \\ x_2 & x_1 & x_T & \cdots & x_3 \\ x_3 & x_2 & x_1 & \cdots & x_4 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_T & x_{T-1} & x_{T-2} & \cdots & x_1 \end{bmatrix}\in\mathbb{R}^{T\times T},$$

which forms a square matrix. It always holds that $\|\mathcal{C}(\boldsymbol{x})\|_{F}=\sqrt{T}\,\|\boldsymbol{x}\|_{2}$, where $\|\cdot\|_{F}$ and $\|\cdot\|_{2}$ are the Frobenius norm of a matrix and the $\ell_2$-norm of a vector, respectively.
Figure 4. Illustration of circular convolution as the linear transformation with a circulant matrix.
Example 3. Given vectors $\boldsymbol{x}=(0,1,2,3,4)^\top$ and $\boldsymbol{y}=(2,-1,3,0,0)^\top$, the circular convolution $\boldsymbol{z}=\boldsymbol{x}\star\boldsymbol{y}$ is identical to

$$\boldsymbol{z}=\mathcal{C}(\boldsymbol{x})\,\boldsymbol{y},$$

where $\mathcal{C}(\boldsymbol{x})$ is the circulant matrix formed from $\boldsymbol{x}$, defined as:

$$\mathcal{C}(\boldsymbol{x})=\begin{bmatrix} 0 & 4 & 3 & 2 & 1 \\ 1 & 0 & 4 & 3 & 2 \\ 2 & 1 & 0 & 4 & 3 \\ 3 & 2 & 1 & 0 & 4 \\ 4 & 3 & 2 & 1 & 0 \end{bmatrix}.$$

Thus, the result can be written as follows,

$$\boldsymbol{z}=\mathcal{C}(\boldsymbol{x})\,\boldsymbol{y}=(5,14,3,7,11)^\top.$$

The example shows that the circular convolution of $\boldsymbol{x}=(0,1,2,3,4)^\top$ and $\boldsymbol{y}=(2,-1,3)^\top$ is equivalent to the circular convolution of $\boldsymbol{x}$ and $\boldsymbol{y}$ with its last two entries padded with zeros. Thus, to compute the circular convolution of $\boldsymbol{x}\in\mathbb{R}^{T}$ and $\boldsymbol{y}\in\mathbb{R}^{\tau}$ when $T$ is greater than $\tau$, one can simply append $T-\tau$ zeros to the end of $\boldsymbol{y}$ and perform the circular convolution.
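As a quick numerical check, the following sketch builds the circulant matrix with np.roll (our own construction, equivalent to the circ_mat routine given later) and verifies the zero-padding trick.

import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([2, -1, 3])
T, tau = x.shape[0], y.shape[0]
# Circulant matrix: the i-th column is x cyclically shifted down by i positions
C = np.stack([np.roll(x, i) for i in range(T)], axis = 1)
y_pad = np.append(y, np.zeros(T - tau))  # pad y with T - tau zeros
print(C @ y_pad)  # expected: [ 5. 14.  3.  7. 11.]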
II-D. Discrete Fourier Transform
Discrete Fourier transform (see Wikipedia) is a fundamental tool in mathematics and signal processing with widespread applications to machine learning. The discrete Fourier transform is the key discrete transform used for Fourier analysis, enabling the decomposition of a signal into its constituent frequencies. The fast Fourier transform is an efficient algorithm for computing the discrete Fourier transform (see the difference between discrete Fourier transform and fast Fourier transform), significantly reducing the time complexity from $\mathcal{O}(T^{2})$ to $\mathcal{O}(T\log T)$, where $T$ is the number of data points. This efficiency makes the fast Fourier transform essential for processing large problems.
A crucial concept in signal processing is the convolution theorem, which states that convolution in the time domain is multiplication in the frequency domain. This implies that the circular convolution can be efficiently computed using the fast Fourier transform. The convolution theorem for discrete Fourier transform is summarized as follows,

$$\boldsymbol{x}\star\boldsymbol{y}=\mathcal{F}^{-1}(\mathcal{F}(\boldsymbol{x})\circ\mathcal{F}(\boldsymbol{y})),$$

or

$$\mathcal{F}(\boldsymbol{x}\star\boldsymbol{y})=\mathcal{F}(\boldsymbol{x})\circ\mathcal{F}(\boldsymbol{y}),$$

where $\mathcal{F}(\cdot)$ and $\mathcal{F}^{-1}(\cdot)$ denote the discrete Fourier transform and the inverse discrete Fourier transform, respectively. The symbol $\circ$ represents the Hadamard product, i.e., element-wise multiplication.
In fact, this principle underlies many efficient algorithms in signal processing and data analysis, allowing complex operations to be performed efficiently in the frequency domain.
Example 4. Given vectors $\boldsymbol{x}=(0,1,2,3,4)^\top$ and $\boldsymbol{y}=(2,-1,3,0,0)^\top$, the circular convolution $\boldsymbol{z}=\boldsymbol{x}\star\boldsymbol{y}$ can be computed via the use of the fast Fourier transform in numpy as follows,
import numpy as np
x = np.array([0, 1, 2, 3, 4])
y = np.array([2, -1, 3, 0, 0])
fx = np.fft.fft(x)
fy = np.fft.fft(y)
z = np.fft.ifft(fx * fy).real
print('Fast Fourier transform of x:')
print(fx)
print('Fast Fourier transform of y:')
print(fy)
print('Circular convolution of x and y:')
print(z)
in which the outputs are
Fast Fourier transform of x:
[10. +0.j -2.5+3.4409548j -2.5+0.81229924j -2.5-0.81229924j
-2.5-3.4409548j ]
Fast Fourier transform of y:
[ 4. +0.j -0.73606798-0.81229924j 3.73606798+3.4409548j
3.73606798-3.4409548j -0.73606798+0.81229924j]
Circular convolution of x and y:
[ 5. 14. 3. 7. 11.]
II-E. Hankel Matrix Factorization & Discrete Fourier Transform
The Hankel matrix plays a fundamental role in numerous areas of applied mathematics and signal processing. By definition, a Hankel matrix is a square or rectangular matrix in which each ascending skew-diagonal (from left to right) has the same value. Given vector $\boldsymbol{x}=(x_1,x_2,\ldots,x_T)^\top\in\mathbb{R}^{T}$, the Hankel matrix can be constructed as follows,

$$\mathcal{H}(\boldsymbol{x})=\begin{bmatrix} x_1 & x_2 & \cdots & x_{n} \\ x_2 & x_3 & \cdots & x_{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m} & x_{m+1} & \cdots & x_{T} \end{bmatrix}\in\mathbb{R}^{m\times n},$$

where the Hankel matrix has $m$ rows and $n=T-m+1$ columns. This matrix is often used to represent signals or time series data, capturing their sequential dependencies and structure, see e.g., Chen et al. (2024) in our work.
On the Hankel matrix $\mathcal{H}(\boldsymbol{x})$, if it can be approximated by the multiplication of two matrices $\boldsymbol{W}\in\mathbb{R}^{m\times r}$ and $\boldsymbol{Q}\in\mathbb{R}^{n\times r}$ such that $\mathcal{H}(\boldsymbol{x})\approx\boldsymbol{W}\boldsymbol{Q}^\top$, then one can compute the inverse of Hankel matrix factorization as follows,

$$[\mathcal{H}^{\dagger}(\boldsymbol{W}\boldsymbol{Q}^\top)]_{t}=\frac{1}{c_{t}}\sum_{a+b=t+1}\boldsymbol{w}_{a}^\top\boldsymbol{q}_{b},\quad\forall t\in\{1,2,\ldots,T\},$$

where $\boldsymbol{w}_{a}$ and $\boldsymbol{q}_{b}$ are the $a$-th and $b$-th rows of $\boldsymbol{W}$ and $\boldsymbol{Q}$, respectively. Herein, $\mathcal{H}^{\dagger}(\cdot)$ denotes the inverse operator of the Hankel matrix. For any matrix $\boldsymbol{Z}\in\mathbb{R}^{m\times n}$, the inverse operator is given by

$$[\mathcal{H}^{\dagger}(\boldsymbol{Z})]_{t}=\frac{1}{c_{t}}\sum_{a+b=t+1}z_{a,b},\quad\forall t\in\{1,2,\ldots,T\},$$

where $T=m+n-1$. $c_{t}$ is the number of entries on the $t$-th antidiagonal of the matrix, satisfying

$$c_{t}=\begin{cases} t, & t\leq\min\{m,n\}, \\ \min\{m,n\}, & \min\{m,n\}<t\leq\max\{m,n\}, \\ m+n-t, & \text{otherwise}. \end{cases}$$
Following the aforementioned formula, it is easy to connect the Hankel matrix factorization with circular convolution and discrete Fourier transform (Cai et al., 2019; Cai et al., 2022) such that

$$[\mathcal{H}^{\dagger}(\boldsymbol{W}\boldsymbol{Q}^\top)]_{t}=\frac{1}{c_{t}}\sum_{s=1}^{r}(\tilde{\boldsymbol{w}}_{s}\star\tilde{\boldsymbol{q}}_{s})_{t},\quad\forall t\in\{1,2,\ldots,T\},$$

where we define two vectors:

$$\tilde{\boldsymbol{w}}_{s}=(\boldsymbol{w}_{s}^\top,\underbrace{0,\ldots,0}_{n-1})^\top,\quad\tilde{\boldsymbol{q}}_{s}=(\boldsymbol{q}_{s}^\top,\underbrace{0,\ldots,0}_{m-1})^\top,$$

of length $T=m+n-1$. Notably, they are different from the vectors $\boldsymbol{w}_{s}\in\mathbb{R}^{m}$ and $\boldsymbol{q}_{s}\in\mathbb{R}^{n}$, i.e., the $s$-th columns of $\boldsymbol{W}$ and $\boldsymbol{Q}$: each column is padded with zeros until its length reaches $T$, so that the circular convolution of the padded vectors reproduces the antidiagonal sums of the outer product $\boldsymbol{w}_{s}\boldsymbol{q}_{s}^\top$. This principle is well-suited to the construction of the vectors $\tilde{\boldsymbol{w}}_{s}$ and $\tilde{\boldsymbol{q}}_{s}$.
Example 5. Given vector $\boldsymbol{x}=(0,1,2,3,4)^\top$, let the number of rows of the Hankel matrix be $m=2$, the Hankel matrix is given by

$$\mathcal{H}(\boldsymbol{x})=\begin{bmatrix} 0 & 1 & 2 & 3 \\ 1 & 2 & 3 & 4 \end{bmatrix}.$$

If it takes a rank-one approximation such that $\mathcal{H}(\boldsymbol{x})\approx\boldsymbol{w}\boldsymbol{q}^\top$ with $\boldsymbol{w}\in\mathbb{R}^{2}$ and $\boldsymbol{q}\in\mathbb{R}^{4}$, then the inverse of Hankel matrix factorization can be written as follows,

$$[\mathcal{H}^{\dagger}(\boldsymbol{w}\boldsymbol{q}^\top)]_{t}=\frac{1}{c_{t}}(\tilde{\boldsymbol{w}}\star\tilde{\boldsymbol{q}})_{t},\quad t=1,2,\ldots,5,$$

with $\tilde{\boldsymbol{w}}=(w_1,w_2,0,0,0)^\top$ and $\tilde{\boldsymbol{q}}=(q_1,q_2,q_3,q_4,0)^\top$, which can be converted into circular convolution. By doing so, the computing process can be accelerated with fast Fourier transform.
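To see this numerically, below is a minimal sketch (the construction is ours, not code from the paper) that forms a rank-one approximation of the Hankel matrix above and computes the inverse via FFT-based circular convolution.

import numpy as np

x = np.array([0, 1, 2, 3, 4])
m, n = 2, 4
H = np.array([x[i : i + n] for i in range(m)], dtype = float)

# Rank-one approximation via the top singular pair
u, s, vt = np.linalg.svd(H, full_matrices = False)
w = u[:, 0] * s[0]               # w in R^2
q = vt[0, :]                     # q in R^4

# Zero-pad both vectors to length T = m + n - 1 = 5
w_tilde = np.append(w, np.zeros(n - 1))
q_tilde = np.append(q, np.zeros(m - 1))

# Circular convolution via FFT gives the antidiagonal sums of w q^T
conv = np.fft.ifft(np.fft.fft(w_tilde) * np.fft.fft(q_tilde)).real
c = np.array([1, 2, 2, 2, 1])    # number of entries on each antidiagonal
print('Rank-one inverse of Hankel factorization:')
print(conv / c)                  # approximately recovers x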
Example 6. For any vector $\boldsymbol{x}\in\mathbb{R}^{T}$, it always holds that

$$\|\mathcal{H}(\boldsymbol{x})\|_{F}=\|\boldsymbol{x}\|_{2,\boldsymbol{c}},$$

in which the weighted $\ell_2$-norm operator $\|\cdot\|_{2,\boldsymbol{c}}$ is defined as follows,

$$\|\boldsymbol{x}\|_{2,\boldsymbol{c}}=\sqrt{\sum_{t=1}^{T}c_{t}x_{t}^{2}}.$$

Verify this property on the vector $\boldsymbol{x}=(0,1,2,3,4)^\top$ if the number of rows of the Hankel matrix is set as $m=2$.
According to the definition, we have $\boldsymbol{c}=(1,2,2,2,1)^\top$. The Hankel matrix is given by

$$\mathcal{H}(\boldsymbol{x})=\begin{bmatrix} 0 & 1 & 2 & 3 \\ 1 & 2 & 3 & 4 \end{bmatrix}.$$

Thus, the Frobenius norm of this Hankel matrix is equivalent to the weighted $\ell_2$-norm of the vector $\boldsymbol{x}$:

$$\|\mathcal{H}(\boldsymbol{x})\|_{F}=\sqrt{0^2+1^2+2^2+3^2+1^2+2^2+3^2+4^2}=\sqrt{44}=\|\boldsymbol{x}\|_{2,\boldsymbol{c}}.$$
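A short numerical check of this property (a sketch) is given below.

import numpy as np

x = np.array([0, 1, 2, 3, 4])
m, n = 2, 4
H = np.array([x[i : i + n] for i in range(m)], dtype = float)
c = np.array([1, 2, 2, 2, 1])
print('Frobenius norm of H(x):', np.linalg.norm(H, 'fro'))
print('Weighted l2-norm of x: ', np.sqrt(np.sum(c * x ** 2)))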
III. Circulant Matrix Nuclear Norm Minimization
Circulant matrices are fundamental in many computational and theoretical aspects of signal processing and machine learning, providing an efficient framework for implementing various algorithms such as circulant matrix nuclear norm minimization. By definition, a circulant matrix is a special square matrix in which each row shifts the previous row to the right by one position, with the last entry wrapping around to the first position. As we already discussed the circulant matrix above, we will present the circulant matrix nuclear norm, its minimization problem, and applications.
III-A. Definition
Nuclear norm is a key concept in matrix computations and convex optimization, frequently applied in low-rank matrix approximation and completion problems. For any matrix $\boldsymbol{X}\in\mathbb{R}^{m\times n}$, the nuclear norm is defined as the sum of singular values:

$$\|\boldsymbol{X}\|_{*}=\sum_{t=1}^{r}s_{t},$$

where $\|\cdot\|_{*}$ denotes the nuclear norm. As illustrated in Figure 5, the singular values are $\{s_{t}\}_{t=1}^{r}$ with $r=\min\{m,n\}$.
Figure 5. Singular value decomposition of matrix $\boldsymbol{X}\in\mathbb{R}^{m\times n}$. In the decomposed matrices, the unitary matrix $\boldsymbol{U}$ (or $\boldsymbol{V}$) consists of orthogonal left (or right) singular vectors, while the diagonal entries of $\boldsymbol{\Sigma}$ are singular values such that $s_{1}\geq s_{2}\geq\cdots\geq s_{r}\geq 0$. Note that $r=\min\{m,n\}$ for notational convenience.
Example 7. Given vector $\boldsymbol{x}=(0,1,2,3,4)^\top$, the circulant matrix is

$$\mathcal{C}(\boldsymbol{x})=\begin{bmatrix} 0 & 4 & 3 & 2 & 1 \\ 1 & 0 & 4 & 3 & 2 \\ 2 & 1 & 0 & 4 & 3 \\ 3 & 2 & 1 & 0 & 4 \\ 4 & 3 & 2 & 1 & 0 \end{bmatrix}.$$

Thus, the singular values are

$$s_{1}=10,\quad s_{2}=s_{3}\approx 4.2533,\quad s_{4}=s_{5}\approx 2.6287.$$

As a result, we have the nuclear norm as follows,

$$\|\mathcal{C}(\boldsymbol{x})\|_{*}=\sum_{t=1}^{5}s_{t}\approx 23.7638.$$
Please reproduce the results by using the following numpy implementation.
import numpy as np

def circ_mat(vec):
    # Build the T-by-T circulant matrix whose i-th column is the
    # input vector cyclically shifted down by i positions
    n = vec.shape[0]
    mat = np.zeros((n, n))
    mat[:, 0] = vec
    for i in range(1, n):
        mat[:, i] = np.append(vec[-i :], vec[: n - i], axis = 0)
    return mat

x = np.array([0, 1, 2, 3, 4])
mat = circ_mat(x)
w, s, q = np.linalg.svd(mat, full_matrices = False)
print('Singular values of C(x):')
print(s)
III-B. Property
One of the most intriguing properties of circulant matrices is that they are diagonalizable by the discrete Fourier transform matrix. The eigenvalue decomposition of the circulant matrix (constructed from any vector $\boldsymbol{x}\in\mathbb{R}^{T}$) is given by

$$\mathcal{C}(\boldsymbol{x})=\boldsymbol{F}^{H}\operatorname{diag}(\mathcal{F}(\boldsymbol{x}))\boldsymbol{F},$$

where $\boldsymbol{F}\in\mathbb{C}^{T\times T}$ is the unitary discrete Fourier transform matrix, $\boldsymbol{F}^{H}$ is the Hermitian transpose of $\boldsymbol{F}$, and $\operatorname{diag}(\mathcal{F}(\boldsymbol{x}))$ is a diagonal matrix containing the eigenvalues of $\mathcal{C}(\boldsymbol{x})$, namely, the discrete Fourier transform of $\boldsymbol{x}$. Due to this property, the nuclear norm of the circulant matrix can be formulated as the $\ell_1$-norm of the discrete Fourier transform of the vector $\boldsymbol{x}$:

$$\|\mathcal{C}(\boldsymbol{x})\|_{*}=\|\mathcal{F}(\boldsymbol{x})\|_{1}.$$

This relationship highlights the strong connection between circulant matrices and Fourier analysis, enabling efficient computation and analysis in various applications, e.g., circulant matrix nuclear norm minimization (Chen et al., 2024).
Example 8. Given vector $\boldsymbol{x}=(0,1,2,3,4)^\top$, the circulant matrix is

$$\mathcal{C}(\boldsymbol{x})=\begin{bmatrix} 0 & 4 & 3 & 2 & 1 \\ 1 & 0 & 4 & 3 & 2 \\ 2 & 1 & 0 & 4 & 3 \\ 3 & 2 & 1 & 0 & 4 \\ 4 & 3 & 2 & 1 & 0 \end{bmatrix}.$$

The eigenvalues of $\mathcal{C}(\boldsymbol{x})$ and the fast Fourier transform of $\boldsymbol{x}$ can be computed using numpy as follows.
import numpy as np

x = np.array([0, 1, 2, 3, 4])
mat = circ_mat(x)  # circ_mat(.) is defined in Example 7
eigval, eigvec = np.linalg.eig(mat)
print('Eigenvalues of C(x):')
print(eigval)
print('Fast Fourier transform of x:')
print(np.fft.fft(x))
in which the outputs are
Eigenvalues of C(x):
[10. +0.j -2.5+3.4409548j -2.5-3.4409548j -2.5+0.81229924j
-2.5-0.81229924j]
Fast Fourier transform of x:
[10. +0.j -2.5+3.4409548j -2.5+0.81229924j -2.5-0.81229924j
-2.5-3.4409548j ]
In this case, the $\ell_1$-norm of the complex-valued $\mathcal{F}(\boldsymbol{x})$ (the imaginary unit is defined as $i=\sqrt{-1}$) is given by

$$\|\mathcal{F}(\boldsymbol{x})\|_{1}=|10|+2\times|-2.5+3.4409548i|+2\times|-2.5+0.81229924i|\approx 23.7638,$$

which is consistent with the nuclear norm $\|\mathcal{C}(\boldsymbol{x})\|_{*}$ in Example 7.
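The identity can also be checked numerically in a few lines (a sketch reusing the circ_mat routine defined in Example 7).

import numpy as np

x = np.array([0, 1, 2, 3, 4])
mat = circ_mat(x)  # circ_mat(.) is defined in Example 7
print('Nuclear norm of C(x):', np.linalg.norm(mat, 'nuc'))
print('l1-norm of F(x):     ', np.sum(np.abs(np.fft.fft(x))))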
III-C. Optimization
For any partially observed time series in the form of a vector $\boldsymbol{y}\in\mathbb{R}^{T}$ with the observed index set $\Omega$, solving the circulant matrix nuclear norm minimization allows one to reconstruct missing values in time series. The problem is formulated as follows,

$$\begin{aligned} \min_{\boldsymbol{x}}\ &\|\mathcal{C}(\boldsymbol{x})\|_{*} \\ \text{s.t.}\ &\|\mathcal{P}_{\Omega}(\boldsymbol{x}-\boldsymbol{y})\|_{2}\leq\epsilon, \end{aligned}$$

where $\epsilon$ in the constraint represents the tolerance of errors between the reconstructed time series $\boldsymbol{x}$ and the partially observed time series $\boldsymbol{y}$.
Example 9. For any vector $\boldsymbol{y}\in\mathbb{R}^{T}$ and observed index set $\Omega$, the orthogonal projection supported on $\Omega$ is

$$[\mathcal{P}_{\Omega}(\boldsymbol{y})]_{t}=\begin{cases} y_{t}, & \text{if}\ t\in\Omega, \\ 0, & \text{otherwise}. \end{cases}$$

On the complement of $\Omega$, we have

$$[\mathcal{P}_{\Omega}^{\perp}(\boldsymbol{y})]_{t}=\begin{cases} 0, & \text{if}\ t\in\Omega, \\ y_{t}, & \text{otherwise}. \end{cases}$$
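As a tiny sketch, both projections can be implemented with an index mask (the vector values below are hypothetical, for illustration only).

import numpy as np

y = np.array([1.2, 3.4, 0.8, 2.5, 4.1])  # hypothetical observations
omega = np.array([0, 2, 4])              # observed indices (0-based)
p_obs = np.zeros_like(y)
p_obs[omega] = y[omega]                  # P_Omega(y)
p_unobs = y - p_obs                      # complement projection
print(p_obs)
print(p_unobs)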
In this case, one can rewrite the constraint as a penalty term (weighted by $\gamma/2$) in the objective function:

$$\min_{\boldsymbol{x}}\ \|\mathcal{C}(\boldsymbol{x})\|_{*}+\frac{\gamma}{2}\|\mathcal{P}_{\Omega}(\boldsymbol{x}-\boldsymbol{y})\|_{2}^{2}.$$

Since the variable $\boldsymbol{x}$ is associated with both a circulant matrix nuclear norm and a penalty term, the first impulse is to use variable separation to convert the problem into the following one:

$$\begin{aligned} \min_{\boldsymbol{x},\,\boldsymbol{z}}\ &\|\mathcal{C}(\boldsymbol{x})\|_{*}+\frac{\gamma}{2}\|\mathcal{P}_{\Omega}(\boldsymbol{z}-\boldsymbol{y})\|_{2}^{2} \\ \text{s.t.}\ &\boldsymbol{x}=\boldsymbol{z}. \end{aligned}$$
III-D. Solution Algorithm
The augmented Lagrangian function is

$$\mathcal{L}(\boldsymbol{x},\boldsymbol{z},\boldsymbol{w})=\|\mathcal{C}(\boldsymbol{x})\|_{*}+\frac{\gamma}{2}\|\mathcal{P}_{\Omega}(\boldsymbol{z}-\boldsymbol{y})\|_{2}^{2}+\frac{\lambda}{2}\|\boldsymbol{x}-\boldsymbol{z}\|_{2}^{2}+\langle\boldsymbol{w},\boldsymbol{x}-\boldsymbol{z}\rangle,$$

where the dual variable $\boldsymbol{w}\in\mathbb{R}^{T}$ is an estimate of the Lagrange multiplier, and $\lambda$ is a penalty parameter that controls the convergence rate.
Thus, the variables $\boldsymbol{x}$, $\boldsymbol{z}$, and the dual variable $\boldsymbol{w}$ can be updated iteratively as follows,

$$\begin{cases} \boldsymbol{x}:=\arg\min_{\boldsymbol{x}}\ \mathcal{L}(\boldsymbol{x},\boldsymbol{z},\boldsymbol{w}), \\ \boldsymbol{z}:=\arg\min_{\boldsymbol{z}}\ \mathcal{L}(\boldsymbol{x},\boldsymbol{z},\boldsymbol{w}), \\ \boldsymbol{w}:=\boldsymbol{w}+\lambda(\boldsymbol{x}-\boldsymbol{z}), \end{cases}$$

where the dual variable $\boldsymbol{w}$ takes a standard update in the ADMM. In what follows, we give details about how to get the closed-form solutions to the variables $\boldsymbol{x}$ and $\boldsymbol{z}$.
III-E. $\boldsymbol{x}$-Subproblem
Solving for the variable $\boldsymbol{x}$ involves the discrete Fourier transform, the convolution theorem, and $\ell_1$-norm minimization in complex space. The optimization problem with respect to the variable $\boldsymbol{x}$ is given by

$$\boldsymbol{x}:=\arg\min_{\boldsymbol{x}}\ \|\mathcal{C}(\boldsymbol{x})\|_{*}+\frac{\lambda}{2}\Bigl\|\boldsymbol{x}-\boldsymbol{z}+\frac{\boldsymbol{w}}{\lambda}\Bigr\|_{2}^{2}.$$

Using the discrete Fourier transform, the optimization problem can be converted into the following one in complex space (Chen et al., 2024):

$$\hat{\boldsymbol{x}}:=\arg\min_{\hat{\boldsymbol{x}}}\ \|\hat{\boldsymbol{x}}\|_{1}+\frac{\lambda}{2T}\|\hat{\boldsymbol{x}}-\hat{\boldsymbol{h}}\|_{2}^{2},$$

where the complex-valued variables $\hat{\boldsymbol{x}}=\mathcal{F}(\boldsymbol{x})$ and $\hat{\boldsymbol{h}}=\mathcal{F}(\boldsymbol{z}-\boldsymbol{w}/\lambda)$ refer to $\boldsymbol{x}$ and $\boldsymbol{z}-\boldsymbol{w}/\lambda$ in the frequency domain. The closed-form solution to the complex-valued variable $\hat{\boldsymbol{x}}$ is

$$\hat{x}_{t}:=\frac{\hat{h}_{t}}{|\hat{h}_{t}|}\cdot\max\{0,|\hat{h}_{t}|-T/\lambda\},\quad t=1,2,\ldots,T,$$

with $\hat{h}_{t}$ being the $t$-th entry of $\hat{\boldsymbol{h}}$. For reference, this closed-form solution can be found in Lemma 3.3 in Yang et al. (2009), see Eq. (3.18) and (3.19) for the discussion of real-valued variables. In the sparsity-induced norm optimization of machine learning, this closed-form solution is also called the proximal operator or shrinkage operator.
import numpy as np

def update_x(z, w, lmbda):
    # Closed-form solution to the x-subproblem: soft-thresholding (shrinkage)
    # of the FFT of z - w / lmbda with threshold T / lmbda
    T = z.shape[0]
    h = np.fft.fft(z - w / lmbda)
    temp = 1 - T / (lmbda * np.abs(h))
    temp[temp <= 0] = 0
    return np.fft.ifft(h * temp).real
Shrinkage Operator. For any vector $\boldsymbol{h}\in\mathbb{R}^{T}$, the closed-form solution to the optimization problem

$$\min_{\boldsymbol{x}}\ \|\boldsymbol{x}\|_{1}+\frac{\eta}{2}\|\boldsymbol{x}-\boldsymbol{h}\|_{2}^{2}$$

can be expressed as

$$\boldsymbol{x}:=\operatorname{sign}(\boldsymbol{h})\circ\max\{0,|\boldsymbol{h}|-1/\eta\},$$

or

$$x_{t}:=\frac{h_{t}}{|h_{t}|}\cdot\max\{0,|h_{t}|-1/\eta\},\quad t=1,2,\ldots,T.$$

Figure 6. Illustration of the shrinkage operator for solving the $\ell_1$-norm minimization problem.
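Below is a minimal numpy sketch of the shrinkage operator (the function name shrinkage and the small constant guarding the division by zero are our own choices); it works for real or complex inputs.

import numpy as np

def shrinkage(h, eta):
    # Soft-thresholding: x = h / |h| * max(0, |h| - 1 / eta)
    mag = np.abs(h)
    scale = np.maximum(0, 1 - 1 / (eta * np.maximum(mag, 1e-12)))
    return h * scale

h = np.fft.fft(np.array([0, 1, 2, 3, 4]))
print(shrinkage(h, 0.5))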
III-F. $\boldsymbol{z}$-Subproblem
In terms of the variable $\boldsymbol{z}$, the partial derivative of the augmented Lagrangian function with respect to the observed entries of $\boldsymbol{z}$ is given by

$$\frac{\partial\mathcal{L}}{\partial\mathcal{P}_{\Omega}(\boldsymbol{z})}=\gamma\,\mathcal{P}_{\Omega}(\boldsymbol{z}-\boldsymbol{y})+\lambda\,\mathcal{P}_{\Omega}(\boldsymbol{z}-\boldsymbol{x})-\mathcal{P}_{\Omega}(\boldsymbol{w}),$$

while

$$\frac{\partial\mathcal{L}}{\partial\mathcal{P}_{\Omega}^{\perp}(\boldsymbol{z})}=\lambda\,\mathcal{P}_{\Omega}^{\perp}(\boldsymbol{z}-\boldsymbol{x})-\mathcal{P}_{\Omega}^{\perp}(\boldsymbol{w}).$$

As a result, letting the partial derivative of the augmented Lagrangian function with respect to $\boldsymbol{z}$ be a zero vector, the least squares solution is given by

$$\boldsymbol{z}:=\frac{1}{\gamma+\lambda}\,\mathcal{P}_{\Omega}(\gamma\boldsymbol{y}+\lambda\boldsymbol{x}+\boldsymbol{w})+\frac{1}{\lambda}\,\mathcal{P}_{\Omega}^{\perp}(\lambda\boldsymbol{x}+\boldsymbol{w}),$$

which combines the variables $\boldsymbol{x}$, $\boldsymbol{y}$, and $\boldsymbol{w}$ on the observed index set $\Omega$ and its complement.
import numpy as np

def update_z(y_train, pos_train, x, w, lmbda, gamma):
    # Closed-form solution to the z-subproblem: combine x + w / lmbda with
    # the observed entries y_train on the index set Omega (pos_train)
    z = x + w / lmbda
    z[pos_train] = (gamma * y_train + lmbda * z[pos_train]) / (gamma + lmbda)
    return z
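For completeness, here is a minimal sketch of the full ADMM loop assembled from update_x and update_z above; the function name circ_nnm, the default parameter values, and the convention that missing entries are encoded as zeros are our own assumptions rather than settings from the paper.

import numpy as np

def circ_nnm(y, maxiter = 100, lmbda = 1e0, gamma = 1e0):
    # ADMM for circulant matrix nuclear norm minimization (a sketch);
    # missing entries of y are assumed to be encoded as zeros
    T = y.shape[0]
    pos_train = np.where(y != 0)
    y_train = y[pos_train]
    z = y.copy()
    w = np.zeros(T)
    for it in range(maxiter):
        x = update_x(z, w, lmbda)
        z = update_z(y_train, pos_train, x, w, lmbda, gamma)
        w = w + lmbda * (x - z)
    return x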
III-G. Time Series Imputation
As shown in Figure 7, we randomly remove 95% of the observations as missing values, so that only 14 volume observations (i.e., 14 blue dots) remain for the reconstruction. The circulant matrix nuclear norm minimization can capture the global trends from partial observations.
Figure 7. Univariate time series imputation on the freeway traffic volume time series. The blue and red curves correspond to the ground-truth time series and the reconstructed time series achieved by the circulant matrix nuclear norm minimization, respectively.
Please reproduce the experiments by following the Jupyter Notebook, which is available at the LCR repository on GitHub. For the supplementary material, please check out Appendix I(A).
IV. Laplacian Convolutional Representation
IV-A. Representing Circulant Graphs by Laplacian Kernels, Instead of Laplacian Matrices?
The Laplacian convolutional representation (LCR) model proposed by Chen et al. (2024) integrates local trends of time series into the global trend modeling via the use of circulant matrix nuclear norm minimization. As shown in Figure 1(b), the transition of time series data points can be naively modeled by smoothing regularization. We start by introducing the Laplacian matrix to represent the prescribed relationship among time series data points.
Figure 8. Undirected and circulant graphs on the data points with certain degrees. The degrees of the left and right graphs are 2 and 4, respectively.
By definition, the Laplacian matrix of a circulant graph (see e.g., Figure 8) is a circulant matrix (refer to the formal definition of the circulant matrix in Section II-C). In Figure 8, the Laplacian matrix corresponding to the left graph is expressed as

$$\boldsymbol{L}=\begin{bmatrix} 2 & -1 & 0 & 0 & -1 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & -1 & 2 & -1 \\ -1 & 0 & 0 & -1 & 2 \end{bmatrix},$$

while the Laplacian matrix for the right one is

$$\boldsymbol{L}=\begin{bmatrix} 4 & -1 & -1 & -1 & -1 \\ -1 & 4 & -1 & -1 & -1 \\ -1 & -1 & 4 & -1 & -1 \\ -1 & -1 & -1 & 4 & -1 \\ -1 & -1 & -1 & -1 & 4 \end{bmatrix}.$$

In both Laplacian matrices, the diagonal entries are the degrees of the circulant graphs. The first column encodes the structural information of these graphs, capturing their underlying circulant structure. In this work, we define the first column of the Laplacian matrices as the Laplacian kernels (Chen et al., 2024):

$$\boldsymbol{\ell}=(2,-1,0,0,-1)^\top,$$

and

$$\boldsymbol{\ell}=(4,-1,-1,-1,-1)^\top,$$

respectively.
Laplacian Kernel. Given any time series $\boldsymbol{x}\in\mathbb{R}^{T}$, suppose $\tau\in\mathbb{Z}^{+}$ is the kernel size of an undirected and circulant graph, then the Laplacian kernel is defined as

$$\boldsymbol{\ell}\triangleq(2\tau,\underbrace{-1,\ldots,-1}_{\tau},0,\ldots,0,\underbrace{-1,\ldots,-1}_{\tau})^\top\in\mathbb{R}^{T},$$

which is the first column of the Laplacian matrix $\boldsymbol{L}=\boldsymbol{D}-\boldsymbol{A}$, where the degree matrix $\boldsymbol{D}$ is diagonalized with entries $2\tau$ and $\boldsymbol{A}$ is the adjacency matrix.
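The kernel can also be generated programmatically; the small sketch below (the helper name laplacian_kernel is our own) reproduces the two kernels above for $T=5$.

import numpy as np

def laplacian_kernel(T, tau):
    # First column of the Laplacian matrix of a circulant graph
    ell = np.zeros(T)
    ell[0] = 2 * tau              # degree of each node
    ell[1 : tau + 1] = -1         # tau neighbors on one side
    ell[T - tau :] = -1           # tau neighbors on the other side
    return ell

print(laplacian_kernel(5, 1))     # -> [ 2. -1.  0.  0. -1.]
print(laplacian_kernel(5, 2))     # -> [ 4. -1. -1. -1. -1.]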
IV-B. Reformulating Temporal Regularization with Circular Convolution
In machine learning, one can write the temporal regularization on the time series $\boldsymbol{x}\in\mathbb{R}^{T}$, e.g.,

$$\mathcal{R}(\boldsymbol{x})=\frac{1}{2}\|\boldsymbol{L}\boldsymbol{x}\|_{2}^{2}.$$

Since the Laplacian matrix $\boldsymbol{L}$ is a circulant matrix, the matrix-vector multiplication can be reformulated as a circular convolution between the Laplacian kernel and the vector $\boldsymbol{x}$, i.e., $\boldsymbol{L}\boldsymbol{x}=\boldsymbol{\ell}\star\boldsymbol{x}$. For instance, given a Laplacian kernel $\boldsymbol{\ell}=(2,-1,0,0,-1)^\top$, we have

$$\boldsymbol{\ell}\star\boldsymbol{x}=\begin{bmatrix} 2x_{1}-x_{2}-x_{5} \\ 2x_{2}-x_{1}-x_{3} \\ 2x_{3}-x_{2}-x_{4} \\ 2x_{4}-x_{3}-x_{5} \\ 2x_{5}-x_{4}-x_{1} \end{bmatrix}.$$

As can be seen, this Laplacian kernel can build local correlations for the vector $\boldsymbol{x}$. Thus, the purpose of introducing Laplacian kernels on time series is local trend modeling with temporal regularization such that

$$\mathcal{R}(\boldsymbol{x})=\frac{1}{2}\|\boldsymbol{\ell}\star\boldsymbol{x}\|_{2}^{2}.$$
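As a small sketch (with our own np.roll construction of the circulant Laplacian matrix), one can verify that the matrix-vector product $\boldsymbol{L}\boldsymbol{x}$ coincides with the circular convolution $\boldsymbol{\ell}\star\boldsymbol{x}$:

import numpy as np

ell = np.array([2, -1, 0, 0, -1])  # Laplacian kernel
x = np.array([0, 1, 2, 3, 4])
T = x.shape[0]
# Circulant Laplacian matrix: the i-th column is ell cyclically shifted by i
L = np.stack([np.roll(ell, i) for i in range(T)], axis = 1)
print('L @ x:         ', L @ x)
print('ell * x (FFT): ', np.fft.ifft(np.fft.fft(ell) * np.fft.fft(x)).real)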
As a matter of fact, we have several motivations and reasons for reformulating temporal regularization with circular convolution; among them, some important properties inspired us. In particular, one of the most useful properties of circular convolution is its relationship with the discrete Fourier transform, i.e.,

$$\mathcal{R}(\boldsymbol{x})=\frac{1}{2}\|\boldsymbol{\ell}\star\boldsymbol{x}\|_{2}^{2}=\frac{1}{2T}\|\mathcal{F}(\boldsymbol{\ell})\circ\mathcal{F}(\boldsymbol{x})\|_{2}^{2},$$

where $\mathcal{F}(\cdot)$ denotes the discrete Fourier transform. The symbol $\circ$ is the Hadamard product or element-wise product.
Example 10. Given vectors $\boldsymbol{x}=(0,1,2,3,4)^\top$ and $\boldsymbol{\ell}=(2,-1,0,0,-1)^\top$, the circular convolution is

$$\boldsymbol{\ell}\star\boldsymbol{x}=(-5,0,0,0,5)^\top,$$

and the regularization is

$$\mathcal{R}(\boldsymbol{x})=\frac{1}{2}\|\boldsymbol{\ell}\star\boldsymbol{x}\|_{2}^{2}=25.$$

How to compute the regularization with the fast Fourier transform? 1) Compute the fast Fourier transform on $\boldsymbol{\ell}$ and $\boldsymbol{x}$ as $\mathcal{F}(\boldsymbol{\ell})$ and $\mathcal{F}(\boldsymbol{x})$, respectively; 2) Compute the regularization as $\mathcal{R}(\boldsymbol{x})=\frac{1}{2T}\|\mathcal{F}(\boldsymbol{\ell})\circ\mathcal{F}(\boldsymbol{x})\|_{2}^{2}$.
import numpy as np

x = np.array([0, 1, 2, 3, 4])
ell = np.array([2, -1, 0, 0, -1])
f_x = np.fft.fft(x)
f_ell = np.fft.fft(ell)
print('Regularization R(x):')
print(np.linalg.norm(f_ell * f_x, 2) ** 2 / (2 * len(x)))
IV-C. Global + Local Time Series Trend Modeling
Putting both parts together, the LCR model (Chen et al., 2024) reconstructs the time series by combining the global trend modeling of circulant matrix nuclear norm minimization with the local trend modeling of Laplacian temporal regularization:

$$\begin{aligned} \min_{\boldsymbol{x}}\ &\|\mathcal{C}(\boldsymbol{x})\|_{*}+\frac{\gamma}{2}\|\boldsymbol{\ell}\star\boldsymbol{x}\|_{2}^{2} \\ \text{s.t.}\ &\|\mathcal{P}_{\Omega}(\boldsymbol{x}-\boldsymbol{y})\|_{2}\leq\epsilon. \end{aligned}$$
V. Learning Interpretable Convolutional Kernels
V-A. Convolutional Kernels
On the univariate time series in the form of a vector $\boldsymbol{x}\in\mathbb{R}^{T}$, the circular convolution between the time series $\boldsymbol{x}$ and the convolutional kernel $\boldsymbol{\theta}\in\mathbb{R}^{T}$ can be constructed with certain purposes. The expression is formally defined by $\boldsymbol{x}\star\boldsymbol{\theta}$. When using this kernel to characterize the temporal correlations of time series, one simple yet interesting idea is making the loss of circular convolution, namely, $\|\boldsymbol{x}\star\boldsymbol{\theta}\|_{2}^{2}$, as small as possible. However, the optimization problem

$$\min_{\boldsymbol{\theta}}\ \|\boldsymbol{x}\star\boldsymbol{\theta}\|_{2}^{2}$$

is ill-posed because the optimal solution is all entries of the kernel $\boldsymbol{\theta}$ being zeros. To address this challenge, we assume that the first entry of $\boldsymbol{\theta}$ is one and the following entries are $-\boldsymbol{w}$, in which $\boldsymbol{w}\in\mathbb{R}^{T-1}$ is a non-negative vector. Thus, the optimization problem becomes

$$\begin{aligned} \min_{\boldsymbol{w}}\ &\|\boldsymbol{x}\star\boldsymbol{\theta}\|_{2}^{2} \\ \text{s.t.}\ &\boldsymbol{\theta}=(1,-\boldsymbol{w}^\top)^\top,\ \boldsymbol{w}\geq 0, \end{aligned}$$

where the constraint is of great significance for minimizing the objective function.
V-B. Sparse Linear Regression
Recall that the circular convolution can be converted into a linear transformation with a circulant matrix. Considering the kernel $\boldsymbol{\theta}=(1,-\boldsymbol{w}^\top)^\top$, the loss of circular convolution is equivalent to

$$\|\boldsymbol{x}\star\boldsymbol{\theta}\|_{2}^{2}=\|\mathcal{C}(\boldsymbol{x})\,\boldsymbol{\theta}\|_{2}^{2}=\|\boldsymbol{x}-\boldsymbol{A}\boldsymbol{w}\|_{2}^{2},$$

where the auxiliary matrix $\boldsymbol{A}\in\mathbb{R}^{T\times(T-1)}$ is comprised of the last $T-1$ columns of the circulant matrix $\mathcal{C}(\boldsymbol{x})$, namely,

$$\mathcal{C}(\boldsymbol{x})=[\,\boldsymbol{x}\mid\boldsymbol{A}\,],$$

where the circulant matrix is separated into the vector $\boldsymbol{x}$ (i.e., the first column of $\mathcal{C}(\boldsymbol{x})$) and the matrix $\boldsymbol{A}$ (i.e., the last $T-1$ columns of $\mathcal{C}(\boldsymbol{x})$), see Figure 9 and Figure 10 for illustrations.
Figure 9. Illustration of circular convolution as the linear transformation with a circulant matrix.
As can be seen, one of the most intriguing properties is that the circular convolution can be converted into the expression $\boldsymbol{x}\star\boldsymbol{\theta}=\boldsymbol{x}-\boldsymbol{A}\boldsymbol{w}$ (see Figure 10), which takes the form of a linear regression with the data pair $\{\boldsymbol{x},\boldsymbol{A}\}$.
Figure 10. Illustration of circular convolution with the structured vector $\boldsymbol{\theta}=(1,-\boldsymbol{w}^\top)^\top$.
Example 11. Given vector $\boldsymbol{x}=(0,1,2,3,4)^\top$, the circulant matrix is

$$\mathcal{C}(\boldsymbol{x})=\begin{bmatrix} 0 & 4 & 3 & 2 & 1 \\ 1 & 0 & 4 & 3 & 2 \\ 2 & 1 & 0 & 4 & 3 \\ 3 & 2 & 1 & 0 & 4 \\ 4 & 3 & 2 & 1 & 0 \end{bmatrix},$$

and the auxiliary matrix is

$$\boldsymbol{A}=\begin{bmatrix} 4 & 3 & 2 & 1 \\ 0 & 4 & 3 & 2 \\ 1 & 0 & 4 & 3 \\ 2 & 1 & 0 & 4 \\ 3 & 2 & 1 & 0 \end{bmatrix}.$$

Thus, we can compute the circular convolution as

$$\boldsymbol{x}\star\boldsymbol{\theta}=\mathcal{C}(\boldsymbol{x})\,\boldsymbol{\theta}=\boldsymbol{x}-\boldsymbol{A}\boldsymbol{w}.$$
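This equivalence can be verified numerically (a sketch reusing circ_mat from Example 7; the kernel weights in $\boldsymbol{w}$ are arbitrary here).

import numpy as np

x = np.array([0, 1, 2, 3, 4])
C = circ_mat(x)                 # circ_mat(.) is defined in Example 7
A = C[:, 1 :]                   # last T - 1 columns
w = np.array([0.5, 0, 0, 0.2])  # arbitrary non-negative kernel weights
theta = np.append(1, -w)        # theta = (1, -w)
print(C @ theta)                # circular convolution x * theta
print(x - A @ w)                # identical result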
To reinforce the model interpretability, we impose a sparsity constraint on the vector $\boldsymbol{w}$. This is extremely important for capturing local and nonlocal correlations of time series. The process of learning interpretable convolutional kernels can be formulated as follows,

$$\begin{aligned} \min_{\boldsymbol{w}}\ &\|\boldsymbol{x}-\boldsymbol{A}\boldsymbol{w}\|_{2}^{2} \\ \text{s.t.}\ &\boldsymbol{w}\geq 0,\ \|\boldsymbol{w}\|_{0}\leq\tau, \end{aligned}$$

where $\tau\in\mathbb{Z}^{+}$ is the sparsity level, i.e., the number of nonzero values.
Figure 11. Illustration of learning the $\tau$-sparse vector $\boldsymbol{w}$ from the time series $\boldsymbol{x}$ with the constructed formula as $\boldsymbol{x}\approx\boldsymbol{A}\boldsymbol{w}$.
$\ell_0$-Norm. For any vector $\boldsymbol{w}$, the $\ell_0$-norm is defined as follows,

$$\|\boldsymbol{w}\|_{0}=\operatorname{card}(\operatorname{supp}(\boldsymbol{w})),$$

where $\operatorname{card}(\cdot)$ and $\operatorname{supp}(\boldsymbol{w})=\{t:w_{t}\neq 0\}$ denote the cardinality and the support set of $\boldsymbol{w}$, respectively. For instance, given vector $\boldsymbol{w}=(0,1,0,2)^\top$, the support set is $\operatorname{supp}(\boldsymbol{w})=\{2,4\}$ and the number of nonzero entries is $\|\boldsymbol{w}\|_{0}=2$.
V-C. Subspace Pursuit
V-D. Mixed-Integer Programming
There are often some special-purpose algorithms for solving constrained linear regression problems, see constrained least squares. Some examples of constraints include: 1) Non-negative least squares in which the vector $\boldsymbol{w}$ must satisfy the vector inequality $\boldsymbol{w}\geq 0$ (each entry must be either positive or zero); 2) Box-constrained least squares in which the vector $\boldsymbol{w}$ must satisfy the vector inequalities $\boldsymbol{l}\leq\boldsymbol{w}\leq\boldsymbol{u}$.
In fact, the linear regression with sparsity constraints in the form of the $\ell_0$-norm can be easily converted into a mixed-integer programming problem. Thus, the problem illustrated in Figure 11 is equivalent to

$$\begin{aligned} \min_{\boldsymbol{w},\,\boldsymbol{\beta}}\ &\|\boldsymbol{x}-\boldsymbol{A}\boldsymbol{w}\|_{2}^{2} \\ \text{s.t.}\ &\begin{cases} 0\leq\boldsymbol{w}\leq M\boldsymbol{\beta}, \\ \boldsymbol{\beta}\in\{0,1\}^{T-1}, \\ \displaystyle\sum_{t=1}^{T-1}\beta_{t}\leq\tau, \end{cases} \end{aligned}$$

where $\boldsymbol{\beta}$ is the binary decision variable. The weight $M$ (possibly a great value for most cases) can control the upper bound of the vector $\boldsymbol{w}$. If $\beta_{t}=1$, then the value of $w_{t}$ is ranging between $0$ and $M$. Otherwise, the value of $w_{t}$ is $0$ because both lower and upper bounds are $0$. In fact, the first constraint can also be written element-wise as follows,

$$0\leq w_{t}\leq M\beta_{t},\quad\forall t\in\{1,2,\ldots,T-1\}.$$
Example 12. Given vector $\boldsymbol{x}=(0,1,2,3,4)^\top$, the auxiliary matrix (i.e., the last 4 columns of the circulant matrix $\mathcal{C}(\boldsymbol{x})$) is

$$\boldsymbol{A}=\begin{bmatrix} 4 & 3 & 2 & 1 \\ 0 & 4 & 3 & 2 \\ 1 & 0 & 4 & 3 \\ 2 & 1 & 0 & 4 \\ 3 & 2 & 1 & 0 \end{bmatrix}.$$

The question is how to learn the vector $\boldsymbol{w}$ with sparsity level $\tau=2$ from the following optimization problem:

$$\begin{aligned} \min_{\boldsymbol{w},\,\boldsymbol{\beta}}\ &\|\boldsymbol{x}-\boldsymbol{A}\boldsymbol{w}\|_{2}^{2} \\ \text{s.t.}\ &0\leq\boldsymbol{w}\leq\boldsymbol{\beta},\ \boldsymbol{\beta}\in\{0,1\}^{4},\ \textstyle\sum_{t}\beta_{t}\leq 2. \end{aligned}$$

In this case, the empirical solution of the decision variable is $\boldsymbol{w}\approx(0.444,0,0,0.444)^\top$, so that the binary variable is $\boldsymbol{\beta}=(1,0,0,1)^\top$. To reproduce it, the solution algorithm of the mixed-integer programming is given by
import numpy as np
import cvxpy as cp

x = np.array([0, 1, 2, 3, 4])
A = circ_mat(x)[:, 1 :]  # circ_mat(.) is defined in Example 7
tau = 2  # sparsity level
d = A.shape[1]
# Variables
w = cp.Variable(d, nonneg = True)
beta = cp.Variable(d, boolean = True)
# Constraints (w <= beta plays the role of the big-M constraint with M = 1)
constraints = [cp.sum(beta) <= tau, w <= beta, w >= 0]
# Objective
objective = cp.Minimize(cp.sum_squares(x - A @ w))
# Problem
problem = cp.Problem(objective, constraints)
problem.solve(solver = cp.GUROBI)  # Ensure to use a solver that supports MIP
# Solution
print("Optimal w:", w.value)
print("Active indices:", np.nonzero(beta.value > 0.5)[0])
Please install the optimization packages beforehand, e.g., pip install gurobipy. One can also check out the available optimization solvers in cvxpy:
import cvxpy as cp
print(cp.installed_solvers())
VI. Insight into Ridesharing Trip Time Series
The City of Chicago’s open data portal provides a large amount of human mobility data, including taxi trips and TNP ridesharing trips. Figure 12 shows the realistic time series data of ridesharing trips with a strong weekly periodicity, allowing one to examine the usage of interpretable convolutional kernels.
Figure 12. Hourly time series of aggregated ridesharing trip counts in the City of Chicago during the first two weeks (i.e., $14\times 24=336$ hours in total) since April 1, 2024. The time series exhibits weekly periodicity, referring to the regularity of human mobility.
Taking the time series in Figure 12 as an example, the mixed-integer programming solver in CPLEX produces the temporal kernel $\boldsymbol{\theta}=(1,-\boldsymbol{w}^\top)^\top$ with a sparse vector $\boldsymbol{w}$, where the sparsity level is set as $\tau=3$ in the constraint $\|\boldsymbol{w}\|_{0}\leq\tau$, or equivalently $\sum_{t}\beta_{t}\leq\tau$ on the binary decision variable $\boldsymbol{\beta}$. This result basically demonstrates local correlations such as those at the first lags, as well as the weekly seasonality at lag $168$ (i.e., $7\times 24$ hours). Below is the Python implementation of the mixed-integer programming solver with CPLEX.
import numpy as np
from docplex.mp.model import Model

def kernel_mip(data, tau):
    # Learn a tau-sparse, non-negative kernel w via mixed-integer programming
    model = Model(name = 'Sparse Autoregression')
    T = data.shape[0]
    w = [model.continuous_var(lb = 0, name = f'w_{k}') for k in range(T - 1)]
    beta = [model.binary_var(name = f'beta_{k}') for k in range(T - 1)]
    # Residuals of the cyclic autoregression x - A w
    error = [data[t] - model.sum(w[k] * data[t - k - 1] for k in range(T - 1)) for t in range(T)]
    model.minimize(model.sum(r ** 2 for r in error))
    # Sparsity: at most tau active lags, and w_k = 0 whenever beta_k = 0
    model.add_constraint(model.sum(beta[k] for k in range(T - 1)) <= tau)
    for k in range(T - 1):
        model.add_constraint(w[k] <= beta[k])
    solution = model.solve()
    if solution:
        w_coef = np.array(solution.get_values(w))
        # Recompute the objective function from the learned coefficients
        error = 0
        for t in range(T):
            a = data[t]
            for k in range(T - 1):
                a -= w_coef[k] * data[t - k - 1]
            error += a ** 2
        print('Objective function: {}'.format(error))
        ind = np.where(w_coef > 0)[0].tolist()
        print('Support set: ', ind)
        print('Coefficients w: ', w_coef[ind])
        print('Cardinality of support set: ', len(ind))
        return w_coef, ind
    else:
        print('No solution found.')
        return None
import numpy as np
import time
tensor = np.load('Chicago_rideshare_mob_tensor_24.npz')['arr_0'][:, :, : 14 * 24]
data = np.sum(np.sum(tensor, axis = 0), axis = 0)
tau = 3
start = time.time()
w, ind = kernel_mip(data, tau)
end = time.time()
print('Running time (s):', end - start)
For the entire implementation, please check out the Jupyter Notebook on GitHub. The processed time series dataset is available in the Chicago-data folder.
VII. Concluding Remarks
From a time series analysis perspective, our modeling ideas can guide future research by using the properties of circular convolution and connecting with various signals. Because circular convolution is a core component in many machine learning tasks, this post could provide an example of how to leverage circular convolution, discrete Fourier transform, and linear regression. While we focused on the time series imputation and convolutional kernel learning problems, we hope that our modeling ideas will inspire others in the fields of machine learning and optimization.
(Posted by Xinyu Chen on April 17, 2024.)