Would it be possible to implement Time Series K-Fold Cross-Validation?
Take a Time Series training/validation interval that runs from 2017-01-01 to 2019-12-31, split at a regularly spaced step of 1 month.
For example, in the first fold a model is trained between 2017-01-01 and 2017-02-01, and the error within that time frame is minimised (e.g. RMSE). However, in order to evaluate the error ourselves, we use out-of-sample validation data, which goes from 2017-02-01 to 2017-03-01.
The process is repeated iteratively:
A model is trained between 2017-01-01 and 2017-03-01, and the error within that time frame is minimised (e.g. RMSE). However, in order to evaluate the error ourselves, we use out-of-sample validation data, which goes from 2017-03-01 to 2017-04-01.
And so on, successively.
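The expanding monthly window described above can be sketched with pandas; the function name and the monthly boundaries are illustrative assumptions matching the dates in the example:

```python
import pandas as pd

def expanding_monthly_splits(start, end, min_train_months=1):
    """Yield (train_start, train_end, valid_start, valid_end) tuples for an
    expanding window: training grows by one month per fold, and validation
    is always the single month immediately after the training cut-off."""
    boundaries = pd.date_range(start, end, freq="MS")  # month starts
    for i in range(min_train_months, len(boundaries) - 1):
        yield (boundaries[0], boundaries[i], boundaries[i], boundaries[i + 1])

splits = list(expanding_monthly_splits("2017-01-01", "2019-12-31"))
# First fold: train 2017-01-01 -> 2017-02-01, validate 2017-02-01 -> 2017-03-01
```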
Some scikit-learn context:
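For reference, scikit-learn already ships an index-based version of this scheme in `sklearn.model_selection.TimeSeriesSplit` (expanding training window, contiguous out-of-sample test block, no shuffling):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # e.g. 12 monthly observations

# Each fold trains on all data up to a cut-off and validates on the
# next contiguous block, so no information leaks from the future.
tscv = TimeSeriesSplit(n_splits=5)
folds = list(tscv.split(X))
```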
How would it be used?
If you are generating a point estimate, you could forecast t+1, use the t+1 target prediction mean to update the features, then forecast t+2, and so on, successively, over a reasonable time window. This requires calculated features which “update” with the target mean predictions, by treating each prediction as already-known data.
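That recursive loop could look roughly like this; the toy series, the lag-feature construction, and the choice of `LinearRegression` are all illustrative assumptions (any fitted regressor would do):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy series standing in for real data.
history = list(np.sin(np.arange(30) / 3.0))
n_lags, horizon = 3, 5

# Fit on lag features built from the known history.
X = np.array([history[i : i + n_lags] for i in range(len(history) - n_lags)])
y = np.array(history[n_lags:])
model = LinearRegression().fit(X, y)

# Recursive forecast: predict t+1, append the prediction as if it were
# observed, rebuild the lag features, predict t+2, and so on.
preds = []
for _ in range(horizon):
    x_next = np.array(history[-n_lags:]).reshape(1, -1)
    y_hat = float(model.predict(x_next)[0])
    preds.append(y_hat)
    history.append(y_hat)  # treat the prediction as already-known data
```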
Another use case (which requires Time Series Cross-Validation + Group-Wise Cross-Validation):
We would have two time axes:
Our Date Axis: 2019-01-01, 2019-01-02, 2019-01-03, …, 2019-12-31.
Our Snapshot Date Axis (linked to, i.e. grouped by, the Date Axis): that is, how on a given snapshot date we see the sales accumulated so far for a target Date. We could call this variable “Number of Days Out” (NDO = Date - Snapshot), and we aim to predict at NDO = 0.
Our Target would be Sales at NDO = 0 for each Date.
This is a common scenario, as enterprise database tables are often versioned tables.
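On such a versioned table, the NDO feature and the NDO = 0 target can be derived roughly as below; the column names and the tiny example table are assumptions:

```python
import pandas as pd

# Hypothetical versioned sales table: one row per (snapshot_date, target_date).
df = pd.DataFrame(
    {
        "target_date": pd.to_datetime(
            ["2019-01-03", "2019-01-03", "2019-01-03", "2019-01-04", "2019-01-04"]
        ),
        "snapshot_date": pd.to_datetime(
            ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-03", "2019-01-04"]
        ),
        "sales": [10, 14, 17, 5, 9],
    }
)

# NDO = Date - Snapshot, in days; we aim to predict sales at NDO = 0.
df["ndo"] = (df["target_date"] - df["snapshot_date"]).dt.days

# The target for every row of a given target_date is its sales at NDO = 0.
final = df.loc[df["ndo"] == 0, ["target_date", "sales"]].rename(
    columns={"sales": "sales_at_ndo0"}
)
df = df.merge(final, on="target_date")
```

Group-wise CV would then keep all snapshots of the same `target_date` in the same fold, while the time-series split runs along `target_date`.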
In addition, forecasts from further regression models could be added as features to our H2O Tree Model, in order to benefit from its regularisation and to use it as an ensemble over a large number of forecasts. We have:
facebook’s prophet: https://facebook.github.io/prophet/
google’s bsts: https://cran.r-project.org/web/packages/bsts/index.html
forecast’s / fable’s auto.arima() or ets(): https://cran.r-project.org/web/packages/forecast/index.html
amazon’s gluonts deepar: https://github.com/awslabs/gluon-ts
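The ensembling idea amounts to feeding each external model's point forecast in as one more column. A minimal sketch with two synthetic stand-in columns (in practice these would be out-of-sample predictions from prophet, bsts, auto.arima, etc., and the scikit-learn tree ensemble stands in for the H2O model):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
y = np.sin(t / 10.0) + 0.1 * rng.standard_normal(n)

# Stand-ins for external regressor forecasts; assumptions for illustration.
forecast_a = np.sin(t / 10.0)                                  # a good forecast
forecast_b = np.sin(t / 10.0) + 0.3 * rng.standard_normal(n)   # a noisier one

X = np.column_stack([t, forecast_a, forecast_b])

# The tree ensemble learns how much weight to give each upstream forecast,
# with its own regularisation. Train on the past, predict the future block.
model = GradientBoostingRegressor().fit(X[:150], y[:150])
preds = model.predict(X[150:])
```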
Currently I apply the previous strategies manually in my Time Series projects, but it is always nice to see them automated, so that others can benefit.