WaveNet vocoder for prediction of time series with extreme events


Abstract

Extreme events are typically defined as rare or unpredictable events that deviate significantly from typical behavior. However, objective criteria for identifying extreme events have yet to be established. Rareness may be characterized by certain scales or by spatial and temporal boundaries, while intensity indicates an event’s potential to cause a significant change. One of the most prominent examples of extreme events in both neuroscience and medicine is the epileptic seizure [1].

In speech synthesis, vocoder networks such as WaveNet [2] generate audio. The model is a multi-layer convolutional neural network that acts as a causal filter: each output depends only on past and present inputs, never on future samples. This property makes the vocoder a candidate for time series prediction. An audio time series can be regarded as a dynamical system with unpredictable regime switching. For instance, the transition from one letter to another can produce large deviations in amplitude, similar to extreme events. The network receives the r previous input samples, known as the receptive field, and uses them to predict the next sample. The network is tree-like in structure, with the distance between connected inputs increasing exponentially from layer to layer. This feature is necessary because the receptive field r is usually quite large, on the order of one or two thousand samples: with exponentially growing distances the number of layers grows only logarithmically with r, whereas without them it would grow linearly. Recurrent neural networks make the loss function difficult to optimize when predicting time series, as they tend to predict samples very similar to the previous one, causing the network to converge towards the mode. In a convolutional network, by contrast, the input to the model is longer because of the large receptive field. In the case of sound, for instance, several oscillations fit within this window, and the network does not give undue weight to any single sample.
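The relation between depth and receptive field described above can be illustrated with a minimal Python sketch. It assumes a kernel size of 2 and dilations that double from layer to layer, as in the original WaveNet design [2]; the abstract does not state the configuration used in this study, and the function names are hypothetical.

```python
import numpy as np

def receptive_field(dilations, kernel_size=2):
    """Number of input samples seen by a stack of causal dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution: out[t] = sum_j weights[j] * x[t - j * dilation]."""
    x = np.asarray(x, dtype=float)
    k = len(weights)
    pad = (k - 1) * dilation
    # Left-pad with zeros so that no future sample ever enters the output.
    xp = np.concatenate([np.zeros(pad), x])
    out = np.zeros(len(x))
    for j, w in enumerate(weights):
        start = pad - j * dilation
        out += w * xp[start:start + len(x)]
    return out

# Ten layers with doubling dilations (1, 2, 4, ..., 512) versus ten undilated layers.
doubling = [2 ** i for i in range(10)]
undilated = [1] * 10
rf_doubling = receptive_field(doubling)    # 1024 samples
rf_undilated = receptive_field(undilated)  # 11 samples
```

Under these assumptions ten layers already cover 1024 samples, whereas ten undilated layers cover only 11, so reaching the same window without dilation would take 1023 layers.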

The study used artificial data generated from two coupled Hindmarsh–Rose neurons with chemical synaptic couplings. The observed variable, the total membrane potential, was chosen for its biological significance. The system exhibited extreme events across various coupling parameter values. Based on prior research [3], a numerical criterion was selected for identifying the events. The WaveNet vocoder model achieves a 91% accuracy rate and an 82% recall rate when forecasting extreme events of the same width as the prediction. Recall is particularly important when forecasting extreme events, since it accounts for the cases in which the model wrongly predicted that an extreme event would not occur (false negatives).
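To make the role of recall concrete, the following Python sketch computes recall and precision from per-window event labels. The labels are invented for illustration and are not the study's data, the function name is hypothetical, and the sketch assumes that each prediction window has already been marked as containing an extreme event or not; the matching criterion used by the authors is not given in this abstract.

```python
import numpy as np

def recall_and_precision(actual, predicted):
    """Recall and precision for binary per-window extreme-event labels."""
    actual = np.asarray(actual, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    tp = np.sum(actual & predicted)    # events that were forecast
    fn = np.sum(actual & ~predicted)   # events that were missed
    fp = np.sum(~actual & predicted)   # false alarms
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return recall, precision

# Illustrative labels only: 1 marks a window that contains an extreme event.
actual = [1, 0, 0, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 0, 0, 1]
recall, precision = recall_and_precision(actual, predicted)
```

Here one of the four true events is missed, so recall drops to 0.75 even though most windows are classified correctly. A missed extreme event is usually the costlier error, which is why recall is reported alongside the overall rate.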


ADDITIONAL INFORMATION

Authors’ contribution. All authors made a substantial contribution to the conception of the work, acquisition, analysis, interpretation of data for the work, drafting and revising the work, final approval of the version to be published and agree to be accountable for all aspects of the work.

Funding sources. This study was supported by the Russian Science Foundation grant No. 19-72-10128.

Competing interests. The authors declare that they have no competing interests.


About the authors

N. V. Gromov

National Research Lobachevsky State University of Nizhny Novgorod

Author for correspondence.
Email: gromov@itmm.unn.ru
Russian Federation, Nizhny Novgorod

T. A. Levanova

National Research Lobachevsky State University of Nizhny Novgorod

Email: gromov@itmm.unn.ru
Russian Federation, Nizhny Novgorod

References

  1. Engel J Jr, Pedley TA. Generalized convulsive seizures. In: Tassinari CA, Michelucci R, Shigematsu H, et al., editors. Epilepsy: a comprehensive textbook. 1997.
  2. Van den Oord A, Dieleman S, Zen H, et al. WaveNet: a generative model for raw audio. arXiv preprint arXiv:1609.03499. 2016. doi: 10.48550/arXiv.1609.03499
  3. Gromov N, Gubina E, Levanova T. Loss functions in the prediction of extreme events and chaotic dynamics using machine learning approach. In: Proceedings of the Fourth International Conference Neurotechnologies and Neurointerfaces (CNN); 2022 Sept 14–16; Kaliningrad, Russian Federation. Kaliningrad; 2022. p. 46–50. doi: 10.1109/CNN56452.2022.9912515


Copyright (c) 2023 Eco-Vector

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
