Analysis of Time Series Water Level Data Prediction Using Deep Learning Method at the Water Gate of DKI Jakarta Water Resources Office



Introduction
The Jakarta Water Resources Agency currently has no system that can predict future water levels from past and current water level data. For monitoring, there is an application called JAKI, whose Jak Pantau feature makes it easier to obtain flood information. Floods occur when river runoff cannot be accommodated by the river channel, that is, when the flood discharge exceeds the existing drainage capacity of the river (Fredrik et al., 2021). DKI Jakarta has 8 main water gates, namely Manggarai, Karet, Marina Ancol, Pulo Gadung, Istiqlal, Jembatan Merah, Flushing Ancol, and Hek (Jakarta, 2020). A sluice gate controls the flow of water in order to prevent flooding during fast, high flows. The gates are opened or closed according to the rainwater level and the water discharge rate. The sluice gate directs the flowing water to the sea or to a river, depending on the size of the discharge.
e-ISSN: 2723-6692 p-ISSN: 2723-6595
From this background, the problem addressed in this study is flood control through the prediction of water level data in the form of a time series. The purpose of this study is to analyze water level prediction in the Python programming language using two deep learning methods, namely the Recurrent Neural Network (RNN) and Long Short Term Memory (LSTM). The data used is the water level dataset for January 2022.
The advantage of the LSTM method over the RNN method is that LSTM can remember time series data, or data with long-term dependency information, and can store previous information in its cells (Lattifia, Wira Buana, & Rusjayanthi, 2022). The RNN method has a distinctive property: it can store data in its network structure because it has at least one feedback loop. The advantage of the RNN model in forecasting is its ability to predict nonlinear time series data (Journal & Mathematics, 2023). The two methods are therefore compared on their prediction results, and the one with the lowest error value is proposed as the method for predicting water levels at the DKI Jakarta Water Resources Agency in the future. The data used is water level data from January 2022, with three dataset split compositions.

Literature Review
Deep learning-based models achieve high accuracy when used for face detection, image processing, recommendation systems, natural language processing, and time series prediction (Sanjaya & Budi, 2020). In previous research, the deep learning methods most often used to process time series data are the Recurrent Neural Network (RNN) and Long Short Term Memory (LSTM).

Deep Learning
Deep learning is a branch of machine learning that uses Artificial Neural Networks (ANN). Compared to traditional machine learning methods, deep learning offers more complex feature extraction, less manual modeling, and more accurate predictions, at the cost of higher computation. Technically, deep learning can be defined as machine learning with more than one hidden layer. In the illustration in Figure 2 there are 4 layers, and each layer has a different number of nodes (Rizki, Basuki, & Azhar, 2020).
Figure 2. Deep learning illustration (source: Journal of Repositors, 2020)

Recurrent Neural Network (RNN)
RNN is a form of Artificial Neural Network (ANN) architecture specifically designed to process continuous or sequential data. RNN is usually used for problems involving historical or time series data, such as weather forecasting. The RNN architecture is shown in the accompanying figure.
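The feedback loop that characterizes an RNN can be sketched in a few lines of NumPy. This is only a minimal illustration of a simple (Elman-style) recurrence, not the model trained in this study; the function name `rnn_forward`, the weight names, the hidden size of 3, and the four input values are all assumptions chosen for the example.

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a simple RNN over a sequence and return the hidden state at every step."""
    hidden = np.zeros(W_hh.shape[0])  # initial hidden state
    states = []
    for x_t in inputs:
        # Feedback loop: the new state depends on the current input
        # and on the state carried over from the previous time step.
        hidden = np.tanh(W_xh @ x_t + W_hh @ hidden + b_h)
        states.append(hidden)
    return np.array(states)


# Illustrative random weights (hypothetical, untrained).
rng = np.random.default_rng(0)
n_hidden = 3
W_xh = rng.normal(scale=0.5, size=(n_hidden, 1))
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

# Four consecutive scaled readings, one feature per time step.
sequence = np.array([[0.10], [0.20], [0.30], [0.40]])
states = rnn_forward(sequence, W_xh, W_hh, b_h)
print(states.shape)  # (4, 3)
```

Each row of `states` is the hidden state after one reading, so the last row summarizes the whole sequence and would typically be passed to an output layer.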

Long Short Term Memory (LSTM)
LSTM is a variant of the RNN architecture. Because it processes sequential data, it can be used for time series prediction. LSTM can decide which data to store and which unused data to discard, because each LSTM cell contains four neural layers that regulate its memory: three gates (the forget gate, input gate, and output gate) and a layer that proposes candidate values for the cell state. The advantage of LSTM over the plain RNN is that it can retain long-term dependency information in time series data, storing previous information in its cells.
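To make the role of the gates concrete, the following NumPy sketch computes a single LSTM time step using the standard formulation. It is an illustration under assumed names and sizes (`lstm_step`, stacked weight matrices `W`, `U`, `b`, a hidden size of 3), not the implementation used in the experiments.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b stack the forget, input, candidate and output layers."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b       # pre-activations of all four layers, shape (4n,)
    f = sigmoid(z[0:n])                # forget gate: how much old cell state to keep
    i = sigmoid(z[n:2 * n])            # input gate: how much new information to admit
    g = np.tanh(z[2 * n:3 * n])        # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])        # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g           # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)             # updated hidden state (short-term output)
    return h_t, c_t


# Illustrative random weights (hypothetical, untrained).
rng = np.random.default_rng(0)
n_hidden, n_features = 3, 1
W = rng.normal(scale=0.5, size=(4 * n_hidden, n_features))
U = rng.normal(scale=0.5, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x_t in np.array([[0.10], [0.20], [0.30], [0.40]]):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (3,) (3,)
```

The cell state `c` is updated additively, which is what lets information persist over many time steps when the forget gate stays close to 1.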

Python
Python is an interpreted, multipurpose programming language that is object-oriented and uses dynamic semantics to keep its syntax readable. It is often described as a highly capable language that combines clear syntax with a large and comprehensive standard library. Although Python is a high-level programming language, it is designed to be easy to understand and learn. Python runs on many platforms, including Mac, Linux, and Windows. Python is open source, so many people contribute to its development (Pane & Yogi Aditya Saputra, 2020). Python was chosen as the programming language in this study because it has many libraries that make it easy to write programs involving heavy vector and matrix manipulation, as well as attractive and easy-to-read visualizations (the scikit-learn and matplotlib libraries) and heatmaps (the seaborn library) that show correlations as color and numerical maps (Hastomo, Karno, Kalbuana, Nisfiani, & ETP, 2021).

Colaboratory
Colaboratory, or Colab for short, is a product of Google Research. Colab allows anyone to write and execute arbitrary Python code through a browser and is well suited to machine learning, data analysis, and education. In addition to being easy to use, Colab is quite flexible in its configuration and requires no setup (Naik & Girish, 2021). Some of the advantages of Google Colaboratory are: support for Python 2.7 and Python 3.6; free GPU acceleration; all major Python libraries, such as TensorFlow, Scikit-Learn, Matplotlib, and Keras, among many others, pre-installed and ready to import; support for bash commands; and notebooks that are stored on Google Drive.

Research Needs
The data analysis process requires supporting resources for system testing and data processing, namely hardware and software.

Hardware Requirements
The hardware consists of two computers/laptops with the following specifications: (1) 8 GB RAM, Intel Core i7-8550U CPU @ 1.80 GHz (1.99 GHz), 64-bit system type; (2) 16 GB RAM, Intel Xeon CPU E5-2609 @ 1.90 GHz.

Software Requirements
The software includes Google Colaboratory, the Python 3 programming language, Windows 10, and Windows Server 2012 R2. The water level data used in this study comes from the DKI Jakarta Water Resources Office for the northern region (Marina water gate). It covers January 2022 and is provided as an Excel file containing 744 records.

Preprocessing Data
In data preprocessing, missing or empty values are handled first, for example by replacing them with the average of the attribute for the same class. The data is then normalized using MinMaxScaler with the range (0, 1).
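The two preprocessing steps can be sketched in plain Python as follows. This is a simplified illustration: missing readings are filled with the overall mean of the observed values (an assumption, since the per-class averaging is not detailed here), and the rescaling formula is the same one that scikit-learn's `MinMaxScaler` applies with its default range of (0, 1). The function names and the five sample readings are hypothetical.

```python
def fill_missing_with_mean(values):
    """Replace missing readings (None) with the mean of the observed readings."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly rescale values so the minimum maps to the low end of the range and the maximum to the high end."""
    lo, hi = min(values), max(values)
    a, b = feature_range
    # Note: this sketch assumes the series is not constant (hi > lo).
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


# Hypothetical water level readings with one missing value.
levels = [150.0, None, 170.0, 160.0, 180.0]
filled = fill_missing_with_mean(levels)
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```

After prediction, the scaled outputs need to be mapped back to the original units by inverting the same formula with the minimum and maximum taken from the training data.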

Split Data
The data split process uses three experimental split compositions. One of the compositions is as follows:
a.
- Testing data: 24 x 7 = 168 hours (7 days), or 22.6%
- Training data: total data 744 - 168 = 576 hours (24 days), or 77.4%
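The arithmetic of this split can be checked with a short sketch. It assumes the test set is the final 168 hourly readings of the month, so that time order is preserved; the helper name `chronological_split` and the placeholder series are illustrative only.

```python
def chronological_split(series, n_test):
    """Hold out the last n_test observations as the test set, keeping time order."""
    return series[:-n_test], series[-n_test:]


series = list(range(744))  # placeholder for the 744 hourly readings
train, test = chronological_split(series, n_test=24 * 7)

print(len(train), len(test))                     # 576 168
print(round(100 * len(test) / len(series), 1))   # 22.6
print(round(100 * len(train) / len(series), 1))  # 77.4
```

Unlike a random split, a chronological split keeps every test observation later in time than every training observation, which avoids leaking future values into training.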

Process Training
Training data is used to find the best parameters for the Long Short Term Memory (LSTM) and Recurrent Neural Network (RNN) methods. In the training process, each model is trained on the training data, and the best parameters found are then evaluated on the testing data.

Deep Learning Model Parameter Testing
Deep learning model testing is carried out through several experiments with different parameter values to find the best settings for n-input, split data, batch size, learning rate, dropout, and epoch.

Random Batch Size Testing
From the random dropout testing table above, the LSTM method has a lower average error than RNN, namely RMSE (39.8), MAPE (0.40), and MAE (5.52), with a runtime of 170 seconds, which is longer than the RNN's 89 seconds.
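For reference, the three error measures reported in these tests can be computed as in the sketch below, using their standard definitions. Whether the reported MAPE values are fractions or percentages is not stated in the text, so the `mape` function here returns a fraction as an assumption; the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the same units as the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; undefined if any actual value is zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted water levels.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(rmse(actual, predicted))
print(mae(actual, predicted))
print(mape(actual, predicted))
```

RMSE and MAE share the units of the water level, while MAPE is relative, so the three values are not directly comparable with one another.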

Results of LSTM Model and RNN Model Prediction Analysis
From the preceding tests, which used several hyperparameters with different input values (n-input, split data, batch_size, learning_rate, and dropout), several conclusions can be drawn.
In the dropout parameter test, the LSTM model has average error values of RMSE (39.8), MAPE (0.40), and MAE (5.52) with a runtime of 170 seconds, while the RNN model has average error values of RMSE (46.67), MAPE (0.44), and MAE (6.04) with a runtime of 89 seconds. The analysis above also shows that the errors produced when learning_rate and dropout are used are still quite large compared to those produced without them. The researchers therefore conducted further predictive testing on both deep learning methods (RNN and LSTM) using the best configuration (n-input = 4, epoch = 10, batch size = 1, dense = 1, layer = 1, neuron = 100) with a 72-hour test split, and compared epoch values of 10, 50, and 100. The results are as follows.
From the table of prediction analysis results for the LSTM method with different epoch counts, the greater the number of epochs, the lower the average error value and the longer the runtime.
From the table of prediction analysis results for the RNN method with different epoch counts, the greater the number of epochs, the higher the average error value, while the runtime is relatively faster.
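Assuming that n-input denotes the number of past readings fed to the model for each prediction (an interpretation, since the parameter is not defined explicitly here), the series can be framed as supervised samples with a sliding window. The helper `make_windows` and the short sample series are illustrative.

```python
def make_windows(series, n_input):
    """Build (inputs, target) pairs: n_input consecutive values predict the next value."""
    samples, targets = [], []
    for start in range(len(series) - n_input):
        samples.append(series[start:start + n_input])
        targets.append(series[start + n_input])
    return samples, targets


series = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
X, y = make_windows(series, n_input=4)
print(X)  # [[0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5]]
print(y)  # [0.5, 0.6]
```

A series of length N yields N - n_input samples, so a larger window gives the model more context per sample but slightly fewer samples to train on.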

Best Deep Learning Model Evaluation Results
The tests above show that the LSTM model gives better results, because it produces lower error values across the parameters tested, but it requires a longer runtime. The RNN model, by comparison, gives higher error values but runs fairly quickly. In the operation of the floodgates, there are several water level categories or statuses that serve as a reference for decision-making.

Comparison of actual data with TMA prediction results
Table 10 presents the comparison between the actual data for February 2022 and the data predicted from January 2022, a total of 72 data points, using the LSTM method (n-input = 4, epoch = 100, batch size = 1, dense = 1, layer = 1, neurons = 100).
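One common way to produce 72 future values from a model that predicts a single step ahead is recursive forecasting, in which each prediction is fed back as an input for the next step. The sketch below illustrates that idea only; it is an assumption that a scheme of this kind was used here, and `mean_of_window` is a hypothetical stand-in for the trained LSTM's one-step prediction.

```python
def recursive_forecast(history, predict_next, n_input, n_steps):
    """Forecast n_steps ahead by feeding each prediction back in as an input."""
    window = list(history[-n_input:])
    forecasts = []
    for _ in range(n_steps):
        next_value = predict_next(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward by one step
    return forecasts


def mean_of_window(window):
    # Hypothetical stand-in for a trained model's one-step prediction.
    return sum(window) / len(window)


history = [1.0, 2.0, 3.0, 4.0]
forecasts = recursive_forecast(history, mean_of_window, n_input=4, n_steps=72)
print(len(forecasts))  # 72
print(forecasts[0])    # 2.5
```

Because later forecasts depend on earlier ones, errors can accumulate over the horizon, which is one reason to compare the forecasts against actual readings.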

Conclusion
The following conclusions relate to water level prediction using the LSTM and RNN methods:
1. The application successfully processed the prediction of the water level at the Marina sluice gate in North Jakarta, DKI Jakarta, using the water level (TMA) dataset.
2. The composition of train and test data with the most optimal results is 90.33% train data and 9.67% test data. This composition has the lowest error rate, with average values of RMSE (17.65), MAPE (0.29), and MAE (3.37), and an average runtime of 39 minutes.
3. The best parameters for the LSTM and RNN methods are a data composition of 90.33% : 9.67%, n-input (4), dense (1), batch size (1), epoch (100), layer (1), and neuron (100), which give the evaluation summarized in Table 11.
4. The analysis, implementation, and testing of deep learning with the LSTM and RNN architectures for water level prediction at the DKI Jakarta Marina sluice gate show that the predictions obtained with the LSTM method are quite good, because they have the smallest error value. It can therefore be concluded that deep learning with the Long Short Term Memory (LSTM) architecture works quite well.
5. Predictive testing with epochs of 10, 50, and 100 gives different error values. It can be concluded that the larger the epoch count used, the lower the error value and the longer the runtime.

Figure 1. Deep Learning Flowchart for Water Level Prediction

Table 1. Random Input Test Results

Table 2
From the table of random split data test results above, the composition of train data (90.33%) and test data (9.67%) gives low error values, namely RMSE (16.23), MAE (3.76), and MAPE (0.30).

Table 3. Result of Random Batch Size Testing
Information: From the random learning rate testing table above, the LSTM method has a lower average error value than RNN, namely RMSE (25.55), MAPE (0.35), and MAE (4.55), with a process time (runtime) of 160 seconds, which is longer than the RNN's 88 seconds.

Table 6. LSTM Method Prediction Analysis Results

Table 7. Results of RNN Method Prediction Analysis

Table 8. Evaluation Comparison of LSTM Method and RNN Method

Table 9. Prediction Results for the Next 3 Days of Data (72 Data)

Table 10. Comparison Results of Actual and Predicted TMA

Figure 5. Water level prediction (TMA) graph for February 2022

Table 11. Conclusion Comparison of Evaluation of LSTM Model and RNN Model