What Is Long Short-Term Memory (LSTM)?

Have you ever wondered how a video you haven’t seen, but would be interested in, suddenly appears in your YouTube recommendations? Or how Google seems to know what you’re searching for after you type only the first few letters? Machine learning and neural networks make both possible.

Machine learning is a branch of artificial intelligence concerned with creating algorithms that learn from experience. It was born from the ‘tediousness of human learning,’ as economist and cognitive psychologist Herbert Simon wrote. Given the right algorithm, a machine could learn in a short time what would take a human years.

One of the most prevalent machine learning methods involves neural networks; networks with many layers are known today as deep learning. The approach exposes an algorithm to vast amounts of data and trains it to identify details or patterns the data share. The human brain serves as the template for these networks, with layers of nodes in place of neurons.

One type of neural network is long short-term memory (LSTM). Introduced in the mid-1990s, LSTM was conceived as a solution to the limitations of recurrent neural networks. Before delving deeper, an explanation of the types of neural networks in use is in order.

Feed-forward vs. Recurrent

Current machine learning techniques mostly involve feed-forward neural networks, where data flows in one direction, from input to output. This simplicity is one reason for their widespread use. However, a feed-forward network keeps no memory of earlier inputs, so it handles sequences poorly, and it can produce inaccurate results on data it wasn’t trained on.

Data scientists attribute the latter tendency to overfitting, where the network fits its training data so closely that it fails to generalize to new data.

Recurrent neural networks add a feedback loop, feeding the output of one step back in as input to the next. This lets the network take earlier data into account, but two problems make it hard to train on long sequences: vanishing and exploding gradients.
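
To make the feedback loop concrete, here is a minimal sketch of a single-unit recurrent network in plain Python. The function names and the weights (w_x, w_h, b) are hypothetical values picked for illustration; a real network would learn its weights during training and use many units rather than one.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: the previous output h_prev is fed back in
    alongside the new input x."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Process a sequence one item at a time, carrying the state forward."""
    h = 0.0  # initial state: nothing seen yet
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
    return h

# The same values in a different order give a different final state,
# because each step depends on what came before it.
forward = run_rnn([1.0, 2.0, 3.0])
backward = run_rnn([3.0, 2.0, 1.0])
print(forward, backward)
```

A feed-forward network given the same three numbers as one fixed input would have no such step-by-step state to carry along.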

These problems occur while training neural networks. Training passes a gradient backward through every layer (or, in a recurrent network, every time step), scaling it along the way. Over many layers or steps, the gradient can shrink toward zero (vanishing gradients) or grow too large (exploding gradients). Either way, the outcome is a network that stops learning effectively.
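
The arithmetic behind both problems can be sketched in a few lines. The example below assumes, as a simplification, that the gradient is scaled by the same factor at every layer or time step; in a real network the factor varies, but the compounding effect is the same. The function name and the factors 0.5 and 1.5 are illustrative choices.

```python
def gradient_after(steps, factor, start=1.0):
    """Scale a gradient by the same factor once per layer or time step."""
    grad = start
    for _ in range(steps):
        grad *= factor
    return grad

# A factor below 1 shrinks the gradient toward zero (vanishing);
# a factor above 1 makes it grow without bound (exploding).
vanishing = gradient_after(steps=50, factor=0.5)
exploding = gradient_after(steps=50, factor=1.5)
print(f"factor 0.5 over 50 steps: {vanishing:.3e}")
print(f"factor 1.5 over 50 steps: {exploding:.3e}")
```

After 50 steps, the first gradient is far too small to change the weights, and the second is large enough to destabilize them.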

LSTM and Cell Memory

LSTM, a type of recurrent neural network, adds a memory unit, or cell, to the algorithm. Gates control the flow of data into and out of this memory, letting in anything from nothing to everything. With this flow regulated, the network is better protected from vanishing and exploding gradients.

Cnvrg published an article stating that the memory unit enables LSTM neural networks to process sequential data. The network learns that the order of the data matters (e.g., ‘My name is John’ can’t be rearranged as ‘Name is my John’). Applications of sequential data include analyzing periodic stock market trends and DNA sequences.

In summary, an LSTM model works in four steps:

  1. Forget irrelevant memory data
  2. Store new data after computation
  3. Repeat steps 1 and 2 for each item in the sequence
  4. Release as output once it’s ready
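
The four steps above can be sketched with the commonly used LSTM formulation, reduced here to a single unit in plain Python. This is an illustrative sketch rather than the method of any particular library or article: the mapping of steps to gates is one reading of the list, and the names lstm_step and params, along with every weight value, are hypothetical. In this formulation each gate outputs a value between 0 and 1 that says how much to let through.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM. `p` maps each gate name to
    (input weight, recurrent weight, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # step 1: share of old memory to keep
    i = sigmoid(pre("input"))         # step 2: share of new data to store
    g = math.tanh(pre("candidate"))   # the new data proposed for storage
    o = sigmoid(pre("output"))        # step 4: share of memory to release

    c = f * c_prev + i * g            # updated cell memory
    h = o * math.tanh(c)              # output for this step
    return h, c

# Hypothetical, hand-picked weights; a real network learns them in training.
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, 0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.4, 0.1, 0.0),
}

h, c = 0.0, 0.0
for x in [0.5, -1.0, 0.25]:           # step 3: repeat for each item
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Because the cell memory is updated by scaling and adding rather than being rewritten from scratch at each step, a forget gate near 1 lets information persist across many steps.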

Boundless Potential

Data science experts see potential in LSTM for broader applications. Below are several examples:

  • In 2019, researchers from China tested an LSTM framework for predicting fog in the short term. They discovered that LSTM achieved better short-term predictions compared to three other neural networks.
  • Researchers in Czechia (Czech Republic) in 2013 used LSTM in language modeling. The experiment involved determining the probability of words used in a sentence; in this case, analyzing data from a series of local phone calls.
  • In South Korea, disaster prevention experts studied using LSTM to predict flooding risk in Vietnam’s Da River Basin. The model they created managed to achieve an average of 90% accuracy in forecasting based on the river’s one/two/three-day flow rate.
  • A more recent example involves Chinese researchers determining stock market trends with an LSTM model. Compared to three other neural network models, LSTM achieved a higher coefficient and lower mean square error.
  • A Danish researcher raised awareness of how vulnerable smartwatches and other wearable devices are to hackers. He used an LSTM algorithm to recover various PIN codes by analyzing touch and key logs from the devices.


Given the growing number of published experiments that use LSTM, it should be no surprise if future algorithms employ this neural network. Machine learning as a whole will continue to improve in accuracy and efficiency, finding work in countless industries from business to medicine. Even with the next generation of neural networks currently being studied, LSTM and its precursors will remain relevant for as long as the world needs machine learning.