This algorithm has been tested on dozens of websites
Information security experts have developed a machine learning algorithm that breaks text-based CAPTCHA controls more easily and more accurately than any previously developed method, according to cybersecurity specialists from the International Institute of Cyber Security.
This new algorithm, developed by a team of specialists from the UK and China, is based on a Generative Adversarial Network (GAN), a class of artificial intelligence algorithms useful in scenarios where large amounts of training data are not available.
According to cybersecurity experts, a machine learning classification algorithm usually needs millions of data points for training before it can perform a task with the required degree of accuracy.
By contrast, a GAN can start learning from a much smaller amount of data because it uses a “generative” component to produce similar data. The generated data points are then fed to a “solver” algorithm that tries to guess the output.
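The idea of learning from a few real samples, generating many similar ones, and training a solver on the generated set can be illustrated with a toy sketch. This is not the researchers' method and involves no adversarial training: the "generator" here simply estimates a template and a noise rate from a handful of samples, and the "solver" is a nearest-centroid classifier. The 3x5 bitmaps, the noise level, and all function names are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical 3x5 bitmaps standing in for CAPTCHA characters (illustration only).
GLYPHS = {
    "0": "111101101101111",
    "1": "010110010010111",
    "7": "111001010010010",
}
NOISE = 0.1  # assumed per-pixel flip probability of the "real" CAPTCHA source


def distort(bits, rate, rng):
    """Flip each pixel independently with probability `rate`."""
    return [b ^ (rng.random() < rate) for b in bits]


def real_captcha(label, rng):
    """Simulate one labeled sample collected from a website."""
    return distort([int(c) for c in GLYPHS[label]], NOISE, rng)


def fit_generator(real_samples):
    """Estimate a template and a noise rate per label from the small real set."""
    model = {}
    for label, samples in real_samples.items():
        n = len(samples)
        # A per-pixel majority vote recovers the most likely clean template.
        template = [int(2 * sum(col) > n) for col in zip(*samples)]
        flips = sum(p != t for s in samples for p, t in zip(s, template))
        model[label] = (template, flips / (n * len(template)))
    return model


def generate(model, per_label, rng):
    """Produce many synthetic labeled samples from the fitted generator."""
    return {
        label: [distort(template, rate, rng) for _ in range(per_label)]
        for label, (template, rate) in model.items()
    }


def train_solver(samples_by_label):
    """Nearest-centroid solver: store the mean image of each label."""
    return {
        label: [sum(col) / len(samples) for col in zip(*samples)]
        for label, samples in samples_by_label.items()
    }


def solve(centroids, image):
    """Return the label whose centroid is closest to the image."""
    return min(
        centroids,
        key=lambda label: sum((p - c) ** 2 for p, c in zip(image, centroids[label])),
    )


def run_demo(seed=0, real_per_label=5, synthetic_per_label=2000, trials=300):
    """Train on synthetic data only, then measure accuracy on fresh samples."""
    rng = random.Random(seed)
    real = {l: [real_captcha(l, rng) for _ in range(real_per_label)] for l in GLYPHS}
    solver = train_solver(generate(fit_generator(real), synthetic_per_label, rng))
    labels = list(GLYPHS)
    hits = 0
    for _ in range(trials):
        truth = rng.choice(labels)
        hits += solve(solver, real_captcha(truth, rng)) == truth
    return hits / trials


if __name__ == "__main__":
    print(f"accuracy on fresh samples: {run_demo():.2f}")
```

In this sketch the solver never sees the real samples directly. It learns only from the generator's output, which mirrors the division of labor described above at a much smaller scale.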
The experts who developed this algorithm applied the same concept to breaking text CAPTCHAs, a task that had previously been attempted only with machine learning algorithms that required large amounts of initial data.
The researchers noted that, in a real environment, an attacker would not be able to request millions of CAPTCHAs in real time without being detected or banned from the website. For the investigation, therefore, they used only 500 text-based CAPTCHAs, taken from 32 of the 50 most visited sites according to Amazon’s Alexa ranking.
The training data includes text CAPTCHAs from sites like Wikipedia, Microsoft, eBay and Google.
After generating 200,000 “artificial” CAPTCHAs and training the “solver” on them, the researchers tested their algorithm against multiple text CAPTCHA systems used on the web, systems that had previously been used to test other algorithms.
The researchers say that their method solved text CAPTCHAs with 100% accuracy when tested on sites such as Megaupload, Blizzard and Authorize.NET. They added that the method also proved highly accurate on sites like Amazon, PayPal, Yahoo and Slashdot.
In addition to improving accuracy, the researchers report that the solver component of their GAN-based algorithm is more efficient and less expensive than any other method of defeating CAPTCHAs. “The algorithm can solve a CAPTCHA in 0.05 seconds using a desktop PC,” the researchers say.