Solving CAPTCHA challenges with Google’s speech-to-text service

A couple of years ago, a group of information security researchers from the University of Maryland published a study showing how online speech-to-text services could be exploited to solve reCAPTCHA v2 audio challenges with a high success rate.

The Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a challenge-response test that a computer system uses to determine whether its user is a person or an automated program. It is one of the most common ways to keep bots from abusing websites.

Although Google made changes to block these attacks, new versions of the technique have since appeared that can still bypass this popular security mechanism, and researchers even developed a proof of concept (PoC) of the attack.

The code in this PoC eventually became obsolete. However, researcher Nikolai Tschacher modified it to work against the latest version of reCAPTCHA v2, using Google’s own speech-to-text API. Tschacher’s attack achieved more than 95% accuracy.

In 2018, Google released reCAPTCHA v3 to improve the user experience, although the researcher notes that this new version still relies on reCAPTCHA v2. Tschacher published his PoC along with an explanation of the changes Google had made. Google has been contacted several times for comment but has not responded.

Automatically solving CAPTCHA challenges has become a very popular area of research; there are even free browser extensions that let users answer these tests at the push of a button.