How to spy on phone calls via sensors like accelerometer & gyroscope using new attack technique: EarSpy

Eavesdropping on smartphone users is a well-recognized risk, and users of these devices should take it seriously. The easiest way for an adversary to eavesdrop on a conversation is to record the call. However, mobile operating systems now restrict the ability of third-party applications to record phone conversations through the microphone, which prevents the majority of attacks that depend on microphone access.

A group of researchers has developed an eavesdropping attack for Android smartphones. To varying degrees, the attack can determine the gender and identity of the caller, and it can even decipher private speech.

The side-channel attack, named EarSpy, captures motion sensor readings induced by the reverberations of the ear speaker in a mobile device. Its purpose is to explore new avenues for eavesdropping.
By using a side channel, attackers may be able to circumvent security measures by gleaning information about speech from motion sensors, which can be read without any permission. This is a significant privacy risk that most people are unaware of, although academics have studied it extensively over the last decade. Researchers have found that motion sensors, keystrokes on touchscreens, stylus pen writing, and the use of external devices all have the potential to be used for eavesdropping. In addition, eavesdropping using light sensors and gyroscopes has been reported in the past.

The motion sensors built into smartphones are the sensors best known for being susceptible to eavesdropping. Adversaries have used them to capture audio (e.g., spoken dialogue), touchscreen input, and even indoor location. Because an adversary needs no explicit permission to collect raw motion sensor data, eavesdropping through these sensors is straightforward.
Eavesdropping attacks that exploit the vibration generated by phone loudspeakers have been studied extensively. The ear speaker is the built-in internal speaker used to listen to a call while the phone is held to the ear, and very little work has examined eavesdropping on it. Because most people are unwilling to expose sensitive conversations, particularly in public places, and therefore take such calls on the ear speaker, it is the most viable target for an attacker who wants to eavesdrop on phone calls.
Recent research has shown that high-resolution wireless sensors can detect the vibrations generated by ear speakers when the sensors are positioned in close proximity to the victim.
An obvious question is whether a phone's built-in motion sensors can be used to eavesdrop on its ear speaker. Because motion sensors require no permission, the attacker does not need to place any device near the victim or compromise one that is already there, which makes this attack setup quite practical. However, previous research did not find sufficient evidence that ear speakers have an effect on accelerometers.

On the other hand, smartphone speakers keep getting better and more sophisticated. Recent flagship smartphones tend to include stereo speakers, which requires one speaker at the top of the device and one at the bottom. In most cases, the conventional ear speaker is being replaced by a more powerful speaker that forms part of the stereo pair. As a direct consequence, phones equipped with stereo speakers generate a higher sound pressure than phones that have only a standard ear speaker.

The motion sensors in modern smartphones, such as the accelerometer and gyroscope, are highly sensitive and are designed to detect vibrations of the phone. Existing research has shown that a motion sensor can pick up the vibrations in the phone's body that are caused by sound from the built-in speaker. The fundamental idea is that sound waves travelling through the body of the smartphone cause vibrations, which the motion sensor on that smartphone then registers. More precisely, the Spearphone study found that smartphone accelerometers respond strongly to sound frequencies from 100 Hz to 3300 Hz. The accelerometer records these sounds as low-frequency aliased signals, and because the aliases are formed from the original sound at many different frequencies, they carry a wealth of information. The same study compared the frequency responses of accelerometers and gyroscopes and found that the accelerometer responds more strongly than the gyroscope across the 100 Hz to 3300 Hz range. As a result, the EarSpy researchers used only the accelerometer in their experiments.
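
The aliasing described above can be made concrete with a little arithmetic. When a sensor samples at a rate below twice the frequency of a tone, the tone shows up folded into the band between 0 Hz and half the sampling rate. The Python sketch below computes that folded frequency for a pure tone. The 500 Hz sampling rate and the function name are illustrative assumptions, not details taken from the Spearphone or EarSpy work.

```python
def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Return the apparent frequency of a pure tone after sampling.

    A tone at tone_hz sampled at sample_rate_hz is indistinguishable from a
    tone at the returned frequency, which lies in [0, sample_rate_hz / 2].
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    # Subtract the nearest integer multiple of the sampling rate, then take
    # the absolute value to fold the result into the representable band.
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


if __name__ == "__main__":
    SAMPLE_RATE_HZ = 500.0  # assumed sensor rate, for illustration only
    for tone in (100.0, 1100.0, 3300.0):
        alias = alias_frequency(tone, SAMPLE_RATE_HZ)
        print(f"{tone:6.0f} Hz tone appears at {alias:5.1f} Hz")
```

Under these assumptions a 100 Hz tone is captured unchanged, while tones at 1100 Hz and 3300 Hz fold down to 100 Hz and 200 Hz respectively, which is why sound well above the sensor's nominal bandwidth still leaves a trace in the readings.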

In their experiments, the researchers used a OnePlus 7T and a OnePlus 9, along with several sets of pre-recorded audio that were played only through the ear speakers of the two devices. The audio was played in one direction only.

During a simulated call, the researchers used an application called “Physics Toolbox Sensor Suite” to record accelerometer data. They then transferred that data to MATLAB, where it was analyzed and features were extracted from the audio stream.
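
To give a sense of what extracting features from an accelerometer trace can look like, here is a minimal Python sketch. The researchers worked in MATLAB and the article does not list their features, so the three features below (RMS amplitude, zero-crossing rate, dominant frequency), the function name, and the synthetic 50 Hz test signal are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List


def extract_features(samples: List[float], sample_rate_hz: float) -> Dict[str, float]:
    """Compute a few simple time- and frequency-domain features of a 1-D trace."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant (gravity) offset
    rms = math.sqrt(sum(c * c for c in centered) / n)
    # Fraction of adjacent sample pairs whose signs differ.
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    zero_crossing_rate = crossings / (n - 1)
    # Naive DFT magnitudes for bins 1 .. n // 2 (adequate for short traces).
    magnitudes = []
    for k in range(1, n // 2 + 1):
        re = sum(c * math.cos(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        magnitudes.append(math.hypot(re, im))
    peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__) + 1
    dominant_hz = peak_bin * sample_rate_hz / n
    return {
        "rms": rms,
        "zero_crossing_rate": zero_crossing_rate,
        "dominant_hz": dominant_hz,
    }


if __name__ == "__main__":
    # Synthetic trace: gravity plus a small 50 Hz vibration, sampled at 400 Hz.
    rate = 400.0
    trace = [9.81 + 0.02 * math.sin(2 * math.pi * 50 * i / rate + 0.3) for i in range(200)]
    print(extract_features(trace, rate))
```

In practice a feature vector like this would be computed per short window of the recording, and a real pipeline would likely use an FFT library rather than the naive transform shown here.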

A machine learning (ML) algorithm was trained on readily available datasets to recognize speech content, caller identity, and gender.
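
The article does not say which ML model was used, so the following Python sketch uses a nearest-centroid classifier purely as a stand-in to show the general idea of training on labeled feature vectors and then classifying a new one. The function names, the two-dimensional toy vectors, and the labels `class_a` and `class_b` are hypothetical and do not come from the study.

```python
import math
from typing import Dict, List, Sequence


def train_centroids(
    features: List[Sequence[float]], labels: List[str]
) -> Dict[str, List[float]]:
    """Average the feature vectors of each class into one centroid per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], vec: Sequence[float]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


if __name__ == "__main__":
    # Toy, made-up feature vectors; real inputs would be per-window features.
    train_x = [(0.9, 0.1), (1.1, 0.2), (0.2, 1.0), (0.1, 0.8)]
    train_y = ["class_a", "class_a", "class_b", "class_b"]
    model = train_centroids(train_x, train_y)
    print(predict(model, (0.8, 0.3)))
    print(predict(model, (0.0, 1.0)))
```

A model this simple is only meant to illustrate the train-then-classify workflow. The accuracy figures reported in the study depend on the researchers' own features and classifiers.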

Although the results of the tests varied depending on the dataset and the device, on the whole, they indicated that eavesdropping via the ear speaker may be a viable option.

On the OnePlus 7T, caller gender was identified with an accuracy ranging from 63.0% to 98.7%, caller identity was classified with an accuracy ranging from 63.0% to 91.2%, and speech was recognized with an accuracy ranging from 51.8% to 56.4%.

One factor that can reduce the effectiveness of an EarSpy attack is the volume that users choose for the ear speaker. A lower volume could prevent eavesdropping through this side channel, and it is also more comfortable for the ear.

The dispersion of the reverberation produced by the speakers is also affected by the configuration of the hardware components of the device and the degree of assembly precision.

The accuracy of the speech data that was extracted is further reduced when the user moves about or when vibrations from the surroundings are added.

Android 13 has a restriction that prevents apps from collecting sensor data at sampling rates above 200 Hz without first obtaining permission. Although this hinders speech recognition compared with the default sampling rate of 400 Hz to 500 Hz, accuracy drops by only about 10% when the attack is carried out at 200 Hz.