Discovery of natural multisensory connections between what we see and hear is promising for blind-assistive technology

Researchers at Caltech have recently discovered intrinsic connections in the brain that link images with both sounds and tactile patterns – in other words, textures that can be felt by touch. These connections, called crossmodal correspondences, are how the brain combines information from more than one sense to improve perception. The discovery could significantly improve visual-to-audio sensory substitution technology, which uses software to translate real-time images and video into sound patterns and thereby helps visually impaired people regain some degree of visual perception.

Past research has shown that both sensory-impaired and non-impaired individuals are capable of making these crossmodal connections. For sensory-impaired individuals in particular, this opens up many possibilities for assistive technology. Assistive technologies support people with disabilities by helping them perform tasks that they would otherwise find difficult, giving them greater independence and mobility and improving their quality of life.

For the visually impaired, one example of a successful assistive technology is the sensory substitution device (SSD). These devices assist the blind by exploiting crossmodal mappings to make use of the individual’s other senses, helping them regain some form of visual perception. One category of SSDs currently works by translating a live video stream into a sound pattern, enabling the visual areas of the brain to analyse auditory input. However, these devices operate on an assumption of crossmodal plasticity – i.e., that crossmodal mappings are learned rather than innate. As a result, current SSDs often require extensive training to attain only limited functionality, and this limited accessibility has kept them from becoming widespread in the blind community. The SSD program used in the Caltech study, vOICe (the middle letters stand for “Oh I See”), was developed by Dr Peter Meijer in the 1990s and translates certain image characteristics into sounds. For example, against a black background of silence, one bright dot translates to one beep; rising or falling lines give rising or falling tones (where the steepness of the line determines the speed of the tone change); and an upright rectangle gives a noise burst whose pitch and pitch range convey its elevation and height. Combining these sounds allows the software to “show” more complex images. The video below shows how simple objects such as cutlery are translated into sounds through vOICe.
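The mapping described above can be illustrated with a minimal Python sketch. This is not Dr Meijer's implementation: the left-to-right column scan, the frequency range, the sample rate, and the function names `row_frequencies` and `encode_image` are all assumptions made for illustration. It only shows the general idea that a black pixel is silent, a bright pixel produces a tone, and a higher position in the image produces a higher pitch.

```python
import math

def row_frequencies(n_rows, f_low=500.0, f_high=5000.0):
    """Assign one frequency per image row; the top row (index 0) gets the highest pitch.

    The 500-5000 Hz range is an assumed, illustrative choice.
    """
    if n_rows == 1:
        return [f_low]
    ratio = f_high / f_low
    # Exponential spacing, so equal steps in height sound like equal steps in pitch.
    return [f_low * ratio ** ((n_rows - 1 - r) / (n_rows - 1)) for r in range(n_rows)]

def encode_image(image, duration=1.0, sample_rate=16000, f_low=500.0, f_high=5000.0):
    """Turn a grayscale image (list of rows, values 0.0 to 1.0) into audio samples.

    Assumed scheme: columns are played one after another from left to right;
    each non-black pixel adds a sine tone whose pitch depends on its row and
    whose loudness depends on its brightness. Black pixels contribute silence.
    """
    n_rows, n_cols = len(image), len(image[0])
    freqs = row_frequencies(n_rows, f_low, f_high)
    samples_per_col = int(duration * sample_rate / n_cols)
    samples = []
    for c in range(n_cols):
        # Collect (frequency, brightness) pairs for the lit pixels in this column.
        active = [(freqs[r], image[r][c]) for r in range(n_rows) if image[r][c] > 0]
        for i in range(samples_per_col):
            t = (c * samples_per_col + i) / sample_rate
            value = sum(b * math.sin(2 * math.pi * f * t) for f, b in active)
            samples.append(value / n_rows)  # divide by row count to stay within [-1, 1]
    return samples

# One bright dot on a black background: silence, a short beep, then silence again.
dot = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
beep = encode_image(dot)
```

In this sketch a rising diagonal line would light successively higher rows in successive columns and therefore produce a rising tone, and a steeper line would climb through the frequencies faster, which matches the behaviour described in the text.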


However, while vOICe has existed for over a decade, it has not been widely adopted by the blind community. One problem is the lack of standardised training programs – currently, most vOICe users are self-taught. The learning method posted on the vOICe website starts with basic geometric shapes and then uses these shapes as ‘building blocks’ for processing more complex scenes, such as natural environments. As it is impossible for the training to cover every potential scene or environment, the range of vision remains fairly limited. The video shows that the images are black and white and simplified for translation into sounds. Another argument against visual-to-audio SSDs is that they substitute hearing for vision, and hearing is a sense that many visually impaired individuals already depend on heavily. Other SSDs, such as tongue display units, which convert images into patterns of electrical stimulation on the tongue, may be easier to adapt to. Notable studies of echolocation in fully blind individuals have also shown the remarkable sense of hearing that can develop when people depend on it entirely for navigation. For such individuals, switching to an audio-based sensory substitution device would in fact hinder their mobility and independence, especially in the case of congenitally blind adults who have developed their own methods of navigation without the aid of complex technology.

However, the Caltech researchers have discovered that crossmodal mapping, the mechanism by which SSDs operate, can be engaged effectively by using stimuli (such as images and textures) associated with already-existing connections in the brain that link images with sounds and tactile patterns. Rather than trying to create crossmodal plasticity (deliberate crossmodal correspondences) through learned sensory substitutions, the study suggests that the brain has intuitive crossmodal mappings that we should make use of when designing sensory substitution software. In one experiment, participants untrained in vOICe were able to correctly match images of natural stimuli with the corresponding vOICe-encoded sounds. The experiment compared matching accuracy for natural images, such as single flowers in bloom and tree trunks, with matching accuracy for basic shape patterns, such as horizontal lines across a screen. Untrained participants gave more correct answers when matching the natural images to their corresponding sounds than when matching the simple horizontal lines. This is evidence that natural stimuli are rich in crossmodal correspondences that require less training to activate. Furthermore, sensory substitution was found to be at least partially automatic (i.e., associations were present even when the subjects were distracted). This opens up the possibility that using these SSDs could become a less conscious process that does not require the user’s full attention, much more closely resembling actual sight.

However, this discovery was limited to a small group of subjects, a mix of sighted and blind individuals, in a laboratory setting. Further study is needed to fine-tune these intrinsic mappings before they can be used effectively to improve SSD training. The vOICe system is unlikely to become more popular without standardised training programs and improvements that make it simpler to use than current aids.

Overall, these recent discoveries serve as proof of concept for the existence of intuitive crossmodal correspondences. The results of the study suggest that humans do possess intrinsic crossmodal correspondences between natural scenes and certain vOICe sound encodings. Training procedures for SSDs, such as learning the shapes and corresponding sounds for vOICe, could therefore be modified to improve both performance and training times. Training could be more organic, i.e., based on natural scenes (such as images of plants and flowers) rather than simple shapes, and by utilising these correspondences the training period for vOICe could be significantly shortened. Further, more tailored study with visually impaired individuals is required (both congenitally blind and late blind, as their sensory substitutions will differ), but this will hopefully enable the creation of better SSDs that make full use of sensory integration. At the moment, these results are best put to use in developing better training programs for users of assistive technology. For the 285 million people worldwide living with visual impairment, visual-to-audio devices that are easier and more efficient to use could mean huge improvements in quality of life. Someday, this type of technology might even be capable of fully replacing sight with hearing and other senses.



Deroy, Ophelia, and Malika Auvray. “Reading the World through the Skin and Ears: A New Perspective on Sensory Substitution.” Frontiers in Psychology 3 (2012): 3-13. ScienceDirect. Web. 27 Jan. 2016.

Elli, Giulia V., Stefania Benetti, and Olivier Collignon. “Is There a Future for Sensory Substitution Outside Academic Laboratories?” Multisensory Research 27 (2014): 271-91. ScienceDirect. Web. 26 Jan. 2016.

Maidenbaum, Shachar, Sami Abboud, and Amir Amedi. “Sensory Substitution: Closing the Gap between Basic Research and Widespread Practical Visual Rehabilitation.” Neuroscience and Biobehavioral Reviews 41 (2014): 3-15. Elsevier. Web. 26 Jan. 2016.

Renier, Laurent. “Sensory Substitution Devices: Creating ‘Artificial Synesthesias.’” Oxford Handbook of Synesthesia. Ed. Anne G. De Volder. Oxford: Oxford UP, 2013. N. pag. Oxford Handbooks Online. Web. 25 Jan. 2016.

Spence, Charles. “Crossmodal Correspondences: A Tutorial Review.” Attention, Perception, & Psychophysics 73 (2011): 971-95. ScienceDirect. Web. 26 Jan. 2016.

Stiles, Noel R.B., and Shinsuke Shimojo. “Auditory Sensory Substitution Is Intuitive and Automatic with Texture Stimuli.” Scientific Reports – Nature 5 (2015): n. pag. 22 Oct. 2015. Web. 25 Jan. 2016.

Striem-Amit, Ella, Miriam Guendelman, and Amir Amedi. “‘Visual’ Acuity of the Congenitally Blind Using Visual-to-Auditory Sensory Substitution.” PLoS ONE 7.3 (2012): 1-6. ScienceDirect. Web. 26 Jan. 2016.