Envision AI enables visually impaired people to read text in real time or by recognizing documents, even handwritten texts. It also has options for color identification, scene description, and product identification by barcode scanning.
This application is available for Android 5.0 or later and for iOS 10.0 or later.
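The minimum versions stated above can be expressed as a simple compatibility check. The following is a minimal sketch, not part of Envision AI: the names `MIN_VERSION`, `parse_version` and `is_supported` are hypothetical and chosen only for illustration.

```python
# Minimum OS versions as stated in the article. All names are illustrative.
MIN_VERSION = {"android": (5, 0), "ios": (10, 0)}


def parse_version(text):
    """Turn a dotted version string such as '12.4' into a tuple of ints.

    Short versions are padded with zeros so that '5' compares equal to '5.0'.
    """
    parts = [int(part) for part in text.split(".")]
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def is_supported(platform, os_version):
    """Return True if os_version meets the stated minimum for the platform."""
    minimum = MIN_VERSION[platform.lower()]
    return parse_version(os_version) >= minimum
```

For example, `is_supported("ios", "12.4")` and `is_supported("android", "9.0")` both hold, while an Android 4.4.4 device falls below the stated minimum.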
The following sections analyze the interface and the functions of the application on both Android and iOS.
The interfaces of the Android and iOS versions differ slightly in the number of tabs, their location and their content.
The first difference is that Android has 3 tabs, located at the top of the screen, while iOS has 4, located at the bottom. The tab missing from the Android version is called Scan and Find.
Another important difference is that the Android version has a button to activate the camera flash and improve lighting, whereas the iOS version has no such button.
The remaining significant differences concern which buttons each version offers to activate the different functions, and the tab in which each button is located. The first button that appears on Android but not on iOS is the Read Handwritten Text button in the Text tab. Conversely, the Text tab in iOS has a button called More Actions, which recognizes text from PDFs, images or multiple pages, and which the Android version lacks.
A very important button for visually impaired users is Color Detect. However, this function is only present in iOS, within the General tab.
The Scan Bar Code and Teach Envision buttons are present in both Android and iOS, although in different tabs: in Android they are in the General tab, while in iOS they are in the Scan and Find tab.
Finally, the Find People and Find Objects options are present in iOS within the Scan and Find tab, but are not present in Android.
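The differences listed above can be summarized as a small lookup table. This is only a sketch of the comparison as the text states it, covering the buttons that are tied to a tab (the flash button is left out because no tab is given for it). The names `FEATURES`, `only_on` and `tab_of` are hypothetical.

```python
# Feature -> {platform: tab}. A missing platform key means the feature
# is absent on that platform, according to the comparison above.
FEATURES = {
    "Read Handwritten Text": {"android": "Text"},
    "More Actions": {"ios": "Text"},
    "Color Detect": {"ios": "General"},
    "Scan Bar Code": {"android": "General", "ios": "Scan and Find"},
    "Teach Envision": {"android": "General", "ios": "Scan and Find"},
    "Find People": {"ios": "Scan and Find"},
    "Find Objects": {"ios": "Scan and Find"},
}


def only_on(platform):
    """Return the sorted names of features that exist only on one platform."""
    return sorted(
        name for name, tabs in FEATURES.items() if set(tabs) == {platform}
    )


def tab_of(feature, platform):
    """Return the tab holding a feature on a platform, or None if absent."""
    return FEATURES[feature].get(platform)
```

Calling `only_on("android")` returns just the handwriting button, while `only_on("ios")` returns the four iOS-only functions, which makes the imbalance between the two versions easy to see.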
Below is a detailed analysis of the different functions that Envision AI can perform in both the Android and iOS versions.
Magnification and text functions
The Magnifier function allows you to use the device's camera as a magnifying glass. To activate this function, press the button called "Magnifier". A slider is then displayed for setting the degree of magnification from 0 to 100. To close the magnifying glass, press the button again.
Start reading instantly
To activate this option, you have to go to the Text tab and press the button that says "Start reading instantly", located at the bottom left of the screen. Once the button is pressed, Envision immediately begins to read whatever text the camera captures. To stop the instant reading you have to press the button again, which now says "Stop reading instantly".
Read handwritten text
To activate this option, only available on Android, you have to go to the Text tab and press the button that says "Read handwritten text", located at the bottom center of the screen. After pressing it, the application begins to read the text when it is detected.
The document recognition option allows the user to take a photograph of a text to be recognized, or lets the application take the photo automatically when it detects a document.
Once the text has been recognized, the application displays it on a screen. In iOS, the bottom of the screen has buttons to start and pause reading the text aloud, to change the text size and to export it. Android has similar buttons, but there is no button to change the text size, and two additional buttons, "Next page" and "Previous page", are included.
The More Actions option, located within the Text tab, is only available in the iOS version and allows reading multi-page texts, PDFs and images. The button to activate it is in the lower right part of the screen; pressing it displays a submenu with the different options. Pressing "Import PDF" opens a browser to search for files on the phone. Pressing "Import image" opens the device's photo library. Pressing "multiple pages" opens the camera, and an indicator shows the number of pages captured, along with two more buttons, one to take a picture and one to stop taking pictures.
Describe the scene
To activate this option, located within the General tab, you must press the button with that name, located at the bottom left of the screen. After activating the function, the user has to take a picture of their surroundings and the application will describe the scene shown in it.
The Color Detect option is located within the General tab and is available only for iOS. It allows the user to identify the colors of objects by using the camera. Once the corresponding button, located in the lower central part of the screen, is pressed, the application begins to speak the colors it detects. This can be done with a standard precision of 30 colors or with a more descriptive precision of 950 colors.
Teach Envision
This option allows the application to recognize the faces of friends or family, either from photographs taken of the person or from images saved in the phone's photo library. It is located in the General tab in Android and in the Scan and Find tab in iOS. To use it, press the button that says "Teach Envision", located at the bottom right of the screen. If the "Show a face" option is chosen, the camera opens and up to five photos of the person can be taken for Envision to learn. If the "Open Library" option is chosen, the phone's library is displayed for the user to select a photo. Once the person has been recognized and given a name, the "Describe the scene" option will name any stored face that Envision recognizes in the scene it describes.
The Scan Bar Code option is located in the General tab and allows the user to identify a product by scanning its barcode. To do this, press the button located at the bottom right in iOS and at the bottom center in Android. The barcode is then scanned with the camera, and if it is identified, the product name is displayed along with a button for obtaining more information about it.
Scan and Find Features
The Find People option is only available in iOS and is located in the Scan and Find tab. To activate it, press the button named "Find people", located at the bottom left of the screen. The user can then move the phone around, and when Envision detects a person, it emits a beep to indicate this.
Search for objects
This feature is only available on iOS and is located in the Scan and Find tab. To activate it, press the button named "Search for objects", located in the lower central part of the screen. A list of objects is then displayed, such as car, cat, toothbrush, purse or fork. The user must select one of the options and then move the phone around. When Envision detects an object of the selected type, the application emits a sound to indicate this.
Envision AI allows you to configure different parameters, which depend on the version used. More parameters can be configured in iOS than in Android, because the iOS version of the application has more functionality.
The possible configuration parameters are detailed below, all of which are located within the tab called Help.
Offline text recognition
This option can be enabled or disabled. When enabled, it provides faster text recognition for texts written in Latin-based languages.
Automatic language detection
This option can be enabled or disabled. When disabled, the application only reads text in the system language.
Text to speech
This option is called "Text to speech" in Android and "Speech" in iOS, and configures the application's own voice, as distinct from the screen reader.
In Android it allows you to choose which of the synthesizers installed in the system to use.
In iOS this option allows you to adjust the speed of the voice that is not VoiceOver, as well as to choose the language and the "person" who speaks.
The color detection setting is only available on iOS. It allows you to select the accuracy of the color identification used by the Color Detect function. You can choose between standard (30 colors) and descriptive (950 colors).
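The text does not describe how Envision AI maps a camera reading to a color name. One common approach, shown here purely as an assumption, is to pick the nearest entry in a named palette, so that a larger palette yields more specific names. The two tiny palettes below are hypothetical stand-ins for the 30-color and 950-color lists, and `nearest_color` is an illustrative name.

```python
# Hypothetical palettes: stand-ins for the standard and descriptive lists.
STANDARD = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}
DESCRIPTIVE = dict(
    STANDARD,
    crimson=(220, 20, 60),
    navy=(0, 0, 128),
    olive=(128, 128, 0),
)


def nearest_color(rgb, palette):
    """Return the palette name whose RGB value is closest to rgb.

    Closeness is plain squared Euclidean distance in RGB space, which is
    a simplification; a real application may use a perceptual color space.
    """
    def squared_distance(candidate):
        return sum((a - b) ** 2 for a, b in zip(rgb, candidate))

    return min(palette, key=lambda name: squared_distance(palette[name]))
```

With these palettes, the reading `(200, 30, 70)` is named "red" in the smaller palette and "crimson" in the larger one, which mirrors the trade-off between the two precision levels.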
Other options on the help tab
Within the Help tab, you can access a series of online tutorials in English through the "Read tutorials" option, send feedback through the "Give opinion" option, or ask the developer to contact you by phone through the "Request a call" option.
The tests were carried out with an iPhone SE running iOS 12.4 and a Huawei Mate P20 Lite running Android 9.0. A Samsung Galaxy J3 running Android 8.0 was also used.
It should be noted that the application does not work correctly on Android when Silent Mode is activated: the speech synthesis in functions such as Read instantly produces no output, although the screen reader works perfectly.
To analyze and evaluate the application, a series of tests of the different options was carried out in both the Android and iOS versions.
The text recognition tests, across the different options available for this purpose, were very positive in general. In both instant text reading and document recognition, the application gave very satisfactory results, clearly identifying the texts presented. The only difficulty arose with texts containing multiple boxes and images, where the results did not allow a clear reading of the text. Handwriting recognition was equally satisfactory, and these texts were read without problems. The application had more difficulty with the recognition of PDFs and images in iOS, which may have been affected by embedded images as well as by the structure and quality of the documents.
A very noteworthy point when reading documents is that the application indicates whether or not the edges are visible. This function is designed for scanning documents such as letters or sheets of paper, and these indications are very useful for people who are completely blind or whose reduced vision does not allow them to see even the document. In the tests carried out, the application always identified the edges and correctly indicated when they were within the camera's frame and when they were not.
In the tests carried out, the "Describe the scene" option behaved reasonably, although with some failures. It did not correctly identify some objects, although in these cases the application indicated that the description might not be exact. It also occasionally used an incorrect word, but this was due to a mistranslation of the description. It should also be noted that in the tests carried out on Android the scene was described in the default language of the phone, but after updating to version 0.9.6 the scenes are described in English.
Color identification, on the other hand, was very precise during the tests, giving a wide range of nuances when the descriptive mode is activated in the parameters.
The tests of the Find People function were very positive. The function does not describe or identify the person or persons; it indicates to the user whether a person is present, which is quite useful for people who cannot perceive this themselves. In all cases, the application worked correctly, emitting a sound to indicate a person.
The Find Objects function correctly identified most objects, from handbags to armchairs to laptops. It was only unable to identify objects in the telephone category, whether mobile phones or landlines.
The Teach Envision function gave completely satisfactory results. During the tests, the application perfectly recognized the man and the woman who had been registered in it. When scenes were captured, the man and the woman were easily identified, whether they appeared separately or together.
The negative aspects of the application, as with all applications that make intensive use of the camera and the Internet connection, are its high data and battery consumption. Data consumption is high because most of the functions send data over an Internet connection to be processed on an external server, and the data sent are images, presumably at high resolution to allow a good analysis. Battery consumption is high because the camera is constantly working, so the application quickly drains the device's battery.
Finally, the magnifying glass function was tested and works perfectly, although the quality of the magnification is likely to depend on the quality of the device's camera.
As seen in the tests carried out, Envision AI performed very well, with remarkable results. The application is designed primarily for people with total blindness or very low vision, as functions such as "Find people" and "Find objects" demonstrate. This does not mean that it is not useful for other people with low vision, since text recognition, color identification and the magnifying glass are very useful for this group.
A particularly noteworthy aspect is that the application indicates whether the document's edges are within the camera's frame during recognition, which is very useful for totally blind people.
It is worth highlighting the possibility of choosing between the standard and the descriptive mode when identifying colors. This is of great interest since users who are satisfied with a basic color gamut can activate the standard mode, while more demanding users can activate the descriptive mode for greater precision.
The voice configuration options are also very interesting. Although iOS appears to be more configurable, allowing the user to select the speed, the language and the "person", in Android the voice can be configured from the Android settings once the synthesizer has been selected.
The less positive aspects of the application are its high data consumption, since most of its functions connect to the Internet to process images on an external server, and its high battery consumption when the camera is permanently active.
In general, it can be concluded that this is a great application, of real interest to blind and low vision people, which performs the functions it incorporates with great precision and accuracy.
- Handwritten text recognition.
- Recognition of PDFs and images in iOS.
- High precision in text recognition.
- Possibility of recognizing 950 colors.
- Ability to obtain additional product information by scanning the barcode.
- Possibility to learn faces of friends and family.
- Possibility of configuring the voice that is not the screen reader.
- The Color Detect, Find People and Find Objects functions could be included in the Android version.
- Adding a button to activate the flash in iOS could be studied.
- Describing scenes in the default language of the phone in the Android version could be studied.
- Future versions could study how to reduce the battery consumption of the device.
- Future versions could analyze how to reduce data consumption.