Group 12: State of the Art

From Control Systems Technology Group
Revision as of 15:43, 11 March 2018

Robotic Guides

There are two types of guidance systems: a wear type (e.g. a cane) and a mobile robot type, which mimics the behaviour of a guide dog [1]. The second type has its own mobility, which allows active guidance. When the robot is used as a guidance tool, it should follow a pattern of behaviours that enables the user to follow it easily.

Most robotic guides for the visually impaired work on the principle that the robot changes direction when an obstacle is detected in its path. The change is communicated to the user haptically: the robot has enough mass for the user to feel the movement through the handle [2]. The current guides are all wheeled, since wheeled robots are easier to design and more stable than legged robots. However, legged robots can move up and down stairs and walk on uneven terrain. The appearance of the robot is an important characteristic, because user acceptance depends heavily on it. Users would like to see all proposed functions, namely obstacle avoidance, location, navigation, location of goods and reading street names. The robot should be as inconspicuous as possible and not attract attention, but it should also be robust, small, lightweight and elegant.

Among travel aids, the guide dog is a popular aid for obstacle avoidance; most other travel aids, however, have not yet gone beyond the prototype stage [3]. Users have several requirements for the specifications. For instance, the battery life should be at least 16 hours, but several days is preferred. Furthermore, it should be possible to recharge the robot without the use of vision, and maintenance should be minimal. The robot should be easy to use and require little training. It should be robust and reliable, so that it can cope with different types of weather, water, knocks and uneven terrain. The interface should be accessible and the appearance should be customizable. A long handle is required so that the user can feel the movement of the robot.

With the guide robot in an assistive mode, visually impaired users were able to find an obstacle-free direction of movement, detect stairs and steps, and obtain information about the environment [4]. To guide a visually impaired user, the robot has to move at a moderate, constant speed (e.g. 0.6 m/s). Furthermore, the motion should be smooth, without abrupt changes in robot speed.

The force transmitted through a stick from the robot to the user is used by the user to avoid obstacles, but the stick also transmits all bumps directly to the user [5]. When the user follows the robot with a dog leash, shocks are not transmitted because the leash is flexible. However, the robot is then unable to acquire information about the position of the user relative to the robot. A robot is needed that maintains a cooperative relationship with the user and does not only follow commands.

Environment Perception

Using a sound localization system is one possible way to perceive the environment. This technique uses multiple microphones to determine the angle at which a sound source is located. It can even be used to detect whether an object is located between the sensors and the sound source.
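
As a rough illustration of how an angle can be obtained from multiple microphones, the sketch below uses the textbook far-field relation between the arrival-time difference at two microphones and the bearing of the source. This is not the model-based method of the cited work; the function name, the speed of sound value and the two-microphone setup are assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed value)

def bearing_from_tdoa(delay_s, mic_spacing_m, speed=SPEED_OF_SOUND):
    """Estimate the source angle in degrees from the broadside of a two-microphone pair.

    delay_s is the difference in arrival time between the two microphones.
    Assumes a far-field source, so the wavefront is approximately planar.
    """
    ratio = speed * delay_s / mic_spacing_m
    # Clamp so that measurement noise cannot push the ratio outside asin's domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A delay of zero corresponds to a source straight ahead of the pair, and the largest possible delay (spacing divided by the speed of sound) corresponds to a source in line with the two microphones.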

There are several techniques to build a model of the environment using different sensors, but all of them have pros and cons. Most of the focus is on static objects, such as roads and traffic lights, since these remain stationary. However, traffic lights can emit many different colours, which makes detection harder. None of the techniques described achieves a precision above 83% for the detection of traffic lights.

A dynamic environment cannot be learned as a single fixed map, because it changes over time. What is possible is to determine the possible configurations of the dynamic objects themselves and to build maps of the possible configurations of the environment. This is mainly useful for low-dynamic environments, since the objects that make up the possible configurations should not be so numerous that the state space explodes. Furthermore, the state of the environment should be observed at different times to determine which objects are dynamic and how the configuration maps should be built. [6] [7] [8] [9]

Obstacle Avoidance

Once all obstacles have been detected and defined, we know the surroundings of the guiding robot. We can then start determining the path the guiding robot can take from the initial position to the goal position of the visually impaired user. Finding this path is known in robotics as the motion planning problem. In this problem we have an object with a starting position, a goal position and obstacles in the workspace. In our case, the visually impaired user together with the guiding robot is the object with the starting position and the goal position, and the surroundings are the workspace with the obstacles.

One approach to solving this motion planning problem is to divide it into two sub-problems, the ‘Findspace’ and the ‘Findpath’ problem [10]. The ‘Findspace’ problem has already been covered under environment perception. This leaves the ‘Findpath’ problem, which means that we have to find a continuous path through the obstacles from the starting position to the goal position.
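
To make the ‘Findpath’ step concrete, the sketch below searches for a path on a small occupancy grid with breadth-first search. This is only a discretized stand-in for the continuous problem and is not the method of the cited work; the grid encoding (1 for obstacle, 0 for free) and the function name find_path are assumptions made for this example.

```python
from collections import deque

def find_path(grid, start, goal):
    """Breadth-first search on a 4-connected occupancy grid (1 = obstacle, 0 = free).

    Returns the list of (row, col) cells from start to goal, or None if the
    goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the parents to reconstruct the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

Because breadth-first search expands cells in order of distance from the start, the returned path has the fewest grid steps among all obstacle-free paths, which makes it a useful baseline to compare other planners against.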

To find this continuous path, genetic algorithms (GAs) have gained popularity. These algorithms generate random candidate paths and then create new candidates by crossing over and mutating the old ones. This is repeated until a path satisfies all the conditions, and that path is then executed. The approach is mainly based on random search followed by selecting the best option found. However, because there is a big chance that a better path exists than the one found, there are better algorithms we could use to find the continuous path for the guiding robot. The article 'ROBIL: Robot Path Planning Based on PBIL Algorithm' [11] describes an algorithm called ROBIL, which has more potential than the GAs seen so far. To show this potential, the authors compared ROBIL with two well-known evolutionary algorithms, a conventional GA and a KGA. ROBIL shows a higher success rate and a stable performance, and it uses fewer parameters. ROBIL also has the potential to scale to larger maps, so it can still determine a path in a bigger environment. The one problem with this algorithm is that it uses a lot of memory and storage space. The idea behind the algorithm is therefore something we can use for inspiration.
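
For readers unfamiliar with PBIL (population-based incremental learning), the sketch below shows the generic idea on plain bit strings: instead of crossing over individuals as a GA does, a probability vector is sampled and then nudged towards the best sample of each generation. This is not ROBIL itself, and the path encoding used in [11] is not reproduced here; the function name, the parameter values and the bit-string representation are all assumptions made for this example.

```python
import random

def pbil(fitness, n_bits, pop_size=30, generations=60, learning_rate=0.1, seed=0):
    """Generic PBIL on bit strings; returns (best individual, its score, final probabilities)."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # start with no preference for 0 or 1 at any position
    best, best_score = None, float("-inf")
    for _ in range(generations):
        # Sample a population from the current probability vector.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        winner = max(population, key=fitness)
        score = fitness(winner)
        if score > best_score:
            best, best_score = winner, score
        # Shift each probability a small step towards the generation's best sample.
        probs = [(1 - learning_rate) * p + learning_rate * bit
                 for p, bit in zip(probs, winner)]
    return best, best_score, probs
```

Calling `pbil(sum, 12)` maximizes the number of ones in a 12-bit string. For path planning, the fitness function would instead score a decoded path, for example by its length and by whether it collides with obstacles.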

Voice Recognition

Voice signal identification consists of converting a speech waveform into features that are useful for further processing. In general, the human voice conveys a lot of information, such as the gender, emotion and identity of the speaker. This information can be used to improve voice recognition performance. With Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) techniques, the individual information in the voice signal can be used to authenticate a particular speaker, which improves the matching of voice patterns. [12]
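
The DTW step can be sketched as follows. The code computes the standard dynamic time warping distance between two sequences of feature vectors, which would be MFCC frames in the setting described above. Feature extraction itself is not shown, and the function name and the use of Euclidean distance between frames are assumptions made for this example.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Each element of a and b is a tuple of numbers of the same dimension.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of the first i frames of a
    # with the first j frames of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # frame of a repeated
                                 cost[i][j - 1],      # frame of b repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

An utterance can then be compared against stored templates by picking the template with the smallest distance. Because frames may be repeated during alignment, the same pattern spoken faster or slower still yields a small distance.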

Another approach to processing voice signals is to use a neural network. When a finite vocabulary of voice commands is used, the system must be trained on these commands. This can be achieved by having multiple people repeat the utterances several times. The more people take part in the training, the better the voice recognition will be. After this training, the system is put in a ready-to-use mode in which it waits for voice commands. The part of the system that recognizes the start of a voice command is called Voice Activity Detection (VAD). VAD is done by measuring the average energy of the signal over a specified interval i. A circular buffer continually holds the last i seconds of speech. Voice activity is detected if the energy track exceeds a specified threshold for a number of consecutive segments. If voice activity is detected, the contents of the circular buffer are copied to another buffer, from which the voice data is extracted. The extracted voice data is then used for feature modeling and pattern matching, which we call classification. For this task, feedforward neural networks can be used, trained with the commonly used back-propagation method. [13]
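
The energy-based VAD described above can be sketched as follows. This is a minimal illustration and not the implementation from [13]; the function name, the representation of the signal as a list of fixed-length, non-empty frames, and the parameter names are assumptions made for this example.

```python
from collections import deque

def detect_voice(frames, threshold, min_consecutive, buffer_frames):
    """Return a copy of the circular buffer once voice activity is detected, else None.

    frames is an iterable of non-empty sample lists (one list per segment).
    """
    ring = deque(maxlen=buffer_frames)  # circular buffer holding the latest frames
    run = 0                             # consecutive segments above the threshold
    for frame in frames:
        ring.append(frame)
        energy = sum(s * s for s in frame) / len(frame)  # average energy of the segment
        run = run + 1 if energy > threshold else 0
        if run >= min_consecutive:
            # Copy the buffer so that later processing works on a stable snapshot.
            return list(ring)
    return None
```

Requiring several consecutive segments above the threshold keeps a single loud click from triggering detection, and copying the buffer preserves the audio just before the trigger, so the start of the command is not lost.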


References

  1. Kang, D. O. H., Kim, S. H., Lee, H., & Bien, Z. (2001). Multiobjective navigation of a guide mobile robot for the visually impaired based on intention inference of obstacles. Autonomous Robots, 10(2), 213–230. https://doi.org/10.1023/A:1008990105090
  2. Hersh, M. A., & Johnson, M. A. (2010). A robotic guide for blind people. Part 1. A multi-national survey of the attitudes, requirements and preferences of potential end-users. Applied Bionics and Biomechanics, 7(4), 277–288. https://doi.org/10.1080/11762322.2010.523626
  3. Hersh, M. A., & Johnson, M. A. (2012). A robotic guide for blind people Part 2: Gender and national analysis of a multi-national survey and the application of the survey results and the CAT model to framing robot design specifications. Applied Bionics and Biomechanics, 9(1), 29–43. https://doi.org/10.3233/ABB-2011-0034
  4. Capi, G., Kitani, M., & Ueki, K. (2014). Guide robot intelligent navigation in urban environments. Advanced Robotics, 28(15), 1043–1053. https://doi.org/10.1080/01691864.2014.903202
  5. Cho, K. B., & Lee, B. H. (2012). Intelligent lead: A novel HRI sensor for guide robots. Sensors (Switzerland), 12(6), 8301–8318. https://doi.org/10.3390/s120608301
  6. Huang J., Supaongprapa T., Terakura I., Wang F., Ohnishi N., Sugie N. (1998) A model-based sound localization system and its application to robot navigation
  7. Yagi Y. (1995) Map-based Navigation for a Mobile Robot with Omnidirectional Image Sensor COPIS
  8. Zhu H., Yuen K., Mihaylova L. (2017) Overview of Environment Perception for Intelligent Vehicles
  9. Stachniss C., Burgard W. Mobile Robot Mapping and Localization in Non-Static Environments
  10. Danica Janglová, (2004), Neural Networks in Mobile Robot Motion, International Journal of Advanced Robotic Systems, 15-22, http://journals.sagepub.com.dianus.libr.tue.nl/doi/abs/10.5772/5615
  11. Bo-Yeong Kang, Miao Xu, Jaesung Lee and Dae-Won Kim, (2014), ROBIL: Robot Path Planning Based on PBIL Algorithm, International Journal of Advanced Robotic Systems, 1, http://journals.sagepub.com/doi/abs/10.5772/58872
  12. Muda L., Begam M, Elamvazuthi I. (2010) Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques https://arxiv.org/abs/1003.4083
  13. AL-Rousan, M., & Assaleh, K. (2011). A wavelet- and neural network-based voice system for a smart wheelchair control. Journal of the Franklin Institute, 348(1), 90-100. doi:10.1016/j.jfranklin.2009.02.005