Construction of a Julius-based robot speech recognition system

With the continuous development of modern computer technology, people need a more convenient and natural way of interacting with machines: voice interaction, in which a machine understands what a person says. Advances in speech recognition technology have made this ideal attainable, and the combination of speech recognition with robot control is becoming a hot spot in current research.

Applications of speech recognition in robot systems are mostly designed for specific environments and controlled by voice commands. With a recognition vocabulary of only a few dozen words or phrases, work that originally required manual operation can be completed easily by voice. This paper designs an isolated-word, speaker-independent speech recognition system for an existing robot platform.

1 Speech recognition principle and Julius introduction

1.1 Principle of speech recognition based on HMM

A speech recognition system is a pattern recognition system. It first analyzes the speech signal to obtain its characteristic parameters and then processes these parameters into standard templates; this process is called training or learning. When test speech enters the system, the system processes the speech signal in the same way and matches it against the reference templates to obtain the result. This completes the speech recognition process.

At present HMM, as a statistical model of the speech signal, is the mainstream modeling method in speech recognition technology and is widely used throughout speech processing. Much commercial speech software, as well as various speech recognition systems with excellent performance, is built on this model, and a complete theoretical framework has been formed around it.

A speech recognition system based on HMM pattern matching works as follows. In the training stage, an HMM training algorithm is used to build an HMM for each entry; after repeated training on recorded examples of each entry, the corresponding model is added to the HMM model library and stored as data. In the matching stage, that is, the recognition stage, an HMM matching algorithm compares the unknown input speech signal against the models in the library obtained during training, and the speech recognition result is output.
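In symbols, for an isolated-word task the matching stage simply picks the word model under which the observed feature sequence is most likely; a standard formulation (notation ours, not from the original paper) is

\hat{w} = \arg\max_{w \in V} P(O \mid \lambda_w)

where O = o1 o2 ... oT is the sequence of feature vectors extracted from the input speech, V is the vocabulary, and λw is the trained HMM for word w; P(O | λw) is evaluated efficiently by the forward or Viterbi algorithm.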

1.2 Introduction to Julius

Julius is a practical, high-performance two-pass large-vocabulary continuous speech recognition engine developed jointly by Kyoto University and Japan's IPA (Information-technology Promotion Agency). At present it can be applied well to large-vocabulary continuous speech recognition in Japanese and Chinese. Julius is written in pure C, released under the GPL open-source license, and runs on Linux, Windows, Mac OS X, Solaris and various other Unix platforms. The latest version of Julius adopts a modular design in which each functional module can be configured through parameters.

Julius requires a language model and an acoustic model to run; by combining them, a speech recognition system can be built easily. The language model comprises a word pronunciation dictionary and grammatical constraints. The language model types supported by Julius are the N-gram model, rule-based grammars, and a simple word list for isolated word recognition. The acoustic model must be defined as HMMs over sub-word or whole-word units.
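As an illustration, an isolated-word task like the one in this paper can be described with a .grammar file and a .voca file, which the mkdfa.pl tool shipped with Julius compiles into the .dfa and .dict files the engine loads. The command words and phone sequences below are illustrative, not taken from the paper:

rtdog.grammar:
S       : NS_B COMMAND NS_E

rtdog.voca:
% NS_B
<s>       sil
% NS_E
</s>      sil
% COMMAND
qianjin   q i an j in
tingzhi   t i ng zh i

Running mkdfa.pl rtdog then produces the rtdog.dfa and rtdog.dict files referred to in section 3.2.2.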

An application can interact with Julius in two ways: a socket-based server-client communication mode, and an embedded mode based on a function library. In both cases, when a recognition pass finishes, the recognition result is sent to the application; the application can query the current status and statistics of the Julius engine and control its operation.
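As a concrete sketch of the first method, Julius can be started as a recognition server in module mode, and the application then reads the XML-tagged results over an ordinary TCP connection (10500 is the default module port; the jconf file is the one built in section 3.2.2):

% julius -C rtdog.jconf -module
% telnet localhost 10500

In the embedded method, the application instead links against the libjulius/libsent function libraries and receives results through registered callbacks.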

2 System framework

2.1 Hardware structure

In the speech recognition robot system, the Atom Z510 (an Intel Atom Z510 embedded control platform with a 1.1 GHz clock) acts as the brain and mainly performs the speech recognition function. The PXA270 controller (Intel's PXA27x series embedded processor, launched at the end of 2003 and based on the ARMv5E XScale core, with a maximum frequency of 624 MHz) is the core intelligent controller on the robot dog's body; after receiving the recognition result, it issues control commands. The ATmega128 controller (an 8-bit Atmel microcontroller running at 16 MHz) controls the digital servos on a serial bus, driving the joints of the robot dog's front and rear legs and tail. The hardware structure of the robot dog platform is shown in Figure 2.

2.2 Software structure

The entire robot system comprises three modules: the Julius speech recognition module, the GUI human-machine interface, and the robot control module. Julius submits the recognized voice commands to the GUI module, which displays them; meanwhile, the GUI converts the voice commands into motion control commands and sends them to the robot control module, and it can also start and stop Julius. The robot control module runs mainly on the PXA270, while speech recognition and the GUI run on the Atom Z510.

3 Construction of speech recognition system

A complete speech recognition system generally includes three parts: the acoustic model, the language model, and the recognizer. In this system, only a recognition grammar based on the control commands (verbs) is established and other words are ignored, so no statistical language model is built; the recognizer is the open-source Julius platform, which requires only configuration parameters and the related files. The main work of this paper is therefore acoustic model training and construction of the speech recognition system.

3.1 Acoustic model training

The acoustic model is the bottom layer of the recognition system and its most critical part; it is the set of acoustic parameters for each acoustic unit. The acoustic model of this system, based on word-level acoustic feature vector sets, is obtained by several iterations of training on the collected speech corpus using HTK. HTK (the Hidden Markov Model Toolkit), developed by the Speech, Vision and Robotics Group of the Cambridge University Engineering Department, is an experimental toolkit specifically for building and manipulating HMMs; it is used mainly in speech recognition and can also be used to test and analyze speech models. The specific training steps are as follows:

(1) Data preparation

Collect a corpus of standard Mandarin Chinese, label the speech in the corpus, and create a list file of the speech recognition unit elements.
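In HTK, the labels are usually gathered into a master label file (MLF); a minimal word-level example for this task might look like the following (utterance and word names are illustrative):

#!MLF!#
"*/qianjin_001.lab"
qianjin
.
"*/tingzhi_001.lab"
tingzhi
.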

(2) Feature extraction

This system uses MFCCs as the speech feature parameters. During training, each speech file is converted to MFCC format with the HCopy tool.
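Concretely, HCopy reads an analysis configuration file and a script file listing source/target pairs. A typical setup for 39-dimensional MFCC_0_D_A features is sketched below; the parameter values are common HTK defaults, not figures reported in the paper:

config:
SOURCEFORMAT = WAV
TARGETKIND   = MFCC_0_D_A    # 12 MFCCs + energy (C0), deltas, accelerations
TARGETRATE   = 100000.0      # 10 ms frame shift
WINDOWSIZE   = 250000.0      # 25 ms analysis window
USEHAMMING   = T
PREEMCOEF    = 0.97
NUMCHANS     = 26
NUMCEPS      = 12

codetr.scp (one "source target" pair per line):
data/qianjin_001.wav  mfcc/qianjin_001.mfc

% HCopy -T 1 -C config -S codetr.scp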

(3) HMM definition

The initial topology of the model must be given before HMM training. All HMMs in this system share the same structure, shown in Figure 4. Each model contains 4 emitting states {S2, S3, S4, S5} plus non-emitting entry and exit states (S1 and S6), in a left-to-right topology without skips. The observation function bi of each state is a Gaussian with a diagonal covariance matrix, and the possible transitions between states are denoted aij.
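In HTK notation, this topology corresponds to the following prototype definition, which the training tools in the next step take as input. This is a sketch assuming 39-dimensional MFCC_0_D_A features; the zero means and unit variances are placeholders that HInit/HCompv overwrite, and the transition probabilities are only rough initial guesses:

~o <VecSize> 39 <MFCC_0_D_A>
~h "proto"
<BeginHMM>
<NumStates> 6
<State> 2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 3
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 4
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 5
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<TransP> 6
0.0 1.0 0.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0 0.0
0.0 0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.0 0.6 0.4
0.0 0.0 0.0 0.0 0.0 0.0
<EndHMM>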

(4) HMM training

This system first uses the HInit tool to initialize the HMM, then uses the HCompv tool for a flat initialization in which every state of the model is given the same mean and variance vectors, computed globally over the entire training set. Finally, repeated estimation iterations with HRest are used to find optimal values of the HMM parameters. After several iterations, the individual word HMMs obtained from training are gathered into a single hmmsdef.mmf file.
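Under those assumptions, the per-word training loop can be sketched as a shell script. The tool options are standard HTK usage; the directory layout, list files and command words are illustrative:

#!/bin/sh
# train one left-to-right HMM per command word, as described above
for w in qianjin tingzhi; do
    # segmental k-means initialisation from the prototype
    HInit  -A -T 1 -S lists/$w.scp -M hmm0 -l $w -L labels -o $w proto
    # flat initialisation: identical global mean/variance for every state
    HCompv -A -T 1 -S lists/$w.scp -M hmm0 -f 0.01 -m -o $w proto
    # Baum-Welch re-estimation; HRest iterates internally until convergence
    HRest  -A -T 1 -S lists/$w.scp -M hmm1 -l $w -L labels -H hmm0/$w $w
done
# gather the per-word definitions into one master macro file (empty edit script)
HHEd -d hmm1 -w hmmsdef.mmf /dev/null words.list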

3.2 Julius application

3.2.1 Julius deployment

In this system, the speech recognition part is deployed on the Atom Z510, which must first be provisioned with a Linux operating system (this system uses Ubuntu 8.10). Those steps are not described in detail here; the literature gives detailed instructions. The core of the speech recognition part is the Julius recognizer, whose source code must be compiled and deployed to the Atom Z510 platform. The steps are as follows:

① Make sure that the following support libraries are present in the Linux system: zlib, flex, the OSS audio driver interface, ESounD and libsndfile.

② Download the source code julius-4.1.5 from the Julius official website.

③ Unpack: % tar -zxvf julius-4.1.5.tar.gz

④ Compile and install: % ./configure, % make, % make install

3.2.2 Julius configuration

Julius is implemented entirely in C and adopts a modular design in which each functional module can be configured. Before use, the configuration parameters are written to a jconf file; this file is loaded as the run-time parameters, and the system scans the parameter configuration and starts each functional block. The key configuration parameters are as follows:

◆ -dfa rtdog.dfa, specify the grammar file rtdog.dfa;

◆ -v rtdog.dict, specify the dictionary file;

◆ -h rtdog.binhmm, specify the HMM model file;

◆ -lv 8000, set the input level threshold to filter out noise;

◆ -rejectshort 600, set the minimum speech length in milliseconds;

◆ -input mic, set the speech input source to the microphone.
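Gathered into one file, the options above give a minimal rtdog.jconf (comments and layout ours), and Julius is then started with -C:

# rtdog.jconf -- isolated-word recognition for the robot dog
-dfa rtdog.dfa        # grammar automaton produced by mkdfa.pl
-v rtdog.dict         # pronunciation dictionary
-h rtdog.binhmm       # acoustic model (HTK HMMs in binary form)
-lv 8000              # input level threshold, filters out noise
-rejectshort 600      # reject inputs shorter than 600 ms
-input mic            # capture speech from the microphone

% julius -C rtdog.jconf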
