Automatic Recognition of Handwritten Dates on Brazilian Bank Cheques

Abstract

This thesis presents a hybrid HMM-MLP system for segmenting and recognizing unconstrained handwritten dates on Brazilian bank cheques. The system must cope with many sources of variability, such as heterogeneous data types and styles, variations present in the date field, and difficult segmentation cases that make the recognition task particularly hard. It uses an HMM-based strategy to identify the sub-fields of the date and separate them. The concept of meta-classes of digits is used to reduce the lexicon size for the day and year and to produce a more precise segmentation. The three obligatory sub-fields (day, month, and year) are then recognized by classifiers specialized for their respective data types, which are known. Specifically, we propose an HMM-based word recognition and verification scheme to process month words and an MLP-based approach to recognize strings of digits (day and year). The digit string recognition strategy also uses the meta-classes of digits to reduce the lexicon size and improve the recognition results. In addition to the date database, we used other databases to validate the strategies employed in digit string recognition and word verification. Experiments show encouraging results on date, word, and digit string recognition. The system also contains a final decision module that makes an accept/reject decision. Finally, a methodology for feature selection in unsupervised learning is proposed. It uses an efficient multi-objective genetic algorithm to generate a set of solutions that contain the most discriminant features and the most pertinent number of clusters. The proposed strategy is assessed on two synthetic data sets in which the significant features and the appropriate clusters in any given feature subspace are known.
Afterwards, it is applied to optimize classifiers in a supervised learning context, namely handwritten word recognition. In this scenario, the approach is evaluated through experiments on isolated month word recognition. It is also used in this thesis to optimize the word verifier of the date recognition system. Comprehensive experiments demonstrate the feasibility and efficiency of the proposed methodology.