OCR font
You must specify the Aurora Imaging Library OCR font to read/verify the character strings in target images. Aurora Imaging Library uses fonts (or typesets) to specify the style and size of characters in the images to be read or verified.
An OCR font contains the following information:
- The grayscale representations of the characters.
- Codes identifying each character (ASCII codes for the characters).
- The number of characters in the OCR font.
- Character dimensions.
The above information can be calibrated (with MocrCalibrateFont or MocrControl), modified (with MocrModifyFont), and saved (with MocrSaveFont) for later restoration.
User-defined Aurora Imaging Library OCR fonts
Unless you are using one of the provided predefined OCR fonts, you must define a custom OCR font. You can create a user-defined OCR font from scratch, or make minor modifications to an existing OCR font that is already saved to disk.
To create a custom font:
- Allocate an OCR font context, using MocrAllocFont.
  Note: When allocating an OCR font context, you must specify the maximum number of characters that can be stored in the font, and the dimensions of the font's character representations and their character cells.
- Grab/create the grayscale character representations of the font in an Aurora Imaging Library image buffer and then copy them from the image buffer to the OCR font context, using MocrCopyFont. Alternatively, you can import grayscale character representations from a text file or an image file (for example, a TIFF) into an OCR font context using MocrImportFont.
  Note: You can use the Aurora Imaging Library OCR Reader utility to create new characters to add to an existing font.
  Note: When importing, or copying, character representations to the OCR font context, the OCR font context must have sufficient space to hold the representations of all the specified characters. You can use MocrInquire to determine the maximum number of characters that can be stored in the font and the size of each character.
- Once copied or imported into an OCR font context, the entire OCR font context can be saved on disk using MocrSaveFont, and then later restored using MocrRestoreFont.
The following is an example of a character in its Aurora Imaging Library OCR font character cell and the dimensions that you will have to specify during OCR font context allocation. Values are to be specified in pixels. Each square in the grid represents one pixel.
[Image: dimen.png]
The parameters CharCellSizeX, CharCellSizeY, CharOffsetX, CharOffsetY, CharSizeX, and CharSizeY of MocrAllocFont must comply with the following restrictions:
- 2 * CharOffsetX + CharSizeX <= CharCellSizeX.
- 2 * CharOffsetY + CharSizeY <= CharCellSizeY.
- CharCellSizeY, CharCellSizeX, CharSizeY, and CharSizeX must be >= 6 pixels and <= 256 pixels.
Note: When the characters in a font do not have a uniform width or height, CharSizeX should specify the width of the widest character in the font and CharSizeY should specify the height of the tallest character in the font. Also, to be able to search for a string over a range of angles, the font context's character size must be greater than 16x16. You can use the Aurora Imaging Library OCR Reader utility to determine font character widths and heights.
When copying the character representations from an image buffer, or importing them from an image file, the characters must have the dimensions specified during OCR font context allocation.
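The dimension restrictions above can be checked before allocating a font context. The following is a minimal sketch in plain C that makes no library calls. The FontDims structure and the function names are hypothetical helpers, not part of the Aurora Imaging Library API. The sketch assumes that offsets are non-negative and interprets "greater than 16x16" as both the width and the height exceeding 16 pixels.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type (not a library type): the six dimension values
   specified at OCR font context allocation, all in pixels. */
typedef struct {
    int CharCellSizeX, CharCellSizeY;
    int CharOffsetX, CharOffsetY;
    int CharSizeX, CharSizeY;
} FontDims;

/* Cell and character sizes must lie in the range [6, 256] pixels. */
static bool size_in_range(int v) { return v >= 6 && v <= 256; }

/* Returns true when the dimensions satisfy the documented restrictions. */
bool font_dims_valid(const FontDims *d)
{
    if (!size_in_range(d->CharCellSizeX) || !size_in_range(d->CharCellSizeY) ||
        !size_in_range(d->CharSizeX)     || !size_in_range(d->CharSizeY))
        return false;
    /* Assumption: offsets cannot be negative. */
    if (d->CharOffsetX < 0 || d->CharOffsetY < 0)
        return false;
    return 2 * d->CharOffsetX + d->CharSizeX <= d->CharCellSizeX
        && 2 * d->CharOffsetY + d->CharSizeY <= d->CharCellSizeY;
}

/* For fonts with non-uniform characters: returns the largest value, to be
   used as CharSizeX (widest character) or CharSizeY (tallest character). */
int largest_dimension(const int *values, size_t count)
{
    int best = 0;
    for (size_t i = 0; i < count; ++i)
        if (values[i] > best)
            best = values[i];
    return best;
}

/* Interpretation (assumption): searching over a range of angles requires
   both character dimensions to exceed 16 pixels. */
bool font_allows_angle_search(const FontDims *d)
{
    return d->CharSizeX > 16 && d->CharSizeY > 16;
}
```

For example, a 32x40 cell with offsets of 4 and a 24x32 character passes the check, since 2*4 + 24 = 32 and 2*4 + 32 = 40. Increasing CharOffsetX to 5 makes the check fail.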
The following is an example of how to create a user-defined Aurora Imaging Library OCR font using a font definition image and MocrCopyFont. In this case, the _CharImageForFontDefinition.mim_ file contains the grayscale character representations. The image is loaded into an image buffer and then the character representations are copied to an allocated OCR font context using MocrCopyFont.
Code example: userguide.optical_character_recognition.defining_a_font01
The following is an example of how to create a user-defined Aurora Imaging Library OCR font directly from a font definition image file, using MocrImportFont. In this case, the _CharImageForFontDefinition.mim_ file contains the grayscale character representations, which are imported into an allocated OCR font context using MocrImportFont.
Code example: userguide.optical_character_recognition.defining_a_font02
When importing the character representations from an ASCII file, font character representations must be presented as follows:
[Image: CharValue_65.png]
Note: In this format, 'pixels' are delimited by a blank space, so '00' counts as one pixel.
This information breaks down into the following:
| Row | Description |
|---|---|
| 01 | Specifies ASCII file format. |
| 02 | Blank row. |
| 03 | Specifies the start of a new character representation and its associated (generally ASCII) character. |
| 04 to 36 | Specifies the alpha-numerical representation of the character. |
| 37 | Blank row. |
| 38 | Specifies the start of a new character representation and its associated (generally ASCII) character. |
| 39 to 71 | Specifies the alpha-numerical representation of the character. |
| etc. | This pattern is repeated for every character in the font. |
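The row pattern in the table can be expressed as simple arithmetic. The following is a minimal sketch, assuming that every character block uses the same number of representation rows (33 in the example above) and that each block is followed by exactly one blank row. The function names are hypothetical and are not part of the Aurora Imaging Library API.

```c
/* Rows are 1-based, as in the table. Row 1 holds the file format and
   row 2 is blank, so the first character block starts at row 3. */
enum { FIRST_HEADER_ROW = 3 };

/* Row that starts the block for the character at the given 0-based index.
   Each block spans 1 header row + rows_per_char rows + 1 blank row. */
int char_header_row(int index, int rows_per_char)
{
    return FIRST_HEADER_ROW + index * (rows_per_char + 2);
}

/* First row of the character's representation. */
int char_first_data_row(int index, int rows_per_char)
{
    return char_header_row(index, rows_per_char) + 1;
}

/* Last row of the character's representation. */
int char_last_data_row(int index, int rows_per_char)
{
    return char_header_row(index, rows_per_char) + rows_per_char;
}
```

With 33 rows per character, index 0 gives rows 3, 4, and 36, and index 1 gives rows 38, 39, and 71, matching the table.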
The following is an example of how to create a user-defined Aurora Imaging Library OCR font using character representations from an ASCII file, using MocrImportFont. In this case, the _AsciiFileForFontDefinition.txt_ file contains the ASCII character representations, in the format above. The character representations are imported into an allocated OCR font context using MocrImportFont.
Code example: userguide.optical_character_recognition.defining_a_font03
Existing Aurora Imaging Library OCR fonts
Once created, an Aurora Imaging Library OCR font can be saved and restored as needed. Restoring this information (using MocrRestoreFont) rather than creating the Aurora Imaging Library OCR font from scratch saves time, especially if the restored font requires no further modifications. Note that MocrRestoreFont restores the entire OCR font context. Aurora Imaging Library comes with three predefined SEMI fonts; for more information, see the next subsection.
SEMI fonts
Aurora Imaging Library OCR comes with two ISO-compatible SEMI fonts (M_SEMI_M12_92 and M_SEMI_M13_88) and one generic SEMI font that has no constraints and no checksum (SEMI.mfo). These can be used directly or modified to suit your needs.
Using a SEMI font
To use a SEMI font directly, restore it using MocrRestoreFont with the FileName parameter set to "SEMI_M12-92.mfo", "SEMI-M13-88.mfo", or "SEMI.mfo". These files are located in directory "\contexts" under the Aurora Imaging Library installation folder. Once restored, the font can be modified using the Aurora Imaging Library OCR functions.
Creating a SEMI font
To create a new font based on a SEMI font:
- Create an OCR font context using MocrAllocFont with:
  - The FontType parameter set to either M_SEMI_M12_92 or M_SEMI_M13_88.
  - The StringLength parameter set to 12 when using M_SEMI_M12_92 and 18 when using M_SEMI_M13_88.
  - The CharNumber parameter set to 38. This allows for capital letters (A-Z), digits (0-9), hyphen (-), and period (.).
- Use either MocrCopyFont or MocrImportFont to add character representations from an existing SEMI font.
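The parameter values in the steps above can be derived rather than hard-coded. The following is a minimal sketch in plain C that makes no library calls. The SemiFontKind enumeration and the function names are hypothetical stand-ins for illustration. In a real application, you would pass the library's own M_SEMI_M12_92 or M_SEMI_M13_88 constant to MocrAllocFont. The sketch assumes an ASCII execution character set.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two ISO-compatible SEMI font types. */
typedef enum { SEMI_KIND_M12_92, SEMI_KIND_M13_88 } SemiFontKind;

/* StringLength value to use for each SEMI font type: 12 or 18. */
int semi_string_length(SemiFontKind kind)
{
    return kind == SEMI_KIND_M12_92 ? 12 : 18;
}

/* Writes the SEMI character set (A-Z, 0-9, '-', '.') into out as a
   NUL-terminated string and returns the number of characters written.
   Returns 0 if the buffer cannot hold 38 characters plus the terminator. */
size_t semi_charset(char *out, size_t capacity)
{
    size_t n = 0;
    if (capacity < 39)
        return 0;
    for (char c = 'A'; c <= 'Z'; ++c)   /* 26 capital letters */
        out[n++] = c;
    for (char c = '0'; c <= '9'; ++c)   /* 10 digits */
        out[n++] = c;
    out[n++] = '-';
    out[n++] = '.';
    out[n] = '\0';
    return n;   /* 26 + 10 + 2 = 38, the CharNumber value */
}
```

The count returned by semi_charset confirms why CharNumber is set to 38: 26 letters, 10 digits, and 2 punctuation characters.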
Quality and scale are important
Using high-quality character representations will produce the best results. OCR processing relies on using the cleanest font characters possible.
When using an M_GENERAL OCR font context, broken characters and spaces, even if expected in the target string, should not be defined in the font. Instead, you should enable the ability to read broken characters using MocrControl with M_BROKEN_CHAR, and/or enable the ability to read spaces using MocrControl with M_BLANK_CHARACTERS.
When using an M_GENERAL font context type, the threshold between the characters and the background must preserve the shape of the characters and have a clearly-visible point of differentiation (binarization).
If the characters in the target image are brighter than the background (for example, white on black), then the character representations included in your font must also be of characters that are brighter than the background. The foreground is specified at context allocation time (MocrAllocFont) and can be changed later using MocrModifyFont with M_INVERT. This changes both the character representations and the setting specified at allocation time.
If the size of the character representations in the font is not the same as the size of the characters in the target string, you can calibrate the font (discussed later). Alternatively, when the physical size of the character representations of the OCR font differs from the size of those in the target image, changing the size of the character representations of the OCR font could improve the robustness of the search. To change the size, use MocrModifyFont with M_RESIZE. Changing the size of the font permanently in the OCR font can be faster than resizing the font before each read/verify operation, as is done when the font is calibrated.
Visualizing
It might be necessary, at some point during application development, to display the character representations of your Aurora Imaging Library OCR font. To do so, use MocrCopyFont to copy the character representations to a displayable image buffer.
Erasing characters
To remove a character from the OCR font, use MocrControl with M_CHAR_ERASE and specify the ASCII code associated with the character representation to remove. An OCR font can contain a limited number of characters; this number is set during OCR font context allocation. Removing unused or erroneously added characters is the easiest way to ensure that these characters will not be used when looking for matches in the target string and that there is space for new characters to be added.