Reference Manual

Print2CAD 2018 x64 Artificial Intelligence

AI Function 1:
Enhanced OCR Text Recognition


Text in a PDF file can be stored in four ways: as native PDF text, as text deconstructed into lines, as text deconstructed into hatches, or as text presented in raster pictures.

To recognize such text, the program uses artificial intelligence methods for OCR (Optical Character Recognition) and symbol recognition.

Enhanced text recognition allows the program to recognize text in construction plans that contain text running in different directions. The direction of each text is defined in a special editor.

Figures: Text Presented in Raster Pictures, Text Deconstructed in Hatches, Text Deconstructed in Lines

Text Representation

Selecting the right text representation is essential for correct text recognition.

Text for OCR text recognition can be stored in a PDF as native text, as text deconstructed into lines or paths, as text deconstructed into hatches, or as pixel pictures containing text.

Analyze the PDF file before activating a text representation. The analysis shows, in separate pictures, which kinds of text representation are used in the input PDF file.

If you find more than one text representation, choose all of them. Choose Native Text Representation only if the software is not able to convert native PDF text into native CAD text. Normally, do not activate the option "Include Native Texts".

Analysis of PDF file

OCR Parameter: Text Language

Selecting the right text language helps the program build correct words. Print2CAD uses artificial intelligence methods to check the text and an internal dictionary to eliminate unusual text combinations.

OCR Parameter: Maximum Resolution in DPI

The right resolution is very important for OCR text recognition. Keep the resolution as low as possible while the text remains clear and readable. Start with 300 DPI and push the “Preview” button. If the smallest text is not readable, increase the resolution in steps of 50 DPI.
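The link between resolution and readability can be made concrete: text that is h points tall is h × DPI / 72 pixels tall after rasterization. The following is a minimal sketch of the "start at 300 DPI, step by 50" procedure, not Print2CAD's own logic. The function names, the 20 pixel legibility limit, and the 1200 DPI cap are assumptions made for illustration only.

```python
def text_height_px(height_pt: float, dpi: int) -> float:
    """Pixel height of text that is height_pt points tall when rasterized at dpi."""
    return height_pt * dpi / 72.0  # 1 point = 1/72 inch


def lowest_usable_dpi(smallest_text_pt: float,
                      min_legible_px: float = 20.0,  # assumed limit, not from the manual
                      start_dpi: int = 300,
                      step: int = 50,
                      max_dpi: int = 1200) -> int:    # assumed cap, not from the manual
    """Raise the resolution in fixed steps until the smallest text is tall enough."""
    dpi = start_dpi
    while dpi < max_dpi and text_height_px(smallest_text_pt, dpi) < min_legible_px:
        dpi += step
    return dpi
```

For example, under these assumptions 6 point text is already 25 pixels tall at 300 DPI, so no increase is needed, while 4 point text needs two further steps.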

 

OCR Parameter: Minimum and Maximum Text Height in Pixel

The minimum and maximum text height parameters are very important because the preseparation of text is based on them. Push the “Preview” button. If not all text is separated, increase the maximum height. If many stray pixels are separated as text, increase the minimum height.
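The effect of the two height limits can be illustrated with a simple filter over candidate boxes. This is only a sketch of the idea; the manual does not describe how Print2CAD performs the preseparation internally. The `Box` type and the function name are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical bounding box of one connected group of pixels."""
    x: int
    y: int
    width: int
    height: int


def preseparate(boxes, min_height_px, max_height_px):
    """Split boxes into text candidates and rejected boxes by pixel height."""
    candidates, rejected = [], []
    for box in boxes:
        if min_height_px <= box.height <= max_height_px:
            candidates.append(box)
        else:
            rejected.append(box)
    return candidates, rejected
```

In this sketch, raising the maximum height lets tall headings through, and raising the minimum height rejects small specks that are not text.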

OCR Parameter: Image Threshold

If you choose raster images as the text representation, the threshold decides which pixels belong to the black group and which pixels belong to the white background. Push the “Preview” button. If the letters connect to each other, decrease the threshold.
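A short sketch shows why lowering the threshold separates touching letters. It assumes 8-bit gray values (0 = black, 255 = white) and that pixels darker than the threshold count as black; the manual does not state the exact comparison Print2CAD uses, and the function name is hypothetical.

```python
def binarize(gray_rows, threshold):
    """Map gray values to 1 (black, text) or 0 (white, background).

    Assumption: 0 is black, 255 is white, and values below the
    threshold are treated as black.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]


# One pixel row: a dark letter stroke, a gray bridge between two letters,
# and light background.
row = [[30, 110, 200]]
print(binarize(row, 120))  # the gray bridge counts as black
print(binarize(row, 100))  # the gray bridge becomes background
```

With a threshold of 120 the gray bridge pixel joins the letters; with a threshold of 100 it falls into the background and the letters separate.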

Sample: Threshold = 120

Text Areas

OCR text recognition only works if the correct text direction can be detected. Unfortunately, the text in a single construction plan can run in very different directions.

For good OCR text recognition, the text must first be manually separated into text areas, each with a common direction.

Print2CAD offers a special editor for these text areas.

Each “Text Area” is defined by three points. The first two points give the text direction and the third point gives the upper right corner of the text box.
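The three-point definition can be turned into a rotation angle and a rotated rectangle. The sketch below is one possible interpretation, not Print2CAD's implementation: it assumes that the first point is also the lower left corner of the box and that the y axis points upward. The function name is hypothetical.

```python
import math


def text_area_from_points(p1, p2, p3):
    """Return (angle_deg, corners) for a text box defined by three points.

    p1 -> p2 gives the text direction; p3 is the upper right corner.
    Assumption: p1 is the lower left corner and y points upward.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("p1 and p2 must differ to define a direction")
    ux, uy = dx / length, dy / length   # unit vector along the text direction
    nx, ny = -uy, ux                    # unit normal, 90 degrees counter-clockwise
    vx, vy = p3[0] - p1[0], p3[1] - p1[1]
    width = vx * ux + vy * uy           # box extent along the text direction
    height = vx * nx + vy * ny          # box extent perpendicular to it
    corners = [
        (p1[0], p1[1]),                                  # lower left
        (p1[0] + width * ux, p1[1] + width * uy),        # lower right
        (p1[0] + width * ux + height * nx,
         p1[1] + width * uy + height * ny),              # upper right (equals p3)
        (p1[0] + height * nx, p1[1] + height * ny),      # upper left
    ]
    return math.degrees(math.atan2(dy, dx)), corners
```

For horizontal text the result is an ordinary axis-aligned box; for text running upward the same three clicks produce a box rotated by 90 degrees.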

Editor for Text Areas

In the text area editor you can choose different boxes for “Text Area” and for “Number Area”.

“Text Area” recognizes letters, numbers, and special characters like “+”, “-” etc. If a character could be either a letter or a number (like the letter “l” and the number “1”), the recognition chooses the letter “l”.

“Number Area” recognizes numbers, letters, and special characters like “+”, “-” etc. If a character could be either a number or a letter (like the letter “l” and the number “1”), the recognition chooses the number “1”.
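The difference between the two area types amounts to a tie-break rule for look-alike characters. The following sketch shows such a rule under the assumption that the recognizer delivers a list of equally plausible candidates for one character; the function name and the area kind strings are hypothetical and not part of Print2CAD.

```python
def pick_candidate(candidates, area_kind):
    """Choose one character from equally plausible OCR candidates.

    area_kind "text" prefers letters, "number" prefers digits.
    If no candidate of the preferred kind exists, the first one is kept.
    """
    if area_kind not in ("text", "number"):
        raise ValueError("area_kind must be 'text' or 'number'")
    if not candidates:
        raise ValueError("at least one candidate is required")
    if area_kind == "number":
        preferred = [c for c in candidates if c.isdigit()]
    else:
        preferred = [c for c in candidates if c.isalpha()]
    return preferred[0] if preferred else candidates[0]
```

With the candidates "l" and "1", a text area yields "l" and a number area yields "1", while unambiguous characters such as "+" pass through unchanged in both.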

Text Areas, Numbers Area
Editor for Numbers

If the boundary of a text area cuts through a PDF element, that element is not considered in the OCR text recognition.
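The rule about cut elements can be pictured as a containment test. The sketch below is a simplified illustration that uses axis-aligned rectangles given as (xmin, ymin, xmax, ymax); real text areas can be rotated, and the manual does not describe how Print2CAD performs this test. The function names are hypothetical.

```python
def classify_element(element, area):
    """Return "inside", "cut", or "outside" for an element relative to an area.

    Both arguments are axis-aligned rectangles (xmin, ymin, xmax, ymax).
    """
    ex0, ey0, ex1, ey1 = element
    ax0, ay0, ax1, ay1 = area
    if ex0 >= ax0 and ey0 >= ay0 and ex1 <= ax1 and ey1 <= ay1:
        return "inside"
    if ex1 < ax0 or ex0 > ax1 or ey1 < ay0 or ey0 > ay1:
        return "outside"
    return "cut"


def elements_for_ocr(elements, area):
    """Keep only the elements that lie completely inside the text area."""
    return [e for e in elements if classify_element(e, area) == "inside"]
```

An element that crosses the boundary is classified as "cut" and dropped, which is why a text area should be drawn generously enough to enclose every character completely.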

Tips:
- Try to put numbers and letters in different text areas.
- Try to include in one text area only text with the same or a similar text height.
- Try to define clean text areas that are not disrupted by other drawing elements.

Text Area Editor

The Text Area Editor allows the user to move, change, and delete the text areas.

The Text Area Editor also allows the user to zoom in on any part of a drawing.

Never define one common text area for the whole drawing.

Carefully selecting the text with text areas greatly improves the quality of the text recognition.

BacktoCAD Technologies, LLC

601 Cleveland St, Suite 310

Clearwater, FL 33755, USA

 

Email: bc-sales@cad-pdf.com
Phone: (727) 303 0383

© Copyright 2017 BackToCAD Technologies, LLC. All rights reserved. Kazmierczak® is a registered trademark of Kazmierczak Software GmbH. Print2CAD, AzubiCAD, and CAD2Print are Trademarks of BackToCAD Technologies LLC. CADconv is a Trademark of Expert Robotics Inc. DWG is the name of Autodesk’s proprietary file format and technology used in AutoCAD® software and related products. Autodesk, the Autodesk logo, AutoCAD, DWG are registered trademarks or trademarks of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders.