Document Image Formats
The SLA will try to process document images in all common formats. However, the following document image file formats are particularly well suited for e-filing purposes:
- TIFF (Tagged Image File Format) is very common for electronic documents that have been scanned from an original paper document. TIFF is a feature-rich format that allows images to be bitonal (black and white), grayscale, or color, with a number of different compression methods. For text documents, the best results are achieved with bitonal images and CCITT Group IV compression. This compression algorithm was originally created to reduce the bandwidth needed for fax transmissions and is equally well suited to sending images across the Internet. TIFF also optionally allows multiple images (all the pages of a document) to be stored in a single file. While TIFF is not an international standard (the official specification is managed by Adobe), it is nevertheless a de facto standard because it is widely supported by many vendors on all platforms. Note that TIFF also supports lossy compression algorithms that are perfectly fine for continuous-tone images (photographs) but unsuitable for text documents because they introduce compression artifacts. A common example is JPEG, which is often used with color TIFF images and is NOT recommended for document images.
- PDF/A (Portable Document Format/Archive) is a subset of the popular PDF specification that has been published as international standard ISO 19005-1:2005. This standard excludes features of the general PDF specification that can make sharing and long-term archiving of documents difficult. Specifically, PDF/A omits audio and video content, JavaScript, and the launching of executable files. It also requires that all fonts used in the document be embedded (no external references or assumptions about specific fonts being available at the time the document is viewed or printed) and that some PDF/A-specific metadata be included. Like TIFF, PDF/A allows all pages of a document to be contained in a single file.
- PDF (Portable Document Format) is a versatile and very popular format for publishing electronic documents. However, some of the features that make PDF files so versatile can also cause problems when documents are meant to be shared between organizations, or when they are to be archived and must still faithfully reproduce the original document after many years of software evolution. Even without strict compliance with the PDF/A specification, regular PDF files are perfectly fine for e-filing purposes as long as they are self-contained (no external references to fonts other than the Base 14 fonts and no external references to any other content), static (no dynamic content produced by embedded scripts), and free of DRM (no digital rights management restricting viewing or printing). PDF documents can be created from word processing applications, in which case the resulting PDF file contains the actual text together with font and formatting information. PDF documents can also be created by scanning paper documents, in which case the resulting PDF contains a raster image. Both types of PDF documents (and any combination thereof) are equally acceptable.
- PNG (Portable Network Graphics) is both a W3C Recommendation and an international standard: ISO/IEC 15948:2003. It was designed as an image file format for the World Wide Web and as a result has been widely adopted by imaging applications. While the strengths of this format lie more with color images, the fact that it always uses lossless compression makes it suitable for document imaging purposes as well. With PNG, each page of a document needs to be stored in a separate file.
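As an illustration of how a submitter might check a file against these recommendations before sending it, the following is a minimal Python sketch, not part of any SLA tooling. It identifies TIFF, PDF, and PNG files by their leading signature bytes and, for TIFF, reads the Compression tag of the first image so that CCITT Group IV (value 4) can be told apart from JPEG (values 6 and 7). The function names are hypothetical, and the sketch assumes a classic (non-BigTIFF) file with the PDF header at the very start of the file.

```python
import struct

# Leading signature bytes defined by the PNG and TIFF specifications.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TIFF_SIGNATURES = {b"II*\x00": "<", b"MM\x00*": ">"}  # little / big endian

COMPRESSION_TAG = 259        # TIFF "Compression" tag number
CCITT_GROUP4 = 4             # CCITT Group IV (T.6) compression
JPEG_COMPRESSIONS = {6, 7}   # old-style and new-style JPEG in TIFF


def sniff_format(data: bytes) -> str:
    """Return 'tiff', 'pdf', 'png', or 'unknown' from the leading bytes."""
    if data[:4] in TIFF_SIGNATURES:
        return "tiff"
    if data.startswith(b"%PDF-"):  # assumption: header sits at offset 0
        return "pdf"
    if data.startswith(PNG_SIGNATURE):
        return "png"
    return "unknown"


def tiff_compression(data: bytes):
    """Return the Compression tag value of the first IFD, or None if absent."""
    endian = TIFF_SIGNATURES[data[:4]]
    (ifd_offset,) = struct.unpack_from(endian + "I", data, 4)
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        (tag,) = struct.unpack_from(endian + "H", data, entry)
        if tag == COMPRESSION_TAG:
            # Compression is a single SHORT held in the entry's value field.
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
            return value
    return None


def tiff_is_jpeg_compressed(data: bytes) -> bool:
    """True if the first image uses JPEG, which is not recommended here."""
    return tiff_compression(data) in JPEG_COMPRESSIONS
```

A caller would read the file in binary mode, pass the bytes to sniff_format, and for TIFF files compare the result of tiff_compression with CCITT_GROUP4. The sketch inspects only the first page of a multi-page TIFF, and it does not distinguish PDF/A from regular PDF, since that requires examining the document's metadata rather than its header.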