The Mudcat Café™
Thread #93687   Message #1806094
Posted By: Howard Kaplan
10-Aug-06 - 08:21 AM
Thread Name: Tech: Wanna Make PDF files?
Subject: RE: Tech: Wanna Make PDF files?
I would like to respond to two different messages from JohnInKansas, clarifying some minor details that both he and I will find fascinating but which may exceed the trivia threshold of some other readers.

First, he wrote that "All it knows, and the only information it contains, is where the dots are." This is an oversimplification. A PDF document produced by Adobe software records not only where the characters are but also in which fonts they are meant to be rendered. It is because the original characters are stored as characters, not as their dot patterns, that text can be extracted from a PDF and pasted into other formats.
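To make the point concrete, here is a minimal sketch, not a real PDF parser. It uses a hand-written, uncompressed page content stream built from the standard PDF text operators BT, Tf, Td, Tj and ET, and a hypothetical helper `extract_text` that recovers each shown string together with the font resource name in effect. Real PDFs usually compress their streams and may use custom font encodings and TJ arrays, so a real extractor has to do far more. The only claim here is that the characters and the font references are stored as such, not as dots.

```python
import re
from typing import List, Tuple

# A tiny, uncompressed page content stream, hand-written for illustration.
# BT/ET bracket a text object, Tf selects a font resource and size,
# Td moves the text position, and Tj shows a string of characters.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
72 720 Td
(Where the characters are) Tj
0 -14 Td
(and in what fonts) Tj
ET
"""

# Either a font selection ("/F1 12 Tf") or a simple literal string shown with Tj.
# Assumption: strings contain no escaped or nested parentheses.
TOKEN = re.compile(rb"/(\w+)\s+[\d.]+\s+Tf|\(([^()]*)\)\s*Tj")


def extract_text(stream: bytes) -> List[Tuple[str, str]]:
    """Return (font resource name, text) pairs in the order they are shown."""
    font = ""
    shown = []
    for m in TOKEN.finditer(stream):
        if m.group(1) is not None:
            # A Tf operator: remember which font the following text uses.
            font = m.group(1).decode("ascii")
        elif m.group(2) is not None:
            # A Tj operator: the characters themselves, ready to copy and paste.
            shown.append((font, m.group(2).decode("latin-1")))
    return shown


for font, text in extract_text(CONTENT_STREAM):
    print(font, text)
```

Nothing in the stream describes the dots; the viewer gets the glyph shapes for F1 from the font, whether it is embedded in the file or installed on the machine.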

Second, he wrote that I was confusing "metafile" with "vector graphics". In the Windows world, "metafile" generally means either a Windows Metafile (WMF) or an Enhanced Metafile (EMF). According to Wikipedia, both WMF and EMF are essentially vector graphics formats that also allow the optional inclusion of raster (bitmap) graphics. In this context, "vector graphics" covers not only geometric shapes such as circles and squares but also text rendered from a TrueType font, whose characters are defined by shapes such as lines and curves. I agree that the term "metafile" can be used more generally, but I rarely encounter that usage under Windows.
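TrueType glyph outlines are built from straight line segments and quadratic Bézier curves. As a small sketch (the function names are hypothetical, not part of any Windows or font API), the following evaluates one such curve from its three control points and flattens it into short straight segments, which is roughly what a renderer has to do before it can decide which dots to turn on. The stored description stays three points however fine the output resolution is.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def quad_bezier(p0: Point, p1: Point, p2: Point, t: float) -> Point:
    """Point at parameter t (0..1) on the quadratic Bezier curve p0-p1-p2."""
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return (x, y)


def flatten(p0: Point, p1: Point, p2: Point, segments: int) -> List[Point]:
    """Approximate the curve by segments + 1 points joined with straight lines."""
    return [quad_bezier(p0, p1, p2, i / segments) for i in range(segments + 1)]


# One arch of an outline: three control points, flattened as finely as needed.
arch = ((0.0, 0.0), (1.0, 2.0), (2.0, 0.0))
print(flatten(*arch, segments=4))
```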

I also disagree with the assertion 'Discarding the vector part, which is usually quite large compared to the rasterized page image, may also be done to make a PDF more "web friendly" by reducing the file size.' A PDF document may assume that common fonts, such as Arial, are present on the displaying or printing device, or it may embed the definition of every font it uses. In either case, and especially in the former, the PDF representation should be considerably more compact than the bitmap representation. Defining the shape of the letter "e" once and then representing it ever after as an 8-bit character code is more efficient than repeating that shape every time the letter appears in the text. Even though the bitmap of a page of text is somewhat compressible (as in formats such as TIFF and GIF), I would be very surprised if that compression were as efficient as representing each character by its ASCII code. Perhaps I can find time later today to run an experiment with a suitable sample document and post some results.
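The size argument can be checked roughly with a toy model. This sketch rests on stated assumptions, not on real PDF or TIFF internals: a US Letter page at 300 dpi and 1 bit per pixel for the raw raster arithmetic, and a hypothetical 8x8 "font" of random glyph patterns for the comparison. It stores a deliberately repetitive sample text once as 8-bit character codes plus a one-time glyph table, and once as a painted bitmap, then compresses both with zlib (Deflate), used here as a stand-in for the LZW compression that GIF and many TIFF files use.

```python
import random
import zlib


def raster_bytes(width_in=8.5, height_in=11.0, dpi=300, bits_per_pixel=1):
    """Uncompressed size in bytes of a full-page bitmap (assumed page and dpi)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return (width_px * height_px * bits_per_pixel + 7) // 8


def toy_font(chars):
    """Hypothetical font: each character is an 8x8 cell, one byte per pixel row."""
    rng = random.Random(1806094)  # fixed seed so the glyph shapes are repeatable
    return {c: bytes(rng.getrandbits(8) for _ in range(8)) for c in sorted(set(chars))}


def render(text, font, cols=80):
    """Paint text into a 1-bit bitmap, 8 pixel rows per text line of `cols` cells."""
    rows = []
    lines = [text[i:i + cols].ljust(cols) for i in range(0, len(text), cols)]
    for line in lines:
        for r in range(8):
            # Pixel row r of every glyph on this line, each glyph repeated in full.
            rows.append(bytes(font[c][r] for c in line))
    return b"".join(rows)


sample = ("To define the shape of the letter e once and then represent it "
          "evermore as an 8-bit character is more efficient. ") * 40
font = toy_font(sample + " ")  # include the padding space used by render()
bitmap = render(sample, font)

# Characters as codes, plus the glyph table stored once (8 bytes per glyph).
as_chars = len(zlib.compress(sample.encode("ascii"), 9)) + 8 * len(font)
# The same text as dots, with every glyph repeated wherever it occurs.
as_bitmap = len(zlib.compress(bitmap, 9))

print("full page raster, uncompressed:", raster_bytes(), "bytes")
print("toy text as codes + font table:", as_chars, "bytes")
print("toy text as compressed bitmap:", as_bitmap, "bytes")
```

The numbers this prints describe only the toy; they are no substitute for the experiment with a real document proposed above.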