This document serves to test various sample HTML elements and their representation in the PDF.
We tried to address many of the elements missing from some of the web page print solutions we found. Many solutions merely convert the HTML fed to the browser, not the content of the browser DOM. We wanted to be able to process live content that could vary based on user interaction.
We found almost all solutions lacking in SVG support. You cannot actually inject the SVG into the print document; you can only insert an image of the SVG in raster form. We wanted high-resolution, vector-based SVG content. We wanted to leverage formatting technology far superior to a "PDF" writer or a browser, neither of which is designed as a print composition engine.
We also found many situations where we wanted the flexibility of just having some XML content in the page for special formatting. You can easily extend this solution to format any XML markup in your page, not only HTML.
Push the "Print It!" button, get your result and keep reading.
Throughout this document you will also see print buttons like this located on certain headings. This is a demonstration of using the code to print a single <div>. Some of the buttons vary settings for print.
The XEPOnline JavaScript library extracts the content of a named <div> element in the HTML page. It processes that <div> and embeds all css-based styling into the HTML. You can use the library to generate a PDF of any <div>, including those generated with dynamic content, because it processes the browser DOM of the HTML at the time of execution. In other words, this is not some canned HTML to print, it is the current HTML to print.
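As a rough illustration of the style-embedding step, the sketch below copies a chosen set of properties from a computed-style object into an inline style string. The name `toInlineStyle` and the property list are assumptions made for this example, not the library's actual code. In a browser, the first argument would typically be the result of `window.getComputedStyle(element)`.

```javascript
// Hypothetical helper: build an inline style string from computed styles.
// `computed` is any object exposing getPropertyValue(name), such as the
// CSSStyleDeclaration returned by window.getComputedStyle(element).
function toInlineStyle(computed, properties) {
  return properties
    .map((name) => [name, computed.getPropertyValue(name)])
    .filter(([, value]) => value !== "" && value != null)
    .map(([name, value]) => `${name}: ${value}`)
    .join("; ");
}

// Stand-in for a computed style so the sketch also runs outside a browser.
const fakeComputed = {
  values: { color: "rgb(165, 42, 42)", "font-size": "12px" },
  getPropertyValue(name) {
    return this.values[name] || "";
  },
};

// "text-align" has no value in the stand-in, so it is left out.
const inlineStyle = toInlineStyle(fakeComputed, ["color", "font-size", "text-align"]);
console.log(inlineStyle);
```

Because the values are read at call time, the string reflects whatever the page looks like at that moment, which is the point of working from the live DOM.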
We'll cover advanced use concepts in the future, but the whole system is extensible and leverages XSL FO technology for print file generation while maintaining ease of styling via css. The extensibility allows you to customize output in many ways, to extend and expand upon a simple File->Print operation. Because it uses XSL FO at the core, one can certainly do more than generate PDF. You could generate PostScript, AFP or XPS print files if desired.
Because HTML is not XML, the solution does some additional processing to ensure a well-formed XML document is exported. That well-formed XML document is generated and sent to the XEPOnline formatter via REST with a reference to a specific XSL stylesheet for processing the HTML tag content to create XSL FO.
XEPOnline accepts the REST request, attempts to format the document with RenderX XEP and returns the result. There are several options for returning data, from initiating a download to base64 encoding the result and inserting it into the document.
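To make the base64 option concrete, here is a minimal sketch that turns PDF bytes into a data URI suitable for an embed or link target. It assumes the PDF is available as raw bytes; the function name `pdfBytesToDataUri` is hypothetical and the real library may deliver the result differently.

```javascript
// Hypothetical helper: turn raw PDF bytes into a data URI that could be
// assigned to the "src" of an <embed> or <iframe>, or to a link's "href".
function pdfBytesToDataUri(bytes) {
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  // btoa is available in browsers and in recent Node.js versions.
  return "data:application/pdf;base64," + btoa(binary);
}

// "%PDF" is the four-byte signature at the start of a PDF file.
const sampleUri = pdfBytesToDataUri(new Uint8Array([0x25, 0x50, 0x44, 0x46]));
console.log(sampleUri);
```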
The XSL which processes incoming (X)HTML to XSL FO is written in XSL version 1.0. Most modern web environments make use of very few tags and control appearance through headings, div's, img's and span's with css styles. The template not only supports these core elements but also has support for many legacy structures. The following HTML structures are currently supported:
The new HTML5 tags <section> and <article> are only supported as block elements, not as separate page-generating elements at this time. One <header> and one <footer> element are allowed inside the printable <div> and are used for the printed document header and footer.
A selected list of appearance styles is passed in css format to the XSL stylesheet, which parses and interprets the css. There are some special considerations in the XSL to handle differences between HTML css and XSL FO style attributes. Every attempt is made to keep the XSL FO and resulting PDF output as close as possible to the css/browser representation.
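The actual parsing happens in XSL 1.0, but the idea can be sketched in JavaScript: split a css declaration string into name/value pairs, keep only properties on an allow-list, and emit them as attributes. The allow-list and the function names below are assumptions for illustration only.

```javascript
// Hypothetical allow-list of css properties to carry over as attributes.
const PASS_THROUGH = new Set(["color", "font-size", "font-weight", "text-align"]);

// Split "name: value; name: value" into an object of allowed properties.
// Note: this naive split on ";" would break values that contain ";",
// such as data URIs inside url(...).
function parseStyle(styleText) {
  const result = {};
  for (const declaration of styleText.split(";")) {
    const colon = declaration.indexOf(":");
    if (colon === -1) continue; // skip empty or malformed pieces
    const name = declaration.slice(0, colon).trim().toLowerCase();
    const value = declaration.slice(colon + 1).trim();
    if (PASS_THROUGH.has(name) && value !== "") {
      result[name] = value;
    }
  }
  return result;
}

// Render the parsed properties as XML attributes (values are not escaped here).
function toFoAttributes(styleText) {
  return Object.entries(parseStyle(styleText))
    .map(([name, value]) => `${name}="${value}"`)
    .join(" ");
}

const foAttributes = toFoAttributes("color: red; cursor: pointer; font-size: 12pt;");
console.log(foAttributes);
```

Properties with no print meaning, such as "cursor" here, simply drop out.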
The solution also takes into account hidden items, those with the "display" attribute set to "none". These items are not extracted to the print file. Clicking this paragraph
I am a test paragraph!
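The handling of hidden items can be pictured as a filter over the element tree. The sketch below works on a simplified, hypothetical node shape (`tag`, `style`, `children`) rather than real DOM nodes, and is not taken from the library.

```javascript
// Hypothetical node shape: { tag, style: { display }, children: [] }.
// Returns a copy of the tree without nodes whose display is "none".
function removeHidden(node) {
  if (node.style && node.style.display === "none") {
    return null; // drop this node and everything below it
  }
  const children = (node.children || [])
    .map(removeHidden)
    .filter((child) => child !== null);
  return { ...node, children };
}

const tree = {
  tag: "div",
  children: [
    { tag: "p", style: { display: "block" }, children: [] },
    { tag: "p", style: { display: "none" }, children: [] },
  ],
};

const visibleTree = removeHidden(tree);
console.log(visibleTree.children.length);
```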
Note also that you can use attributes within the HTML that have no effect on the on-screen appearance of the page. For instance, you can use the css attribute "page-break-before" to start a new page.
This solution also supports various aspects of print media. The print media stylesheet is applied to the data before sending it to be processed and, as such, you can effect various style changes as well as inject page information this way. Some browsers fully support CSS3 print media @page attributes while others do not. However, the system was created so that you can either use @page or specify the page settings in code.
The following shows various block-level structures, some with standard HTML interpretation of the style and others with css styling applied to augment/change the formatting.
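One plausible way to carry "page-break-before" into XSL FO is to map it onto the "break-before" and "keep-with-previous" properties. The mapping below is our reading of how the css values line up with XSL FO, written as a JavaScript sketch; the function name is hypothetical and the shipped stylesheet may handle this differently.

```javascript
// Hypothetical mapping from css page-break-before values to XSL FO properties.
function pageBreakBeforeToFo(value) {
  switch (value) {
    case "always":
      return { "break-before": "page" };
    case "left":
      return { "break-before": "even-page" };
    case "right":
      return { "break-before": "odd-page" };
    case "avoid":
      return { "keep-with-previous": "always" };
    default:
      return {}; // "auto" and unknown values add nothing
  }
}

console.log(pageBreakBeforeToFo("always"));
```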
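For browsers without @page support, specifying the page in code could amount to building the equivalent rule from a settings object. This is a minimal sketch; the option names `width`, `height` and `margin` and the function `buildPageRule` are assumptions, not the library's documented options.

```javascript
// Hypothetical helper: build a css @page rule from page settings.
function buildPageRule({ width, height, margin }) {
  return `@page { size: ${width} ${height}; margin: ${margin}; }`;
}

const pageRule = buildPageRule({ width: "8.5in", height: "11in", margin: "1in" });
console.log(pageRule);
```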
This is a paragraph with some CSS styling applied
This is the standard <blockquote> element. It provides an indented look to the text. This is rarely used in the days of <div> elements, classes and css styling but it is supported.
This is the standard <pre> element. Again, most folks would use css to style output like this, but for old HTML compatibility we support the "pre" element.
Testing some inline elements. There are some elements still used like "b" for bold and "i" for italic. They can even be combined like bold italic underline. The more modern approach of using <span> with classes and css is also supported. A variety of other elements are supported, like the quote element, even superscripts and subscripts.
Tables: This section tests various tabular structures including borders, colors and spanning.
Heading 1 | Heading 2 | Heading 3 |
---|---|---|
Body Cell 1 | Body Cell 2 | Body Cell 3 |
Body Cell 1 | Body Cell 2 | Body Cell 3 |
Body Cell 1 | ||
Body Cell 1 | Body Cell 2 | Body Cell 3 |
Body Cell 1 | Body Cell 2 |
A fancier table with all CSS styling. Also note that this table implements "thead" and "tbody", which map to the appropriate XSL FO constructs so that the table header and table footer are repeated at a page break.
Company Header | Contact Header | Country Header |
---|---|---|
Company Footer | Contact Footer | Country Footer |
Alfreds Futterkiste | Maria Anders | Germany |
Berglunds snabbköp | Christina Berglund | Sweden |
Centro comercial Moctezuma | Francisco Chang | Mexico |
Ernst Handel | Roland Mendel | Austria |
Island Trading | Helen Bennett | UK |
Königlich Essen | Philip Cramer | Germany |
Laughing Bacchus Winecellars | Yoshi Tannamuri | Canada |
Magazzini Alimentari Riuniti | Giovanni Rovelli | Italy |
North/South | Simon Crowther | UK |
Paris spécialités | Marie Bertrand | France |
There are many list styles in HTML. Currently this solution supports the following:
Testing un-numbered lists, first a simple list
Another list with list-style-type setting in CSS
Now, nested lists
Testing the same lists as above, numbered, first a simple list
Another set of lists with list-style-type setting in CSS
Now, nested lists
A very common use of the list element in HTML is to provide specialty structures like breadcrumbs or navigation tabs. While most of these structures are likely not something you would want in the print output, we wanted to represent the HTML as closely as possible. These types of lists normally make use of the attribute "display" set to "inline" on the list tag.
We have attempted to take this into account, mapping a list HTML tag with "display:inline" differently than a normal list. This example actually is a set of lists with css applied that turns that list into a breadcrumb. It also shows some advanced concepts introduced later like creating links to internal destinations in the document.
Or maybe a cool effect like a set of tabs you can drop in the document to format and control the navigation.
Images in web pages can be linked via absolute or relative references or they can be embedded directly in the web page using a data-uri scheme. All of these methods are supported. In addition, modern browsers make significant use of SVG as a format that can be directly included in the page. This is why we selected XSL FO for back-end processing of the information. XSL FO, and specifically XEPOnline, supports SVG not by converting the SVG to an image, but by processing SVG to the output, retaining all the vector-based information.
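The three ways of referencing an image can be told apart from the "src" value alone. The sketch below shows one simple classification; the function name and the returned labels are made up for this example.

```javascript
// Hypothetical helper: classify an image "src" value.
function classifyImageSource(src) {
  if (/^data:/i.test(src)) return "data-uri";
  // A scheme such as "http:" or a protocol-relative "//host/..." is absolute.
  if (/^[a-z][a-z0-9+.-]*:/i.test(src) || src.startsWith("//")) return "absolute";
  return "relative";
}

console.log(classifyImageSource("images/chart.png"));
```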
This is a static SVG inserted directly in the HTML page.
This is a PNG inserted using "src" attribute of the "img" tag. When formatting, XEPOnline needs to be able to access the image in question so the path of the web page is sent using xml:base to provide XEPOnline with the ability to resolve the path to the image.
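The effect of sending the page path as a base can be reproduced with the standard `URL` constructor, which resolves a relative reference against a base URL. The example URLs are placeholders and `resolveImagePath` is a hypothetical name.

```javascript
// Resolve an image reference against the URL of the page that contains it.
function resolveImagePath(src, pageUrl) {
  return new URL(src, pageUrl).href;
}

const resolved = resolveImagePath("images/logo.png", "http://example.com/demo/page.html");
console.log(resolved);
```

An image reference that is already absolute comes back unchanged, so the same call works for both kinds of reference.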
Processing the <img> also requires special handling because HTML5 allows non-closed tags. Since XSL FO is an XML-based processing solution, the XEPOnline javascript handles this by creating valid XML first before submitting.
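A simplified picture of that clean-up step is a pass that self-closes void elements. The regular expression below is only a sketch: it covers a handful of tags, assumes no ">" appears inside attribute values, and is not the library's actual implementation.

```javascript
// Hypothetical helper: self-close a few void elements so the markup is
// closer to well-formed XML. Tags that are already closed are normalized.
function closeVoidTags(html) {
  return html.replace(/<(img|br|hr|input|meta|link)\b([^>]*?)\s*\/?>/gi, "<$1$2 />");
}

const closed = closeVoidTags('<p><img src="a.png"><br></p>');
console.log(closed);
```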
Take special care if using auto-scaling of images. If you desire an exact size of the image, it should be specified in the HTML or css. The width is carried through to the PDF. The following two images are one locally referenced on the submit website and the same image remotely referenced.
This image of a folder icon is embedded directly in the web page as base64-encoded data.
The following chart is dynamically generated using the Anychart JavaScript library. This shows that even page-based dynamic information can be sent to the engine after processing in its full SVG format. There is no pre-processing of this SVG to image format at any time. The full vector information is carried through to the PDF.
The following chart is dynamically generated using the d3 JavaScript library. You can even see that dynamic SVG is printed as it exists at any time. This sample is dynamic: click this paragraph and the pie chart data changes. The print file will represent what is on the screen at that time.
Hyperlinks can be carried into the output PDF.
This link should go to the XEPOnline web site.
Links can also point to internal destinations which are carried through to the PDF. Using many of the concepts above, you could create a list that is the document table of contents styled with css to add images and control appearance.
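In XSL FO, links are expressed with fo:basic-link, using "internal-destination" for targets inside the document and "external-destination" for URLs. The sketch below shows how an "href" might be split between the two; the function name is hypothetical and the real stylesheet does this in XSL.

```javascript
// Hypothetical helper: choose the fo:basic-link attribute for an href.
function linkToFoAttribute(href) {
  if (href.startsWith("#")) {
    // "#chapter1" points at the element whose id is "chapter1".
    return `internal-destination="${href.slice(1)}"`;
  }
  return `external-destination="url('${href}')"`;
}

console.log(linkToFoAttribute("#tables"));
console.log(linkToFoAttribute("http://example.com/"));
```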
Many current implementations in HTML make use of responsive designs. The challenge is to try and replicate this HTML into the printed page. The trigger here is the float element. We will start with an easy example, how about replicating a drop-cap.
Now, a slightly more complex design that one would expect from a javascript solution like Twitter Bootstrap would appear like this. We are directly writing the styles and not using Twitter Bootstrap for testing purposes.
This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column.
This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column.
This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column. This is a bunch of text in the column.
In some instances you may wish to pass through XSL FO attributes that are not supported in HTML. This paragraph is an example: while the text in the HTML has a brown color applied, we have applied a CMYK color for the PDF generation through the use of the "fostyle" attribute. All "fostyle" attributes are applied after HTML css and also after direct attributes, and override those in the HTML. This paragraph also has "text-align" justify in the HTML and font-stretch, font-size-adjust and hyphenate only in the PDF output. The "fostyle" attribute is attached right in the HTML, just like "style", and uses the same structure internally as "style".
Since HTML supports arbitrary XML inside the markup, one of the easiest ways to do some simple footnotes is to simply use the regular XSL FO markup:
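The override order described here (css first, then direct attributes, then "fostyle") behaves like merging three property maps where later entries win. The sketch below illustrates that order only; the function name and the sample values, including the stand-in color, are assumptions.

```javascript
// Later sources override earlier ones: css, then direct attributes, then fostyle.
function mergeStyles(cssStyles, directAttributes, foStyles) {
  return { ...cssStyles, ...directAttributes, ...foStyles };
}

const mergedStyles = mergeStyles(
  { color: "brown", "text-align": "justify" },
  {},
  // "black" is only a placeholder for the print-only color value.
  { color: "black", hyphenate: "true" }
);
console.log(mergedStyles);
```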
<footnote>
  <sup>your character here for inside the HTML</sup>
  <footnote-body>
    <block>
      <sup>optional footnote char here for inside print footnote</sup>
      your inline HTML markup here for footnote formatting in PDF. Only inline styles (b, i, span) are supported in the css provided for footnotes. Of course, you can implement your own.
    </block>
  </footnote-body>
</footnote>
With today's browser technology, you can use css to style any content you wish. This includes any generic XML inserted into the HTML file. For example, consider that you just wish to place the sales data through some dynamic process into a table. If you examine the source HTML for this page here, you will see only XML. It is 100% styled with external css in the file named "xmlsamp.css".
And this table is formatted according to the css styling, even carried through to the PDF. You do not have to change XML tags; you only indicate their style as "table" or "table-row" or "block" or "inline" in the css and XEPOnline will format the generic XML according to the css styling to print.
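Conceptually, this comes down to choosing an XSL FO element from the css "display" value rather than from the tag name. The lookup below is a sketch of that idea; the "table-cell" entry and the function name are our additions for illustration, and a complete mapping would also need to supply the fo:table-body that XSL FO requires around rows.

```javascript
// Hypothetical lookup from css display values to XSL FO element names.
const DISPLAY_TO_FO = new Map([
  ["block", "fo:block"],
  ["inline", "fo:inline"],
  ["table", "fo:table"],
  ["table-row", "fo:table-row"],
  ["table-cell", "fo:table-cell"],
]);

// Returns the FO element name, or null when the display value is not mapped.
function foElementForDisplay(display) {
  return DISPLAY_TO_FO.has(display) ? DISPLAY_TO_FO.get(display) : null;
}

console.log(foElementForDisplay("table-row"));
```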
We've done the implementation to support @media print. Would love to implement page-templates (first, last, left/right). Only Chrome supports @media print @page directives though. Guess that will need to wait a bit.
Next, a little clean-up on the conversion to XSL FO; there are some issues pointed out in this document. There is certainly more work to do here, as we probably did not think of everything.
After that, well we can do form fields ... a fillable HTML form to a PDF fillable form ... that would be cool.