This document serves to test various sample HTML elements and their representation in the PDF.
We tried to address many of the elements missing from the web page print solutions we found. Many of those solutions merely convert the HTML sent to the browser, not the content of the browser DOM. We wanted to be able to process live content that could vary based on user interaction.
We found almost all solutions lacking in SVG support. With them you cannot actually inject the SVG into the print document; you can only insert a raster image of the SVG. We wanted high-resolution, vector-based SVG content. We also wanted to leverage formatting technology far superior to a "PDF" writer or a browser, neither of which is designed as a print composition engine.
We also found many situations where we wanted the flexibility of just having some XML content in the page for special formatting. You can easily extend this solution to format any XML markup in your page, not only HTML.
Push the "Print It!" button, get your result and keep reading.
Throughout this document you will also see print buttons like this one on certain headings. They demonstrate using the code to print a single <div>. There are also checkboxes next to these headings; you can select several of them and use the "Print Selected!" button on the top bar to generate a document with multiple sections, one for each checked div. These are all demonstrations of the power of the JavaScript library and the XEPOnline backend for generating PDFs.
The XEPOnline JavaScript library extracts the content of a named <div> element in the HTML page. It processes that <div> and embeds all css-based styling into the HTML. You can use the library to generate a PDF of any <div>, including those with dynamically generated content, because it processes the browser DOM at the time of execution. In other words, this is not some canned HTML to print; it is the current HTML to print.
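The extraction step can be pictured with a small Python sketch. The real library does this in JavaScript against the live browser DOM and also inlines the computed css, which is not shown here; this sketch assumes the page is already available as namespace-free, well-formed XHTML, and the function name `extract_div` and the id `printme` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def extract_div(xhtml: str, div_id: str) -> str:
    """Return the serialized <div> whose id attribute equals div_id."""
    root = ET.fromstring(xhtml)
    # ElementTree's XPath subset supports [@attr='value'] predicates.
    div = root.find(f".//div[@id='{div_id}']")
    if div is None:
        raise KeyError(f"no <div id={div_id!r}> in document")
    # Copy so the caller's tree is untouched, and drop the tail text that
    # follows </div>; that text belongs to the parent, not to the div.
    div = copy.deepcopy(div)
    div.tail = None
    return ET.tostring(div, encoding="unicode")
```

Only the selected subtree is serialized, so navigation and other page chrome outside the named <div> never reaches the formatter.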
We'll cover advanced use concepts in the future, but the whole system is extensible and leverages XSL FO technology for print file generation while maintaining ease of styling via css. The extensibility allows you to customize output in many ways, to extend and expand upon a simple File->Print operation. Because it uses XSL FO at the core, one can certainly do more than generate PDF. You could generate PostScript, AFP or XPS print files if desired.
Because HTML is not XML, the solution does some additional processing to ensure a well-formed XML document is exported. That well-formed XML document is generated and sent to the XEPOnline formatter via REST with a reference to a specific XSL stylesheet for processing the HTML tag content to create XSL FO.
XEPOnline accepts the REST request, attempts to format the document with RenderX XEP and returns the result. There are several options for returning the data, from initiating a download to base64-encoding the result and inserting it into the document.
The XSL that transforms incoming (X)HTML into XSL FO is written in XSLT 1.0. Most modern web environments use very few tags and control appearance through headings, divs, imgs and spans styled with css. The template supports not only these core elements but also many legacy structures. The following HTML structures are currently supported:
The new HTML5 tags <section> and <article> are only supported as block elements, not as separate page-generating elements, at this time. One <header> and one <footer> element are allowed inside the printable <div> and are used for the printed document header and footer.
A selected list of appearance styles is passed in css format to the XSL stylesheet, which parses and interprets the css. There are some special considerations in the XSL to handle differences between HTML css and XSL FO style attributes. Every attempt is made to keep the XSL FO and resulting PDF output as close as possible to the css/browser representation.
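When the result is returned base64-encoded, inserting it into the page amounts to wrapping the bytes in a data: URI that an <iframe> or link can point at. The Python sketch below shows only that wrapping and its inverse; how XEPOnline actually packages its response is not described here, so both function names are hypothetical.

```python
import base64


def pdf_to_data_uri(pdf_bytes: bytes) -> str:
    """Wrap raw PDF bytes in a data: URI suitable for an <iframe src> or <a href>."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return f"data:application/pdf;base64,{encoded}"


def data_uri_to_pdf(uri: str) -> bytes:
    """Inverse of pdf_to_data_uri, e.g. to save an inserted result to disk."""
    prefix = "data:application/pdf;base64,"
    if not uri.startswith(prefix):
        raise ValueError("not a base64 PDF data URI")
    return base64.b64decode(uri[len(prefix):])
```

Base64 inflates the payload by roughly a third, which is one reason a direct download can be the better option for large documents.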
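In XSL FO terms, a printed header and footer are fo:static-content assigned to the xsl-region-before and xsl-region-after regions of a page master, while the rest of the content goes into the fo:flow for xsl-region-body. The Python sketch below illustrates that split for a printable <div>. It is not the project's XSLT, it flattens every child to plain text, and the page-master name "page" and the helper names are hypothetical; the page master itself, which must declare those regions, is not shown.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"


def fo(tag: str, **attrs: str) -> ET.Element:
    """Create an fo:* element; Python keyword names use '_' where FO uses '-'."""
    return ET.Element(f"{{{FO}}}{tag}", {k.replace("_", "-"): v for k, v in attrs.items()})


def text_block(element: ET.Element) -> ET.Element:
    # Simplification: flatten the element to plain text in a single fo:block.
    block = fo("block")
    block.text = "".join(element.itertext())
    return block


def page_sequence(div: ET.Element, master: str = "page") -> ET.Element:
    headers, footers = div.findall("header"), div.findall("footer")
    if len(headers) > 1 or len(footers) > 1:
        raise ValueError("only one <header> and one <footer> are allowed")
    seq = fo("page-sequence", master_reference=master)
    # XSL FO requires every fo:static-content to precede the single fo:flow.
    for found, region in ((headers, "xsl-region-before"), (footers, "xsl-region-after")):
        if found:
            static = fo("static-content", flow_name=region)
            static.append(text_block(found[0]))
            seq.append(static)
    flow = fo("flow", flow_name="xsl-region-body")
    for child in div:
        if child.tag not in ("header", "footer"):
            flow.append(text_block(child))
    seq.append(flow)
    return seq
```

Because static content is repeated by the formatter on every page of the sequence, the single <header> and <footer> naturally become running page headers and footers.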
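A minimal Python sketch of the parse-and-interpret step, assuming a hypothetical pass-through list of properties whose css and XSL FO names coincide; the real stylesheet's list and special cases are not documented here. A property such as page-break-before shows the kind of special handling needed, since css `always` corresponds to FO `break-before="page"`. The naive split on ";" would break on values that themselves contain semicolons, such as `url(data:...;base64,...)`.

```python
# Hypothetical pass-through subset; the actual list used by the XSL may differ.
PASS_THROUGH = {
    "color", "background-color", "font-family", "font-size",
    "font-style", "font-weight", "text-align", "text-decoration",
    "margin-top", "margin-bottom", "margin-left", "margin-right",
}


def parse_style(style: str) -> dict[str, str]:
    """Split an inline declaration string such as 'color: red; font-size: 12pt'."""
    declarations = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep and name.strip():
            declarations[name.strip().lower()] = value.strip()
    return declarations


def css_to_fo(style: str) -> dict[str, str]:
    """Translate css declarations into XSL FO attribute name/value pairs."""
    fo_attrs = {}
    for name, value in parse_style(style).items():
        if name == "page-break-before":
            # css 'always' corresponds to FO's break-before="page".
            if value == "always":
                fo_attrs["break-before"] = "page"
        elif name in PASS_THROUGH:
            fo_attrs[name] = value
        # Anything else (cursor, transition, ...) has no print meaning and is dropped.
    return fo_attrs
```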
The solution also takes into account hidden items, those with the "display" property set to "none". These items are not extracted to the print file. Clicking this paragraph
I am a test paragraph!
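Leaving hidden items out of the print file can be sketched in Python as a tree walk that drops any element whose inline style sets display to none. This is an illustration, not the library's code: it only inspects the inline "style" attribute, whereas a browser-side version would consult the computed style (for example via `getComputedStyle`) to also catch elements hidden by stylesheet rules.

```python
import re
import xml.etree.ElementTree as ET

# Matches display:none (optionally !important) inside an inline style attribute.
HIDDEN = re.compile(r"(?:^|;)\s*display\s*:\s*none\s*(?:!important\s*)?(?:;|$)", re.IGNORECASE)


def strip_hidden(root: ET.Element) -> ET.Element:
    """Remove every descendant whose inline style hides it, keeping surrounding text."""
    for parent in list(root.iter()):
        # Walk children backwards so indices stay valid while removing.
        for index, child in reversed(list(enumerate(parent))):
            if not HIDDEN.search(child.get("style", "")):
                continue
            # The text after the hidden element (its tail) is still visible,
            # so reattach it to the previous sibling or to the parent.
            tail = child.tail or ""
            if index > 0:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + tail
            else:
                parent.text = (parent.text or "") + tail
            parent.remove(child)
    return root
```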
Note also that you can use attributes that have no effect on the on-screen appearance of the page. For instance, you can use the css property "page-break-before" to start a new page.
This solution also supports various aspects of print media. The print media stylesheet is applied to the data before it is sent for processing, so you can make various style changes as well as inject page information this way. Some browsers fully support CSS3 print media @page rules while others do not. However, the system was designed so that you can either use @page or specify page settings in code.
The following shows various block-level structures, some with the standard HTML interpretation of the style and others with css styling applied to augment or change the formatting.
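Whether it comes from @page or from code, page geometry ultimately ends up on an XSL FO fo:simple-page-master. The Python sketch below maps a css @page `size` value (named size, orientation, or explicit lengths) to page-width and page-height. The function name and the letter-size fallback are hypothetical, and the library's own code-based page options are not described here.

```python
# Named sizes from the css Paged Media spec, portrait orientation (width, height).
PAGE_SIZES = {
    "a5": ("148mm", "210mm"),
    "a4": ("210mm", "297mm"),
    "a3": ("297mm", "420mm"),
    "letter": ("8.5in", "11in"),
    "legal": ("8.5in", "14in"),
}
DEFAULT_SIZE = "letter"  # assumption: fallback when @page gives no usable size


def page_master_attrs(size: str = "auto", margin: str = "") -> dict[str, str]:
    """Turn an @page 'size' (and optional 'margin') into fo:simple-page-master attributes."""
    tokens = size.lower().split()
    landscape = "landscape" in tokens
    named = [t for t in tokens if t in PAGE_SIZES]
    lengths = [t for t in tokens if t not in PAGE_SIZES and t not in ("auto", "portrait", "landscape")]
    if named:
        width, height = PAGE_SIZES[named[0]]
    elif len(lengths) == 2:
        width, height = lengths
    elif len(lengths) == 1:
        width = height = lengths[0]  # one length means a square page
    else:
        width, height = PAGE_SIZES[DEFAULT_SIZE]
    if landscape:
        width, height = height, width
    attrs = {"page-width": width, "page-height": height}
    if margin:
        attrs["margin"] = margin
    return attrs
```

For example, `@page { size: A4 landscape; margin: 1in }` would yield a 297mm by 210mm page master with 1in margins.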
This is a paragraph with some CSS styling applied
This is the standard <blockquote> element. It provides an indented look to the text. It is rarely used in these days of <div> elements, classes and css styling, but it is supported.
This is the standard <pre> element. Again, most folks would use css to style output like this, but for old HTML compatibility we support the "pre" element.
To apply absolute sizes (width or height), you need to use floats. The following set of examples shows float styles applied to <div> elements.
Even stackable floats
Testing some inline elements. Some elements are still used, like "b" for bold and "i" for italic. They can even be combined, like bold italic underline. The more modern approach of using <span> with classes and css is also supported. A variety of other elements are supported, like the quote element, and even superscripts and subscripts.
Tables: This section tests various tabular structures including borders, colors and spanning.
Heading 1 | Heading 2 | Heading 3 |
---|---|---|
Body Cell 1 | Body Cell 2 | Body Cell 3 |
Body Cell 1 | Body Cell 2 | Body Cell 3 |
Body Cell 1 | ||
Body Cell 1 | Body Cell 2 | Body Cell 3 |
Body Cell 1 | Body Cell 2 |
A fancier table with all CSS styling. Also note that this table implements "thead", "tfoot" and "tbody", which map to the appropriate XSL FO constructs so that the table header and footer are repeated at page breaks.
Company Header | Contact Header | Country Header |
---|---|---|
Alfreds Futterkiste | Maria Anders | Germany |
Berglunds snabbköp | Christina Berglund | Sweden |
Centro comercial Moctezuma | Francisco Chang | Mexico |
Ernst Handel | Roland Mendel | Austria |
Island Trading | Helen Bennett | UK |
Königlich Essen | Philip Cramer | Germany |
Laughing Bacchus Winecellars | Yoshi Tannamuri | Canada |
Magazzini Alimentari Riuniti | Giovanni Rovelli | Italy |
North/South | Simon Crowther | UK |
Paris spécialités | Marie Bertrand | France |
Company Footer | Contact Footer | Country Footer |
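The mapping behind the repeating header and footer can be sketched in Python: <thead>, <tfoot> and <tbody> become fo:table-header, fo:table-footer and fo:table-body, and XSL FO requires them in that order regardless of where <tfoot> appears in the HTML. This is an illustrative converter, not the project's XSLT; it flattens cell content to text and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
SPANS = (("colspan", "number-columns-spanned"), ("rowspan", "number-rows-spanned"))


def fo(tag: str) -> ET.Element:
    return ET.Element(f"{{{FO}}}{tag}")


def convert_rows(section: ET.Element, target: ET.Element) -> None:
    """Copy <tr>/<td>/<th> rows into fo:table-row/fo:table-cell elements."""
    for tr in section.findall("tr"):
        row = ET.SubElement(target, f"{{{FO}}}table-row")
        for cell in tr:
            if cell.tag not in ("td", "th"):
                continue
            fo_cell = ET.SubElement(row, f"{{{FO}}}table-cell")
            for html_attr, fo_attr in SPANS:
                if cell.get(html_attr):
                    fo_cell.set(fo_attr, cell.get(html_attr))
            # fo:table-cell may only contain block-level objects.
            block = ET.SubElement(fo_cell, f"{{{FO}}}block")
            block.text = "".join(cell.itertext())


def html_table_to_fo(table: ET.Element) -> ET.Element:
    header = footer = None
    bodies = []
    for section in table:
        if section.tag == "thead":
            header = fo("table-header")
            convert_rows(section, header)
        elif section.tag == "tfoot":
            footer = fo("table-footer")
            convert_rows(section, footer)
        elif section.tag == "tbody":
            body = fo("table-body")
            convert_rows(section, body)
            bodies.append(body)
    fo_table = fo("table")
    # FO fixes the order header, footer, body(s), whatever the HTML source order.
    for part in [header, footer, *bodies]:
        if part is not None:
            fo_table.append(part)
    return fo_table
```

By default the formatter repeats fo:table-header and fo:table-footer on every page the table spans, unless table-omit-header-at-break or table-omit-footer-at-break is set.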
There are many list styles in HTML. Currently this solution supports the following:
Testing un-numbered lists, first a simple list
Another list with list-style-type setting in CSS
Now, nested lists
Testing the same lists as above, numbered, first a simple list
Another set of lists with list-style-type setting in CSS
Now, nested lists
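In XSL FO each list item becomes an fo:list-item with an explicit fo:list-item-label, so the converter has to produce the marker text itself. A Python sketch of generating labels for common list-style-type values; the bullet glyphs and the trailing period are assumptions about appearance, not the project's actual output.

```python
ROMAN = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"), (90, "xc"),
    (50, "l"), (40, "xl"), (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]
# Approximate glyphs for the css bullet styles (an assumption about appearance).
BULLETS = {"disc": "\u2022", "circle": "\u25e6", "square": "\u25aa", "none": ""}


def to_roman(n: int) -> str:
    out = []
    for value, numeral in ROMAN:
        count, n = divmod(n, value)
        out.append(numeral * count)
    return "".join(out)


def to_alpha(n: int) -> str:
    """Bijective base-26: 1 -> a, 26 -> z, 27 -> aa."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return letters


def list_label(style: str, n: int) -> str:
    """Text for the fo:list-item-label of the n-th item (1-based)."""
    if style in BULLETS:
        return BULLETS[style]
    formatters = {
        "decimal": str,
        "lower-alpha": to_alpha,
        "upper-alpha": lambda k: to_alpha(k).upper(),
        "lower-roman": to_roman,
        "upper-roman": lambda k: to_roman(k).upper(),
    }
    if style not in formatters:
        raise ValueError(f"unsupported list-style-type: {style}")
    return formatters[style](n) + "."
```

Nested lists simply restart the count at 1 with the nested list's own style.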
A very common use of the list element in HTML is to provide specialty structures like breadcrumbs or navigation tabs. While most of these structures are likely not something you would want in the print output, we wanted to represent the HTML as closely as possible. These types of lists normally set the "display" property to "inline" on the list tag.
We have attempted to take this into account, mapping a list HTML tag with "display:inline" differently than a normal list. This example actually is a set of lists with css applied that turns that list into a breadcrumb. It also shows some advanced concepts introduced later like creating links to internal destinations in the document.
Or maybe a cool effect like a set of tabs you can drop in the document to format and control the navigation.
Images in web pages can be linked via absolute or relative references, or they can be embedded directly in the web page using the data URI scheme. All of these methods are supported. In addition, modern browsers make significant use of SVG as a format that can be included directly in the page. This is why we selected XSL FO for back-end processing of the information. XSL FO, and specifically XEPOnline, supports SVG not by converting the SVG to an image but by processing the SVG into the output, retaining all the vector-based information.
This is a static SVG inserted directly in the HTML page.
This is a PNG inserted using "src" attribute of the "img" tag. When formatting, XEPOnline needs to be able to access the image in question so the path of the web page is sent using xml:base to provide XEPOnline with the ability to resolve the path to the image.
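Resolving an image path against xml:base works like ordinary relative URL resolution. The Python sketch below records the page address as xml:base on the exported root and resolves relative, root-relative and absolute `src` values against it with the standard library's `urljoin`. The example.com URLs and the helper names are hypothetical; the actual attribute is set by the JavaScript library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree serializes this namespace with the reserved "xml" prefix.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def set_xml_base(root: ET.Element, page_url: str) -> None:
    """Record the page's address so relative paths can be resolved later."""
    root.set(XML_BASE, page_url)


def resolve_src(base: str, src: str) -> str:
    """Resolve an <img src> against xml:base the way a formatter would."""
    return urljoin(base, src)
```

Absolute URLs and data: URIs pass through unchanged, which is why the remote and embedded images need no special treatment.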
Processing the <img> also requires special handling because HTML5 allows void elements such as <img> to be left unclosed. Since XSL FO is an XML-based processing solution, the XEPOnline JavaScript handles this by creating valid XML first, before submitting.
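Turning HTML5 void elements such as <img> and <br> into self-closing XML tags can be sketched in Python with the standard library's `html.parser`. This illustrates the idea only and is not the XEPOnline JavaScript: it does not repair other HTML shortcuts such as implicitly closed <p> or <li> elements, and it drops comments and the doctype. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class XmlWriter(HTMLParser):
    """Re-serialize parsed HTML as well-formed XML."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def _start(self, tag: str, attrs, close: bool) -> None:
        # XML needs a value for every attribute, so <input checked> gets checked="checked".
        parts = [tag] + [f"{name}={quoteattr(name if value is None else value)}" for name, value in attrs]
        self.out.append("<" + " ".join(parts) + ("/>" if close else ">"))

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, close=tag in VOID)

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, close=True)

    def handle_endtag(self, tag):
        if tag not in VOID:  # void elements were already closed at their start tag
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, < and > for XML


def to_xml(html: str) -> str:
    writer = XmlWriter()
    writer.feed(html)
    writer.close()
    return "".join(writer.out)
```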
Take special care if using auto-scaling of images. If you want an exact image size, specify it in the HTML or css; the width is carried through to the PDF. The following two images are the same image, one referenced locally on the submitting website and one referenced remotely.
This image of a folder icon is embedded directly in the web page as base64-encoded data.
The following chart is dynamically generated using the AnyChart JavaScript library. This shows that even dynamic page content can be sent to the engine after processing, in its full SVG format. At no point is this SVG pre-processed into an image format; the full vector information is carried through to the PDF.
The following chart is dynamically generated using the d3 JavaScript library. It shows that dynamic SVG is printed as it exists at that moment: this sample is dynamic, so click this paragraph and the pie chart data changes. The print file will represent what is on the screen at that time.
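An embedded image is just a data: URI whose payload has to be decoded before it can be placed on the page. A minimal Python sketch of splitting such a URI into its media type and raw bytes, following the RFC 2397 layout (`data:[<mediatype>][;base64],<data>`); the function name is hypothetical and it ignores parameters such as charset.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a data: URI into (media type, raw bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, comma, payload = uri[len("data:"):].partition(",")
    if not comma:
        raise ValueError("data: URI has no ',' separator")
    params = header.split(";")
    # RFC 2397: the media type defaults to text/plain when omitted.
    media_type = params[0] or "text/plain"
    if params[-1].lower() == "base64":
        return media_type, base64.b64decode(payload)
    # Without ;base64 the payload is percent-encoded.
    return media_type, unquote_to_bytes(payload)
```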
Hyperlinks can be carried into the output PDF.
This link should go to the XEPOnline web site.
Links can also point to internal destinations which are carried through to the PDF. Using many of the concepts above, you could create a list that is the document table of contents styled with css to add images and control appearance.
Many current HTML implementations use responsive designs. The challenge is to replicate this HTML on the printed page. The key here is the css float property. We will start with an easy example: replicating a drop cap.
Now, a slightly more complex design, of the kind you would expect from a JavaScript framework like Twitter Bootstrap, would appear like this. For testing purposes, we write the styles directly rather than using Twitter Bootstrap.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas at hendrerit eros. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus turpis diam, pellentesque vitae interdum nec, laoreet in neque. Curabitur pretium, nunc sed suscipit blandit, ligula magna adipiscing mauris, rhoncus suscipit enim purus et dolor. Duis felis erat, posuere in orci a, tempor convallis libero. Fusce ornare fringilla nunc. Quisque sodales malesuada tellus. Vestibulum ac elit non nulla placerat tempus. Mauris eget cursus metus. Mauris dignissim mi in nulla pulvinar, id pellentesque odio dapibus. Aenean blandit rutrum blandit. Etiam ac pulvinar ligula. Nulla laoreet eget augue id luctus.
Suspendisse ornare pellentesque lacus ut volutpat. Nunc sit amet tellus justo. Phasellus ut pellentesque augue. Integer sollicitudin felis eget neque porttitor elementum at sit amet libero. Ut blandit enim ac quam porttitor dapibus. Morbi viverra hendrerit libero, et commodo felis porttitor ut. Integer vulputate, mi eu auctor imperdiet, libero velit interdum metus, eget auctor mauris mi a urna. Aenean feugiat, purus ut ullamcorper fermentum, nisl lorem adipiscing dui, tempor semper quam ligula ac est. Vestibulum ante erat, laoreet non eros ut, iaculis gravida augue.
Donec sollicitudin felis in posuere tempus. Donec enim neque, viverra at sem vitae, elementum faucibus diam. In at eleifend orci. Fusce posuere facilisis nisl, vitae iaculis magna tristique eu. Nam ut est vel dolor consequat bibendum. Sed vitae eros massa. Phasellus scelerisque dolor et sapien elementum luctus. Phasellus libero tellus, pharetra ornare auctor in, aliquam eu lectus. Pellentesque hendrerit est massa, quis fermentum leo tempus vulputate. Suspendisse potenti. Duis sed ultrices eros. Curabitur et tempus augue, in egestas odio. Mauris malesuada, est vel lobortis egestas, nunc dui aliquam erat, non ultricies dui ipsum eu turpis. Vivamus ornare non nulla non dapibus.
This section when printed alone or printed through selecting individual sections shows that you can inject a background image behind the page. This is accomplished using a "page-background" attribute as part of the <header> tag for this section.
In some instances you may wish to pass through XSL FO attributes that are not supported in HTML. In this example, while the text in the HTML has a brown color applied, we apply a CMYK color for PDF generation through the "fostyle" attribute. All "fostyle" attributes are applied after HTML css and after direct attributes, and override those in the HTML. This paragraph also has "text-align" set to justify in the HTML, and font-stretch, font-size-adjust and hyphenate set only for the PDF output. The "fostyle" attribute is attached right in the HTML, just like "style", and uses the same internal structure as "style".
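The precedence rule for "fostyle" can be sketched in Python as layered dictionaries where later layers win. Only "fostyle is applied last" comes from this document; placing direct attributes below the css "style" attribute is an assumption (it matches how browsers treat presentational attributes), and the function names are hypothetical. Direct attributes are assumed to be already translated to property names.

```python
def parse_style(style: str) -> dict[str, str]:
    """Split 'name: value; name: value' into a dict (naive: no ';' inside values)."""
    declarations = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep and name.strip():
            declarations[name.strip().lower()] = value.strip()
    return declarations


def effective_properties(direct: dict[str, str], style: str, fostyle: str) -> dict[str, str]:
    """Merge property layers; later layers win, so fostyle overrides everything."""
    merged = dict(direct)                # presentational attributes, already mapped to property names
    merged.update(parse_style(style))    # the HTML "style" attribute
    merged.update(parse_style(fostyle))  # PDF-only values from "fostyle", applied last
    return merged
```

For example, a paragraph with a brown css color and a CMYK color in "fostyle" ends up with the "fostyle" color in the PDF, while PDF-only properties such as hyphenate are simply added.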
Since browsers accept arbitrary XML elements inside HTML markup, one of the easiest ways to create simple footnotes is to use the regular XSL FO markup:
<footnote> <sup>your character here for inside the HTML</sup> <footnote-body> <block> <sup>optional footnote char here for inside print footnote</sup> your inline HTML markup here for footnote formatting in PDF. Only inline styles (b, i, span) are supported in the css provided for footnotes. Of course, you can implement your own. </block> </footnote-body> </footnote>
With today's browser technology, you can use css to style any content you wish. This includes any generic XML inserted into the HTML file. For example, consider that you just wish to place the sales data through some dynamic process into a table. If you examine the source HTML for this page here, you will see only XML. It is 100% styled with external css in the file named "xmlsamp.css".
And this table is formatted according to the css styling, which is even carried through to the PDF. You do not have to change the XML tags; you only declare their display as "table", "table-row", "block" or "inline" in the css, and XEPOnline will format the generic XML according to that css styling for print.
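Formatting generic XML by its css display value can be sketched as a mapping from display values to XSL FO formatting objects. In this Python sketch the element-to-display table is passed in directly rather than parsed from a stylesheet such as xmlsamp.css, the element names in the usage are hypothetical, and the anonymous table boxes css would generate (for example, rows placed directly inside a table) are not created.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"

# css display value -> XSL FO formatting object.
DISPLAY_TO_FO = {
    "block": "block",
    "inline": "inline",
    "table": "table",
    "table-header-group": "table-header",
    "table-footer-group": "table-footer",
    "table-row-group": "table-body",
    "table-row": "table-row",
    "table-cell": "table-cell",
}
# FO objects that may not contain text directly.
STRUCTURAL = {"table", "table-header", "table-footer", "table-body", "table-row"}


def to_fo(elem: ET.Element, display: dict[str, str]) -> ET.Element:
    """Convert generic XML to FO using each tag's css display value (css default: inline)."""
    kind = DISPLAY_TO_FO.get(display.get(elem.tag, "inline"), "inline")
    out = ET.Element(f"{{{FO}}}{kind}")
    # fo:table-cell needs block-level content, so text goes into an inner fo:block.
    holder = ET.SubElement(out, f"{{{FO}}}block") if kind == "table-cell" else out
    keep_text = kind not in STRUCTURAL
    if keep_text:
        holder.text = elem.text
    for child in elem:
        fo_child = to_fo(child, display)
        if keep_text:
            fo_child.tail = child.tail
        holder.append(fo_child)
    return out
```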
You can implement continued-headers using the "fostyle" attribute. You want to mark the actual header with a special property that omits it from the output, then use the first row to implement the header you want at the start of the table.
For the HTML to appear correctly on screen, hide the <thead> element and use @media print rules to display it only when printing.
Company (continued) | Contact (continued) | Country (continued) |
---|---|---|
Company Header | Contact Header | Country Header |
Alfreds Futterkiste | Maria Anders | Germany |
Berglunds snabbköp | Christina Berglund | Sweden |
Centro comercial Moctezuma | Francisco Chang | Mexico |
Ernst Handel | Roland Mendel | Austria |
Island Trading | Helen Bennett | UK |
Königlich Essen | Philip Cramer | Germany |
Laughing Bacchus Winecellars | Yoshi Tannamuri | Canada |
Magazzini Alimentari Riuniti | Giovanni Rovelli | Italy |
North/South | Simon Crowther | UK |
Paris spécialités | Marie Bertrand | France |
Alfreds Futterkiste | Maria Anders | Germany |
Berglunds snabbköp | Christina Berglund | Sweden |
Centro comercial Moctezuma | Francisco Chang | Mexico |
Ernst Handel | Roland Mendel | Austria |
Island Trading | Helen Bennett | UK |
Königlich Essen | Philip Cramer | Germany |
Laughing Bacchus Winecellars | Yoshi Tannamuri | Canada |
Magazzini Alimentari Riuniti | Giovanni Rovelli | Italy |
Alfreds Futterkiste | Maria Anders | Germany |
Berglunds snabbköp | Christina Berglund | Sweden |
Centro comercial Moctezuma | Francisco Chang | Mexico |
Ernst Handel | Roland Mendel | Austria |
Island Trading | Helen Bennett | UK |
Königlich Essen | Philip Cramer | Germany |
Laughing Bacchus Winecellars | Yoshi Tannamuri | Canada |
Magazzini Alimentari Riuniti | Giovanni Rovelli | Italy |
This is our next step: providing all the needed fonts and language support. This section is in development, so if you see something not working, lend a hand! We expect to hammer it out in a few weeks' time.
Sometimes a cigar is just a cigar.
Joskus sikari on pelkkä sikari.
לפעמים סיגר זה רק סיגר
Kartais cigaras tėra cigaras.
Af un to is en Zigarr eenfach en Zigarr.
He smoked a cigar after lunch.
Efter middagsmaden røg han en cigar.
Post la tagmanĝo li fumis cigaron.
Él se fumó un cigarro después de almorzar.
O, öğle yemeğinden sonra bir puro içti.
I watched a ring of smoke that floated from his cigar into the air.
Obserwowałem kółka z dymu, unoszące się w powietrze z jego cygara.
我看著一輪煙圈從他的雪茄裏冒出來,飄到了空氣中。
私は彼の葉巻から煙の輪空中に漂っていくのをじっと見つめていた。
To the man who only has a hammer in the toolkit, every problem looks like a nail.
للرجل الذي ليس عنده إلا مطرقة في طقم أدواته، تبدو كل مشكلة كالمسمار.
对工具箱里只有一把榔头的人来说,所有的问题都像钉子。
Pour celui qui n'a qu'un marteau dans sa trousse à outils, tout problème ressemble à un clou.
Für den, der nur einen Hammer im Werkzeugkasten hat, sieht jedes Problem wie ein Nagel aus.
För den, de bloot en Hamer in ’e Warktüügkist hett, seht all Probleems as Nagels ut.
Для человека, у которого есть только молоток в ящике с инструментами, любая проблема похожа на гвоздь.
Para el hombre que solo tiene un martillo en su caja de herramientas, todos los problemas parecen clavos.
Temu, kto w skrzynce na narzędzia ma tylko młotek, każdy problem wygląda jak gwóźdź.
道具箱に金槌しか入っていない者にとっては、あらゆる問題が釘のように見える。
I know a good lawyer who can help you.
我認識一個不錯的律師,他可以幫你。
Ich kenne einen guten Anwalt, der dir helfen kann.
Je connais un bon avocat qui peut t'aider.
당신을 도와줄 수 있는 괜찮은 변호사를 알고 있습니다.
Eu conheço um bom advogado que pode te ajudar.
Я знаю хорошего адвоката, который может тебе помочь.
We have implemented support for @media print. We would love to implement page templates (first, last, left/right), but only Chrome supports @page directives in print media, so that will need to wait a bit.
Next comes a little clean-up of the conversion to XSL FO; some issues are pointed out in this document. There is certainly more work to do here, and we probably did not think of everything.
After that, well we can do form fields ... a fillable HTML form to a PDF fillable form ... that would be cool.