In the cold winter of 2008, we were approached by a technical lead at Royal Caribbean Cruise Lines (“RCCL”), regarding the topic of cruise booklet production. Sajan Chacko had seen our Silicon Paginator product (based on Adobe InDesign Server), and he inquired by email:
“Based on an XML stream, can it do the following:
1. Get data from external file-paths specified in XML
2. Reference external sources such as HTML, RTF, or Word?”
This is geek-speak for “does your InDesign-based platform interface with the formats we’re currently dealing with?” And of course the answer was a resounding “Yes.”
This is the sort of inquiry that brightens our days here at Silicon Publishing. We have lived and breathed this sort of technical work for decades, and whatever the year, we’re using the very best technology to make it even more efficient.
2008 was no exception. I could tell immediately that Sajan would be happy with InDesign Server as the core of a solution, because we had already mastered the art of just this sort of workflow.
The Challenge
While InDesign Server is focused on the output of documents through automation, the challenge that RCCL faced actually extended upstream into the larger-scale question of “how is content managed?” They were re-inventing the entire process of creating and managing the textual content of their cruise booklets (though some data sources were to remain unchanged), so it is not as if our work were limited to the InDesign Server dimension.
RCCL came with a very strong technical team (Sajan turned out to be a brilliant architect/coder), with the ambition to completely own the application in terms of ongoing changes and maintenance. This is not uncommon in our work; we have never sought dependency relationships, but instead thrive on empowering our customers. So before we defined the architecture of the solution, we had to decide which organization should take on which part of the implementation: who was responsible for what, and critically, what the specifications were for handoff points between application components. We knew that with clarity around such points of interface (for example an agreed-to schema for any data exchanged), both teams could effectively work in parallel.
With InDesign Server and Silicon Paginator, clients come to us with a wide spectrum of “data-readiness.” We see several different scenarios, including the following:
- “We can only get this from our system. You get this as input, period. Nobody really remembers what the schema was, and it’s a bit buggy, but this is all we can get.”
- “We normally get out XML, with this schema, which is documented here. Of course our devs can normalize, denormalize, and give you anything you like.”
- “We don’t actually have a data source, yet… in fact our system of record is currently data sitting in the InDesign document.”
- “We are re-architecting the system: we can show you our existing data structures, but they will be changing and we can still make changes.”
RCCL fell into the second and fourth categories above. Their structured data, things like passenger names and ports of call on an itinerary, was in a relational database that was well managed by very savvy people. We had no worries that the data would come to us in an efficient way.
But the text content was not quite so clear: although they had some pre-existing methodology, they were actually planning to make a big change in content management. And while they had prototyped and planned things hypothetically, their critical dependence on perfect output had led them to us to answer the question:
“Can it reference external sources such as HTML, RTF, or Word?”
Selecting the content source for text
So the initial critical path was to determine how to handle the text content. We had to work together to define it, both in terms of a content authoring solution and our automation of document rendition.
In response to Sajan’s question, I answered:
“Yes, we can reference HTML, RTF, or Word, but in your case, HTML is far and away the best choice.”
This is not to say that Word doesn’t have its place in similar solutions. But we had already seen what RCCL had in place, and their users were not married to Word. Their dev team was already providing an effective web portal to content authors and stakeholders. This appeared to be the ideal place for the new authoring solution: simply another component in an already familiar environment.
Web-based interfaces for editing can have a greater degree of control over and validation of source content. Since Paginator just happens to have robust functionality to render high-quality InDesign output from HTML, all signs pointed to this path forward.
HTML was agreed to, and it was also agreed that Sajan’s team would build that side of the solution. Our role at Silicon Publishing was to establish detailed requirements for the structure of the HTML: this critical point of interface served as the output of their content authoring/approval process, and as the input of our InDesign Server-based rendition engine.
Defining the format was just the first step in architecting the solution. We next had to assess the vocabulary of HTML elements to be rendered, to ensure that the system was ready for all document constructs that would come up in automated production. We had to answer questions such as:
- Are there bulleted lists? How many levels can lists be nested?
- Are there tables? Can cells be merged? Can a list be contained in a table, or a table in a list?
- Will content sections also reference variables?
As RCCL developed the web front end for managing the content, and we customized the Silicon Paginator platform for rendering output, we continually validated that the handoff point was correct, even as this evolved to support authoring and rendition requirements on either side.
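A handoff contract like this can be checked mechanically on either side. The following is a minimal sketch, not the validation RCCL or Silicon Paginator actually used: the allowed element list, the nesting limit of two, and the rule against tables inside list items are all assumptions chosen for illustration, and the names `ALLOWED_TAGS`, `MAX_LIST_DEPTH`, and `check_component` are hypothetical.

```python
from html.parser import HTMLParser

# Assumed vocabulary and limits; the real agreed-to element list is not
# given in this article.
ALLOWED_TAGS = {"p", "ul", "ol", "li", "b", "i", "br", "table", "tr", "td"}
VOID_TAGS = {"br"}       # elements that never have a closing tag
LIST_TAGS = ("ul", "ol")
MAX_LIST_DEPTH = 2


class VocabularyChecker(HTMLParser):
    """Collects violations of the assumed HTML contract for one component."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, outermost first
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"element <{tag}> is not in the agreed vocabulary")
        if tag in VOID_TAGS:
            return
        if tag in LIST_TAGS:
            depth = sum(1 for t in self.stack if t in LIST_TAGS) + 1
            if depth > MAX_LIST_DEPTH:
                self.problems.append(
                    f"list nested {depth} levels deep (limit {MAX_LIST_DEPTH})"
                )
        if tag == "table" and "li" in self.stack:
            self.problems.append("table inside a list item")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def check_component(html):
    """Return a list of human-readable problems; empty means the HTML conforms."""
    checker = VocabularyChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Run at save time in the authoring portal and again before rendition, a check of this kind would let both teams catch a drifting handoff format early rather than in printed output.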
Document components with inline variables
Cruise booklets, not unlike financial statements, consist of sections (1 or more pages), which in turn consist of components (which may take up part of a page, an entire page, or multiple pages). The components may consist of static (unchanging) elements, or may mix and match static and variable content.
For example, a welcome letter may start with “Dear {{FName}},” which would then render to each recipient with their own name. There may even be artwork that is selected conditionally based on data, so an image could be swapped out based on a data element related to recipient data, cruise data, or some other source.
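As a minimal sketch of how such placeholders might be resolved, assuming the double-brace syntax shown above and a flat dictionary per recipient (the function name `resolve_variables` and the escaping policy are illustrative assumptions, not the Paginator implementation):

```python
import html
import re

# Matches {{Name}}, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve_variables(text, record):
    """Replace {{Name}} placeholders with values from record.

    Returns (resolved_text, missing), where missing lists every placeholder
    that had no usable value. Unresolved placeholders are left in place so
    the caller can decide whether to reject the record.
    """
    missing = []

    def substitute(match):
        name = match.group(1)
        value = record.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
            return match.group(0)
        # The components are HTML, so escape markup characters in the data.
        return html.escape(str(value), quote=False)

    return PLACEHOLDER.sub(substitute, text), missing
```

For example, `resolve_variables("Dear {{FName}},", {"FName": "Ana"})` returns `("Dear Ana,", [])`, while an empty record returns the text unchanged together with `["FName"]` so the gap can be reported instead of printed.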
RCCL built the authoring system to let authors include variables inside the components, which were authored and maintained one by one. As the content had to go into multiple languages, component-based authoring supported translation well. Content could also be re-used across the multiple brands (Royal, Azamara, etc.) offered by the organization, and this was also served by the component model.
Data-generated sections and variable resolution
As we went through the discovery process, the picture became clear. The inputs to Silicon Paginator were the following:
- Relational data in tabular form, which flowed into tabular output: for example the ports of call on an itinerary or the passenger list.
- Text components in HTML, which included variable placeholders to be resolved prior to rendition.
- InDesign templates, which represented the styles and geometry of rendition intent.
- Graphic assets, such as photos and vector art.
Paginator was to ingest these, resolve the variables in the text components with the relational data, select page templates and flow the components through these, and render out personalized booklets. The process had to be 100% automated: they were producing more than 10,000 booklets a day, each 20 or more pages long, and the output couldn’t possibly be proofed beyond very limited spot-checking.
Thus we were challenged with making sure specifications for data, the HTML, and the templates were perfectly clear. We also wanted to automate, to the extent possible, validation that information was complete, and be sure to gracefully manage edge cases (such as a passenger who wishes to be known by a single name), or prevent incomplete records from being rendered.
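A pre-flight check of this kind can be sketched as follows. The field names (`BookingId`, `ShipName`, `FName`, `LName`) and the rule that one name is enough are assumptions made for illustration; the actual RCCL schema and business rules are not shown in this article.

```python
# Hypothetical field names; the actual schema is not described here.
REQUIRED_FIELDS = ("BookingId", "ShipName")
NAME_FIELDS = ("FName", "LName")


def _clean(record, key):
    """Return the field as a stripped string, treating None and absent alike."""
    return str(record.get(key) or "").strip()


def validate_record(record):
    """List the reasons a record cannot be rendered; empty means it is complete."""
    problems = [f"missing {key}" for key in REQUIRED_FIELDS if not _clean(record, key)]
    # A passenger known by a single name is valid: one name field is enough.
    if not any(_clean(record, key) for key in NAME_FIELDS):
        problems.append("no passenger name")
    return problems


def display_name(record):
    """Join whichever name parts exist, so a single name renders cleanly."""
    return " ".join(p for p in (_clean(record, k) for k in NAME_FIELDS) if p)


def partition_records(records):
    """Split records into renderable ones and (record, problems) rejections."""
    renderable, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            renderable.append(record)
    return renderable, rejected
```

Keeping rejected records together with their reasons means incomplete data is held back and reported, rather than silently producing a flawed booklet that nobody will proof.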
Pagination with formatting rules
Finally, the output had to look good. RCCL had stringent brand requirements and quality design aspirations, so to the extent possible we automated logic that human designers explained to us:
- What happens if there’s extra whitespace?
- What if the text doesn’t fit in the frame?
- When should a section grow to add to another page?
- How can we ensure that output comes out in page counts divisible by 4 (signatures)?
We applied formatting rules, copy fit algorithms, and pagination logic based on our Silicon Paginator platform, iterating with RCCL stakeholders, developers and designers to get it right. There was plenty of nuance to the solution:
- In the first place, the sections that a recipient got would be determined from the data. These were not always the same, even on the same cruise: business rules came into play.
- Once the sections were determined, their sequence would be determined: again, this could vary based on different parameters.
- Some sections could grow in size, even adding pages if the size of source content demanded it; others would fit in a defined area no matter what.
- Whitespace was to be avoided: if another page had to be added, but there wasn’t much to put on that new page, styles would be adjusted to minimize whitespace: in some cases, filler images or optional text would be added automatically to fill whitespace.
Most of these requirements were known at the outset of the project, though there was some inevitable iteration as things were tested and as ideas came up. We aimed to make the solution as extensible as possible, so wherever we could we expressed logic in XML that RCCL could modify as needed after the system was brought to life.
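The signature requirement mentioned above comes down to simple arithmetic: round the page count up to the next multiple of 4. A minimal sketch follows; the function names are hypothetical, and how the extra pages get filled (filler images, optional text, or style adjustments) is a separate decision not modeled here.

```python
def filler_pages_needed(page_count, signature_size=4):
    """Pages to add so the total is a whole number of signatures."""
    if page_count < 0 or signature_size <= 0:
        raise ValueError("page_count must be >= 0 and signature_size > 0")
    # Python's % always returns a non-negative result for a positive divisor.
    return (-page_count) % signature_size


def padded_page_count(page_count, signature_size=4):
    """Total page count after padding to the next signature boundary."""
    return page_count + filler_pages_needed(page_count, signature_size)
```

A 21-page booklet would need 3 more pages to reach 24, while a 20-page booklet would need none.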
The Solution
We spent several months iteratively detailing the solution, and arrived at a really powerful system. On their side, RCCL had created a streamlined system for maintaining the content. On ours, we automated the publishing process to receive the data, content, graphic assets, and templates, and flow out print-ready output.
Yet one of the goals of the solution was to avoid printing every piece. Our Paginator solution also used the InDesign Server engine to render high-quality raster images of pages, along with object coordinates/metadata in XML sidecar files, to feed electronic distribution.
We and RCCL were both quite happy with the initial results: the output looked handcrafted, even though it would run entirely lights out, without any human intervention. It passed a test suite of “worst case” stress tests, things such as longest possible name, most and least itinerary records, large and small numbers of passengers.
Handing off template setup
We constructed the first set of document templates, as we were defining the template setup process in a way that would be very easy for RCCL to maintain and extend when we finished. With one brand prototyped, we trained RCCL in the process of template setup, such that they were able to continue for the other brands.
Because we used standard Adobe InDesign files, with familiar techniques such as layer naming, named swatches, and paragraph/character/object styles, it was not a steep learning curve. Most design concepts could be implemented simply by updating named styles: in some advanced cases layer names or other forms of metadata could inform pagination processes, to specify, for example, the specific logic for how text would be automatically fit within a text frame before the program determined that another frame or page had to be added.
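The idea of trying to fit text before adding a frame or page can be illustrated with a deliberately simplified sketch. It assumes the only adjustment is point size and that some `measure_height` callable reports how tall the text would set at a given size; both are assumptions for illustration. In the real system the measurement comes from the InDesign engine, and the adjustments and their order were driven by template metadata not detailed in this article.

```python
def copy_fit(measure_height, frame_height, point_sizes):
    """Return the first point size at which the text fits the frame.

    point_sizes should be ordered from most to least preferred (typically
    largest first). Returns None when nothing fits, which signals that
    another frame or page has to be added.
    """
    for size in point_sizes:
        if measure_height(size) <= frame_height:
            return size
    return None


def toy_measure(size):
    """Toy stand-in for the layout engine: 40 lines with leading of 1.2 x size."""
    return 40 * size * 1.2
```

With the toy measurement, `copy_fit(toy_measure, 500, [11, 10.5, 10, 9.5])` returns `10`, and a 400-point frame returns `None`, meaning the text would flow on to a new frame.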
Deployment at scale
Once we were certain that the application produced correct output given proper inputs, we assisted RCCL in deploying to their environment. The challenge for us was throughput: given the high number of daily orders and the size of their documents, multiple server machines were required, each running parallel instances of InDesign Server. Until we had real-world documents and data flowing, we really didn’t know what would be required, and with the hardware of the time (servers go quite a bit faster a decade later) the requirements went beyond what we’d originally projected. We breathed a collective sigh of relief when throughput reached an acceptable level, and the system went into production.
The solution remains in production to this day, having produced millions of cruise booklets each year since inception. It has been extended to have new templates, new data structures, and other changes, with minimal involvement of our developers in any changes. We are happy to have been able to provide an extensible solution that could stand the test of time.
Lessons Learned
This solution validates one approach to content maintenance for variable data publishing: managing text components as HTML with variable placeholders. It also shows the power of InDesign Server for both print and web publishing. InDesign Server has some key benefits that make it ideal in this and similar use cases:
- It offers the highest quality of output for print and web graphics.
- It uses templates directly from the most widely-used tool for print-ready document creation, Adobe InDesign.
- It renders variable content with literally the same InDesign engine that designers know and trust.
- It is exposed to automation more completely than any alternative, allowing our Silicon Paginator application layer to implement spatially aware graphic distribution, copy fitting, and other pagination algorithms that allow for a “designer made” look across thousands of personalized documents.
Yet InDesign Server is the raw composition engine. On the RCCL side, they built a beautiful content management solution orthogonal to the InDesign Server solution, while on our side we tailored the Silicon Paginator platform (on top of InDesign Server) to provide template-driven publishing capabilities that efficiently fulfilled the RCCL publishing vision at scale.