Legacy data to Web


Type: legacy document transformation
Tools: Perl, Java, SQL, HTML


Video Monitoring Services of America, a market leader in news and advertising monitoring, wanted to offer a new product to their customers, delivering monitored news text and video through the Web. The off-the-shelf tools available to convert the documents generated by their in-house editing system produced poor-quality HTML that was unsuitable for their purposes. The project called for the development of a tool to accept legacy documents and construct customer-facing HTML pages with integrated links to streaming media resources.


A nomadcode developer worked closely with VMS engineers and business personnel to develop tools that transform the legacy documents generated by VMS's existing proprietary system into Web-ready HTML. Throughout, the project relied heavily on widely available open-source software, minimizing development costs and reducing dependency on closed proprietary software.

In the first phase, the developer worked with VMS's in-house developers and business team to establish the requirements for the client-facing web pages, then took the mockup graphic layouts created by the VMS designer and implemented them as a liquid HTML page. Next, the developer worked with VMS's own engineers to define the modifications to the editing system output necessary to support automated transformation. The transformation system itself, the bridge between the editing system and the public web site, was implemented as a suite of Perl scripts that retrieved the source documents from an Oracle database. An ad hoc converter then reformatted the documents as HTML and delivered them to the public site.

In a second phase of the project, new requirements from the business side made it necessary to capture detailed information about the generated documents in order to support accounting and online sales. Once again, the nomadcode developer worked with the in-house team to specify an intermediate format that exposed a greater range of useful information. The ad hoc converter could now be replaced by a simpler tool that parsed the new document format and transformed it to valid XML. A custom processor implemented in Java and based on the Xalan engine was then developed to generate HTML from XML input documents. Where the earlier ad hoc converter was limited to a single output format, the custom processor could be configured to apply batches of XSLT stylesheets against the input XML, creating output documents in any format.
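The batch approach described above can be illustrated with a minimal sketch. It is not the processor built for VMS: it assumes the standard JAXP API (javax.xml.transform) bundled with the JDK rather than a standalone Xalan distribution, and the class name, stylesheets and sample document are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration: one XML input, several stylesheets, one output per stylesheet.
class BatchStylesheetProcessor {

    // Assumed sample stylesheet producing an HTML fragment.
    static final String HTML_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/story'><h1><xsl:value-of select='headline'/></h1></xsl:template>"
      + "</xsl:stylesheet>";

    // Assumed sample stylesheet producing plain text from the same input.
    static final String TEXT_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/story'><xsl:value-of select='headline'/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies every named stylesheet to the same XML input; returns outputs keyed by name. */
    static Map<String, String> applyAll(String xml, Map<String, String> stylesheets) {
        TransformerFactory factory = TransformerFactory.newInstance();
        Map<String, String> outputs = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : stylesheets.entrySet()) {
            try {
                // Compile the stylesheet, then run it against a fresh reader over the input.
                Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(entry.getValue())));
                StringWriter out = new StringWriter();
                transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
                outputs.put(entry.getKey(), out.toString());
            } catch (TransformerException e) {
                throw new IllegalStateException("Stylesheet failed: " + entry.getKey(), e);
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        Map<String, String> sheets = new LinkedHashMap<>();
        sheets.put("html", HTML_XSL);
        sheets.put("text", TEXT_XSL);
        String xml = "<story><headline>Sample headline</headline></story>";
        applyAll(xml, sheets).forEach((name, out) -> System.out.println(name + ": " + out));
    }
}
```

Adding an output format in this arrangement means adding a stylesheet to the batch, with no change to the Java code.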

When the existing editing system was replaced by a new system which generated XML outputs, the project entered a third phase. XML documents from the editing system were fed directly to the Java processor. The processor was now extended by the addition of PDF generation capabilities, using an embedded FOP driver. Conversion from XML to XSL-FO was performed internally, using the Xalan engine.
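The XML to XSL-FO step can be sketched in the same way. The example below again assumes the JDK's built-in JAXP transformer and a hypothetical stylesheet and input document; the FOP rendering step that would turn the resulting XSL-FO into PDF is deliberately left out, since it depends on the FOP version in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration of the XML -> XSL-FO conversion only (no PDF rendering).
class XmlToFoSketch {

    // Assumed stylesheet: wraps the headline in a minimal single-page XSL-FO document.
    static final String FO_XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:template match='/story'>"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name='page'><fo:region-body/></fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='page'>"
      + "<fo:flow flow-name='xsl-region-body'>"
      + "<fo:block><xsl:value-of select='headline'/></fo:block>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms the input XML into an XSL-FO document, returned as a string. */
    static String toFo(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(FO_XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL-FO conversion failed", e);
        }
    }

    public static void main(String[] args) {
        // The printed XSL-FO is what a formatter such as FOP would take as its input.
        System.out.println(toFo("<story><headline>Sample headline</headline></story>"));
    }
}
```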

The tools developed were integrated with VMS's systems monitoring software, so that any error conditions affecting their operation could be quickly identified and remedied.