MT Pipeline Overview

May 2005

Copyright © 2005 Markup Systems Ltd.
All rights reserved.

The Problem and the Opportunity of XML Processing

XML data processing requirements in the enterprise are growing rapidly, driven by the increasing use of XML in:

To manage these increasingly complex XML processing requirements, sophisticated combinations of XML processing operations, known as XML pipelines, are needed. MT Pipeline addresses this need by providing a complete platform for building and executing XML pipelines within a J2EE-based enterprise environment. MT Pipeline has been designed to enable the pervasive use of XML both inside and outside the enterprise.

The lack of a coherent XML processing model represents a serious bottleneck in enterprise use of XML in general and Web Services in particular. Today, most XML processing implementations are ad hoc, based on custom programming to link together XML operations. Early adopters of XML have already found this approach to result in poor runtime performance along with an unacceptably high cost of program maintenance. Reliance on scripting or other forms of custom programming is far from the enterprise-class solution that will increase adoption of XML for mission-critical tasks such as building Web Services. XML processing demands the same standards that have evolved for all other classes of enterprise software.

The MT Pipeline Platform

MT Pipeline is a complete solution for industrial-strength, enterprise-class XML processing. MT Pipeline provides a native XML approach to combining data aggregation, transformation and distribution to implement applications and to provide Web Services. It has been designed to meet the performance, scalability, security, and reliability requirements of large-scale enterprises.

MT Pipeline delivers the following key functionality:

More About MT Pipeline

MT Pipeline provides a comprehensive environment for the configuration of XML operations. This environment is necessary because most XML application processing involves a number of operations, each of which may require the use of a different technology from among the XML family of standards. In other words, creating a robust XML application depends on creating the proper combination of these individual XML-based operations.

What is a pipeline?

MT Pipeline defines information flow, not control flow. In a control flow environment, one step must complete before the next step begins. In an information flow environment, the data moves from step to step in the sequence specified, but multiple steps can be operating at once. Powerful configurations of asynchronous processes communicating via infoset streams require an information flow rather than a control flow architecture.
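The difference can be illustrated with a small sketch. It uses Python generators, which is an assumption made purely for illustration and not how MT Pipeline is implemented; the step names `read_items` and `upper_case` are hypothetical. Generators interleave steps on a single thread rather than running them in parallel, but they show the essential point: in an information flow, a downstream step starts work on the first item before the upstream step has finished.

```python
from typing import Iterable, Iterator, List, Tuple

# Shared event log so the order in which steps do their work is visible.
log: List[str] = []


def read_items(names: Iterable[str]) -> Iterator[str]:
    """Hypothetical source step: emits one item at a time."""
    for name in names:
        log.append(f"read {name}")
        yield name


def upper_case(items: Iterable[str]) -> Iterator[str]:
    """Hypothetical transform step: consumes items as they arrive."""
    for item in items:
        log.append(f"transform {item}")
        yield item.upper()


def run_information_flow(names: List[str]) -> Tuple[List[str], List[str]]:
    """Steps are chained lazily, so their work is interleaved."""
    log.clear()
    result = list(upper_case(read_items(names)))
    return result, list(log)


def run_control_flow(names: List[str]) -> Tuple[List[str], List[str]]:
    """The first step runs to completion before the second one begins."""
    log.clear()
    stage_one = list(read_items(names))
    result = list(upper_case(stage_one))
    return result, list(log)
```

With two input items, the information-flow log alternates read and transform events, while the control-flow log shows every read completing before the first transform.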

We call a configuration of XML operations, or processing steps, a pipeline. We call each step in a pipeline a component. MT Pipeline provides a built-in library of standard components to accomplish the majority of basic XML processing tasks. Users can write custom components that carry out specialized tasks using the Pipeline API. Using the graphical Pipeline Authoring Tool, users can quickly and easily define pipelines by selecting components from a palette and connecting them together. The Pipeline Runtime is the environment for deploying and executing pipelines.

The standard component library covers the majority of XML processing operations that typical applications are likely to require. It includes implementations of most of the major W3C-defined XML standards, including XML 1.0 itself, XML Namespaces, XML Infoset, XSLT, XML Schema, XML Base and XInclude. Operations outside this library can be implemented as custom components using the MT Pipeline API.

Won't pipelines be slow?

One of Markup Technology's key inventions is a highly optimized XML Infoset-conformant representation of the information content of XML documents. It is this representation that we use for inter-component communication, as it captures everything important about the XML data but frees us from having to pass it between components in a serial format which must be dumped and re-parsed each time. This high-performance representation is one of the cornerstones upon which the efficiency and robustness of the product rest.
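The cost being avoided can be sketched as follows. This is an illustration only: the element tree from Python's standard `xml.etree.ElementTree` module stands in for the optimized infoset representation, and the steps `add_flag` and `count_items` are hypothetical. The sketch runs the same steps twice, once passing serialized text between them and once passing the in-memory tree, and counts how often the document has to be parsed.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

Step = Callable[[ET.Element], ET.Element]


def add_flag(doc: ET.Element) -> ET.Element:
    """Hypothetical step: mark the document as checked."""
    doc.set("checked", "yes")
    return doc


def count_items(doc: ET.Element) -> ET.Element:
    """Hypothetical step: record the number of <item> children."""
    doc.set("items", str(len(doc.findall("item"))))
    return doc


def run_serial(text: str, steps: List[Step]) -> Tuple[str, int]:
    """Pass serialized XML between steps: one parse per step."""
    parses = 0
    for step in steps:
        doc = ET.fromstring(text)
        parses += 1
        text = ET.tostring(step(doc), encoding="unicode")
    return text, parses


def run_in_memory(text: str, steps: List[Step]) -> Tuple[str, int]:
    """Pass the parsed tree between steps: one parse in total."""
    doc = ET.fromstring(text)
    parses = 1
    for step in steps:
        doc = step(doc)
    return ET.tostring(doc, encoding="unicode"), parses
```

Both variants produce the same output document, but the serial variant parses and serializes once per step, so its overhead grows with the length of the pipeline.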

How do I specify a pipeline?

The MT Pipeline Authoring Tool provides a graphical environment for composing pipelines in an intuitive manner. Pipelines are composed from sets of components selected from drop-down menus. Information flow from component to component is modelled visually. The graphical metaphors and operations of the authoring tool directly reflect the pipeline language that is supported by the runtime.

In addition to its built-in library of standard components, the authoring tool supports the extensible nature of the overall architecture, allowing components and pipeline wrappers to be developed by customers. In fact, whole pipelines can be treated as components, enabling a composition style of pipeline creation, where larger pipelines are built from smaller ones.
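The composition style can be sketched in a few lines. This is a sketch under stated assumptions, not the MT Pipeline API: a component is modelled as a plain Python function from one element tree to another, and the names `pipeline`, `rename_root` and `set_attribute` are hypothetical. Because a pipeline built this way has the same shape as a component, it can be dropped into a larger pipeline unchanged.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Assumption for this sketch: a component maps a document to a document.
Component = Callable[[ET.Element], ET.Element]


def pipeline(*steps: Component) -> Component:
    """Combine steps into one component that runs them in order."""
    def run(doc: ET.Element) -> ET.Element:
        for step in steps:
            doc = step(doc)
        return doc
    return run


def rename_root(new_tag: str) -> Component:
    """Hypothetical component: rename the document element."""
    def step(doc: ET.Element) -> ET.Element:
        doc.tag = new_tag
        return doc
    return step


def set_attribute(name: str, value: str) -> Component:
    """Hypothetical component: set an attribute on the document element."""
    def step(doc: ET.Element) -> ET.Element:
        doc.set(name, value)
        return doc
    return step


# A small pipeline, then a larger one that reuses it as a single component.
normalize = pipeline(rename_root("order"), set_attribute("version", "1"))
publish = pipeline(normalize, set_attribute("status", "published"))
```

Calling `publish(ET.fromstring("<purchase><item/></purchase>"))` runs the two steps of `normalize` first and then the final step, without `publish` needing to know how `normalize` is built.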

Standard components are rarely completely self-contained. The authoring environment prompts for the parameters required by each component as the pipeline definition is built up. One common form of parameterization is the specification of the schemas, DTDs, and stylesheets used by pipeline components such as validators and transformation engines. Components such as filters and splitters need only simpler parameters, such as XPath expressions.
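As an illustration of XPath-style parameterization, the sketch below builds a filter and a splitter from a path expression. It is not the MT Pipeline component library: `make_filter` and `make_splitter` are hypothetical names, and the paths are evaluated with Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Callable, List


def make_filter(path: str) -> Callable[[ET.Element], ET.Element]:
    """Build a filter that keeps only the elements matched by `path`."""
    def step(doc: ET.Element) -> ET.Element:
        # New document element with the same tag and attributes as the input.
        out = ET.Element(doc.tag, dict(doc.attrib))
        for match in doc.findall(path):
            out.append(copy.deepcopy(match))  # leave the input untouched
        return out
    return step


def make_splitter(path: str) -> Callable[[ET.Element], List[ET.Element]]:
    """Build a splitter that returns each match as a separate document."""
    def step(doc: ET.Element) -> List[ET.Element]:
        return [copy.deepcopy(match) for match in doc.findall(path)]
    return step


orders = ET.fromstring(
    '<orders>'
    '<order id="1" status="open"/>'
    '<order id="2" status="closed"/>'
    '<order id="3" status="open"/>'
    '</orders>'
)

# The only parameter each component needs is a path expression.
open_only = make_filter("order[@status='open']")
split_orders = make_splitter("order")
```

Here `open_only(orders)` yields a new `orders` document containing the two open orders, and `split_orders(orders)` yields three stand-alone `order` documents.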

How do pipelines actually get run?

The pipeline runtime executes compiled pipelines. Compiled pipelines consist of networks of pipeline components with infosets flowing between them. The pipeline runtime reads pipeline definitions and associated pipeline deployment descriptions, then compiles, activates and executes those pipelines. The runtime design and implementation embody a range of special features which enable efficient execution of multi-stage XML processing pipelines. This is a fundamental differentiator: only MT Pipeline provides these features.

The pipeline runtime is responsible for compiling and caching pipeline definitions, as well as for activating and deactivating pipelines, managing the coordination between pipeline steps, generating pipeline events, and enforcing relevant policies. The pipeline runtime may optimize the processing in various ways based on information contained either in the pipeline definition or the pipeline deployment description.

Are pipelines run all on their own?

The runtime can operate either standalone or within the framework of a server. In the server context, considerable benefit is gained because the runtime caches and re-uses static resources such as stylesheets, schemas and other auxiliary XML documents.
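The benefit of caching static resources in a long-running server can be sketched with Python's standard `functools.lru_cache`. This is an illustration under stated assumptions, not the MT Pipeline runtime: the in-memory `RESOURCES` table stands in for stylesheets or schemas on disk, and `load_resource` and `handle_request` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

# Hypothetical static resource store; a real server would read files.
RESOURCES = {
    "rules.xml": "<rules><rule tag='order'/><rule tag='quote'/></rules>",
}


@lru_cache(maxsize=None)
def load_resource(name: str) -> ET.Element:
    """Parse a static resource once; later calls reuse the parsed tree."""
    return ET.fromstring(RESOURCES[name])


def handle_request(doc_text: str) -> bool:
    """Check a document against the cached rules.

    Returns True when the document element is one the rules allow.
    """
    rules = load_resource("rules.xml")
    allowed = {rule.get("tag") for rule in rules.findall("rule")}
    return ET.fromstring(doc_text).tag in allowed
```

Each request still parses its own input document, but the rules document is parsed only on the first request. A standalone run that handles a single document gains nothing from the cache, which is why the benefit appears mainly in the server context.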

Summary and Conclusion

The XML revolution was stalled until the Web Services bandwagon got rolling, and there are still major hidden problems in realizing the XML vision:

  1. The real value of XML to the enterprise is not achieved by the switch to XML for information representation and interchange alone—this enables the leverage which XML provides, but doesn't actually deliver it;
  2. It's processing XML as XML which actually delivers on the XML promise, and companies are finding it much harder than expected to implement this cost-effectively, or even at all.

Markup Technology's MT Pipeline platform solves this problem by providing an industrial-strength approach to implementing XML-to-XML functionality as a configuration of simple steps (a pipeline):

  1. Our pipelines are efficient and scalable because of our proprietary optimized in-memory representation of XML content (the infoset) and pipeline compilation technology;
  2. Our platform dramatically reduces Total Cost of Ownership for XML-based applications because of reduced development time, improved robustness in the face of document type evolution and ease of reuse, deriving from our graphical authoring tool, declarative pipeline specification language, and pre-built inventory of standard components.