Q&A: How “XDocs” Alters the Paradigm for Gathering Business-Critical Information

Editors’ Update: Feb. 10, 2003: The product referred to in this article under the code name “XDocs” has been named Microsoft InfoPath.

REDMOND, Wash., Oct. 9, 2002 — It’s no secret that organizations today have little trouble collecting massive amounts of information. Workers everywhere generate content that contains valuable data every day – in email messages, status reports, and the like. But they also spend an inordinate amount of time searching for that same information when they need to reuse it. When, and if, they find it, they often spend even more time re-keying the relevant data into another document.

Why hasn’t technology made it easier to gather and integrate all this content we generate? The problem is that many of the documents we use for everyday business processes aren’t amenable to automation. While data capture and validation have been core components of traditional forms for some time, the technology needed to automatically gather valuable data from, say, a text document hasn’t been available. Until now.

Microsoft today announced a new product in the Microsoft Office family, code-named “XDocs,” to help solve this growing problem. “XDocs” looks and feels like a traditional word-processing program, but has all the sophisticated data-capture capabilities of a forms package. Built from the ground up to work with XML (eXtensible Markup Language), “XDocs” can gather information that has been generated from documents in which customers can define their own schema, or the structure and the type of content that each data element can contain. “XDocs” can then integrate that information with existing databases and servers, making it easier to reuse data across the enterprise or via XML Web services.

Jean Paoli, the XML architect behind “XDocs” at Microsoft, spoke to PressPass about how companies can use the tool to better manage their business processes. Paoli, who is one of the co-creators of the XML 1.0 standard with the W3C, believes that “XDocs” represents a revolutionary leap in XML technology. He has been a significant player in the worldwide XML community since 1985, when the technology was known as SGML.

PressPass: Can you talk about the innovation behind “XDocs”?

Paoli: What’s interesting and unique about “XDocs” is the type of information it allows people to gather. “XDocs” lets companies design and edit what people in my field call “semi-structured” documents, or documents that have regions of meaning, in the same way that columns in a database have meaning. While the tool provides great design and editing capabilities for traditional forms like purchase orders and equipment requests, what’s innovative is that “XDocs” squarely targets information that historically has been more difficult to capture, like business-critical data contained in sales reports, inventory updates, project memos, travel itineraries, and performance reviews.

We think of “XDocs” as a hybrid tool, because it combines the best of a traditional document editing experience, such as a word processor or e-mail program, with the rigorous data-capture capabilities of forms. With “XDocs,” organizations can easily design their own document templates that contain customer-specific schema for gathering information. What this means is that the customer defines the overall structure of the information that will be gathered from the “XDocs” template, and what type of content each data element will contain. Being able to define your own schema is a critical business advantage, because no one knows what kind of information your company needs to gather better than you do.

PressPass: But isn’t most mission-critical data typically entered into forms anyway?

Paoli: Frequently not. Think about a status report compiled by a salesperson. These are usually created in a traditional document editing application, but they often contain data that will be used over and over again, not only by the salesperson who filed the report, but by co-workers and managers as well. In the report you’ll find information about some sales, what was sold, and how much for each particular company, and possibly some of the problems customers have encountered with the product and the actions that were promised to remedy them.

Now, the salesperson could have used a basic forms application to create the report, but he or she wouldn’t have spell-checking, formatting, and all the other tools people are accustomed to using in traditional document editing applications. Without the familiar document editing experience, these forms can be hard to use, so people sometimes don’t use them as often as they should. When this happens, valuable information gets lost. Another drawback to using a form for this type of document is that classic forms are static on the page, so they can’t expand, which is a big problem when trying to enter data that is constantly growing. Not being able to provide added information, such as an optional executive summary, makes it difficult, if not impossible, to convey the full context of the data. The result is that people end up using multiple tools to get their jobs done, and they often end up losing half the data they collect.

“XDocs” is similar to a forms package in that it provides all the functionality you could imagine from forms, like the ability to structure and validate the data, but it lets you do so much more. So for the first time we’re making available a tool that gives people the best of both worlds: the rigor of forms and the ease of use of word processing, all within the familiar Microsoft Office environment, to which millions of people are accustomed. This is a tool that we in the XML community have been trying to build for a long time, one that puts the creation of XML content in the hands of the masses.

PressPass: What role does XML play in “XDocs”?

Paoli: XML is about creating documents in which the content is delimited, or set apart, by tags that explain the meaning of that content. Of course, the innovation behind XML is the fact that it can describe a wide range of information.
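To make the idea of tagged content concrete, here is a minimal sketch in Python. It is not taken from “XDocs” itself: the status report, its element names (statusReport, sale, customer, amount), and the helper functions are all hypothetical, chosen only to echo the sales report described earlier. Because each piece of content sits inside a tag that names its meaning, a program can pull out the sales figures without touching the free-text summary.

```python
import xml.etree.ElementTree as ET

# Hypothetical status report; every element name here is invented for illustration.
REPORT = """\
<statusReport>
  <author>Pat</author>
  <summary>Strong quarter overall, with one open support issue.</summary>
  <sale><customer>Contoso</customer><amount>1200.50</amount></sale>
  <sale><customer>Fabrikam</customer><amount>800.00</amount></sale>
</statusReport>
"""

def customers(xml_text):
    """Return the customer named in each <sale> element, in document order."""
    root = ET.fromstring(xml_text)
    return [sale.findtext("customer") for sale in root.iter("sale")]

def total_sales(xml_text):
    """Add up the <amount> of every <sale>, ignoring the narrative parts."""
    root = ET.fromstring(xml_text)
    return sum(float(sale.findtext("amount")) for sale in root.iter("sale"))

if __name__ == "__main__":
    print(customers(REPORT))
    print(total_sales(REPORT))
```

The same report still reads as a document, since the summary is ordinary prose, yet the tagged regions can be reused elsewhere without re-keying.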

“XDocs” has been built from the ground up to understand XML. The underlying structure of the information that is gathered using an “XDocs” template is described using a schema. A schema describes how your data is constructed, in the same way that a blueprint describes how a building is constructed. Because “XDocs” understands XML at its core, customers can define their own business-specific schema using the latest XML standards. This is a significant advantage, as I said earlier, because it lets organizations determine for themselves what kind of data they want to gather.
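The blueprint idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the W3C XML Schema language and not how “XDocs” works internally: the schema below is just a hypothetical mapping from element name to the type of content that element may hold, and the expense record and the validate function are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a customer-defined schema (an assumption, not real XSD):
# each allowed child element is mapped to the type its content must convert to.
EXPENSE_SCHEMA = {"employee": str, "amount": float, "quantity": int}

def validate(xml_text, schema):
    """Return a list of problems; an empty list means the record conforms."""
    root = ET.fromstring(xml_text)
    problems = []
    seen = set()
    for child in root:
        if child.tag not in schema:
            problems.append(f"unexpected element: {child.tag}")
            continue
        seen.add(child.tag)
        try:
            # The converter raises ValueError when the content has the wrong type.
            schema[child.tag]((child.text or "").strip())
        except ValueError:
            problems.append(f"bad value for {child.tag}: {child.text!r}")
    for name in schema:
        if name not in seen:
            problems.append(f"missing element: {name}")
    return problems

GOOD = "<expense><employee>Pat</employee><amount>19.99</amount><quantity>2</quantity></expense>"
BAD = "<expense><employee>Pat</employee><amount>abc</amount></expense>"

if __name__ == "__main__":
    print(validate(GOOD, EXPENSE_SCHEMA))
    print(validate(BAD, EXPENSE_SCHEMA))
```

A real deployment would express the same constraints in a standard schema language and use a validating parser. The point of the sketch is only that the organization, not the tool, decides which elements exist and what each may contain.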

Native support of XML also means “XDocs” can send data using these customer-defined schemas to backend systems via XML Web services. And this is another area in which “XDocs” is innovative. “XDocs” is the first tool that can gather and send, or receive and read, XML data from a Web service without having to first translate the data to the .xml file format. The benefits of this are enormous. Because XML is the native file format of all information that is gathered, “XDocs” reduces translation errors and the need to do custom programming, thus reducing development time and costs. This level of support in “XDocs” also lowers the cost of developing solutions that use this data, because the data is represented and structured the way you need it from the very beginning.

PressPass: Can you talk a little bit more about what you mean when you say that “XDocs” puts the creation of XML in the hands of the masses?

Paoli: For a long time, XML has been used mainly on servers and other things that information workers don’t see. But “XDocs” gives people an interface with Microsoft Office-level quality that allows them to easily create and gather information on top of the core XML model. No one else has been able to do that until now.

PressPass: What are some of the feature highlights of “XDocs”?

Paoli: A lot of features in “XDocs” are the result of a key architectural design decision to adhere to the XML paradigm of separating the data in a document from the formatting. This separation is why rigorous data capture is the essence of “XDocs.” And, as I indicated earlier, one of the real innovations in “XDocs” is the fact that users can see and modify abstract data structures using a traditional word-processing environment. “XDocs” associates what we call “Editing Views” to those abstract data structures, providing users with all the familiar tools to which they’ve grown accustomed, like rich text formatting, table and picture support, and AutoCorrect. In addition, industry-standard XML schema validation and business logic validation in “XDocs” prevent costly data errors. And “XDocs” lets users save forms to their computers so that they can work on them at their convenience, even offline.
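The separation of data from formatting can be illustrated with a short Python sketch. It makes no claim about how “Editing Views” are implemented. The trip record and the two view functions are hypothetical, and they only show the general principle that one piece of XML data can be presented in more than one way without the data itself changing.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data record; it carries no formatting information at all.
DATA = "<trip><traveler>Pat</traveler><city>Paris</city></trip>"

def text_view(xml_text):
    """Present the record as a plain sentence."""
    root = ET.fromstring(xml_text)
    return f"{root.findtext('traveler')} is traveling to {root.findtext('city')}"

def html_view(xml_text):
    """Present the same record as an HTML table, one row per data element."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        f"<tr><th>{escape(field.tag)}</th><td>{escape(field.text or '')}</td></tr>"
        for field in root
    )
    return f"<table>{rows}</table>"

if __name__ == "__main__":
    print(text_view(DATA))
    print(html_view(DATA))
```

Because both views read from the same record, a change to the layout never puts the captured data at risk, which is the property the design decision above is meant to protect.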

For forms designers, “XDocs” provides the same integrated “What You See Is What You Get” (WYSIWYG) design environment. So you start with custom-defined schema and build a template around it. “XDocs” includes a built-in set of controls for easily laying out forms, as well as a set of 25 ready-to-use sample forms.

PressPass: What role do you believe “XDocs” plays in the Microsoft Office family?

Paoli: In the next release of Microsoft Office, codenamed Office 11, we’re taking a quantum leap in altering the landscape of enterprise integration by integrating people, systems, and data. Microsoft’s vision for Office 11 is to seamlessly connect the information worker to the different islands of data in the enterprise, whether the data is contained in Microsoft Word documents, e-mail messages, an internal company database, or even an external third-party database. To do this, we are altering the paradigm for the Office family of products: Instead of asking customers to structure data based on the software product they use to generate that data, we want to enable customers to use the data defined by their own schemas — however they want to structure it.

I am just as proud of what has been achieved for XML in regard to Office in general as I am in regard to “XDocs” itself. We’ve made significant investments in many of the Office products so that they natively support XML and customer-defined schemas. So the nature of many of the products in the Office family will change fundamentally to support XML Web services. This is a fundamental advance, putting the creation of XML content in the hands of the everyday worker, and it will drive so many new applications that we have been waiting for in the XML community for many years. “XDocs” is one element of this overall strategy, giving information workers the desktop tools they need to gather and integrate information that can be re-purposed and connected to the overall enterprise.

PressPass: What is Microsoft’s long-term vision for “XDocs”?

Paoli: It’s really the vision behind Microsoft’s overall XML Web Services strategy: to make it easy to create, access, and share XML data between different systems on the network. With “XDocs,” we have for the first time an end-user product that at its core understands XML using customer-driven schema. So you could say that the information worker, for the first time, has the capability to exploit XML and XML Web Services. That means information workers can easily connect to any cross-platform Web Service to share data. This is something that the XML community has been trying to achieve for a long time.