Making Content Discoverable: Search, Scope, Filter and Associate

This paper proposes a canonical model of content for MSDN, describes four basic discovery tools tied to that model and recommends investments to make our content more discoverable. These concepts may be helpful to anyone who publishes and maintains a large body of content.

Introduction

As stewards of Microsoft’s technical content, our responsibility is to deliver the right information, in the right place, with the right experience. In doing this, we must provide access to the full breadth and depth of all Microsoft technical information without overwhelming developers. We need to deliver simple, focused experiences that help developers quickly and efficiently learn about and adopt new Microsoft products and technologies.

Before developers can consume content they must first discover it. While enabling great web and site search experiences is necessary, it is not sufficient. Because of the size and complexity of our corpus, we must round out our tools for content discovery, navigation and consumption. Towards that end, this paper explores scopes, filters and associations, explains why they’re important and outlines some guidelines for their proper implementation.

Taking Inventory

With over 3.5 million topics in the MSDN Library, primary navigation (i.e. the main menu and table of contents) is not an efficient way to locate information. Add in Forums threads, blog posts, code examples, etc. and it’s clear why web search is the easiest way for developers to find our technical content. Unfortunately, developers experience a variety of challenges when searching MSDN.

| Issue | Example |
| --- | --- |
| Larger content sets obscure information in smaller content sets | Managed API content tends to crowd out native API content in search results |
| Multiple platforms with similar or overlapping APIs | The Differences Between Silverlight and Silverlight for Windows Phone have tripped up some developers |
| Multiple product versions of the same topic | There are currently six different versions of the System.Xml Namespace topic |
| Closely related topics may be scattered across the Table of Contents | This XmlReader Sample is far removed from the XmlReader Class topic in the Table of Contents |
| Lack of filtering mechanisms encourages creation of content islands | It’s confusing for developers to know in which of the hundreds of Forums they should ask their question |

As you can see, the problem is worse than searching for the proverbial needle in a haystack. Our haystack contains embroidery needles, knitting needles, upholstery needles and other related sewing equipment. These challenges are magnified when we deliver content for disconnected or behind-the-firewall scenarios (e.g. Help Viewer).

Due to the scale of our technical content and its dearth of semantic information, we cannot simply rely on search engines to do all of the work. We must thoughtfully provide coarse- and fine-grained tools that help developers find the information they need.

A Content Model

If we are to be thorough in providing tools to tear apart our haystack of content, we need to have a clear understanding of the available options. Let’s start with a canonical model of content:

[Figure: content model - basic]

In this diagram, there are two pieces of content, Item 1 and Item 2. Each piece of content has three components:

  1. Identity – An item’s identity enables direct access to its content. Typically, this is a database record ID, a web page URL, and/or a GUID.
  2. Data – is the actual content displayed to the user. The format is based on the type of data. For example, web page data is stored as HTML and pictures are often stored as JPEG or PNG files.
  3. Attributes – are descriptive metadata about the content which may or may not be displayed to the user. For example, web pages store attributes as meta tags and pictures may store them as EXIF metadata.

Content items can be related to each other through one or more relationships – another fundamental concept in our content model.

  4. Relationships – are structural metadata that describe how content items are interconnected. Relationships may or may not have associated semantics (parent, child, predecessor, successor, translation, etc.) and are hierarchical, equivalency or associative.

With these definitions in place, let’s look at an example of a single content item in this format:

Item: Overview of Coffee
GUID: 56ef45c4-435d-46ad-85b4-831db063fe84

| Data | Attributes | Relationships |
| --- | --- | --- |
| “Coffee is a dark, brewed beverage made from coffee beans…” | Type: Tutorial; Level: Novice; Subject: Beverage; Origin: Euro-asian | Parent: History of Coffee; Children: Discovery of Coffee, Coffee Production, Cultural Impact of Coffee |

Of course, real-life content is considerably more complex. Composite documents incorporate a variety of internal structures and content types. Relationships can be one-way, two-way, one-to-many, etc. But this model is complete enough to support our discussion of content discovery.
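As a minimal sketch, the example item above could be represented in code along the following lines. The class and field names (ContentItem, Relationship, related) are illustrative assumptions and not part of any MSDN system; only the values come from the example.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """Structural metadata: a typed link from one item to another."""
    kind: str    # e.g. "parent", "child", "translation"
    target: str  # title (or identity) of the related item


@dataclass
class ContentItem:
    """One content item: identity, data, attributes and relationships."""
    identity: str  # GUID, URL or record ID
    title: str
    data: str      # the content displayed to the user
    attributes: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def related(self, kind):
        """Return the targets of all relationships of the given kind."""
        return [r.target for r in self.relationships if r.kind == kind]


coffee = ContentItem(
    identity="56ef45c4-435d-46ad-85b4-831db063fe84",
    title="Overview of Coffee",
    data="Coffee is a dark, brewed beverage made from coffee beans…",
    attributes={
        "Type": "Tutorial",
        "Level": "Novice",
        "Subject": "Beverage",
        "Origin": "Euro-asian",
    },
    relationships=[
        Relationship("parent", "History of Coffee"),
        Relationship("child", "Discovery of Coffee"),
        Relationship("child", "Coffee Production"),
        Relationship("child", "Cultural Impact of Coffee"),
    ],
)
```

Keeping attributes and relationships in separate fields mirrors the distinction the model draws between descriptive and structural metadata.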

Content Collections (Graphs and Sets)

No manual is an island. It is the rare content item that exists in isolation and it follows that we need means to collect content into named packages that enable distribution, discovery, navigation and consumption. The fifth and final concept in our canonical model of content is:

  5. Collections – specify a boundary around a set of related content that supports discovery, packaging, branding, distribution and versioning.

The two basic collection types are graphs and sets.

Graphs

The act of organizing content into a coherent structure is called information architecture (IA), a component of the overall content creation or curation process. The goal of IA is to make content easy to understand and consume. Our IA can be modeled and presented to end-users as a graph.

A graph is an organized collection of connected items. End-user visual representations based on graphs include Tables of Contents, geographical maps, image slide shows, etc. Let’s extend our canonical model of content to visually depict the graph that encompasses all of our content and its relationships.

[Figure: content model - graph]

Graphs support a variety of user interactions, including:

  • navigate – move from item to item along lines of relationship (includes the special case of zoom: move from the general, comprehensive or abstract to the specific, targeted or concrete – or vice versa)
  • select – choose an item for display
  • collect – (1) add or remove individual items to or from a new or existing graph; (2) connect or disconnect graphs to or from a new or existing graph (the resulting graph may be a subgraph, a supergraph or a combination)
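The interactions listed above can be sketched with a small graph class. This is an illustrative assumption about how such a graph might be modeled, not a description of an existing MSDN component; the class and method names are hypothetical.

```python
from collections import defaultdict


class ContentGraph:
    """A collection of items connected by typed, directed relationships."""

    def __init__(self):
        self.items = set()
        # edges[source][kind] -> list of targets
        self.edges = defaultdict(lambda: defaultdict(list))

    def collect(self, item):
        """collect (1): add an individual item to the graph."""
        self.items.add(item)

    def connect(self, source, kind, target):
        """Record a relationship; both ends are collected if missing."""
        self.items.update((source, target))
        self.edges[source][kind].append(target)

    def navigate(self, item, kind):
        """navigate: follow relationships of one kind from an item."""
        return list(self.edges.get(item, {}).get(kind, []))

    def merge(self, other):
        """collect (2): connect another graph into this one."""
        self.items |= other.items
        for source, by_kind in other.edges.items():
            for kind, targets in by_kind.items():
                self.edges[source][kind].extend(targets)


toc = ContentGraph()
toc.connect("History of Coffee", "child", "Overview of Coffee")
toc.connect("Overview of Coffee", "child", "Coffee Production")

# Zooming in from the general to the specific is navigation along "child".
zoomed = toc.navigate("History of Coffee", "child")
```

Selecting an item for display is simply picking one of the values that navigate returns.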

Sets

Often we want to model a group of items that have no prescribed relationships or we want to discard existing relationships so end-users can easily rearrange content to form new views.

A set is a collection of disconnected objects. End-user visual representations based on sets include search results, recently updated blog posts, news feeds, etc.

[Figure: canonical model - set]

Sets support a variety of user interactions, including:

  • sort – order the set by some characteristic of the data or its attributes
  • select – choose an item for display
  • collect – add or remove items from the set

We typically display sets to users as a list, which is a simple form of graph. Displaying a set as a list creates pseudo-relationships between items that enable primitive navigation including next/previous item, next/previous page, first item, last item, etc.
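A short sketch of turning a set into a list, and of the previous/next pseudo-relationships the list creates, might look like this. The topic titles come from earlier in the paper; the "updated" dates and function names are made up for illustration.

```python
def as_list(items, key, reverse=False):
    """sort: order an unordered collection of items by one attribute."""
    return sorted(items, key=lambda item: item[key], reverse=reverse)


def neighbors(ordered, index):
    """Pseudo-relationships a list gives each item: previous and next."""
    previous = ordered[index - 1] if index > 0 else None
    following = ordered[index + 1] if index < len(ordered) - 1 else None
    return previous, following


# Hypothetical results. Dicts are not hashable, so the "set" is held in a
# tuple whose order carries no meaning.
results = (
    {"title": "XmlReader Class", "updated": "2012-03-01"},
    {"title": "XmlReader Sample", "updated": "2012-05-15"},
    {"title": "System.Xml Namespace", "updated": "2011-11-20"},
)

by_date = as_list(results, "updated", reverse=True)
prev_item, next_item = neighbors(by_date, 0)
```

The ordering exists only in the view: sorting by a different attribute produces a different set of neighbors without changing the underlying items.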

Graphs and sets are useful for organizing and presenting information but, by themselves, they don’t solve the large dataset problem. Helping people find a needle in a haystack requires tools to whittle down the haystack.

Content Subsets and Subgraphs

The Internet puts the world’s knowledge and minutiae at our fingertips. With every web search, search engines move trillions of pages of content out of your way so you can look at the relevant subset that’s most likely to help you find both the information you need and the information you didn’t know that you needed.

Our content model defines the components users can interact with to create subsets: identity, data, attributes and relationships. While, in theory, identity can be used to create subsets, its primary purpose is to enable direct access to individual content items. One exception is when relationship information is encoded into identity. For example, asking for rows 10 through 20 of a spreadsheet is helpful because these are sequential rows. Asking for items with GUIDs of 7c5b19b1-3109-4ad4-b37e-d73b9d4699e2 through ee4ce0b2-3682-4fc8-b5d3-b1948bac43cc is not useful.

That leaves four basic tools for creating subsets corresponding to the four remaining components in our model. I will call them Search, Scope, Filter and Associate and define them as follows:

Scope – Specify one or more collections to be the target of search and filtering.

Search – Create an ordered list of items containing data that matches a specified value (typically a string).

Filter – Remove items with attributes that don’t meet specified criteria. Filters can be applied to any collection type.

Associate – Explore a list of items related to the current context.
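To make the four definitions concrete, here is a minimal sketch of each tool as a function over a tiny, made-up corpus. The item layout, the function names and the crude relevance score are all assumptions chosen for illustration.

```python
# Hypothetical corpus: each item has an identity, data, attributes,
# relationships and the name of the collection it belongs to.
CORPUS = [
    {"id": "a", "collection": "windows",
     "data": "Creating a Win32 button control",
     "attributes": {"has_code": True}, "related": ["b"]},
    {"id": "b", "collection": "windows",
     "data": "Button styles reference",
     "attributes": {"has_code": False}, "related": ["a"]},
    {"id": "c", "collection": "silverlight",
     "data": "Button class overview",
     "attributes": {"has_code": True}, "related": []},
]


def scope(items, collections):
    """Scope: keep only items from the chosen collections."""
    return [i for i in items if i["collection"] in collections]


def search(items, text):
    """Search: items whose data matches, ordered by a crude score."""
    needle = text.lower()
    hits = [i for i in items if needle in i["data"].lower()]
    return sorted(hits, key=lambda i: i["data"].lower().count(needle),
                  reverse=True)


def filter_by(items, **criteria):
    """Filter: drop items whose attributes don't meet the criteria."""
    return [i for i in items
            if all(i["attributes"].get(k) == v for k, v in criteria.items())]


def associate(items, seed_id):
    """Associate: items formally related to the seed item."""
    by_id = {i["id"]: i for i in items}
    return [by_id[r] for r in by_id[seed_id]["related"] if r in by_id]


# Scope to one collection, search it, filter the results, then explore
# what is associated with the top hit.
hits = filter_by(search(scope(CORPUS, {"windows"}), "button"), has_code=True)
nearby = associate(CORPUS, hits[0]["id"])
```

Each function acts on a different component of the model: scope on collections, search on data, filter on attributes and associate on relationships.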

These tools can be used in combination. For example, let’s say I’m doing a report on imagery in classic movies. Using a web image search, I can search for pictures from the movie Casablanca and I can scope that search to IMDb.com.

I can then filter those results by image size or image color. If I return to the main page, I can explore other classic movies associated with Casablanca by following “People who liked this also liked…” links.


A more general view of how these tools can be used in combination is depicted below:

[Figure: combinations]

This diagram shows how discovery tools both 1) trim away irrelevant content and 2) transform the collection type. Turning a graph into a list enables sorting. Turning a graph into a sparse subgraph exposes indirect relationships.

We have some level of support for each of these tools already. MSDN scoped libraries are good examples of scoped subgraphs. The TOC filter in Microsoft Help Viewer 2.0 is a good example of a filtered subgraph.

With these concepts in hand, let’s take a deeper dive into searching, scoping, filtering and association.

Making Content Discoverable: The Tools

Search

Search finds content with data that matches an input pattern. Results are displayed in an ordered list. The majority of MSDN site traffic comes from web searches so it’s a high priority for us to enable web search to work well. From a content perspective, we must:

  • Write great content. There’s simply no replacement for good content and code examples. We must continue creating great documentation on subjects that matter most to developers.
  • Improve the semantic information in our content. Our metadata schema should be well-understood, consistently applied and accurately maintained.
  • Organize content thoughtfully. Elegant and robust information architecture helps make content easy to understand and enables content maintainability and extensibility.
  • Improve our search engine optimization (SEO). Ensure that our content supports search engine policies and algorithms to improve discoverability from both web search and site search.
  • Enable site searches for things, not just strings. Efforts like the Semantic Web and schema.org are defining standards that enable search engines to understand structured data and provide richer search results. This exposes structured data in a way that enables both human and machine retrieval.
  • Practice good content hygiene. Move older or deprecated information out of the way in favor of information that’s more timely and relevant.
  • Partner with Bing. We have an opportunity to collaborate on innovative web search experiences. The recently introduced MSDN Instant Answers are a good example.
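As one illustration of searching for "things, not just strings", the sketch below builds a JSON-LD description of a documentation topic. It assumes schema.org's TechArticle type with its proficiencyLevel and dependencies properties; the helper name, headline, URL and property values are hypothetical.

```python
import json


def tech_article_jsonld(headline, url, level, dependencies):
    """Describe a documentation topic as a schema.org TechArticle."""
    return {
        "@context": "https://schema.org",
        "@type": "TechArticle",
        "headline": headline,
        "url": url,
        "proficiencyLevel": level,
        "dependencies": dependencies,
    }


markup = tech_article_jsonld(
    headline="XmlReader Class",
    url="https://example.com/library/xmlreader-class",  # placeholder URL
    level="Beginner",
    dependencies=".NET Framework",
)

# The serialized form is what a page would embed for crawlers to read.
serialized = json.dumps(markup, indent=2)
```

The same structured description can serve both audiences: a search engine can read the type and properties, and a site search can expose them as filters.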

While there is much room for improvement, several initiatives are already focused on improving search, and relatively little attention has been paid to other subsetting techniques. Let’s continue with a discussion of scoping.

Scope

A scope encompasses one or more collections that define the boundary for other operations (navigation, search, filtering and association). Scoping is the act of selecting groups of content and collecting them into a working scope.

The grouping implementation of scopes varies by application but in the MSDN Library we assign topics to a hierarchical namespace that appears in both site navigation and in the page URL (e.g. http://msdn.microsoft.com/en-us/library/windows/apps/hh464920.aspx).

Scoping can be active or passive. Active scoping occurs when an end-user specifically selects a scope. Passive scoping occurs when we apply a scope on the user’s behalf, for example when they follow a search results link into one of our scoped libraries (via the mechanism of canonical URLs). In this way, users gain the benefits of scopes transparently and without changing their current behaviors. In both active and passive scoping, the scoped state should be clearly depicted in the UI and should be easily reconfigurable.
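Passive scoping from a URL can be sketched as a lookup on the hierarchical namespace in the path. The registry of scopes, the scope names and the function below are assumptions for illustration; only the URL shape comes from the example above.

```python
from urllib.parse import urlparse

# Hypothetical registry of scoped libraries, keyed by namespace prefix.
SCOPES = {
    "windows/apps": "Windows Store apps",
    "windows/desktop": "Windows desktop",
}


def passive_scope(url, scopes=SCOPES):
    """Infer a scope from a library URL, or None if no scope applies."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected shape: <locale>/library/<namespace...>/<topic>.aspx
    if len(parts) < 4 or parts[1] != "library":
        return None
    namespace = "/".join(parts[2:-1])
    # Prefer the longest registered prefix of the namespace.
    for prefix in sorted(scopes, key=len, reverse=True):
        if namespace == prefix or namespace.startswith(prefix + "/"):
            return scopes[prefix]
    return None


inferred = passive_scope(
    "http://msdn.microsoft.com/en-us/library/windows/apps/hh464920.aspx"
)
```

A URL with no namespace segment yields no scope, which corresponds to the unscoped, search-everything experience.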

Finally, scoped content can be visually branded to aid in user orientation (“Where am I?”) and validation (“Is this the right place?”).

Why scope?

Scoping can improve your productivity by reducing noise and collisions throughout the content experience and by enhancing the discoverability of related information. Let’s define noise as data that’s irrelevant in the context of your current task. Let’s define collisions as irrelevant data (noise) that is difficult to differentiate from useful content because of its overlapping data or metadata.

Look at how noise manifests in search, the most common way for people to access our content. Search noise manifests as irrelevant search results that obscure relevant results. This may occur when search queries are poorly worded or when they contain ambiguous words like string or button. If I’m interested in learning about Win32 buttons, I’m probably not interested in the kind of buttons that are used in textiles.

Search collisions are potentially relevant results that don’t make sense in your current context. This might happen if your search results for ‘gas station’ include those on a different continent. Or if you’re searching for a definition of int, int, int, int, int, int or int (the same keyword documented separately in several places). Collisions are harder to disambiguate than regular noise and amount to searching for a needle in a stack of needles.

Noise and collisions slow us down and make us less productive. When searching the web, we must tune our searches to minimize the distraction of noise and collisions – often a time-consuming process of trial and error. Offline/local search is even noisier due to the challenges of indexing content without semantic information from the web.

Very often, the best outcome of a search is finding a site with an appropriate scope. In addition to reducing noise, scoped experiences put other closely related information at our fingertips and help us find information we didn’t know we needed. When I’m really interested in the movie Casablanca, IMDb.com provides a scoped experience that lets me easily explore highly relevant content.

What constitutes a good scope?

A good scope is one that optimizes your productivity in its associated context. More specifically, a good scope:

  • Supports a useful, recurring user context (e.g. “native development on Windows”)
  • Is obvious, easy-to-discover and easy-to-use
  • Minimizes noise and collisions for the intended audience
  • Is reusable across sessions
  • Strikes the right balance between relevant and irrelevant content

Let’s look a bit more at this last point. Every piece of information lies on a continuum of relevance, from highly relevant to completely irrelevant. If the subset of information defined by a scope is too large, the scope won’t sufficiently eliminate noise and collisions. If the subset is too small, users will likely become frustrated and return to unscoped searches.

[Figure: continuum]

Furthermore, it’s easier to discover and reuse fewer large graphs than many small graphs. Offering too many scopes replaces the “needle in the haystack” problem with the “infinite haystacks” problem. The red box in the diagram above represents the theoretical sweet spot of a good scope.

Recommendations

From a scoping perspective, we should:

  • Continue delivering scoped libraries for MSDN product centers. Scoped libraries are already proving their value in delivering simple and focused experiences that are discoverable from search.
  • Enhance all of our social apps to honor scoping preferences. This will enable users to easily scope not just a single application but their entire MSDN experience.
  • Ensure predefined scopes can be discovered from within other scoped experiences. This is a key missing feature of our current scoped library implementation.
  • Enable users to select combinations of predefined scopes. Allow users to mix-and-match our predefined scoped experiences to support a variety of customer scenarios.
  • Enable users to scope their social graph. Just as end-users can choose subsets of information to work with, customers should be able to choose and persist subsets of friends and co-workers with whom to collaborate and share information. In particular, we should seamlessly support TFS collaboration.
  • Infer scopes from the IDE (auto-scoping). In many cases it will be beneficial to infer or seed a scope with contextual information from the user’s IDE. This can help users quickly locate information without having to manually configure a scope.
  • Create a custom scoping capability for MSDN. Empower users to create, persist and share their own custom scopes. Note that any of the subsetting tools (scope, filter and associate) can be used to define a scope.
  • Work with search engines to disambiguate topics that appear in more than one scope. The introduction of canonical topics causes search results to hide non-canonical options. In many cases, we’d like to surface those other options as additional links in the search result.

Filter

A filter, when applied to a collection, results in a smaller collection of content that shares one or more common attributes. Note that filters don’t change the collection type. If you filter a list, you get a list; if you filter a graph, you get a graph.
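The point that filtering preserves the collection type can be sketched as follows. The node and edge layout is an assumption for illustration, as are the "has_code" attribute values; filtering the graph keeps the surviving nodes together with the edges that still connect two surviving nodes.

```python
def filter_list(items, keep):
    """Filtering a list yields a list, in the same order."""
    return [item for item in items if keep(item)]


def filter_graph(nodes, edges, keep):
    """Filtering a graph yields a graph: the surviving nodes plus the
    edges whose two endpoints both survive."""
    kept = {name: attrs for name, attrs in nodes.items() if keep(attrs)}
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    return kept, kept_edges


# Hypothetical attribute values for three library topics.
nodes = {
    "System.Xml Namespace": {"has_code": False},
    "XmlReader Class": {"has_code": True},
    "XmlReader Sample": {"has_code": True},
}
edges = [
    ("System.Xml Namespace", "XmlReader Class"),
    ("XmlReader Class", "XmlReader Sample"),
]


def has_code(attrs):
    return attrs["has_code"]


code_list = filter_list(list(nodes.values()), has_code)
code_nodes, code_edges = filter_graph(nodes, edges, has_code)
```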

Filters serve as good search refinements. Examples of filtered searches include:

  • Pictures of boats with an image size larger than 1024×768 pixels
  • Highly rated Italian restaurants
  • MSDN Library topics that contain a code example

Filters further reduce the size of your haystack by eliminating content that doesn’t meet a set of desired characteristics.

Why filter?

Filters enable you to find information that would be difficult to locate using string-based search alone. That’s because filters act on content metadata rather than on the content itself. In addition, filters are very useful in removing noise and collisions from other views of content. Filtered result sets can be sorted by any attribute that has magnitude (e.g. price, rating, popularity, etc.).

Web search is evolving from searching ‘strings’ to searching ‘things.’ Already, Google Images enables you to seed a search with an image instead of a string and to filter the result set by attributes including image size and color. Initiatives like the Semantic Web and schema.org are developing web standards for structured data and metadata that enable type-based search. As these standards evolve, web search filters will become powerful and pervasive.

What constitutes a good filter?

A good filter lets you quickly remove noise to help reveal the content you need. In particular, it:

  • Is discoverable, understandable and usable
  • Enables you to eliminate noise and disambiguate collisions
  • Exposes the most important content attributes
  • Doesn’t overwhelm you with low-value filtering options
  • Supports sorting (when appropriate)
  • Is easy to modify/change

Recommendations

From a filtering perspective, we should:

  • Tag content accurately and consistently. Provide both tooling and business process support for consistent, complete and correct tagging of content across our entire corpus. While this is helpful for search, it is critical for good filtering experiences.
  • Support and guide evolving taxonomy standards. Continue collaborating with non-Microsoft search providers on Internet standards for content types and content attribution.
  • Add only the content attributes that are most beneficial to users. It is better to have a handful of extremely useful attributes than 100 useless attributes. We should instrument and analyze which attributes are helpful in eliminating noise and disambiguating collisions for their given content types. Let’s keep in mind that attributes end up being exposed to the end-user in a filtering control, typically a checkbox.
  • Adopt support for filtering in MSDN social apps. To the extent that all of our apps provide sometimes unmanageable amounts of information, they should implement appropriate filtering to help developers find the information they need.
  • Support attribute hierarchies. Nested attributes enable you to discover progressively finer-grained filters, for example: “MS Office” → “Word” → “Word 2012”.
  • Deliver an advanced site search experience. Enable the user to filter site searches and to better disambiguate, for example, “Visual+Studio” as a search term vs. “Visual Studio” as a content attribute. Enable users to save these searches for later re-use.
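Attribute hierarchies, as recommended above, can be sketched with a simple parent map: selecting a coarse-grained value matches every item tagged with that value or any of its descendants. The map, the extra "Excel" entry and the function names are assumptions for illustration.

```python
# Hypothetical attribute hierarchy: each value maps to its parent.
PARENT = {
    "Word 2012": "Word",
    "Word": "MS Office",
    "Excel": "MS Office",
    "MS Office": None,
}


def ancestors(value):
    """Yield the value itself, then each ancestor up to the root."""
    while value is not None:
        yield value
        value = PARENT.get(value)


def matches(item_value, selected):
    """An item matches when its value is the selected value or one of
    the selected value's descendants."""
    return selected in ancestors(item_value)


tagged = ["Word 2012", "Excel", "Word"]
office_items = [v for v in tagged if matches(v, "MS Office")]
word_items = [v for v in tagged if matches(v, "Word")]
```

Walking up from the item's value keeps the lookup cheap even when the hierarchy is deep, because each value has only one parent.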

Associate

An association depicts one or more content items with formal relationships to a selected (or seed) item. While the number of ways that any two pieces of content can be related is unbounded, associations expose relationships deemed so important that they’re programmatically exposed. Associations allow us to explore both the graph and semantics of those formal relationships.

Examples of association tools include:

  • “Frequently Bought Together” and “What Other Items Do Customers Buy After Viewing This Item?” lists on Amazon (transaction relationships)
  • “Because you enjoyed:” lists on Netflix (content affinity)
  • Tag relationships on Stack Overflow (synonym, related)
  • The tuneglue music map (artist affiliation)
  • The MSDN Library Table of Contents (parent, child, sibling)

Associations help us find things we don’t quite know how to describe and things we don’t know that we need.

Why associate?

Associations provide at least three key user benefits. First, they can help you understand the context of the currently displayed item by understanding its place in a neighborhood of related content. And by exploring different types of relationships, you can quickly slice the content graph in different ways. Second, if you’re near the content you’re looking for but not on the exact item you need, associations help you quickly explore highly related content without jumping off to search. Third, associations help you discover related content that you didn’t know you needed (unknown unknowns). The less certain you are of what you’re looking for, the more valuable associations can be.

In addition, if the content system allows content items in one scope to have relationships with items in other scopes, then associations can enable you to discover and switch to that scope.

What makes for a good association?

Good associations expose a small number of meaningful relationships. If too many relationships are presented, the best related content may be obscured. If displayed relationships aren’t useful a high percentage of the time, users will perceive the associations as page noise. Finally, good association tools shouldn’t interfere with the fundamental consumption of reference content.

Recommendations

  • Add support for metadata-as-data in our content store. Giving metadata identity enables us to capture semantic relationships between content attributes. This enables users to explore non-obvious relationships in the content.
  • Determine which semantic relationships are most worth formalizing. Don’t try to create relationships for everything.
  • Continue investing in techniques to identify high-value relationships. Manually created relationships have proven difficult to maintain due to the size, age and dynamic nature of MSDN content. Our challenge is to find better data-driven ways to identify, deliver, validate and maintain useful relationships across our content.
  • Implement a content hygiene strategy informed by usage metrics. Relationships that aren’t proving beneficial should be removed from the system.
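A hygiene pass informed by usage metrics might be sketched as below. The metric (click rate on a displayed relationship), the thresholds and the sample numbers are all assumptions for illustration; a real system would choose its own measures of benefit.

```python
def prune_relationships(usage, min_impressions=100, min_click_rate=0.02):
    """Split relationships into those to keep and those to remove.

    usage maps (source, target) -> (impressions, clicks). Relationships
    with too few impressions are kept, since there is not yet enough
    evidence to judge them.
    """
    keep, remove = [], []
    for pair, (impressions, clicks) in usage.items():
        enough_data = impressions >= min_impressions
        if enough_data and clicks / impressions < min_click_rate:
            remove.append(pair)
        else:
            keep.append(pair)
    return keep, remove


# Made-up usage data for three relationships from one topic.
usage = {
    ("XmlReader Class", "XmlReader Sample"): (1000, 80),
    ("XmlReader Class", "Button Class"): (1000, 3),
    ("XmlReader Class", "XmlWriter Class"): (20, 0),
}

keep, remove = prune_relationships(usage)
```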

A Few Thoughts on Implementation

The goal of this paper is to solidify fundamental concepts and terminology and not to prescribe an implementation. But, in the course of writing this paper and discussing its concepts with others, I’ve acquired a few insights that may guide implementation.

Implement metadata as data

Today, much of our MSDN content is attributed with hard-coded strings. This impedes discoverability, content hygiene and syndication scenarios. Instead, the system should treat metadata as a first class citizen in the content. In particular:

  • Metadata has identity within the system
  • Metadata can have content and attributes
  • Metadata can have relationships with both data and other metadata (synonym, translation, parent/child, etc.)
  • The system must support both taxonomy (prescriptive tagging) and folksonomy (collaborative tagging)

A good example of this can be experienced on Stack Overflow. The tag “asp.net” has a public identity at http://stackoverflow.com/tags/asp.net/info . From there, you can see that it has a synonym relationship with the tag “aspx”. The system automatically remaps synonym tags in the user interface to keep the experience consistent and clean.
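The synonym remapping described above can be sketched by giving each tag its own record. The Tag class, the registry and the identity used for "aspx" are illustrative assumptions; this is not a description of how Stack Overflow implements tags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tag:
    """Metadata with its own identity and relationships."""
    name: str
    identity: str
    synonym_of: Optional[str] = None  # name of the canonical tag, if any


TAGS = {
    "asp.net": Tag("asp.net", "http://stackoverflow.com/tags/asp.net/info"),
    # Placeholder identity for illustration only.
    "aspx": Tag("aspx", "urn:example:tag:aspx", synonym_of="asp.net"),
}


def canonical(name, tags=TAGS):
    """Follow synonym relationships until a canonical tag is reached."""
    seen = set()
    while name in tags and tags[name].synonym_of and name not in seen:
        seen.add(name)
        name = tags[name].synonym_of
    return name


shown = [canonical(t) for t in ["aspx", "asp.net", "xml"]]
```

Because the synonym is a relationship between two pieces of metadata, it can be added or removed without editing any content tagged with either name.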

Maintain a clean separation of metadata

It’s common to confuse scoping and filtering because they both use metadata to create content subsets. Likewise, it’s easy to confuse filters and associations because, as consummate pattern-matchers, we automatically infer a relationship between content items that share one or more common attributes.

While there is overlap between the capabilities of scoping, filtering and association, I hope that I’ve shown that these tools have vitally different and complementary roles. As a result, it is helpful to maintain architectural separation between the metadata that these tools use.

In this paper, I’ve underscored the distinction between attributes and collections. Because attributes describe a specific content item, it makes sense to encapsulate them along with their corresponding content. Collections, on the other hand, encapsulate content items and can change independently of the content. Items that are closely related in one information architecture may not be closely related in a different architecture.

TOC relationships are baked into content items in Document Explorer and in the more recent Microsoft Help Content (MSHC) format. That implementation encourages the view that scoping and filtering are equivalent. The problems with this perspective are many and are described in the attached case study on the scoping implementation in Document Explorer.

We should store relationship metadata independently of content in our future online and local help solutions. If, for practical reasons, we decide to maintain relationship metadata inside of content, we must at least acknowledge that attributes and relationships are distinctly different and that they support different authoring and consumption experiences.
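One way to picture this separation is a content store that holds items with their attributes, alongside independent lists of relationship triples, one per information architecture. The identifiers, the "sibling" relationship and the layout below are assumptions for illustration.

```python
# Content items carry their own attributes...
CONTENT = {
    "xmlreader-class": {"title": "XmlReader Class",
                        "attributes": {"type": "reference"}},
    "xmlreader-sample": {"title": "XmlReader Sample",
                         "attributes": {"type": "sample"}},
}

# ...while each information architecture is a separate list of
# (source, kind, target) triples that can change without touching content.
API_REFERENCE_TOC = [("xmlreader-class", "sibling", "xmlreader-sample")]
SAMPLES_TOC = []


def related(item_id, architecture, kind=None):
    """Look up an item's relationships within one architecture."""
    return [target for source, k, target in architecture
            if source == item_id and (kind is None or k == kind)]


in_reference = related("xmlreader-class", API_REFERENCE_TOC)
in_samples = related("xmlreader-class", SAMPLES_TOC)
```

The same two items are closely related in one architecture and unrelated in the other, and neither item had to be edited to express that.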

Implementing Scopes

When implementing scopes, it’s important that we start simply and evolve carefully. SEO must be a consideration when adding new scopes as should the fragmentation of our experiences. Creating scopes for each of our major development platforms and products is a good start because they have a critical mass of organized documentation.

It’s also necessary to enable users to dynamically combine scopes. This reduces the need to replicate shared topics over and over in many scopes since groups of commonly-reused topics can be collected into their own scope.

I introduced scoped library design principles in my blog post announcing MSDN scoped libraries. I now extend these principles to all MSDN applications as follows:

  1. Fewer application scopes are better. Too many scopes force users into “island hopping.”
  2. An application scope doesn’t live forever. Scopes must have an end-of-life strategy.
  3. Every content item in an application scope is also in the application’s master scope. This supports the “search everything” scenario and enables us to honor scoped URLs after the scope is no longer available or the item has moved.
  4. In general, content items should not appear in more than one application scope. If items must appear in more than one scope, a single, canonical version should be presented to web crawlers.
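Principles 3 and 4, together with the dynamic combination of scopes discussed earlier, can be sketched as simple set operations. The function names, scope names and item identifiers are hypothetical.

```python
def check_scopes(master, scopes):
    """Return a list of violations of principles 3 and 4.

    master is the set of item IDs in the master scope; scopes maps each
    application scope name to its set of item IDs.
    """
    problems = []
    seen = {}
    for name, items in sorted(scopes.items()):
        for item in sorted(items):
            if item not in master:
                problems.append(
                    f"{item} in '{name}' is missing from the master scope")
            if item in seen:
                problems.append(
                    f"{item} appears in both '{seen[item]}' and '{name}'")
            else:
                seen[item] = name
    return problems


def combine(scopes, *names):
    """Dynamically combine scopes: the union of their items."""
    return set().union(*(scopes[n] for n in names))


master = {"a", "b", "c"}
scopes = {"windows": {"a", "b"}, "azure": {"b", "d"}}

problems = check_scopes(master, scopes)
working_scope = combine(scopes, "windows", "azure")
```

Since principle 4 is stated as a general rule rather than an absolute one, the duplicate report is best read as a prompt to pick a canonical version, not as an error.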

Implementing Filters

While evolving web standards define a baseline set of type-based attributes, they must be augmented by domain-specific attributes (e.g. ‘C# development’). Content creators must understand the attributes that are most helpful to their audience and then tag their content accordingly. It’s useful to distinguish between coarse- and fine-grained content attributes. Coarse-grained attributes help you group topics by their similarities (set creation). Fine-grained attributes help you distinguish between similar topics (subset creation and element selection).

Once a tagging system is defined, the challenge is to consistently tag all content. For large teams working on complex documentation it turns out to be difficult to achieve tagging consistency and completeness across numerous teams, many content sets and over the course of many years. A simpler tagging scheme is easier to maintain than a complex one. Ultimately, it’s important that the system support tag hygiene including revision, consolidation and removal.

Filters must be easy to create, modify, combine and clear. The user interface must provide a compromise between the value of filters and the number of them. It is common to implement filters using checkboxes. If too many filterable attributes are offered, it results in the “endless checklist” problem and associated usability issues.

Implementing Associations

From a user experience perspective:

  • Declare the nature of the relationships in an association to the user. The better they understand how another topic is related to the current one, the better they can assess the value of clicking on a related link and what they will see when they do.
  • Fewer good associations are better than many mediocre ones. In fact, displaying zero associations is a much better option than displaying any number of low-quality relationships.
  • The display of associations should not interfere with the primary scenario of consuming the content on the currently displayed page.

Another concern is relationship hygiene. With a measured MSDN task success rate of ~80%, future increases in the task success rate will be modest. At the same time, since the potential number of relationships is exponentially related to the number of topics, we must continually work to ensure that new and existing relationships are a) beneficial, b) not introducing too much complexity into the user experience, and c) not impacting performance.

Closing Thoughts

Let’s review our issue list and look at how scoping, filtering and associations might help with problems that aren’t completely addressed by search and search engine optimization.

| Issue | Mitigation |
| --- | --- |
| Larger content sets obscure information in smaller content sets | Developers can select a scoped experience that moves “popular” content out of the way in favor of relevant content. |
| Multiple platforms with similar or overlapping APIs | Developers can select one or more scopes specific to their problem domain. |
| Multiple product versions of the same topic | From search, the most recent topic version is canonical. The Library Version Selector enables you to discover alternative versions and navigate to a section of the TOC scoped to the version you’re using. |
| Closely related topics may be scattered across the Table of Contents | Filtered graphs enable you to discover the relative location of scattered content. Scopes bring these scattered topics into a unified experience. Associations provide a discoverable link to related content. |
| Lack of filtering mechanisms encourages creation of content islands | A clean tagging solution allows us to consolidate and simplify the Forums we offer. |

We’ve made some progress already. In recent months, we introduced several scoped MSDN libraries including the Windows 8 scoped library and the Azure scoped library. (I covered the benefits of and principles behind scoped libraries in a recent post.) The MSDN Developer Samples site offers a good example of how search and filters can be effective in combination. Our ongoing work with schema.org will improve access to online content from all major search engines and from the MSDN site search that serves the MSDN Library. The ability to expose structured data to search will also benefit our own filtering tools.

I’ve documented these concepts because I wanted to deeply explore the tools at our disposal to better and more quickly connect developers with the information they need. My hope is that this analysis and its recommendations will inform the future direction of MSDN and ultimately benefit developers.

Case Study: Scopes, Filters and Document Explorer

Now that we’ve discussed scopes, filters and the underlying physics of their use, let’s take a look at what happens when these two concepts get confused.

Filters Gone Wild

Document Explorer (dexplore.exe), the rich client help viewer that shipped with Visual Studio 2008 and earlier, provided three different types of filters: a TOC filter, a search filter and in-topic member filtering for managed APIs. The purpose of the TOC filter was to create reusable subsets of the Table of Contents based on a set of predefined products, languages and technologies – in other words, scoped TOCs.

But Document Explorer used filtering instead of scoping to create these views. For example, the filter definition for Windows Forms Development was:

("__FilterUniqueID"="WinForm" AND "__FilterLocaleID"="en") OR
("DocSet"="Visual Studio" OR
"DocSet"="WindowsForms" OR
"DocSet"="NetFxBCL" OR
"DocSet"="ADONET" OR
"DocSet"="C# Lang" OR
"DocSet"="Visual Basic Lang" OR
"DocSet"="VJ# Lang" OR
"DocSet"="Visual C++ Lang" OR
("DocSet"="DebugVS" AND ("DevLang"="VB" OR "DevLang"="CSharp" OR "DevLang"="jsharp" OR
                         "DevLang"="C++" OR "DevLang"="SQL")) OR
"DocSet"="ClientVS" OR
"Technology"="NETFramework" OR
"DocSet"="DExploreHelp") NOT "DocSet"="NetFxMPHE"

It’s clear that this view is based on topic attributes, not on the structural relationship of the content. This approach worked but it had some limitations.
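To show what “based on topic attributes” means in practice, here is a simplified sketch of a fragment of the filter above written as a predicate. It is not Document Explorer’s evaluator, it covers only some of the clauses, and the way it treats the trailing NOT clause as an exclusion is my assumption. The sample topics and their `toc_path` values are made up.

```python
def matches_winforms_fragment(topic: dict) -> bool:
    """Evaluate a simplified fragment of the Windows Forms filter."""
    doc_set = topic.get("DocSet")
    included = (
        doc_set in {"Visual Studio", "WindowsForms", "NetFxBCL", "ADONET"}
        or (doc_set == "DebugVS"
            and topic.get("DevLang") in {"VB", "CSharp", "jsharp", "C++", "SQL"})
        or topic.get("Technology") == "NETFramework"
    )
    # Assumed reading of the trailing NOT clause: exclude this DocSet outright.
    return included and doc_set != "NetFxMPHE"


# Made-up topics; note that toc_path is never consulted by the filter.
topics = [
    {"toc_path": "Library/A/B", "DocSet": "WindowsForms"},
    {"toc_path": "Library/X/Y/Z", "DocSet": "DebugVS", "DevLang": "SQL"},
    {"toc_path": "Library/A/B", "DocSet": "NetFxMPHE", "Technology": "NETFramework"},
    {"toc_path": "Library/A/B", "DocSet": "Win32"},
]
in_view = [matches_winforms_fragment(t) for t in topics]
```

Two topics that sit side by side in the TOC can land on opposite sides of the filter, while topics from unrelated branches can both be included. Only the metadata decides.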

The first problem here is that you’re not really sure what you’re getting. As a filter author, it’s not clear what content from the TOC will be included in this view, what important information might be left out and how this view will evolve (or deteriorate) over time. Think about it this way: If you were going to create a scope from scratch for your own use, would you prefer to select nodes from the TOC or search the list of available metadata for likely candidates?

Another problem is that this view appears to overlap with many other managed development topic subsets, so it may not be the best way to eliminate collisions. Adding yet more metadata to the system to support finer-grained views would likely add complexity without sufficient return.

Worse yet, the Document Explorer TOC Filter doesn’t serve as a search scope. The TOC and Search Filters in Document Explorer are separate tools that don’t work in combination. Good scoped TOCs should set the boundaries for both search and navigation.
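By contrast, here is a minimal sketch of a scope defined structurally, as a set of TOC subtrees, with search running inside the same boundary as navigation. The `Topic` and `Scope` types, the TOC paths and the titles are hypothetical and are only meant to illustrate the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    toc_path: tuple[str, ...]   # position in the Table of Contents
    title: str


@dataclass(frozen=True)
class Scope:
    roots: tuple[tuple[str, ...], ...]   # TOC subtrees included in the scope

    def contains(self, topic: Topic) -> bool:
        # A topic is in scope if its TOC path starts with any root path.
        return any(topic.toc_path[:len(root)] == root for root in self.roots)

    def navigate(self, topics):
        # The scoped TOC: everything under the selected nodes.
        return [t for t in topics if self.contains(t)]

    def search(self, topics, query: str):
        # Search is bounded by the same scope as navigation.
        q = query.lower()
        return [t for t in self.navigate(topics) if q in t.title.lower()]


# Made-up library for illustration only.
library = [
    Topic(("Library", "Managed", "Windows Forms"), "Button Class"),
    Topic(("Library", "Native", "Controls"), "Button Control"),
    Topic(("Library", "Managed", "Windows Forms"), "Form Class"),
]
scope = Scope(roots=(("Library", "Managed"),))
toc_titles = [t.title for t in scope.navigate(library)]
hit_titles = [t.title for t in scope.search(library, "button")]
```

Because `search` is built on `navigate`, a reader can never find a topic through search that they could not also reach by browsing the scoped TOC, and vice versa.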

Finally, this approach is brittle and difficult to maintain. It’s one thing to ensure that a single documentation team tags their content completely and consistently for a single product release. It’s another thing to scale this system to 70+ teams working over more than a decade. And, as the metadata system gets older and more complex, it’s difficult to perform needed hygiene without breaking older experiences. Microsoft Help MVP Rob Chandler highlighted this problem in his How to fix MSDN Help post: “MS Help 2 was heading in the right direction with filters, but despite several attempts to rework the tags the system just never worked to anyone’s satisfaction … We don’t want complicated tagging. We never get this right.”
