As of September 2010, the MSDN Library provides online access to over 16 million topics of Microsoft technical and product documentation. What exactly is a topic? In layman’s terms, a topic is simply a page authored for users to view. From a database perspective, it is a unique content item with an XHTML representation. Other statistics:
- 374,000 en-us (English, United States) topics delivered with Visual Studio 2010 / .NET Framework 4 – our largest single body of product documentation
- 1.4 million en-us topics across all versions of Visual Studio / .NET Framework (from Visual Studio 6.0 through Visual Studio 2010)
- 3 million en-us topics in the MSDN Library covering over 20 Microsoft product families
- 16 million topics in the MSDN Library distributed across 14 locales
Why is the Library so big?
From Office to Windows to Servers to Developer Tools, there are over 20 product families represented in the Library. Most product families provide technical documentation for multiple products, past and present. For example, the Visual Studio product family includes all of the flavors of Visual Studio, LightSwitch, the Windows SDK, the .NET Framework, Silverlight, ASP.NET, Visual SourceSafe, Visual FoxPro, and many more.
We typically provide multiple versions of technical documentation for each product family. Some of this documentation reaches back over a decade. There are two main reasons for this: 1) we provide access to older technical documentation as part of the Microsoft Support Lifecycle and 2) we recognize that customers can’t always upgrade to a newer product or technology. If content is still receiving significant traffic in the Library, developers are still finding it valuable.
Finally, the Library is available in 14 locales. A locale represents a user’s language and country/region preference. Not all topics are translated, but many are: over 13 million topics are available in non-en-us locales.
How is all this content organized?
At the database level, topics are organized by Locale × Product Family × Product Family Version. The diagram below provides a visual representation. The leftmost table depicts the distribution of our 16 million topics across 14 supported locales. If we zoom in on just the en-us content, you can see the major product families that contribute to our 3 million en-us topics. If we then zoom in on our largest product family, you can see the distribution of topics across different versions of Visual Studio. VS.100 is the internal product family version number for Visual Studio 2010, which delivers over 374,000 topics. (A full list of product family version numbers for Visual Studio is documented in my MSDN URL Cheatsheet.)
We use internal version numbers to uniquely identify product family versions in the database. The easiest way to discover internal version numbers for a given product family is to open Control Panel, navigate to ‘Programs and Features’, and look at the Version column. We use a compact form of the full product family version, so ‘Visual Studio 10.0.30319’ collapses to ‘vs.100’.
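The collapse from ‘Visual Studio 10.0.30319’ to ‘vs.100’ can be sketched in a few lines of Python. This is only an illustration, not how MTPS actually derives its identifiers: the rule used here (family code, a dot, then the major and minor numbers joined together) is inferred from the examples in this post, and the function name `compact_version` and its `family_code` parameter are made up for the sketch.

```python
import re


def compact_version(display_name: str, family_code: str) -> str:
    """Collapse a name such as 'Visual Studio 10.0.30319' to 'vs.100'.

    Assumed rule (inferred, not documented): keep the first major.minor
    pair, drop the build number, and join the two numbers with no dot.
    """
    match = re.search(r"(\d+)\.(\d+)", display_name)
    if match is None:
        raise ValueError(f"no major.minor version found in {display_name!r}")
    major, minor = match.groups()
    return f"{family_code.lower()}.{major}{minor}"


print(compact_version("Visual Studio 10.0.30319", "vs"))  # prints vs.100
```

The family code is passed in rather than derived from the product name, because nothing in the name itself tells you that ‘Visual Studio’ abbreviates to ‘vs’.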
You can see the (Locale × Product Family × Product Family Version) organization at work in the topic URL. For example, the URL for the German (Germany) translation of the .NET Framework 3.5 version of the “System.XML Namespace” topic is:
The locale is specified in the URL as “de-de”. The product-family/product-family-version is specified by the string “(v=VS.90)” where ‘VS’ is the Visual Studio product family and ‘90’ corresponds to Visual Studio 2008, the version that shipped .NET Framework 3.5.
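If you want to pull those two pieces out of a topic URL programmatically, a rough Python sketch looks like this. The regular expressions are assumptions based only on the “de-de” and “(v=VS.90)” tokens described above, the names `TopicAddress` and `parse_topic_url` are hypothetical, and the URL in the example is a made-up placeholder rather than a real Library address.

```python
import re
from typing import NamedTuple, Optional

# Assumed patterns, inferred only from the "de-de" and "(v=VS.90)" examples.
LOCALE_RE = re.compile(r"/([a-z]{2}-[a-z]{2})/", re.IGNORECASE)
VERSION_RE = re.compile(r"\(v=([a-z]+)\.(\d+)\)", re.IGNORECASE)


class TopicAddress(NamedTuple):
    locale: str
    family: Optional[str]   # None when the URL carries no version token
    version: Optional[str]  # None when the URL carries no version token


def parse_topic_url(url: str) -> TopicAddress:
    """Extract the locale and the family/version token from a topic URL."""
    locale_match = LOCALE_RE.search(url)
    if locale_match is None:
        raise ValueError(f"no locale segment found in {url!r}")
    version_match = VERSION_RE.search(url)
    family, version = version_match.groups() if version_match else (None, None)
    return TopicAddress(locale_match.group(1).lower(), family, version)


# Placeholder URL: host and path are made up; only the two tokens matter.
print(parse_topic_url("http://example.invalid/de-de/library/some-topic(v=VS.90).aspx"))
# prints TopicAddress(locale='de-de', family='VS', version='90')
```

A URL without a version token still parses; the family and version simply come back as `None`.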
Check out my MSDN URL Cheatsheet for a more complete list of URL conventions.
But wait, there’s more…
The MSDN/TechNet Publishing System (MTPS) that sits beneath the MSDN Library also supports the TechNet Library (for IT Pros), the Expression Library (for designers), and a number of other smaller sites.
- The TechNet Library has over 3 million topics in 19 locales.
- The Expression Library has 8,904 topics, all in en-us.
Overall, there are around 20 million topics hosted in MTPS for a total of 1.3 TB of data in production. Uff da!
So why does any of this matter?
The size, structure and age of the Library have a direct bearing on the ease with which you can find the Library information you need.
Let’s look at a specific example. When there are multiple versions of the same topic, search engines may return several near-identical results that are hard to tell apart, and the specific version you need may not appear in the list at all. Once you enter the Library, it’s difficult to tell which version you’re viewing based on content alone, since many topics change very little, if at all, from version to version. As I mentioned in a recent post, we’ve introduced a new Version Selector to improve this aspect of our library experience. All six versions of the System Namespace topic are now easily accessible from a dropdown list.
There are many other size- and structure-related challenges including:
- How well do search engine queries help you locate the one topic you need (out of 16 million)?
- Once you’re in the Library, how effectively can you navigate from the topic you’re looking at to the topic you may need?
- If you know exactly which topic you’re looking for, how quickly can you get to it?
- If you want to find information tightly scoped to a specific task, how easy is it to get that view (e.g., you only want to see content relevant to Windows Phone development)?
- How quickly can you find a relevant code example for the API you’re working with?
- If you’re in a non-English translation of the Library, how easily can you discover and consume topics that aren’t available in your locale?
These are some of the questions I plan to address in future posts and for which we are constantly working on better answers.
In the meantime, what challenges are you personally facing when using the MSDN Library? If you could make one change to the online Library user interface, what would it be?