Metadata Management by Wiki
by Roy Youngman on Mar 18, 2008 - 05:59 PM read 405 times
Source: http://www.ryoungman.net/?p=14

As a former data architect, I still find myself networking with, and interested in the work of, data gurus at various companies. These are the guys and gals who build data models, design databases, create ETL logic, and manage what is in data dictionaries or other forms of metadata repositories. They work very hard to organize what is mostly a mess, much like a single parent of six teenagers. Unfortunately, like that poor parent, they are more likely than not to be frustrated and aggravated by constantly taking two steps back for every step forward as the mess gets worse over time. It seems to me that these data gurus are a disillusioned bunch, and that their numbers and the overall talent pool are shrinking as the job has lost much of the luster it had 20 years or so ago.

It's a shame, because a superb data architecture is likely to be one of those differentiating capabilities that Next Generation Enterprises (NGEs) will covet. Managing metadata is a key piece of that architecture because, believe it or not, it is an important enabler of agility. Let me explain: all the improved agile methods and agile development environments designed to foster innovation or shorten the cycle time from concept to delivery will not help much if you are constantly trying to figure out your data resources (which most companies unknowingly do over and over again). For example: have you ever seen a system development project come in on time and on budget and be declared a success, only to discover that the data conversion project costs just as much or more and is holding the whole thing up from actually going online?

In the end, NGEs will figure out how to get the most value out of their information resources, and the solution is not solely dependent on the talent of data gurus. I'm not saying that data architecture is unnecessary. If anything, IT in general could benefit from an increase in data-related competencies that yield improved data architecture. The problem, however, is bigger than data architecture in two main ways:

  1. The best data architecture will still result in bad data (see my earlier post on discovering bad data). Users of systems that create or modify data will use those systems in ways designers don't anticipate, including in how data is persisted, and in creative ways that surprise and shock designers.
  2. Nearly all metadata that provides clarity as to the meaning of data is a passive representation from the designer's, often technical, perspective, if any such metadata exists at all. Active metadata is something that is part of the system or database design (for example, PayRate is a mandatory, positive numeric field, and the DBMS will see to that). Active metadata is nice, but it rarely conveys much meaning beyond what you can get from the name of the data field. Passive metadata is an attempt to explain meaning (for example, PayRate is the gross amount an employee is paid per pay period).

Unfortunately, when you combine the two problems above, you realize that for the most part you should start any sentence about active metadata with "we know" and any sentence about passive metadata with "we think." For example, we know PayRate is not null and is always a positive number, and we think it represents the gross amount an employee is paid per pay period [because that was what it was designed for, but of course who knows how it is actually used].
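The "we know" versus "we think" distinction can be sketched in a few lines of Python. This is only an illustration, not how any particular DBMS or repository tool works: the `FieldMetadata` class, the `describe` function, and the rule attached to PayRate are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FieldMetadata:
    name: str
    # Active metadata: a rule the system enforces, so "we know" it holds.
    constraint: Callable[[object], bool]
    # Passive metadata: the designer's stated intent, so "we think" it holds.
    description: str


# Hypothetical definition of the PayRate field used in the text.
pay_rate = FieldMetadata(
    name="PayRate",
    constraint=lambda v: isinstance(v, (int, float)) and v > 0,
    description="Gross amount an employee is paid per pay period",
)


def describe(field: FieldMetadata, value: object) -> str:
    """Report what is enforced versus what is only assumed about a value."""
    if not field.constraint(value):
        return f"{field.name}={value!r} rejected: violates the enforced rule"
    return (
        f"We know {field.name}={value!r} passes the enforced rule; "
        f"we think it means: {field.description}"
    )


print(describe(pay_rate, 2500.0))
print(describe(pay_rate, 1))     # passes the rule, yet may not be a real pay rate
print(describe(pay_rate, None))  # rejected, because the field is mandatory
```

Note that a value of 1 sails through the enforced rule. The rule can tell us the value is legal, but it cannot tell us whether the description is true of it.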

As a result, bad data still proliferates in most companies, which then incur a great deal of cost and lost time as the same data is re-analyzed over and over again to understand its actual meaning. Or worse, bad decisions are made based on bad data that someone just assumes is what the passive metadata says it is. We have tried to address this problem with data warehouses, ETL tools, dictionaries, repositories, and so on. Yet bad data still exists and the clarity of data meaning remains elusive.

Or is it?

Actually, as power users of data address a need, they frequently figure out the bad data and what to do to work around it. They discover why some field like PayRate has a weird $1 value in many instances or has other such irregularities ("oh yeah, we do have some employees on a unique pay scale our system can't handle, so we just enter a 1 in that field because the system requires something"). As a community of power users, they probably have a collective knowledge that is pretty accurate and potentially powerful. Unfortunately, they have few ways to codify or share that knowledge, or to collaborate on it with one another. So the wheel is constantly reinvented, person by person, as each data user tries to solve a different problem using the same data.
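One way a power user might stumble on a placeholder like that $1 value is a simple frequency check. The sketch below is an assumption about how such a check could look, not a description of any specific tool; the sample values, the `frequent_values` function, and the 25% threshold are all invented for illustration.

```python
from collections import Counter

# Hypothetical PayRate values pulled from an employee table.
pay_rates = [2500.0, 1.0, 3100.0, 1.0, 2750.0, 1.0, 4200.0, 2900.0]


def frequent_values(values, min_share=0.25):
    """Return each value that makes up at least min_share of all rows."""
    counts = Counter(values)
    total = len(values)
    return {v: n for v, n in counts.items() if n / total >= min_share}


suspects = frequent_values(pay_rates)
print(suspects)  # {1.0: 3}
```

The check only flags that something odd is going on. The explanation (a pay scale the system can't handle) still has to come from someone who knows the business.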

It doesn't need to be like this. What we need is a mashup of traditional repository-based metadata management tools and Wiki collaboration technology. We need something that allows passive metadata to reflect actuality and not be limited to the intent of system and database designers. We need something that harnesses the wisdom of the community of data users who require data clarity to be successful in their roles. They have the right incentive to become a community that shares knowledge. But doing this will require the data guru to value collaboration as much as data architecture, and companies will have to understand that data gurus cannot solve all their data problems by themselves, and never could.
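To make the idea a little more concrete, here is a minimal sketch of what one entry in such a mashup might hold: the designer's intended meaning alongside notes contributed by the people who actually use the data. The `MetadataEntry` and `Note` classes, their fields, and the sample author name are hypothetical, and a real tool would also need storage, search, and access control.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Note:
    author: str
    text: str
    created: datetime


@dataclass
class MetadataEntry:
    field_name: str
    designed_meaning: str  # passive metadata from the designers
    notes: list[Note] = field(default_factory=list)  # community observations

    def add_note(self, author: str, text: str) -> None:
        """Append an observation; earlier notes are kept as history."""
        self.notes.append(Note(author, text, datetime.now(timezone.utc)))

    def render(self) -> str:
        """Show the designed meaning followed by what users have learned."""
        lines = [f"{self.field_name}: {self.designed_meaning}"]
        lines += [f"  - {n.author}: {n.text}" for n in self.notes]
        return "\n".join(lines)


entry = MetadataEntry(
    "PayRate", "Gross amount an employee is paid per pay period"
)
entry.add_note(
    "payroll_analyst",
    "A value of 1 is a placeholder for employees on a pay scale "
    "the system cannot handle.",
)
print(entry.render())
```

The design choice that matters here is that notes are appended rather than overwriting the designed meaning, so the next data user sees both the intent and the actuality.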
