What is Metadata in Informatica?
Definition: “Data about data”
Metadata contains descriptive data for end users. In a data warehouse the term metadata is used in a number of different
situations.
Metadata is used for:
• Data transformation and load
• Data management
• Query generation
Data transformation and load:
Metadata may be used during data transformation and load to describe the source data and any changes that need to be made. The advantage of storing metadata about the data being transformed is that, as the source data changes, the changes can be captured in the metadata and the transformation programs can be regenerated automatically.
For each source data field, the following information is required:
Source Field:
• Unique identifier (to avoid any confusion between two fields of the same name from different sources).
• Name (Local field name).
• Type (storage type of the data, such as character, integer, floating point, and so on).
• Location
- System (the system it comes from, e.g. Accounting system).
- Object (the object that contains it, e.g. Account table).
The destination field needs to be described in a similar way to the source:
Destination:
• Unique identifier
• Name
• Type (database data type, such as Char, Varchar, Number and so on).
• Table name (name of the table the field will be part of).
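As a minimal sketch, the source and destination descriptions above could be held as simple records. The class and attribute names, the identifier values, and the "Loans system" source are hypothetical and chosen only for illustration; they are not part of any Informatica API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceField:
    """Metadata describing one field in a source system."""
    identifier: str      # unique across all sources
    name: str            # local field name
    storage_type: str    # e.g. "character", "integer", "floating point"
    source_system: str   # system the field comes from
    source_object: str   # object that contains the field


@dataclass(frozen=True)
class DestinationField:
    """Metadata describing one field in the data warehouse."""
    identifier: str
    name: str
    data_type: str       # database data type, e.g. "Char", "Number"
    table_name: str      # table the field will be part of


# Two source fields share a local name but remain distinct
# because each carries its own unique identifier.
acct_balance = SourceField(
    "SRC-001", "balance", "floating point", "Accounting system", "Account Table"
)
loan_balance = SourceField(
    "SRC-002", "balance", "floating point", "Loans system", "Loan Table"
)
target = DestinationField("DST-001", "ACCOUNT_BALANCE", "Number", "ACCOUNT_FACT")
```

Keeping the identifier separate from the local name is what allows both "balance" fields to be mapped without ambiguity.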
The other information that needs to be stored is the transformation or transformations that need to be applied to turn the source data into the destination data:
Transformation:
• Transformation(s)
- Name
- Language (name of the language that the transformation is written in).
- Module name
- Syntax
The Name is the unique identifier that differentiates this from any other similar transformations.
The Language attribute contains the name of the language that the transformation is written in.
The other attributes are module name and syntax. Generally these will be mutually exclusive, with only one being defined. For simple transformations such as simple SQL functions the syntax will be stored. For complex transformations the name of the module that contains the code is stored instead.
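The transformation record described above can be sketched as follows. This is an illustration under stated assumptions: the text says module name and syntax are generally mutually exclusive, and the sketch enforces that strictly. The `{field}` placeholder convention, the `render_sql` helper, and the example transformation names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transformation:
    name: str                          # unique identifier of the transformation
    language: str                      # language the transformation is written in
    module_name: Optional[str] = None  # set for complex transformations
    syntax: Optional[str] = None       # set for simple transformations

    def __post_init__(self) -> None:
        # Assumption: exactly one of module_name and syntax is defined.
        if (self.module_name is None) == (self.syntax is None):
            raise ValueError(
                f"{self.name}: define exactly one of module_name or syntax"
            )


def render_sql(transformation: Transformation, field_name: str) -> str:
    """Expand the stored syntax of a simple transformation for one field."""
    if transformation.syntax is None:
        raise ValueError(
            f"{transformation.name} lives in module {transformation.module_name}"
        )
    return transformation.syntax.format(field=field_name)


# A simple transformation stores its syntax directly.
trim_upper = Transformation(
    name="TRIM_UPPER", language="SQL", syntax="UPPER(TRIM({field}))"
)
# A complex transformation stores only the name of the module holding the code.
convert_currency = Transformation(
    name="CONVERT_CURRENCY", language="C", module_name="currency_convert"
)

print(render_sql(trim_upper, "cust_name"))
```

Because the syntax lives in metadata, a change to the stored expression changes every generated statement that uses it, which is the regeneration benefit described earlier.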
Data management:
Metadata is required to describe the data as it resides in the data warehouse. This is needed by the warehouse manager to allow it to track and control all data movements. Every object in the database needs to be described.
Metadata is needed for all the following:
• Tables
  - Columns
    - name
    - type
• Indexes
  - Columns
    - name
    - type
• Views
  - Columns
    - name
    - type
• Constraints
  - name
  - type
  - table
  - columns
Aggregation and partition information also needs to be stored in the metadata (for details, refer to page 30).
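Most databases already expose this kind of object metadata through a catalog. The sketch below uses SQLite from the Python standard library purely as a stand-in for a warehouse database; the `account` table, the index, the view, and the `describe_objects` helper are hypothetical. It covers tables, indexes and views only, not constraints, aggregations or partitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        balance    REAL NOT NULL
    );
    CREATE INDEX idx_account_balance ON account (balance);
    CREATE VIEW rich_accounts AS
        SELECT account_id, balance FROM account WHERE balance > 1000;
""")


def describe_objects(conn):
    """Return {object name: {"type": ..., "columns": [(name, type), ...]}}."""
    catalog = {}
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for obj_type, obj_name in rows:
        if obj_type == "index":
            # index_info rows are (seqno, cid, name); no column type is reported
            info = conn.execute(f"PRAGMA index_info('{obj_name}')").fetchall()
            columns = [(row[2], None) for row in info]
        else:
            # table_info rows are (cid, name, type, notnull, dflt_value, pk)
            info = conn.execute(f"PRAGMA table_info('{obj_name}')").fetchall()
            columns = [(row[1], row[2]) for row in info]
        catalog[obj_name] = {"type": obj_type, "columns": columns}
    return catalog


catalog = describe_objects(conn)
for name, entry in catalog.items():
    print(name, entry["type"], entry["columns"])
```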
Query Generation:
Metadata is also required by the query manager to enable it to generate queries. The same metadata that the warehouse manager uses to describe the data in the data warehouse is also required by the query manager.
The query manager will also generate metadata about the queries it has run. This metadata can be used to build a history of all queries run and to generate a query profile for each user, each group of users, and the data warehouse as a whole.
The metadata that is required for each query is:
- query
  - tables accessed
  - columns accessed
    - name
    - reference identifier
  - restrictions applied
    - column name
    - table name
    - reference identifier
    - restriction
  - join criteria applied
    ……
    ……
  - aggregate functions used
    ……
    ……
  - group by criteria ……
    ……
  - sort criteria ……
    ……
  - syntax
  - execution plan
  - resources ……
    ……
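The query history and per-user profile mentioned above can be sketched as follows. The record keeps only a few of the listed attributes. The `user` attribute is an assumption added so that queries can be grouped per user, and the class name, helper name, and sample queries are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryRecord:
    user: str                      # assumption: who ran the query
    syntax: str                    # text of the query
    tables_accessed: List[str]
    columns_accessed: List[str] = field(default_factory=list)
    aggregate_functions: List[str] = field(default_factory=list)


def build_profiles(history: List[QueryRecord]) -> Dict[str, Counter]:
    """Count, per user, how often each table was accessed."""
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for record in history:
        profiles[record.user].update(record.tables_accessed)
    return profiles


history = [
    QueryRecord("alice", "SELECT SUM(amount) FROM sales", ["sales"],
                ["amount"], ["SUM"]),
    QueryRecord("alice", "SELECT * FROM sales JOIN store USING (store_id)",
                ["sales", "store"]),
    QueryRecord("bob", "SELECT name FROM store", ["store"], ["name"]),
]

profiles = build_profiles(history)
for user, table_counts in profiles.items():
    print(user, dict(table_counts))
```

Summing the per-user counters gives the profile for a group of users or for the data warehouse as a whole.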