Thursday, July 30, 2009

Physical Database Design-Purpose of the physical database design

- Process of producing a description of the implementation of the database on secondary storage; it describes the base relations, file organizations, and indexes used to achieve efficient access to the data, and any associated integrity constraints and security measures.
- Does not include implementing files and databases ( i.e. creating them and loading data into them)
- It produce technical specifications use during the implementation phase
- The database can be single centralized or multiple distributed sites (here single centralized only)
- Physical database design is critical. Decisions made during this stage have a major impact on data accessibility, response times, securing and user friendliness
- Database administration plays a major role in physical database design.
- In most situations, many physical database design decisions are implicit or eliminated when you choose the database management technologies.
- Since many organizations have standards for OS, DBMS, and data access languages, you must deal only with those choices.
- The primary goal of physical database design is data processing efficiency.
- With ever-decreasing costs for computer technology (both speed and space), it is important for you to design the physical database to minimize the time required by users to interact with the IS
- Designing physical files and databases requires certain information:
- Normalized relations, including volume estimates
- Definitions of each attribute
- Descriptions of where and when data are used: entered, retrieved, deleted, and updated (including frequencies)
- expectations or requirements for response time and data security, backup, recovery, retention, and integrity
- descriptions of the technologies (DBMS) used for implementing the database Physical database design requires several critical decisions that will affect the integrity and performance. These key decisions include:
- Choosing the storage format (called data type) for each attribute from the logical data model. The format is chosen to minimize storage space and to minimize data integrity
- Grouping attributes from the logical data model into physical records
- Arranging similarly structured records in secondary memory (hard disk or any other means) so that individual or group of records (file organizations) can be stored, retrieved, and updated rapidly.
Consideration must be given to protecting and recovering data after errors are found
- Selecting structures (called indexes and database architectures) for storing and connecting files to make retrieving related data more efficient
- Preparing strategies for handling queries against the database that will optimize performance and take advantage of the file organization and indexes

Sunday, July 19, 2009

Relational Algebra Introduction

Relational Query Languages

Query languages: Allow manipulation and retrieval of data from a database.
- Relational model supports simple, powerful QLs:
• Strong formal foundation based on logic.
• Allows for much optimization.
- Query Languages != programming languages!
• QLs not intended to be used for complex calculations.
• QLs support easy, efficient access to large data sets.

Formal Relational Query Languages

Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation:

Relational Algebra: More operational, very useful for
representing execution plans.
Relational Calculus: Lets users describe what they
want, rather than how to compute it. (Non-procedural,declarative.)

Note: Understanding Algebra & Calculus is key to
understanding SQL, query processing!

Preliminaries

A query is applied to relation instances, and the
result of a query is also a relation instance.
• Schemas of input relations for a query are fixed (but
query will run over any legal instance)
• The schema for the result of a given query is also
fixed. It is determined by the definitions of the query
language constructs.

Relational Algebra: 5 Basic Operations

- Selection ( ): Selects a subset of rows from relation
(horizontal).


- Projection ( ): Retains only wanted columns from relation
(vertical).
-
Cross-product ( ): Allows us to combine two relations.


- Set-difference ( — ): Tuples in r1, but not in r2.

- Union ( ): Tuples in r1 or in r2.