Thursday, July 30, 2009

Physical Database Design-Purpose of the physical database design

- Process of producing a description of the implementation of the database on secondary storage; it describes the base relations, file organizations, and indexes used to achieve efficient access to the data, and any associated integrity constraints and security measures.
- Does not include implementing files and databases ( i.e. creating them and loading data into them)
- It produce technical specifications use during the implementation phase
- The database can be single centralized or multiple distributed sites (here single centralized only)
- Physical database design is critical. Decisions made during this stage have a major impact on data accessibility, response times, securing and user friendliness
- Database administration plays a major role in physical database design.
- In most situations, many physical database design decisions are implicit or eliminated when you choose the database management technologies.
- Since many organizations have standards for OS, DBMS, and data access languages, you must deal only with those choices.
- The primary goal of physical database design is data processing efficiency.
- With ever-decreasing costs for computer technology (both speed and space), it is important for you to design the physical database to minimize the time required by users to interact with the IS
- Designing physical files and databases requires certain information:
- Normalized relations, including volume estimates
- Definitions of each attribute
- Descriptions of where and when data are used: entered, retrieved, deleted, and updated (including frequencies)
- expectations or requirements for response time and data security, backup, recovery, retention, and integrity
- descriptions of the technologies (DBMS) used for implementing the database Physical database design requires several critical decisions that will affect the integrity and performance. These key decisions include:
- Choosing the storage format (called data type) for each attribute from the logical data model. The format is chosen to minimize storage space and to minimize data integrity
- Grouping attributes from the logical data model into physical records
- Arranging similarly structured records in secondary memory (hard disk or any other means) so that individual or group of records (file organizations) can be stored, retrieved, and updated rapidly.
Consideration must be given to protecting and recovering data after errors are found
- Selecting structures (called indexes and database architectures) for storing and connecting files to make retrieving related data more efficient
- Preparing strategies for handling queries against the database that will optimize performance and take advantage of the file organization and indexes

Sunday, July 19, 2009

Relational Algebra Introduction

Relational Query Languages

Query languages: Allow manipulation and retrieval of data from a database.
- Relational model supports simple, powerful QLs:
• Strong formal foundation based on logic.
• Allows for much optimization.
- Query Languages != programming languages!
• QLs not intended to be used for complex calculations.
• QLs support easy, efficient access to large data sets.

Formal Relational Query Languages

Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation:

Relational Algebra: More operational, very useful for
representing execution plans.
Relational Calculus: Lets users describe what they
want, rather than how to compute it. (Non-procedural,declarative.)

Note: Understanding Algebra & Calculus is key to
understanding SQL, query processing!

Preliminaries

A query is applied to relation instances, and the
result of a query is also a relation instance.
• Schemas of input relations for a query are fixed (but
query will run over any legal instance)
• The schema for the result of a given query is also
fixed. It is determined by the definitions of the query
language constructs.

Relational Algebra: 5 Basic Operations

- Selection ( ): Selects a subset of rows from relation
(horizontal).


- Projection ( ): Retains only wanted columns from relation
(vertical).
-
Cross-product ( ): Allows us to combine two relations.


- Set-difference ( — ): Tuples in r1, but not in r2.

- Union ( ): Tuples in r1 or in r2.

Wednesday, July 15, 2009

The design process in DBMS Further

Storing information about

- In this example the file is storing information about:
• the staff members who attending courses
• the departments in which those staff members work
• the courses that are being offered
• the actual attendance of an employee on a course

- the above problems could be avoided if we can keep the
information about each of these things separately, and then
make cross-references where required (relational database approach)


Relational database approach

- In a relational database, the data values held are
stored in tables (an arrangement of rows and columns, just like a spread sheet)

- one of the main advantage is its ability to set up and automatically maintain cross-references
(relationships) between the tables.
- the structure that hold the data is called a ‘relation’. Common term is table. A row is called a
‘tuple’ and a column is called an ‘attribute’.

Dr.E.F.Codd invented these new words.

A systematic approach to DB design

- implementing large information system can be extremely complicated
- break down the work into series of steps, and each step into manageable tasks.
- the basic steps to be followed during the design and construction are as follows.
• Decide what is the purpose of the DB
• Determine the information requirements of the DB
• Produce a logical model of the information requirements of DB
• Convert the logical data model to a physical data model appropriate to the technology being used
• Implement the physical design using the chosen method or product

Friday, July 10, 2009

Learn DBMS-Purpose of Database System

What is usefulness of the DBMS?

- In the early days, database applications were built on top of file systems

- Drawbacks of using file systems to store data:

Data redundancy and inconsistency
• Multiple file formats, duplication of information in different files

Difficulty in accessing data
• Need to write a new program to carry out each new task
Data isolation — multiple files and formats
Integrity problems
• Integrity constraints (e.g. account balance > 0) become part of program code
• Hard to add new constraints or change existing ones

Atomicity of updates
• Failures may leave database in an inconsistent state with partial updates carried out
- E.g. transfer of funds from one account to another should either complete or not happen at all

Concurrent access by multiple users
• Concurrent access needed for performance
• Uncontrolled concurrent accesses can lead to inconsistencies
- E.g. two people reading a balance and updating it at the same time

Security problems
- Database systems offer solutions to all the above problems

Monday, July 6, 2009

Logical Database Design and the Relational Model in DBMS

What is Relational Model - Basic Definitions
- relational model represent the data in the form of tables
- based on mathematical theory
- consists of three components
Data structure: Data are organized in the form of tables with rows and columns
Data manipulation: powerful operations (using SQL) are used to manipulate data stored in the relations Data integrity: facilities are included to specify business rules

What is Relational Data Structure
- a relation is a named, two-dimensional table of data
- can express the structure of a relation by a shorthand notation in which name of the relation is followed (in parentheses) by the name of the attributes in that relation
- e.g. EMPLOYEE(Emp_ID,Name,Dept_Name,Salary)
- Relational Keys: every relation must have a primary key
- e.g. EMPLOYEE(Emp_ID,Name,Dept_Name,Salary)
- a composite key is a primary key that consists of more than one attributes
-E.g the primary key for a relation DEPENDENT would likely consists of the combination of Emp_Id and Dependent_Name
- A foreign key : is an attributes (possibly composite) in a relation of a database that serve as the primary key of another relation in the same database.
-E.g EMPLOYEE(Emp_ID,Name,Dept_Name,Salary), DEPARTMENT(Dept_Name,Location,Fax)

learn Properties of Relations
- not all the tables are relations
- relations have several properties
- unique name
- an entry at the intersection of each row and column is atomic (or single valued)
- each row is unique
- each attribute within a table has a unique name
- the columns of a relation can be interchanged without changing the meaning or use of the relation
- the sequence of rows (top to bottom) is insignificant (can be interchanged)

Example Database
- A relational database consists of any number of relations
- The structure of the database is described through the use of conceptual Schema
- Two common methods for expressing a (conceptual) schema
- Short text statements: each relation is named and the names of its attributes follow in parentheses
- A graphical representation: each relation is represented by a rectangle containing the attributes for the relation
Text statements have the advantage of simplicity, while the graphical representation provides a better means of expressing referential integrity constraints.




Text description for the relations

CUSTOMER(Customer_ID,Customer_Name,Address,City,State, Zip)
ORDER(Order_ID,Order_Date,Customer_ID)
ORDERLINE(Order_ID,Product_ID,Quantity)
PRODUCT(Product_ID,Product_Description,Product_Finish,Unit_Price,On_H and)
- The primary key for the ORDERLINE is a composite key.
- Customer_ID is a foreign key in the ORDER relation; this allows the user to associate an order with the customer who submitted the order.
- ORDERLINE has two foreign keys; Order_ID and Product_ID; these keys allow the user to associate each line on an order with the relevant order and product

Monday, June 29, 2009

Learn The design process In DBMS

Where do databases come from?

- There may be an manual system that has reached the limits of its usefulness
- processing requirements of the organization may have proved to be beyond the capabilities of flat file system
Why is database design important?

- Look at the following Excel spreadsheet currently used Look at the following Excel spreadsheet currently used by ABC Ltd to store its course information.
- need to keep record of staff members who attend each course
- uses a list on Excel worksheet, simple arrangement (rows and columns)
- single table
- it may be better to break it up into separate lists, but lack the ability to hold cross-reference in Excel.
- Columns headed Atendee1 name and Atendee1

Problems with this approach
Redundant data
- each time that staff member attends a course it is necessary to record their department.
- This redundancy could be avoided if information about the staff member could be held in a separate list that could be crossreferenced.
- each row contains a code for the course as well as its title. This redundancy could be avoided if information about the course could be held in a separate list that could be cross-referenced.
- redundant data should not be confused with duplicate values.

Inflexibility Inflexibility
- it is necessary to impose arbitrary limits on its uses
- waste of storage space
- the ways that can analyse the data in this example are limited.
Limited sharing of data
- file is manipulated as a single unit.
- it is not possible for more than one user to update the data in
the same file at the same time.
No inherent control for data quality
- In flat file systems, it is necessary to write logic into an application to ensure the quality of the data
- such application logic may have to be rewritten every time the design of the file is changed
Changing the data can lead to unexpected results
- update anomaly , ‘every occurrences should be updated
- insertion anomaly, ‘difficult to find missing data ’
- deletion anomaly, ‘loss all the information when it deleted'

Thursday, June 25, 2009

Learn to Create Data Model in DBMS E-R model


The Entity Relationship Model
in DBMS
- The Entity-Relationship model is a set of concepts and graphical symbols that can be used to create conceptual schemas
Versions/ Evolution of the E-R Model
- Original E-R model by Peter Chen (1976)
- Extended E-R model (1986): added subtypes, now the most widely used E-R model
- Information Engineering/IE model (also called the Crow’s Foot model) developed by James Martin in 1990
- IDEF1X (1994) : national standard by the National Institute of Standards and Technology
- Additionally:
Unified Modeling Language (UML) Style E-R Models
Semantic Object Data Models (SOMs)
The Four Major Components of the E-R Model
• Entities (really entity sets)
• Attributes
• Relationships
• Identifiers
The Geometric Symbols in an E-R Model
- Entities are represented by rectangles.- Attributes are represented by ovals, that are connected to the entity by a straight line.
- Relationships are represented by diamond shaped

Entity Sets

- A database can be modeled as:
• a collection of entities,
• relationship among entities.
- An entity is an object that exists and is distinguishable from other objects.
• Eg. specific person, company, event, plant
- Entities have attributes
• Eg. people have names and addresses
- An entity set is a set of entities of the same type that share the same properties.
• Eg. set of all persons, companies, trees, holidays
Note: There are usually many instances of an entity

Entity Sets customer and loan













Attributes

- An entity is represented by a set of attributes, that is descriptive properties possessed by all members of an entity set.
- Description of the entity’s characteristics







- Domain – the set of permitted values for each attribute
- Attribute types:
• Simple and composite attributes.
• Single-valued and multi-valued attributes
E.g. multivalued attribute: phone-numbers
• Derived attributes Can be computed from other attributes
E.g. age, given date of birth

Composite Attributes

Thursday, June 18, 2009

Who are the Database Users?


Database Users

- Users are differentiated by the way they expect to interact with the system
- Application programmers – interact with system through DML calls
- Sophisticated users – form requests in a database query language
- Specialized users – write specialized database applications that do not fit into the traditional data processing framework
- Naïve users – invoke one of the permanent application programs that have been written previously
E.g. people accessing database over the web, bank tellers, clerical staff

Database Administrator
- Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs.
- Database administrator's duties include:
• Schema definition
• Storage structure and access method definition
• Schema and physical organization modification
• Granting user authority to access the database
• Specifying integrity constraints
• Acting as liaison with users
• Monitoring performance and responding to changes in requirements

Transaction Management
- A transaction is a collection of operations that performs a single logical function in a database application
- Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and
transaction failures.
- Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.

Storage Management
- Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
- The storage manager is responsible to the following tasks:
• interaction with the file manager
• efficient storing, retrieving and updating of data

Overall System Structure



Application Architectures
Two-tier architecture: E.g. client programs using ODBC/JDBC to communicate with a database
Three-tier architecture: E.g. web-based applications, and applications built using “middleware”


Monday, June 15, 2009

Language In use DBMS


Data Definition Language (DDL)

- Specification notation for defining the database schema
• E.g. create table account (account-number char(10),
balance integer)
- DDL compiler generates a set of tables stored in a data dictionary
- Data dictionary contains metadata (i.e., data about data) database schema
Data storage and definition language
• language in which the storage structure and access methods
used by the database system are specified
• Usually an extension of the data definition language

Data Manipulation Language (DML)
- Language for accessing and manipulating the data organized by the appropriate data model
• DML also known as query language
- Two classes of languages
• Procedural – user specifies what data is required and how to get those data
• Nonprocedural – user specifies what data is required without specifying how to get those data
- SQL is the most widely used query language

SQL
- Developed at IBM research labs at San Lose, Ca. in 1970’s
• D.Chamberlin was the leader in the research group
- Dominant data base language on mainframe computers, but available on mini and PC based DBMS
- ANSI (American National Standards Institute approved
in 1992 (SQL 92); SQL 93 exists but not popular

Use Of Data Base


Uses of database

- banking systems that keep information about customers and their accounts
- point of sales systems in shops that process the purchase of goods
- inventory systems
- reservation systems for booking airline flights and other travels
- payroll systems that process salaries and benefits
- accounting systems that produce invoices, statements and other company accounts
- manufacturing systems: many products are now produced by machines that are controlled by computers
- Universities: registration, grades
- Databases touch all aspects of our lives

Database Management System (DBMS)


- DBMS contains information about a particular enterprise
- DBMS provides an environment that is both convenient and efficient to use.
- computer software with the capability to store data in an integrated structured format and to enable users to retrieve, manipulate and manage the data

Introduction to Data Base

Keeping Records

- Mankind has been keeping records since the dawn of civilization (purely pictorial)
- then into the alphabetic systems
- quantitative (num
erical and financial) information was kept in lists.
- later these lists took on a more structured form (rows and columns) called ledger
- these records were maintained by hand, usually on paper, dozens of clerks
- computer has changed all this
- almost every organization now keeps its records in
electronic form and process them using a computer system
- army of clerks is no longer required
- introduction of the computer has led to major
changes in employment pattern
- employment pattern shift away from primary industries (agriculture and manufacturing) to service
- this trend has been accelerated by the increasing use of IT (computers and electronic communications)
- at the heart of these computerized information processing systems you will find a DATABASE

Data and Information


What is Data?
- individual facts about something or somebody that have not been organized (raw data)
- a random collection of names and telephone numbers is just data
What is Information?
- data that has been organized in such a way as to be u
seful to somebody
- the telephone directory contains information because it has been organized
- information to one person in a particular circumstance may be simply considered as data to another

Information processing
Key aspects key aspects

- capturing d
ata (forms for capturing data (forms for data capture )
- storing data (tables for data storage) storing data (tables for data storage)
- processing data (queries, macro and programs) processing data (queries, macro and programs)
- communicating information (Reports for communicating communicating information (Reports for commun
icating information)
What is Database?
- combination of software and data that are used to perform combination of software and data that are used to perform

above four key aspect of information processing above four key aspect of information processing
- computers are not the only means of storing data an
d computers are not the only means of storing data and producing information producing information
- An integrated collection of data organized to meet the An integrated collection of data organized to meet the needs of one or more users needs of one or more users

Levels of Abstraction
- Physical level describes how a record (e.g.,
customer) is stored.
- Logical level: describes data stored in database, and the relationships among the data.
Eg. type customer = record
name : string;
street : string;
age : integer;
end;

- View level: application programs hide details of data types. Views can also hide information (e.g. account balance) for security purposes.

Instances and Schemas
- Similar to types and variables in programming languages
- Schema – the logical structure of the database (a complete example will be given in SQL lecture)
• e.g., the part of the bank database consists of information about a set of customers and accounts and the relationship between them)
• Analogous to type information of a variable in a program
• Physical schema: database design at the physical level
• Logical schema: database design at the logical level
- Instance – the actual content of the database at a particular point in time
• Analogous to the value of a variable
- Physical Data Independence – the ability to modify the physical schema without changing the logical schema
• Applications depend on the logical schema
• In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.

Data Models

- A collection of tools for describing

• data
• data relationships
• data semantics

• data constraints
- Entity-Relationship model (we will discuss in th
is module)
- Relational model (we will discuss in this module)
- Other models:
• object-oriented model
• semi-structured data models
• Older models: network model and hierarchical model

Entity Relati
onship model
- E-R model of real world
Entities (objects)
• E.g. street block,street,parcel
Relationships between entities
• Relationship set depositor associates customers with accounts
- Widely used for database design
• Database design in E-R model usually converted to design in the
relational model which is used for storage and proc
essing