Database systems are designed to manage large bodies of information. These large
bodies of information do not exist in isolation. They are part of the operation of
some enterprise whose end product may be information from the database or
may be some device or service for which the database plays only a supporting
role.
Database design mainly involves the design of the database schema. The design of a complete database application environment that meets the needs of the enterprise being modeled requires attention to a broader set of issues. In this text, we focus initially on the writing of database queries and the design of database schemas. Chapter 9 discusses the overall process of application design.
1.6.1 Design Process
A high-level data model provides the database designer with a conceptual frame- work in which to specify the data requirements of the database users, and how the database will be structured to fulfill these requirements. The initial phase of database design, then, is to characterize fully the data needs of the prospective database users. The database designer needs to interact extensively with domain experts and users to carry out this task. The outcome of this phase is a specification of user requirements.
Next, the designer chooses a data model, and by applying the concepts of the chosen data model, translates these requirements into a conceptual schema of the database. The schema developed at this conceptual-design phase provides a detailed overview of the enterprise. The designer reviews the schema to confirm that all data requirements are indeed satisfied and are not in conflict with one another. The designer can also examine the design to remove any redundantfeatures. The focus at this point is on describing the data and their relationships, rather than on specifying physical storage details.
name), and tot cred (total credit hours the student
The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these objects. An entity is a “thing” or “object” in the real world that is distinguishable from other objects. For example, each person is an entity, and bank accounts can be considered as entities.
Entities are described in a database by a set of attributes. For example, the
Database design mainly involves the design of the database schema. The design of a complete database application environment that meets the needs of the enterprise being modeled requires attention to a broader set of issues. In this text, we focus initially on the writing of database queries and the design of database schemas. Chapter 9 discusses the overall process of application design.
1.6.1 Design Process
A high-level data model provides the database designer with a conceptual frame- work in which to specify the data requirements of the database users, and how the database will be structured to fulfill these requirements. The initial phase of database design, then, is to characterize fully the data needs of the prospective database users. The database designer needs to interact extensively with domain experts and users to carry out this task. The outcome of this phase is a specification of user requirements.
Next, the designer chooses a data model, and by applying the concepts of the chosen data model, translates these requirements into a conceptual schema of the database. The schema developed at this conceptual-design phase provides a detailed overview of the enterprise. The designer reviews the schema to confirm that all data requirements are indeed satisfied and are not in conflict with one another. The designer can also examine the design to remove any redundantfeatures. The focus at this point is on describing the data and their relationships, rather than on specifying physical storage details.
In terms of the relational model, the conceptual-design process involves de-
cisions on what attributes we want to capture in the database and how to group
these attributes to form the various tables. The “what” part is basically a business
decision, and we shall not discuss it further in this text. The “how” part is mainly a
computer-science problem. There are principally two ways to tackle the problem.
The first one is to use the entity-relationship model (Section 1.6.3); the other is
to employ a set of algorithms (collectively known as normalization) that takes as
input the set of all attributes and generates a set of tables (Section 1.6.4).
A fully developed conceptual schema indicates the functional requirements of the enterprise. In a specification of functional requirements, users describe the kinds of operations (or transactions) that will be performed on the data. Example operations include modifying or updating data, searching for and retrieving specific data, and deleting data. At this stage of conceptual design, the designer can review the schema to ensure it meets functional requirements.
The process of moving from an abstract data model to the implementation of the database proceeds in two final design phases. In the logical-design phase, the designer maps the high-level conceptual schema onto the implementation data model of the database system that will be used. The designer uses the resulting system-specific database schema in the subsequent physical-design phase, in which the physical features of the database are specified. These features include the form of file organization and the internal storage structures; they are discussed in Chapter 10.
1.6.2 Database Design for a University Organization
To illustrate the design process, let us examine how a database for a university could be designed. The initial specification of user requirements may be based on interviews with the database users, and on the designer’s own analysis of the organization. The description that arises from this design phase serves as the basis for specifying the conceptual structure of the database. Here are the major characteristics of the university.
• The university is organized into departments. Each department is identified
A fully developed conceptual schema indicates the functional requirements of the enterprise. In a specification of functional requirements, users describe the kinds of operations (or transactions) that will be performed on the data. Example operations include modifying or updating data, searching for and retrieving specific data, and deleting data. At this stage of conceptual design, the designer can review the schema to ensure it meets functional requirements.
The process of moving from an abstract data model to the implementation of the database proceeds in two final design phases. In the logical-design phase, the designer maps the high-level conceptual schema onto the implementation data model of the database system that will be used. The designer uses the resulting system-specific database schema in the subsequent physical-design phase, in which the physical features of the database are specified. These features include the form of file organization and the internal storage structures; they are discussed in Chapter 10.
1.6.2 Database Design for a University Organization
To illustrate the design process, let us examine how a database for a university could be designed. The initial specification of user requirements may be based on interviews with the database users, and on the designer’s own analysis of the organization. The description that arises from this design phase serves as the basis for specifying the conceptual structure of the database. Here are the major characteristics of the university.
• The university is organized into departments. Each department is identified
by a unique name (dept
budget.
name), is located in a particular building, and has a
• Eachdepartmenthasalistofcoursesitoffers.Eachcoursehasassociatedwith
it a course
prerequisites.
id, title, dept
• Instructors are identified by their unique ID. Each instructor has name, asso-
• Instructors are identified by their unique ID. Each instructor has name, asso-
name, and credits, and may also have have associated
ciated department (dept
• StudentsareidentifiedbytheiruniqueID.Eachstudenthasaname,anassoci-
• StudentsareidentifiedbytheiruniqueID.Eachstudenthasaname,anassoci-
name), and salary.
ated major department (dept
earned thus far).
The university maintains a list of classrooms, specifying the name of the
building, room
• Theuniversitymaintainsalistofallclasses(sections)taught.Eachsectionis
• Theuniversitymaintainsalistofallclasses(sections)taught.Eachsectionis
number, and room capacity.
identified by a course
a semester, year, building, room class meets).
a semester, year, building, room class meets).
id, sec
number, and time
number, and time
id, year, and semester, and has associated with it
slot id (the time slot when the
-
The department has a list of teaching assignments specifying, for each in-
structor, the sections the instructor is teaching.
-
The university has a list of all student course registrations, specifying, for
each student, the courses and the associated sections that the student has
taken (registered for).
A real university database would be much more complex than the preceding design. However we use this simplified model to help you understand conceptual ideas without getting lost in details of a complex design.
The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these objects. An entity is a “thing” or “object” in the real world that is distinguishable from other objects. For example, each person is an entity, and bank accounts can be considered as entities.
Entities are described in a database by a set of attributes. For example, the
attributes dept
in a university, and they form attributes of the department entity set. Similarly, attributes ID, name, and salary may describe an instructor entity.2
in a university, and they form attributes of the department entity set. Similarly, attributes ID, name, and salary may describe an instructor entity.2
name, building, and budget may describe one particular department
The extra attribute ID is used to identify an instructor uniquely (since it may
be possible to have two instructors with the same name and the same salary).
A unique instructor identifier must be assigned to each instructor. In the United
States, many organizations use the social-security number of a person (a unique
number the U.S. government assigns to every person in the United States) as a
unique identifier.
A relationship is an association among several entities. For example, a member relationship associates an instructor with her department. The set of all entities of the same type and the set of all relationships of the same type are termed an entity set and relationship set, respectively.
The overall logical structure (schema) of a database can be expressed graph- ically by an entity-relationship (E-R) diagram. There are several ways in which to draw these diagrams. One of the most popular is to use the Unified Modeling Language (UML). In the notation we use, which is based on UML, an E-R diagram is represented as follows:
A relationship is an association among several entities. For example, a member relationship associates an instructor with her department. The set of all entities of the same type and the set of all relationships of the same type are termed an entity set and relationship set, respectively.
The overall logical structure (schema) of a database can be expressed graph- ically by an entity-relationship (E-R) diagram. There are several ways in which to draw these diagrams. One of the most popular is to use the Unified Modeling Language (UML). In the notation we use, which is based on UML, an E-R diagram is represented as follows:
Entity sets are represented by a rectangular box with the entity set name in
the header and the attributes listed below it.
• Relationship sets are represented by a diamond connecting a pair of related entity sets. The name of the relationship is placed inside the diamond.
As an illustration, consider part of a university database consisting of instruc- tors and the departments with which they are associated. Figure 1.3 shows the corresponding E-R diagram. The E-R diagram indicates that there are two entity sets, instructor and department, with attributes as outlined earlier. The diagram also shows a relationship member between instructor and department.
In addition to entities and relationships, the E-R model represents certain constraints to which the contents of a database must conform. One important constraint is mapping cardinalities, which express the number of entities to which another entity can be associated via a relationship set. For example, if each instructor must be associated with only a single department, the E-R model can express that constraint.
The entity-relationship model is widely used in database design, and Chapter 7 explores it in detail.
1.6.4 Normalization
Another method for designing a relational database is to use a process commonly known as normalization. The goal is to generate a set of relation schemas that allows us to store information without unnecessary redundancy, yet also allows us to retrieve information easily. The approach is to design schemas that are in an appropriate normal form. To determine whether a relation schema is in one of the desirable normal forms, we need additional information about the real-world enterprise that we are modeling with the database. The most common approach is to use functional dependencies, which we cover in Section 8.4.
To understand the need for normalization, let us look at what can go wrong in a bad database design. Among the undesirable properties that a bad design may have are:
• Repetition of information
• Inability to represent certain information
the header and the attributes listed below it.
• Relationship sets are represented by a diamond connecting a pair of related entity sets. The name of the relationship is placed inside the diamond.
As an illustration, consider part of a university database consisting of instruc- tors and the departments with which they are associated. Figure 1.3 shows the corresponding E-R diagram. The E-R diagram indicates that there are two entity sets, instructor and department, with attributes as outlined earlier. The diagram also shows a relationship member between instructor and department.
In addition to entities and relationships, the E-R model represents certain constraints to which the contents of a database must conform. One important constraint is mapping cardinalities, which express the number of entities to which another entity can be associated via a relationship set. For example, if each instructor must be associated with only a single department, the E-R model can express that constraint.
The entity-relationship model is widely used in database design, and Chapter 7 explores it in detail.
1.6.4 Normalization
Another method for designing a relational database is to use a process commonly known as normalization. The goal is to generate a set of relation schemas that allows us to store information without unnecessary redundancy, yet also allows us to retrieve information easily. The approach is to design schemas that are in an appropriate normal form. To determine whether a relation schema is in one of the desirable normal forms, we need additional information about the real-world enterprise that we are modeling with the database. The most common approach is to use functional dependencies, which we cover in Section 8.4.
To understand the need for normalization, let us look at what can go wrong in a bad database design. Among the undesirable properties that a bad design may have are:
• Repetition of information
• Inability to represent certain information
We shall discuss these problems with the help of a modified database design for
our university example.
Suppose that instead of having the two separate tables instructor and depart- ment, we have a single table, faculty, that combines the information from the two tables (as shown in Figure 1.4). Notice that there are two rows in faculty that contain repeated information about the History department, specifically, that department’s building and budget. The repetition of information in our alterna- tive design is undesirable. Repeating information wastes space. Furthermore, it complicates updating the database. Suppose that we wish to change the budget amount of the History department from $50,000 to $46,800. This change must be reflected in the two rows; contrast this with the original design, where this requires an update to only a single row. Thus, updates are more costly under the alternative design than under the original design. When we perform the update in the alternative database, we must ensure that every tuple pertaining to the His- tory department is updated, or else our database will show two different budget values for the History department.
Now, let us shift our attention to the issue of “inability to represent certain information.” Suppose we are creating a new department in the university. In the alternative design above, we cannot represent directly the information concerning
Suppose that instead of having the two separate tables instructor and depart- ment, we have a single table, faculty, that combines the information from the two tables (as shown in Figure 1.4). Notice that there are two rows in faculty that contain repeated information about the History department, specifically, that department’s building and budget. The repetition of information in our alterna- tive design is undesirable. Repeating information wastes space. Furthermore, it complicates updating the database. Suppose that we wish to change the budget amount of the History department from $50,000 to $46,800. This change must be reflected in the two rows; contrast this with the original design, where this requires an update to only a single row. Thus, updates are more costly under the alternative design than under the original design. When we perform the update in the alternative database, we must ensure that every tuple pertaining to the His- tory department is updated, or else our database will show two different budget values for the History department.
Now, let us shift our attention to the issue of “inability to represent certain information.” Suppose we are creating a new department in the university. In the alternative design above, we cannot represent directly the information concerning
a department (dept
instructor at the university. This is because rows in the faculty table require values for ID, name, and salary. This means that we cannot record information about the newly created department until the first instructor is hired for the new department.
One solution to this problem is to introduce null values. The null value indicates that the value does not exist (or is not known). An unknown value may be either missing (the value does exist, but we do not have that information) or not known (we do not know whether or not the value actually exists). As we shall see later, null values are difficult to handle, and it is preferable not to resort to them. If we are not willing to deal with null values, then we can create a particular item of department information only when the department has at least one instructor associated with the department. Furthermore, we would have to delete this information when the last instructor in the department departs. Clearly, this situation is undesirable, since, under our original database design, the department information would be available regardless of whether or not there is an instructor associated with the department, and without resorting to null values.
instructor at the university. This is because rows in the faculty table require values for ID, name, and salary. This means that we cannot record information about the newly created department until the first instructor is hired for the new department.
One solution to this problem is to introduce null values. The null value indicates that the value does not exist (or is not known). An unknown value may be either missing (the value does exist, but we do not have that information) or not known (we do not know whether or not the value actually exists). As we shall see later, null values are difficult to handle, and it is preferable not to resort to them. If we are not willing to deal with null values, then we can create a particular item of department information only when the department has at least one instructor associated with the department. Furthermore, we would have to delete this information when the last instructor in the department departs. Clearly, this situation is undesirable, since, under our original database design, the department information would be available regardless of whether or not there is an instructor associated with the department, and without resorting to null values.
An extensive theory of normalization has been developed that helps formally
define what database designs are undesirable, and how to obtain desirable de-
signs. Chapter 8 covers relational-database design, including normalization.
Post a Comment