Creating the Database Schema





The Database Design Process



To manage databases you need a database management system (DBMS).
A DBMS is a software system that stores, retrieves, and modifies the data in a database on request.
There are four main types of databases:
  • hierarchical,
  • network,
  • relational,
  • and, more recently, object-relational.
Relational database management systems (RDBMS) very quickly became popular, especially for their ease of use and structural flexibility. They are based on the following components:

1. Collections of relations or objects in which data is stored
2. Operators that produce relations acting upon other relations
3. Data integrity for accuracy and consistency

A relational database stores information in two-dimensional tables which represent relations. We showed in our previous example how we stored information about all the employees of a company: in our relational database, we created two tables to store different pieces of information about the employees, such as a salary table and a monthly overtime table.
  1. A relational database can contain one or many tables.
  2. A table is the basic storage structure of an RDBMS.
  3. A single row represents all data required for a particular entity or object.
  4. Each row in a table should be identified by a primary key, which allows no duplicate rows.
  5. The order of rows is not significant when data is stored. The row order is defined when the data is retrieved.
  6. A column, or a combination of columns, in a table serves as the primary key.
  7. The primary key identifies a unique value for each entity or object in the table and generally cannot be changed.
  8. A column or attribute that is not a key value represents one kind of data in a table.
  9. The order of columns is not significant when data is stored. The column order is defined when the data is retrieved.
  10. A foreign key is a column that defines how tables relate to each other.
  11. In the column of the foreign key of a table we keep the values of the primary key of another table.
  12. We cannot define foreign keys without existing primary (unique) keys.
  13.  A data item or a table field can be found at the intersection of a row and a column with only one value in it.
  14. A data item or a table field may have no value in it. Thus, we can say that it has a null value.
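As an illustration of the points above, here is a minimal MySQL sketch (the table and column names are our own, chosen for illustration): each table has a primary key, the employees table refers to departments through a foreign key, and the nullable salary column shows a data item that may hold a null value.

    CREATE TABLE departments (
        deptID   INT         NOT NULL,
        deptName VARCHAR(50) NOT NULL,
        PRIMARY KEY (deptID)                 -- no duplicate rows allowed
    );

    CREATE TABLE employees (
        empID   INT           NOT NULL,
        empName VARCHAR(50)   NOT NULL,
        salary  DECIMAL(10,2),               -- no NOT NULL: this field may take a null value
        deptID  INT           NOT NULL,
        PRIMARY KEY (empID),
        FOREIGN KEY (deptID) REFERENCES departments (deptID)  -- holds values of another table's primary key
    );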

Relating Multiple Tables and Data Models



The heart of database design is the data model. Models are the cornerstone of design. System designers develop models to explore ideas and improve the understanding of the database design.
Relating Multiple Tables

Each row of a table in a relational database schema contains data that describes exactly one entity.
The column structure of the table breaks the logical data unit of the entity or object into separate pieces of information.
Thus, data about different entities is stored in different tables. Very often, in order to answer a more complicated query, you have to combine two or more tables.

The solution to this is foreign keys.
By using foreign keys we can relate the data in one table to the data in another.
We remind you that a foreign key is a column or a set of columns that refers to a primary key in the same table or in another table.
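As a brief sketch of this idea, the following query combines the hypothetical employees and departments tables of the earlier sketch through their foreign key, answering a question that neither table can answer alone:

    SELECT e.empName, d.deptName
    FROM employees e
    JOIN departments d ON e.deptID = d.deptID;  -- the foreign key relates the two tables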

Using Logical Data Models


Like other models, data models can be used for a variety of purposes, from high-level conceptual models to physical data models.
From the point of view of an object-oriented developer data modeling is conceptually similar to class modeling.
With data modeling we can identify entity types just as with class modeling we identify classes.
Attributes are assigned to entity types just as we would assign attributes to classes. There are relationships between entities, similar to the relationships between classes. Additionally, inheritance, composition, and aggregation are all applicable concepts in data modeling.

Data models focus solely on data and lead us to explore data issues. Thus, from the point of view of data model developers, data modeling must get the data "right", as opposed to object-oriented modeling, where developers explore both the behavior and the data aspects.

Thus, we will choose a logical data model (LDM) that creates a database which performs well, is as efficient as possible, and is easy to maintain.
This model must be based on data that is not repetitive.
To eliminate duplication and ease the maintenance of our tables' data, we create a table of the possible values and use a key to refer to those values.
Thus, if a value changes name, the change takes place in the master table only once.
The reference to this table remains the same throughout the foreign keys of the other tables.
Let us suppose for example that we are responsible for maintaining a database of employees and the departments in which they work.
If 120 of these employees are in the same department, called "Development", this department name would appear 120 times in the table of employees.
If the person in charge of this department decides to change its name to "R&D", we must change 120 records to reflect the new name of the department.
If the database was instead designed under the model we mentioned earlier, so that department names appear in a separate table and just the department ID number is stored with each employee record, we would only have to change one record to change the name, and not 120.
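With this design, the renaming is a single statement against the master table; the employee records are untouched because they store only the department ID (again using our illustrative tables):

    UPDATE departments
    SET deptName = 'R&D'
    WHERE deptName = 'Development';  -- one record changed instead of 120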

The System Development Life Cycle



The creation of a database, as with any information system application, must always follow a development life cycle through multiple stages of development.
Thus, a top-down approach to database development transforms business information requirements into an operational database. This approach is known as the waterfall model and is depicted in the following figure.

Database Development Life Cycle




Strategy and Requirements Analysis


In this phase we study and analyze the business requirements and decide which entities will compose our database.
We interview users and managers in order to identify the information requirements according to our system strategy.
We keep information for the future system specifications and choose the development data model for our database system.


We transfer the business procedures into graphical representations, and confirm and refine the model against the system requirements and the business needs and rules.

Design

We design the database based on the model that was developed in the strategy and requirements analysis phase.
Logical database design is also referred to as the logical model.
Thus, relational database design is the process of arranging data into logical, organized groups of entities or objects which can easily be maintained.
In particular, the logical design of a relational database should reduce, and where possible completely eliminate, data repetition.


This design approach follows a methodology (technique) that is called normalization.
Normalization is the methodology that is used when designing and redesigning a relational database. Additionally, naming conventions should also be standard and logical.


Implementation

In this phase we write SQL code in order to create the tables of the database. The implementation stage includes the creation of the prototype system and all necessary work to build the documentation of the database application.
This documentation includes user information, help text, and operations manuals to support the operation and the use of the system.

Integration and Distribution


The phase of integration and distribution of the database application refines the prototype.
We transfer the database application into production with user acceptance testing, conversion of existing data, and parallel operations.
We verify the results and make any modifications required in order to integrate the final operation and performance of the system. We distribute the database application to the users.
Also, we periodically test the system and enhance its performance.


The Logical Data Model approach



The Data Model gives us the means to describe the information held by the system and its inter-relationships. In the following figure we show the components of the Logical Data Model (LDM).


Each of the components above may be documented:
  • Entities as Entity Descriptions
  • Relationships as Relationship Descriptions
  • Attributes as Attribute Descriptions
During the phases of strategy, requirements analysis, and design we build our Logical Data Model Product Breakdown Structure (PBS):

- Identify the entities of the database
- Identify the entities' relationships
- Build the data model

During the phase of implementation we carry out:
- the creation of the entities with their attributes (tables creation)
- the normalization process for each table of the data model
- the work that identifies the primary and foreign keys
- the documentation of the system entities, relationships, and attributes.

During the phase of integration and distribution we carry out:
- the validation of the data model with business rules and with users
- the work for formal Quality Assurance
- all necessary modifications required to integrate the final system operation
- the work of database distribution to users
- the work to increase the database performance


Assigning relationships in a Logical Data Model



In order to assign the relationships in a logical data model we present the following example of customer orders using the notation of Barker.
This logical model shows both how the customers place orders and how the orders are placed by customers.



The verbs or short expressions between entities (tables) identify the relationships and usually show their direction.

Thus, as you can see in the figure of the logical model above, customers place one or more orders and any given order is placed by one customer only.


It additionally shows that a customer can live at one or more addresses and any given address can have zero or more customers who live there.
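A possible translation of this model into SQL tables is sketched below (the table and column names are ours, not part of the Barker diagram): each order carries a foreign key to the one customer who placed it, while the customer-address relationship needs an intermediate table because it is many-to-many.

    CREATE TABLE customers (
        customerID INT         NOT NULL,
        custName   VARCHAR(50) NOT NULL,
        PRIMARY KEY (customerID)
    );

    CREATE TABLE orders (
        orderID    INT NOT NULL,
        customerID INT NOT NULL,             -- each order is placed by exactly one customer
        PRIMARY KEY (orderID),
        FOREIGN KEY (customerID) REFERENCES customers (customerID)
    );

    CREATE TABLE addresses (
        addressID INT          NOT NULL,
        street    VARCHAR(100) NOT NULL,
        PRIMARY KEY (addressID)
    );

    CREATE TABLE customer_addresses (        -- a customer lives at one or more addresses;
        customerID INT NOT NULL,             -- an address can have zero or more customers
        addressID  INT NOT NULL,
        PRIMARY KEY (customerID, addressID),
        FOREIGN KEY (customerID) REFERENCES customers (customerID),
        FOREIGN KEY (addressID)  REFERENCES addresses (addressID)
    );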

In the next section we will talk about data normalization, a methodology which leads to the reduction, and even the elimination, of data redundancy.
This procedure is very important for database developers, who must avoid maintaining the same information in several places in a relational database.


Normalization of a Relational Database Schema



The normalization process was introduced by Codd in 1970, and its basic function is to apply a set of tests to a database schema in order to find whether the database belongs to a normal form or not.
The normalization process

Codd presented three normal forms (1NF, 2NF and 3NF) which are connected with functional dependencies among the fields of the database tables.


Later, two other normal forms were added to the normalization methodology (4NF and 5NF), which are based on multi-valued dependencies and join dependencies.
In this book we discuss the three normal forms of Codd and focus on them in detail.
Under the normalization procedure a relational database schema is transformed at each stage into a new form by analyzing the tables' structure based on their primary keys and the functional dependencies that exist among their fields.
These structures must satisfy some conditions.
If they do not, the tables of the database are divided into new, smaller tables so that the conditions are met for each one of them. A normal form characterizes a relation (table) in connection with the types of dependencies that exist among its fields.

Because these dependencies in most cases cause problems for the database behavior - such as repeating groups of fields, positional dependency of data, and non-atomic data - they must be removed from the tables.
Thus, normalization step by step undertakes to remove all these conflicting dependencies from the tables' structure.
The following figure shows the step by step process of a database tables normalization.




In the sections that follow we will attempt to simplify the process of normalization as much as possible.
 In your journey in SQL and MySQL database management it is very important to understand the basic concepts.
However, normalization is absolutely one of the most important parts of database design and you must know it if you want to create professional relational database schemas.


The Zero Normal Form - 0NF of a Relational Database

A database that contains data that is common to one or more different tables needs normalization. In such a case there are many reasons why normalization is needed.
Some of them are security, disk space usage, speed of queries, efficiency of database updates, and most importantly data integrity.
As we mentioned earlier, normalization is the process of breaking tables into smaller tables whose form follows a better logical design.
To follow the normalization process, we pass our database design through the different forms in order.
Through top-down processing, a database schema starts from the zero normal form - 0NF, becomes a first normal form - 1NF schema, after that a second normal form - 2NF schema, and finally a third normal form - 3NF schema.
In each step of normalization, we add more specific conditions that must be satisfied by the schema of the normal form.
A zero normal form - 0NF database before normalization has not been broken down logically into smaller tables.
A spreadsheet layout for a database table is an example of a zero normal form - 0NF.
In this spreadsheet table structure there are:

  • Repeating groups of fields
  • Positional dependency of data
  • Non-Atomic data
The following table is in spreadsheet design:
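Since the original spreadsheet figure is not reproduced here, the following illustrative sketch (our own, anticipating the Departments example used below) shows what such a 0NF structure might look like:

    CREATE TABLE departments_0nf (
        deptID        INT,
        deptName      VARCHAR(50),
        deptLocation1 VARCHAR(50),  -- repeating group of fields: the same kind of data
        deptLocation2 VARCHAR(50),  -- spread over numbered columns, with positional
        deptLocation3 VARCHAR(50)   -- dependency and inevitable null values
    );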

 

The First Normal Form - 1NF



To transform a table into the first normal form - 1NF, we must remove all repeating groups of fields.
Thus, the intersection of one row and one column of the table corresponds to a single value (atomic data).

To do this we must eliminate the repetition of information and create separate tables for related data.

Transforming the initial Departments table of our example as we described earlier, we created the new table with the 1NF structure as follows.



Another approach to 1NF is to create a new table by removing from the initial table the fields that cause problems.

The new table Dept_Locations will contain the field deptLocation together with the primary key of the table Departments.

Thus, in 1NF our data model will be transformed into two tables as follows.



Departments                     Dept_Locations
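A minimal sketch of these two tables in MySQL (the column names follow the text; the data types and the composite primary key of Dept_Locations are our assumptions):

    CREATE TABLE Departments (
        deptID   INT         NOT NULL,
        deptName VARCHAR(50) NOT NULL,
        PRIMARY KEY (deptID)
    );

    CREATE TABLE Dept_Locations (
        deptID       INT         NOT NULL,   -- the primary key of Departments
        deptLocation VARCHAR(50) NOT NULL,   -- one location per row: atomic data
        PRIMARY KEY (deptID, deptLocation),
        FOREIGN KEY (deptID) REFERENCES Departments (deptID)
    );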



As you can see, the tables above are in 1NF, since their fields hold only simple (atomic) values. But there is still repeating data in these tables.
That means that during their content management, problems will arise with the insertion, modification, and deletion operations.
To fix this situation we transform the tables from 1NF to 2NF.



The Second Normal Form - 2NF



To improve the structure of a database table so that it reaches the second normal form - 2NF, it must first be in the first normal form - 1NF.
In the second normal form procedure we remove all partial dependencies that exist among the fields of a table.

Thus, a table is in 2NF if all attributes (columns-fields) that are not part of the primary key are fully functionally dependent on the primary key.
We know that the primary key must uniquely identify a single row in a table.
The relationship between the two entities Departments and Dept_Locations is many-to-many, since one department can have more than one location and, vice versa, more than one department can exist in one location.



To remove all partial dependencies that exist among the fields in the tables above, we create an intermediate table whose contents are the primary keys of the other two.

Thus, the many-to-many relationship of the two tables is transformed into one-to-many relationships from the new table to both of them.




In our example, after the creation of the intermediate table, the contents of the three tables will be as follows.
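Since the figure with the three tables is not reproduced here, the following sketch shows one possible form of the decomposition (the names Locations and Dept_Loc_Link are ours; the text does not name the new tables):

    CREATE TABLE Locations (
        locationID   INT         NOT NULL,
        deptLocation VARCHAR(50) NOT NULL,
        PRIMARY KEY (locationID)
    );

    CREATE TABLE Dept_Loc_Link (             -- the intermediate table
        deptID     INT NOT NULL,             -- primary key of Departments
        locationID INT NOT NULL,             -- primary key of Locations
        PRIMARY KEY (deptID, locationID),
        FOREIGN KEY (deptID)     REFERENCES Departments (deptID),
        FOREIGN KEY (locationID) REFERENCES Locations (locationID)
    );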

The Third Normal Form - 3NF



In order to put a table in the third normal form - 3NF, we must first put it in the second normal form - 2NF, and after this remove all transitive dependencies which may exist among its fields.
Thus, in this stage of normalization no attributes depend on other non-key attributes.
In the second normal form the attributes of a table must depend on the whole key. In the third normal form the attributes of a table must depend on nothing but the key.

So what is a transitive dependency?

A functional dependency A → B is called transitive when there is a field C which is not part of the primary key of the table, such that:

A → C and C → B

In order to remove these dependencies we work as before, breaking the tables that contain them into tables without transitive dependencies, until the new tables achieve 3NF.

Look at the following employees_master table.



In this table we have a transitive dependency between the Zip field and the City and State fields, since a ZIP code tells us what state and what city an address is in.
Therefore, to bring this table into third normal form we need to split the City, State, and Zip fields out into a table of their own and then keep just the Zip in the employee table.

Thus, the initial employees_master table will be broken into two tables that will be associated as in the following figure.
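A minimal sketch of this split (the exact columns of employees_master are not shown in the text, so the ones below are illustrative):

    CREATE TABLE zip_codes (
        zip   VARCHAR(10) NOT NULL,
        city  VARCHAR(50) NOT NULL,   -- city and state now depend
        state VARCHAR(50) NOT NULL,   -- directly on the key zip
        PRIMARY KEY (zip)
    );

    CREATE TABLE employees_master (
        empID   INT         NOT NULL,
        empName VARCHAR(50) NOT NULL,
        zip     VARCHAR(10) NOT NULL,  -- the transitive path to city and state is removed
        PRIMARY KEY (empID),
        FOREIGN KEY (zip) REFERENCES zip_codes (zip)
    );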



The Boyce-Codd Normal Form, BCNF



The Boyce-Codd normal form, also known as BCNF, is a variation of the third normal form. For a table (relation) to be in BCNF, it must be in third normal form and two rules must be satisfied.
The first rule is that all the functional dependencies must have a super key on their left side.
This is most frequently the case without our having to take any extra steps.
If we have a dependency that breaks this rule, we must again decompose the table, as we did to get into 1NF, 2NF, and 3NF.

The Boyce-Codd normal form is used to remove abnormal situations which are caused when a table has more than one candidate primary key composed of more than one field (attribute).
The second rule that must be satisfied in the Boyce-Codd normal form is the entity integrity constraint, through which none of the fields of the primary key of the table can take a null value.

The definition of Boyce-Codd normal form is the following.

« A relation is in Boyce-Codd normal form (BCNF) if and only if every field of the table which uniquely identifies another field is a candidate primary key of the table. »

To understand this normal form better, let us examine the database schema of an educational organization.
In our data model students can be enrolled in the institution for more than one lesson.
The secretary department will keep records for students, instructors, and lessons following the data model below.
  • Each student can take more than one lesson.
  • For each lesson the student has only one instructor.
  • There are many instructors that teach each lesson.
  • Each instructor can teach only one lesson.
  • Each instructor teaches many students simultaneously.
Look at the instance of the table Lessons in which the data of this application is stored.

Lessons {student_ID, lesson, instructor}



Now we can define the dependencies among its fields.
Each student can attend many lessons, from which we take:

student_ID →→ lesson (multivalued dependency)

By knowing the student ID and one of the lessons that he or she attends, we can identify the instructor who teaches it:

{student_ID, lesson} → instructor (functional dependency)

Each lesson is taught by many instructors, so from the name of a lesson we can take each of the instructors who teach it:

lesson →→ instructor (multivalued dependency)

By knowing the name of the instructor we can identify the single lesson that he or she teaches:

instructor → lesson (functional dependency)

Each instructor can teach many students simultaneously.
Thus, if we know the name of the instructor, we can get all students who attend his or her lesson:

instructor →→ student_ID (multivalued dependency)


In our example above we observe that there are two candidate primary keys for the table Lessons.
These candidate keys are composed of the {student_ID, lesson} and {student_ID, instructor} field combinations, since these combinations identify the value of the instructor field and the value of the lesson field respectively.

If we choose the first combination {student_ID, lesson} as the primary key, we can say that the table is in the third normal form - 3NF, since there are no transitive dependencies among its fields.

But we cannot be sure that we will not face abnormal situations during the insertion, deletion, and modification of records.
These problems can be caused by a table that has two composite candidate keys which share a common field, such as the field student_ID in the table Lessons of our example.
If the student with ID=17 suddenly made a new decision and selected another lesson to attend instead of mathematics, then what would happen?

Indeed, the record with information about the instructor Green who teaches mathematics will be lost.


To bring the table Lessons into Boyce-Codd normal form (BCNF) we must split it into smaller tables.

Thus, the field instructor, which identifies some other field but does not belong to the candidate primary key of the table, will be put into another table (relation) as the primary key.

Then, the two new tables will be as in the schema that follows:
STUDENTS {student_ID, instructor} and INSTRUCTORS {instructor, lesson}.

As you can see, the two new tables follow BCNF, and the foreign key instructor of the STUDENTS table is the primary key of the INSTRUCTORS table.
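A minimal sketch of the two BCNF tables in MySQL (the data types are our assumption):

    CREATE TABLE INSTRUCTORS (
        instructor VARCHAR(50) NOT NULL,  -- each instructor teaches only one lesson,
        lesson     VARCHAR(50) NOT NULL,  -- so instructor alone is the key
        PRIMARY KEY (instructor)
    );

    CREATE TABLE STUDENTS (
        student_ID INT         NOT NULL,
        instructor VARCHAR(50) NOT NULL,
        PRIMARY KEY (student_ID, instructor),
        FOREIGN KEY (instructor) REFERENCES INSTRUCTORS (instructor)
    );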


The Fourth Normal Form - 4NF



In order to bring a table into the fourth normal form - 4NF, we first have to put it in the third normal form - 3NF, and then remove all multi-valued dependencies which may exist among its fields.
By transforming a table to Boyce-Codd normal form (BCNF) we make sure that its structure does not contain problematic functional dependencies.
But sometimes a relationship can appear between two fields, one of which has multi-valued dependencies.
The table COURSES, an instance of whose data is presented below, is in Boyce-Codd normal form, having a primary key composed of the combination of the three fields lesson, instructor, and textbook, since all fields that identify other fields belong to the primary key.
But, as you can see, its data can still cause the abnormal situations we mentioned previously. In our data instance example, a lesson can be taught by one or more instructors, who can select their favorite textbook for their teaching.
The schema of this table in 3NF or BCNF will be the following:

COURSES {lesson, instructor, textbook}


In order to remove these multi-valued dependencies we will have to break the initial table COURSES into smaller tables, which will be structured in the following schema.

TEACHER {instructor, lesson}     BOOKS {textbook, lesson}


Now, these tables are in the 4NF.
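A sketch of the decomposed schema (data types assumed):

    CREATE TABLE TEACHER (
        instructor VARCHAR(50) NOT NULL,
        lesson     VARCHAR(50) NOT NULL,
        PRIMARY KEY (instructor, lesson)   -- who teaches which lesson
    );

    CREATE TABLE BOOKS (
        textbook VARCHAR(50) NOT NULL,
        lesson   VARCHAR(50) NOT NULL,
        PRIMARY KEY (textbook, lesson)     -- which textbooks a lesson uses
    );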



Benefits of Normalization



The normalization of a database provides a schema of relations (tables) that eases both the users' requests to access the tables of the database and the administrator's (DBA's) efforts for the overall management of every entity (object) in the database.
Moreover, through normalization we achieve a reduction of data redundancy, which simplifies data structures and conserves disk space.
Thus, the possibility of having inconsistent data is minimized, because duplicate data is minimized.
It is much easier to manage small tables that have been created by the normalization process than to modify big tables that hold all the vital data in the database.
Under a normalized schema composed of small tables we obtain more flexibility in retrieving data from the database.

Additionally, we can better control the security of a database when it is normalized. Security is handled by the Database Administrator (DBA), who is responsible for giving access rights and privileges to certain users.
So, better handling of database security demands that the database be normalized.
When normalization has occurred we have data integrity, which is the assurance of consistent and accurate data within a database.
In terms of database performance, normalization brings a disadvantage: a reduction of database performance.

This occurs because we need more input/output (I/O) processing, CPU usage, and memory usage in order to handle SQL queries for a normalized database with many small tables which are associated through primary and foreign keys.

But you can address this problem by implementing software techniques or methodologies that try to reduce the Database Server response time and achieve better database performance.

For this critical matter, which concerns database-driven site optimization, we have developed our own Session Scripting Programming (SSP) methodology, a server scripting approach that optimizes the Web Server performance.








