CHAPTER 6
ORGANIZING DATA IN A TRADITIONAL FILE ENVIRONMENT
An effective information system
provides users with accurate, timely, and relevant information.
Information is accurate when it is free of errors. Information is timely when
it is available to decision makers when it is needed. Information is relevant
when it is useful and appropriate for the types of work and decisions that
require it. To understand why information often falls short of these standards,
let’s look at how information systems arrange data in computer files and
traditional methods of file management.
FILE ORGANIZATION TERMS AND CONCEPTS
A computer system organizes
data in a hierarchy that starts with bits and bytes and progresses to fields,
records, files, and databases. A bit
represents the smallest unit of data a computer can handle and is either
a 0 or a 1. A group of bits,
called a byte, represents a single character, which can be a letter, a
number, or another symbol.
A field is a grouping of characters into a word, a group of words,
or a complete number (such as a person’s name or age).
A record is a group of related fields, such as the student’s name, the
course taken, the date, and the grade. A group of records of the same type is
called a file. A group of related
files makes up a database. An entity
is a person, place, thing, or event on which we store and maintain
information. Each characteristic or quality describing a particular entity is
called an attribute.
Course, Date, and Grade
are attributes of the entity COURSE. The specific values that these attributes
can have are found in the fields of the record describing the entity COURSE.
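The hierarchy above can be sketched in a few lines of Python. This is only an illustration: the class name CourseRecord, the field names, and the sample values are assumptions made for the example, not part of the chapter.

```python
from dataclasses import dataclass, fields

# A bit is a single 0 or 1; eight bits form one byte, which encodes one character.
bits = "01000001"
byte_value = int(bits, 2)      # the eight bits read as a number
character = chr(byte_value)    # the character that byte represents

# A record groups related fields that describe one instance of an entity.
@dataclass
class CourseRecord:            # hypothetical record layout for the entity COURSE
    student_name: str          # each attribute of the entity becomes a field
    course: str
    date: str
    grade: str

# A file is a group of records of the same type.
course_file = [
    CourseRecord("John Smith", "IS 101", "2024-05-01", "B+"),
    CourseRecord("Mary Jones", "IS 101", "2024-05-01", "A"),
]

# A database is a group of related files (modelled here as a dictionary of files).
database = {"COURSE": course_file}

# The attribute names are the field names; their values live in each record.
attributes = [f.name for f in fields(CourseRecord)]
```

Each level is built from the one below it: characters make up field values, fields make up a record, records make up a file, and files make up the database.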
PROBLEMS WITH THE TRADITIONAL FILE ENVIRONMENT
In most organizations,
systems tended to grow independently without a company-wide plan. Accounting,
finance, manufacturing, human resources, and sales and marketing all developed
their own systems and data files. Each application required its own files and
its own computer program to operate.
The following are some of
the problems facing the traditional file environment.
Data Redundancy and Inconsistency
Data redundancy is the presence of
duplicate data in multiple data files so that the same data are stored in more
than one place or location. Data redundancy occurs when different groups in an
organization independently collect the same piece of data and store it independently
of each other. Data redundancy wastes storage resources and also leads to data
inconsistency, where the same attribute may have different values.
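A small Python sketch can make redundancy and inconsistency concrete. The two department files, the customer keys, and the helper name find_inconsistencies are hypothetical and chosen only for illustration.

```python
# Two departments keep their own copy of the same customer data (hypothetical files).
sales_file = {
    "C001": {"name": "Acme Ltd", "city": "Boston"},
    "C002": {"name": "Beta Co", "city": "Denver"},
}
accounting_file = {
    "C001": {"name": "Acme Ltd", "city": "Chicago"},  # address updated here only
    "C002": {"name": "Beta Co", "city": "Denver"},
}

def find_inconsistencies(file_a, file_b):
    """Return (key, attribute, value_a, value_b) for every attribute that disagrees."""
    conflicts = []
    for key in sorted(file_a.keys() & file_b.keys()):
        for attribute in sorted(file_a[key].keys() & file_b[key].keys()):
            value_a, value_b = file_a[key][attribute], file_b[key][attribute]
            if value_a != value_b:
                conflicts.append((key, attribute, value_a, value_b))
    return conflicts

# Redundancy: the same customers are stored in both files.
redundant_keys = sorted(sales_file.keys() & accounting_file.keys())

# Inconsistency: the same attribute has different values in the two copies.
conflicts = find_inconsistencies(sales_file, accounting_file)
```

Here both customers are stored twice, and customer C001 ends up with two different cities because only one copy was updated.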
Program-Data Dependence
Program-data dependence refers to the coupling of
data stored in files and the specific programs required to update and maintain
those files such that changes in programs require changes to the data. Every
traditional computer program has to describe the location and nature of the
data with which it works. In a traditional file environment, any change in a
software program could require a change in the data accessed by that program.
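The coupling can be illustrated with a program that hard-codes a file layout. The fixed-width layout, the field widths, and the function name read_grade below are assumptions made for this sketch. It shows the dependence from the data side: once the layout changes, the unchanged program silently reads the wrong characters.

```python
# Hypothetical fixed-width layout: name in columns 0-9, grade in columns 10-11.
NAME_SLICE = slice(0, 10)
GRADE_SLICE = slice(10, 12)

def read_grade(line):
    """Parse one record using column offsets hard-coded in the program."""
    return line[NAME_SLICE].strip(), line[GRADE_SLICE].strip()

# A record written in the layout the program expects.
old_line = "John Smith".ljust(10) + "B+"

# The file layout changes: the name field is widened to 15 characters.
new_line = "John Smith".ljust(15) + "B+"

old_result = read_grade(old_line)  # layout and program agree
new_result = read_grade(new_line)  # same program, new layout: the grade is lost
```

Because the description of the data lives inside the program, neither the program nor the file can change without the other changing too.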
Lack of Flexibility
A traditional file system
can deliver routine scheduled reports after extensive programming efforts, but
it cannot deliver ad hoc reports or respond to unanticipated information
requirements in a timely fashion. The information required by ad hoc requests
is somewhere in the system but may be too expensive to retrieve.
Poor Security
Because there is little
control or management of data, access to and dissemination of information may
be out of control. Management may have no way of knowing who is accessing or
even making changes to the organization’s data.
Lack of Data Sharing and Availability
Because pieces of
information in different files and different parts of the organization cannot
be related to one another, it is virtually impossible for information to be
shared or accessed in a timely manner. Information cannot flow freely across
different functional areas or different parts of the organization.
6.2 THE DATABASE APPROACH TO DATA MANAGEMENT
Database technology cuts
through many of the problems of traditional file organization. A more rigorous
definition of a database is a collection of data organized to serve many
applications efficiently by centralizing the data and controlling redundant
data. Rather than storing data in separate files for each application, data are
stored so as to appear to users as being stored in only one location. A single
database services multiple applications.
DATABASE MANAGEMENT SYSTEMS
A database management
system (DBMS) is software that permits an organization to centralize data,
manage them efficiently, and provide access to the stored data by application
programs. The DBMS acts as an interface between application programs and the
physical data files. When the application program calls for a data item, such
as gross pay, the DBMS finds this item in the database and presents it to the
application program. Using traditional data files, the programmer would have to
specify the size and format of each data element used in the program and then
tell the computer where they were located.
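As a rough sketch of the DBMS acting as an interface, the example below uses Python's built-in sqlite3 module. The employee table, its columns, and the function name get_gross_pay are hypothetical. The application asks for a data item by name and never states where or how it is physically stored.

```python
import sqlite3

# An in-memory database stands in for the organization's centralized data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, gross_pay REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'John Smith', 4200.0)")

def get_gross_pay(connection, emp_id):
    """Ask the DBMS for one data item; the DBMS locates it in the stored data."""
    row = connection.execute(
        "SELECT gross_pay FROM employee WHERE emp_id = ?", (emp_id,)
    ).fetchone()
    return None if row is None else row[0]

gross_pay = get_gross_pay(conn, 1)
```

The program names the item it wants (gross_pay) but specifies no record size, byte offset, or file location, which is the contrast with traditional data files drawn above.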
How a DBMS Solves the Problems of the Traditional File Environment
A DBMS reduces data redundancy and
inconsistency by minimizing isolated files in which the same data are repeated.
The DBMS may not enable the organization to eliminate data redundancy entirely,
but it can help control redundancy. The
DBMS enables the organization to centrally manage data, their use, and
security.
Relational DBMS
Relational databases
represent data as two-dimensional tables (called relations). Tables may be
referred to as files. Each table contains data on an entity and its attributes.
For example, Microsoft Access is a relational DBMS for desktop systems.
Operations of a Relational DBMS
Relational database
tables can be combined easily to deliver data required by users, provided that
any two tables share a common data element.
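Combining two tables through a shared data element can be sketched with sqlite3. The part and supplier tables and their contents are hypothetical. The common element is supplier_id, which appears in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE part (part_id INTEGER PRIMARY KEY, part_name TEXT, supplier_id INTEGER);
INSERT INTO supplier VALUES (1, 'Supplier A'), (2, 'Supplier B');
INSERT INTO part VALUES
    (137, 'Door latch', 1),
    (150, 'Door handle', 2),
    (152, 'Compressor', 1);
""")

# Select two parts, join each to its supplier on the shared supplier_id column,
# and return only the columns the user asked for.
rows = conn.execute("""
    SELECT part.part_id, part.part_name, supplier.supplier_name
    FROM part
    JOIN supplier ON part.supplier_id = supplier.supplier_id
    WHERE part.part_id IN (137, 150)
    ORDER BY part.part_id
""").fetchall()
```

Without the shared supplier_id column there would be no way to tell which supplier row belongs with which part row.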
Object-Oriented DBMS
DBMS designed for
organizing structured data into rows and columns are not well suited to
handling graphics-based or multimedia applications. Object-oriented databases
are better suited for this purpose.
An object-oriented
DBMS stores the data and procedures that act on those data as objects that
can be automatically retrieved and shared. Object-oriented database management systems
(OODBMS) are becoming popular because they can be used to manage the various
multimedia components or Java applets used in Web applications, which typically
integrate pieces of information from a variety of sources. Although
object-oriented databases can store more complex types of information than
relational DBMS, they are relatively slow by comparison.
Databases in the Cloud
Cloud computing providers
offer database management services, but these services typically have less
functionality than their on-premises counterparts.
CAPABILITIES OF DATABASE MANAGEMENT SYSTEMS
A DBMS includes
capabilities and tools for organizing, managing, and accessing the data in the
database. The most important are its data definition language, data dictionary,
and data manipulation language.
DBMS have a data
definition capability to specify the structure of the content of the
database. It would be used to create database tables and to define the
characteristics of the fields in each table. This information about the
database would be documented in a data dictionary. A data dictionary is
an automated or manual file that stores definitions of data elements and their
characteristics.
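A minimal sketch of data definition and a data dictionary, using sqlite3: the student table and its columns are hypothetical, and reading SQLite's catalog through PRAGMA table_info is just one way to produce a dictionary-like listing, not the only form a data dictionary can take.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create a table and define the characteristics of each field.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        age        INTEGER
    )
""")

# SQLite stores these definitions in its catalog. PRAGMA table_info returns one
# row per column: (cid, name, type, notnull, dflt_value, pk).
data_dictionary = [
    {"field": name, "type": col_type, "required": bool(notnull), "key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk
    in conn.execute("PRAGMA table_info(student)")
]
```

The resulting list describes the data elements (their names, types, and constraints) rather than holding any student data itself.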
Querying and Reporting
A DBMS includes tools for
accessing and manipulating information in databases. Most DBMS have a specialized language called
a data manipulation language that is used to add, change, delete, and
retrieve the data in the database.
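The four operations can be shown with SQL, a widely used data manipulation language, run here through Python's sqlite3 module. The course table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (student TEXT, course TEXT, grade TEXT)")

# Add: insert new records.
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [
        ("John Smith", "IS 101", "B"),
        ("Mary Jones", "IS 101", "A"),
        ("Ali Khan", "IS 101", "C"),
    ],
)

# Change: update a field in an existing record.
conn.execute("UPDATE course SET grade = 'B+' WHERE student = 'John Smith'")

# Delete: remove a record.
conn.execute("DELETE FROM course WHERE student = 'Ali Khan'")

# Retrieve: query the remaining data.
result = conn.execute(
    "SELECT student, grade FROM course ORDER BY student"
).fetchall()
```

After these statements the query returns two rows, with John Smith's grade reflecting the update and Ali Khan's record gone.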
DESIGNING DATABASES
To create a database, you
must understand the relationships among the data, the type of data that will be
maintained in the database, how the data will be used, and how the organization
will need to change to manage data from a company-wide perspective. The
database requires both a conceptual design and a physical design. The
conceptual, or logical, design of a database is an abstract model of the
database from a business perspective, whereas the physical design shows how the
database is actually arranged on direct-access storage devices.
Normalization and Entity-Relationship Diagrams
Normalization is the process of creating small,
stable, yet flexible and adaptive data structures from complex groups of data.
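A simplified sketch of the idea follows, assuming a hypothetical list of part rows in which supplier details repeat. Splitting the rows into two smaller structures stores each supplier once. This shows only one step of normalization, not a full treatment of normal forms.

```python
# Hypothetical unnormalized data: supplier details repeat on every part row.
unnormalized = [
    {"part_id": 137, "part_name": "Door latch", "supplier_id": 1, "supplier_name": "Supplier A"},
    {"part_id": 152, "part_name": "Compressor", "supplier_id": 1, "supplier_name": "Supplier A"},
    {"part_id": 150, "part_name": "Door handle", "supplier_id": 2, "supplier_name": "Supplier B"},
]

suppliers = {}  # one entry per supplier, keyed by supplier_id
parts = []      # one entry per part, pointing at its supplier by key only

for row in unnormalized:
    # Supplier facts are stored once, however many parts the supplier provides.
    suppliers[row["supplier_id"]] = {"supplier_name": row["supplier_name"]}
    # The part keeps only its own facts plus the key that links to the supplier.
    parts.append(
        {
            "part_id": row["part_id"],
            "part_name": row["part_name"],
            "supplier_id": row["supplier_id"],
        }
    )
```

A supplier's name now lives in exactly one place, so changing it cannot leave inconsistent copies behind.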
DATA WAREHOUSES
A data warehouse is
a database that stores current and historical data of potential interest to
decision makers throughout the company. The data warehouse consolidates and
standardizes information from different operational databases so that the
information can be used across the enterprise for management analysis and
decision making.
Data Marts
A data mart is a subset
of a data warehouse in which a summarized or highly focused portion of the
organization’s data is placed in a separate database for a specific population
of users.
TOOLS FOR BUSINESS INTELLIGENCE: MULTIDIMENSIONAL DATA ANALYSIS AND DATA MINING
Business intelligence
tools enable users to analyze data to see new patterns, relationships, and
insights that are useful for guiding decision making. Principal tools for business intelligence
include software for database querying and reporting, tools for multidimensional
data analysis (online analytical processing), and tools for data mining.
Online Analytical Processing (OLAP)
OLAP enables users to view
the same data in different ways using multiple dimensions. Each aspect
of information (product, pricing, cost, region, or time period)
represents a different dimension.
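Viewing the same data along different dimensions can be sketched as follows. The sales figures and the function name roll_up are hypothetical. A real OLAP tool does far more, but the core idea of totalling one set of facts by different dimensions is the same.

```python
from collections import defaultdict

# Hypothetical sales facts, each described by three dimensions and one measure.
sales = [
    {"product": "Nuts", "region": "East", "month": "Jun", "units": 50},
    {"product": "Nuts", "region": "West", "month": "Jun", "units": 30},
    {"product": "Bolts", "region": "East", "month": "Jun", "units": 20},
    {"product": "Bolts", "region": "East", "month": "Jul", "units": 40},
    {"product": "Nuts", "region": "East", "month": "Jul", "units": 10},
]

def roll_up(facts, *dimensions):
    """Total the units for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact["units"]
    return dict(totals)

# The same facts viewed three different ways.
by_product = roll_up(sales, "product")
by_region = roll_up(sales, "region")
by_product_region = roll_up(sales, "product", "region")
```

Each call answers a different question (units per product, per region, or per product within a region) without changing the underlying data.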
Data Mining
Data mining provides insights into corporate
data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and
inferring rules from them to predict future behavior.
The patterns and rules are used to guide decision making and forecast the effect of those decisions. The types of
information obtainable from data mining
include associations, sequences, classifications, clusters, and forecasts.
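Of the information types listed above, associations are the easiest to sketch. The example below counts how often items appear together in hypothetical shopping baskets and derives a simple rule strength. The basket contents and the function name confidence are assumptions, and real data mining tools use far larger data sets and more refined methods.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions.
transactions = [
    {"chips", "cola"},
    {"chips", "cola", "salsa"},
    {"chips", "salsa"},
    {"cola", "bread"},
]

item_counts = Counter()  # how many baskets contain each item
pair_counts = Counter()  # how many baskets contain each pair of items

for basket in transactions:
    item_counts.update(basket)
    # Sorting gives every pair one canonical (alphabetical) order.
    pair_counts.update(combinations(sorted(basket), 2))

def confidence(antecedent, consequent):
    """Share of baskets containing the antecedent that also contain the consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

chips_then_cola = confidence("chips", "cola")
salsa_then_chips = confidence("salsa", "chips")
```

A rule such as "when salsa is purchased, chips are purchased too" is inferred from the pattern in the data rather than requested by a query, which is what separates data mining from OLAP.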
Text Mining and Web Mining
Text mining tools are now available
to help businesses analyze unstructured text data. These tools are able to extract
key elements from large unstructured data sets, discover patterns and
relationships, and summarize the information.
Web mining is the discovery and analysis of useful patterns and information
from the World Wide Web.
MANAGING DATA RESOURCES
ESTABLISHING AN INFORMATION POLICY
An information policy specifies
the organization’s rules for sharing, disseminating, acquiring, standardizing,
classifying, and inventorying information.
Information policy lays out specific procedures and accountabilities, identifying
which users and organizational units can share information, where information
can be distributed, and who is responsible for updating and maintaining the
information.
Data administration is responsible for the
specific policies and procedures through which data can be managed as an
organizational resource. These responsibilities include developing information
policy, planning for data, overseeing logical database design and data
dictionary development, and monitoring how information systems specialists and
end-user groups use data.
Data governance is a term used to describe many of
these activities. Promoted by IBM, data governance deals with the policies and processes
for managing the availability, usability, integrity, and security of the data
employed in an enterprise, with special emphasis on promoting privacy, security,
data quality, and compliance with government regulations.
A database design and management group defines the logical relations
among elements, and the access rules and security procedures. The functions it
performs are called database administration.
ENSURING DATA QUALITY
Ensuring data quality means making sure that the data in organizational databases
are accurate and reliable. If a
database is properly designed and enterprise-wide data standards established, duplicate or inconsistent data
elements should be minimal. Analysis of
data quality often begins with a data quality audit, which is a
structured survey of the accuracy and level of completeness of the data in an
information system. Data quality audits can be performed by surveying entire
data files, surveying samples from data files, or surveying end users for their
perceptions of data quality.
Data cleansing, also known as data scrubbing, consists of activities for detecting and
correcting data in a database that are incorrect, incomplete, improperly
formatted, or redundant. Data cleansing not only corrects errors but also
enforces consistency among different sets of data that originated in separate
information systems.
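A minimal data cleansing sketch follows. The record layout, the rules (trim and standardize formatting, skip incomplete rows, drop duplicates by e-mail address), and the function name cleanse are assumptions for illustration. A real cleansing process would usually flag incomplete records for correction rather than simply skipping them.

```python
def cleanse(records):
    """Standardize formatting, skip incomplete rows, and remove redundant rows."""
    seen_emails = set()
    cleaned = []
    for record in records:
        # Improperly formatted: collapse extra spaces and standardize letter case.
        name = " ".join(record.get("name", "").split()).title()
        email = record.get("email", "").strip().lower()
        # Incomplete: a required value is missing.
        if not name or not email:
            continue
        # Redundant: the same e-mail address has already been kept.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

# Hypothetical customer rows that originated in separate systems.
raw = [
    {"name": "  john   smith ", "email": "John.Smith@Example.com "},
    {"name": "John Smith", "email": "john.smith@example.com"},
    {"name": "Mary Jones", "email": ""},
    {"name": "mary jones", "email": "mary.jones@example.com"},
]

cleaned = cleanse(raw)
```

Standardizing the format first is what allows the two John Smith rows to be recognized as the same customer.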