In a(N) _____, Each Record Can Have Multiple Parent and Child Records.

four Chapter 4: Data and Databases

Dave Conservative and David T. Bourgeois

Learning Objectives

Upon successful completion of this chapter, you will exist able to:

draw the differences between data, information, and knowledge;
ascertain the term database and identify the steps to creating ane;
describe the office of a database management system;
describe the characteristics of a data warehouse; and
define information mining and describe its role in an organization.

Please note, at that place is an updated edition of this book available at https://opentextbook.site. If you are not required to use this edition for a course, you may want to bank check it out.

Introduction

You lot have already been introduced to the start two components of information systems: hardware and software. Nevertheless, those two components by themselves do not make a computer useful. Imagine if you turned on a computer, started the word processor, but could non save a certificate. Imagine if you opened a music actor simply there was no music to play. Imagine opening a spider web browser merely in that location were no web pages. Without data, hardware and software are non very useful! Information is the third component of an information system.

Information, Information, and Knowledge

Data are the raw bits and pieces of information with no context. If I told yous, "xv, 23, 14, 85," you would not take learned annihilation. But I would have given y'all data.

Information can be quantitative or qualitative. Quantitative data is numeric, the result of a measurement, count, or some other mathematical calculation. Qualitative data is descriptive."Ruby Cherry," t he color of a 2013 Ford Focus, is a n instance of qualitative data. A number tin be qualitative too: if I tell y'all my favorite number is 5, that is qualitative information because it is descriptive, not the consequence of a measurement or mathematical calculation.

By itself, data is not that useful. To be useful, information technology needs to be given context. Returning to the example to a higher place, if I told you lot that "15, 23, 14, and 85″ are the numbers of students that had registered for upcoming classes, that would exist information . By adding the context – that the numbers represent the count of students registering for specific classes – I take converted data into information.

One time nosotros have put our data into context, aggregated and analyzed it, we can use it to make decisions for our organization. We can say that this consumption of information produces knowledge. This knowledge tin be used to make decisions, ready policies, and even spark innovation.

The final step up the information ladder is the step from noesis (knowing a lot well-nigh a topic) towisdom. We can say that someone has wisdom when they tin can combine their cognition and experience to produce a deeper understanding of a topic. It oftentimes takes many years to develop wisdom on a particular topic, and requires patience.

Examples of Data

Almost all software programs require data to do anything useful. For example, if you are editing a document in a word processor such as Microsoft Word, the document you are working on is the data. The give-and-take-processing software can manipulate the information: create a new document, duplicate a document, or modify a certificate. Some other examples of data are: an MP3 music file, a video file, a spreadsheet, a web folio, and an e-book. In some cases, such as with an eastward-volume, you may only take the ability to read the data.

Databases

The goal of many data systems is to transform data into information in order to generate cognition that can be used for decision making. In order to do this, the system must be able to have data, put the information into context, and provide tools for assemblage and analysis. A database is designed for just such a purpose.

A database is an organized collection of related information. Information technology is anorganized collection, because in a database, all information is described and associated with other data. All information in a database should be related as well; separate databases should be created to manage unrelated information. For instance, a database that contains information about students should non also hold data about company stock prices. Databases are not always digital – a filing cabinet, for instance, might be considered a form of database. For the purposes of this text, we will but consider digital databases.

Relational Databases

Databases tin exist organized in many unlike means, and thus take many forms. The most popular form of database today is the relational database. Popular examples of relational databases are Microsoft Access, MySQL, and Oracle. A relational database is one in which information is organized into 1 or more tables. Each table has a set of fields, which ascertain the nature of the data stored in the table. A record is i instance of a fix of fields in a table. To visualize this, retrieve of the records as the rows of the table and the fields as the columns of the tabular array. In the example below, we have a table of student information, with each row representing a student and each cavalcade representing one slice of data near the student.

In a relational database, all the tables are related past one or more fields, so that it is possible to connect all the tables in the database through the field(s) they have in common. For each table, i of the fields is identified as a primary key. This cardinal is the unique identifier for each record in the table. To assist you lot empathize these terms further, let's walk through the procedure of designing a database.

Designing a Database

Suppose a university wants to create an information system to track participation in educatee clubs. Subsequently interviewing several people, the blueprint team learns that the goal of implementing the organisation is to requite improve insight into how the university funds clubs. This will be achieved by tracking how many members each club has and how active the clubs are. From this, the squad decides that the system must keep track of the clubs, their members, and their events. Using this data, the design team determines that the following tables need to be created:

Clubs: this will runway the club name, the club president, and a short clarification of the club.
Students: student name, e-mail, and twelvemonth of birth.
Memberships: this table volition correlate students with clubs, allowing u.s. to take any given educatee join multiple clubs.
Events: this tabular array will track when the clubs meet and how many students showed upward.

Now that the design team has adamant which tables to create, they need to define the specific information that each table will hold. This requires identifying the fields that will be in each table. For example, Club Proper name would be one of the fields in the Clubs table. First Proper noun and Last Proper name would be fields in the Students table. Finally, since this will exist a relational database, every tabular array should take a field in common with at to the lowest degree i other tabular array (in other words: they should accept a human relationship with each other).

In order to properly create this relationship, a primary key must exist selected for each table. This key is a unique identifier for each record in the table. For case, in the Students tabular array, it might be possible to use students' concluding proper noun as a way to uniquely identify them. Nevertheless, information technology is more probable that some students will share a final proper name (similar Rodriguez, Smith, or Lee), then a dissimilar field should be selected. A pupil's e-mail accost might exist a practiced selection for a primary key, since e-mail addresses are unique. Even so, a primary key cannot alter, so this would hateful that if students changed their e-mail accost we would accept to remove them from the database so re-insert them – non an attractive proffer. Our solution is to create a value for each educatee — a user ID — that will act as a primary central. We will also exercise this for each of the student clubs. This solution is quite common and is the reason you have so many user IDs!

Y'all can see the last database design in the effigy below:

With this design, not only practice we have a way to organize all of the information we need to meet the requirements, but we accept also successfully related all the tables together. Hither'south what the database tables might expect like with some sample data. Note that the Memberships table has the sole purpose of assuasive us to relate multiple students to multiple clubs.

Student clubs table with sample data

Student table with sample data

Normalization

When designing a database, one important concept to understand is normalization. In simple terms, to normalize a database means to blueprint information technology in a way that: 1) reduces duplication of data between tables and 2) gives the table as much flexibility every bit possible.

In the Student Clubs database blueprint, the blueprint squad worked to achieve these objectives. For example, to track memberships, a elementary solution might have been to create a Members field in the Clubs table and and then just list the names of all of the members there. However, this design would mean that if a student joined two clubs, and so his or her information would accept to be entered a second time. Instead, the designers solved this trouble by using ii tables: Students and Memberships.

In this design, when a student joins their first club, we first must add the pupil to the Students table, where their start name, terminal proper noun, electronic mail address, and birth year are entered. This add-on to the Students table will generate a educatee ID. Now we will add a new entry to denote that the student is a member of a specific club. This is accomplished by adding a record with the student ID and the club ID in the Memberships table. If this student joins a 2d gild, we practice non have to duplicate the entry of the student'southward name, e-mail, and birth twelvemonth; instead, nosotros merely need to make another entry in the Memberships table of the second club's ID and the student'southward ID.

The design of the Student Clubs database also makes it simple to change the blueprint without major modifications to the existing structure. For example, if the blueprint squad were asked to add functionality to the organization to track faculty advisors to the clubs, we could easily accomplish this by adding a Faculty Advisors tabular array (similar to the Students table) and and so calculation a new field to the Clubs table to hold the Kinesthesia Advisor ID.

Information Types

When defining the fields in a database table, nosotros must give each field a information blazon. For example, the field Nascency Year is a year, so information technology volition exist a number, while Start Name will be text. Most modernistic databases allow for several different information types to exist stored. Some of the more than common data types are listed here:

Text: for storing not-numeric data that is brief, generally nether 256 characters. The database designer tin identify the maximum length of the text.
Number: for storing numbers. In that location are usually a few different number types that can exist selected, depending on how large the largest number volition be.
Yep/No: a special class of the number information blazon that is (usually) one byte long, with a 0 for "No" or "False" and a i for "Yes" or "True".
Date/Fourth dimension: a special form of the number data type that can be interpreted as a number or a time.
Currency: a special class of the number data type that formats all values with a currency indicator and 2 decimal places.
Paragraph Text: this data blazon allows for text longer than 256 characters.
Object: this data type allows for the storage of data that cannot be entered via keyboard, such as an image or a music file.

In that location are two of import reasons that we must properly define the information type of a field. First, a data type tells the database what functions can be performed with the data. For case, if we wish to perform mathematical functions with ane of the fields, we must exist sure to tell the database that the field is a number data type. So if we have, say, a field storing birth year, we can subtract the number stored in that field from the current year to go age.

The 2d important reason to define data type is and so that the proper amount of storage infinite is allocated for our data. For example, if the First Proper noun field is divers as a text(50) data type, this means 50 characters are allocated for each first name we want to store. However, fifty-fifty if the first name is only 5 characters long, fifty characters (bytes) volition be allocated. While this may not seem like a big deal, if our table ends upwardly belongings 50,000 names, we are allocating 50 * 50,000 = 2,500,000 bytes for storage of these values. It may be prudent to reduce the size of the field and so we practice not waste matter storage space.

Sidebar: The Departure between a Database and a Spreadsheet

Many times, when introducing the concept of databases to students, they speedily decide that a database is pretty much the same as a spreadsheet. After all, a spreadsheet stores data in an organized manner, using rows and columns, and looks very similar to a database table. This misunderstanding extends across the classroom: spreadsheets are used as a substitute for databases in all types of situations every day, all over the globe.

To exist fair, for unproblematic uses, a spreadsheet tin substitute for a database quite well. If a uncomplicated listing of rows and columns (a unmarried tabular array) is all that is needed, then creating a database is probably overkill. In our Student Clubs example, if we just needed to rails a listing of clubs, the number of members, and the contact data for the president, nosotros could get abroad with a single spreadsheet. However, the demand to include a list of events and the names of members would be problematic if tracked with a spreadsheet.

When several types of data must be mixed together, or when the relationships between these types of information are complex, so a spreadsheet is not the best solution. A database allows information from several entities (such equally students, clubs, memberships, and events) to all be related together into 1 whole. While a spreadsheet does allow y'all to ascertain what kinds of values can exist entered into its cells, a database provides more than intuitive and powerful ways to define the types of data that become into each field, reducing possible errors and assuasive for easier analysis.

Though not good for replacing databases, spreadsheets can be ideal tools for analyzing the data stored in a database. A spreadsheet package can exist connected to a specific table or query in a database and used to create charts or perform analysis on that data.

Structured Query Linguistic communication

In one case y'all have a database designed and loaded with data, how will you do something useful with it? The primary fashion to piece of work with a relational database is to utilise Structured Query Language, SQL (pronounced "sequel," or simply stated as S-Q-Fifty). Well-nigh all applications that work with databases (such as database direction systems, discussed below) make use of SQL as a way to analyze and manipulate relational data. Equally its name implies, SQL is a linguistic communication that can be used to piece of work with a relational database. From a elementary request for data to a complex update operation, SQL is a mainstay of programmers and database administrators. To give you a taste of what SQL might wait like, hither are a couple of examples using our Student Clubs database.

The following query will retrieve a list of the first and last names of the club presidents:

SELECT "Kickoff Proper noun", "Last Name" FROM "Students" WHERE "Students.ID" = "Clubs.President"

The following query will create a listing of the number of students in each order, listing the club proper name and then the number of members:

SELECT "Clubs.Social club Name", COUNT("Memberships.Student ID") FROM "Clubs" LEFT JOIN "Memberships" ON "Clubs.Social club ID" = "Memberships.Gild ID"

An in-depth description of how SQL works is beyond the scope of this introductory text, but these examples should give you an thought of the power of using SQL to dispense relational data. Many database packages, such equally Microsoft Access, let y'all to visually create the query you want to construct and so generate the SQL query for you.

Other Types of Databases

The relational database model is the about used database model today. Notwithstanding, many other database models exist that provide unlike strengths than the relational model. The hierarchical database model, popular in the 1960s and 1970s, continued data together in a hierarchy, allowing for a parent/child relationship betwixt data. The document-centric model allowed for a more than unstructured information storage by placing data into "documents" that could so be manipulated.

Perhaps the almost interesting new development is the concept of NoSQL (from the phrase "not but SQL"). NoSQL arose from the need to solve the trouble of large-scale databases spread over several servers or even across the world. For a relational database to piece of work properly, it is important that just i person exist able to manipulate a piece of data at a time, a concept known as tape-locking. But with today's large-calibration databases (call back Google and Amazon), this is just not possible. A NoSQL database tin work with information in a looser way, assuasive for a more unstructured environment, communicating changes to the data over time to all the servers that are role of the database.

Database Management Systems

Screen shot of the Open Role database management system

To the computer, a database looks like ane or more files. In order for the data in the database to exist read, changed, added, or removed, a software program must access it. Many software applications accept this ability: iTunes can read its database to give y'all a list of its songs (and play the songs); your mobile-phone software can interact with your list of contacts. Merely what about applications to create or manage a database? What software can you utilise to create a database, alter a database'due south structure, or but do analysis? That is the purpose of a category of software applications chosen database direction systems (DBMS).

DBMS packages by and large provide an interface to view and change the pattern of the database, create queries, and develop reports. Most of these packages are designed to piece of work with a specific type of database, but generally are compatible with a wide range of databases.

For example, Apache OpenOffice.org Base (see screen shot) can be used to create, modify, and analyze databases in open-database (ODB) format. Microsoft's Access DBMS is used to work with databases in its own Microsoft Access Database format. Both Admission and Base have the ability to read and write to other database formats every bit well.

Microsoft Access and Open up Function Base of operations are examples of personal database-management systems. These systems are primarily used to develop and analyze single-user databases. These databases are non meant to be shared across a network or the Net, but are instead installed on a item device and work with a single user at a time.

Enterprise Databases

A database that can only be used by a single user at a time is not going to meet the needs of most organizations. Every bit computers have become networked and are at present joined worldwide via the Internet, a grade of database has emerged that can be accessed by two, x, or fifty-fifty a million people. These databases are sometimes installed on a single computer to be accessed by a group of people at a single location. Other times, they are installed over several servers worldwide, meant to be accessed by millions. These relational enterprise database packages are congenital and supported by companies such as Oracle, Microsoft, and IBM. The open up-source MySQL is likewise an enterprise database.

As stated earlier, the relational database model does not scale well. The term scale here refers to a database getting larger and larger, existence distributed on a larger number of computers connected via a network. Some companies are looking to provide large-scale database solutions by moving away from the relational model to other, more flexible models. For example, Google now offers the App Engine Datastore, which is based on NoSQL. Developers can use the App Engine Datastore to develop applications that access data from anywhere in the earth. Amazon.com offers several database services for enterprise employ, including Amazon RDS, which is a relational database service, and Amazon DynamoDB, a NoSQL enterprise solution.

Big Information

A new buzzword that has been capturing the attention of businesses lately is big data. The term refers to such massively big data sets that conventional database tools do not have the processing power to analyze them. For instance, Walmart must process over one million customer transactions every hour. Storing and analyzing that much data is beyond the ability of traditional database-management tools. Understanding the all-time tools and techniques to manage and analyze these big information sets is a problem that governments and businesses alike are trying to solve.

Sidebar: What Is Metadata?

The term metadata can be understood as "data about data." For example, when looking at one of the values of Year of Birth in the Students table, the data itself may be "1992". The metadata about that value would be the field name Year of Birth, the time it was final updated, and the data type (integer). Another example of metadata could be for an MP3 music file, like the one shown in the paradigm below; information such as the length of the song, the artist, the album, the file size, and even the album comprehend fine art, are classified every bit metadata. When a database is existence designed, a "data dictionary" is created to hold the metadata, defining the fields and structure of the database.

Metadata about a camera paradigm (Public Domain)

Data Warehouse

As organizations accept begun to utilize databases equally the centerpiece of their operations, the need to fully understand and leverage the data they are collecting has become more than and more than credible. Still, directly analyzing the data that is needed for day-to-mean solar day operations is non a proficient idea; nosotros do not want to tax the operations of the visitor more we need to. Further, organizations likewise want to analyze information in a historical sense: How does the data we take today compare with the same set of data this fourth dimension concluding calendar month, or last yr? From these needs arose the concept of the data warehouse.

The concept of the data warehouse is simple: excerpt data from i or more of the organization'due south databases and load information technology into the data warehouse (which is itself another database) for storage and assay. All the same, the execution of this concept is not that simple. A data warehouse should exist designed then that it meets the following criteria:

It uses non-operational information. This means that the information warehouse is using a copy of data from the agile databases that the visitor uses in its day-to-day operations, so the data warehouse must pull data from the existing databases on a regular, scheduled footing.
The data is time-variant. This means that whenever data is loaded into the data warehouse, information technology receives a time stamp, which allows for comparisons between different time periods.
The data is standardized. Considering the data in a data warehouse usually comes from several different sources, it is possible that the data does non use the aforementioned definitions or units. For example, our Events tabular array in our Student Clubs database lists the event dates using the mm/dd/yyyy format (eastward.one thousand., 01/10/2013). A table in some other database might use the format yy/mm/dd (e.g., 13/01/x) for dates. In club for the data warehouse to friction match upwardly dates, a standard engagement format would have to be agreed upon and all information loaded into the data warehouse would have to be converted to employ this standard format. This process is called extraction-transformation-load (ETL).

There are two primary schools of thought when designing a data warehouse: lesser-upwards and top-downwardly. The bottom-up approach starts past creating minor information warehouses, called data marts, to solve specific business issues. Equally these data marts are created, they can be combined into a larger data warehouse. The top-down approach suggests that we should start by creating an enterprise-wide data warehouse and and then, as specific business needs are identified, create smaller information marts from the information warehouse.

Data warehouse process (top-down) — Data warehouse procedure (top-down)

Benefits of Data Warehouses

Organizations discover information warehouses quite beneficial for a number of reasons:

The procedure of developing a data warehouse forces an organization to better understand the data that it is currently collecting and, equally important, what data is non being collected.
A data warehouse provides a centralized view of all data being collected across the enterprise and provides a means for determining information that is inconsistent.
Once all data is identified as consistent, an organization can generate one version of the truth. This is important when the company wants to study consequent statistics about itself, such as revenue or number of employees.
By having a data warehouse, snapshots of information can be taken over fourth dimension. This creates a historical record of data, which allows for an analysis of trends.
A data warehouse provides tools to combine data, which can provide new data and analysis.

Information Mining

Data mining is the procedure of analyzing data to find previously unknown trends, patterns, and associations in society to make decisions. Mostly, data mining is achieved through automated means against extremely big data sets, such as a data warehouse. Some examples of data mining include:

An analysis of sales from a large grocery concatenation might determine that milk is purchased more than frequently the solar day after it rains in cities with a population of less than 50,000.
A bank may find that loan applicants whose bank accounts prove particular deposit and withdrawal patterns are not good credit risks.
A baseball team may notice that collegiate baseball players with specific statistics in striking, pitching, and fielding make for more successful major league players.

In some cases, a data-mining projection is begun with a hypothetical result in mind. For example, a grocery chain may already take some thought that ownership patterns change afterward it rains and want to get a deeper agreement of exactly what is happening. In other cases, there are no presuppositions and a data-mining plan is run against big data sets in lodge to find patterns and associations.

Privacy Concerns

The increasing power of data mining has acquired concerns for many, especially in the surface area of privacy. In today's digital globe, it is becoming easier than ever to take data from disparate sources and combine them to practise new forms of assay. In fact, a whole industry has sprung up around this engineering: information brokers. These firms combine publicly accessible information with information obtained from the government and other sources to create vast warehouses of data about people and companies that they tin can and so sell. This discipline volition be covered in much more than detail in affiliate 12 – the affiliate on the upstanding concerns of information systems.

Business organisation Intelligence and Business concern Analytics

With tools such every bit data warehousing and information mining at their disposal, businesses are learning how to utilise information to their advantage. The term business intelligence is used to depict the process that organizations use to take data they are collecting and analyze it in the hopes of obtaining a competitive advantage. Too using data from their internal databases, firms oftentimes purchase data from data brokers to get a big-picture agreement of their industries. Business analytics is the term used to describe the utilise of internal visitor data to improve business processes and practices.

Knowledge Direction

We end the chapter with a discussion on the concept of knowledge management (KM). All companies accrue knowledge over the course of their existence. Some of this knowledge is written down or saved, only not in an organized style. Much of this noesis is not written down; instead, it is stored inside the heads of its employees. Knowledge management is the procedure of formalizing the capture, indexing, and storing of the company's noesis in club to benefit from the experiences and insights that the company has captured during its being.

Summary

In this affiliate, we learned virtually the office that data and databases play in the context of information systems. Data is made upwardly of modest facts and information without context. If you requite data context, then you lot accept data. Noesis is gained when information is consumed and used for decision making. A database is an organized collection of related data. Relational databases are the most widely used type of database, where data is structured into tables and all tables must be related to each other through unique identifiers. A database management arrangement (DBMS) is a software application that is used to create and manage databases, and tin take the form of a personal DBMS, used by one person, or an enterprise DBMS that tin can be used by multiple users. A data warehouse is a special form of database that takes data from other databases in an enterprise and organizes information technology for analysis. Data mining is the procedure of looking for patterns and relationships in large data sets. Many businesses use databases, data warehouses, and data-mining techniques in order to produce business organization intelligence and gain a competitive reward.

Study Questions

What is the deviation betwixt data, information, and noesis?
Explain in your ain words how the data component relates to the hardware and software components of information systems.
What is the difference betwixt quantitative data and qualitative data? In what situations could the number 42 be considered qualitative data?
What are the characteristics of a relational database?
When would using a personal DBMS make sense?
What is the departure between a spreadsheet and a database? List three differences betwixt them.
Describe what the term normalization means.
Why is it important to define the data type of a field when designing a relational database?
Proper noun a database you collaborate with frequently. What would some of the field names be?
What is metadata?
Name three advantages of using a data warehouse.
What is information mining?

Exercises

Review the design of the Student Clubs database earlier in this affiliate. Reviewing the lists of data types given, what information types would you assign to each of the fields in each of the tables. What lengths would y'all assign to the text fields?
Download Apache OpenOffice.org and use the database tool to open the "Student Clubs.odb" file available hither. Have some time to learn how to modify the database construction and and then meet if you can add the required items to support the tracking of faculty advisors, as described at the end of the Normalization section in the chapter. Hither is a link to the Getting Started documentation.
Using Microsoft Access, download the database file of comprehensive baseball statistics from the website SeanLahman.com. (If you don't accept Microsoft Admission, you can download an abridged version of the file here that is compatible with Apache Open Office). Review the structure of the tables included in the database. Come up up with three different data-mining experiments y'all would like to endeavour, and explicate which fields in which tables would accept to be analyzed.
Do some original inquiry and observe two examples of information mining. Summarize each example and then write about what the ii examples have in common.
Conduct some contained research on the process of business intelligence. Using at least two scholarly or practitioner sources, write a two-page paper giving examples of how business organization intelligence is being used.
Conduct some independent inquiry on the latest technologies being used for knowledge management. Using at least two scholarly or practitioner sources, write a two-page paper giving examples of software applications or new technologies being used in this field.

rogersquind1993.blogspot.com

Source: https://bus206.pressbooks.com/chapter/chapter-4-data-and-databases/