Lab4 : Database Security, Data Warehouse and Data Mining
PART A. General Database Security
Databases provide security features that control how a database is accessed
and used. Database security can be classified into two categories: system
security and data security. System security includes the mechanisms that control
the access and use of the database at the system level, for example, valid
username/password combination. Data security includes the control mechanisms at
the schema object level. In this part, we examine data security mechanisms with
privileges.
Create the following table, movieXY n your own Oracle
account. 'XY' stands for last two digits of your Oracle account. For example,
account S1G17 has Movie17 table.
CREATE TABLE movieXY (id INTEGER NOT NULL, title VARCHAR(70),
yr DECIMAL(4), score FLOAT, votes INTEGER, director INTEGER,
PRIMARY KEY (id));
Populate data into your movieXY from TA account using following SQL. INSERT INTO movieXY
SELECT * FROM s1g30.movie
WHERE yr > 1997;
COMMIT;
Suppose you wish to share parts of data in your table, MovieXY. with your
partners. Work with your team partner account and
additionally given Oracle account
.
From here, A is used to denote your account, B is your partner
account and C is given another account. Write SQL statements so that account
A(Owner) provides a minimum security extension for following independent
scenarios.
- Account B(Partner) can retrieve all properties of movies released in last
1 years.
- Every database user can retrieve all properties of movies released in last
2 year.
- Account B can retrieve, delete and update all properties of movies except
votes and score.
- Account B can retrieve and insert all properties of movies and also has a
privilege to grant these privileges to other users. Account B gives to account
C(another Partner) the privileges that he/she received. Then, account A(owner)
revokes all privileges that he/she gave to account B. Can account C still
retrieve movie data?
- Account B can retrieve and insert all properties of movies and also has a
privilege to grant these privileges to other users. Account B grants to
account C all privileges he/she received including a permission to grant these
privileges to others. Account C gives the privileges he/she received to
account B again. After that, account A revokes all privileges from account B.
Can account B still insert a movie data into movieXY ?
PART B. Data Warehouse
In part B and C, the goal is to expose us to
data warehouse and data mining with an interesting real dataset. Have a fun!
- Dataset
Let us consider a dataset about the passengers on
Titanic on April 15th, 1912, the day of its sinking after collision with an
iceberg. Review the original raw data file
titanic.csv
and a
documentation file
titanic.txt
containing a brief description of the dataset. Refer
"The 'Unusual Episode' Data Revisited"
to find more statistical information.
- Data Warehouse Design
A data warehouse is a relational database
designed for query and analysis rather than transaction processing. In this
part, we simulate data warehousing implementation with a titanic dataset.
Download
titanic.sql
which
is converted for data warehouse exercises. Review and execute it in your
account.
Fact tables and dimension tables are the two types of objects
commonly used in data warehouse schema. Fact tables typically contain facts
and foreign keys to the dimension tables. Dimension tables contain the
relatively static data in the warehouse. We assume that our Fact table has
SURVIVED as the measuare data to be analyzed and 3 dimension tables, CLASS,
AGE and SEX. Draw a simple entity relationship diagram based on the SQL script
file. Which schema design method is used for the titanic data warehousing
?
- Aggregation Querying
Aggregation is a fundamental part of data
warehousing. See the following tables in which the titanic's population and
the deaths are classified by age, gender, and class.
Write SQL statements
to produce relevant information of the following cross-tabulation, e.g., the
number of population exposed to risk by class and gender or by class and age.
(Hint)
- Use Oracle aggregating operations, e.g., CUBE, ROLLUP or GROUP BY
clause.
- One SQL statement per each table is enough to infer all values in it.
- Your query results don't need to be the same format of the reporting
tables. You can compare your query results with the cross-tabulation values
- For deaths per 100, truncate the value to 0 decimal place.
By Class and Gender
-----------------------------------------------------------------------
Population Exposed Number of Deaths per 100
to Risk Deaths Exposed to Risk
------------------------------------------------------------
Class Male Female Both Male Female Both Male Female Both
-----------------------------------------------------------------------
First 180 145 325 118 4 122 65 2 37
Second 179 106 285 154 13 167 86 12 58
Third 510 196 706 422 106 528 82 54 74
Crew 862 23 885 670 3 673 77 13 76
-----------------------------------------------------------------------
Total 1731 470 2201 1364 126 1490 78 26 67
By Class and Age
-----------------------------------------------------------------------
Population Exposed Number of Deaths per 100
to Risk Deaths Exposed to Risk
-------------------------------------------------------------
Class Adult Child Both Adult Child Both Adult Child Both
-----------------------------------------------------------------------
First 319 6 325 122 0 122 38 0 37
Second 261 24 285 167 0 167 63 0 58
Third 627 79 706 476 52 528 75 65 74
Crew 885 0 885 673 0 673 76 - 76
-----------------------------------------------------------------------
Total 2092 109 2201 1438 52 1490 68 47 67
Table: Population at Risk, Deaths, and Death Rates for the
Sinking of the Titanic
- Indexes in Data warehouse
Bitmap indexes are widely used in
data warehousing environments. Which columns in titanic fact table are
suitable for using bitmap indexes? Explain.
In comparing with a traditional
B-tree index, what is the advantage of Bitmap indexes on the selected columns
?
- Create a Bitmap index
Create bitmap indexes on the desirable
columns of the fact table(titanic) from Question 4 using the following syntax.
CREATE BITMAP INDEX index_name ON table_name (column_name);
Query a proper Oracle dictionary and check if the indexes you created
are bitmap index.
Compare the query performance of Question 3 between
before and after the index creation using the following SQL*Plus comment.
SQL>SET TIMING ON
SQL>SELECT statement;
SQL>SET TIMING OFF
PART C. Data Mining
Data mining consists of finding interesting trends
or patterns in large datasets. The identified patterns can give a data analyst
useful and unexpected insight can be observed rather than simple aggregation
results. Let's make decision about titanic fatalities using data mining
techniques.
- Data Mining Tool : CLEMENTINE
We use Clementine data mining tool
to identify interesting patterns in the titanic dataset.
Clementine 6.5 is
installed in EE/CSci 3-170 Lab, ME 308 lab and on the four grad lab windows
machines. To execute Clementine,
First, start Xwin32 - after the splash
screen comes up you can hit cancel, and then you should see a blue x in the
lower right hand corner of the monitor, in the system tray. As long as you see
the blue X there, Xwin32 is running. And then, run Clementine.
(Note) : Clementine references:
Tutorial,
Simple demo
- Dataset
Download
titanic_detailed.csv
which is the same with titanic fact table but is changed with category
names for analyzing easily in the data mining tool.
- Setup
- Execute Clementine.
- Set titanic_dm.data for data source using FILE Nodes. The fields in the
data file are seperated by commas. Check to get the field name from the
file.
- View records using TABLE Nodes.
- Before we find rules, we need to define type information for the fields.
Create a TYPE node, and make sure the survice field has the direction of OUT
and others have the direction of IN. Set type of all fields to AUTO SET.
- Building Rules
(1) Classification Rules
We consider
rules that can be represented by a tree from the titanic dataset.
- Attach C5.0 modeling Node to the TYPE Node.
- Make sure that Group symbol values is checked.
- Press Execute button.
- Browse the generated C5.0 model.
- Submit generated decision tree with instances/confidence values.
(2) Association Rules
Next, let's find association
rules from the titanic dataset.
- Attach APRIORI Node to the TYPE Node.
- For threshold values, give 15% to the Minimum Rule Coverage and 10% to
the Minimum Rule Accuracy.
- Press Execute button.
- Browse the generated Aprioir model.
- Submit generated rules with instances/confidence values.