Lab4 : Database Security, Data Warehouse and Data Mining

PART A. General Database Security

Databases provide security features that control how a database is accessed and used. Database security can be classified into two categories: system security and data security. System security includes the mechanisms that control the access and use of the database at the system level, for example, valid username/password combination. Data security includes the control mechanisms at the schema object level. In this part, we examine data security mechanisms with privileges.

Create the following table, movieXY n your own Oracle account. 'XY' stands for last two digits of your Oracle account. For example, account S1G17 has Movie17 table.

    CREATE TABLE movieXY (id INTEGER NOT NULL, title VARCHAR(70), 
        yr DECIMAL(4), score FLOAT, votes INTEGER, director INTEGER, 
        PRIMARY KEY (id));

Populate data into your movieXY from TA account using following SQL.

   INSERT INTO movieXY
   SELECT * FROM s1g30.movie
   WHERE yr > 1997;
   COMMIT;

Suppose you wish to share parts of data in your table, MovieXY. with your partners. Work with your team partner account and additionally given Oracle account . From here, A is used to denote your account, B is your partner account and C is given another account. Write SQL statements so that account A(Owner) provides a minimum security extension for following independent scenarios.

Account B(Partner) can retrieve all properties of movies released in last 1 years.
Every database user can retrieve all properties of movies released in last 2 year.
Account B can retrieve, delete and update all properties of movies except votes and score.
Account B can retrieve and insert all properties of movies and also has a privilege to grant these privileges to other users. Account B gives to account C(another Partner) the privileges that he/she received. Then, account A(owner) revokes all privileges that he/she gave to account B. Can account C still retrieve movie data?
Account B can retrieve and insert all properties of movies and also has a privilege to grant these privileges to other users. Account B grants to account C all privileges he/she received including a permission to grant these privileges to others. Account C gives the privileges he/she received to account B again. After that, account A revokes all privileges from account B. Can account B still insert a movie data into movieXY ?

PART B. Data Warehouse

In part B and C, the goal is to expose us to data warehouse and data mining with an interesting real dataset. Have a fun!

Dataset
Let us consider a dataset about the passengers on Titanic on April 15th, 1912, the day of its sinking after collision with an iceberg. Review the original raw data file titanic.csv and a documentation file titanic.txt containing a brief description of the dataset. Refer "The 'Unusual Episode' Data Revisited" to find more statistical information.
Data Warehouse Design
A data warehouse is a relational database designed for query and analysis rather than transaction processing. In this part, we simulate data warehousing implementation with a titanic dataset. Download titanic.sql which is converted for data warehouse exercises. Review and execute it in your account.

Fact tables and dimension tables are the two types of objects commonly used in data warehouse schema. Fact tables typically contain facts and foreign keys to the dimension tables. Dimension tables contain the relatively static data in the warehouse. We assume that our Fact table has SURVIVED as the measuare data to be analyzed and 3 dimension tables, CLASS, AGE and SEX. Draw a simple entity relationship diagram based on the SQL script file. Which schema design method is used for the titanic data warehousing ?

Aggregation Querying
Aggregation is a fundamental part of data warehousing. See the following tables in which the titanic's population and the deaths are classified by age, gender, and class.
Write SQL statements to produce relevant information of the following cross-tabulation, e.g., the number of population exposed to risk by class and gender or by class and age.
(Hint)

Use Oracle aggregating operations, e.g., CUBE, ROLLUP or GROUP BY clause.
One SQL statement per each table is enough to infer all values in it.
Your query results don't need to be the same format of the reporting tables. You can compare your query results with the cross-tabulation values
For deaths per 100, truncate the value to 0 decimal place.

                      By Class and Gender
-----------------------------------------------------------------------
           Population Exposed       Number of          Deaths per 100
                to Risk               Deaths          Exposed to Risk
           ------------------------------------------------------------
Class      Male  Female  Both   Male  Female  Both   Male  Female  Both
-----------------------------------------------------------------------
First      180     145    325   118      4     122    65      2     37
Second     179     106    285   154     13     167    86     12     58
Third      510     196    706   422     106    528    82     54     74
Crew       862     23     885   670      3     673    77     13     76
-----------------------------------------------------------------------
Total      1731    470   2201   1364    126   1490    78     26     67


                       By Class and Age
-----------------------------------------------------------------------
           Population Exposed       Number of          Deaths per 100
                to Risk               Deaths          Exposed to Risk
          -------------------------------------------------------------
Class     Adult   Child  Both  Adult   Child  Both  Adult   Child  Both
-----------------------------------------------------------------------
First      319      6     325   122      0     122    38      0     37
Second     261     24     285   167      0     167    63      0     58
Third      627     79     706   476     52     528    75     65     74
Crew       885      0     885   673      0     673    76      -     76
-----------------------------------------------------------------------
Total      2092    109   2201   1438    52    1490    68     47     67

Table: Population at Risk, Deaths, and Death Rates for the Sinking of the Titanic

Indexes in Data warehouse
Bitmap indexes are widely used in data warehousing environments. Which columns in titanic fact table are suitable for using bitmap indexes? Explain.
In comparing with a traditional B-tree index, what is the advantage of Bitmap indexes on the selected columns ?
Create a Bitmap index
Create bitmap indexes on the desirable columns of the fact table(titanic) from Question 4 using the following syntax.
```
	CREATE BITMAP INDEX index_name ON table_name (column_name);
```
Query a proper Oracle dictionary and check if the indexes you created are bitmap index.
Compare the query performance of Question 3 between before and after the index creation using the following SQL*Plus comment.
```
 
	SQL>SET TIMING ON
	SQL>SELECT statement;
	SQL>SET TIMING OFF
```

PART C. Data Mining

Data mining consists of finding interesting trends or patterns in large datasets. The identified patterns can give a data analyst useful and unexpected insight can be observed rather than simple aggregation results. Let's make decision about titanic fatalities using data mining techniques.

Data Mining Tool : CLEMENTINE
We use Clementine data mining tool to identify interesting patterns in the titanic dataset.
Clementine 6.5 is installed in EE/CSci 3-170 Lab, ME 308 lab and on the four grad lab windows machines. To execute Clementine,
First, start Xwin32 - after the splash screen comes up you can hit cancel, and then you should see a blue x in the lower right hand corner of the monitor, in the system tray. As long as you see the blue X there, Xwin32 is running. And then, run Clementine.

(Note) : Clementine references: Tutorial, Simple demo
Dataset
Download titanic_detailed.csv which is the same with titanic fact table but is changed with category names for analyzing easily in the data mining tool.
Setup
1. Execute Clementine.
2. Set titanic_dm.data for data source using FILE Nodes. The fields in the data file are seperated by commas. Check to get the field name from the file.
3. View records using TABLE Nodes.
4. Before we find rules, we need to define type information for the fields. Create a TYPE node, and make sure the survice field has the direction of OUT and others have the direction of IN. Set type of all fields to AUTO SET.
Building Rules
(1) Classification Rules
We consider rules that can be represented by a tree from the titanic dataset.
1. Attach C5.0 modeling Node to the TYPE Node.
2. Make sure that Group symbol values is checked.
3. Press Execute button.
4. Browse the generated C5.0 model.
5. Submit generated decision tree with instances/confidence values.
(2) Association Rules
Next, let's find association rules from the titanic dataset.
1. Attach APRIORI Node to the TYPE Node.
2. For threshold values, give 15% to the Minimum Rule Coverage and 10% to the Minimum Rule Accuracy.
3. Press Execute button.
4. Browse the generated Aprioir model.
5. Submit generated rules with instances/confidence values.