David's Cubicle


Let's talk, chitchat, and grab coffee :)


This project comes from CS 122C, a project course taught by Prof. Chen Li at the University of California, Irvine. I will explain my implementation of a database system from the ground up, starting from the simplest layer (Page File Management)… WITHOUT SHARING ANY OF MY CODE. Because it is an ongoing class, and in the interest of fairness, I do not want any student to gain an unfair advantage by using my code. Please understand that this is an explanatory page for people who want to learn how a database system works and why it is efficient.

Last but not least, here is an overall picture of what this project looks like:

structure

We will work from the bottom up, starting with disk space management, which in our case we call page file management.

Part 1 (Project 1)

Page File Management (PFM): PFM provides four functions, each with a clear purpose: create, destroy, open, and close a file. In the class we build later, PFM is a single instance that does only these things, which keeps the layers separate for a clean design and easier optimization.

PFM works much like the OS's file management, with a few twists. First, it does not hold a file pointer per se (along with all the other information such a pointer carries). Instead, when opening a file, PFM takes a pass-by-reference object called FileHandler (of class FileHandle), which is then used to move data to and from disk. The handle holds some information about the opened file, such as the number of reads, writes, and appends, and of course the file name. It does not keep a pointer to the file because we did not want to leave any dangling pointer (although with a careful implementation that should not be a problem). The other reason for not keeping a pointer is simply to save resources.

The FileHandler has four additional functions: read a page, write a page, append a page, and get the total number of pages. In this context, a page is defined as 4 KB, so each access to a “page” moves 4096 (2^12) bytes. Working in fixed-size pages gives the database layer a readable and simple unit of access.
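To make the two roles concrete, here is a minimal Python sketch of a page file manager and a file handle. It is a generic illustration written for this page, not the course code: the class and method names (`PagedFileManager`, `FileHandle`, `read_page`, and so on) are assumptions, and reopening the file by name on every call is just one possible reading of the "no file pointer" design described above.

```python
import os

PAGE_SIZE = 4096  # one page = 4 KB = 2**12 bytes


class FileHandle:
    """Hypothetical handle: stores the file name and counters, not an open file."""

    def __init__(self):
        self.file_name = None
        self.read_count = 0
        self.write_count = 0
        self.append_count = 0

    def get_number_of_pages(self):
        return os.path.getsize(self.file_name) // PAGE_SIZE

    def read_page(self, page_num):
        if not 0 <= page_num < self.get_number_of_pages():
            raise IndexError("no such page")
        with open(self.file_name, "rb") as f:
            f.seek(page_num * PAGE_SIZE)
            data = f.read(PAGE_SIZE)
        self.read_count += 1
        return data

    def write_page(self, page_num, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("a page must be exactly PAGE_SIZE bytes")
        if not 0 <= page_num < self.get_number_of_pages():
            raise IndexError("no such page")
        with open(self.file_name, "r+b") as f:
            f.seek(page_num * PAGE_SIZE)
            f.write(data)
        self.write_count += 1

    def append_page(self, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("a page must be exactly PAGE_SIZE bytes")
        with open(self.file_name, "ab") as f:
            f.write(data)
        self.append_count += 1


class PagedFileManager:
    """Hypothetical manager: only creates, destroys, opens, and closes files."""

    def create_file(self, file_name):
        with open(file_name, "xb"):  # "x" mode fails if the file already exists
            pass

    def destroy_file(self, file_name):
        os.remove(file_name)

    def open_file(self, file_name, handle):
        # The caller's handle object is filled in, mirroring pass-by-reference.
        if handle.file_name is not None:
            raise RuntimeError("handle is already bound to a file")
        if not os.path.isfile(file_name):
            raise FileNotFoundError(file_name)
        handle.file_name = file_name

    def close_file(self, handle):
        handle.file_name = None
```

In this sketch the manager never touches page contents and the handle never creates or deletes files, which is the separation of layers the text describes.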

This is what it looks like:

PFM:

FH:

Part 2 (Project 2)

Relation Manager: After the RBFM (the record-based file manager) and PFM, we went on to implement a layer that wraps the RBFM to handle relational tables for the database. More specifically, we use tables of their own (catalog tables) to record each table's information, such as its columns.
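The idea of describing tables with other tables can be sketched as follows. This is an assumed, in-memory illustration only: the `Catalog` class, its two row layouts, and the `.tbl` file naming are hypothetical and are not taken from the project, where the catalog would itself be stored through the RBFM.

```python
class Catalog:
    """Hypothetical in-memory catalog made of two row collections."""

    def __init__(self):
        self.tables = []   # rows: (table_id, table_name, file_name)
        self.columns = []  # rows: (table_id, column_name, column_type, position)
        self._next_id = 1

    def create_table(self, name, attributes):
        """Register a table; attributes is a list of (column_name, column_type)."""
        if any(row[1] == name for row in self.tables):
            raise ValueError("table already exists: " + name)
        table_id = self._next_id
        self._next_id += 1
        self.tables.append((table_id, name, name + ".tbl"))  # assumed naming
        for position, (col_name, col_type) in enumerate(attributes, start=1):
            self.columns.append((table_id, col_name, col_type, position))
        return table_id

    def get_attributes(self, name):
        """Look up a table's columns by scanning the two catalog collections."""
        ids = [row[0] for row in self.tables if row[1] == name]
        if not ids:
            raise KeyError(name)
        rows = sorted(
            (row for row in self.columns if row[0] == ids[0]),
            key=lambda row: row[3],
        )
        return [(row[1], row[2]) for row in rows]

    def delete_table(self, name):
        ids = [row[0] for row in self.tables if row[1] == name]
        if not ids:
            raise KeyError(name)
        self.tables = [row for row in self.tables if row[0] != ids[0]]
        self.columns = [row for row in self.columns if row[0] != ids[0]]
```

For example, after `create_table("employee", [("name", "varchar"), ("salary", "int")])`, a call to `get_attributes("employee")` returns the two columns in their declared order.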

Part 3 (Project 3)

Having the database ready is important; however, at this stage it is still not enough to run search queries or comparison queries. To fill that gap, we implemented a B+ tree index and integrated it into the project:

Our implementation design of the index entry:
Our page design choice:
A quick look at splitting in the B+ tree (again, I will not provide any of my code because it is an ongoing class, but I am happy to share my logic and to discuss any optimization or better choice for this design):
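For readers who have not seen a B+ tree split before, the textbook leaf split can be sketched in a few lines. This is the standard algorithm in generic form, not the page layout or entry format used in the project; the `Leaf` class, the `capacity` parameter, and the use of in-memory lists instead of 4 KB pages are all assumptions made for illustration.

```python
import bisect


class Leaf:
    """Hypothetical in-memory leaf: sorted keys with their record ids."""

    def __init__(self, keys=None, rids=None):
        self.keys = keys if keys is not None else []
        self.rids = rids if rids is not None else []
        self.next = None  # link to the right sibling, used for range scans


def insert_into_leaf(leaf, key, rid, capacity):
    """Insert (key, rid) in sorted order.

    Returns None if the leaf still fits, otherwise (separator_key, new_leaf)
    so the caller can add the separator to the parent node.
    """
    pos = bisect.bisect_right(leaf.keys, key)
    leaf.keys.insert(pos, key)
    leaf.rids.insert(pos, rid)
    if len(leaf.keys) <= capacity:
        return None

    # Overflow: move the upper half of the entries into a new right sibling.
    mid = len(leaf.keys) // 2
    right = Leaf(leaf.keys[mid:], leaf.rids[mid:])
    leaf.keys = leaf.keys[:mid]
    leaf.rids = leaf.rids[:mid]

    # Keep the leaf chain intact.
    right.next = leaf.next
    leaf.next = right

    # In a leaf split the separator is copied up, so it also stays in the leaf.
    return right.keys[0], right
```

With `capacity=4`, inserting the keys 10, 20, 30, 40 and then 25 into an empty leaf leaves the left leaf with 10 and 20, creates a right leaf with 25, 30, and 40, and reports 25 as the separator for the parent.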

Part 4 (Final Project)

The last part of the database management system is the parser and optimizer. When a query is passed in, this layer carries out whichever operations the user asks for (such as Project, Filter, and Join). We implemented these operations.
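One common way to build Project, Filter, and Join is as iterators that pull rows from the operator below them. The sketch below assumes that style and uses Python generators over dictionaries; the function names, the dictionary row format, and the choice of a nested-loop join are illustrative assumptions, not a description of the project's operators.

```python
import operator

# Comparison symbols a filter condition may use (an assumed, small set).
COMPARATORS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "!=": operator.ne,
}


def scan(rows):
    """Leaf of the plan: yield each stored row as a fresh dict."""
    for row in rows:
        yield dict(row)


def filter_op(child, attr, op, value):
    """Pass through only the rows where `row[attr] op value` holds."""
    compare = COMPARATORS[op]
    for row in child:
        if compare(row[attr], value):
            yield row


def project(child, attrs):
    """Keep only the listed attributes of each row."""
    for row in child:
        yield {a: row[a] for a in attrs}


def nested_loop_join(left, right_rows, left_attr, right_attr):
    """Equi-join: pair every left row with each matching right row."""
    right_rows = list(right_rows)  # materialized so it can be rescanned
    for l_row in left:
        for r_row in right_rows:
            if l_row[left_attr] == r_row[right_attr]:
                yield {**l_row, **r_row}
```

Operators compose by nesting, for example `project(nested_loop_join(filter_op(scan(emp), "salary", ">", 60), dept, "dept_id", "id"), ["name", "dept"])`, and no row is produced until the outermost iterator is consumed.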

A thank-you note to Professor Chen Li and the TAs of this class, who helped us build such a fascinating experience. Here is a link to the project website: CompSci122c, additional work for Prof. Chen Li’s research project Texera: Texera-presentation, and its GitHub page: Texera
