Data Management and Data Systems

Introduction to the use, design, and implementation of database and data-intensive systems.

Course Overview


This course covers how to use databases in applications, the first principles of scaling systems for large data sets, and how to design good data systems.

A few key topics:

  • Introduction to the relational data model, relational database engines, and SQL.

  • How to scale systems for large data sets on servers and server clusters.

  • How to design good schemas based on dependencies and normal forms, so that we can build and evolve good applications. This will include indexes, views, and transactions.
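As a rough illustration of the topics listed above (a schema, an index, a view, and a transaction), here is a minimal sketch. It uses Python's built-in sqlite3 module purely for convenience; the course project itself runs on BigQuery. The `students` table, its columns, and the sample rows are hypothetical and are not taken from the course materials.

```python
import sqlite3

# In-memory database; sqlite3 is an assumption made for illustration only.
conn = sqlite3.connect(":memory:")

# Schema: a single hypothetical relation.
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, year INTEGER)")
# Index: speeds up lookups that filter on `year`.
conn.execute("CREATE INDEX idx_students_year ON students (year)")
# View: a named query that can be used like a table.
conn.execute("CREATE VIEW seniors AS SELECT id, name FROM students WHERE year = 4")

# Transaction: `with conn` commits on success and rolls back on an exception.
with conn:
    conn.executemany(
        "INSERT INTO students (name, year) VALUES (?, ?)",
        [("Ada", 4), ("Grace", 2), ("Edgar", 4)],
    )

# A failed transaction leaves the table unchanged.
try:
    with conn:
        conn.execute("INSERT INTO students (name, year) VALUES ('Temp', 1)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

# Select-From-Where through the view.
senior_names = [row[0] for row in conn.execute("SELECT name FROM seniors ORDER BY name")]
count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(senior_names, count)
```

The rolled-back insert does not appear in the final count, which is the behavior transactions are meant to guarantee.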

The class will culminate in a hands-on programming project in SQL and Python, a key part of the course, in which you will query, visualize, and predict from terabytes of data on BigQuery, a popular cloud database that is part of Google Cloud Platform.

Key Dates


Lectures: Tues/Thurs, 4:30 PM - 5:50 PM, NVIDIA Auditorium
Exam #1: Tuesday, November 1, during class time (4:30 PM - 5:50 PM)
Exam #2: Monday, December 12, 7 PM - 10 PM

Schedule

Event | Date | Description | Course Materials
Lecture 1 | 9/27 Tu | Why Databases? Concepts: Data models, DB systems overview | [Introduction: Why databases?] [Project outline] [Systems Primer] Reading List: [Why cs145?]
Lecture 2 | 9/29 Th | SQL I. Concepts: Schemas, Systems, Select-From-Where | [Example SQL] [SQL - Part I]
Project 1 Released | 10/3 Mon | See Course Info for general submission information and the regrade policy. | [Project 1 Handout] [project1_submission.py]
Lecture 3 | 10/4 Tu | SQL II. Concepts: Joins, Set operators, Subqueries | [SQL Deep Dive]
Homework 1 Released | 10/4 Tu | | [CS145 Fall 2022 Homework 1]
Lecture 4 | 10/6 Th | SQL III, Advanced. Concepts: Grouping, Aggregations, Nested queries | [SQL Deep Dive (Same Slides as Previous Lecture)]
Section 1 | 10/7 Fri | 9:30 AM - 10:20 AM | [Section 1 slides]
Lecture 5 | 10/11 Tu | Scale: Indexing and IO Model | [Scale Slides]
Lecture 6 | 10/13 Th | Sorting, Building Indices Part 1 | [Sorting Slides] [Indexing Slides]
Project 1 Due | 10/14 Fri | |
Lecture 7 | 10/18 Tu | B+ Trees; Query Optimization Part 1 | [B+ Trees Slides]
Homework 1 Due | 10/18 Tu | |
Homework 2 Released | 10/19 Wed | | [CS145 Fall 2022 Homework 2]
Project 2 Released | 10/19 Wed | See Course Info for general submission information and the regrade policy. | [Project 2 Handout] [Project 2 colab]
Lecture 8 | 10/20 Th | Query Optimization Part 2 | [Query Optimization Slides]
Section 2 | 10/21 Fri | 9:30 AM - 10:20 AM | [Section 2 slides] [Section 2 recording]
Lecture 9 | 10/25 Tu | Systems Design: Putting it all together | [Systems Design Slides]
Homework 2 Due | 10/26 Wed | |
Lecture 10 | 10/27 Th | Exam Review | [Exam Review Slides]
Exam #1 | 11/01 Tu | |
Lecture 11 | 11/03 Th | Transactions | [Transactions Slides]
No Class | 11/08 Tu | Democracy Day |
Project 3 Released | 11/08 Tu | See Course Info for general submission information and the regrade policy. | [Project 3 Handout] [High level rubric] [ML warmup colab] [Project template colab]
Project 2 Due | 11/09 Wed | |
Homework 3 Released | 11/10 Th | | [CS145 Fall 2022 Homework 3]
Lecture 12 | 11/10 Th | Transactions | [Transactions Slides]
Lecture 13 | 11/15 Tu | Guest Lectures: 1. Girish Baliga (Uber) on Data Analytics in Practice; 2. Rishi Bhargava (Descope) on Data Security, Privacy, and Ransomware |
Project 3 Proposal Due | 11/16 Wed | |
Lecture 14 | 11/17 Th | Transactions Locking | [Transactions Locking Slides]
Section 3 | 11/18 Fri | 9:30 AM - 10:20 AM | [Section 3 slides]
Lecture 15 | 11/29 Tu | Big schemas and Design Theory | [Big schemas] [Design theory] [Case Studies]
Homework 3 Due | 11/29 Tu | |
Homework 4 Released | 11/30 Wed | | [CS145 Fall 2022 Homework 4]
Lecture 16 | 12/01 Th | Big Schemas and Design Theory | [Design Theory II]
Section 4 | 12/05 Mon | 9:30 AM - 10:20 AM | [Section 4 slides]
Lecture 17 | 12/06 Tu | Design Theory Continued |
Lecture 18 | 12/8 Th | Exam Review | [Exam Review]
Project 3 Due | 12/8 Th | |
Homework 4 Due | 12/9 Fri | |

Course Logistics and Policies


Prerequisites: CS 103 and CS 107 (or equivalent)

Grading: Projects 50% (10% + 15% + 25%), Exam #1 15%, Exam #2 25%, Homework 10%.
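The weighting above can be sketched as a short calculation. This is only an illustration: it assumes the 10/15/25 split maps to Projects 1, 2, and 3 in that order, it ignores curving and extra credit, and the example scores are made up.

```python
# Weights in percent of the final grade, taken from the grading breakdown.
# Assumption: the 10 + 15 + 25 split corresponds to Projects 1, 2, 3 in order.
WEIGHTS = {
    "project1": 10,
    "project2": 15,
    "project3": 25,
    "exam1": 15,
    "exam2": 25,
    "homework": 10,
}


def weighted_total(scores):
    """Return the weighted course total in percent.

    `scores` maps each component name to a score between 0 and 100.
    Curving and extra credit are not modeled.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "project1": 90, "project2": 80, "project3": 85,
    "exam1": 70, "exam2": 75, "homework": 100,
}
print(weighted_total(example_scores))  # 81.5
```

Because the course is curved at the end of the quarter, a raw total like this does not map directly to a letter grade.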

For students taking the course on a credit/no-credit basis: you need to earn the equivalent of at least a C grade to pass the course. We cannot provide the exact score threshold because the course is curved at the end of the quarter.

We will be offering extra credit for in-class participation.

Ed and Gradescope: Please access the course Ed and Gradescope from the Canvas tab.

Homeworks: There will be four biweekly homework assignments, together worth 10% of your final grade, that accompany the material taught in class. They are graded on a completion basis: you receive full credit as long as you submit the assignment on time and score above 70%. You will submit your homework through Gradescope. No late days can be used on homework.
The homework assignments reflect the exam material, so it is in your best interest to complete them thoroughly. Besides preparing you for the exams, they assess and reinforce your understanding of the material.

Sections: There will be four optional discussion sections, one accompanying each homework assignment. The sections will be recorded and uploaded to Canvas. The slides will be posted online.

Exam Dates
  • Exam #1: Tuesday, November 1st
  • Exam #2: Monday, December 12th

Exam and schedule conflicts: Because of the large course enrollment, we cannot accommodate alternate exam schedules for students with exam conflicts (for either the midterm or the final). Please make sure you have no exam conflicts before enrolling in CS 145.

Late Days: You are allowed a total of two late days, shared across all project deadlines. You do not lose any credit when using a late day. If you run out of late days and submit after the deadline, you receive a 0. (Late days can only be applied to projects.)
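The late-day rule above can be sketched as simple bookkeeping. Two details are assumptions, not course policy: lateness is rounded up to whole days, and a submission that needs more late days than remain receives a 0 without consuming any. The function and project names are hypothetical.

```python
import math

TOTAL_LATE_DAYS = 2  # shared across all project deadlines


def apply_late_days(hours_late_by_project):
    """Return (outcome per project, late days remaining).

    Assumptions (not stated in the policy): lateness is rounded up to whole
    days, and a submission needing more late days than remain gets a zero
    without using up any late days.
    """
    remaining = TOTAL_LATE_DAYS
    outcome = {}
    for project, hours_late in hours_late_by_project.items():
        days_needed = math.ceil(hours_late / 24) if hours_late > 0 else 0
        if days_needed == 0:
            outcome[project] = "on time"
        elif days_needed <= remaining:
            remaining -= days_needed
            outcome[project] = "accepted with late days"
        else:
            outcome[project] = "zero"
    return outcome, remaining


# Hypothetical submission times, in hours past each deadline.
outcome, remaining = apply_late_days({"project1": 0, "project2": 30, "project3": 5})
print(outcome, remaining)
```

In this example, Project 2 is 30 hours late and uses both late days, so the 5-hour-late Project 3 has none left and receives a 0.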

Lectures: Lectures are held Tues/Thurs, 4:30 PM - 5:50 PM, in NVIDIA Auditorium. Attendance is not mandatory, but we will give extra credit to students for insightful in-class participation.

Lecture Videos: Lectures will be recorded and the videos posted on Canvas.

Textbook: There is no required textbook, but for students who want additional resources, we recommend the following two:
  • Database Systems: The Complete Book, 2nd Edition, by Garcia-Molina, Ullman, and Widom
  • A First Course in Database Systems, 3rd Edition, by Ullman and Widom

Accommodations: If you need an academic accommodation based on a disability, you should initiate the request with the Office of Accessible Education (OAE). The OAE will evaluate the request, recommend accommodations, and prepare a letter for faculty. Students should contact the OAE as soon as possible, and in any case in advance of assignment deadlines, since timely notice is needed to coordinate accommodations. If you need OAE accommodations for exams, please notify us at least 7 days (one week) before the exams. You can send your OAE letters to us by private Ed post or by email to our staff mailing list: cs145-aut2223-staff@lists.stanford.edu.

Honor Code/Collaboration Policy


Students must adhere to The Stanford Honor Code and The Stanford Honor Code as it pertains to CS courses.

We encourage students to form study groups. Students may discuss and work on homework problems in groups. However, each student must write down the solution independently, and without referring to written notes from the joint session.

It is an honor code violation to copy, refer to, or look at written or code solutions from a previous year, including but not limited to: official solutions from a previous year, solutions posted online, and solutions you or someone else may have written up in a previous year. Furthermore, it is an honor code violation to post your assignment solutions online, such as on a public git repo.

The teaching staff will be using plagiarism detection software, and if we have reason to believe that you are in violation of the Honor Code, we will follow university policy and report it.

Projects


Group Size: The first two projects are individual only, but for the third project you may work in teams of two.

Project Submissions: You will submit your projects via Gradescope. You should have been automatically signed up for Gradescope. Each assignment will include specific instructions regarding what files to submit.

Regrade Policy: If you think that we've made a grading mistake or that the work you submitted should be regraded, submit a regrade request on Gradescope within one week of receiving your grade. Be sure to include a short and convincing argument on Gradescope explaining why you think your work was incorrectly graded; we reserve the right to ignore your regrade request if you don't provide a justification. If you submit a regrade request, we reserve the right to regrade your entire assignment. This means that your overall score could go down.

Staff


For any enquiries, personal matters, or emergencies, please email the staff at cs145-aut2223-staff@lists.stanford.edu.

Office Hours


Office hours will be held either online via Zoom or in person in the Huang basement. You can sign up via QueueStatus. Please find the Zoom links for the different CAs on Canvas.