RFC: Replace Stanford Databases courses with CMU 15-445/645: Database Systems
Problem: The suggested course might be of better quality than the database courses currently listed in the curriculum, and it doesn't come with the limitations that edX courses usually have.
Duration: 1 month
Background:
I haven't taken either of these courses, but after skimming the content of both and reading some reviews, the CMU course appears to be superior in the quality of its materials and the topics it covers. The CMU course is more comprehensive and goes deeper into the subject.
Here is a detailed comparison based on @aayushsinha0706's work, following our curriculum guidelines on what a database course should cover:
Information Management Concepts: socio-technical systems; storage and retrieval (IS&R) concepts; supporting human needs (searching, retrieving, linking, browsing, navigating); quality issues (reliability, scalability, efficiency, and effectiveness).
Both these courses touch upon these topics.
CMU: 2, Stanford: 2.
Database Systems: Approaches to and evolution of database systems, Components of database systems, Design of core DBMS functions (e.g., query mechanisms, transaction management, buffer management, access methods), Database architecture and data independence, Use of a declarative query language, Systems supporting structured and/or stream content.
The CMU course shines here. Covering these topics is the key goal of the course.
CMU: 2, Stanford: 0.
Data Modeling: Data modeling, Conceptual models (e.g., entity-relationship, UML diagrams), Spreadsheet models, Relational data models, Object-oriented models (cross-reference PL/Object-Oriented Programming), Semi-structured data models (expressed using DTD or XML Schema, for example).
The Stanford courses are more comprehensive on this topic, covering most of these concepts. The CMU course has a dedicated lecture on relational models and relational algebra, and refers to other models.
CMU: 1, Stanford: 2.
The CMU course scored five points in total, while the Stanford courses scored four. In addition, the CMU course is project-based, which is an excellent teaching approach, and it provides an autograder for the projects, while the Stanford courses only provide exercises. There is also a dedicated Discord server for the course, with many helpful resources and many past students willing to help and support learners. The only con I can find so far is that the course readings come from a paid textbook; I don't know whether they are required. The instructor didn't mention the textbook in the intro lecture.
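As a quick check of the arithmetic, the per-category scores listed above can be tallied with a short script. This is only a sketch: the dictionary layout and the `total_scores` helper are illustrative names, not part of any curriculum tooling, and the reading of the numbers as a 0 to 2 coverage scale is an assumption, since the comparison does not define its scale.

```python
# Per-category scores copied from the comparison above.
# Assumption: 0 means "not covered" and 2 means "well covered".
scores = {
    "Information Management Concepts": {"CMU": 2, "Stanford": 2},
    "Database Systems": {"CMU": 2, "Stanford": 0},
    "Data Modeling": {"CMU": 1, "Stanford": 2},
}


def total_scores(scores):
    """Sum each course's points across all guideline categories."""
    totals = {}
    for per_course in scores.values():
        for course, points in per_course.items():
            totals[course] = totals.get(course, 0) + points
    return totals


totals = total_scores(scores)
print(totals)  # {'CMU': 5, 'Stanford': 4}
```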
Proposal:
- Replace the three database courses currently listed with CMU 15-445/645: Database Systems.
Note: this RFC initially recommended Berkeley CS186. It was changed to the CMU course because all of its materials are publicly available, it has an autograder for the projects and a public Discord server, and it avoids the complication of lectures and worksheets coming from different iterations of the course.
Note 2: The Berkeley course covers an elective topic suggested by the guidelines that the CMU course does not cover: approaches for managing large volumes of data (e.g., noSQL database systems, use of MapReduce). Students who want to study this topic can check those lectures and the corresponding project, since it is independent of the other projects in the course; any prerequisites would already be covered by the CMU course.
Alternatives: suggest this course as an alternative
Note that the most recent available lectures are from the spring 2022 iteration, and the newest available problem sets are from the fall 2020 iteration.
I only see lectures from 2018: https://cs186berkeley.net/resources/ (which links to this: https://www.youtube.com/@CS186Berkeley/playlists) And the projects seem to be from the current iteration: https://cs186.gitbook.io/project/ Were you differentiating between problem sets and projects? If so, can you link the problem sets you are referencing?
> I only see lectures from 2018: https://cs186berkeley.net/resources/ (which links to this: https://www.youtube.com/@CS186Berkeley/playlists)
You can find them here: click on resources and then spring 2022. (You probably clicked on the resources tab from an older iteration; it keeps recursing xD.)
> And the projects seem to be from the current iteration: https://cs186.gitbook.io/project/ Were you differentiating between problem sets and projects? If so, can you link the problem sets you are referencing?
Yes, the problem sets are different from the projects. If you go here and check the discussion column in the calendar, you will find them under the name "worksheets".
Let's define the links first, since this is a bit confusing:
What the contributor suggests
Berkeley CS 186: Introduction to Database Systems, with lectures and projects from 2022 and worksheets from 2020
What OSSU currently suggests: the Stanford courses Databases: Modeling and Theory, Databases: Relational Databases and SQL, and Databases: Semistructured Data.
What ACM CS2013 page 112 suggests for database education
Information Management Concepts: socio-technical systems; storage and retrieval (IS&R) concepts; supporting human needs (searching, retrieving, linking, browsing, navigating); quality issues (reliability, scalability, efficiency, and effectiveness).
Both of these courses only touch upon these topics, dedicating about 15 minutes to explaining concepts such as efficiency, convenience, persistence, and reliability.
UCB: 1, Stanford: 1
Database Systems: Approaches to and evolution of database systems, Components of database systems, Design of core DBMS functions (e.g., query mechanisms, transaction management, buffer management, access methods), Database architecture and data independence, Use of a declarative query language, Systems supporting structured and/or stream content.
In this section UCB scores a point because it has dedicated lectures on buffers, buffer management, and query optimisation, plus SQL and DB Design: Entity-Relationship Models, and it also covers the elective topics NoSQL and MapReduce.
Stanford does not cover buffer management but does cover relational design theory.
UCB: 2, Stanford: 1
Data Modeling: Data modeling, Conceptual models (e.g., entity-relationship, UML diagrams), Spreadsheet models, Relational data models, Object-oriented models (cross-reference PL/Object-Oriented Programming), Semi-structured data models (expressed using DTD or XML Schema, for example).
Here I think Stanford scores the point, since Databases: Modeling and Theory and Databases: Semistructured Data cover topics like UML diagrams and XML Schema, which UCB does not.
UCB: 1, Stanford: 2
The truth is we cannot create a curriculum that covers 100% of CS2013 (unless we start creating our own material).
Since it's a tie, i.e., each course covers a core part that the other does not, it would be best to add either the Stanford courses or the UCB course to the extras and inform students with a note.
For example, if we go with the Stanford courses, we can do this:
| Courses | Duration | Effort | Notes | Prerequisites | Discussion |
|---|---|---|---|---|---|
| Databases: Modeling and Theory | 2 weeks | 10 hours/week | Optional recommendation: UCB CS186 covers topics like buffer management and query optimisation, which the Stanford courses do not | core programming | chat |
| Databases: Relational Databases and SQL | 2 weeks | 10 hours/week | | core programming | chat |
| Databases: Semistructured Data | 2 weeks | 10 hours/week | | core programming | chat |
This is my personal opinion, but I think it is the best solution, since adding both courses would just increase the workload on students.
One thing to note: It's presumably easier to recommend the one UCB course, and then mention the Stanford mini courses that touch unaddressed topics.
That works as well. If we go with the UCB course, then:
| Courses | Duration | Effort | Notes | Projects | Prerequisites | Discussion |
|---|---|---|---|---|---|---|
| UCB CS186 | 14 weeks | 10 hours/week | Optional recommendation: Databases: Modeling and Theory covers topics like data modeling and conceptual models, and Databases: Semistructured Data covers topics like semi-structured data models | CS186 Projects | core programming | chat |
Now the question arises of why I linked the 2018 YouTube playlist version:
- The lectures in the YouTube playlist are easier to navigate than those on the course website.
- The 2018 version was previously an edX course that was later archived, so it comes from a MOOC background.
- The projects are openly available and are the same in all versions of the course, so they will not be an issue; however, linking worksheets from a different version of the course, designed by a different instructor, might confuse students.
- One more point: the same set of lectures is also recommended at teachyourselfcs.com.
I've been checking out this course offered by CMU. It covers the same topics as the Berkeley course, except that the course creators take non-CMU students into account, so all the materials are available online, and there is even a public Gradescope where you can submit the projects to check whether they pass the tests. There is also a dedicated Discord server for the course. Overall, there is better support around the course for self-learners, and it covers the same material as the Berkeley course. However, I don't know how this RFC should proceed. @waciumawanjohi, what do you think?
> However, I don't know how this RFC should proceed.
When users identify resources that exist, that can be useful to the community. But that isn't what changes the curriculum; what changes the curriculum is when a contributor (or contributors) make the case that a new resource is better for learners than the existing resource.
You can read the analysis that Aayush did above. That's the sort of digging into a course (what does it cover, what do our curricular guides say it covers) that is critical for changing a course. You've also done that sort of analysis: pointing out that the feedback on one course (an available autograder) is better than the feedback on another is highly valuable.
So what happens with this RFC? The RFC currently recommends replacing Stanford's Database courses with Berkeley's. If you think that CMU's should be the replacement instead, make that case! If you only feel capable of analyzing aspects A, B and C of the course, but know that someone else needs to analyze X, Y and Z, say that.
I have updated the RFC to reflect the new recommendation
I don't think it should be added. Certain people don't want the hassle of having Linux on their system, and also it needs C++ which hasn't been taught in the curriculum (although I'm sure it'd be easy to acquire) (this is an assumption on my part though. Perhaps the security courses shy of the first one or OSTEP or the networking book teach it, although I think that's unlikely).
> I don't think it should be added. Certain people don't want the hassle of having Linux on their system, and also it needs C++ which hasn't been taught in the curriculum (although I'm sure it'd be easy to acquire) (this is an assumption on my part though. Perhaps the security courses shy of the first one or OSTEP or the networking book teach it, although I think that's unlikely).
Learning Core Programming should provide the basis to also learn C++ and be successful in this course, no?
- My apologies for taking so long to respond to the RFC after the update.
- An unmentioned advantage of the CMU course is that it is being updated each semester.
- I suspect this course would benefit from a course page. A good course page should mitigate the only objections to this new course. Some of the information to include:
  - If the current semester is in session and a student gets to the end of the available lectures, point out that archived versions of previous semesters exist.
  - Point to guides on setting up a Linux virtual machine to do the coursework.
  - Point to C++ learning materials. The CMU class FAQ has a recommended resource. Hackr.io has community recommendations.
@AbdesamedBendjeddou (or another contributor), can you open a PR implementing those changes, switching the database course to CMU?