RFC: Replace Stanford Databases courses with CMU 15-445/645: Database Systems #1111

Open
AbdesamedBendjeddou opened this issue Dec 8, 2022 · 12 comments


@AbdesamedBendjeddou

AbdesamedBendjeddou commented Dec 8, 2022

Problem:
The suggested course might be of higher quality than the database courses currently listed in the curriculum, and it doesn't come with the limitations that edX courses usually have.

Duration:
1 month

Background:

While I haven't taken either of these courses, after skimming the content of both and reading some reviews, it appears that the CMU course might be superior in the quality of its materials and the topics it covers. The CMU course is more comprehensive and goes deeper into the subject.

Here is a detailed comparison based on @aayushsinha0706's work,
measured against what our curriculum guidelines say a database course should cover:

Information Management Concepts: socio-technical systems; storage and retrieval (IS&R) concepts; supporting human needs (searching, retrieving, linking, browsing, navigating); quality issues (reliability, scalability, efficiency, and effectiveness).

Both these courses touch upon these topics.

CMU: 2, Stanford: 2.

Database Systems: approaches to and evolution of database systems; components of database systems; design of core DBMS functions (e.g., query mechanisms, transaction management, buffer management, access methods); database architecture and data independence; use of a declarative query language; systems supporting structured and/or stream content.

The CMU course shines here. Covering these topics is the key goal of the course.

CMU: 2, Stanford: 0.

Data Modeling: data modeling; conceptual models (e.g., entity-relationship, UML diagrams); spreadsheet models; relational data models; object-oriented models (cross-reference PL/Object-Oriented Programming); semi-structured data models (expressed using DTD or XML Schema, for example).

The Stanford course is more comprehensive on this topic, covering most of these concepts. The CMU course has a dedicated lecture on relational models and relational algebra, and refers to other models.

CMU: 1, Stanford: 2.

CMU scored five points in total while Stanford scored four. In addition, the CMU course is project-based, which is an excellent teaching approach, and it has an autograder for the projects, while the Stanford courses only provide exercises. There is also a dedicated Discord server for the course, with many helpful resources and many past students willing to help and support learners.
The only con I can find so far is that the course readings are from a paid textbook, and I don't know whether they are required. The instructor didn't mention anything about the textbook in the intro lecture.
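
As a quick check of the arithmetic, the per-area scores posted above can be summed with a short script. This is only a minimal sketch: the dictionary layout and the `tally` helper are made up for illustration, and the numbers are copied from the comparison as written.

```python
# Per-area scores exactly as posted in the comparison above.
# The structure and names here are illustrative, not part of any OSSU tooling.
scores = {
    "Information Management Concepts": {"CMU": 2, "Stanford": 2},
    "Database Systems": {"CMU": 2, "Stanford": 0},
    "Data Modeling": {"CMU": 1, "Stanford": 2},
}


def tally(scores):
    """Sum each course's points across all knowledge areas."""
    totals = {}
    for area_scores in scores.values():
        for course, points in area_scores.items():
            totals[course] = totals.get(course, 0) + points
    return totals


totals = tally(scores)
print(totals)  # {'CMU': 5, 'Stanford': 4}
```

The totals match the five and four points stated above. Most of the gap comes from the Database Systems area.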

Proposal:

Note: This RFC initially recommended Berkeley CS186. It was changed to the CMU course because all of its materials are publicly available, it has an autograder for the projects and a public Discord server, and it avoids the complication of lectures and worksheets coming from different iterations of the course.

Note 2: The Berkeley course covers an elective topic suggested by the guidelines that the CMU course does not: approaches for managing large volumes of data (e.g., NoSQL database systems, use of MapReduce). Students who want to study that topic can take those lectures and the corresponding project, since the project is independent of the other projects in the course. Any prerequisites would already be covered by the CMU course.

Alternatives:
Suggest this course as an alternative.

@waciumawanjohi
Member

> Note that the most recent available lectures are from the spring 2022 iteration, and the newest available problem sets are from the fall 2020 iteration.

I only see lectures from 2018: https://cs186berkeley.net/resources/ (which links to this: https://www.youtube.com/@CS186Berkeley/playlists)
And the projects seem to be from the current iteration: https://cs186.gitbook.io/project/
Were you differentiating between problem sets and projects? If so, can you link the problem sets you are referencing?

@AbdesamedBendjeddou
Author

> I only see lectures from 2018: https://cs186berkeley.net/resources/ (which links to this: https://www.youtube.com/@CS186Berkeley/playlists)

You can find them here: click on Resources and then Spring 2022. (You probably clicked on the Resources tab from an older iteration; it keeps recursing xD.)

> And the projects seem to be from the current iteration: https://cs186.gitbook.io/project/
> Were you differentiating between problem sets and projects? If so, can you link the problem sets you are referencing?

Yes, the problem sets are different from the projects. If you go here and check the Discussion column in the calendar, you will find them under the name "worksheets".

@aayushsinha0706
Member

aayushsinha0706 commented Dec 10, 2022

Let's define the links first, since this is a bit confusing.

What the contributor suggests:

Berkeley CS 186: Introduction to Database Systems (lectures and projects from 2022, worksheets from 2020)

What OSSU currently suggests: the Stanford courses Databases: Modeling and Theory, Databases: Relational Databases and SQL, and Databases: Semistructured Data.

What ACM CS2013 (page 112) suggests for database education:

Information Management Concepts: socio-technical systems; storage and retrieval (IS&R) concepts; supporting human needs (searching, retrieving, linking, browsing, navigating); quality issues (reliability, scalability, efficiency, and effectiveness).

Both courses only touch on these topics, dedicating about 15 minutes to explaining concepts such as efficiency, convenience, persistence, and reliability.

UCB: 1, Stanford: 1

Database Systems: approaches to and evolution of database systems; components of database systems; design of core DBMS functions (e.g., query mechanisms, transaction management, buffer management, access methods); database architecture and data independence; use of a declarative query language; systems supporting structured and/or stream content.

In this section UCB scores a point because it has dedicated lectures on buffers, buffer management, and query optimisation, in addition to SQL and DB design (entity-relationship models). It also covers the elective topics NoSQL and MapReduce.

Stanford does not cover buffer management, but it does cover relational design theory.

UCB: 2, Stanford: 1 (running total)

Data Modeling: data modeling; conceptual models (e.g., entity-relationship, UML diagrams); spreadsheet models; relational data models; object-oriented models (cross-reference PL/Object-Oriented Programming); semi-structured data models (expressed using DTD or XML Schema, for example).

Here I think Stanford scores the point, as Databases: Modeling and Theory and Databases: Semistructured Data cover topics like UML diagrams and XML Schema, which UCB does not.

UCB: 2, Stanford: 2 (running total)

@aayushsinha0706
Member

The truth is we cannot create a curriculum that covers 100% of CS2013 (unless we start creating our own material).

It's a tie: one course covers one important core part and the other course covers the other.

It would be best to put one of them (the Stanford courses or the UCB course) in the extras and point students to it with a note.

For example, if we go with the Stanford courses, we could do this:

| Courses | Duration | Effort | Notes | Prerequisites | Discussion |
| --- | --- | --- | --- | --- | --- |
| Databases: Modeling and Theory | 2 weeks | 10 hours/week | Optional Recommendation: UCB CS186 covers topics like buffer management and query optimisation, which the Stanford courses do not | core programming | chat |
| Databases: Relational Databases and SQL | 2 weeks | 10 hours/week | - | core programming | chat |
| Databases: Semistructured Data | 2 weeks | 10 hours/week | - | core programming | chat |

But this is my personal opinion. I think this is the best solution, since adding both courses would just increase the workload on students.

@waciumawanjohi
Member

One thing to note: It's presumably easier to recommend the one UCB course, and then mention the Stanford mini courses that touch unaddressed topics.

@aayushsinha0706
Member

aayushsinha0706 commented Dec 19, 2022

That works as well. If we go with the UCB course, then:

| Courses | Duration | Effort | Notes | Projects | Prerequisites | Discussion |
| --- | --- | --- | --- | --- | --- | --- |
| UCB CS186 | 14 weeks | 10 hours/week | Optional Recommendation: Databases: Modeling and Theory covers topics like data modeling and conceptual models, and Databases: Semistructured Data covers topics like the semi-structured data model | CS186 Projects | core programming | chat |

Now the question arises of why I linked the YouTube playlist (2018) version:

  1. The lectures in the YouTube playlist are easier to navigate than those on the course website.
  2. The 2018 version was originally an edX course that was later archived, so it comes from a MOOC background.
  3. The projects are openly available and are the same in all versions of the course, so they will not be an issue. However, linking worksheets from a different version of the course, designed by a different instructor, might confuse students.
  4. An additional point: the same set of lectures is also recommended at teachyourselfcs.com.

@AbdesamedBendjeddou
Author

AbdesamedBendjeddou commented Dec 23, 2022

I've been checking this course offered by CMU. It covers the same topics as the Berkeley course, except that the course creators take non-CMU students into account: all the materials are available online, and there is even a public Gradescope where you can submit the projects to check whether they pass the tests. There is also a dedicated Discord server for the course.
Overall, the course has better support for self-learners and covers the same material as the Berkeley course. However, I don't know how this RFC should proceed. @waciumawanjohi, what do you think?

@waciumawanjohi
Member

> However, I don't know how this RFC should proceed.

When users identify resources that exist, that can be useful to the community. But that isn't what changes the curriculum; what changes the curriculum is when a contributor (or contributors) make the case that a new resource is better for learners than the existing resource.

You can read the analysis that Aayush did above. That's the sort of digging into a course (what does it cover, what do our curricular guides say it covers) that is critical for changing a course. You've also done that sort of analysis: pointing out that the feedback on one course (an available autograder) is better than the feedback on another is highly valuable.

So what happens with this RFC? The RFC currently recommends replacing Stanford's Database course with Berkeley's. If you think that CMU's should be the replacement instead, make that case! If you only feel capable of analyzing aspects A, B, and C of the course, but know that someone else needs to analyze X, Y, and Z, say that.

@AbdesamedBendjeddou changed the title from "RFC: Replace Stanford Databases coures with Berkeley CS186: Introduction to Database Systems" to "RFC: Replace Stanford Databases coures with CMU 15-445/645: Database Systems" on Dec 24, 2022
@AbdesamedBendjeddou
Author

I have updated the RFC to reflect the new recommendation.

@Choubs01
Contributor

I don't think it should be added. Certain people don't want the hassle of having Linux on their system, and also it needs C++ which hasn't been taught in the curriculum (although I'm sure it'd be easy to acquire) (this is an assumption on my part though. Perhaps the security courses shy of the first one or OSTEP or the networking book teach it, although I think that's unlikely).

@hkakutalua
Contributor

> I don't think it should be added. Certain people don't want the hassle of having Linux on their system, and also it needs C++ which hasn't been taught in the curriculum (although I'm sure it'd be easy to acquire) (this is an assumption on my part though. Perhaps the security courses shy of the first one or OSTEP or the networking book teach it, although I think that's unlikely).

Learning Core Programming should provide the basis to also learn C++ and be successful in this course, no?

@waciumawanjohi
Member

  1. My apologies for taking so long to respond to the RFC after the update.
  2. An unmentioned advantage of the CMU course is that it is being updated each semester.
  3. I suspect this course would benefit from a course page. A good course page should mitigate the only objections to this new course. Some of the information to include:
    1. If the current semester is in session and a student gets to the end of the available lectures, point out that archived versions of previous semesters exist.
    2. Point to guides on setting up a Linux virtual machine to do the coursework.
    3. Point to C++ learning materials. The CMU class FAQ has a recommended resource. Hackr.io has community recommendations.

@AbdesamedBendjeddou (or another contributor), can you open a PR implementing those changes, switching the database course to CMU?
