Principles of Data Science
About this course
Welcome to DSC 10 at UC San Diego! This course aims to teach you how to draw conclusions about data. We will learn how to explore data and make predictions about data. Programming is a useful tool to help us analyze large data sets, and so we will learn how to program in Python towards this goal. We will learn some of the core techniques of data science and we will practice applying them to real data sets from a variety of different disciplines.
Prerequisites: None. This course is an introduction to data science with no prior background assumed beyond high school algebra. If you have taken both a statistics class and a programming class, you should take a more advanced course.
First, your top priority right now should be making sure you stay safe and healthy. Your second priority should be making sure your family, friends, and community are safe and healthy. I hope that together we can learn some data science too, but never at the cost of health or safety.
Second, this isn't an "online course". Those are carefully planned for a long time, and the curriculum is specially designed for distance education. This is a teach-a-regular-class-remote-pandemic-response-emergency situation. It is going to be a bit rough, and we will all make mistakes. I encourage you to be generous and forgiving, but also to give clear and frequent feedback. I will try to do the same.
Third, I want to recognize that this is going to be hard for you. There are students in this class who are all over the world, who don't have access to good computers or reliable internet, and who are suddenly working in less-than-optimal conditions. There is also the stress of living during a global pandemic! Personally, I know I am not doing my best work right now because I am distracted and scared. Please reach out if there is anything I can do to help you in this class. I want you all to succeed.
Here are a few of the changes for this quarter:
- All instruction, tutoring, discussion, lab, exams, etc. will be remote, using Zoom. There will be no in-person anything.
- There will be no attendance-based grading. All classes will be recorded, so you can do this class totally asynchronously (though I encourage live attendance!).
- I encourage you to take this class Pass/No Pass. This will help you focus on the learning rather than the grading, which is the important part.
Course time and location
Lecture: MWF 6:00-6:50 pm,
CENTR 105 https://ucsd.zoom.us/j/472239761
Discussion (Optional, recommended if you are confused by the concepts covered in class): Tu 6:00-6:50 pm,
CENTR 113 https://ucsd.zoom.us/j/498980068
Programming Basics Sections (Optional, recommended if you are new to writing code): Tu 7:00-7:50 pm,
EBU3B B230 https://ucsd.zoom.us/j/3897844318
Office Hours: https://www.dsc10.com/staff-hours
Instructor: Colin Jemmott
You can learn more about me here: http://www.cjemmott.com/
Teaching Assistant: Arda Bati
Tutors: Kersen, Ayush, Chris, Shuli, Michelle, Jeffrey, Samson, Han, Jack
How to Ask For Help
It is totally normal to be confused, or to need some help! The best way to get answers depends on the type of question. For each, follow the steps in order.
Questions about class structure, links, grading, due dates and other administrative issues
- Look over this website and see if it has the answers.
- Search on the forum to see if your question has been asked before.
- If your question does not involve personal information you don't want to share, then make a new public post on the forum with your question.
- If your question does involve personal information, email firstname.lastname@example.org
Questions about concepts (code, math, errors, etc.)
- Consider attending / watching discussion section and programming basics sections.
- If it might help, look in the class textbook.
- If your question does not involve answers to specific HW questions, you could post on the forum as a public topic.
- Tutoring and office hours are the best place for one-on-one or specific homework questions.
Assessments and Grades
Your mastery of class material will be assessed in the following ways, and final grades will be computed as follows:
25% Homework Assignments (lowest dropped)
15% Lab Assignments (lowest dropped)
5% Project One
10% Project Two
15% Midterm Exam
30% Final Exam
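As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes every score is a fraction between 0 and 1 and that assignments within a category count equally after the lowest is dropped; neither detail is stated above. The names WEIGHTS, average_dropping_lowest, and course_percentage are hypothetical and are not part of any course tool.

```python
# Weights copied from the grading breakdown above; they sum to 100%.
WEIGHTS = {
    "homework": 0.25,
    "lab": 0.15,
    "project_one": 0.05,
    "project_two": 0.10,
    "midterm": 0.15,
    "final": 0.30,
}


def average_dropping_lowest(scores):
    """Mean of the scores after removing the single lowest one.

    Assumption: all assignments in a category are weighted equally.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop the lowest")
    kept = sorted(scores)[1:]  # drop the smallest score
    return sum(kept) / len(kept)


def course_percentage(homeworks, labs, project_one, project_two, midterm, final):
    """Weighted course percentage (0 to 100) from scores given as fractions."""
    components = {
        "homework": average_dropping_lowest(homeworks),
        "lab": average_dropping_lowest(labs),
        "project_one": project_one,
        "project_two": project_two,
        "midterm": midterm,
        "final": final,
    }
    return 100 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Example with made-up scores (not real data).
example = course_percentage(
    homeworks=[0.9, 0.8, 0.5],
    labs=[1.0, 1.0, 0.7],
    project_one=0.9,
    project_two=0.8,
    midterm=0.75,
    final=0.85,
)
print(round(example, 2))  # 85.5
```

In this example the 0.5 homework and the 0.7 lab are dropped before averaging, so neither one lowers the result.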
Homework assignments and projects
There will be weekly homework assignments and two projects, all of which will be programming assignments to reinforce concepts from class, explore new ideas, and provide hands-on experience working with data. Each homework focuses on the material from the previous week's lectures, whereas the projects are cumulative. Otherwise, you can think of the projects as "long homeworks".
You may work on homework assignments and projects either alone or with a partner, using pair programming. If working with a partner, you should submit one assignment as a team (ask a classmate or a tutor if you are unsure how to do this).
Deadlines and Late Submissions
- Homework assignments and projects must be submitted by the 11:59pm Pacific deadline to be considered on time. You may turn them in as many times as you like before the deadline, and only the most recent submission will be graded, so it's a good habit to submit early and often. To submit homework assignments and projects, you must do two things:
- Submit your code to OK by running the cell _ = ok.submit(). Then make sure your submission was uploaded to okpy (it has been buggy lately).
- Submit a PDF of your code to Gradescope. The best way to do this from Jupyterhub is File -> Download As -> HTML, then print to PDF, and upload to Gradescope.
Both of these parts must be completed by the 11:59pm deadline to be considered on time.
You have seven slip days to use at your discretion on any seven homework, lab, or project assignments throughout the quarter. Slip days allow you to turn in an assignment up to 24 hours after the deadline, subject to the following rules:
- You may use at most ONE slip day on any homework assignment or project. That is, you CANNOT get a 48 hour extension on any single assignment.
- If you are working with a partner using pair programming, you may use a slip day if both partners have a slip day remaining, and you will both be charged a slip day.
- Slip days cannot be redeemed for any value at the end of the quarter. Slip days have no monetary value.
- You do not need to ask to use your slip days. Any submission turned in after the deadline and before 24 hours after the deadline will be charged a slip day automatically.
- You will be charged a slip day even if your assignment is submitted just 1 minute after the deadline.
- Assignments submitted after the 24-hour slip day extension, or after the deadline if you are out of slip days, will not receive credit.
- It is your responsibility to keep track of how many slip days you have remaining.
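The slip day rules above can be sketched as a small Python function. This is only an illustration of the stated rules, not the course's actual grading code. The function name classify_submission and its return values are hypothetical. The treatment of a submission at exactly the deadline, or at exactly 24 hours after it, is an assumption.

```python
from datetime import datetime, timedelta

SLIP_WINDOW = timedelta(hours=24)  # one slip day extends the deadline by 24 hours


def classify_submission(deadline, submitted_at, slip_days_remaining):
    """Return (status, slip_days_charged_per_person) for one submission.

    slip_days_remaining lists the remaining slip days of each submitter:
    one entry when working alone, two entries for pair programming.
    """
    if submitted_at <= deadline:
        # Assumption: a submission at exactly the deadline counts as on time.
        return ("on_time", 0)

    within_window = submitted_at <= deadline + SLIP_WINDOW
    everyone_has_one = all(days >= 1 for days in slip_days_remaining)

    if within_window and everyone_has_one:
        # At most ONE slip day per assignment; each partner is charged one.
        return ("late_with_slip_day", 1)

    # More than 24 hours late, or someone is out of slip days.
    return ("no_credit", 0)


# Example with a made-up deadline: one minute late still costs a slip day.
deadline = datetime(2020, 4, 10, 23, 59)
print(classify_submission(deadline, deadline + timedelta(minutes=1), [7]))
```

Passing a list of remaining slip days lets the same check cover both solo work and pairs: a pair can submit late only if both partners have a slip day left.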
Weekly lab assignments are a required part of the course and will help you develop fluency in Python and working with data. The labs are designed to help you build the skills you need to complete homework assignments and projects. To submit a lab, you only need to submit your code to OK by running the cell _ = ok.submit(). Then make sure your submission was uploaded to okpy (it has been buggy lately).
Labs and Homeworks are graded using automated tests. The difference between the two is that you are able to see the results of the tests in Labs before you turn in the assignment, while Homeworks have some tests which are hidden from you. If you see no failing tests after you complete your Lab, you should get a 100% on that assignment.
You are free to complete your lab assignments whenever is convenient for you. Tutors will be available to help you during tutor hours. Each person must submit each lab independently, but you are welcome to collaborate with any number of other students. This means that you can be physically together working with other students, but not that you can copy or share answers with other students.
Deadlines and late submissions
Lab assignments must be submitted by the 11:59pm Pacific deadline to be considered on time. You may turn them in as many times as you like before the deadline, and only the most recent submission will be graded, so it's a good habit to submit early and often. Late lab assignments will not be accepted (though you can use a slip day, see above), but we will drop your lowest lab score when calculating your grade.
There is one midterm exam, taken during lecture, and one final exam. Exam format is still being determined.
We expect that students in this class will have a wide range of backgrounds and relevant experience. If you find that the class is moving fast, and especially if you are new to programming, you will benefit from taking advantage of the opportunity to attend discussion section and catch up on the material that goes by too fast. Even if you are following along well in class, discussion section allows you the opportunity to practice the skills learned in lecture and develop your expertise. Discussion sections are purely for your benefit, and do not impact your course grade.
Programming Basics Sections
One of the most useful and necessary tools for working with data is computer programming. Learning to code is an ongoing process, and in this class, we will introduce you to the basics of Python, with the aim of being able to dig into data science applications as quickly as possible. In later DSC classes, including DSC 20 and DSC 30, you will learn additional coding skills and become a more fluent programmer. You will find that becoming better at programming also makes you a better data scientist, so it is important to develop this skill.
As this class has no prerequisites, many of you are new to programming. That's OK! We know that programming can be really difficult at first, and for those of you who need support, our tutors will host sessions dedicated specifically to the basics of programming. These Programming Basics sessions will cover classic programming problems, not necessarily related to data science applications. We'll provide a worksheet of problems ahead of each Programming Basics session. To get the most out of the sessions, you are encouraged to try the problems on your own before attending the session, in which tutors will help you complete the problems and answer questions.
These sessions are purely for your benefit, and attendance is not required, though you should make sure you know how to do all the problems on the worksheets if you choose not to attend. To encourage your mastery of these basic programming principles, there will be one question on each exam which is a slight variant on a question from the Programming Basics worksheets.
Diversity and inclusion
I am committed to an inclusive learning environment that respects our diversity of perspectives, experiences and identities. You, as a student in this course, are also responsible for maintaining an environment where your fellow students feel safe and respected.
In my opinion, the key to this is recognizing the inherent worth and dignity of every person. If there is a way you could feel more included please let me know via email.
Students requesting accommodations for this course due to a disability or current functional limitation must provide a current Authorization for Accommodation (AFA) letter issued by the Office for Students with Disabilities (OSD), which is located in University Center 202 behind Center Hall. If you have an AFA letter, please make arrangements to meet with the instructor and with the Data Science OSD Liaison by the end of Week 2 to ensure that reasonable accommodations for the quarter can be arranged. The Data Science OSD Liaison can be reached at email@example.com and is located in Atkinson Hall #2010.
The goal of DSC 10 is to have you learn the material. The staff is doing everything we can to help with that - please take advantage of the resources! The goal of homework and exams is to see if you are learning the concepts, which helps us offer more help, adapt the class, etc. Cheating undermines this process.
Why is academic integrity important?
Academic integrity is an issue that is pertinent to all students on campus. When students act unethically by copying someone’s work, taking an exam for someone else, plagiarizing, etc., these students are misrepresenting their academic abilities. This makes it impossible for instructors to give grades (and for the University to give degrees) that reflect student knowledge. This devalues a UCSD degree for all students, making it imperative for the campus as a whole to ensure that all members of this community are honest and ethical. We want your degree to be meaningful and we want you to be proud to call yourself a graduate of UCSD!
The UCSD Policy on Integrity of Scholarship and this syllabus list some of the standards by which you are expected to complete your academic work, but your good ethical judgment (or asking us for advice) is also expected as we cannot list every behavior that is unethical or not in the spirit of academic integrity. Ignorance of the rules will not excuse you from any violations.
What counts as cheating?
The key to academic integrity is accurately representing the status and authorship of your work.
In DSC 10, you can read books, surf the web, talk to your friends and the DSC 10 staff to get help understanding the concepts you need to know to complete your assignments. However, all code must be written by you, together with your partner if you choose to have one, where allowed.
The following activities are considered cheating and are not allowed in DSC 10 (This is not an exhaustive list):
- Using or submitting code acquired from other students (except your partner, where allowed), the web, or any other resource not officially sanctioned by this course
- Posting your code online, including to ask a question about your code in a class discussion forum
- Having any other student complete any part of your assignment on your behalf
- Acquiring exam questions or answers prior to taking an exam
- Completing an assignment on behalf of someone else
- Providing code, exam questions, or solutions to any other student in the course
- Using any external resource on closed-book exams
The following activities are examples of appropriate collaboration and are allowed in DSC 10:
- Discussing the general approach to solving homework problems or projects
- Talking about problem-solving strategies or issues you ran into and how you solved them
- Discussing the answers to exams with other students who have already taken the exam after the exam is complete
- Using code provided in class, by the textbook or any other assigned reading or video, with attribution
- Google searching for documentation on Python
- Working together with other students on lab assignments in the same place at the same time
- Posting a question about your approach to a problem in a class discussion forum, without sharing your code
How can I be sure that my actions are NOT considered cheating?
The best way to avoid problems is by using your best judgement and remembering to act with Honesty, Trust, Fairness, Respect, Responsibility and Courage. Here are some suggestions for completing your work.
- Don't look at or discuss the details of another student's code for an assignment you are working on, and don't let another student look at your code.
- Don't start with someone else's code and make changes to it, or in any way share code with other students.
- If you are talking to another student about an assignment, don't take notes, and wait an hour afterward before you write any code.
Note: in the discussion above, we are talking about other students that are not your pair programming partner. See the pair programming guidelines for information on working with a partner.
Students agree that by taking this course, their assignments will be submitted to third-party software to help detect plagiarism.