PRINCIPLES OF DATA SCIENCE

SYLLABUS

ABOUT THIS COURSE

Welcome to DSC 10 at UC San Diego! This course aims to teach you how to draw conclusions about data. We will learn how to explore data and make predictions about data. Programming is a useful tool to help us analyze large data sets, and so we will learn how to program in Python towards this goal. We will learn some of the core techniques of data science and we will practice applying them to real data sets from a variety of different disciplines.

Prerequisites: None. This course is an introduction to data science with no prior background assumed beyond high school algebra. If you have taken both a statistics class and a programming class, you should take a more advanced course.

COURSE TIME & LOCATION

Lecture:

A00: Tuesday/Thursday, 2-3:20pm, Center 212

B00: Tuesday/Thursday, 3:30-4:50pm, Center 212

Discussion:

A01: Wednesday, 2-2:50pm, Center 216

B01: Wednesday, 3-3:50pm, Center 216

Feel free to attend a discussion section different than the one in which you are enrolled, subject to the availability of seats.

Programming Basics Sessions:

A50: Thursday, 5-5:50pm, CSE B230

B50: Thursday, 6-6:50pm, CSE B230

Feel free to attend a Programming Basics session different than the one in which you are enrolled, subject to the availability of seats.

Midterm Exam: Tuesday, October 29, during lecture

Final Exam: Saturday, December 7, 8-11am, location to be determined


INSTRUCTIONAL STAFF

Instructor: Janine Tiefenbruck

Teaching Assistant: Kyle Vigil

Tutors: Joshua Chan, Michelle Chang, Jiahe Feng, Kaixin Huang, Judy Jin, Jesse Kim, Yang Li, Siddhi Patel, Amanda Shu, Yueting Wu, Dian Yu

Contact Information: For questions about the content of the course, please use Piazza.

Office Hours: See Calendar.

ASSESSMENTS AND GRADES

Your mastery of class material will be assessed in the following ways, and final grades will be computed as follows:

5% Class Participation

25% Homework Assignments (best 6 out of 7)

15% Lab Assignments (best 7 out of 8)

5% Project One

10% Project Two

10% Midterm Exam

30% Final Exam
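The weighting above can be sketched in Python as a rough way to estimate your standing. This is only an illustration, not the official grade calculation: the function names are hypothetical, every score is assumed to be a percentage from 0 to 100, and assignments within a category are assumed to count equally, which the syllabus does not state.

```python
# Weights from the list above, in percentage points of the final grade.
WEIGHTS = {
    "participation": 5,
    "homework": 25,
    "labs": 15,
    "project_one": 5,
    "project_two": 10,
    "midterm": 10,
    "final": 30,
}


def average_dropping_lowest(scores):
    """Average a list of 0-100 scores after dropping the single lowest one.

    Assumption: every assignment in the category counts equally.
    """
    kept = sorted(scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


def overall_percentage(component_scores):
    """Combine per-category percentages (0-100) into one overall percentage."""
    total = sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)
    return total / 100


# Made-up example: seven homework scores, one of them missed entirely.
homework = average_dropping_lowest([100, 80, 90, 0, 100, 100, 100])

example = {
    "participation": 100,
    "homework": homework,
    "labs": 90,
    "project_one": 80,
    "project_two": 85,
    "midterm": 70,
    "final": 75,
}
print(round(overall_percentage(example), 2))
```

Dropping the lowest score is what "best 6 out of 7" and "best 7 out of 8" amount to, since exactly one score is removed in each category.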

CLASS PARTICIPATION

During class, you will participate by answering questions using an iClicker remote. You will need an iClicker version 2 remote, available at the UCSD bookstore (you will not be able to use InterWrite, HITT brand, the smartphone clicker app, or anything other than the genuine iClicker2 remote).

Once you purchase an iClicker remote, you must register it online in Canvas in order to get credit for your responses. You only need to do this once at the beginning of the quarter.

Participation scores will be posted periodically to Canvas. You must resolve all iClicker registration issues by the end of Week 1. Failure to ensure that you are getting your participation credit before then will result in a 0 for the days that you have not received credit.

Participation points will be recorded starting on the second day of class. You will receive credit for the day if you answer at least half of the questions asked that day. The correctness of your responses does not affect your participation score. Forgetting your clicker counts as missing a class.

Participation points may not be made up, but you can miss up to four classes with no penalty. So if there are X class sessions, each with one participation point, your participation score is out of X-4 points.
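The participation rules above can be expressed as a small Python sketch. The function names are hypothetical, and capping the score at 100% is our assumption: the syllabus says only that the score is out of X-4 points.

```python
def earns_credit(answered, asked):
    """Return True if at least half of the day's questions were answered."""
    return asked > 0 and 2 * answered >= asked


def participation_score(days_credited, total_sessions, free_misses=4):
    """Return the participation score as a fraction between 0 and 1.

    total_sessions is X, the number of sessions in which points were recorded.
    The score is out of X - 4 points; capping at 1.0 is an assumption.
    """
    possible = total_sessions - free_misses
    return min(days_credited, possible) / possible


# Made-up example: 18 recorded sessions, credit earned in 14 of them.
print(participation_score(14, 18))  # four misses carry no penalty
```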

HOMEWORK ASSIGNMENTS AND PROJECTS

There will be weekly homework assignments and two projects, all of which will be programming assignments to reinforce concepts from class, explore new ideas, and provide hands-on experience working with data. You may work on homework assignments and projects either alone or with a partner, using pair programming. If working with a partner, only one of you should submit the assignment.

You are encouraged to do your programming assignments in CSE B230, which is in the basement of the CSE building. The lab will be staffed with tutors who are there to help you during scheduled hours, so make sure to check the calendar before coming to the lab. If the lab is full, you may work in any of B220-B260. If you need tutor help, just put your name into the queue (using Autograder) and a tutor will help you as soon as they are available.

Deadlines and Late Submissions:

Homework assignments and projects must be submitted by the 11:59pm deadline to be considered on time. You may turn them in as many times as you like before the deadline, and only the most recent submission will be graded, so it's a good habit to submit early and often. To submit homework assignments and projects, you must do two things:

      1. Submit your code to OK by running the cell _ = ok.submit().
      2. Submit a PDF of your code to Gradescope. The best way to do this from Jupyterhub is File -> Download As -> HTML, then print to PDF, and upload to Gradescope.

Both of these parts must be completed by the 11:59pm deadline to be considered on time. Late homework assignments will not be accepted, but we will drop your lowest homework score when calculating your grade. Projects can be turned in one day late for a ten percent deduction.

LAB ASSIGNMENTS

Weekly lab assignments are a required part of the course and will help you develop fluency in Python and working with data. The labs are designed to help you build the skills you need to complete homework assignments and projects. To submit a lab, you only need to submit your code to OK by running the cell _ = ok.submit().

You are free to complete your lab assignments whenever is convenient for you. Tutors will be available to help you in CSE B230 during tutor hours. Each person must submit each lab independently, but you are welcome to collaborate with any number of other students. This means that you can be physically together working with other students, but not that you can copy or share answers with other students.

Deadlines and Late Submissions:

Lab assignments must be submitted by the 11:59pm deadline to be considered on time. You may turn them in as many times as you like before the deadline, and only the most recent submission will be graded, so it's a good habit to submit early and often. Late lab assignments will not be accepted, but we will drop your lowest lab score when calculating your grade.

EXAMS

There is one midterm exam, taken during lecture, and one final exam. Exams are on paper, and they are closed book and closed notes, but you will be provided with a reference sheet for each exam. Exams must be taken during the scheduled time, and you must attend the lecture section in which you are enrolled.

Exams will test the content of the course, plus there will be one question on each exam which is a slight variant on a question from the Programming Basics worksheets that will be done in the Thursday sessions. The final exam will be cumulative, with extra emphasis on the material after the midterm.

If you have a conflicting final exam (scheduled at the exact same time), or if you have three or more exams in one day, please see your instructor by the end of Week 1 to see if alternate arrangements can be made. Outside of this, no makeup exams will be given.

DISCUSSION SECTIONS

We expect that students in this class will have a wide range of backgrounds and relevant experience. If you find that the class is moving fast, you will benefit from taking advantage of the opportunity to attend discussion section and review the material from lecture. Even if you are following along well in class, discussion section allows you the opportunity to practice the skills learned in lecture and develop your expertise. Discussion sections are purely for your benefit, and do not directly impact your course grade.

PROGRAMMING BASICS SESSIONS

One of the most useful and necessary tools for working with data is computer programming. Learning to code is an ongoing process, and in this class, we will introduce you to the basics of Python, with the aim of being able to dig into data science applications as quickly as possible. In later DSC classes, including DSC 20 and DSC 30, you will learn additional coding skills and become a more fluent programmer. You will find that becoming better at programming also makes you a better data scientist, so it is important to develop this skill.

As this class has no prerequisites, many of you may be brand new to programming. Programming can be really difficult at first, and for those of you who need support, our tutors will host sessions dedicated specifically to the basics of programming. These Programming Basics sessions will cover classic programming problems, not necessarily related to data science applications. We'll provide a worksheet of problems ahead of each Programming Basics session. To get the most out of the sessions, you are encouraged to try the problems on your own before attending the session, in which tutors will help you complete the problems and answer questions.

These sessions are purely for your benefit, and attendance is not required, though you should make sure you know how to do all the problems on the worksheets if you choose not to attend. These in-person sessions are the only way to get help from the instructional staff on these problems. We will not answer questions about the Programming Basics worksheets in office hours or in our online discussion forum, so do attend these dedicated sessions if you feel you need help completing the worksheets.

To encourage your mastery of these basic programming principles, there will be one question on each exam which is a slight variant on a question from the Programming Basics worksheets.

GRADING POLICIES

  • You must score at least 55% on the final exam to pass the course. If you score lower than 55% on the final, you will receive an F in the course, regardless of your overall average.
  • You have one week from the time an assignment or exam is graded to request a regrade. After that, the grade is set in stone.
  • I will use a standard scale for assigning letter grades: 90-100 = some kind of A; 80-89.9 = some kind of B; 70-79.9 = some kind of C; 60-69.9 = D; <60 = F. Plus and minus cutoffs will be determined at the instructor’s discretion.
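
As a rough illustration of how the first and third policies interact, here is a minimal Python sketch. The function name is hypothetical, it returns only the base letter because plus and minus cutoffs are left to the instructor, and treating each range boundary as "at or above" is an assumption.

```python
def base_letter_grade(overall, final_exam):
    """Return the base letter grade for an overall and a final exam percentage.

    Both inputs are percentages from 0 to 100. Plus and minus modifiers
    are not modeled here.
    """
    # A final exam score below 55% is an F regardless of the overall average.
    if final_exam < 55:
        return "F"
    # Cutoffs from the standard scale, checked from highest to lowest.
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if overall >= cutoff:
            return letter
    return "F"


print(base_letter_grade(92, 54))  # F: the final exam rule takes priority
print(base_letter_grade(85, 70))  # B
```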

DIVERSITY AND INCLUSION

I am committed to an inclusive learning environment that respects our diversity of perspectives, experiences, and identities. My goal is to create a diverse and inclusive learning environment where all students feel comfortable and can thrive. If you have any suggestions as to how I could create a more inclusive setting, please let me know. We also expect that you, as a student in this course, will honor and respect your classmates, abiding by the UCSD Principles of Community. Please understand that others’ backgrounds, perspectives and experiences may be different than your own, and help us to build an environment where everyone is respected and feels comfortable.

ACCOMMODATIONS

Students requesting accommodations for this course due to a disability or current functional limitation must provide a current Authorization for Accommodation (AFA) letter issued by the Office for Students with Disabilities (OSD), which is located in University Center 202 behind Center Hall. If you have an AFA letter, please make arrangements to meet with the instructor and with the Data Science OSD Liaison by the end of Week 2 to ensure that reasonable accommodations for the quarter can be arranged. The Data Science OSD Liaison can be reached at dscstudent@ucsd.edu and is located in Atkinson Hall #2010.

COLLABORATION POLICY AND ACADEMIC INTEGRITY

The basic rule for DSC 10 is: Work hard. Make use of the expertise of the staff to learn what you need to know to really do well in the course. Act with integrity, and don't cheat.

If you do cheat, we will enforce the UCSD Policy on Integrity of Scholarship. This means: You will fail the course, no matter how small the affected assignment, and the Dean of your college will put you on probation or suspend or dismiss you from UCSD.

Students agree that by taking this course, their assignments will be submitted to third party software to help detect plagiarism.

Why is academic integrity important?

Academic integrity is an issue that is pertinent to all students on campus. When students act unethically by copying someone’s work, taking an exam for someone else, plagiarizing, etc., these students are misrepresenting their academic abilities. This makes it impossible for instructors to give grades (and for the University to give degrees) that reflect student knowledge. This devalues the worth of a UCSD degree for all students, making it imperative for the campus as a whole to ensure that all members of this community are honest and ethical. We want your degree to be meaningful and we want you to be proud to call yourself a graduate of UCSD!

The UCSD Policy on Integrity of Scholarship and this syllabus list some of the standards by which you are expected to complete your academic work, but your good ethical judgment (or asking us for advice) is also expected as we cannot list every behavior that is unethical or not in the spirit of academic integrity. Ignorance of the rules will not excuse you from any violations.

What counts as cheating?

In DSC 10, you can read books, surf the web, talk to your friends and the DSC 10 staff to get help understanding the concepts you need to know to complete your assignments. However, all code must be written by you, together with your partner if you choose to have one, where allowed.

The following activities are considered cheating and are not allowed in DSC 10 (This is not an exhaustive list):

  • Using or submitting code acquired from other students (except your partner, where allowed), the web, or any other resource not officially sanctioned by this course
  • Posting your code online, including to ask a question about your code in a class discussion forum
  • Having any other student complete any part of your assignment on your behalf
  • Acquiring exam questions or answers prior to taking an exam
  • Completing an assignment on behalf of someone else
  • Using someone else's clicker for them to earn them credit or giving your clicker to someone else so that they can participate for you to earn credit
  • Providing code, exam questions, or solutions to any other student in the course
  • Using any external resource on closed-book exams

The following activities are examples of appropriate collaboration and are allowed in DSC 10:

  • Discussing the general approach to solving homework problems or projects
  • Talking about problem-solving strategies or issues you ran into and how you solved them
  • Discussing the answers to exams with other students who have already taken the exam after the exam is complete
  • Using code provided in class, by the textbook or any other assigned reading or video, with attribution
  • Google searching for documentation on Python
  • Working together with other students on lab assignments in the same place at the same time
  • Posting a question about your approach to a problem in a class discussion forum, without sharing your code

How can I be sure that my actions are NOT considered cheating?

The best way to avoid problems is by using your best judgement and remembering to act with Honesty, Trust, Fairness, Respect, Responsibility and Courage. Here are some suggestions for completing your work:

  • Don't look at or discuss the details of another student's code for an assignment you are working on, and don't let another student look at your code.
  • Don't start with someone else's code and make changes to it, or in any way share code with other students.
  • If you are talking to another student about an assignment, don't take notes, and wait an hour afterward before you write any code.

Note: in the discussion above, we are talking about other students that are not your pair programming partner. See the pair programming guidelines for information on working with a partner.