-
Notifications
You must be signed in to change notification settings - Fork 0
/
project-description.qmd
432 lines (259 loc) · 21.1 KB
/
project-description.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
---
title: "Project description"
---
## Timeline
| Topic | Deadline | GitHub folder & hints |
| --- | --- | --- |
|[Topic ideas](@sec-topic-ideas) | Fri, Oct 28 | references/ |
|[Project proposal](@sec-project-proposal) | Fri, 2 Dec | references/ |
|[Project draft](@sec-draft-analysis) | Fri, Dec 9 | notebooks/ |
|[Peer review](@sec-peer-review-draft) | Fri, Dec 16 | use [](issue-template)|
|[Report](@sec-report) | Fr, Feb 3 | reports/ |
|[Slides](@sec-slides-video) | Fri, Feb 10 | reports/ |
|[GitHub repo](@sec-github-repo) | Fri, Feb 10 | "Repo in final state" |
|[Comments](@sec-comments) | We, Feb 15 | "Comment in Moodle" |
|[Survey](@sec-team-eval ) | Fri,Feb 17 | "Fill out survey" |
## Introduction
**TL;DR**: *Pick a data set and use the concepts and methods covered in our course to effectively analyze and communicate your findings. That is your final project.*
The goal of the final project is for you to analyze a data set of your own choosing and effectively communicate the results (using the concepts and technology we covered in this course).
The data set may already exist or you may collect your own data.
Choose the data based on your group's interests or work you all have done in other courses or research projects.
The goal of this project is for you to demonstrate proficiency in the techniques we have covered in this class (and beyond, if you like) and apply them to a data set to analyze and communicate it in a meaningful way.
All analyses must be done in Python, and all components of the project **must be reproducible** (with the exception of the final presentation) placed inside the provided GitHub repo.
## Logistics
You will work on the project with your team. Use the provided Github-repo and template files for your project.
The primary deliverables for the final project are
- A draft analysis ("draft-analysis.ipynb")
- A written, reproducible report detailing your analysis ("report.ipynb")
- A GitHub repository corresponding to your report
- Slides + a video presentation
- Formal peer review on another team's project
## Topic ideas {#sec-topic-ideas}
Identify 2 data sets you're interested in potentially using for the final project. The data sets should be suitable to be used alongside the topics described in the ["Big Idea Worksheet"](https://docs.google.com/document/d/1-GZvhdbhLYLB_Bo1arj1rgTqbJ5SUoU21vtgbYEhVqk/edit?usp=sharing).
If you're unsure where to find data, you can use the list of potential data sources in the [Tips + Resources](project-tips-resources.qmd) section as a starting point.
It may also help to think of topics you're interested in investigating and find data sets on those topics.
The purpose of submitting project ideas is to give you time to find data for the project and to make sure you have a data set that can help you be successful in the project.
**Therefore, you must use one of the data sets submitted as a topic idea, unless otherwise notified.**
The data sets should meet the following criteria:
- At least 100 observations
- At least 5 columns
- A mix of quantitative and categorical variables
- You may not use data that has previously been used in any course materials, or any derivation of data that has been used in course materials.
**Please ask me if you're unsure whether your data set meets the criteria.**
### Submission
You need to:
1. Fill out a first draft version of the ["Big Idea Worksheet"](https://docs.google.com/document/d/1-GZvhdbhLYLB_Bo1arj1rgTqbJ5SUoU21vtgbYEhVqk/edit?usp=sharing) (as much as possible) for every dataset and save the PDF in your GitHub project environment inside the folder `references` as `big-idea-draft-data1.pdf` and `big-idea-draft-data2.pdf`
2. Fill out the file `topics-ideas.ipynb` in your project environment (see path: `references`) and save it as HTML file.
Submit the 3 files to Moodle.
## Project proposal {#sec-project-proposal}
The purpose of the project proposal is to help you think about your communication strategy early.
### Submission
You need to:
1. Fill out the ["Big Idea Worksheet"](https://docs.google.com/document/d/1-GZvhdbhLYLB_Bo1arj1rgTqbJ5SUoU21vtgbYEhVqk/edit?usp=sharing) and save the PDF in your GitHub project environment inside the folder `references` as `big-idea-proposal.pdf`
2. Fill out the file `project-proposal.ipynb` in your project environment (see path: `references`) and save it as HTML file.
Push all of your final changes to the GitHub repo, and submit `big-idea-proposal.pdf` and `project-proposal.html` to Moodle.
### Proposal grading
| Total | 10 pts |
|----------------------|--------|
| **Introduction** | 3 pts |
| **Data description** | 2 pts |
| **Analysis plan** | 4 pts |
| **Data dictionary** | 1 pts |
Each component will be graded as follows:
- **Meets expectations (full credit)**: All required elements are completed and are accurate.
The narrative is written clearly, all tables and visualizations are nicely formatted, and the work would be presentable in a professional setting.
- **Close to expectations (half credit)**: There are some elements missing and/or inaccurate.
There are some issues with formatting.
- **Does not meet expectations (no credit)**: Major elements missing.
Work is not neatly formatted and would not be presentable in a professional setting.
## Draft analysis {#sec-draft-analysis}
The purpose of the draft analysis is to give you an opportunity to get early feedback on your project. Each team should push their final version of the draft analysis to their GitHub repo by the due date.
The structure of the draft analysis is as follows:
```md
- Introduction
- Setup
- Data
- Import data
- Data structure
- Data corrections
- Analysis
- Descriptive statistics
- Exploratory analysis
- Visualizations
- Visualization ideas
- Save visualizations
- Conclusion and recommended action
```
Below is a brief description of the sections to focus on in the draft.
- *Introduction*: This section includes an introduction to the project motivation, a data dictionary and (research) question.
- *Setup*: Import all necessary Python libraries.
- *Data*: Includes all data prepartion steps.
- *Analysis*: Descriptive statistics and EDA. Interpret the results.
- *Visualizations*: Includes multiple visualization ideas.
- *Conclusion and recommended action*: This section includes initial interpretations and conclusions drawn from the model.
## Peer review draft analysis {#sec-peer-review-draft}
Critically reviewing others' work is a crucial part of the scientific process, and our course is no exception. Each team will be assigned two other teams's projects to review.
During the peer review process, you will be provided read-only access to your partner teams' GitHub repos.
Provide your review in the form of [GitHub issues](https://docs.github.com/en/issues/tracking-your-work-with-issues/creating-an-issue) to your partner team's GitHub repo using the issue template provided (see below).
The peer review will be graded on the extent to which it comprehensively and constructively addresses the components of the partner team's report: the research context and motivation, exploratory data analysis, visualizations, interpretations, and conclusions.
### Issue template {#sec-issue-template}
Spend \~30 mins to review each team's project.
1. Find your team to review in Moodle.
2. Open the repo of the team you're reviewing, read their project draft, and browser around the rest of their repo.
3. Then, go to the `Issues` tab in that repo, click on `New issue` and fill out the issue by answering the following questions (copy the following content and replace it with your feedback):
Issue template:
```md
### Peer review team
- Peer review by: \[NAME OF TEAM DOING THE REVIEW\]
- Names of team members that participated in this review: \[FULL NAMES OF TEAM MEMBERS DOING THE REVIEW\]
### Goal of the project
- Describe the goal of the project.
### Data
- Describe the data used or collected, if any. Is the data adequate for the project?
### Visualization approach
- Describe the visualization approaches that will be used.
### Lack of clarity
- Is there anything that is unclear from the proposal?
### Possible improvements
- Provide constructive feedback on how the team might be able to improve their project. Make sure your feedback includes at least one comment on the visualization ideas of the project, but do feel free to comment on aspects beyond visualizations.
### Presentation
- What aspect of this project are you most interested in and would like to see highlighted in the final presentation?
### Organization
- Provide constructive feedback on any issues with file and/or code organization.
### Further comments
- (Optional) Any further comments or feedback?
```
If you are done, click on `Submit new issue`.
## Report {#sec-report}
Your final report has to be created in the `report.ipynb` file (see folder `reports/`) and must be reproducible. Assume that it will be used to communicate your results to other colleagues who are interested in your findings.
All team members should contribute to the GitHub repository, with regular meaningful commits.
**You also need to submit the HTML of your final report in Moodle** (the HTML you submit must match the files in your GitHub repository *exactly*).
The mandatory components of the report are below. You are free to add additional sections as necessary.
The report, including visualizations, should be **no more than 10 pages long.** There is no minimum page requirement; however, you should comprehensively address all of the topics stated in the file.
Be selective in what you include in your final write-up.
The goal is to write a cohesive narrative that demonstrates a thorough and comprehensive analysis rather than explain every step of the analysis.
You are welcome to include an appendix with additional work at the end of the written report document; however, grading will largely be based on the content in the main body of the report.
You should assume the reader will not see the material in the appendix unless prompted to view it in the main body of the report.
The appendix should be neatly formatted and easy for the reader to navigate.
It is not included in the 10-page limit.
The written report is worth 40 points, broken down as follows
| Total | 40 pts |
|-------------------------------|--------|
| **Introduction/data** | 7 pts |
| **Visualizations** | 20 pts |
| **Discussion + conclusion** | 8 pts |
| **Organization + formatting** | 5 pts |
[Click here](https://docs.google.com/spreadsheets/d/1RNPoP8xnnbd3TmEHxVGQXjH4DzWBCmCVVhlleD1ldoI/edit?usp=sharing) for an overview of the written report rubric.
### Introduction and data
This section includes an introduction to the project motivation, data, and (research) question (main content from the Big Idea Worksheet).
Describe the data and definitions of key variables.
It should also include some exploratory data analysis.
All of the EDA won't fit in the paper, so focus on the EDA for the response variable and a few other interesting variables and relationships.
#### Grading criteria
The research question and motivation are clearly stated in the introduction, including citations for the data source and any external research.
The data are clearly described, including a description about how the data were originally collected and a concise definition of the variables relevant to understanding the report.
The data cleaning process is clearly described, including any decisions made in the process (e.g., creating new variables, removing observations, etc.) The explanatory data analysis helps the reader better understand the observations in the data along with interesting and relevant relationships between the variables.
It incorporates appropriate visualizations and summary statistics.
### Visualizations
This section includes a profound description of your visualization selection and creation process (with regard to the topics covered in our course).
Explain the reasoning for the type of visualization considered for the project.
Additionally, show how you arrived at the final visualizaion by describing the selection process, variable transformations (if needed) and any other relevant considerations that were part of the creation process.
#### Grading criteria
The selection steps are appropriate for the data and (research) question.
The group used a thorough and careful approach to select the final visualization(s); the approach is clearly described in the report.
The visualization selection process took into account the aspects covered in the course (law of proximity, ...).
If violations of optimal visualization conditions are still present, there was a reasonable attempt to address the violations based on the course content.
### Conclusion + recommended action
In this section you'll include a summary of what you have learned about your (research) question along with (statistical) arguments supporting your conclusions.
In addition, discuss the limitations of your analysis and provide suggestions on ways the analysis could be improved.
Any potential issues pertaining to the reliability and validity of your data and appropriateness of the statistical analysis should also be discussed here.
Lastly, this section will include your recommended action.
#### Grading criteria
Overall conclusions from analysis are clearly described, and the visualization results are put into the larger context of the subject matter and original (research) question.
There is thoughtful consideration of potential limitations of the data and/or analysis. The recommended action is plausible/convincing and clearly described.
### Organization + formatting
This is an assessment of the overall presentation and formatting of the written report.
#### Grading criteria
The report neatly written and organized with clear section headers and appropriately sized figures with informative labels.
Numerical results are displayed with a reasonable number of digits, and all visualizations are similar formatted as the examples shown in this course.
All citations and links are properly formatted.
If there is an appendix, it is reasonably organized and easy for the reader to find relevant information.
All code, warnings, and messages are suppressed.
The main body of the written report (not including the appendix) is no longer than 10 pages.
## Slides & video presentation {#sec-slides-video}
### Slides {#sec-slides}
In addition to the report, your team will also create presentation slides (e.g. [Google Slides](https://www.google.com/intl/en_en/slides/about/) with an adequate [template](https://slidesgo.com/themes)). Additionally, you have to record a video (screencast) that summarize and showcase your project (see [](video)).
Introduce your project and data set, showcase visualizations, and discuss the primary conclusions.
These slides should serve as a brief visual addition to your report and will be graded for content and quality.
If your presentation slides are online, you can put a link to the slides in a `README.md` file in the `reports/` folder.
**For submission, convert these slides to a .pdf document, and submit the file on Moodle.**
The slide deck should have no more than 6 content slides + 1 title slide.
Here is a *suggested* outline as you think through the slides; you **do not** have to use this exact format for the 6 slides.
- Title Slide
- Slide 1: Introduce the topic and motivation
- Slide 2: Introduce the data
- Slide 3: Highlights from EDA
- Slide 4: Interesting findings
- Slide 5: Conclusions + recommended action
### Video presentation {#sec-video}
For the video presentation (i.e., screencast), you can speak over your slide deck.
**The video presentation must be no longer than 8 minutes.** It is fine if the video is shorter than 8 minutes, but it cannot exceed 8 minutes.
You may use any platform that works best for your group to record your presentation.
Below are a few resources on recording videos:
- [Recording presentations in Zoom](https://kb.siue.edu/61721)
- [Apple Quicktime for screen recording](https://support.apple.com/en-gb/guide/quicktime-player/qtp97b08e666/mac)
- [Windows 10 built-in screen recording functionality](https://www.youtube.com/watch?v=OfPbr1mRDuo)
- [Kap for screen recording](https://getkap.co/)
Once your video is ready, upload the video to Moodle.
## GitHub repo {#sec-github-repo}
All written work (with exception of presentation slides) should be reproducible, and the GitHub repo should be neatly organized.
:::{Note}
Do not change the content or structure of the repo after the stated deadline.
:::
You will find an overview of the GitHub structure in the README.md file of the GitHub repo (the project structure is based on DrivenData's [Cookiecutter Data Science-Template](https://drivendata.github.io/cookiecutter-data-science/)).
Points for reproducibility + organization will be based on the reproducibility of the report and the organization of the project GitHub repo. The repo should be neatly organized, there should be no extraneous files, all text in the README should be easily readable.
## Presentation comments {#sec-comments}
Each student will be assigned 2 presentations to watch (your viewing assignments will be posted later in the semester)
Watch the group's video, then use the Moodle Forum to post a question for the group.
You may not post a question that's already been asked on the discussion thread.
Additionally, the question should be
- substantive (i.e. it shouldn't be "Why did you use a bar plot instead of a pie chart"?)
- demonstrate your understanding of the content from the course, and
- relevant to that group's specific presentation, i.e demonstrating that you've watched the presentation.
*This portion of the project will be assessed individually.*
## Teamwork evaluation {#sec-team-eval}
You will be asked to fill out a survey where you rate the contribution and teamwork of each team member by assigning a contribution percentage for each team member.
Filling out the survey is a prerequisite for getting credit on the team member evaluation.
If you are suggesting that an individual did less than half the expected contribution given your team size (e.g., for a team of four students, if a student contributed less than 12.5% of the total effort), please provide some explanation.
If any individual gets an average peer score indicating that this was the case, their grade will be assessed accordingly.
If you have concerns with the teamwork and/or contribution from any team members, please email me by the project video deadline.
**You only need to email me if you have concerns**.
Otherwise, I will assume everyone on the team equally contributed and will receive full credit for the teamwork portion of the grade.
## Overall grading {#sec-grading}
The grade breakdown is as follows:
| Total | 100 pts |
|------------------------------------|---------|
| **Topic ideas** | 5 pts |
| **Project proposal** | 10 pts |
| **Peer review** | 10 pts |
| **Report** | 40 pts |
| **Slides + video presentation** | 20 pts |
| **Reproducibility + orga (GitHub)**| 5 pts |
| **Video comments** | 5 pts |
| **Peer teamwork evaluation** | 5 pts |
*Note: No late project reports or videos are accepted.*
### Grading summary
Grading of the project will take into account the following:
- Content - What is the quality of the (research) question and relevancy of data to those questions?
- Correctness - Are all procedures carried out and explained correctly?
- Writing and Presentation - What is the quality of the presentation, writing, and explanations?
- Creativity and Critical Thought - Is the project carefully thought out? Are the limitations carefully considered? Does it appear that time and effort went into the planning and implementation of the project?
A general breakdown of scoring is as follows:
- *90%-100%*: Outstanding effort. Student understands how to apply all concepts, can put the results into a cogent argument, can identify weaknesses in the argument, and can clearly communicate the results to others.
- *80%-89%*: Good effort. Student understands most of the concepts, puts together an adequate argument, identifies some weaknesses of their argument, and communicates most results clearly to others.
- *70%-79%*: Passing effort. Student has misunderstanding of concepts in several areas, has some trouble putting results together in a cogent argument, and communication of results is sometimes unclear.
- *60%-69%*: Struggling effort. Student is making some effort, but has misunderstanding of many concepts and is unable to put together a cogent argument. Communication of results is unclear.
- *Below 60%*: Student is not making a sufficient effort.
### Late work policy
Project assignments (see timeline) may be submitted up to 2 days late. There will be a 25% deduction for each 24-hour period the project is late.
Be sure to turn in your work early to avoid any technological mishaps.