This assessment evaluates a candidate's software development skills relevant to CEP's work. Besides efficiency, CEP also looks for clarity in code submissions.
The assessment has three exercises that test a candidate's ability to manipulate CSV data and write JSON outputs using existing Python packages.
We recommend that the candidate has familiarity with statistics and the following Python packages: numpy
, scipy
, and pandas
.
If you have any questions regarding the assessment, please feel free to reach out to Mike Nguyen at [email protected].
For more information about CEP and our work, please visit www.effectivephilanthropy.org
For each exercise folder, there are three subfolders within:
input
: This folder contains all input files to be read by your programoutput
: This folder contains "model" output files that your program ought to produce. For example, in exercise 1, there are two files in theoutput
folder:mean.csv
andstats.csv
. Your program is expected to create these two files in the exact format and layout.submissions
: This is where you can save your program in the formatyour_name.py
To submit your codes, first, fork this repo, add your own program files in the /submissions
folder for each exercise, then email Mike Nguyen ([email protected]) with a link to your GitHub forked repo.
In spring 2014, CEP helped 9 foundation clients administer a survey to their grantees. At the end of the survey, CEP analysts cleaned and sent the data to you in a CSV format. The file is called xl.csv
and saved in the exercise1/input
folder. This file contains all survey responses, with each column representing a survey question and each row a set of responses from a grantee.
In this file, the column fdntext
provides the client code for a given survey respondent, i.e., helps identify from which foundation the respondent receives the grant from. The 9 foundation clients have the following 9 codes:
Arlington 14S
Boylston 14S
Clarendon 14S
Dartmouth 14S
Kenmore 14S
Marlborough 14S
Peabody 14S
Roxbury 14S
Tremont 14S
You are curious to learn how the clients are rated by grantees on the 9 questions in the survey:
fldimp
undrfld
advknow
pubpol
comimp
undrwr
undrsoc
orgimp
undrorg
Respondents answer these survey questions on a scale of 1 to 7, with a higher number indicating a more positive rating. If respondents indicate that the question is not applicable to them, their responses will be coded as 77. If respondents indicate that they are not sure about the question's answers, their responses will be coded as 88. As such, in each of the 9 question columns in xl.csv
, there are 10 unique values: 1
, 2
, 3
, 4
, 5
, 6
, 7
, 77
, 88
, and a blank value.
Since you're not interested in the 77
and 88
responses, you choose to recode these values as blank.
After recoding , you calculate the mean rating at the client level for each of the 9 questions so you can compare the client ratings side by side. For example, the following table format shows how the 9 clients are rated on the comimp
question:
Client | comimp |
---|---|
Arlington 14S | 6.5151515151515156 |
Boylston 14S | 5.4745762711864403 |
Clarendon 14S | 5.7875647668393784 |
Dartmouth 14S | 6.2045454545454541 |
Kenmore 14S | 5.7307692307692308 |
Marlborough 14S | 4.7705627705627709 |
Peabody 14S | 5.8947368421052628 |
Roxbury 14S | 6.2884615384615383 |
Tremont 14S | 5.7445652173913047 |
Read file xl.csv
, look at the 9 question columns above, and recode any 77
and 88
values as blank. After recoding, calculate the mean rating for each funder for each of the 9 questions. Save your output as mean.csv
. You may refer to exercise1/output/mean.csv
for a preview of how the output file should look like.
Hint: If you're familiar with pandas
, the following line should help:
# data is the recoded DataFrame with the 9 question columns
# that you're interested in
mean = data.groupby('fdntext').mean()
You're also interested in knowing more about the descriptive statistics of the client ratings; you want to know the highest, lowest, median, and quartile ratings for each of the 9 questions. Generate a CSV file stats.csv
that will include these descriptive statistics. Refer to exercise1/output/stats.csv
for a preview of how the output file should look like.
Hint: Using pandas
, the following line should help:
# mean is the DataFrame generated from Task 1
stats = mean.describe()
Next, given the set of 9 ratings per question, you're interested in finding out the percentile of each client rating. For example, the following table shows the client ratings and their percentiles for the comimp
question.
Client | comimp | comimp_pct |
---|---|---|
Arlington 14S | 6.5151515151515156 | 94.44444444 |
Boylston 14S | 5.4745762711864403 | 16.66666667 |
Clarendon 14S | 5.7875647668393784 | 50 |
Dartmouth 14S | 6.2045454545454541 | 72.22222222 |
Kenmore 14S | 5.7307692307692308 | 27.77777778 |
Marlborough 14S | 4.7705627705627709 | 5.555555556 |
Peabody 14S | 5.8947368421052628 | 61.11111111 |
Roxbury 14S | 6.2884615384615383 | 83.33333333 |
Tremont 14S | 5.7445652173913047 | 38.88888889 |
Given the mean.csv
file that you have generated in Exercise 1, now stored in exercise2/input/mean.csv
, generate a CSV file pct.csv
that stores the percentile ratings for each client for the 9 survey questions. Refer to exercise2/output/pct.csv
for a preview of how the output file should look like.
Hint: scipy
has a function called percentileofscore that should make percentile calculation easy:
from scipy.stats import percentileofscore
# If you have the list of 9 client ratings stored in rating_list
# you can do the following loop:
percentile_list = []
for r in rating_list:
percentile = percentileofscore(rating_list, r, kind='mean')
percentile_list.append(percentile)
# Now you have 9 percentile ranks for a given question
# stored in percentile_list
At CEP, we deliver results to clients using an online reporting platform. This platform reads client data from a JSON file and renders these data points as charts and objects.
You're now interested in creating a JSON file to render a report for the client Tremont 14S. A JSON file for Tremont 14S, with only one percentile chart in report for the question sample_question
may look as follows:
{
"version":"1.0",
"reports":
[
{
"name":"Tremont 14S Report",
"title":"Tremont 14S Report",
"cohorts": [],
"segmentations": [],
"elements":{
"sample_question":{
"type":"percentileChart",
"absolutes":[5, 6, 6, 6, 7],
"current": {
"name": "2014",
"value": 6.09,
"percentage": 42
},
"cohorts": [],
"past_results":[],
"segmentations": []
}
}
}
]
}
Here's an explanation for all the keys in the JSON object above:
version
: accepts a string value. You may input "1.0" here.reports
: accepts a JSON array of report objects. Each report object contains the following keys:name
: accepts a string value for the name of the report. You may use "Tremont 14S Report"title
: accepts a string value for the name of the report. You may use "Tremont 14S Report"cohorts
: accepts a JSON array of strings. For simplicity's sake, you may input an empty array heresegmentations
: accepts a JSON array of strings. For simplicity's sake, you may input an empty array hereelements
: accepts a JSON object. This object contains all relevant data points for the reports. Each key represents the question column name and its corresponding value is a chart object. The chart object contains the following keys:type
: accepts a string value. You may use "percentileChart"absolutes
: accepts an array of five numeric values. These values represent the 0th, 25th, 50th, 75th, and 100th percentile values for the given questioncurrent
: accepts a JSON object. This object has three keys:name
: accepts a string value. You may use "2014"value
: accepts a numeric value. This is the mean rating for Tremont 14S for the questionsample_question
.percentage
: accepts a numeric value. This is the percentile rating for Tremont 14S for the questionsample_question
.
cohorts
: accepts a JSON array of strings. For simplicity's sake, you may input an empty array herepast_results
: accepts a JSON array of strings. For simplicity's sake, you may input an empty array heresegmentations
: accepts a JSON array of strings. For simplicity's sake, you may input an empty array here
For this exercise, you will read the three files produced in exercises 1 and 2: mean.csv
, stats.csv
, and pct.csv
. These files are saved in the exercise3/input/
folder as well.
From these 3 files, you can extract the following data points for Tremont 14S:
mean.csv
: Mean ratings for Tremont 14S on the 9 questions.pct.csv
: Percentile ratings for Tremont 14S on the 9 questions.stats.csv
: The quartile values for the set of 9 client ratings on the 9 questions.
Using these data points, create 9 percentile chart objects under the elements
key of the JSON object above. Refer to exercise3/output/output.json
for a preview of how the output file should look like.
Hint: The hint for this exercise is buried deep somewhere within this repository. If you're familiar with GitHub, you might be able to find it.
Good luck! 🙏