Lately, I've seen the same questions come up in programming and CS education circles:
- What is the best age to start learning programming? What's the minimum? What's the optimum? What type of programming should be taught?
- Are both genders equally capable of learning to program? Are there ways of teaching that are more gender-neutral than others?
These are interesting questions, and their answers have big implications for people creating programming curricula and teaching them to others. Since we're fortunate to have many students learning programming on Khan Academy and we're logging their activity, I wanted to see if we could answer those sorts of questions in relation to our programming curriculum. We currently only have gender for 10% of our users and birthdate for 13% of our users (as we introduced them as optional fields a year ago), so our data is far from complete - but given we have such large numbers to start with, it seemed worthwhile to explore what we could find out in that set of users.
Effect of gender
When we look across the entire Khan Academy user base at the users who have filled out their gender, 48% are female and 52% are male - pretty close to half and half. That may be because we have many classroom students, and classrooms tend to be split evenly across the genders.
Now, if we look specifically at the Khan Academy programming curriculum - by looking at all the users who begin the very first coding challenge and have filled out their gender - we see a different split: 34% are female and 66% are male. There is a bias towards males here, perhaps because much of CS learning happens outside of classrooms right now, so the programming curriculum draws more from our independent learners. For whatever reason (and you may already be thinking of many reasons!), more males than females are choosing to independently learn programming.
We've established that more males are doing our curriculum than females, but how well does each group do once they've made the choice to start learning? That's where it gets exciting. If we look at the completion stats for that first coding challenge, 86.7% of males complete it (20762/23940) and 86.2% of females complete it (10770/12494). Those are very close numbers - close enough that the difference could be due to chance. That suggests that males and females are just as likely to complete the coding challenge, as long as they start it. This is very encouraging, and gives us good reason to come up with initiatives to raise the number of females who try out the curriculum.
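One way to check whether a gap this small could be due to chance is a two-proportion z-test on the counts quoted above. This is a minimal sketch, not necessarily the analysis we originally ran: the function name is made up for illustration, and the pooled z-test is just one reasonable choice of test.

```python
import math

def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Pooled two-proportion z-test. Returns (rate_a, rate_b, z, two_sided_p)."""
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    # Pooled completion rate under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value

# Completed / started counts for the first coding challenge, from the text.
male_rate, female_rate, z, p = two_proportion_z_test(20762, 23940, 10770, 12494)
print(f"male: {male_rate:.1%}, female: {female_rate:.1%}, z = {z:.2f}, p = {p:.3f}")
```

With these counts the gap is less than two standard errors wide and the p-value is above the usual 0.05 threshold, which is consistent with treating the two completion rates as statistically indistinguishable.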
Effect of age
We looked at basically the same stats, but for age. Since the main question is how young programming can be taught (elementary school, middle school, high school, or college), I'll excerpt the data that covers that age range. As it happens, that's also where the data is most interesting.
From ages 8-25, we have challenge data from a total of 40,269 students, with some ages much better represented than others. Here's how many students started the first coding challenge at each age:
8 512
9 1280
10 2660
11 4120
12 4068
13 3116
14 5997
15 5897
16 3207
17 2364
18 1949
19 1460
20 818
21 580
22 474
23 494
24 826
25 447
For each of the age groups, we looked at how many students completed the challenge, and calculated the completion rate (like we did with gender above). We're attempting to answer the question of whether someone's age significantly affects the likelihood that they'll complete the challenge. Here is the completion rate, in percent, for each age:
8 71.48
9 73.44
10 70.94
11 75.46
12 79.74
13 81.68
14 83.06
15 83.52
16 86.81
17 88.16
18 87.79
19 88.01
20 89.85
21 88.62
22 89.03
23 89.47
24 87.41
25 89.93
As you can see, the completion rate does increase with age, and it appears to be a significant difference. Here's what that looks like in graph form:
I have to admit, I was a little disappointed when I saw the stats. There was a part of me that was hoping that even 8-year-olds would be as likely to complete the challenge as 12-year-olds - when in fact, there's a gap of about 8 percentage points between them (71.5% vs. 79.7%). The completion rate levels off at around age 16, at roughly 87-90%.
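To get a feel for how much of the age trend could be noise, we can put a confidence interval around each rate. The sketch below uses the Wilson score interval at 95%. One loud assumption: only the rates are listed above, so the completed counts here are reconstructed by rounding rate times students started, and they are approximations rather than logged values. The helper name is made up for illustration.

```python
import math

# Students who started the first challenge, by age (from the first table).
started = {
    8: 512, 9: 1280, 10: 2660, 11: 4120, 12: 4068, 13: 3116,
    14: 5997, 15: 5897, 16: 3207, 17: 2364, 18: 1949, 19: 1460,
    20: 818, 21: 580, 22: 474, 23: 494, 24: 826, 25: 447,
}
# Completion rate in percent, by age (from the second table).
rate_pct = {
    8: 71.48, 9: 73.44, 10: 70.94, 11: 75.46, 12: 79.74, 13: 81.68,
    14: 83.06, 15: 83.52, 16: 86.81, 17: 88.16, 18: 87.79, 19: 88.01,
    20: 89.85, 21: 88.62, 22: 89.03, 23: 89.47, 24: 87.41, 25: 89.93,
}

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p_hat = successes / total
    denom = 1 + z ** 2 / total
    center = (p_hat + z ** 2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z ** 2 / (4 * total ** 2))
    ) / denom
    return center - half_width, center + half_width

intervals = {}
for age in sorted(started):
    # Assumption: completed count reconstructed from the rounded rate.
    completed = round(started[age] * rate_pct[age] / 100)
    intervals[age] = wilson_interval(completed, started[age])
    low, high = intervals[age]
    print(f"{age:>2}: {rate_pct[age]:.2f}%  95% CI [{low:.1%}, {high:.1%}]")
```

On these reconstructed counts, the intervals for ages 8, 12 and 16 don't overlap one another, so the rise in completion rate over that range looks like more than sampling noise. The intervals are also noticeably wider for the ages with fewer students, such as 8, which is worth keeping in mind when reading the individual rates.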
Now, there are a lot of possible reasons why there are seemingly significant completion differences from 8 to 16, and not every reason is "it's too young to start to learn programming." Some examples: 1) our programming environment involves a bit of basic math and spatial reasoning, which younger kids may not feel as comfortable with, 2) programming requires a degree of patience, which younger kids have less of, 3) programming involves typing skills, more so than other apps that kids may use, and 4) our programming environment is easier to use in desktop browsers than on iPads, and maybe young kids tend to use iPads. Some of these are things we could address - for example, we could teach in a non-spatial programming environment, we could teach non-syntactic visual programming (like Scratch), or we could teach typing skills (something we'd like to do anyway). Others may just come down to the biology of younger kids, with there being an age at which the brain is better prepared for the kind of abstract concepts and reasoning involved in programming.
That said, the completion rates are 71% for 8-year-olds and 80% for 12-year-olds - still pretty high rates, high enough that a teacher or parent could feel comfortable introducing their students or kids to programming, especially if they are there to support them.
We look forward to continuing to explore our data, to see what we can learn about teaching programming across different demographics, and we'll be attempting to increase the percentage of our user base that has this demographic data filled out as well, to lend more significance to the results.