Kickstarter funding rates

One of the simplest questions one can ask about a particular Kickstarter campaign is: “What are the odds it will get funded?”  Unfortunately, it’s not as simple to answer as it is to ask.  Even if we’re just talking kinematics (which we are, at this point), there are a lot of things we need to consider if we want a realistic idea of how likely it is that a project gets funded.

Kickstarter stats page

An obvious starting point would be the Kickstarter stats page.  They list the overall success rate at 44%.  Actually, they list the overall success rate as 43.83%, but two significant figures are enough for me on most days.  I’ll come back to the utility of this number in a minute, but it’s important to notice right away that this overall percentage varies strongly across categories. The most consistently funded category is Dance, with a success rate of 70%, and the least is Fashion, with a success rate of 27%.

There are two other tables on the page; one gives some statistics about the successfully funded projects in each category, and the other gives statistics on the unsuccessful ones.  Unfortunately, the way the data is presented makes it difficult to do any more interesting comparisons.  How does success rate depend on the stated goal? Does that vary by category? How does the overall amount of money raised depend on the stated goal?  Does that vary by category? And so on.  Fortunately, I can use the data I’ve scraped to address these questions.

Scraped data

Of course, before I try to draw any conclusions from my scraped data, I need to know how accurately (or not) it reflects the data set that Kickstarter is using to generate their statistics. They, of course, have access to all of the projects that have launched, and so their data set should be viewed as authoritative.  If I generate the same set of statistics they report on the stats page, and compare the results from my data set to theirs, it will help me identify any systematic errors in my data set.  I expect there will be some (as I’ve noted) but this kind of check will confirm that feeling.

As of this writing, I have 54,034 projects in my scraped data set, which is 71% of the total.  Comparing my number of successful and unsuccessful projects with those on the stats page, I’m missing 2318 successful projects (or 7.2%), and 16,690 unsuccessful projects (or 40.68%).  So, yeah, I’m missing a much higher fraction of the unsuccessful projects than the successful ones, as expected.  I can get even more specific, though, and figure out where in the distribution of unfunded projects the missing ones ought to be.

I can summarize the discrepancies in the following table:

Discrepancies in unsuccessful project counts by category and percentage of goal achieved
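| Category | Total | 0% | 1%-20% | 21%-40% | 41%-60% | 61%-80% | 81%-99% |
|---|---|---|---|---|---|---|---|
| Film & Video | 5871 | 1593 | 4180 | 119 | -3 | -22 | 2 |
| Fashion | 665 | 197 | 445 | 20 | 4 | -3 | 2 |
| Art | 1362 | 368 | 987 | 26 | -1 | -13 | -6 |
| Publishing | 2572 | 724 | 1819 | 34 | 5 | -16 | 5 |
| Music | 3850 | 1375 | 2346 | 166 | -6 | -23 | -8 |
| Food | 195 | -94 | 303 | -4 | 2 | -8 | -4 |
| Photography | 662 | 168 | 461 | 27 | 2 | 2 | 2 |
| Comics | 182 | 25 | 178 | -7 | -1 | -7 | -6 |
| Games | 208 | -256 | 495 | -19 | -4 | -9 | 1 |
| Theater | 563 | 165 | 395 | 15 | 2 | -13 | -1 |
| Design | 244 | 1 | 258 | -2 | -6 | -4 | -3 |
| Technology | 218 | -11 | 224 | 1 | -3 | -2 | 9 |
| Dance | 98 | 39 | 64 | -1 | 2 | -3 | -3 |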

Most (nearly all) of the missing projects are in the “less than 20% of goal” category. Also worthy of note is the fact that some of the discrepancies are negative. That means that my data set puts more projects into that category than are reported by Kickstarter. There are two plausible explanations for this: first, my data is almost certainly somewhat out of date. If I recorded a percentage before the end of the project, and then didn’t go back to refresh the data after the project ended, I would end up with a different number than Kickstarter does.  The second explanation is that there are differences in either rounding or binning these projects. It’s not 100% clear from the stats page exactly where Kickstarter puts the boundaries between their bins, so it’s possible that some of the excess 0% projects I show in Games should balance out the deficit of 1-20% projects, for example.
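The binning ambiguity described above is easy to reproduce. Here is a minimal sketch with made-up percentages (not my scraped data), assuming two plausible conventions: rounding to the nearest whole percent before binning, or truncating. The bin edges and labels are my guess at what the stats page means, and Kickstarter's actual convention is not stated anywhere I can find.

```python
import pandas as pd

# Hypothetical percent-of-goal values for unsuccessful projects (not real data).
percent = pd.Series([0.0, 0.4, 0.6, 20.6, 40.2, 80.7])

# Bin labels as they appear on the stats page; the integer edges are assumed.
labels = ["0%", "1%-20%", "21%-40%", "41%-60%", "61%-80%", "81%-99%"]
edges = [-1, 0, 20, 40, 60, 80, 99]  # intervals are (-1, 0], (0, 20], ...

# Convention A: round to the nearest whole percent, then bin.
bins_rounded = pd.cut(percent.round(), bins=edges, labels=labels)

# Convention B: truncate to a whole percent, then bin.
bins_truncated = pd.cut(percent.astype(int), bins=edges, labels=labels)

comparison = pd.DataFrame({
    "percent": percent,
    "rounded": bins_rounded,
    "truncated": bins_truncated,
})
print(comparison)
```

With these six values, half of the projects land in different bins depending on the convention, and truncation moves one project from the 1%-20% bin into the 0% bin. That is the kind of shift that could produce offsetting positive and negative discrepancies in adjacent columns.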

In practical terms, this means my data set is least reliable for drawing conclusions about projects that get less than 20% of their goal, and it will tend to overestimate the likelihood of getting funded.  With that in mind, I’ll move on.

Exploring the scraped data

Let’s look at some statistics for the whole set of 28,695 successful projects.  I had pandas generate some summary statistics:

Statistics for all successful projects
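| Statistic | dollars raised | duration (days) | goal (dollars) | # backers | # comments | # updates | percent of goal |
|---|---|---|---|---|---|---|---|
| count | 28695 | 28695 | 28695 | 28695 | 28695 | 28695 | 28695 |
| mean | 11649.37 | 36.73 | 6679.93 | 159.26 | 30.96 | 7.62 | 370.96 |
| std | 98894.08 | 15.63 | 19207.47 | 1123.94 | 633.04 | 8.60 | 11601.98 |
| min | 6.00 | 1 | 0.01 | 1 | 0 | 0 | 100.00 |
| 25% | 2001.31 | 30 | 1500.00 | 34 | 0 | 2 | 104.21 |
| 50% | 4031.16 | 31 | 3000.00 | 60 | 2 | 5 | 112.65 |
| 75% | 8450.00 | 45 | 7000.00 | 112 | 7 | 10 | 136.65 |
| max | 10266845.74 | 92 | 1100000.00 | 87142 | 47865 | 224 | 1506600.00 |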
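A table with these rows (count, mean, std, min, quartiles, max) is what pandas' `describe()` produces. The sketch below shows the general shape of such a call on a tiny made-up frame. The column names and the `percent >= 100` filter for "successful" are assumptions for illustration, not the exact code behind the table above.

```python
import pandas as pd

# Tiny hypothetical sample; the real data set has tens of thousands of rows.
projects = pd.DataFrame({
    "dollars": [1200.0, 4031.0, 8450.0, 250000.0, 500.0],
    "goal": [1000.0, 3000.0, 7000.0, 50000.0, 5000.0],
    "duration": [30, 31, 45, 60, 30],
})

# Percent of goal raised; a project counts as successful at 100% or more.
projects["percent"] = 100 * projects["dollars"] / projects["goal"]
successful = projects[projects["percent"] >= 100]

# describe() reports count, mean, std, min, quartiles, and max per column.
summary = successful.describe()
print(summary.round(2))
```

Even in this toy sample the mean of the dollars column is roughly ten times the median, because a single large project dominates the sum.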

The first thing we notice is that for any quantity other than duration, the mean doesn’t tell us anything useful.  That might be a little too strong, but it certainly doesn’t tell us what we normally think of as an average, namely, where the bulk of the distribution is located.  If this isn’t immediately obvious to you, let me walk through it.

Consider the first column: dollars raised. The mean is $11,649.37.  The naive way to respond to that statement would be to say, “Oh, so a typical successful project raises about $11,650.”  This statement is false. If you look down a few more rows, you see that the 75th percentile is at $8,450. This means that 75% of projects raise $8,450 or less, so more than three quarters of successful projects raise less than the mean.  Half of projects raise less than $4,100.

The reason for the disparity is the crazy difference between the maximum amount raised (about $10.3 million) and everything else. The mean values are shifted way up by a few extreme outliers.  This is just as true in the other columns (again, with the exception of duration: people typically want to get their money as quickly as they possibly can).  The number of project updates is also somewhat less skewed than the other columns, but that, too, makes sense.  Even a wildly successful project doesn’t require thousands of times as many updates as a barely successful one.
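To see how strongly a single outlier drags the mean while barely moving the median, here is a small sketch using only the standard library. The amounts are invented for illustration and are not taken from the data set.

```python
from statistics import mean, median

# Hypothetical amounts raised by seven ordinary successful projects.
typical = [1500, 2000, 3000, 4000, 5000, 8000, 9000]

# The same list with one runaway hit added.
with_outlier = typical + [10_000_000]

print(f"without outlier: mean={mean(typical):,.2f}  median={median(typical):,.2f}")
print(f"with outlier:    mean={mean(with_outlier):,.2f}  median={median(with_outlier):,.2f}")
```

Adding the one large project moves the median from 4,000 to 4,500, but pushes the mean above every one of the ordinary projects. That is why the percentile rows of the table are the ones to read.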

As with many things, a picture makes this clearer.

Histograms for goal, amount raised, and percentage of goal funded

Histograms for goal, amount raised, and percentage of goal funded, excluding outliers.

Plotted in this way, the numbers in the table above snap into pretty sharp relief.  Large goals (anything over $20,000 or so) are uncommon in successfully funded projects, as is a large amount raised. Projects that hit their funding goal might also be able to double it, if they’re lucky (remember from above that the 75th percentile is only about 137% of the goal), but expecting more than that takes something special.

It’s important to be clear about what I haven’t said, though. I haven’t said anything (and, indeed, with the data plotted this way, I can’t yet say anything) about how the likelihood of getting funded, or the degree to which a project does eventually get funded, depends on the goal.  So, saying that large goals are uncommon in successfully funded projects does not, in itself, mean that large goals are harder to fund than small ones, because large goals are also uncommon in unsuccessful projects. Having raised that question, however, we now need to answer it, and I will again do so with a plot.

Funding rates by goal, all projects

This plot actually gives a little bit more information than promised, and it’s sort of information dense, so let me unpack it. The projects are organized into bins by goal (the top bin actually goes up to infinity; I truncated it at 120,000 for plotting).  Each bin is labeled with the number of projects with a goal in the appropriate range, and colors indicate what the projects actually achieved: the orange bars represent projects that didn’t make their goal, and teal bars are those that were funded.

Of the 25,945 projects that set goals in the $0 – $5,000 range, for example, about 70% of the projects were successful; of these, the overwhelming majority were funded in the 100%-133% range.  The lightest teal bar at the top represents the wildly successful projects, which for the purposes of this plot are all projects which managed to raise more than three times their goal.
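The bookkeeping behind a plot like this is a two-way binning: once by goal, once by percent of goal achieved, followed by a normalization within each goal bin. The sketch below uses made-up projects, and the bin edges and labels are assumptions chosen for illustration (only the $0-$5,000 bin, the 100%-133% band, and the >300% band are taken from the description above).

```python
import pandas as pd

# Hypothetical projects (not real data): stated goal and dollars raised.
projects = pd.DataFrame({
    "goal":    [500, 3000, 4000, 4500, 8000, 12000, 30000, 150000],
    "dollars": [2000, 3300, 400, 5000, 9000, 1000, 45000, 20000],
})
projects["percent"] = 100 * projects["dollars"] / projects["goal"]

# Goal bins; the top bin is open-ended. Edges other than 5000 are assumed.
goal_edges = [0, 5000, 10000, 20000, 50000, float("inf")]
goal_labels = ["$0-$5k", "$5k-$10k", "$10k-$20k", "$20k-$50k", ">$50k"]
projects["goal_bin"] = pd.cut(projects["goal"], bins=goal_edges, labels=goal_labels)

# Outcome bands; right=False so that exactly 100% counts as funded.
outcome_edges = [0, 100, 133, 300, float("inf")]
outcome_labels = ["unfunded", "100%-133%", "133%-300%", ">300%"]
projects["outcome"] = pd.cut(
    projects["percent"], bins=outcome_edges, labels=outcome_labels, right=False
)

# Fraction of projects in each outcome band, within each goal bin.
shares = pd.crosstab(projects["goal_bin"], projects["outcome"], normalize="index")
funding_rate = 1 - shares["unfunded"]
print(shares.round(2))
print(funding_rate.round(2))
```

Each row of `shares` sums to 1 and corresponds to one stacked bar. The funding rate for a goal bin is one minus the unfunded share, which is the orange/teal boundary in the plot.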

A couple of things pop off of this graph. First, overall funding probability (which can be seen as the boundary between orange and teal regions) depends strongly on the goal up to about $50,000, but above that, there is much less of a dependence. While it isn’t shown on this particular graph, the overall funding probability is roughly constant for projects with goals above $90,000.  The other really interesting point is that the percentage of projects that are wildly successful doesn’t depend strongly on the goal.  The exception to this latter trend is only at the very lowest goal levels, below $1000 or so.

The role of project category

The preceding discussion is true if one looks at the complete collection of all projects.  That’s fine for a first, rough idea of how the goal affects funding probability, but it turns out that the project category also accounts for some variability in funding rates.  That’s clear just from the Kickstarter stats page, as I noted at the very beginning of this post.  From the data on the stats page, however, it’s not so clear if there are qualitative as well as quantitative differences between the project categories. There are.

There are two characteristics I’m going to use to classify each category of projects. The first is the percentage of wildly successful (>300% of goal) projects averaged over the goal bins. Since this percentage varies only weakly with goal, taking an average seems like a reasonable way to characterize the percentage of highly successful projects. I’ll call this the “hit percentage”, realizing that the name isn’t perfect. The second characteristic is the ratio of the population of the highest bin (goal >$100,000) to that of the lowest bin ($0-$5,000, for now), converted to a percentage. I’ll call this one the “tail weight”, since it amounts to the integral of the tail of the distribution, as a percentage of the peak of the distribution.  In some sense, the tail weight measures how ambitious project creators are (or need to be), and the hit percentage measures the rate at which projects greatly exceed their goal. All of this information (along with average funding rate by category) can be seen in the following plot:

Scatter plot of categories showing tail weight, hit percentage, and overall funding percentage.

The combination of hit percentage and tail weight splits the categories into basically two classes: the low hit percentage, low tail weight “Artistic” categories (Dance, Theater, Music, Photography, Art, Publishing, Fashion, Food, and Film & Video); and the heavy tailed, high hit percentage “Techie” categories (Games, Design, and Technology).  Comics is off by itself, with a moderately high hit percentage and a light tail.
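The two characteristics defined above (hit percentage and tail weight) can be written down in a few lines. This is a sketch under stated assumptions: `category_metrics` is a hypothetical helper, the average over goal bins is taken as a plain unweighted mean of the per-bin percentages (skipping empty bins), and the intermediate bin edge of $20,000 is invented. Only the $5,000 and $100,000 edges come from the text.

```python
from bisect import bisect_left

def category_metrics(projects, goal_edges):
    """Return (hit_percentage, tail_weight) for one category.

    projects:   iterable of (goal, percent_of_goal) pairs
    goal_edges: ascending upper edges of every bin except the open-ended top bin
    """
    n_bins = len(goal_edges) + 1
    counts = [0] * n_bins
    hits = [0] * n_bins
    for goal, percent in projects:
        # A goal equal to an edge falls in the lower bin, e.g. $5,000 -> $0-$5,000.
        i = bisect_left(goal_edges, goal)
        counts[i] += 1
        if percent > 300:  # "wildly successful": more than three times the goal
            hits[i] += 1

    # Hit percentage: unweighted mean of per-bin hit rates (empty bins skipped).
    hit_rates = [100 * h / c for h, c in zip(hits, counts) if c > 0]
    hit_percentage = sum(hit_rates) / len(hit_rates)

    # Tail weight: top-bin population as a percentage of the lowest-bin population.
    # Raises ZeroDivisionError if the lowest bin is empty.
    tail_weight = 100 * counts[-1] / counts[0]
    return hit_percentage, tail_weight

# Hypothetical category with nine projects (not real data).
sample = [
    (1000, 120), (3000, 350), (4000, 50), (2000, 105),  # $0-$5,000
    (10000, 400), (15000, 90),                          # $5,000-$20,000
    (50000, 110), (60000, 20),                          # $20,000-$100,000
    (150000, 310),                                      # over $100,000
]
hit_percentage, tail_weight = category_metrics(sample, [5000, 20000, 100000])
print(hit_percentage, tail_weight)
```

Because the average is unweighted, a sparsely populated bin counts as much as a crowded one. That is a reasonable simplification only as long as the hit rate really does vary weakly with goal.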

Artistic Projects

This is the larger class, both in terms of number of categories as well as number of projects. It’s not surprising, then, that the overall picture looks a lot like the one above which includes all projects.

Artistic projects

Since the picture is so similar to what we saw for the full set of projects, the discussion is very similar, too. The differences from the picture with all categories included are what you would expect: the tail is a little bit lighter, and the percentage of wildly successful projects is a little bit lower.  One thing that might not have been obvious without looking carefully at the category scatter plot above is that the overall funding percentage is a bit higher for the artistic projects than it is for the set of all projects. This is because the four categories that have been omitted have a lower percentage of funded projects.

One other caveat with this data is that the majority of the projects missing from my data set would go into the bottom set of bars on this graph; most of the projects were in the 0-20% funding range, and the four categories with the largest counts of missing projects were Film & Video, Music, Publishing, and Art.  So this graph overestimates the average funding rate somewhat.

Games, Design, and Technology projects

Here’s where the picture gets different.    These projects are, generally, more ambitious in terms of their funding goals (hence the heavy tails) but independently of that they are also more likely to be “hits.”  That is, they are more likely to raise significantly more money than they asked for.  They also dominate the press about Kickstarter. More on that in a minute.  Here’s the data:

Games, Design, and Technology projects

We’ll start at the top and move downward. The overall hit rate is much bigger for these categories than it is for the Artistic categories, and while it varies some with goal, the bins with lower hit rates also have lower numbers of projects.  This means that those rates have higher uncertainties associated with them. This post is already long enough that I don’t want to include a full discussion of uncertainty quantification here; I’ll save that for the future. The important feature is that even though we might expect a project with a large goal to be less likely to raise triple its goal (or more) than a project with a small one, for these project categories that tendency seems to be weak, if it exists at all.
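As a rough illustration of why sparsely populated bins carry larger uncertainties (and not a substitute for the fuller treatment deferred above), here is a sketch that assumes each project is an independent trial and uses the simple binomial standard error. The counts are invented, and `rate_with_standard_error` is a hypothetical helper.

```python
from math import sqrt

def rate_with_standard_error(successes, total):
    """Observed rate and its binomial standard error, sqrt(p * (1 - p) / n)."""
    rate = successes / total
    standard_error = sqrt(rate * (1 - rate) / total)
    return rate, standard_error

# Same observed 30% hit rate, very different bin populations (made-up counts).
rate_large, se_large = rate_with_standard_error(300, 1000)
rate_small, se_small = rate_with_standard_error(6, 20)

print(f"1000 projects: {rate_large:.0%} +/- {se_large:.1%}")
print(f"  20 projects: {rate_small:.0%} +/- {se_small:.1%}")
```

The standard error shrinks like one over the square root of the bin population, so a bin with 20 projects has an uncertainty about seven times larger than a bin with 1000 projects at the same rate. This simple formula is known to behave poorly for very small counts or rates near 0% or 100%.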

In terms of overall funding rate (the boundary between teal and orange on the graph) the dependence on goal is also not quite as strong in these project categories as it is in the others.  Projects at the low end of the displayed range are less likely than their artistic counterparts to meet their goal, and projects at the high end are more likely.  The ones in the middle are about the same, interestingly.  It’s important to remember, though, that this conclusion could be misleading, since the data I’m using for artistic projects is less complete than for these categories, and the bulk of missing projects did not meet their funding goals. In other words, it’s entirely possible that the high funding rates for artistic projects at the low end of the goal scale are artificially high, and should look more like those in the categories shown here.

One check on this assertion is to look at the Kickstarter stats page.  Overall funding rates for Games, Design and Technology come out to 35% (compared to the 42% indicated by my data set), and the Artistic categories give 47% (compared to the 61% from my data).  Qualitatively, the statement (“Games, Design and Technology projects meet their goals at a lower rate than Artistic projects”) is true, though the degree to which that is true isn’t accurately represented by my (incomplete) data set.  The stats page doesn’t give enough detail for us to check the other part of the assertion (that funding rates for projects with high goals in Games, Design, and Technology are better than they are in Artistic projects), but it seems like a reasonable conclusion. The overwhelming majority of the missing projects are in the Artistic categories and at low funding percentages, so adding them back would tend to make the funding rates in Artistic categories worse. Since, at high goals, they are already worse than what we observe in the Games, Design, and Technology projects with the limited data set, it seems unlikely that adding the missing projects would change that.

Discussion

The data I’ve presented above has, I think, two main take-home messages.  First, the general (and unsurprising) trend is that higher goals are harder to meet.  More importantly, though, they’re harder to meet in a predictable and well-defined way.  This can add some focus to pre-launch research.

The second main message is that the project categories can be qualitatively different from one another. In particular, the Games, Design, and Technology categories are significantly different from the other categories (which I have lumped together as “Artistic” above), and the Comics category is somewhere in between the two groups.

My guess is—and this is almost pure speculation on my part—that the differences between these supercategories can be understood in terms of the characteristics of their creators and their fans. Fans of Games, Technology, Design, and Comics tend to be geeks.  I realize I’m painting with a very broad brush here, and that this clearly isn’t true of every fan or every project, but bear with me. Geeks tend to be very committed to their fandoms, well connected in communities associated with those fandoms, and, once they hit their stride within their professions, have a significant amount of disposable income. These traits add up to both the means and the motive to spend on Kickstarter projects.  Furthermore, Kickstarter as a thing is much better understood in geek circles than it is among the general populace, and geeks have a greater tendency to be online than the general populace, leading to more opportunity for fans of projects in these categories to pledge.  The end result is a larger than average percentage of projects which far exceed their stated goals.  Hence the high hit rates noted above.

On the other hand, the creators of Games, Technology, and Design projects tend to skew more toward entrepreneurs and business types than the creators in the artistic categories (the “starving artist” is a stereotype for a reason, after all).  Again, this is a gross generalization, and I’m lumping freelance designers in with entrepreneurs and business types (which may not be fair to anyone).  Setting that aside, the projects in the artistic categories, including Comics, tend to have less ambitious goals simply because the artists who create them are thinking more about producing the current project than they are about launching a business.  This is what leads to the heavy tails in Games, Design and Technology relative to the other categories.   Comics is in between because while the fans are similar to those from Games, Design, and Technology, the creators are more like those from the Artistic categories.

There is undoubtedly more I could extract from the data, but this post is already long enough that very few people will get to this point, so I’ll save something for future posts.  See below for some ideas of where I think I’m headed with this.

Conclusions

Kickstarter has received a lot of attention, and deservedly so. Both the idea of crowdfunding (or distributed micro-patronage, as I like to think of it) and the details of their implementation are having a transformative effect on how people find money for their projects.  Media coverage, though, either in traditional media or in blogs and aggregators, tends to be dominated by stories about the outliers. This is natural, in some sense, since the big hits are more noteworthy, but it can lead to a skewed view of the process on the part of potential creators.

This difficulty is compounded by the way Kickstarter reports its statistics.  My failed Kickstarter project was in the Design category, and had a goal of $75,000.  If I had seen the graphs above, I could have concluded that (a) there weren’t many projects as ambitious as mine, and (b) among those projects, only about 20% met their funding goals.  Instead, I looked at Kickstarter’s overall funding rate of 44%, and the Design category rate of 37%, and concluded that the odds were about twice as good as they actually were.  Add to this sensational media stories about Pebble, Pen Type A, Double Fine, and the like, and I had all kinds of unrealistic hopes.  I dramatically underestimated the amount of marketing and promotion work I would have to do, and the project didn’t come close to meeting its goal.

I need to be clear, here. I’m not faulting Kickstarter for my inability to correctly interpret their statistics, nor for the way in which they report them. I think the fact that they report anything at all is a good thing.  I could have, and should have, done more detailed research before launching my project.

And in fact, my project, while I use it as an illustration, is not the point.  The success or failure of any individual project depends on a combination of factors spread across both project design and the ability to execute. Knowing, for example, that $75,000 is a high goal in the Design category (93rd percentile for all Design projects, 96th percentile for successful ones), and that the success rate for projects in that range is about 20%, gives a better sense of what care and work need to go into the other design elements of the campaign, and how important the execution is.
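Placing a goal within a category's distribution, as in the percentile figures above, is a one-line calculation. The sketch below uses a hypothetical `percentile_rank` helper and an invented list of goals. It adopts the "less than or equal to" convention, which is only one of several common definitions of a percentile rank.

```python
def percentile_rank(values, x):
    """Percentage of values that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)

# Hypothetical goals for ten projects in one category (not real data).
goals = [1000, 2500, 5000, 7500, 10000, 15000, 20000, 30000, 75000, 120000]

rank = percentile_rank(goals, 75000)
print(f"A $75,000 goal is at the {rank:.0f}th percentile of this sample")
```

Running the same calculation separately on all projects and on successful projects only, as in the paragraph above, shows how much rarer a given goal is among the projects that actually got funded.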

I’m hoping that this work will be useful to someone else as they set out to put a Kickstarter campaign together. Please pass it along, and if there’s something that would make it more useful to you, let me know either in the comments or via email.

Future work

There are a number of directions I’d like to take this work as it develops.  Here’s a partial list:

Modeling the data. I’ve done a bit of this, but it would make this post too long to try to fit it in here. Modeling is useful because once a model is validated, it gives a compact way to describe and compare data sets. Just by squinting, I can say that these data look vaguely power-law-ish, but of course details are really important.

Time dependence of overall statistics.  I’m curious whether the media attention from large successful projects has driven goals up, and to what extent it has affected success rates. This is pretty easy to check, but the sample sizes will get smaller as I restrict projects by date.

Time dependence for individual campaigns.  There are already a couple of websites that track time evolution of Kickstarter campaigns, but they don’t have quite the data granularity that I need. Which leads to:

Campaign response to external forcing (broadly).  How do external events (tweets, especially by influential people; coverage in different blogs or media outlets) affect the time evolution of a project’s funding?  This is really the key to understanding how to make projects succeed, and is where I would like to end up.

The backer/project graph.  I speculated above about the fans and creators of different categories of projects. By looking at the graph of backers and projects, I might be able to extract some insight about the overlap of fans between projects, or how to go about reaching potential backers.

Execution. The other piece of the puzzle before I can understand how to make a campaign successful is to ask creators exactly what they did to get the results they got. I’ve read, seen, and listened to a number of interviews, but those are just anecdotes, and the creators don’t often recognize what they have done, since they are only operating within the context of their own project. I need to use surveys to ask a lot of creators enough questions to really understand what works and what doesn’t.

If there’s anything you’d like to see that you think I’ve missed, leave a comment or send me an email.