First Investment Amounts and Venture Outcomes

Jordan Roga
Apr 8, 2022

Exit Likelihood and Value

A logistic regression predicts whether companies that received at least one investment from the selected top investors (in the data I curated) exit, using the amount first invested into them as the predictor. It does not initially find that amount to be a significant predictor.

Even after pruning borderline influential points from the set, the predictor does not become very important, though it does become significant. In the original model, the predicted likelihood of exit for a company with $10MM invested is 40.55%, compared to 40.33% for a company with $50MM invested. The $10MM figure comes from the logit -.3802+-2.275E-10*1E7, exponentiated into odds and then divided by 1 plus those odds, which is what applying the plogis function to the logit does. In the pruned model (just below), the predictions for those same values are 41.14% and 40.94%, respectively.
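This minimal sketch of that arithmetic is not the code behind the original analysis. It pushes the original model's quoted intercept and slope through a logistic function in Python. `plogis` here is a hypothetical stand-in named after R's function. The pruned model isn't reproduced because its coefficients aren't listed.

```python
import math

# Coefficients of the original (unpruned) exit model, as quoted in the text;
# the predictor is the first investment amount in raw dollars.
INTERCEPT = -0.3802
SLOPE = -2.275e-10


def plogis(logit: float) -> float:
    """Logistic CDF: exponentiate the logit into odds, then odds / (1 + odds)."""
    odds = math.exp(logit)
    return odds / (1.0 + odds)


def exit_probability(first_investment: float) -> float:
    """Predicted probability of exit for a first investment amount in dollars."""
    return plogis(INTERCEPT + SLOPE * first_investment)


for amount in (10e6, 50e6):
    print(f"${amount / 1e6:.0f}MM -> {exit_probability(amount):.2%}")
```

Running it prints 40.55% for $10MM and 40.33% for $50MM, matching the figures above.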

Nor does that model yield predictions that follow the observed trends.

The logistic regression does not follow the observed pattern in exit probability as it varies with the first investment amount. The predicted probability just decreases slowly, while the actual probability rises and falls at different points. The model is also not very accurate.
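The comparison described above sets the observed exit share within first-investment bins against the model's smooth curve. It could be sketched in plain Python like this. The bin edges and the eight companies in the usage lines are hypothetical and only show the mechanics. The coefficients are the original model's, as quoted above.

```python
import math

# Original exit model coefficients quoted in the text (predictor in raw dollars).
INTERCEPT = -0.3802
SLOPE = -2.275e-10


def plogis(logit):
    return 1.0 / (1.0 + math.exp(-logit))


def binned_exit_rates(amounts, exited, edges):
    """Share of companies that exited within each [lo, hi) bin; None for empty bins."""
    rates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        outcomes = [e for a, e in zip(amounts, exited) if lo <= a < hi]
        rates.append(sum(outcomes) / len(outcomes) if outcomes else None)
    return rates


def compare_to_model(amounts, exited, edges):
    """Pair each bin's observed exit share with the model prediction at the bin midpoint."""
    observed = binned_exit_rates(amounts, exited, edges)
    rows = []
    for (lo, hi), obs in zip(zip(edges[:-1], edges[1:]), observed):
        rows.append((lo, hi, obs, plogis(INTERCEPT + SLOPE * (lo + hi) / 2)))
    return rows


amounts = [2e6, 4e6, 6e6, 8e6, 12e6, 18e6, 30e6, 45e6]  # hypothetical
exited = [1, 0, 0, 0, 1, 1, 0, 1]                       # hypothetical
edges = [0, 5e6, 10e6, 20e6, 50e6]                       # hypothetical bin edges
for lo, hi, obs, pred in compare_to_model(amounts, exited, edges):
    obs_text = "n/a" if obs is None else f"{obs:.0%}"
    print(f"{lo / 1e6:.0f}-{hi / 1e6:.0f}MM: observed {obs_text} vs model {pred:.1%}")
```

The model column barely moves across bins, while the observed shares jump around. On real data, this is the mismatch the paragraph describes.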

A logistic regression model does not show the first investment amount to be a good general predictor of whether a company is likely to exit. This can also be seen in the density of companies that do and do not exit by their first investment amounts. A log-scale graph is included to further highlight the lack of a clear pattern, and a regression with a log investment amount predictor shows a similar lack of a clear trend.

A boxplot of the investment amount in companies that did and did not exit also shows the differences to be fairly small.

A survival analysis incorporating the timing of investment and exit also shows that the first investment amount does not have an especially large impact on exit likelihood.

The following two diagrams are a sample of how some components of the data are organized on a time basis.

Since all companies have a first investment, those without any later recorded time (at least one additional investment or an exit at a later date) are right-censored at the time of that first investment, with 0 days to outcome and flagged as censored. This helps explain how the survival analysis and the overall data are structured. It will also matter later, when running a similar analysis on follow-on investment and seeing why that analysis may be problematic.
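Here is a minimal Python sketch of that censoring rule. It rests on one assumption the text doesn't spell out: a company with later investments but no exit is treated as censored at its last recorded investment. The function name is hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def exit_survival_record(
    first_investment: date,
    last_investment: Optional[date] = None,
    exit_date: Optional[date] = None,
) -> Tuple[int, bool]:
    """Return (days_to_outcome, exit_observed) for one company."""
    if exit_date is not None:
        # Exit observed: event time measured in days from the first investment.
        return (exit_date - first_investment).days, True
    if last_investment is not None:
        # Assumption: no exit recorded, so censor at the last recorded investment.
        return (last_investment - first_investment).days, False
    # Only a first investment: right-censored at day 0.
    return 0, False


# Hypothetical companies.
print(exit_survival_record(date(2015, 1, 1)))                                    # (0, False)
print(exit_survival_record(date(2015, 1, 1), last_investment=date(2015, 3, 1)))  # (59, False)
print(exit_survival_record(date(2015, 1, 1), exit_date=date(2016, 1, 1)))        # (365, True)
```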

Compare a Cox proportional hazards model using just the exit indicator and days to an outcome with one that also incorporates the first investment amount. The latter has exits reaching a 50% likelihood by day 1917, rather than day 1902.
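Fitting the Cox model itself is best left to a survival library, such as lifelines' `CoxPHFitter` in Python or `survival::coxph` in R. A simpler, swapped-in technique shows how a "50% likelihood by day N" figure is read off a survival curve: the plain-Python Kaplan-Meier estimator below. It is not the Cox model used above and takes no covariates. The tiny sample is hypothetical.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier curve as (time, survival) pairs at each time with an observed event.

    "Survival" here is the estimated share of companies that have not exited yet.
    Subjects censored at time t still count as at risk at t.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = 0
        leaving_at_t = 0
        while i < len(data) and data[i][0] == t:
            events_at_t += int(data[i][1])
            leaving_at_t += 1
            i += 1
        if events_at_t:
            survival *= 1.0 - events_at_t / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at_t
    return curve


def median_time(curve):
    """First time at which the estimated probability of having exited reaches 50%."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical (days_to_outcome, exited) records; two companies are censored at day 0.
durations = [0, 0, 100, 200, 300, 400]
exited = [False, False, True, False, True, True]
curve = kaplan_meier(durations, exited)
print(curve)               # [(100, 0.75), (300, 0.375), (400, 0.0)]
print(median_time(curve))  # 300
```

The two companies censored at day 0 simply drop out of the risk set before the first event. This mirrors how companies with only a first investment enter the analysis here.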

The difference in probability of exit over time shows up as a significant but small change in the log hazard ratio. This can also be seen in the graphs above: stratified by percentile of first investment amount in the dataset, the curves differ only minimally.

The time-based AUC shows that this is not an especially accurate model either.

About 40% of the companies in this dataset that received investment from these top firms have exited, while 60% have no recorded exit. Some of those without a recorded exit likely did exit at zero as failed businesses, a possibility illustrated in the timing diagram above. Overall, a larger first investment is associated with a slight decrease in exit likelihood. Still, the amount first invested in these companies isn't ultimately the most important predictor of how likely a company is to exit.

Comparing the log of that investment amount to the log exit value of the companies that did wind up being recorded as exiting shows a somewhat clearer relationship.

A linear regression model seems to more closely resemble the trends in how the log exit value varies with changes in the log first investment amount.

The linear regression does show some effect from varying the log first investment amount. Changing the log first investment amount from ~16.12 (~$10MM) to ~17.73 (~$50MM) predicts log exit values of 19.0 (12.2+.42*16.12, ~$181MM) and 19.7 (~$356MM), respectively.
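Those predictions can be reproduced roughly by applying the printed coefficients (12.2 and .42) to the natural log of each dollar amount. This sketch uses the rounded coefficients, not the fitted model object. The function name is hypothetical.

```python
import math

# Rounded coefficients of the log exit value model, as printed in the text.
INTERCEPT = 12.2
SLOPE = 0.42


def predict_log_exit(first_investment_dollars):
    """Predicted log exit value from the natural log of the first investment amount."""
    return INTERCEPT + SLOPE * math.log(first_investment_dollars)


for amount in (5e6, 10e6, 50e6):
    log_exit = predict_log_exit(amount)
    print(f"${amount / 1e6:.0f}MM: log first = {math.log(amount):.2f}, "
          f"log exit = {log_exit:.2f}, exit ~ ${math.exp(log_exit) / 1e6:.0f}MM")
```

Because the coefficients are rounded, the log predictions (about 18.68, 18.97 and 19.65) and especially their dollar conversions come out a little below the quoted figures. Small differences in a log prediction become noticeable once exponentiated.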

The actual first investment amounts in the dataset have a median of $7.2MM and a mean of $14.8MM. Given that distribution, it may be more appropriate to use example values closer to the data, even though the earlier logistic regression model showed little effect over a large range.

Changing the log first investment amount from ~15.42 (~$5MM) to ~16.12 (~$10MM) predicts log exit values of 18.7 (~$135MM) and 19.0 (~$181MM), respectively.

This model still has an adjusted R-squared of just 0.069 and a root mean squared error of 1.761, so it's not a prediction I would rely on strongly. It does, though, describe the trend of higher exit values associated with larger first investments.
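This sketch shows how those two fit statistics are computed for a one-predictor least-squares fit, in plain Python on a hypothetical four-point sample. One assumption is worth flagging. The RMSE here divides the residual sum of squares by n, whereas R's `summary.lm` reports a residual standard error that divides by n - 2. The two differ slightly in small samples, and the text doesn't say which one it used.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_metrics(x, y):
    """Adjusted R-squared and RMSE (dividing by n) for a one-predictor OLS fit."""
    a, b = fit_simple_ols(x, y)
    n, p = len(y), 1  # p = number of predictors
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in residuals)
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalize for the predictor used
    rmse = math.sqrt(ss_res / n)
    return adj_r2, rmse


print(fit_metrics([0, 1, 2, 3], [0, 2, 1, 3]))  # hypothetical data
```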

Second Investment Likelihood, Value, and Multiple

About 55% of companies that received a first investment from at least one of the top firms in the dataset received an additional follow-on investment from any of those firms.

The companies that did not get another investment from any of those investors do appear to have a somewhat higher average amount first invested into them than those that did.

The differences between these groups do vary over the amount invested. Much of the difference, however, comes from a fatter tail in the group not receiving an additional investment.

A logistic regression modeling whether companies get a follow-on investment from any top firm mirrors the observed data more closely than the similar regression did for exit probability.

This regression does overestimate the likelihood of a follow-on investment at both ends of the distribution.

The predicted probability of a follow-on investment from any of the top investors is 40.28% at a first investment of $5MM, 39.59% at $10MM and 34.18% at $50MM. So the first investment amount does seem to have a somewhat significant impact as a predictor over most of the data range, though the model is not very accurate.

A survival analysis of the number of days to a follow-on investment from any of the selected top investors would censor every observation without a follow-on investment at time zero. Those companies then contribute no time at risk, so every observation left in the analysis is an event. A survival analysis therefore doesn't make much sense for this factor.

Once that time component is incorporated, the effect of the amount first invested by those firms on the likelihood of a follow-on investment is, as expected, insignificant.

Overall, a larger first investment might mean that companies which didn't go on to receive an additional investment somewhat more often didn't need one as much. It is worth noting again that this factor covers additional investment from any of the top firms in the database only. It is not broad enough to include additional investment from any other venture firm (even other top firms not among those selected), and I'd expect some bias from that.

The relationship between the log amount first invested into a company in the dataset and the log amount of a second investment seems to be fairly strong.

A linear regression model seems to closely resemble the trends in how the log second investment amount varies with changes in the log first investment amount.

Varying the predictor of the log first investment amount in the model from ~15.42 (~$5MM) to ~16.12 (~$10MM) predicts a log second investment amount of 16.2 (~$10MM) and 16.6 (~$16MM) respectively.

The model has an adjusted R-squared of 0.38 and a root mean squared error of 0.877. It is a better-fitting model, and it describes how larger first investments tend to be followed by larger second investments.

The relationship between the log amount first invested and the log multiple of the second investment amount relative to the first also seems fairly strong, but negative.

A linear regression model seems to closely resemble the trends in how the log multiple varies with changes in the log first investment amount.

Varying the predictor of the log first investment amount in the model from ~15.42 (~$5MM) to ~16.12 (~$10MM) predicts a log multiple of 0.74 (~2.1x) and 0.47 (~1.6x) respectively.

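This model has an adjusted R-squared of 0.192 and a root mean squared error of 0.877. That RMSE matches the second investment model's exactly, as it should. The log multiple is the log second investment amount minus the log first investment amount. Regressing it on the log first investment amount therefore lowers the slope by exactly 1 and leaves the residuals unchanged, while R-squared changes because the outcome's variance does. The model describes how larger first investments tend to be followed by lower second investment multiples.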
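That identity can be checked in plain Python on hypothetical log amounts (not the dataset's). Regressing the log multiple on the log first investment gives the same intercept and a slope exactly one lower. The residuals are identical, so a residual-based error such as RMSE is unchanged.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Residuals of the one-predictor OLS fit of y on x."""
    a, b = fit_simple_ols(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical log first and log second investment amounts (not the dataset's).
log_first = [15.0, 15.4, 16.1, 16.5, 17.2]
log_second = [15.9, 16.1, 16.8, 16.7, 17.6]
log_multiple = [s - f for s, f in zip(log_second, log_first)]

a_second, b_second = fit_simple_ols(log_first, log_second)
a_mult, b_mult = fit_simple_ols(log_first, log_multiple)
print(f"second:   a={a_second:.3f}, b={b_second:.3f}")
print(f"multiple: a={a_mult:.3f}, b={b_mult:.3f}")  # same a, b exactly one lower
same = all(
    abs(r1 - r2) < 1e-9
    for r1, r2 in zip(residuals(log_first, log_second), residuals(log_first, log_multiple))
)
print("identical residuals:", same)
```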

It is also worth looking at multiples on a categorical basis. A slight majority (~55%) did receive a second investment, and the plurality of those had a multiple between 1x and 2x.
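Here is one way to build categories like these, sketched in Python. The band edges are hypothetical. The text only mentions a 1-2x band and a 10x level, so the other cut points are my own assumption, as are the function names and the sample companies.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical bands; the text itself only mentions a 1-2x band and a 10x level.
BANDS = [(0.0, 1.0, "<1x"), (1.0, 2.0, "1-2x"), (2.0, 5.0, "2-5x"), (5.0, 10.0, "5-10x")]


def multiple_category(first_amount: float, second_amount: Optional[float]) -> str:
    """Bucket the second-to-first investment multiple; None means no second investment."""
    if second_amount is None:
        return "no second investment"
    multiple = second_amount / first_amount
    for lo, hi, label in BANDS:
        if lo <= multiple < hi:
            return label
    return "10x+"


def category_shares(pairs: Iterable[Tuple[float, Optional[float]]]) -> dict:
    """Share of companies falling in each multiple category."""
    counts = Counter(multiple_category(first, second) for first, second in pairs)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical (first, second) investment amounts in dollars.
companies = [(5e6, None), (5e6, 7.5e6), (10e6, 15e6), (10e6, 100e6)]
print(category_shares(companies))
```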

As one might expect from the continuous regression above, the average amount first invested decreases as the categorical multiple increases.

Companies with less invested at the point of a first investment are more likely to see a larger multiple in a second investment. For example, a 10x second investment is more likely for a company with a first investment of $10MM than for one with a first investment of $100MM.

Still, even accounting for that decreasing first investment amount, the observed multiples do produce something of an upward trend in the second investment amount as the categorical multiple increases.

The same relationship appears when the continuous log multiple is used as a predictor of the log second investment amount.

Additional Investments Count and Value

Just under half of the companies in the dataset (~45%) have only the first investment. Of the ~55% that have two or more investments in total, just 47% (26% of the total) have exactly two investments from any of the selected investors. The remaining companies have more than two, in a distribution that decreases as the total number of investments increases.

The companies with a lower number of investments tended to have a higher amount invested in their first investment from the selected investors.

The focus here is on additional investments, because every company in the dataset has at least one investment from the selected firms. The number of additional investments decreases gradually as the log first investment amount increases, and the decline accelerates at larger investment sizes.

Transforming the graph of those aggregated amounts and adding the predictions from the different count models shows a similar trend. The zero-inflated negative binomial model, however, seems to follow those trends a bit more closely than the other count models.

The predicted probabilities of different counts under each model (shown directly and through rootograms) further show that the zero-inflated negative binomial model's predicted count distribution matches the dataset most closely overall.

The zero-inflated negative binomial model assumes zeros can come from either of two components. A logistic regression produces structural zeros, and a negative binomial regression can also produce zeros. The logistic component inflates the number of zeros beyond what the negative binomial alone would expect. The logit predicts the probability of a structural zero. Multiplying the share of the dataset without structural zeros (1 - plogis of the logit) by the expected count from the negative binomial count model therefore reduces that expected count by the inflated zeros.

Consider a company with a first investment amount of ~$5MM (log at 15.42). The count model gives it an expected 1.34 additional investments, roughly exp(2.27-.129*15.42) with the coefficients rounded. The logistic regression component gives 7.8% structural zeros. That probability is plogis of the zero-inflation component's own linear predictor, whose coefficients differ from the count model's. The overall expected count is 1.23 (92.2%*1.34) additional investments. A different company with ~$10MM invested (log at 16.12) would have an expected count from the negative binomial of 1.22 additional investments and a 5.1% structural-zero probability from the logistic component. Its overall expected count is 1.16 (94.9%*1.22).
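This plain-Python sketch shows how the two components combine. The count means (1.34 and 1.22) and structural-zero probabilities (7.8% and 5.1%) are the figures quoted above. The dispersion `THETA` is a hypothetical value, since the fitted dispersion isn't given. It affects the probability of zero but not the expected count.

```python
import math


def nb_pmf(y: int, mu: float, theta: float) -> float:
    """Negative binomial probability of count y with mean mu and dispersion (size) theta."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y: int, mu: float, theta: float, pi: float) -> float:
    """Zero-inflated NB: a structural zero with probability pi, otherwise an NB draw."""
    p = (1.0 - pi) * nb_pmf(y, mu, theta)
    return p + pi if y == 0 else p


def zinb_mean(mu: float, pi: float) -> float:
    """Overall expected count: (1 - pi) * mu."""
    return (1.0 - pi) * mu


THETA = 1.5  # hypothetical dispersion; not reported in the text
for mu, pi in ((1.34, 0.078), (1.22, 0.051)):
    print(f"mu={mu}, pi={pi}: expected count {zinb_mean(mu, pi):.3f}, "
          f"P(0)={zinb_pmf(0, mu, THETA, pi):.3f}")
```

With the rounded inputs, this gives about 1.24 and 1.16. The quoted 1.23 likely reflects the unrounded estimates.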

This model shows how the number of additional investments would be expected to decrease with the increasing size of a first investment amount.

There does seem to be a relationship between the log first investment amount and the log of the total additional amount invested by the selected firms. That total spans all later investments in the dataset, including the second investment amount and extending beyond it.

A linear regression model seems to resemble the trends in how the log amount of additional dollars invested varies with changes in the log first investment amount.

Varying the predictor of the log first investment amount in the model from ~15.42 (~$5MM) to ~16.12 (~$10MM) predicts a log additional amount invested of 17.3 (~$31.6MM) and 17.7 (~$47.6MM) respectively.

This model has an R-squared of 0.323 and a root mean squared error of 0.961. It shows a positive relationship between how much is invested initially and how much went on to be invested in additional rounds.

Overall, a larger first investment by the selected investors in the database does not have a clear relationship with exit likelihood, though exit likelihood seems to decrease slightly overall. A larger first investment predicts a lower likelihood of an additional investment and fewer expected additional investments. It also predicts a higher exit value, a higher second investment amount despite a lower second investment multiple, and a larger amount of additional capital invested.
