CrunchBase: Using Crowdsourced Data for Commercial Purposes

For those of you who don’t know about CrunchBase (@crunchbase), it is a crowdsourced database of information about startups, people and investors.  CrunchBase describes itself as “the free database of technology companies, people, and investors that anyone can edit. Our mission is to make information about the startup world available to everyone and maintainable by anyone.”  AOL acquired CrunchBase and TechCrunch in 2010 from Michael Arrington.

CrunchBase has been very successful in sourcing data, and has established strong relationships with many of the leading venture capital firms, which regularly share data about their portfolio companies (fundraising, people, etc.).  CrunchBase has even developed an Excel Data Exporter, in addition to its API access, to allow for broader distribution of the information contained in its databases.

The current CrunchBase Terms of Service, Privacy Policy, and Licensing Policy govern access to and use of CrunchBase data.

As of the date of this blog, the Licensing Policy provides that:

We permit anyone to republish our content in accordance with this licensing policy.

We provide CrunchBase’s content under the Creative Commons Attribution License [CC-BY]. Our content includes structured data, overviews and media files associated with companies and people. Our schema, and documentation are also offered under the Creative Commons license.

We ask that API users link back to CrunchBase from any pages that use CrunchBase data. We want to make sure that everyone is able to find the source of the content to keep the service up-to-date and accurate.

This Licensing Policy may be updated from time to time as our services change and grow. If you have any questions about this policy please contact us at

CrunchBase provides a specific licensing contract for services that charge for the use of their data. Contact

The CrunchBase Terms of Service provide further restrictions on how the API may be used:

We provide access to portions of the Site and Service through an API thereby enabling people to build applications on top of the CrunchBase platform. For purposes of this Terms of Service, any use of the API constitutes use of the Site and Service. You agree only to use the API as outlined in the documentation provided by us on the Site.

 On any Web page or Application where you display CrunchBase company or people results, each page must include a hypertext link to the appropriate company or person profile Web page on Additional CrunchBase Branding Requirements can be found on the following Web page: CrunchBase may grant exceptions on a case-by-case basis. Contact us for special branding requests, which must be approved in advance in writing.

CrunchBase will utilize commercially reasonable efforts to provide the CrunchBase API on a 24/7 basis but it shall not be responsible for any disruption, regardless of length. Furthermore, CrunchBase shall not be liable for losses or damages you may incur due to any errors or omissions in any CrunchBase Content, or due to your inability to access data due to disruption of the CrunchBase API.

CrunchBase reserves the right to continually review and evaluate all uses of the API, including those that appear more competitive than complementary in nature.

CrunchBase provides a specific licensing contract for services that charge for the use of their data. Contact

CrunchBase reserves the right in its sole discretion (for any reason or for no reason) and at anytime without notice to You to change, suspend or discontinue the CrunchBase API and/or suspend or terminate your rights under these General Terms of Service to access, use and/or display the CrunchBase API, Brand Features and any CrunchBase content.
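The link-back requirement quoted above (each page that displays a company or person result must link to the matching profile page) can be sketched in a few lines. This is only an illustration: the function name `attribution_link` is hypothetical, the profile URL is supplied by the caller because the excerpt does not give the address format, and nothing here reflects CrunchBase's actual branding requirements.

```python
from html import escape


def attribution_link(entity_name: str, profile_url: str) -> str:
    """Return an HTML snippet linking a displayed record back to its source profile.

    Hypothetical helper: the caller passes the profile URL, since the
    quoted terms do not spell out the address format.
    """
    # Escape both values so names or URLs containing quotes, ampersands,
    # or angle brackets cannot break the surrounding markup.
    safe_name = escape(entity_name, quote=True)
    safe_url = escape(profile_url, quote=True)
    return f'<a href="{safe_url}">{safe_name} on CrunchBase</a>'


# Placeholder URL for illustration only; not a real profile address.
print(attribution_link("Example Co", "https://example.com/company/example-co"))
```

An application that renders many records would call such a helper once per record, so that every displayed result carries its own link rather than a single site-wide credit.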

I previously reviewed various licensing schemes, including the Creative Commons scheme, in an earlier two-part blog series, The Call for a Harmonized Community License for 3D Content, where I proposed a harmonized “community” type license for content which could be produced on 3D printers (arguing that the existing license types do not “fit” content which can mix copyright, patent, trade dress and other rights).

For those of you who are not aware, the CC-BY license type is a very broad license grant – providing for the “maximum dissemination of licensed materials”.  You can find the existing CC license types here and specifically the summary of CC-BY license.

CrunchBase was careful to note that uploaded material which it links to or provides along with the company information might be licensed differently (e.g. not under the CC-BY license), and specifically stated that:

The graphical layout of the CrunchBase website and other elements of the Site, Content or Service not described above are the copyright of CrunchBase, and may not be reproduced without permission.


Enter Pro Populi and People+

Pro Populi, a small three-person startup, has been developing applications that use the CrunchBase dataset, including an app called People+.  Pro Populi has apparently been accessing the CrunchBase data (originally via the API, but also through other means) to populate its own database, and then drawing on that content (and other content) in its applications.

Wired (@Wired) reporter David Kravets (@dmkravets) broke the story on November 5th in an article titled AOL Smacks Startup for Using CrunchBase Content It Gave Away.  If you click through the link to the original Wired article, you can review some of the correspondence gathered by David Kravets in support of the story.

Pro Populi was served with a cease and desist letter from AOL (the parent company of CrunchBase).  Quoting from the Wired article, an AOL Assistant General Counsel apparently sent the following in an email to the Pro Populi CEO after a meeting with the President of CrunchBase last Friday:

On the chance that you may have misinterpreted Matt’s willingness to discuss the matter with you last week, and our reference to this as a ‘request,’ let me make clear, in more formal language, that we demand that People+ immediately cease and desist from its current violation and infringement of AOL’s/TechCrunch’s proprietary rights and other rights to CrunchBase, by removing the CrunchBase content from your People+ product and by ceasing any other use of CrunchBase-provided content.

But if CrunchBase didn’t want to allow others to use the data, why does it license its content under the CC-BY scheme?

Hopefully CrunchBase and Pro Populi can come to an agreement which works for both of them and their interests.

While CrunchBase can likely legitimately claim to restrict access to its content via its API (licensed separately, not covered by the CC-BY scheme, and with separate terms), once content covered by the CC-BY license has been accessed and copied in a manner consistent with CC-BY, can CrunchBase assert rights to “get it back?”  That seems to be an incredibly difficult row to hoe, and inconsistent with the very broad terms of the CC-BY license grant.  Worse yet for CrunchBase, according to the Wired article, the General Counsel of the Creative Commons Corporation doesn’t think it can.  The Electronic Frontier Foundation represents Pro Populi.


CrunchBase could have stayed within the CC license scheme and chosen a different CC license type for the underlying data – including one which specifically prohibits the use of the content for commercial purposes, which prohibits the creation of derivative works, and which requires specific attribution to them.  That license type is CC BY-NC-ND.  On a case by case basis they could have authorized/waived the restrictions contained in the license.  CrunchBase could have also changed the license grant for content accessed via the API.   This is solvable.
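The difference between the license types can be laid out as a small lookup table. The sketch below is a simplified model of the six standard Creative Commons licenses based on their public summaries; the names `LicenseTerms`, `CC_LICENSES` and `permits` are illustrative, every license in the table also requires attribution, and the model ignores the finer points of the legal text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    commercial_use: bool  # may the content be used commercially?
    derivatives: bool     # may adapted versions be shared?
    share_alike: bool     # must adaptations carry the same license?


# Simplified view of the six standard CC licenses (all require attribution).
CC_LICENSES = {
    "CC BY":       LicenseTerms(commercial_use=True,  derivatives=True,  share_alike=False),
    "CC BY-SA":    LicenseTerms(commercial_use=True,  derivatives=True,  share_alike=True),
    "CC BY-ND":    LicenseTerms(commercial_use=True,  derivatives=False, share_alike=False),
    "CC BY-NC":    LicenseTerms(commercial_use=False, derivatives=True,  share_alike=False),
    "CC BY-NC-SA": LicenseTerms(commercial_use=False, derivatives=True,  share_alike=True),
    "CC BY-NC-ND": LicenseTerms(commercial_use=False, derivatives=False, share_alike=False),
}


def permits(license_code: str, *, commercial: bool, derivative: bool) -> bool:
    """Return True if the intended use fits within the license's grant."""
    terms = CC_LICENSES[license_code]
    if commercial and not terms.commercial_use:
        return False
    if derivative and not terms.derivatives:
        return False
    return True


# A commercial app that repackages the data, as People+ is described as doing:
print(permits("CC BY", commercial=True, derivative=True))
print(permits("CC BY-NC-ND", commercial=True, derivative=True))
```

Under this reading, a commercial, derivative use fits within CC BY but not within CC BY-NC-ND, which is why the choice of license type matters so much here.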

For an interesting view of this dispute from TechCrunch (a sister company to CrunchBase), see their take on the dispute.


Impact On Other “Hybrid” Commercial Use of Crowdsourced Data?

While CrunchBase and Pro Populi resolve their dispute, I am most interested in thinking about how this potentially impacts other crowdsourced data platforms and the applications built on top of them.   It is an interesting dilemma: how can (and should) crowdsourced data platforms commercially benefit from their efforts, including by restricting potential competitors from copying the data for their own purposes, commercial or otherwise?  Sourcing, filtering, vetting, editing and organizing hundreds, thousands or even millions of data points is a complex undertaking.  It takes time, effort, people and ultimately money.  Unless that vetting is also done from a crowdsourced perspective (or mostly so – like the Wikipedia model), allowing potential competitors (commercial or otherwise) to copy that structured content is a potential death knell.  In that instance, openness needs to be balanced against a commercial purpose.

CrunchBase President Matt Kaufmann blogged about the CrunchBase dispute with Pro Populi and the EFF.   He essentially acknowledges the challenge of openness in the context of trying to build a commercial business – but re-affirms his belief that CrunchBase had restricted commercial use of its data (via the API or otherwise) under its current licensing terms.

[T]o invest in CrunchBase’s constant improvement requires building a business around CrunchBase in a way that successfully takes into account our terms of service and our openness. We are confident that this is possible, and that’s what we are on the path to figuring out.

This is of course the challenge – adding enough value in the stack above the “open” content that can be commercialized.   As an example, take a look at MapBox – MapBox is a cloud-based platform which allows developers to embed geo-rich content into their web and mobile offerings.  They recently took $10M from Foundry Group, and I blogged about that investment – MapBox, Geo Software Platform, Maps $10M from Foundry Group.

MapBox relies on data sourced from OpenStreetMap, the “free wiki world map.”    OpenStreetMap licenses its content in two ways – the underlying data is licensed as open data under the Open Data Commons Open Database License (ODbL), while the cartography and documentation are licensed under the CC BY-SA license (a share-alike relative of the CC-BY license selected by CrunchBase).  BTW, Kevin Scofield likes the MapBox interface too.

It would be a difficult commercial business model indeed for MapBox to go through the effort of building an infrastructure to help source, collect and organize all kinds of mapping data, which was open for other uses, as well as building an application layer on top of it.   MapBox instead focuses on creating a great platform layer on top of the otherwise “open” content (others are free to do so as well).  This model works because there is enough community interest to support an undertaking like OpenStreetMap to begin with.  Can the same be said for the data underlying CrunchBase?
