Introducing: The crowdsourced 3D world reality model (let’s make sure we are ready for it!)

For those of you who are semi-regular readers of this blog, you know that I have been talking for several years about the exciting convergence of low cost reality capture technologies (active or passive), compute solutions (GPU or cloud), and new processing algorithms (KinFu to VSLAM). I am excited about how this convergence has transformed the way reality is captured, interacted with (AR), and even reproduced (remaining digital, turned into something physical, or a hybrid of both). I ultimately believe we are on the path towards the creation and continuous update of a 3D “world model” populated with data coming from various types of consumer, professional and industrial sensors. This excitement is only mildly tempered by the compelling legal, policy and perhaps even national security implications that have yet to be addressed or resolved.

My first exposure to reality capture hardware and reconstruction tools was in the late '90s, when I was at Bentley Systems and we struck up a partnership with Cyra Technologies (prior to their acquisition by Leica Geosystems). I ultimately negotiated an agreement to distribute Cyra’s CloudWorx toolset within MicroStation, which we announced in late 2002. I remember that Greg Bentley (the CEO of Bentley Systems) strongly believed that reality capture was going to be transformative to the AEC ecosystem. As can be seen by their continuing investments in this space, he must continue to believe this, and it is paying dividends for Bentley customers (active imaging systems, photogrammetric reconstructions, and everything in between)!

Fast forward to 2007, when Microsoft showed the first incarnation of Photosynth to the world at TED 2007 (approx 2:30 min mark). Photosynth stitched together multiple 2D photos and related them to one another spatially (by back-computing the camera positions of the individual shots and then organizing them in 3D space). Blaise Aguera y Arcas (then at Microsoft, now leading Machine Intelligence at Google) showed a point cloud of Notre-Dame Cathedral (approx 3:40 min mark) generated computationally from photos downloaded from Flickr. One of the “by-products” of Photosynth was the ability to create 3D point clouds of real world objects. Of course photogrammetric reconstruction techniques (2D photo to 3D) have been known for a long time – but this was an illustration of a cloud based service, working at scale, enabling computational 3D reconstruction using photos provided by many. This was 11 years ago. It was stupefying to me. I immediately started looking at all of the hacks to extract point clouds from the Photosynth service. In 2014, an expanded 3D version of Photosynth was launched, but it never achieved critical mass. Even though Photosynth was ultimately shut down in early 2017, it was bleeding edge, and it was amazing.
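For readers who want a feel for the "2D photo to 3D" step, here is a minimal sketch of midpoint triangulation, which recovers one 3D point from two camera rays. It is not Photosynth's actual pipeline (which I have no visibility into). It assumes the camera centers and viewing rays have already been estimated, and the function name `triangulate_midpoint` is my own, purely for illustration.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point closest to two camera rays.

    c1, c2: camera centers (x, y, z); d1, d2: ray directions toward the same
    feature seen in two photos. Assumed inputs: a real pipeline would first
    estimate these from matched image features.
    """
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    w = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; no stable intersection")
    s = (b * e - c * d) / denom  # distance parameter along ray 1
    t = (a * e - b * d) / denom  # distance parameter along ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    # With noisy data the rays rarely meet, so take the midpoint of the gap.
    return tuple((u + v) / 2 for u, v in zip(p1, p2))


# Two cameras one unit apart, both looking at the same point on a facade.
target = (0.5, 0.2, 4.0)
cam_a, cam_b = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
ray_a = tuple(t - c for t, c in zip(target, cam_a))
ray_b = tuple(t - c for t, c in zip(target, cam_b))
point = triangulate_midpoint(cam_a, ray_a, cam_b, ray_b)
```

Repeat this for thousands of matched features across many photos and you get a sparse point cloud of the kind shown in the demo.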

It was likewise exciting (to a geek like me) when I was at Geomagic and the first hacks of the Microsoft Kinect (powered by the PrimeSense stack) began appearing in late 2010, and particularly when Microsoft Research published their KinectFusion paper (publishing algorithms for dense, real-time scene reconstructions using depth sensors). While there is no doubt that much of this work was built on the shoulders of giants (years of structure from motion and SLAM research), the thought that room-sized spaces could be reconstructed in real-time using a handheld depth sensor was groundbreaking. This was happening in parallel with the rise of cheap desktop (and mobile) “supercomputer”-like GPU compute solutions. I knew the reality capture ecosystem had changed forever.

There has been a ton of progress on the mobile handset side as well, leveraging primarily “passive” sensor fusion (accelerometer + computer vision techniques). Both Apple and Google (with ARKit and ARCore respectively, both now released) have exposed development platforms to accelerate the creation of reality interaction solutions. I have previously written about how the release of the iPhone X put an active scanning solution into the hands of millions of mobile handset users. Time will tell how that tech is leveraged.

I have long been interested in the crowdsourced potential that various sensor platforms (mobile handsets, “traditional” DSLRs, UAVs, autonomous vehicles) will unlock. It was exciting to see the work done by Mapillary in using a crowdsourced model to capture the world using photos (leveraging Mapbox and OpenStreetMap data). Mapbox themselves recently announced their own impressive AR toolkit and platform called Mapbox AR, which provides developers with access to live location data from 300 million monthly users combined with geotagged information from 125 million locations, along with 3D DTM models and satellite imagery of various resolutions.

I was therefore intrigued to read about a new effort (not much there on the website) which is emerging from Oxford’s Active Vision Lab. The team is building a reality-mesh platform for mobile devices leveraging ARCore and ARKit. Their solution will provide the necessary spatial context for AR applications: it will create and store 3D reconstructions generated as a background process, which will then be uploaded and merged with other contributions to fill out a crowdsourced reconstruction of spaces. My guess a few years ago was that this type of platform for near-scale reconstructions would be built on depth data from passive capture solutions (e.g. light field cameras) rather than on 2D images, but it absolutely makes sense that for certain workflows this is the path forward – in particular when leveraging the reconstruction frameworks exposed in each of the respective handset AR toolkits.
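To make the "upload and merge" idea a bit more concrete, here is a toy sketch of how point contributions from several devices could be folded into one shared model using a voxel grid. This is my own illustration and not how the platform above works (no details have been published). It assumes every contribution is already aligned to a common coordinate frame, which is the genuinely hard part, and `merge_contributions` and `voxel_size` are hypothetical names.

```python
from collections import defaultdict
from math import floor


def merge_contributions(clouds, voxel_size=0.5):
    """Merge point clouds by averaging all points that fall in the same voxel.

    clouds: iterable of point lists, each point an (x, y, z) tuple in a shared
    frame (an assumption; real systems must first register each upload).
    Returns {voxel_index: averaged_point}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for cloud in clouds:
        for x, y, z in cloud:
            key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
            acc = sums[key]
            acc[0] += x
            acc[1] += y
            acc[2] += z
            acc[3] += 1  # number of observations landing in this voxel
    return {k: (sx / n, sy / n, sz / n) for k, (sx, sy, sz, n) in sums.items()}


# Two users scan overlapping parts of the same room.
user_a = [(0.1, 0.1, 0.1), (2.2, 2.2, 2.2)]
user_b = [(0.3, 0.3, 0.3)]
merged = merge_contributions([user_a, user_b], voxel_size=0.5)
```

The two nearby observations collapse into a single averaged point, while the point only one user saw is kept as-is, so the model fills out as more people contribute.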

It will be incredibly exciting to see the continuing progress that this team and others will make in capturing reality data as a necessary predicate for AR applications of all sorts. We are consuming all types of reality data to create rich information products at Allvision, a new company that I have co-founded along with Elmer Bol, Ryan Frenz and Aaron Morris. This team knows a little bit about reality data; more on that to come in the coming weeks and months.

The era of the crowdsourced 3D world model is truly upon us – let’s make sure we are ready for it!
