Imaging and Social Technology

What I’m going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon, and it’s an environment in which you can either locally or remotely interact with vast amounts of visual data. We’re looking at many, many gigabytes of digital photos here, seamlessly and continuously zooming into the thing, rearranging it any way we want. And it doesn’t matter how much information we’re looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it’s in the 300 megapixel range.

It doesn’t make any difference, because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It’s also a very flexible architecture. This is an entire book, so this is an example of non-image data. This is Bleak House by Dickens. Every column is a chapter. To prove to you that it’s really text and not an image, we can do something like so, to show that this is really a representation of the text. It’s not a picture. Maybe this is a kind of artificial way to read an eBook; I wouldn’t recommend it. This is a more realistic case. This is an issue of The Guardian. Every large image is the beginning of a section, and this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is inherently a kind of multi-scale medium.
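The claim that only the number of pixels on screen should limit performance can be sketched with a tiled image pyramid. The Python below is a minimal illustration under assumptions not stated in the talk: a 256-pixel tile size, levels that halve in size down to a single pixel, and the hypothetical helper names `pyramid_levels` and `tiles_for_view`. It is not Seadragon's actual implementation.

```python
import math

TILE = 256  # assumed tile edge in pixels (hypothetical, not from the talk)


def pyramid_levels(width, height):
    """Number of levels when each level halves the one above, down to 1x1."""
    return math.ceil(math.log2(max(width, height))) + 1


def tiles_for_view(width, height, view_w, view_h, scale):
    """Return (level, tile_count) needed to fill a viewport.

    scale is screen pixels per full-resolution image pixel:
    1.0 is native size, smaller values are zoomed out.
    """
    top = pyramid_levels(width, height) - 1
    # Coarsest level that still has at least one image pixel per screen pixel.
    level = min(top, max(0, top + math.ceil(math.log2(scale))))
    level_scale = 2.0 ** (level - top)  # level pixels per full-res pixel
    level_w = math.ceil(width * level_scale)
    level_h = math.ceil(height * level_scale)
    # Visible region measured in this level's pixels, clipped to the level.
    vis_w = min(level_w, math.ceil(view_w * level_scale / scale))
    vis_h = min(level_h, math.ceil(view_h * level_scale / scale))
    # The extra tile covers a viewport that is not aligned to the tile grid.
    cols = min(math.ceil(level_w / TILE), math.ceil(vis_w / TILE) + 1)
    rows = min(math.ceil(level_h / TILE), math.ceil(vis_h / TILE) + 1)
    return level, cols * rows


if __name__ == "__main__":
    images = {"camera photo": (4000, 3000), "300 megapixel scan": (20000, 15000)}
    for name, (w, h) in images.items():
        for scale in (1.0, 0.3, 0.01):
            level, count = tiles_for_view(w, h, 1920, 1080, scale)
            print(f"{name:>20} scale={scale:<5} level={level:>2} tiles={count}")
```

In this sketch the chosen level holds fewer than two image pixels per screen pixel, so a 1920 by 1080 viewport never needs more than 160 tiles, whether the source is an ordinary camera photo or a 300 megapixel scan.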

We’ve also done a little something with the corner of this particular issue of The Guardian. We’ve made up a fake ad that’s at a very high resolution, much higher than you’d be able to get in an ordinary ad, and we’ve embedded extra content. If you want to see the features of this car, you can see them here, or other models, or even technical specifications. And this really gets at some of these ideas about doing away with those limits on screen real estate. We hope that this means no more pop-ups, and no other kind of rubbish like that should be necessary. Of course, mapping is one of those really obvious applications for a technology like this. And this one I really won’t spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the US, superimposed on top of a NASA geospatial image. So let’s pull up something else. This is actually live on the Web now; you can go check it out. This is a project called Photosynth, which really marries two different technologies. One of them is Seadragon, and the other is some very beautiful computer vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at UW and Rick Szeliski at Microsoft Research, a very nice collaboration. And so this is live on the Web. It’s powered by Seadragon, and you can see that when we do these sorts of views, where we can dive through images and have this kind of multi-resolution experience. But the spatial arrangement of the images here is actually meaningful.

The computer vision algorithms have registered these images together so that they correspond to the real space in which these shots, all taken near Grassi Lakes in the Canadian Rockies, were taken. So you see elements here of a stabilized slideshow or panoramic imaging, and these things have all been related spatially. I’m not sure if I have time to show you any other environments; there are some that are much more spatial. But I’d like to jump straight to one of Noah’s original data sets. This is from an early prototype of Photosynth that we first got working in the summer, to show you what I really think is the punch line behind the Photosynth technology. And it’s not necessarily apparent from the pictures we’ve put up on the Web site. We had to worry about the lawyers and so on. This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in t-shirts, and of the campus, and so on. Each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they’ve all been related spatially in this way, and we can just navigate in this very simple way. [Applause]. Thank you.

You know, I never thought that I’d end up working at Microsoft, but thoughts become things, and it’s very gratifying to have this kind of reception here. So this has lots of different types of cameras. It has everything from cell phone cameras to professional SLRs. Quite a large number of types get into this environment. Let me see if I can find some of the sort of weird ones; so many of them are occluded by faces and so on. Somewhere in here there is actually a series of photographs. Here we go. This is actually a poster of Notre Dame that registered correctly. So we can dive in from the poster to a physical view of this environment. What the point here really is, is that we can do things with the social environment. This is now taking data from everybody, from the entire collective memory of what the earth looks like visually, and linking all of that together. All of those photos become linked together, and they make something emerge that’s greater than the sum of the parts.

You have a model that emerges of the entire earth; that’s the long tail to Stephen Lawler’s Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it, because their own photos are getting tagged with metadata that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all that data, and I can use it as an entry point to dive into that space, into that metaverse, using everybody else’s photos, and do a kind of cross-modal and cross-user social experience that way. And of course a by-product of all of that is immensely rich virtual models of every interesting part of the earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory. Thank you so much.

So do I understand this right, that what your software is going to allow is that at some point, really within the next few years, all of the pictures that are shared by anyone across the world are going to basically link together?

Yes. What this is really doing is discovering. It’s creating hyperlinks, if you will, between images, and it’s doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information that a lot of those images have. Like when you do a Web search for images: you type in phrases, and the text on the Web page is carrying a lot of information about what that picture is of. Now, what if that picture links to all of your pictures? Then the amount of semantic interconnection and the amount of richness that comes out of that is really huge. It’s a classic network effect.
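The idea of hyperlinks created from image content, with tags entered by one person enriching everyone's linked photos, can be sketched as a small graph problem. The Python below is a toy under stated assumptions: each image is reduced to a set of made-up feature IDs standing in for whatever the computer vision stage detects, two images are linked when they share at least `min_shared` IDs, and tags spread across each connected group. The file names, the threshold, and the functions `link_images` and `propagate_tags` are hypothetical; this is not how Photosynth itself registers images.

```python
from itertools import combinations


def link_images(features, min_shared=2):
    """Link every pair of images that shares enough feature IDs."""
    links = {name: set() for name in features}
    for a, b in combinations(features, 2):
        if len(features[a] & features[b]) >= min_shared:
            links[a].add(b)
            links[b].add(a)
    return links


def propagate_tags(links, tags):
    """Give each image the union of all tags in its connected group."""
    result, seen = {}, set()
    for start in links:
        if start in seen:
            continue
        # Depth-first walk to collect everything reachable from start.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(links[node] - group)
        seen |= group
        merged = set().union(*(tags.get(name, set()) for name in group))
        for name in group:
            result[name] = set(merged)
    return result


# Made-up data: feature IDs are placeholders, not real detector output.
features = {
    "mine.jpg": {1, 2, 3},
    "yours.jpg": {2, 3, 4},
    "poster.jpg": {3, 4, 5},
    "campus.jpg": {9, 10},
}
tags = {"yours.jpg": {"Notre Dame", "west facade"}}

links = link_images(features)
enriched = propagate_tags(links, tags)
```

Here `mine.jpg` picks up the tags entered on `yours.jpg` because the two share features, and `poster.jpg` picks them up through the chain, while the unrelated `campus.jpg` stays untagged. Spreading tags across a whole group is deliberately crude; a real system would presumably attach tags to specific regions.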

That is truly incredible. Congratulations.

Thank you so much.