Friday, April 3, 2009
Interview with Brad Wilson and Shion Deysarkar, 80legs
Our interview today is with Houston-based 80legs (www.80legs.com), which is developing a service to offload Internet web crawling and distributed applications. We spoke with Shion Deysarkar and Brad Wilson to learn more about the firm and its software.
Can you explain what your service does?
Shion Deysarkar: Basically, what we're looking to do is build a platform for web-scale applications--any application that needs to look at data across a large part of the Internet. We've created a platform that helps you run those applications in a very cheap and very fast way. Right now, if you think about it, there are hundreds of millions, maybe billions, of pages on the Internet. So, processing all of them is either really expensive or really slow. You can do something with a few machines you have yourself, and maybe some local bandwidth, but if you try to do that, obviously, the performance you get trying to access those millions of pages is really slow. Your other option is to invest in expensive bandwidth and servers in a data center. If you go that route, you get better performance--but you'll end up spending a whole lot of money. We're cutting between those two options and providing both low cost and very good performance.
I think what will happen is that people who are trying to build an application, but are stuck because they can't afford the right performance, can now use our platform and scale their applications. To throw out an interesting example that we thought might be cool to run on our service: searching the content in a video, or several videos, across the Internet--things like speeches, keynote addresses, etc. You can't do that right now with Google. That's because anyone interested in doing that would have to spend a lot of computation time to access the videos, and that's expensive and time consuming. What they can do now is take something that analyzes a piece of video and extracts the content out of it, upload it to 80legs, run it across the videos, and create their own search engine for video.
Who are the customers for this?
Shion Deysarkar: There is a pretty wide variety of customers. When we first started, it was obviously search engines. There are many alternative search engines, Kosmix and others, doing search indexing. Those are prime customers. The other week, a couple of companies approached us to verify placements of ads across networks. Again, there's video and media analysis--if you want to find out what kinds of songs and movies are out there on the Internet. Also, I don't know if you've used Google Alerts, or Backtype, or other alert trackers or discussion trackers, but those have to search a large breadth of information and have fast updates. We can help them out. One thing we're excited about is that we're reducing barriers to entry for accessing Internet data, and other new, interesting ideas and companies are emerging because of us.
How did the company come about?
Shion Deysarkar: Actually, we have a sister company called Plura Processing. Plura is a distributed grid computer--we're able to gather a whole lot of nodes across the Internet for computing processes. We were thinking about how we could use all this computing power. We originally thought about creating a customized web crawler application. Our team has a lot of computer science and technical background.
How is it you can provide these resources without spending millions of dollars on infrastructure, and what makes your grid cheaper to operate?
Shion Deysarkar: Basically, the computer power we use comes from nodes in Plura's network--which is made up of people's home computers all over the world. Plura integrates with a couple of desktop applications, like a pretty popular chat client. People can opt in, and help support the chat client by allowing Plura to run. When they have excess or idle bandwidth and computing power, we can use that for our own purposes.
What's the advantage of this over using something like Amazon Web Services?
Shion Deysarkar: There are a couple of things. First of all, if you look at Amazon Web Services, you'll see there is a maximum number of nodes you can reserve for yourself. That number is something like 1000, or maybe 3000 nodes. That is not really enough to do web-scale analysis. Our system has about 50,000 nodes available to our customers. That's a level of performance which is in an entirely different ballpark. Our costs are also more competitive than Amazon Web Services. We charge $2 per million pages crawled and 3 cents per CPU hour for analysis. If you look at AWS, if you do a reserved instance, you will get 3 cents per CPU hour, but usually you'll be spending at least 10 cents per CPU hour. We'll be cheaper for most people. On top of that, we've already done the hard stuff, in terms of crawling and doing the data store. If you wanted to do this on a cloud, you'd have to re-do all that stuff, or hack together some different technology--it's not easy to do. We've taken care of the hard problem of storing content from the Internet.
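To make the pricing comparison concrete, here is a minimal sketch of the arithmetic using only the rates quoted in the answer above. The function names and the job size are hypothetical, and the cloud figure covers CPU time only (the "at least 10 cents per CPU hour" number), so it leaves out bandwidth and storage.

```python
# Rates as quoted in the interview; everything else here is illustrative.
CRAWL_USD_PER_MILLION_PAGES = 2.00
ANALYSIS_USD_PER_CPU_HOUR = 0.03
TYPICAL_CLOUD_USD_PER_CPU_HOUR = 0.10  # the "at least 10 cents" figure


def eighty_legs_cost(pages: int, cpu_hours: float) -> float:
    """Crawl fee plus analysis fee, in US dollars (hypothetical helper)."""
    crawl = pages / 1_000_000 * CRAWL_USD_PER_MILLION_PAGES
    analysis = cpu_hours * ANALYSIS_USD_PER_CPU_HOUR
    return round(crawl + analysis, 2)


def cloud_compute_cost(cpu_hours: float) -> float:
    """CPU time only at the typical quoted cloud rate (hypothetical helper)."""
    return round(cpu_hours * TYPICAL_CLOUD_USD_PER_CPU_HOUR, 2)


if __name__ == "__main__":
    pages, cpu_hours = 10_000_000, 1_000  # assumed job size, for illustration
    print("80legs:", eighty_legs_cost(pages, cpu_hours))
    print("cloud CPU only:", cloud_compute_cost(cpu_hours))
```

For that assumed job (10 million pages, 1,000 CPU hours), the sketch gives $50 in total on the quoted 80legs rates against $100 for cloud CPU time alone.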
How far are you from offering your service?
Shion Deysarkar: We've been working on the technology, and trying to get to launch in March/April. We've been going 110 percent trying to get things out the door. Next week, we should be launching.
How are you funded?
Brad Wilson: We're completely funded by Creeris Ventures. Our company name is Computational Crawling, and 80legs is our product name. We have basically been funding the entire venture, both Plura and Computational Crawling, out of Creeris.
Can you talk a little bit more about how you created your network?
Brad Wilson: Plura has nodes from a wide variety of sources, from web sites and desktop apps. The basic technology is this: we take someone that offers a free service or website to users, and we offer to pay them to embed Plura into their project. In exchange, their users' computers become part of the Plura grid when those computers are idle, and the users of those applications get access to the application for free. Some people think it's a little nefarious, but it's been very well received both by users and by the applications and web sites that we integrate our technology with. There are many examples where it's disclosed to users, and they've been happy that the project will then receive additional funding.
With all the recent fear over botnets, do you have any issues and concerns about your network?
Brad Wilson: It's actually very dissimilar. We don't have control of a user's computer, because we're only running in a protected Java sandbox. We can't send spam, we can't touch a user's disk, and we can't see the user's data. Java protects them from all of those things, and we couldn't do anything bad even if we wanted to. We can only use the computation power of those computers.
There are lots of issues with coordinating parallel processing. How do you handle those issues?
Brad Wilson: Shion mentioned this earlier, but one of the biggest reasons for using 80legs over a service in the cloud is that we're cheaper, faster, and have more capacity. However, the biggest advantage we have is that users don't have to deal with the mechanics and hard stuff. They just give us the crawl parameters, and they just have to write the code that would process one page. If you think of it, if someone is going to do some form of image analysis to find pictures that look like you, all they have to do is write a function in Java to do the comparison. They don't have to think about distributed computing, bandwidth, web crawling, making sure there are no duplicate pages, making sure they're crawling the right page at the right time, and making sure they're not taking down the web site by hitting it too much. They don't have to think about any of that, just their one little area of specialty.
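The division of labor described in that answer can be sketched roughly as follows. This is not the 80legs API (which, per the interview, takes a Java function); it is a small Python illustration in which every name is hypothetical. The platform side is `crawl`, which handles link following, duplicate filtering, and a crude per-host cap standing in for politeness. The user side is a single per-page function, here `count_links`. A hard-coded `FAKE_WEB` dictionary replaces real network fetches so the sketch runs anywhere.

```python
import re
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": '<a href="/">Home</a> <a href="/#top">Top</a>',
    "http://b.example/": '<a href="http://a.example/about">A</a>',
}

LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(seeds, process_page, max_pages=100, max_per_host=50):
    """Breadth-first crawl that calls process_page(url, html) once per page."""
    seen = set(seeds)   # duplicate filter: each URL is queued at most once
    per_host = {}       # pages fetched per host, a crude politeness cap
    queue = deque(seeds)
    results = {}
    while queue and len(results) < max_pages:
        url = queue.popleft()
        host = urlparse(url).netloc
        if per_host.get(host, 0) >= max_per_host:
            continue    # stop hitting a host once its cap is reached
        html = FAKE_WEB.get(url)
        if html is None:
            continue    # page not found in the fake web
        per_host[host] = per_host.get(host, 0) + 1
        results[url] = process_page(url, html)
        for href in LINK_RE.findall(html):
            # Resolve relative links and drop #fragments before deduplicating.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return results


def count_links(url, html):
    """The only code a user would write: analyze a single page."""
    return len(LINK_RE.findall(html))


if __name__ == "__main__":
    for page, value in crawl(["http://a.example/"], count_links).items():
        print(page, value)
```

A real crawler would space out requests over time and honor robots.txt rather than apply a fixed per-host cap; the cap is used here only to keep the example short.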
Finally, with all the big companies in this area -- Amazon, Google, etc. -- why did you decide you'd get into this market?
Brad Wilson: This is completely different from anything out there. If somebody like Google, Yahoo, or Microsoft wanted to create something like this, they'd have to spend hundreds of millions--and by some reports, billions--on data centers to actually replicate the power and bandwidth we are making available through 80legs. You'd have to spend on the order of 100 to 200 million dollars on a data center to start to get the performance we can offer. It's not something somebody is just going to be able to flip a switch and do, and there are only a few others in the world with the capacity to do what we do. This is really opening up the door for all the small guys who have been thinking about semantic search or text analysis, or who are looking for ways to apply technology, and not just limit themselves to something like a subset of Wikipedia. We're opening up the game.