Online editing will take a decade, but it’s time for sharing office docs

With all the hype around Google Docs, it’s good to finally see some stats about their market penetration. TechCrunch today reported results of a survey conducted by NPD: 73% of Americans have never heard of a web-based Office suite (e.g., Google Docs), 94% have never tried one, and only 0.5% have actually switched to one. The survey results don’t surprise me. In fact, I would have guessed that even fewer people would have tried or heard of a web-based Office suite.

In a year of running SlideShare, we have realized how particular people are about the end look & feel of their Office documents (especially their presentations). If SlideShare does not render an image or a font, or messes up some graphics or charts, then our users tell us about it! We get complaints about the particular shade of purple, and the title that does not look quite right. This makes sense: people work hard on their documents and they want the finished product to have a certain look and feel. I have tried several online Office authoring apps, and while they are great when I need to collaborate during the creation process, the latency is annoying when you are working on the document yourself, and the application feels hopelessly limited when you are in the final (production) phases. I have used Google Docs & Zoho for word-processing and spreadsheet documents with some success, but was not able to make much headway with Google Presently because the look and feel are much more important for presentations (and no, I don’t see Presently as competition – reasons are described below).

This has been our hypothesis from the beginning – that the tools and mindset are not quite there for a large-scale shift to online authoring of Office documents. But there is no such barrier to sharing documents online. SlideShare was the first office document sharing site on the web. We started with the premise that people want a quick and easy path to sharing their presentation documents. And so far, it seems like people do want to do that (look at SlideShare stats below).

Ratio of creators to viewers for SlideShare

I was just running some stats for SlideShare and realized that the ratio of creators (people who upload slideshows) to viewers (people who visit SlideShare.net) is just south of 1%. This fits well with Bradley Horowitz’s Content Production Pyramid, with some caveats.

First, Bradley also talks about the synthesizers. I have yet to calculate those numbers for SlideShare. However, SlideShare is an active bookmarking community (we have 2.7 tags per slideshow), so those numbers are likely to be meaningful. But a lot of the synthesis is also happening on the web. As people link to and embed slideshows, they add metadata about those slides. Some of the metadata is captured on SlideShare (e.g., we link back to all the embeds). But a lot of it cannot be captured easily.

Secondly, the number of viewers is probably an underestimate in an era of widgets. Slideshows are embedded all over the web. Each embed leads to more views, which our system does not directly capture.
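To make the second caveat concrete, here is a minimal sketch of how uncounted embed viewers would pull the real creator-to-viewer ratio below the on-site figure. The function name and every number in it are hypothetical; the only figure from our stats is that the on-site ratio is just under 1%.

```typescript
// Hypothetical sketch: the counts below are made up for illustration only.
// creators      = people who upload slideshows
// onSiteViewers = viewers counted on the site itself
// embedViewers  = viewers reached through embeds, which the site does not count
function creatorRatio(
  creators: number,
  onSiteViewers: number,
  embedViewers: number = 0
): number {
  const totalViewers = onSiteViewers + embedViewers;
  if (totalViewers <= 0) {
    throw new Error("need at least one viewer");
  }
  return creators / totalViewers;
}

// On-site numbers alone give a ratio just under 1%.
const onSiteOnly = creatorRatio(900, 100000);
// Adding an assumed 50,000 uncounted embed viewers lowers the ratio.
const withEmbeds = creatorRatio(900, 100000, 50000);
console.log(onSiteOnly, withEmbeds);
```

The point is only directional: whatever the true embed audience is, counting it can only make the creator share smaller.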

Google Analytics and problems with AJAX, Flash sites

I spent part of my weekend trying to understand Google Analytics (GA) – mostly why GA shows such low engagement metrics for SlideShare. Every other measure tells us engagement is much higher. Finally I figured out the reason: we use a lot of AJAX and Flash, and our media files are served from Amazon S3. So you can view a slideshow for half an hour on SlideShare, and you can comment, favorite and do many other activities, and none of them will get recorded in Google Analytics, which only records page-to-page movement, and only for actions that happen on SlideShare.net (all the slide activity on Amazon S3 is not being captured!).

We started using GA recently and just did an out-of-the-box install. To give it credit, GA is very convenient, and is rapidly becoming a standard for site statistics. But its out-of-the-box install does not account for the way many modern websites work.

– Distributed Infrastructure: File serving from Amazon S3 is common
– Flash based for media files
– Lots of AJAX for on the page interaction

After spending time on the problem (including reading this book), we have figured out workarounds for most of the issues. And while GA is flexible enough to accommodate us, it does not make it easy. Out of the box, it seems set up for old-school HTML pages where you move from page to page, rather than for mini-actions within a page. Also, many of the options seem to be for ecommerce sites (tracking steps through an ecommerce funnel etc.) rather than for social (Web 2.0, to use a clichéd term!) sites.
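One common shape for this kind of workaround is to report each in-page action as a "virtual" page path, so that a tool built around page-to-page movement can still see it. The sketch below is an illustration of that idea, not our actual integration: the path scheme, the action types, and the names `virtualPath` and `makeActionTracker` are all hypothetical, and `sendPageview` stands in for whatever pageview call your analytics snippet exposes.

```typescript
// Hypothetical sketch: none of these names come from Google Analytics itself.
// `SendPageview` is a stand-in for the pageview call of your analytics snippet.
type SendPageview = (path: string) => void;

// In-page actions that never trigger a real page load (AJAX or Flash activity).
type InPageAction =
  | { kind: "slide"; slideshowId: string; slideNumber: number }
  | { kind: "comment"; slideshowId: string }
  | { kind: "favorite"; slideshowId: string };

// Map an in-page action to a made-up path so it shows up like a page in reports.
function virtualPath(action: InPageAction): string {
  const base = `/virtual/slideshow/${encodeURIComponent(action.slideshowId)}`;
  if (action.kind === "slide") {
    return `${base}/slide/${action.slideNumber}`;
  }
  return `${base}/${action.kind}`;
}

// Build a tracker that forwards every action to the supplied pageview function.
function makeActionTracker(sendPageview: SendPageview) {
  return (action: InPageAction): void => {
    sendPageview(virtualPath(action));
  };
}

// Usage: here the "analytics call" just collects paths in an array.
const recorded: string[] = [];
const track = makeActionTracker((path) => {
  recorded.push(path);
});
track({ kind: "slide", slideshowId: "my-talk", slideNumber: 3 });
track({ kind: "comment", slideshowId: "my-talk" });
console.log(recorded);
```

Passing the pageview function in, rather than calling the analytics library directly, keeps the mapping testable and makes it easy to call the same tracker from AJAX handlers or from a Flash player's callbacks into the page.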

My advice – if you are going to use Google Analytics, spend some time upfront to understand how to customize both the analytics code and your own site. Don’t begin collecting data before you do that, or you will get a very biased picture of your site. Also, to do it right, you will end up integrating with GA much more deeply than simply placing some JavaScript in your pages.

Running a data-driven social software site

SlideShare began with an idea. We built it on instinct, launched it. People liked it, it grew. Our active users wrote to us, blogged, sent us feedback about what they liked or did not like. We took that into account as we planned features.

It’s great to listen to active users, but it can bias you towards the superuser. As our userbase grows, we need to take into account the different user segments, including the people who don’t blog or write to us.

There is only one way to keep all the user types in mind and to end the endless debates within the company about what we need to do next. And that is to be driven by data. Period.
