Tuesday, October 26, 2010

SEO/SEM is no longer enough. Get ready for NFO/NFM.


The advent of social networking tools, & especially Facebook, has changed the arena of online marketing.
Brands & websites that used to worry about getting higher page ranks in popular search engines now have one more challenge, popularly called NFO – News Feed Optimization (named after Facebook's News Feed page).

Earlier, a significant amount of the online advertising budget of various sites used to go to Search Engine Marketing initiatives, but now marketing managers also have to allocate money for one more channel, called NFM – News Feed Marketing.

SEO vs. NFO
- SEO is all about making Google & the other search engines think that you're authoritative on a given topic & deserve to be listed highly in their search results.
- NFO is about making Facebook think that the Feed items published from your application are relevant to your followers & hence should be displayed in their News Feeds.

SEM vs. NFM
- SEM is all about paying Google & the other search engines for particular keywords so that your site link appears on the search result pages for those keywords.
- NFM is paying Facebook to insert your ad next to Feed items that it thinks are relevant to your product/service, shown to a very targeted audience that might have an interest in it.

Until recently, search engines were the most important tool for gaining visibility with users, but not anymore. Social networking websites now hold a very important place in people's lives, as that is where they spend most of their time online.

People spend three times as much time on Facebook as on Google.

For any marketer, NFM is relatively easy to do: he or she just needs to pay Facebook, which will place the ad at relevant spots in the News Feed & guarantee a certain quantity of traffic.

But News Feed Optimization is a bit trickier & needs a lot more effort & energy. Similar to Google's PageRank algorithm, Facebook has its own News Feed algorithm, called EdgeRank, which is not fully disclosed to the public.

At a high level, the EdgeRank formula is fairly straightforward.

Every item that shows up in your News Feed is considered an Object. If you have an Object in the News Feed (say, a status update), whenever another user interacts with that Object they create what Facebook calls an Edge, which includes actions like tags and comments.

                                       EdgeRank = Σ over all edges e of (u_e × w_e × d_e)

where
u_e – affinity score between the viewing user & the edge creator
w_e – weight for this edge type (create, comment, like, tag, etc.)
d_e – time decay factor based on how long ago the edge was created
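To make the formula concrete, here is a minimal Python sketch of an EdgeRank-style score. The weights and the hyperbolic decay curve are my own assumptions for illustration; Facebook's actual values have never been published.

```python
# Hypothetical edge-type weights -- NOT Facebook's real numbers.
EDGE_WEIGHTS = {"create": 1.0, "like": 0.5, "comment": 1.5, "tag": 2.0}

def edge_rank(edges, now):
    """Sum affinity * weight * time-decay over all edges of one Object.

    edges: list of (affinity, edge_type, created_at_seconds) tuples.
    now:   current time in seconds.
    """
    score = 0.0
    for affinity, edge_type, created_at in edges:
        age_hours = (now - created_at) / 3600.0
        decay = 1.0 / (1.0 + age_hours)  # assumed decay curve: newer edges count more
        score += affinity * EDGE_WEIGHTS[edge_type] * decay
    return score
```

For example, a fresh like from a close friend (high affinity, zero decay) can outscore an hour-old comment from a distant acquaintance, which is exactly why the same status update ranks differently in different users' Feeds.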

Moreover, gaming the News Feed is going to be harder than gaming Google's PageRank algorithm because of the personalized nature of Feed item selection. Because so many components of EdgeRank depend on individual user behavior, there is only so much you can do as an application developer to boost your Feed item's score across the board.
The best one can do is design rich, engaging Feed items that convert well & are viral in nature.
On top of that, Feed item elements like the title, body & images should be optimized & made genuinely interesting. The feed application should also be designed to provoke some action from the user, nudging him to install & share the application.

Well, in my personal opinion, the battle for News Feed Optimization has yet to show its true colors, as marketers are still struggling to use Facebook as an effective marketing tool for their products & services. But of course, a smart marketer must be aware of the nitty-gritty of this field.

Sunday, October 10, 2010

How fast can you scale?

The first & foremost thing that any popular web application today tries to achieve is to provide its users a memorable & seamless UI experience. But do we realize what this great popularity on the Internet means in terms of traffic, usage, or pure numbers?
Just to bring your attention to some usage statistics from a few popular sites:

-  Twitter handles 1 billion queries per day & sees 50 million tweets per day, which means about 600 tweets per second
-  Facebook gets 60 million status updates per day & 2.5 billion photos uploaded each month
-  Google processes 2 billion search queries per day
-  YouTube gets 5 billion video streams every month & 15 hours of video uploaded every minute
-  Flickr now hosts more than 4 billion images

But do you think it's easy for all these websites to handle these millions & billions of user requests each day & still give a seamless & uninterrupted browsing experience?
This thought made me really curious to study the web architecture of some of these websites that see massive traffic every day. What I found was that all of them have almost similar architectures, with minor differences, that have resulted in such robust & efficient systems.

The very backbone of all these websites is the popular LAMP architecture. LAMP is an acronym for a solution stack of free, open-source software, originally coined from the first letters of Linux (operating system), Apache HTTP Server, MySQL (database software) and PHP/Perl/Python (scripting languages). The combination has become popular because it is free of cost, open-source, and therefore easily adaptable. Moreover, almost every distribution of Linux includes Apache, MySQL, PHP, and Perl, so installing the LAMP stack is almost as easy as saying it.

Now, to appreciate the most sophisticated web architectures, one should also be aware of the simpler & less efficient solutions. So let me walk through the different solutions built around this basic LAMP architecture.


1. One “Box” Solution


- Basic Web application
- Low Traffic
- Apache/MySQL/PHP on one machine
- Bottlenecks are Disk I/O and context switching







2. Two "Box" Solution

- Higher traffic application
- Apache/PHP on box A and MySQL on box B 
- Bottlenecks are Disk I/O & Network I/O







3. Many "Boxes" Solution with Replication

- Yet higher traffic
- Apache/PHP on box A & MySQL on many boxes
- Writes are separated from reads. Web applications have a read/write ratio of somewhere between 80/20 and 90/10.
- Master gets Insert/Update/Delete & slaves get Select
- Load Balancing is used

Bottleneck
- Slaves can’t keep up with replication, as they are too busy reading (production traffic) and writing (replication)
- This manifests as replication lag: comments/photos/any user-entered data don’t show up on the site right away, so users repeat the action thinking it didn’t happen the first time, making the situation worse



4. Many "Boxes" Solution with Hardware Load Balanced MySQL
- Standard MySQL master/slave replication
- All writes (inserts/updates/deletes) from application go to Master 
- All reads (selects) from application go to a load balanced VIP (virtual IP) spreading out load across all slaves
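The read/write routing rule above can be sketched in a few lines of Python. The connection objects & the SQL-verb check here are simplified stand-ins for a real database driver & the load-balanced VIP in front of the slaves:

```python
import random

class ReplicatedDB:
    """Route writes to the master & reads to a slave.

    In production the slave would be picked by a hardware load balancer
    behind a VIP; random choice keeps this sketch simple.
    """

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def route(self, sql):
        """Return the connection that should execute this statement."""
        verb = sql.strip().split()[0].upper()
        if verb in ("INSERT", "UPDATE", "DELETE"):
            return self.master              # all writes go to the master
        return random.choice(self.slaves)   # SELECTs spread across slaves
```

With an 80/20 or 90/10 read/write ratio, most statements land on the slave pool, so adding a slave directly adds read capacity without touching the application.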



Benefits of Load Balancing
- Add/remove slaves without affecting applications
- Additional monitoring point & some automatic failure handling
- Capacity planning is a lot easier if the ceiling of each slave is known

Some more add-ons or tweaks
- The Web Server (machine with Apache/PHP) can also be scaled up with multi-threading.
- Increase app servers too, with a load balancing mechanism, along with DB clusters (each cluster has master-slave DBs of its own).
- Add caches to your app servers. Cache your static content; SQUID is good.
- Also use Memcached, a distributed memory caching solution.
- Use RAID 10 instead of RAID 5, as it has more read capacity & a lower write penalty.
- Use interleaved memory.
- Choose SCSI hard drives over SATA.
- Use MySQL with a SAN (Storage Area Network). A SAN is better than RAID.
- Use CDNs (Hello Akamai)
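The Memcached tip above boils down to the cache-aside pattern: check the cache first, hit the database only on a miss, then populate the cache. Here is a minimal Python sketch, where the dict stands in for a Memcached cluster & `db_lookup` (a name I've made up) for the expensive MySQL query:

```python
class CacheAside:
    """Minimal cache-aside pattern in front of a database lookup."""

    def __init__(self, db_lookup):
        self.cache = {}               # stand-in for a Memcached cluster
        self.db_lookup = db_lookup    # stand-in for a MySQL query
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1            # served from memory, no DB round trip
            return self.cache[key]
        self.misses += 1
        value = self.db_lookup(key)   # expensive DB query only on a miss
        self.cache[key] = value
        return value

    def invalidate(self, key):
        self.cache.pop(key, None)     # call on writes so stale data isn't served
```

The payoff is that with a read-heavy workload, the vast majority of requests never reach MySQL at all, which is exactly the pressure relief the slave tier needs.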


Hence, if we look at the overall picture of a good & robust architecture behind any major website, we can easily generalize it into something like what I've shown below.