Opening up the Feedly Digest Filtering Logic

This might sound a little cliché, but most of the improvements we have made over the last 6 months can be attributed, directly or indirectly, to feedback we received from users on GetSatisfaction or Twitter. One lesson we have learned is that the more transparent we are, the more interactions, feedback, and suggestions we get.

So today we decided to go one step further and open up the part of the feedly code that filters and sorts the recommendations displayed on the feedly digest page. Our hope is that feedly users who are also software developers will take a look at the algorithm and suggest ways to increase the relevancy of the filtering. If we see interest from the community, we might even make this part of feedly pluggable so that third parties can provide custom filters.

Here is the code (as of version 1.2.197):

// The entry passed here can be either a JSONEntry or a FakeEntry so we should NOT
// call any methods on the entry and only use the subset of information available
// in fake entry.
function scoreEntryPopulary( entry, context )
{
	var feedId = entry.feedId

	// PRUNE OUT ENTRIES WHICH ARE NOT GOING TO BE SELECTED ANYWAY.
	// We are not interested in recommending entries which belong to feeds the user
	// does not subscribe to. This is the role of the suggestion component.
	var sub  = reader.lookupSubscription( feedId );
	if( sub == null )
	{
		entry.popularity = -1;
		entry.stamp = "trash: not subscribed";
		return;
	}

	var favorite = sub.favorite == true;
	var unreadCount = sub.unreadCount != null ? sub.unreadCount : 1001;

	// Try to forecast if the entry has been read or not.
	if( unreadCount == 0 )
	{
		entry.popularity = -1;
		entry.stamp = "trash: most likely read";
		return;
	}

	// TITLE AND URL UNIQUENESS - Some feeds may publish duplicate articles
	// or we might have two entries for the same article. Here, we do our
	// best to remove those duplicates.
	if( entry.getAlternateLink != null )
	{
		var url = entry.getAlternateLink();
		if( context.uurl [ url ] != null )
		{
			entry.popularity = 0.00001;
			entry.stamp = "duplicate: based on link";
			return;
		}
		context.uurl [ url ] = true;
	}

	if( entry.getSourceTitle != null )
	{
		var extitle = entry.getTitle() + " -- " + entry.getSourceTitle();
		if( context.utitle [ extitle ] != null )
		{
			entry.popularity = 0.0001;
			entry.stamp = "duplicate: based on title";
			return;
		}
		context.utitle [ extitle ] = true;
	}

	// FRESHNESS
	// We want to promote a certain number of fresh articles into the mix.
	if( context.freshes == null )
		context.freshes = 0;

	// BUCKETS - what kind of entry is this? platinum, gold, silver or bronze.
	// The buckets are a metaphor we have put in place to help explain how
	// the filtering works. It has been helping us think about this problem
	// in simpler terms.
	if( context[ feedId ] == null )
		context[ feedId ] = { 	rank: 0,
					platinums: 0,
					golds: 0,
					silvers: 0,
					bronzes: 0
					}

	// DIVERSITY - the diversity factor helps us promote diversity
	// in each of the platinum, silver and bronze buckets. Diversity
	// in terms of sources being represented.
	var platinum_diversity = Math.pow( 1.5, context[ feedId ].platinums );
	var silver_diversity   = Math.pow( 2,   context[ feedId ].silvers   );
	var bronze_diversity   = Math.pow( 4,   context[ feedId ].bronzes   );

	// RANK
	// We want to know if this entry is the 1st, 2nd, ..., nth for a specific source.
	var rank               = context[ feedId ].rank++;

	// FRESHNESS FACTOR - The idea behind the freshness factor is to actually
	// help emerging articles have a chance to compete with more mature
	// articles and make sure that there is some fresh content available.
	// [TODO We should probably make the current 4, 8, 12 and 24 hour windows
	// a function of the timeframe selected by the user]
	var delta = new Date().getTime() - entry.lastModifiedTime;
	var freshness = 1;
	if( delta < 4 * 3600 * 1000 )
		freshness = 6;
	else if( delta < 8 * 3600 * 1000 )
		freshness = 5;
	else if( delta < 12 * 3600 * 1000 )
		freshness = 3;
	else if( delta < 24 * 3600 * 1000 )
		freshness = 2;
	else
		freshness = 1;

	// SOCIAL RECOMMENDATION FACTOR. How popular is this article within the
	// list of people you are following in Google Reader (see entry.network)?
	// How popular is this article in the feedly community at large
	// (entry.metadata.starred).
	var social = ( entry.network != null ? entry.network.length * 3 : 0 )
		   + ( entry.metadata.starred || 0 );

	// VELOCITY - Try to determine if the source is producing a lot of articles or
	// not. This is our way to promote what we call gems: new articles from low
	// throughput, favorite sources.
	var velocity = "other";
	if( sub.unreadCount > 50 )
		velocity = "firehose";
	else if( sub.favorite == true && sub.unreadCount > 25  )
		velocity = "normal";
	else if( sub.favorite )
		velocity = "rare";

	// ASSIGN POPULARITY - For the outside world, the key is the value of popularity
	// and as you will see with bronze articles, it does not have to be limited
	// to a finite set of values.
	// [TODO The limits we use to compare social recommendations should be relative
	// to the max number of recommendations per source per day - a number which
	// continuously grows as more and more users sign up and start recommending
	// articles]
	if( velocity == "rare" && rank == 0 && freshness >= 5 )
	{
		// platinum
		entry.popularity = 200000; // platinum
		entry.stamp = "platinum: first recent from rare";

		context[ feedId ].platinums++
	}
	else if( velocity == "firehose"
		 && social * freshness / platinum_diversity >  30 )
	{
		// platinum
		entry.popularity = 200000; // platinum
		entry.stamp = "platinum: very popular from firehose";

		context[ feedId ].platinums++
	}
	else if( velocity == "normal"
	         && social * freshness / platinum_diversity >  18 )
	{
		// platinum
		entry.popularity = 200000; // platinum
		entry.stamp = "platinum: very popular from normal";

		context[ feedId ].platinums++
	}
	else if( context.freshes <= 5 && sub.favorite == true
	         && rank == 1 && freshness >= 5 && social > 0 )
	{
		// this is to make sure that there is always some fresh content on the page.
		entry.popularity = 100000; // gold
		entry.stamp = "gold: first, very recent from favorite";  

		context[ feedId ].golds++
	}
	else if( context.freshes <= 3 && sub.favorite == true
	         && rank == 1 && freshness >= 5 )
	{
		// this is to make sure that there is always some fresh content on the page.
		entry.popularity = 100000; // gold
		entry.stamp = "gold: first, very recent from favorite";  

		context[ feedId ].golds++
	}
	else if( velocity == "rare" && rank <= 1 )
	{
		// silver
		entry.popularity = 50000; // silver
		entry.stamp = "silver: recent from rare";

		context[ feedId ].silver++
	}
	else if( velocity == "firehose"
	         && social * freshness / silver_diversity >  10 )
	{
		// silver
		entry.popularity = 50000; // silver
		entry.stamp = "silver: popular from firehose";

		context[ feedId ].silvers++
	}
	else if( velocity == "normal"
	         && social * freshness / silver_diversity >  5 )
	{
		// silver
		entry.popularity = 50000; // silver
		entry.stamp = "silver: popular from normal";

		context[ feedId ].silvers++
	}
	else if( velocity == "rare" && rank > 0 && freshness >= 2 )
	{
		// bronze
		entry.popularity = 0; // bronze
		entry.stamp = "bronze: second recent from rare";

		context[ feedId ].silver++
	}
	else
	{
		entry.popularity = social / Math.pow( 1.5, rank );
		entry.stamp = "bronze";

		context[ feedId ].bronzes++
	}

	if( freshness >= 5 )
		context.freshes++;
}
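
To make the interaction between freshness and diversity more concrete, here is a small worked sketch. It copies the freshness tiers and the firehose platinum test (social * freshness / platinum_diversity > 30) from the listing above. The helper names `freshnessFor` and `platinumSocialFloor` are hypothetical and are not part of the feedly code.

```javascript
// Freshness tiers copied from the listing: entry age in milliseconds -> factor.
function freshnessFor(ageMs) {
  const HOUR = 3600 * 1000;
  if (ageMs < 4 * HOUR) return 6;
  if (ageMs < 8 * HOUR) return 5;
  if (ageMs < 12 * HOUR) return 3;
  if (ageMs < 24 * HOUR) return 2;
  return 1;
}

// Hypothetical helper: the social score a firehose entry must exceed to be
// stamped platinum, given how many platinums its feed already has.
// Rearranged from: social * freshness / Math.pow(1.5, platinums) > 30
function platinumSocialFloor(freshness, platinumsSoFar) {
  return 30 * Math.pow(1.5, platinumsSoFar) / freshness;
}

const fresh = freshnessFor(2 * 3600 * 1000); // a two hour old entry
console.log(platinumSocialFloor(fresh, 0)); // 5
console.log(platinumSocialFloor(fresh, 1)); // 7.5
console.log(platinumSocialFloor(fresh, 2)); // 11.25
```

Each platinum already awarded to a feed raises the bar for the next entry from that feed by a factor of 1.5, which is how the diversity factor spreads the top bucket across sources.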

Note 1: If you open your feedly digest and hover your mouse over the title of an article, you should be able to see the stamp assigned to that article.

Note 2: This function is executed in real time every time feedly needs to determine the content of the digest. A pre-filtering step happens automatically based on how recent articles are, whether they come from favorite sources, how many recommendations they have, etc. Depending on the number of sources you subscribe to, how popular they are, how many unread articles you have and the number of people you are following, the list of entries to filter can vary between 200 and 2,500 items.
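
The code that calls the scoring function is not shown in this post, so the following is only a minimal sketch of how such a driver might look. `buildDigest`, `demoScorer` and the descending sort are assumptions. The one detail taken from the listing is that `context.uurl` and `context.utitle` are read without being created, so a caller has to initialize them before the first entry is scored.

```javascript
// Hypothetical driver; the real feedly caller is not shown in the post.
function buildDigest(entries, scoreFn, limit) {
  // One context is shared across all entries so that rank, diversity and
  // duplicate detection can accumulate from one entry to the next.
  const context = { uurl: {}, utitle: {} };

  for (const entry of entries) {
    scoreFn(entry, context);
  }

  return entries
    .filter((entry) => entry.popularity >= 0) // assumption: drop "trash" (-1)
    .sort((a, b) => b.popularity - a.popularity) // assumption: highest first
    .slice(0, limit);
}

// Stand-in scorer for demonstration only; in feedly this role would be
// played by the scoring function from the listing.
function demoScorer(entry, context) {
  entry.popularity = entry.feedId === "unsubscribed" ? -1 : entry.likes;
}

const digest = buildDigest(
  [
    { feedId: "a", likes: 2 },
    { feedId: "unsubscribed", likes: 9 },
    { feedId: "b", likes: 7 },
  ],
  demoScorer,
  10
);
console.log(digest.map((entry) => entry.feedId)); // [ 'b', 'a' ]
```

Because rank and the diversity counters depend on call order, the order in which entries are passed to the scorer matters. The post does not say which order feedly uses.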

This idea was inspired by a comment made by Sean McBride on Friendfeed last week: “I wish that every company in this game would make their algorithms transparent so that we could understand what’s underneath the hood in producing certain results”, and by the progress we made last week working with Phil on this thread on Get Satisfaction. Special thanks to both Phil and Sean.

Thank you again to the feedly community for your support over the last 6 months. We look forward to your suggestions. Please let us know if you have any questions.

Author: @feedly


8 thoughts on “Opening up the Feedly Digest Filtering Logic”

  1. I’m looking over the logic and I have a question. Does feedly track or have any access to user and feed statistics that might assist in selecting articles? I’m thinking of something like %read, #read, # of articles per day by feed for a couple of relevant time horizons. This information is on google reader, can feedly access it?

    In the current logic the user’s interests aren’t playing the role they might in influencing top articles.

    I’ll have some suggestions within the current constraints after I think about it a little more, but if this information is available it opens up a lot of possibilities.

  2. Hello Phil. Although we are not using it in the latest versions of the filtering logic, the subscription object has a field which captures how many articles of that subscription you have read and shared in the last 30 days. Note: the read number is not as reliable as the shared number because there are many ways to mark an article as read and some are less explicit than others.

  3. Edwin,

    I could make suggestions around the edges, but I couldn’t really figure a minor modification that achieved what I wanted. What do you think about something like the following? (though appropriate scaling would be needed):
    Popularity =
    (Fav. Factor)*(#Read/(Unread count+1))*(freshness – rank + social)

    Then select highest popularity articles. Max one article per feed until a minimum popularity cutoff is reached.

    Phil
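
The formula proposed in the comment above could be sketched roughly as follows. This is one possible reading of the suggestion, not feedly code: the field names (`readCount`, `unreadCount`, `freshness`, `rank`, `social`), the favorite factor of 2, and the decision to stop selecting once the cutoff is reached are all assumptions.

```javascript
// Popularity = (Fav. Factor) * (#Read / (Unread count + 1)) * (freshness - rank + social)
// Field names and the favorite factor of 2 are illustrative assumptions.
function proposedPopularity(sub, entry) {
  const favFactor = sub.favorite ? 2 : 1;
  const readRatio = sub.readCount / (sub.unreadCount + 1);
  return favFactor * readRatio * (entry.freshness - entry.rank + entry.social);
}

// Highest popularity first, at most one article per feed, and stop once
// the remaining articles fall below the minimum popularity cutoff.
function selectTop(scored, minPopularity) {
  const seenFeeds = new Set();
  const picked = [];
  const byScore = scored.slice().sort((a, b) => b.popularity - a.popularity);
  for (const item of byScore) {
    if (item.popularity < minPopularity) break;
    if (seenFeeds.has(item.feedId)) continue;
    seenFeeds.add(item.feedId);
    picked.push(item);
  }
  return picked;
}

const score = proposedPopularity(
  { favorite: true, readCount: 10, unreadCount: 4 },
  { freshness: 6, rank: 0, social: 3 }
);
console.log(score); // 36
```

As the comment notes, appropriate scaling would still be needed, since the read ratio and the additive term are on very different scales.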

  4. The algorithm being used seems to work ok, but it just doesn’t really do much better for me than to identify a set of relatively more interesting feeds with a recent post and give something from a few of them. It’s improved, but ideally would be smarter.

    There are a couple of minor issues with the algorithm being used.

    1) There are a couple of places where context[].silver is used instead of context[].silvers.

    2) The last context[].silver should probably be bronzes, and why is the popularity set to zero? That’s worse than suspected duplicates.

    3) I notice that when feedly runs out of platinum I am more likely to get multiple articles from the same feed (silvers). I’m not really clear on the diversity factor but it looks like it works better in platinum than in silver. Do I read it correctly that there is less diversity required in lower rank buckets than in higher ones? I would think that there should be similar diversity until you get to bronze.

  5. Phil,

    Thanks for both comments.

    I will need to carve out some time to think more about the formula you suggest in your first comment.

    Regarding the second comment, fixed 1) and 2). Regarding 3) will try to increase the diversity factor for silvers and bronze to 3 and 6:

    var silver_diversity = Math.pow( 3, context[ feedId ].silvers );
    var bronze_diversity = Math.pow( 6, context[ feedId ].bronzes );

    The goal of the formula is to promote more diversity in the lower buckets (bronze and silver) than in the platinum one. The logic is that if quality is not there, let’s make sure that diversity is. All these changes will be included in the 1.2.200 patch we will be pushing out later this evening.
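
To see what raising the silver base from 2 to 3 does, the firehose silver test from the listing (social * freshness / silver_diversity > 10) can be rearranged into the score an entry has to beat. The helper name `requiredScore` is hypothetical; the thresholds and bases come from the listing and from the comment above.

```javascript
// social * freshness must exceed threshold * base^alreadyPicked, where
// alreadyPicked is the number of silvers the feed has already received.
function requiredScore(threshold, base, alreadyPicked) {
  return threshold * Math.pow(base, alreadyPicked);
}

for (let n = 0; n < 4; n++) {
  console.log(n, requiredScore(10, 2, n), requiredScore(10, 3, n));
}
// 0 10 10
// 1 20 30
// 2 40 90
// 3 80 270
```

With base 3, a third silver from the same firehose feed needs a combined score above 90 instead of 40, so repeats from one source become much less likely. In the listing as posted, `bronze_diversity` is computed but not used in any branch, so only the silver change would have an effect there.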

  6. Regarding the numbered suggestions above, I actually had a duplicate show up in a category top section today. So I suspect the part of (2) noting that a bronze’s zero would rank lower than a duplicate (at 0.00001) wasn’t entirely fixed.
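
The ordering issue raised in points (2) and (6) can be illustrated with the numeric values that appear in the listing. The sketch assumes the digest sorts entries by descending popularity, which the post does not state explicitly.

```javascript
// Popularity values taken from the listing for each stamp.
const samples = [
  { stamp: "bronze: second recent from rare", popularity: 0 },
  { stamp: "duplicate: based on link", popularity: 0.00001 },
  { stamp: "duplicate: based on title", popularity: 0.0001 },
  { stamp: "trash: not subscribed", popularity: -1 },
];

// Assumption: highest popularity is displayed first.
const ranked = samples.slice().sort((a, b) => b.popularity - a.popularity);
console.log(ranked.map((sample) => sample.stamp));
// [ 'duplicate: based on title',
//   'duplicate: based on link',
//   'bronze: second recent from rare',
//   'trash: not subscribed' ]
```

Under that assumption, both kinds of suspected duplicate outrank a legitimate second article from a rare source, which is consistent with a duplicate surfacing in a sparsely populated section.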
