Conv Primer: Webhits and Publication Dates
by Brittain on Dec 19, 2006 - 01:21 PM read 1268 times
Source: http://www.kalivo.com/convs/show/376


Purpose and Methodology 

The publication date on a webhit can be a source of confusion; this primer explains how these dates are determined.

The example below shows a webhit published on "Nov 02, 2006".

[Screenshot: a webhit whose publication date reads "Nov 02, 2006"]

But what does this mean? Unfortunately, the answer depends on the channel that located the webhit. Because there is no standard way to date-stamp a web page, Kalivo applies several techniques, in order, to determine a webhit's publication date:

  • First, the Kalivo Listener queries the channel for the publication date. 
    • The Yahoo channel returns a date for each webhit; however, this date is when the web page entered the Yahoo index, not necessarily when the page was originally published.
    • Most feeds return actual publication dates, although these can vary slightly by source (e.g., some feeds keep the original publication date, while others update it when the content changes). Some feeds provide no publication date at all.
    • The Google channel does not return a date for each webhit, but it only returns webhits that fall within the date range you supply in the Listener.
    • The Risk Center and GARP channels return publication dates by parsing the web page.
  • Second, if the channel does not return a date, Kalivo examines the web page's metadata (such as its HTTP headers) for a publication date. This method yields an actual publication date for fewer than 40% of web pages.
  • Finally, if no publication date has been found, the publication date is set to the current date.
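
The fallback order above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not Kalivo's actual code: the function names `date_from_headers` and `resolve_publication_date` are invented here, and the sketch assumes the metadata check reads the standard HTTP `Last-Modified` header, which the primer does not specify. Each result is tagged with the step that produced it, so a fallback date can be told apart from one the channel supplied.

```python
from datetime import date
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional, Tuple


def date_from_headers(headers: Mapping[str, str]) -> Optional[date]:
    """Hypothetical step 2: look for a date in the page's HTTP headers.

    Assumption: the standard Last-Modified header stands in for whatever
    page metadata Kalivo actually inspects.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get("last-modified")
    if not value:
        return None
    try:
        return parsedate_to_datetime(value).date()
    except (TypeError, ValueError):
        # Malformed date string (older Pythons raise TypeError, newer ValueError).
        return None


def resolve_publication_date(
    channel_date: Optional[date],
    headers: Mapping[str, str],
    today: Optional[date] = None,
) -> Tuple[date, str]:
    """Hypothetical fallback chain: channel date, then page metadata, then today.

    Returns the chosen date plus a label recording which step produced it.
    """
    # Step 1: use a date supplied by the channel (Yahoo, feeds, GARP, ...).
    if channel_date is not None:
        return channel_date, "channel"
    # Step 2: try the page's metadata.
    header_date = date_from_headers(headers)
    if header_date is not None:
        return header_date, "page metadata"
    # Step 3: nothing found, so default to the current date.
    return (today if today is not None else date.today()), "fallback: today"


if __name__ == "__main__":
    print(resolve_publication_date(date(2006, 11, 2), {}))
    # (datetime.date(2006, 11, 2), 'channel')
    print(resolve_publication_date(None, {"Last-Modified": "Thu, 02 Nov 2006 10:00:00 GMT"}))
    # (datetime.date(2006, 11, 2), 'page metadata')
    print(resolve_publication_date(None, {}, today=date(2006, 12, 19)))
    # (datetime.date(2006, 12, 19), 'fallback: today')
```

A channel such as Google, which returns no per-webhit date, would pass `None` as `channel_date` and fall through to the metadata check; keeping the source label alongside the date makes it easy to treat "fallback: today" dates with suspicion.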

Implications and Recommendations 

Simply put, this methodology means publication dates vary in accuracy, and you should take this into account when managing webhits.

Kalivo offers the following recommendations:

  • For highly date-sensitive uses, use Feeds or restrict Listeners to the Yahoo, GARP, or Risk Center channels. These channels return dates directly, although Yahoo's date is when the page entered its index rather than its original publication date.
  • For uses that require accurate search intervals (e.g., only listening for webhits from the past month), you can also include the Google channel, since it only returns webhits within the date range you supply.
  • If you encounter a webhit you'd like to keep in your Hub, you can always correct its publication date by editing the conversation (via the Advanced Options section):

[Screenshot: the Advanced Options section of the conversation editor]

  • Lastly, if you are unsure of a webhit's publication date, you can always visit its source by clicking the source link or the URI title, as seen above.
