Tech Support Websites


Monday, 13 August 2012

How YouTube, Wikipedia use machine moderation for crowdsourced content

Posted on 20:03 by Unknown

Search engines rely on bots to index web pages, but did you know that Wikipedia uses more than 700 active bots to keep its content clean? Wikipedia, which is 50 times larger than the Encyclopaedia Britannica, currently has 4,005,000 articles in the English edition and 22.8 million articles in 285 language editions. The bots delete vandalism and foul language, organise and catalogue entries, and handle the reams of behind-the-scenes work that keep the encyclopaedia running smoothly and efficiently.

BBC News Magazine has a neat writeup on what these bots do:

  • "Interwiki" bots link articles on the same subject in different languages
  • Flag potential copyright violations and other irregularities for human review
  • Add dates to "cleanup" tags so human editors know what needs attention
  • Add articles to category lists, and lists of categories to articles
  • Format and repair citations and references
  • Compare ISBN numbers
  • Flag images that need more licensing details
Behind the scenes:

  • Maintain Wikipedia archives
  • Handle evidence in arbitration and administrative matters
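
To make one of these chores concrete, here is a minimal sketch of how a bot might add dates to "cleanup" tags. It is only an illustration, not the code any real Wikipedia bot runs: the `{{cleanup}}` tag spelling, the `date=Month Year` parameter format and the `date_cleanup_tags` helper are all assumptions made for the example.

```python
import re
from datetime import date

# English month names are spelled out here so the result does not depend on locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# Matches an undated tag such as {{cleanup}} or {{Cleanup}}.
# Tags that already carry parameters (e.g. {{cleanup|date=...}}) do not match.
# The tag name and parameter format are assumptions for this illustration.
UNDATED_TAG = re.compile(r"\{\{\s*(cleanup)\s*\}\}", re.IGNORECASE)


def date_cleanup_tags(wikitext: str, today: date) -> str:
    """Return wikitext with a date parameter added to every undated cleanup tag."""
    stamp = f"{MONTHS[today.month - 1]} {today.year}"
    return UNDATED_TAG.sub(
        lambda m: "{{" + m.group(1) + "|date=" + stamp + "}}", wikitext
    )


page = "{{cleanup}}\nSome article text. {{Cleanup|date=May 2012}}"
print(date_cleanup_tags(page, date(2012, 8, 13)))
```

Only the undated tag is touched; a tag that already has a date is left alone, so human editors can see how long each page has been waiting for attention.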

Bots have been around almost as long as Wikipedia itself.
The site was founded in 2001, and the next year a bot called rambot created about 30,000 articles, at a rate of thousands per day, on individual towns in the US.
The bot pulled data directly out of US Census tables. The articles read as if they had been written by a robot. They were short and formulaic and contained little more than strings of demographic statistics.
But once they had been created, human editors took over and filled out the entries with historical details, local governance information, and tourist attractions.
In 2008, another bot created thousands of tiny articles about asteroids, pulling a few items of data for each one from an online NASA database.
Another bot, known as ClueBot NG, resides on a computer from which it sallies forth into the vast encyclopaedia to detect and clean up vandalism almost as soon as it occurs.

YouTube relies on its automated copyright detection system, Content ID, to verify whether an uploaded video was in fact posted by its owner. It compares each upload against all the reference files in its database.

The scale and speed of this system is truly breathtaking -- we're not just talking about a few videos, we're talking about over 100 years of video every day between new uploads and the legacy scans we regularly do across all of the content on the site. And when we compare those 100 years of video, we're comparing it against millions of reference files in our database. It'd be like 36,000 people staring at 36,000 monitors each and every day without as much as a coffee break.
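
The "36,000 people" figure can be checked with a little arithmetic. The sketch below assumes a 365-day year and a viewer who watches around the clock; both are simplifications, not figures from YouTube.

```python
DAYS_PER_YEAR = 365   # assumption: leap years ignored
HOURS_PER_DAY = 24

video_years_per_day = 100  # figure quoted above

# Total hours of video scanned in a single day.
video_hours_per_day = video_years_per_day * DAYS_PER_YEAR * HOURS_PER_DAY

# One person watching non-stop covers 24 hours of video per day.
viewers_needed = video_hours_per_day // HOURS_PER_DAY

print(video_hours_per_day)  # 876000
print(viewers_needed)       # 36500
```

That is 36,500 round-the-clock viewers, which is where the rounded figure of 36,000 comes from.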

The official documentation explains how the system works:
If Content ID identifies a match between a user upload and material in the reference library, it applies the usage policy designated by the content owner. The usage policy tells the system what to do with the video. Matches can be to only the audio portion of an upload, the video portion only, or both.
There are three usage policies -- Block, Track or Monetize. If a rights owner specifies a Block policy, the video will not be viewable on YouTube. If the rights owner specifies a Track policy, the video will continue to be made available on YouTube and the rights owner will receive information about the video, such as how many views it receives. For a Monetize policy, the video will continue to be available on YouTube and ads will appear in conjunction with the video. The policies can be region-specific, so a content owner can allow a particular piece of material in one country and block the material in another. 
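
The policy rules described above can be sketched as a small lookup. This is not YouTube's implementation or API: the names `Policy`, `UsagePolicy`, `Outcome` and `apply_policy` are hypothetical, and the handling of unmatched uploads and of statistics under Monetize are assumptions, since the documentation quoted here does not cover them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Policy(Enum):
    BLOCK = "block"
    TRACK = "track"
    MONETIZE = "monetize"


@dataclass
class UsagePolicy:
    """A rights owner's policy: a default plus optional per-region overrides."""
    default: Policy
    by_region: Dict[str, Policy] = field(default_factory=dict)

    def for_region(self, region: str) -> Policy:
        return self.by_region.get(region, self.default)


@dataclass
class Outcome:
    viewable: bool      # can viewers in this region watch the video?
    show_ads: bool      # do ads appear alongside it?
    share_stats: bool   # does the rights owner receive view statistics?


def apply_policy(matched: Optional[UsagePolicy], region: str) -> Outcome:
    """Decide what happens to an upload for viewers in one region."""
    if matched is None:
        # Assumption: an upload with no match is simply left alone.
        return Outcome(viewable=True, show_ads=False, share_stats=False)
    chosen = matched.for_region(region)
    if chosen is Policy.BLOCK:
        return Outcome(viewable=False, show_ads=False, share_stats=False)
    if chosen is Policy.TRACK:
        return Outcome(viewable=True, show_ads=False, share_stats=True)
    # Monetize: stays available with ads. Whether statistics are also shared
    # is not stated in the quoted text, so it is left False here (assumption).
    return Outcome(viewable=True, show_ads=True, share_stats=False)


# Monetize everywhere except one region, where the material is blocked.
policy = UsagePolicy(default=Policy.MONETIZE, by_region={"DE": Policy.BLOCK})
print(apply_policy(policy, "US"))
print(apply_policy(policy, "DE"))
```

Keeping the region overrides in a dictionary with a default mirrors the point that a content owner can allow material in one country and block it in another.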
Posted in DidYouKnow



