Archives aided by anti-spam tools
News - World and Local
Monday, 18 August 2008 11:34

Crumbling texts and books are being digitised thanks to anti-spam tools. To thwart spammers, many websites force visitors to transcribe obscured words or characters before they get access. Now, instead of random words, many sites are taking text from old books and documents that have been scanned by character reading software. The words supplied are those the software cannot read but humans can, so each answer helps to complete the conversion of old texts to digital form.

Site seeing

The obscured text systems are called Captchas (Completely Automated Public Turing test to tell Computers and Humans Apart) and are widely used by websites to stop scammers and spammers exploiting them to send out junk mail or harvest addresses.

It is estimated that Captcha schemes are used about 100 million times every day.

Created by Luis von Ahn at Carnegie Mellon University in Pittsburgh, the Recaptcha project scoops up words that optical character reading software has marked as unreadable by computers.

In some documents, where ink has faded and paper has yellowed, the character reading software can flag up to 20% of words as indecipherable.

The hard-to-read words are then farmed out to the many thousands of sites that have signed up to be Recaptcha partners.

Words are supplied to sites along with a control word that aims to ensure the person answering is human.

The responses to the obscured text are added to a database, and particularly mangled text will be put before several people to ensure it is read accurately.
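The pairing of a control word with an unreadable word, and the checking of answers against several people, can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the Recaptcha project's actual code: the class name, the method names and the agreement threshold of three matching answers are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching human answers are required
# before an unknown word is treated as resolved. The article gives no figure.
AGREEMENT_THRESHOLD = 3


def normalise(text):
    """Ignore case and surrounding spaces when comparing answers."""
    return text.strip().lower()


class WordPool:
    """Toy model of pairing a known control word with an unknown word."""

    def __init__(self, agreement_threshold=AGREEMENT_THRESHOLD):
        self.agreement_threshold = agreement_threshold
        # unknown word id -> tally of answers from people who passed the control
        self.votes = defaultdict(Counter)

    def submit(self, control_expected, control_answer, unknown_id, unknown_answer):
        """Record one response; return True if the control word was right."""
        if normalise(control_answer) != normalise(control_expected):
            # Control word failed: the answer for the unknown word is discarded.
            return False
        self.votes[unknown_id][normalise(unknown_answer)] += 1
        return True

    def resolved(self, unknown_id):
        """Return the agreed reading, or None if there is no consensus yet."""
        tally = self.votes.get(unknown_id)
        if not tally:
            return None
        answer, count = tally.most_common(1)[0]
        return answer if count >= self.agreement_threshold else None
```

In this sketch, only visitors who type the control word correctly get a vote on the unreadable word, and the word counts as read once enough of those votes agree.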

Reporting in the journal Science, the Recaptcha team says the scheme is about 99.1% accurate - as good as professional transcribers and beyond the limit demanded by archivists.

About 40,000 sites have signed up to use words supplied by Recaptcha and it now collects about four million responses every day.

In the last year it has helped resolve more than 440 million words and has just helped to complete the conversion of the entire archive of the New York Times from 1908 into digital form.

Source: BBC

Last Updated on Monday, 18 August 2008 11:39