Ramteen Talib
- CAPTCHA is a powerful tool
- Qingkun Ramteen Talib
- A CAPTCHA (or Captcha) is a type of challenge-response test used in computing to ensure that the response is not generated by a computer.
- A CAPTCHA typically requires the user to type letters or digits from a distorted image that appears on the screen.
- Any user who enters a correct solution is presumed to be human; otherwise the user is presumed to be a bot and is denied access.
- It is sometimes described as a reverse Turing test.
- OCR (Optical Character Recognition) programs are not able to read CAPTCHAs reliably
A CAPTCHA is a means of automatically generating new challenges which:
- Current software is unable to solve accurately.
- Most humans can solve.
- Do not rely on the type of CAPTCHA being new to the attacker.
- CAPTCHAs rely on difficult problems in artificial intelligence
- First developed by AltaVista in 1997. The term was coined in 2000 by Luis von Ahn, Manuel Blum and Nicholas J. Hopper of Carnegie Mellon University and John Langford of IBM.
- Primitive CAPTCHAs seem to have been developed in 1997 by Andrei Broder, Martin Abadi, Krishna Bharat, and Mark Lillibridge to prevent bots from adding URLs to their search engine.
- The Turing test was proposed by Alan Turing
- To test a machine’s level of intelligence, a human judge asks questions of two participants, one of which is a machine; the judge does not know which is which. If the judge cannot tell which is the machine, the machine passes the test.
- CAPTCHA employs a reverse Turing test:
- Judge = CAPTCHA program
- Participant = user
- If the user passes the CAPTCHA, the user is presumed to be human
- If the user fails, the user is presumed to be a machine
- Types of CAPTCHAs
- 1. Text Based CAPTCHAs
- 2. Graphics Based CAPTCHAs
- 3. Audio or Sound Based CAPTCHAs
- Text Based
- Typically rely on sophisticated distortion of text images, rendering them unrecognizable to state-of-the-art pattern recognition programs but still recognizable by humans.
- Examples:
- Simple, normal language questions:
- What is the sum of three and thirty-five?
- If today is Saturday, what is the day after tomorrow?
- Very effective, but needs a large question bank.
- Users with cognitive disabilities may find these hard.
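As a rough illustration of how questions like the ones above could be generated rather than hand-written, the following Python sketch spells out two random numbers and asks for their sum. The names `number_to_words` and `make_sum_question`, and the 0 to 49 operand range, are assumptions made for this example; they do not come from any particular CAPTCHA library.

```python
import random

# Spelled-out numbers for 0..19 and for the tens; enough to cover 0..99.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99 in English."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 is supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def make_sum_question(rng: random.Random) -> tuple[str, int]:
    """Return (question text, expected numeric answer)."""
    a = rng.randint(0, 49)  # assumed operand range, chosen arbitrarily
    b = rng.randint(0, 49)
    question = (f"What is the sum of {number_to_words(a)} "
                f"and {number_to_words(b)}?")
    return question, a + b
```

Calling `make_sum_question(random.Random())` yields a pair such as a question of the form "What is the sum of three and thirty-five?" together with its numeric answer. A single template like this is easy for a bot to learn, so it illustrates the idea but does not remove the need for a large and varied question bank.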
- Gimpy was originally designed by Yahoo and CMU.
- It is based on the human ability to read heavily distorted and corrupted text.
- Gimpy works by choosing a certain number of words from a dictionary and displaying them, corrupted and distorted, in an image; it then asks the user to type the words displayed in that image.
- A modified version of Gimpy.
- Used in Yahoo Messenger Service.
- It contains only one random character string.
- The word is random and not picked from the dictionary.
- It is not a good implementation of CAPTCHA and has already been broken by OCR programs.
- MSN Passport service CAPTCHAs:
- Provided for Microsoft MSN services.
- Uses 8 characters.
- Warping is used to distort the characters.
- It is very strongly implemented and hasn’t been broken.
- Graphic Based CAPTCHAs
- Require the user to perform an image recognition test.
Examples:
- IMAGINATION:
- A CAPTCHA that requires two steps to be passed.
- In the first step, the visitor clicks on a picture that is composed of several images, and in this way selects a single image.
- In the second step, the selected image is loaded, enlarged but heavily distorted. Candidate answers are also loaded on the client side, and the visitor must select the correct answer from the set of proposed words.
- BONGO:
- Named after M. M. Bongard, a pattern recognition expert.
- User has to solve a pattern recognition problem.
- ASIRRA:
- Animal Species Image Recognition for Restricting Access.
- It’s a HIP (Human Interactive Proof) that works by asking users to identify photographs of cats and dogs.
- This is difficult for computers, but humans can do it very quickly and accurately.
- Audio CAPTCHAs
- Require the user to solve a speech recognition test.
- In this version of CAPTCHA, letters are read aloud instead of being displayed in an image.
- Helps visually impaired users
- Google’s audio-enabled CAPTCHA is one example.
- 3D CAPTCHA
- 3DCaptcha is the “captcha nice to humans, bad to machines”.
- It is written in PHP.
- A new approach to CAPTCHAs that uses humans’ spatial cognition abilities to differentiate humans from machines.
- It uses a Markov chain to generate words that resemble human language and are easy to type, yet avoid dictionary lookups.
- It filters profane language.
- It’s easy to deploy.
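The Markov-chain idea mentioned above can be sketched in a few lines of Python. This is not 3DCaptcha's actual code (which is written in PHP); it is a minimal illustration that assumes a character-level chain of order 2, a tiny made-up training list `SAMPLE_WORDS`, and a hypothetical `blocklist` parameter standing in for the profanity filter.

```python
import random
from collections import defaultdict

# Assumed toy corpus; a real generator would train on a much larger word list.
SAMPLE_WORDS = ["garden", "market", "silver", "planet", "winter",
                "candle", "marble", "tender", "lantern", "carpet"]


def build_chain(words, order=2):
    """Map each `order`-letter context to the letters observed after it."""
    chain = defaultdict(list)
    for word in words:
        padded = "^" * order + word.lower() + "$"  # ^ marks start, $ marks end
        for i in range(len(padded) - order):
            chain[padded[i:i + order]].append(padded[i + order])
    return chain


def generate_word(chain, rng, order=2, max_len=8):
    """Walk the chain from the start context until '$' or max_len letters."""
    context = "^" * order
    letters = []
    while len(letters) < max_len:
        nxt = rng.choice(chain[context])
        if nxt == "$":
            break
        letters.append(nxt)
        context = context[1:] + nxt
    return "".join(letters)


def generate_challenge(words, rng, min_len=4, blocklist=(), attempts=100):
    """Return a word-like string that is not a training word or blocked term."""
    chain = build_chain(words)
    known = {w.lower() for w in words}
    for _ in range(attempts):
        candidate = generate_word(chain, rng)
        if (len(candidate) >= min_len
                and candidate not in known
                and not any(bad in candidate for bad in blocklist)):
            return candidate
    raise RuntimeError("no suitable word generated; try a larger corpus")
```

Because every transition was seen in real words, the output tends to be pronounceable, while rejecting anything in the training list keeps it out of that dictionary. For example, `generate_challenge(SAMPLE_WORDS, random.Random())` returns a blend of the sample words that is not itself one of them.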
- reCAPTCHA
- A free CAPTCHA service that helps to digitize books, newspapers and old-time radio shows.
- reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher.
- Each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA.
- This is possible because most OCR programs alert you when a word cannot be read correctly.
Working of reCAPTCHA:
- Two words are shown: one is known as the control word, and the other is known as the questionable word.
- The system assumes that if a human types the control word correctly, the questionable word is also typed correctly.
- The identification performed by each OCR program is given a value of 0.5 points, and each interpretation by a human is given a full point.
- Once a given identification reaches 2.5 votes, the word is considered called, that is, its reading is accepted.
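The voting rule above is simple arithmetic, and a small Python sketch makes it concrete. The function name `tally` and the way readings are passed in are assumptions for illustration only; the weights and the threshold are the ones stated in the text. Human readings are assumed to come only from users who typed the control word correctly.

```python
from collections import defaultdict

OCR_WEIGHT = 0.5    # each OCR program's reading counts as half a point
HUMAN_WEIGHT = 1.0  # each human reading counts as a full point
THRESHOLD = 2.5     # a reading is accepted once it reaches this many votes


def tally(ocr_readings, human_readings):
    """Return the accepted reading, or None if none has reached the threshold."""
    votes = defaultdict(float)
    for reading in ocr_readings:
        votes[reading] += OCR_WEIGHT
    for reading in human_readings:
        votes[reading] += HUMAN_WEIGHT
    best = max(votes, key=votes.get, default=None)
    if best is not None and votes[best] >= THRESHOLD:
        return best
    return None
```

With one OCR program and two humans agreeing on "morning", the total is 0.5 + 1 + 1 = 2.5, so `tally(["morning"], ["morning", "morning"])` returns `"morning"`. If the OCR reading differs from the humans' reading, two human votes alone give only 2.0 and the word stays unresolved until a third human agrees.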
Applications
- Preventing Comment Spam in Blogs
- Protecting Website Registration
- Protecting Email Address From Scrapers
- Online Polls
- Preventing Dictionary Attacks
- Search Engine Bots
- Worms and Spam
- Advancing Artificial Intelligence
- CAPTCHA tests are based on open problems in artificial intelligence (AI), called hard AI problems.
- A win-win scenario:
- Either a CAPTCHA is not broken and there is a way to differentiate humans from computers.
- Or the CAPTCHA is broken and an AI problem is solved. Thus AI knowledge is advanced if CAPTCHAs are broken.
Things to keep in mind:
- Don’t store the CAPTCHA solution in the Web page’s metadata
- A CAPTCHA is no good if it doesn’t distort
- Need a large database of different CAPTCHA questions
- Avoid repetition of questions
- Generate the question
- Persist the correct answer
- Present the question to user
- Evaluate the answer; if incorrect, start again and generate a different CAPTCHA
- If correct, allow access to user
- Guidelines:
- Accessibility
- Image security
- Script security
- Security after widespread adoption
- Custom implementation or a general CAPTCHA?
- Breaking CAPTCHAs
- Cracking CAPTCHAs through programs
- Convert CAPTCHA into greyscale
- Detect patterns in the image corresponding to characters
- Or read that user’s session files to learn the CAPTCHA word
- Solution: store only a hash of the CAPTCHA word in session files
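One way to store only a hash of the CAPTCHA word, sketched in Python with the standard library. The function names, the per-challenge random salt, and the choice of PBKDF2 with 100,000 iterations are assumptions for this example rather than anything the slides prescribe. Note that a short CAPTCHA word has few possible values, so a hash slows down an attacker who reads the session file but cannot fully prevent guessing.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor for this sketch


def _normalize(text: str) -> bytes:
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return text.strip().lower().encode("utf-8")


def protect_answer(answer: str) -> dict:
    """Return what to store in the session instead of the plain CAPTCHA word."""
    salt = secrets.token_bytes(16)  # fresh salt per challenge
    digest = hashlib.pbkdf2_hmac("sha256", _normalize(answer), salt, ITERATIONS)
    return {"salt": salt.hex(), "digest": digest.hex()}


def verify_answer(stored: dict, response: str) -> bool:
    """Re-hash the user's response with the stored salt and compare."""
    salt = bytes.fromhex(stored["salt"])
    digest = hashlib.pbkdf2_hmac("sha256", _normalize(response), salt, ITERATIONS)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(digest.hex(), stored["digest"])
```

The session would hold the dictionary returned by `protect_answer("x7Kq2m")`; `verify_answer` then accepts the same word in any letter case and rejects anything else, and the plain word never appears in the session file.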
Usability issues
- The W3C mandates that the Web be accessible to all people
- Some CAPTCHAs are inaccessible to people with visual or cognitive impairments
Compatibility issues
- JavaScript may need to be enabled in the browser
- Some CAPTCHAs may need the Adobe Flash plugin installed
Conclusion
- CAPTCHAs are an effective way to counter bots and reduce spam
- They serve a dual purpose: they also help advance AI knowledge
- Applications are varied: from stopping bots to character recognition and pattern matching
- Some issues with current implementations represent challenges for future improvements