How to Identify Duplicate and Similar Text

It's not a common problem, but sometimes you have to check whether two texts are similar. If you have ever had to aggregate data from multiple sources, you probably know what I'm talking about.
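The tutorial's own method is not shown on this page, so the following is only a minimal sketch of one common approach, written in Python. It assumes that "duplicate" means identical after lowercasing and collapsing whitespace, and that "similar" can be measured with the standard library's difflib.SequenceMatcher ratio. The function names normalize, is_duplicate, and similarity are hypothetical and chosen for illustration.

```python
import difflib
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def is_duplicate(a: str, b: str) -> bool:
    """Exact-duplicate check: compare SHA-256 hashes of the normalized texts."""
    digest_a = hashlib.sha256(normalize(a).encode("utf-8")).hexdigest()
    digest_b = hashlib.sha256(normalize(b).encode("utf-8")).hexdigest()
    return digest_a == digest_b


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
```

When aggregating records from several sources, one option is to drop exact duplicates with is_duplicate first and then flag pairs whose similarity exceeds a threshold for manual review. The right threshold depends on the data and has to be found by experiment.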

Related Tutorials
Easily finding duplicate files
Some days ago my mother asked me if there was an easy way to find duplicate photos on her computer. I thought about it and decided that the easiest way is to compare a hash of each file (which works fine as long as the images have not been modified). Then came the implementation, and since PHP is the language I know best, I thought, why not use it? I know that PHP hasn't much of a reputation as a command-line scripting language, but bear with me.
Prevent Duplicate Form Submission
Learn how to use PHP sessions to prevent duplicate form submission or re-submission.
15 regular expressions examples
Regular expressions are a very useful tool for developers. They let you find, identify, or replace text, words, or any kind of characters. In this article, I have compiled 15+ extremely useful regular expressions that any web developer should have in their toolkit.
Session
The first time a user accesses one of our pages, several connections and disconnections take place. During this process the server and the client exchange information to identify each other. Thanks to this exchange, our server is able to identify a specific user, and this can be used to assign specific information to each client. This relationship between the two computers is called a session. While a session is active, it is possible to assign information to a specific client by using session-related commands.
Building an Opt-in Email List
This helpful tutorial shows you how to build an opt-in email list in PHP. It covers validating email addresses, checking for duplicate email addresses, adding to a mailing list, and changing a string's case.
Reading the clean text from RTF
Rich Text Format (often abbreviated as RTF) is, to the surprise of many, quite a complex text data format. Let's read the plain text from an RTF file.
Introduction To Cookies
A cookie is a collection of information, usually including a username and the current date and time, stored on the local computer of a person using the World Wide Web, used chiefly by websites to identify users who have previously registered or visited the site. Cookies basically store information about a user. You must have seen those 'Remember Me' buttons on login forms; these forms set cookies on your hard drive with your username and password inside them (or any other information specified).
Basic Shoutbox
This tutorial is very similar to the news portal tutorial and builds a basic shoutbox.
Making summaries from larger bodies of text using substr
Many websites containing articles have a summary which leads you to the main article. PHP has a function for displaying a small section of text: substr(). substr() has a number of uses; for example, it can limit the amount of text to a set value...
Backtracing
If you need to debug a PHP script but do not have debugging tools at hand in your IDE or similar, an easy way to see what's happening and which functions are being called is to use PHP's backtracing tools. It can also be useful to include a backtrace when sending or logging errors that have occurred on a production website.
 