php - Pattern to match and replace words with upper and also lower case in them
I have this problem with eliminating meaningless words from a string, for example:
$string = "Hi, my name is Tom. jc2pMK NB,xVD NOZmF__u cYNdtR46eEb8y,74 Today i registered to stack overflow. krEBNB1cB8 cq7,zCL x5KOwwRZfU13.bI g_IXxlcztXYN , [email protected] I like IT. 0T1LAkuoPXscYC5uK6mlG R1nix_5kwF ,EKxXvT1 SjZYC4A6YQ 4E";
Now I want to be able to search and destroy those meaningless words from there, in PHP. I was trying
preg_replace($pattern, "", $string) but couldn't figure out a pattern for letting "Hi" stay there but deleting "jc2pMK". I bet this is an elementary procedure with strings, that every basic programmer should easily figure out, but I have no experience with regular expressions.
I am open to any other ideas on how to get rid of the meaningless words.
If you want to solve this on the semantic level, you'd need a dictionary of some sort. A poor man's approach would be to do something like
This would load a dictionary into an array, split your string into an array of words and then intersect the two to give you the words from your string that also exist in the dictionary. In the example's case, I used http://www-01.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt as the dictionary, which would result in:
Obviously, the result will only be as good as your dictionary. Also, the solution does not take casing into account, so "Today" will not match a dictionary entry "today". But it should give you an idea of how to approach the problem.
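The dictionary approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `keepDictionaryWords` and the tiny inline word list are made up for the example, and a real word list would be loaded from a file with one word per line.

```php
<?php
/**
 * Hypothetical helper: return the words of $text that also appear in $dictionary.
 * The comparison is case-sensitive, as in the approach described above.
 */
function keepDictionaryWords(string $text, array $dictionary): array
{
    // Format 1 makes str_word_count() return an array of the words it finds.
    // Digits and punctuation act as separators, so "jc2pMK" becomes "jc" and "pMK".
    $words = str_word_count($text, 1);

    // Keep only the words that also occur in the dictionary, then reindex.
    return array_values(array_intersect($words, $dictionary));
}

// Assumption: a stand-in for a real word list. With a file you could use e.g.
// $dictionary = file('wordsEn.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$dictionary = ['my', 'name', 'is', 'today', 'i', 'like'];

$sample = 'Hi, my name is Tom. jc2pMK NB,xVD Today i like IT.';
print_r(keepDictionaryWords($sample, $dictionary));
```

Because `str_word_count()` splits on digits and punctuation, a fragment of a junk token can coincidentally be a dictionary word, so expect some false positives with a large word list.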
You'll find more sophisticated solutions in PHP's Human Language and Character Encoding Support, for instance with the Enchant and PSpell extensions, which allow you to spell check words against dictionary files.
As everyone else commented, you haven't defined what a "meaningless word" is, so it's impossible to answer your question in general. That said, a regular expression that works ONLY for your example $string, with no guarantee for other strings, is the following:
Match (there's a space in front):
You can test it online at regex101.
Here's the equivalent PHP code snippet:
Again, this is only a quick and dirty solution for your specific string.
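For illustration, here is one heuristic of this kind as a minimal sketch. It is not the pattern from the answer above. It assumes that a "meaningful" token is either all lowercase with an optional leading capital ("Hi", "overflow") or all uppercase ("IT"), optionally followed by a single punctuation mark. The function name `stripNoiseTokens` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: drop whitespace-separated tokens that do not look
 * like plain words. Tuned to the example string only.
 */
function stripNoiseTokens(string $text): string
{
    // A word-like token: optional capital + lowercase letters, or all capitals,
    // with at most one trailing punctuation mark. D anchors $ at the very end.
    $wordLike = '/^(?:[A-Z]?[a-z]+|[A-Z]+)[.,!?;:]?$/D';

    // Split on runs of whitespace and discard empty pieces.
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_grep() keeps only the tokens that match the pattern.
    return implode(' ', preg_grep($wordLike, $tokens));
}

$string = 'Hi, my name is Tom. jc2pMK NB,xVD NOZmF__u cYNdtR46eEb8y,74 Today i registered to stack overflow. krEBNB1cB8 cq7,zCL x5KOwwRZfU13.bI g_IXxlcztXYN , [email protected] I like IT. 0T1LAkuoPXscYC5uK6mlG R1nix_5kwF ,EKxXvT1 SjZYC4A6YQ 4E';

echo stripNoiseTokens($string), PHP_EOL;
// Hi, my name is Tom. Today i registered to stack overflow. I like IT.
```

Filtering whole tokens instead of calling preg_replace on the raw string avoids leaving stray spaces and commas behind. It will still delete legitimate mixed-case words such as "iPhone" and keep random all-lowercase junk.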