I would like to check whether every word in a text file exists on any line of another, large dictionary text file ("LINES").

Every approach I have tried has either failed or only worked briefly.

How can I do this without a million nested loops?




I'm answering this way too often, but a regex avoids most of the explicit looping:

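```php
// get words (runs of 2+ letters) from the text file
$text = file_get_contents($text_file);
preg_match_all(':\p{L}{2,}:u', $text, $words);
$words = array_unique($words[0]);

// make a search regex  "abc|foobar|xyz|text|.." (letters only, so no escaping needed)
$rx_words = implode("|", $words);

// find all words that fill a whole line; the m modifier lets ^ and $ match at every line break,
// and \r? tolerates Windows line endings
preg_match_all(":^($rx_words)\r?$:mu", file_get_contents("LINES"), $cmp);

// everything found if no word from the text is missing from the matches:
$found_all = !array_diff($words, $cmp[1]);
```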

Reading in the whole LINES file can be avoided with some extra coding, but I wanted to keep it simple here.
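A minimal sketch of that extra coding, assuming the dictionary holds one word per line. It streams the file with `fgets()` instead of `file_get_contents()` and, as a swap from the big alternation regex, looks each line up in a hash set built with `array_flip()`. The function name `all_words_in_dictionary()` is hypothetical.

```php
<?php
// Hypothetical helper: streams the dictionary instead of loading it whole.
// Assumes one dictionary entry per line; uses a hash lookup per line
// in place of the big alternation regex.
function all_words_in_dictionary(string $text, string $dictPath): bool
{
    // same word extraction as the regex version: runs of 2+ letters
    preg_match_all('/\p{L}{2,}/u', $text, $m);
    // words still to be found, stored as array keys for constant-time lookup
    $missing = array_flip($m[0]);

    $fh = fopen($dictPath, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $dictPath");
    }
    // stop reading early once nothing is missing
    while ($missing && ($line = fgets($fh)) !== false) {
        // drop the trailing "\n" or "\r\n" so the whole line is the key
        unset($missing[rtrim($line, "\r\n")]);
    }
    fclose($fh);

    // every word was found if nothing is left
    return !$missing;
}
```

Usage would be something like `all_words_in_dictionary(file_get_contents($text_file), "LINES")`. Memory now grows with the number of distinct words in the text, not with the size of the dictionary.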




Pseudocode, if you have enough memory:

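```
for each line in text file:
    break line into words
    for each word in line:
        $wordMap[lowercase($word)] = 1;

for each line in dictionary file:
    break line into words
    for each word in line:
        if $wordMap[lowercase($word)] == 1:
            line has word $word
            $wordMap[lowercase($word)] = 2;   // mark as found

every word exists if no entry in $wordMap is still 1
```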
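A runnable PHP sketch of that pseudocode. The function name `missing_words()` is hypothetical, "words" are assumed to be runs of letters, and instead of flagging found words it removes them from `$wordMap`, so whatever remains at the end was never seen in the dictionary.

```php
<?php
// Translation of the hash-map pseudocode (hypothetical function name).
// strtolower() folds ASCII only; use mb_strtolower() if mbstring is
// available and you need Unicode case folding.
function missing_words(string $textPath, string $dictPath): array
{
    $lines = file($textPath, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $textPath");
    }

    // for each line in text file: record every lowercased word in $wordMap
    $wordMap = [];
    foreach ($lines as $line) {
        preg_match_all('/\p{L}+/u', $line, $m);
        foreach ($m[0] as $word) {
            $wordMap[strtolower($word)] = 1;
        }
    }

    $fh = fopen($dictPath, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $dictPath");
    }
    // for each dictionary line: any word that is in $wordMap is found, so drop it
    while ($wordMap && ($line = fgets($fh)) !== false) {
        preg_match_all('/\p{L}+/u', $line, $m);
        foreach ($m[0] as $word) {
            unset($wordMap[strtolower($word)]);
        }
    }
    fclose($fh);

    // whatever is left never appeared in the dictionary
    return array_keys($wordMap);
}
```

An empty result means every word exists in the dictionary. Only the text file's words are held in memory; the dictionary is read one line at a time.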

If you don't have enough memory for $wordMap, then make $wordMap some sort of database. You might also try a Bloom filter.
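A minimal Bloom filter sketch, in case that route helps. The class name `BloomFilter`, the md5-based double hashing and the parameters are my own choices for illustration, not any particular library's API, and it assumes 64-bit PHP. A `false` from `mightContain()` means "definitely not added"; a `true` can be a false positive, so confirm those against the real dictionary (or the database).

```php
<?php
// Minimal Bloom filter: a bit array plus k positions per item derived
// from one md5 digest by double hashing (h1 + i*h2). Sketch only;
// assumes 64-bit PHP so the 32-bit unpacked values stay non-negative.
final class BloomFilter
{
    private string $bits;
    private int $size;    // number of bits (m)
    private int $hashes;  // number of positions per item (k)

    public function __construct(int $size, int $hashes)
    {
        $this->size = $size;
        $this->hashes = $hashes;
        // all bits start at zero; 8 bits per byte, rounded up
        $this->bits = str_repeat("\0", intdiv($size + 7, 8));
    }

    /** @return int[] the k bit positions for $item */
    private function positions(string $item): array
    {
        $digest = md5($item, true);            // 16 raw bytes
        $h1 = unpack('N', $digest, 0)[1];      // first 32-bit word
        $h2 = unpack('N', $digest, 4)[1] | 1;  // second word, forced non-zero
        $out = [];
        for ($i = 0; $i < $this->hashes; $i++) {
            $out[] = ($h1 + $i * $h2) % $this->size;
        }
        return $out;
    }

    public function add(string $item): void
    {
        foreach ($this->positions($item) as $p) {
            $byte = intdiv($p, 8);
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($p % 8)));
        }
    }

    // false = definitely absent; true = probably present (may be a false positive)
    public function mightContain(string $item): bool
    {
        foreach ($this->positions($item) as $p) {
            if ((ord($this->bits[intdiv($p, 8)]) & (1 << ($p % 8))) === 0) {
                return false;
            }
        }
        return true;
    }
}
```

You would `add()` every dictionary line, then call `mightContain()` for each word of the text. As a rule of thumb, about 10 bits per dictionary entry with 7 hash positions gives roughly a 1% false-positive rate; for a million entries, `new BloomFilter(10000000, 7)` needs about 1.25 MB.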

