I want to use PHP to search a directory of .txt files for a particular ID, which may appear more than once.

Whenever the ID appears, a statement like "Found an XML file" always appears before it and "Closing XML file" after it. These mark the start and the end of the section I want to copy.

I would then like to copy this section out to another text file. This would replace my current process of grepping through the files for an ID and then manually copying out the relevant sections.

In pseudocode, my idea is:

while(parsing text file)
  if (current line == search_ID)
    loop for "Found an XML file"
    start copying
    loop for "Closing XML file"
    output string to txt file

So my question is: how would I loop "upwards" from the search ID until I reach the nearest "Found an XML file"?




Read the entire file into a single string, then split it on the marker strings it contains, as follows:

// Read the contents of the file into $file as a string
$mainfilename = "/path/to/file.txt";
$file = file_get_contents($mainfilename);
if ($file === false) {
    die("Could not read $mainfilename");
}

/* $file contains your file contents
 * $findme contains "Found an XML file"
 * $splitter contains "Closing XML file"
 */
$findme = "Found an XML file";
$splitter = "Closing XML file";

// We only do anything if the string "Closing XML file" is inside the file
// in a place other than at the beginning of the file. strrpos() returns the
// position of the LAST occurrence, so a marker at offset 0 that is followed
// by a later one still passes (strpos() would stop at offset 0).
$lastClose = strrpos($file, $splitter);
if ($lastClose !== false && $lastClose > 0) {

    // Break up $file into pieces by splitting it along "Closing XML file"
    $parts = explode($splitter, $file);

    // Traverse the newly-formed pieces
    foreach ($parts as $part) {

        // If we have "Found an XML file" contained in this piece of the file
        if (strpos($part, $findme) !== false) {

            // Split up our smaller string around "Found an XML file"
            $foundparts = explode($findme, $part);

            // The last piece will always contain the filename,
            // but only if there are two or more pieces
            // i.e. something between the strings
            if (count($foundparts) > 1) {
                $filename = array_pop($foundparts);
                /* Do whatever you want with $filename */
            }
        }
    }
}

Here is what this does, supposing $file == "Closing XML file gibberish goes here Found an XML file garbage Found an XML file filename.xls Closing XML file more gibberish":

  1. Check to make sure "Closing XML file" is present within $file in a place other than at the start - it is, near the end.
  2. Split $file into pieces: $parts = ['', ' gibberish goes here Found an XML file garbage Found an XML file filename.xls ', ' more gibberish']
  3. Traverse $parts looking for instances of "Found an XML file" - $parts[1] has it
  4. Split $parts[1] into pieces: $foundparts = [' gibberish goes here ', ' garbage ', ' filename.xls ']
  5. If there are at least 2 pieces in $foundparts, 'pop' the last element off of $foundparts, as that will always be the one that contains the filename
  6. You now have the filename in $filename, to do with as you please
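
The steps above can be checked with a small self-contained sketch that runs the same splitting logic on the sample string. The array name $filenames and the trim() call are additions for illustration, not part of the snippet above; trim() only strips the spaces that surround the popped piece.

```php
<?php
// Sample input taken from the walk-through.
$file     = "Closing XML file gibberish goes here Found an XML file garbage Found an XML file filename.xls Closing XML file more gibberish";
$findme   = "Found an XML file";
$splitter = "Closing XML file";

$filenames = []; // illustrative name: collects one entry per matching piece

foreach (explode($splitter, $file) as $part) {
    // Skip pieces that contain no start marker at all.
    if (strpos($part, $findme) === false) {
        continue;
    }

    // Split around the start marker; the text after the LAST marker
    // is the piece that sits directly before "Closing XML file".
    $foundparts = explode($findme, $part);
    $filenames[] = trim(array_pop($foundparts));
}

print_r($filenames);
```

For this input, $filenames ends up holding the single entry 'filename.xls'.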

Note: these functions are case-sensitive, so if you also want to find instances of "Found an xml file" (with "xml" in lowercase), you'll need to convert $file, $splitter, and $findme to lowercase first, for example with strtolower().
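
If lowercasing the whole file is undesirable because it also changes the text you extract, one alternative is to match the markers case-insensitively and leave the content untouched. The sketch below is one possible way to do that, using preg_split() with the "i" flag and stripos(); the sample string and variable names are made up for illustration.

```php
<?php
// Hypothetical sample with mixed-case markers.
$file     = "found an xml file a.xml CLOSING XML FILE junk Found an XML file b.xml Closing XML file";
$findme   = "Found an XML file";
$splitter = "Closing XML file";

// preg_quote() escapes any regex metacharacters in the markers;
// the trailing "i" makes the match case-insensitive.
$splitPattern = '/' . preg_quote($splitter, '/') . '/i';
$findPattern  = '/' . preg_quote($findme, '/') . '/i';

$filenames = [];
foreach (preg_split($splitPattern, $file) as $part) {
    // stripos() is the case-insensitive counterpart of strpos().
    if (stripos($part, $findme) === false) {
        continue;
    }
    $foundparts = preg_split($findPattern, $part);
    // The extracted text keeps its original casing.
    $filenames[] = trim(array_pop($foundparts));
}

print_r($filenames);
```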




// Ex: OPA_4636367.xml
foreach (glob("*.txt") as $file) {
    // The part of the filename before the first underscore is its designation
    $file_designation = explode('_', $file);
    if ($file_designation[0] == 'OPA') {
        // XML found
        // Do file_get_contents($file) or whatever
    }
}
