Searching remote web servers for files

A command-line PHP script for searching and downloading files from web servers with directory listing enabled.

There are many web servers online today with directory indexing and browsing left enabled. This is either a mistake, or the owners actually want you to be able to search and browse their files and resources.

Being able to find these servers is the first point of interest, as you won't know how useful they can be until you have seen what some people host publicly. Google has some pretty useful built-in tools that make this surprisingly easy, using a combination of search parameters to narrow your results to something that matches the requirements.

-inurl:htm -inurl:html -intitle:"ftp" intitle:"index of /"

This will narrow the search results to mainly Apache servers with directory listing. To improve the results, just add something to the end of this string, eg:

-inurl:htm -inurl:html -intitle:"ftp" intitle:"index of /" SEARCH EXT

Using this example, just replace the word SEARCH with the thing you want to search for and EXT with the file extension of the files you are interested in, eg:

-inurl:htm -inurl:html -intitle:"ftp" intitle:"index of /" cat gif
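If you find yourself generating a lot of these, the search string is easy to assemble programmatically. A quick sketch (build_dork is just an illustrative helper, not part of the scraper class):

```php
<?php
// Assemble the Google search string for a given term and file extension.
function build_dork($term, $ext) {
  return sprintf(
    '-inurl:htm -inurl:html -intitle:"ftp" intitle:"index of /" %s %s',
    $term,
    $ext
  );
}

echo build_dork('cat', 'gif') . "\n";
// -inurl:htm -inurl:html -intitle:"ftp" intitle:"index of /" cat gif
```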

Having found a location you are interested in, you can then try out a few things using my scraper class. You can clone the repo from GitHub at https://github.com/tomgould/Scrape or see below.

This class provides a few methods for helping you either search the server, download files from the server, or just do a test download from the server.

This is designed to be executed from the command line, as the run time can be quite long depending on the server you are searching. Check out the methods in the class to help you refine your search, and post any comments below and I'll try to answer as soon as possible.
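At its core the class simply parses the anchor tags out of an index page with DOMDocument, which is what its getLinks() method does internally. A minimal, self-contained sketch of that step, using a hard-coded sample page instead of a live request:

```php
<?php
// Pull the hrefs out of a directory-index page: load the (often sloppy)
// server HTML into DOMDocument and walk the <a> tags.
function extract_links($html) {
  $doc = new DOMDocument();
  // suppress warnings about malformed markup, as the scraper does
  @$doc->loadHTML($html);
  $hrefs = array();
  foreach ($doc->getElementsByTagName('a') as $link) {
    $hrefs[] = $link->getAttribute('href');
  }
  return $hrefs;
}

// A typical Apache index page, trimmed down for the example.
$sample = '<html><body><a href="../">Parent Directory</a>'
  . '<a href="cat.gif">cat.gif</a>'
  . '<a href="pics/">pics/</a></body></html>';

// extract_links($sample) returns array('../', 'cat.gif', 'pics/')
print_r(extract_links($sample));
```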

demo.php

<?php

/**
 * @file
 * Demonstration of the scraper
 *
 * To use this it is best to execute this file with PHP from the command line
 */

include 'scraper.class.php';

$scraper = new scraper();
$scraper->addLocation('http://www.candoo.com/ulsternorrie/images/animated%20gifs/animated%20letters/', 'test', array('gif'))
  ->setCachePath('/tmp')
  ->setDestinationRoot('/tmp/Scraper-Demo')
  ->excludeInPath('_vti_cnf')
  ->excludeInFilename(array('WTE', 'button', 'hot', 'new', 'arrow', 'smile', 'norr'))
  ->search('CLR')
  ->setMode('search')
  ->setFileNameProcessor('remove_date_string')
  ->scrape();

/**
 * Removes a 14 character date string prepended to the beginning of file names
 * if it exists in the name
 *
 * eg a file name: "20130130121517-01 - cat.gif"
 * would become  : "01 - cat.gif"
 *
 * @param string $name
 *
 * @return string
 */
function remove_date_string($name) {
  if (is_numeric(mb_substr($name, 0, 14))) {
    return mb_substr($name, ((strlen($name) - 15) * -1));
  }

  // Return the name unchanged when no date string is present.
  return $name;
}

scraper.class.php

<?php

/**
 * @file scraper.class.php
 * Downloads the files specified from the location specified from web servers
 * with directory indexing enabled
 *
 * Use the following search string to find open servers in Google
 * -inurl:htm -inurl:html -intitle:"ftp" intitle:"index of /" TERM EXT
 * eg
 * -inurl:htm -inurl:html -intitle:"ftp" intitle:"index of /" animated gif
 * -inurl:htm -inurl:html -intitle:"ftp" intitle:"index of /" ascii txt
 *
 * If you are on Windows you'll need WAMP or similar to set this up, otherwise
 * just PHP. You'll also need the cURL extension enabled.
 *
 * To use, see the demo.php file: just invoke an object of type scraper and
 * use the methods provided to search, test or download as required.
 */
class scraper {

  private $destinationRoot = '/tmp/scraper/';
  private $cachePath = '/tmp/';
  private $mode = 'download';
  private $toScrape = array();
  private $excludePath = array();
  private $excludeFilename = array();
  private $search = array();
  private $FileNameProcessor = NULL;

  /**
   * Sets the destination root, the location where the downloaded files will
   * be stored
   *
   * @param string $param
   */
  public function setDestinationRoot($param) {
    $this->destinationRoot = $param;
    if ($this->right($this->destinationRoot, 1) !== '/') {
      $this->destinationRoot .= '/';
    }

    return $this;
  }

  /**
   * Sets the cache path, the location for temporary files to be written
   *
   * @param string $param
   */
  public function setCachePath($param) {
    $this->cachePath = $param;
    if ($this->right($this->cachePath, 1) !== '/') {
      $this->cachePath .= '/';
    }

    return $this;
  }

  /**
   * Sets the mode the scraper works in. Options are:
   * - test: creates the directories and writes empty files in place of
   *   the downloads
   * - search: searches the locations and returns any matching results
   *   without writing or downloading any files
   * - download: downloads the files and writes them to disk
   *
   * @param string $param
   */
  public function setMode($param) {
    $this->mode = $param;

    return $this;
  }

  /**
   * Sets an optional file name processor to be called after the file name
   * has been determined from the HTML
   *
   * @param string $param
   */
  public function setFileNameProcessor($param) {
    $this->FileNameProcessor = $param;

    return $this;
  }

  /**
   * Adds a location to the toScrape array
   *
   * @param string $url
   *   The URL of the location to scrape
   * @param string $location
   *   A subdirectory for all items from this server to be stored in
   * @param array $mimeTypes
   *   An array of file types to download if found.
   *   Leave empty to grab every file regardless of type.
   */
  public function addLocation($url, $location = NULL, $mimeTypes = array()) {
    if (NULL !== $location && $this->right($location, 1) !== '/') {
      $location .= '/';
    }

    $this->toScrape[] = array(
      'scrape_url' => $url,
      'destination_sub_dir' => $location,
      'mime_types_i_want' => $mimeTypes,
    );

    return $this;
  }

  /**
   * Adds or merges a new value or values to the excludePath array
   *
   * @param mixed $param
   */
  public function excludeInPath($param) {
    if (is_array($param)) {
      $this->excludePath = array_merge($this->excludePath, $param);
    }
    elseif (mb_strlen($param) > 0) {
      $this->excludePath[] = $param;
    }

    return $this;
  }

  /**
   * Adds or merges a new value or values to the excludeFilename array
   *
   * @param mixed $param
   */
  public function excludeInFilename($param) {
    if (is_array($param)) {
      $this->excludeFilename = array_merge($this->excludeFilename, $param);
    }
    elseif (mb_strlen($param) > 0) {
      $this->excludeFilename[] = $param;
    }

    return $this;
  }

  /**
   * Adds search terms to be matched against file paths.
   * Add as many terms as you like, this is case insensitive.
   *
   * @param mixed $param
   */
  public function search($param) {
    $searchTerms = $this->prepareSearchTerms($param);
    $this->search = array_merge($this->search, $searchTerms);

    return $this;
  }

  /**
   * Gets the destination root
   */
  public function getDestinationRoot() {
    return $this->destinationRoot;
  }

  /**
   * Gets the cache path
   */
  public function getCachePath() {
    return $this->cachePath;
  }

  /**
   * Gets the mode
   */
  public function getMode() {
    return $this->mode;
  }

  /**
   * Gets locations to scrape
   */
  public function getLocations() {
    return $this->toScrape;
  }

  /**
   * Gets the excluded paths array
   */
  public function getExcludedPaths() {
    return $this->excludePath;
  }

  /**
   * Gets the excluded filename array
   */
  public function getExcludedFilenames() {
    return $this->excludeFilename;
  }

  /**
   * Gets the search terms
   */
  public function getSearch() {
    return $this->search;
  }

  /**
   * Gets the optional file name processor to be called after the file name
   * has been determined from the HTML
   */
  public function getFileNameProcessor() {
    return $this->FileNameProcessor;
  }

  /**
   * Prepares the search terms from the manual input
   *
   * @param mixed $param
   *
   * @return array
   */
  private function prepareSearchTerms($param) {
    $return = array();
    if (is_array($param)) {
      foreach ($param as $value) {
        $value = strtolower($value);
        $return[$value] = $value;
        $return[str_replace(' ', '.', $value)] = str_replace(' ', '.', $value);
      }
    }
    elseif (mb_strlen($param) > 0) {
      $param = strtolower($param);
      $return[$param] = $param;
      $return[str_replace(' ', '.', $param)] = str_replace(' ', '.', $param);
    }

    return $return;
  }

  /**
   * Starts the scraper on the locations to scrape
   *
   * @throws Exception
   */
  public function scrape() {
    // Overrides some server variables so the script won't time out or fail
    set_time_limit(0);
    ini_set('memory_limit', '3000M');
    ini_set('open_basedir', FALSE);

    // get the links from the server
    $links = $this->getLinks($this->getLocations());

    if (!empty($links)) {
      for ($i = 0; $i < count($links); $i++) {

        // skip files that have crazy long names
        if (mb_strlen($links[$i]['save_path']) > 255) {
          continue;
        }

        // get the headers of this file
        $file_header = get_headers($links[$i]['link'], 1);

        // if this is a redirect then reset the headers and link location
        while (!empty($file_header['Location']) && $file_header['Location'] != $links[$i]['link']) {
          $links[$i]['link'] = $file_header['Location'];
          $file_header = get_headers($links[$i]['link'], 1);
        }

        // add the content length to the link array
        $file_header = array_change_key_case($file_header, CASE_LOWER);
        $links[$i]['content-length'] = $file_header['content-length'];

        // does the file already exist?
        // if it does then get its length, otherwise set the length to 0
        if (file_exists($links[$i]['save_path'])) {
          $links[$i]['file_size'] = filesize($links[$i]['save_path']);
        }
        else {
          $links[$i]['file_size'] = 0;
        }

        if ($this->getMode() === 'download') {

          // Makes sure the directory exists
          $this->destinationDirectory($links[$i]['destination']);

          echo 'Downloading: ' . str_replace($this->getDestinationRoot(), '/', $links[$i]['save_path']) . "\n";

          while ($links[$i]['file_size'] < $links[$i]['content-length']) {

            // do the cURL request
            $temp = $this->curlGet(
              $links[$i]['link'],
              array(
                'tmp' => $this->getCachePath() . $links[$i]['file_name'] . '.tmp',
                'range' => $links[$i]['file_size'] . '-' . $links[$i]['content-length'],
              )
            );

            if ($links[$i]['file_size'] <= 0) {
              try {
                rename($temp, $links[$i]['save_path']);
              }
              catch (Exception $e) {
                throw new Exception('Couldn\'t rename the file', 0, $e);
              }
            }
            else {
              try {
                $data = file_get_contents($temp);
                $fp = fopen($links[$i]['save_path'], "a");
                fwrite($fp, $data);
                fclose($fp);
                unlink($temp);
              }
              catch (Exception $e) {
                throw new Exception('Couldn\'t write the file', 0, $e);
              }
            }

            // reset the new file size having finished the cURL request
            try {
              $size = filesize($links[$i]['save_path']);
              $links[$i]['file_size'] = $size;
            }
            catch (Exception $e) {
              $links[$i]['file_size'] = 0;
              throw new Exception('Couldn\'t read file size', 0, $e);
            }
          }
        }
        elseif ($this->getMode() === 'test') {

          // Makes sure the directory exists
          $this->destinationDirectory($links[$i]['destination']);

          echo 'Writing empty file: ' . str_replace($this->getDestinationRoot(), '/', $links[$i]['save_path']) . "\n";

          try {
            $fp = fopen($links[$i]['save_path'], "a");
            fclose($fp);
          }
          catch (Exception $e) {
            throw new Exception('Couldn\'t open a file', 0, $e);
          }
        }
        elseif ($this->getMode() === 'search') {
          echo 'Found        : ' . str_replace($this->getDestinationRoot(), '/', $links[$i]['save_path']) . "\n"
            . "File Location: " . $links[$i]['link'] . "\n";
        }
      }
    }
  }

  /**
   * Builds the list of files to fetch from the locations given, recursing
   * into any subdirectories found
   *
   * @param array $vars
   *   a keyed array of locations to scrape
   * @param array $log
   * @param array $links
   *
   * @return array
   */
  private function getLinks($vars, $log = array(), $links = array()) {
    foreach ($vars as $var) {
      $log[] = $var['scrape_url'];

      // get the web page in question
      $page = $this->curlGet($var['scrape_url']);

      // create a DOM object
      $doc = new DOMDocument();

      // make an array of the links
      @$doc->loadHTML($page);
      $page_links = $doc->getElementsByTagName('a');
      foreach ($page_links as $link) {

        // make a fully qualified link to the target
        $href = $link->getAttribute('href');

        // get the host
        $host = parse_url($var['scrape_url']);
        $host = $host['scheme'] . '://' . $host['host'];

        // if this is a link from the root
        if ($this->left($href, 1) == '/') {
          $href = $host . $href;
        }
        else {
          $href = str_replace($var['scrape_url'], '', $href);
          $href = $var['scrape_url'] . $href;
        }

        // process the href for ../ etc
        $parts = parse_url($href);
        $pieces = explode('/', $parts['path']);
        for ($i = 0; $i < count($pieces); $i++) {
          if ($pieces[$i] == '..') {
            unset($pieces[$i - 1]);
            unset($pieces[$i]);
          }
        }
        $href = $parts['scheme'] . '://' . $parts['host'] . implode('/', $pieces);

        // skip if parent or higher
        if (strlen($href) < strlen($var['scrape_url'])) {
          continue;
        }

        // skip sort links
        if (strpos($href, '?') !== FALSE) {
          continue;
        }

        // potential mime type
        $mime_array = explode('.', $href);
        $mime = end($mime_array);

        // is this a directory?
        if ($this->right($href, 1) == '/') {
          // this is a directory, check if it should be excluded and do
          // nothing if it should
          $exclude = FALSE;
          foreach ($this->getExcludedPaths() as $path) {
            if (mb_strpos($href, $path) !== FALSE) {
              $exclude = TRUE;
              break;
            }
          }

          // If exclude is still false then go on and add this to the list
          if ($exclude === FALSE) {
            $dir_to_scrape = array(array(
              'scrape_url' => $href,
              'host' => $parts['scheme'] . '://' . $parts['host'],
              'destination_sub_dir' => $var['destination_sub_dir'] . trim($this->sanitize(urldecode($pieces[count($pieces) - 2]))) . '/',
              'mime_types_i_want' => $var['mime_types_i_want'],
            ));

            // launch the scraper on the subdirectory
            if (!in_array($href, $log)) {
              $links = $this->getLinks($dir_to_scrape, $log, $links);
            }
          }
        }

        // if this is a file we are interested in
        if (empty($var['mime_types_i_want']) || in_array($mime, $var['mime_types_i_want'])) {

          // test if we already have the file
          $local_file_path = urldecode($this->getDestinationRoot() . $var['destination_sub_dir']);
          $file_name = trim($this->sanitize(urldecode(end($pieces))));

          // If there is a file name processor call it
          if ($this->getFileNameProcessor() !== NULL && is_callable($this->getFileNameProcessor())) {
            $file_name = call_user_func($this->getFileNameProcessor(), $file_name);
          }

          // check if it should be excluded and do nothing if it should
          $exclude = FALSE;
          foreach ($this->getExcludedPaths() as $value) {
            if (mb_strpos($href, $value) !== FALSE) {
              $exclude = TRUE;
              break;
            }
          }

          // Check exclusion on the file name parameter
          if ($exclude === FALSE) {
            foreach ($this->getExcludedFilenames() as $value) {
              if (mb_strpos($file_name, $value) !== FALSE) {
                $exclude = TRUE;
                break;
              }
            }
          }

          // Check inclusion based on search parameters
          if ($exclude === FALSE && count($this->getSearch()) > 0) {
            $exclude = TRUE;
            foreach ($this->getSearch() as $value) {
              if (mb_strpos(strtolower($file_name), $value) !== FALSE) {
                $exclude = FALSE;
                break;
              }
            }
          }

          // Don't try to download a folder
          if (empty($file_name)) {
            $exclude = TRUE;
          }

          // If exclude is still false then go on and add this to the list
          if ($exclude === FALSE) {
            // if we don't already have the file then add it to the output array
            if (!file_exists($local_file_path . $file_name)) {
              $links[] = array(
                'save_path' => $local_file_path . $file_name,
                'link' => $href,
                'file_name' => $file_name,
                'destination' => $local_file_path,
              );
            }
          }
        }
      }
    }

    return $links;
  }

  /**
   * Makes a directory if required to save the files in
   *
   * @param string $destination
   */
  private function destinationDirectory($destination) {
    // make sure the destination folder exists
    if ($this->getMode() != 'search') {
      if (!is_dir($destination)) {
        mkdir($destination, 0775, TRUE);
        echo 'Making: ' . $destination . "\n";
      }
    }
  }

  /**
   * Converts a string to the file/URL safe form (safe for Windows)
   *
   * @param string $string the string to clean
   *
   * @return string
   */
  private function sanitize($string = '') {
    // Replace all weird characters with spaces
    $string = preg_replace('/[^\w\-()\&\#\%\[\]\'\.]+/u', ' ', $string);

    // Only allow one space separator at a time
    return trim(preg_replace('/ +/u', ' ', $string));
  }

  /**
   * The cURL wrapper
   *
   * @param string $url
   *   The URL to get
   * @param array $opts
   *   A keyed array containing any parameters that may be needed
   *   - tmp - (optional) The location of the temporary file to create to hold the response
   *   - cookie_jar - (optional) The location of the cookies folder
   *   - headers - (optional) The headers to send for the request
   *   - user_agent - (optional) The user agent used for the request
   *   - encoding - (optional) The encoding
   *   - time_out - (optional) The connection time out limit in seconds
   *   - range - (optional) The byte range to get
   *   - low_speed_limit - (optional) See description in the function body
   *   - low_speed_time - (optional) See description in the function body
   *
   * @return string
   *   Either the data or the location of the temporary file that is holding the data
   */
  private function curlGet($url, $opts = array()) {

    // a new cURL instance
    $ch = curl_init();

    // set the URL to the cURL options
    curl_setopt($ch, CURLOPT_URL, $url);

    // The cookie jar
    if (!empty($opts['cookie_jar'])) {
      $cookie = tempnam($opts['cookie_jar'], "cookie");
    }
    else {
      $cookie = tempnam(sys_get_temp_dir(), "cookie");
    }
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);

    // the headers
    if (!empty($opts['headers']) && is_array($opts['headers'])) {
      $headers = $opts['headers'];
    }
    else {
      $headers = array();
      $headers[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
      $headers[] = "Connection: keep-alive";
      $headers[] = "Keep-Alive: 115";
      $headers[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
      $headers[] = "Accept-Language: en-us,en;q=0.5";
      $headers[] = "Pragma: ";
    }
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

    // The user agent
    if (!empty($opts['user_agent'])) {
      $user_agent = $opts['user_agent'];
    }
    else {
      $user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.162 Safari/535.19';
    }
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);

    // The encoding
    if (!empty($opts['encoding'])) {
      $encoding = $opts['encoding'];
    }
    else {
      $encoding = 'gzip,deflate';
    }
    curl_setopt($ch, CURLOPT_ENCODING, $encoding);

    // The connection time out in seconds
    if (!empty($opts['time_out'])) {
      $time_out = $opts['time_out'];
    }
    else {
      $time_out = 60;
    }
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $time_out);

    // Used with CURLOPT_LOW_SPEED_TIME. If the transfer speed falls below
    // this value in bytes per second for longer than CURLOPT_LOW_SPEED_TIME,
    // the transfer is aborted.
    if (!empty($opts['low_speed_limit'])) {
      $low_speed_limit = $opts['low_speed_limit'];
    }
    else {
      $low_speed_limit = 10240;
    }
    curl_setopt($ch, CURLOPT_LOW_SPEED_LIMIT, $low_speed_limit);

    // Used with CURLOPT_LOW_SPEED_LIMIT. If the transfer speed falls below the
    // value given with the CURLOPT_LOW_SPEED_LIMIT option for longer than the
    // number of seconds given with CURLOPT_LOW_SPEED_TIME, the transfer is aborted.
    if (!empty($opts['low_speed_time'])) {
      $low_speed_time = $opts['low_speed_time'];
    }
    else {
      $low_speed_time = 60;
    }
    curl_setopt($ch, CURLOPT_LOW_SPEED_TIME, $low_speed_time);

    // The range of bytes to get from the target
    if (!empty($opts['range'])) {
      curl_setopt($ch, CURLOPT_RANGE, $opts['range']);
    }

    // these are always the same for this purpose
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);

    // if we are writing this to a temporary file then return the file location
    if (!empty($opts['tmp'])) {
      // open the temp file and stream the response into it
      $fp = fopen($opts['tmp'], 'w');
      curl_setopt($ch, CURLOPT_FILE, $fp);
      $data = curl_exec($ch);
      curl_close($ch);
      fclose($fp);
      return $opts['tmp'];
    }
    else {
      // or just return the data
      $data = curl_exec($ch);
      curl_close($ch);
      return $data;
    }
  }

  /**
   * Returns the right n chars from the input
   *
   * @param string $str
   *   the string to cut
   * @param int $count
   *   the length to cut
   *
   * @return string
   */
  private function right($str, $count) {
    return mb_substr($str, ($count * -1));
  }

  /**
   * Returns the left n chars from the input
   *
   * @param string $str
   *   the string to cut
   * @param int $count
   *   the length to cut
   *
   * @return string
   */
  private function left($str, $count) {
    return mb_substr($str, 0, $count);
  }

}
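One detail worth calling out: scrape() resumes partial downloads by asking the server for only the missing bytes via CURLOPT_RANGE. A hedged sketch of just that calculation (resume_range is an illustrative helper, not part of the class, and the byte counts are made up):

```php
<?php
// Build the value the scraper passes as the 'range' option to curlGet(),
// i.e. everything from the bytes already on disk up to the reported
// Content-Length.
function resume_range($bytes_on_disk, $content_length) {
  // e.g. 1024 bytes already saved of a 4096-byte file gives "1024-4096"
  return $bytes_on_disk . '-' . $content_length;
}

// Wired into a cURL handle it would look like:
//   curl_setopt($ch, CURLOPT_RANGE, resume_range(filesize($path), $length));
echo resume_range(1024, 4096) . "\n";
// 1024-4096
```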
