Robots.txt is an increasingly important file found on websites: it declares which web crawlers you permit to crawl and index your pages for search engines. Since web scraping is entirely legal in the US, this is the wild west of scraping, and I want to keep my brain and information safe from scrapers.

Fun Fact: Google [open-sourced](https://opensource.googleblog.com/2019/07/googles-robotstxt-parser-is-now-open.html) their [robots.txt parser](https://github.com/google/robotstxt) in 2019, if you want to see how a production search engine parses the robots.txt file for indexing.

*Resources*:

- [Robots.txt file examples](https://blog.hubspot.com/marketing/robots-txt-file)
- A robots.txt [generator tool](https://www.internetmarketingninjas.com/tools/robots-txt-generator/)
- Another sample [robots.txt](https://www.cutercounter.com/robots.txt) file

Example:

```
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /

User-agent: AdsBot-Google
Disallow: /

User-agent: bingbot
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: Slurp
Disallow: /

User-agent: Facebot
Disallow: /

User-agent: facebookexternalhit
Disallow: /

User-agent: baiduspider
Disallow: /

User-agent: Applebot
Disallow: /

User-agent: sosobot
Disallow: /

User-agent: exabot
Disallow: /

User-agent: seznambot
Disallow: /

User-agent: Teoma
Disallow: /

User-agent: ScoutJet
Disallow: /

User-agent: DuckDuckBot
Disallow: /

User-agent: Twitterbot
Disallow: /

User-agent: LinkedInBot
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: Relcybot
Disallow: /

User-agent: Feedly
Disallow: /

User-agent: Netvibes
Disallow: /

User-agent: Pingdom
Disallow: /

User-agent: PGBot
Disallow: /

User-agent: Laserlikebot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: JamesBOT
Disallow: /
```
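If you want to sanity-check rules like these without reading the file by hand, Python's standard-library `urllib.robotparser` can fetch a robots.txt and answer "may this crawler fetch this URL?" Here's a minimal sketch; the site URL and page path are placeholders, not a real endpoint:

```python
from urllib import robotparser

# Fetch and parse a site's robots.txt (placeholder URL).
parser = robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

# Check a few of the user agents from the example above
# against a hypothetical page path.
for agent in ("Googlebot", "DuckDuckBot", "*"):
    allowed = parser.can_fetch(agent, "https://example.com/some-page")
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

Against the example file above, every agent would print `disallowed`, since the wildcard `User-agent: *` with `Disallow: /` already blocks everything.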