Robots.txt is an increasingly important file, served from the root of a website, that tells web crawlers which pages they may crawl, which in turn shapes what search engines index. Scraping publicly accessible pages is largely legal in the US, so this is the wild west of scraping, and I want to keep my brain and information safe from it. Keep in mind that the file is only a request: well-behaved crawlers honor it, but nothing enforces it.
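To make the allow/disallow idea concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules, the bot name `ExampleScraperBot`, and the `example.com` URLs are made up for illustration; they are assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block one named bot everywhere,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleScraperBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named bot is disallowed everywhere.
blocked = parser.can_fetch("ExampleScraperBot", "https://example.com/notes/")
# Any other bot falls through to the wildcard group.
public = parser.can_fetch("SomeOtherBot", "https://example.com/notes/")
private = parser.can_fetch("SomeOtherBot", "https://example.com/private/page")

print(blocked, public, private)
```

This only shows what a compliant crawler would conclude from the file. A scraper that never reads robots.txt is unaffected, so the file is a polite signal rather than a lock.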

Fun Fact: Google open-sourced its robots.txt parser in 2019, if you want to see how a production crawler parses and matches robots.txt rules for search indexing.

Resources: