Compare commits

...

2 Commits

5 changed files with 24 additions and 20 deletions

View File

@@ -1,12 +1,12 @@
{
"recentFiles": [
{
"basename": "Webscraping",
"path": "Coding Tips (Classical)/Terminal Tips/GUIs/Tools/Webscraping.md"
"basename": "Robots.txt Files",
"path": "Coding Tips (Classical)/Terminal Tips/GUIs/Internet/Websites/Robots.txt Files.md"
},
{
"basename": "Robots.txt Files",
"path": "Robots.txt Files.md"
"basename": "Webscraping",
"path": "Coding Tips (Classical)/Terminal Tips/GUIs/Tools/Webscraping.md"
},
{
"basename": "Potentiometers & Analog SerialReader",

View File

@@ -25,7 +25,7 @@
"state": {
"type": "markdown",
"state": {
"file": "Coding Tips (Classical)/Terminal Tips/GUIs/Tools/Webscraping.md",
"file": "Coding Tips (Classical)/Terminal Tips/GUIs/Internet/Websites/Robots.txt Files.md",
"mode": "source",
"source": false
}
@@ -107,7 +107,7 @@
"state": {
"type": "backlink",
"state": {
"file": "Coding Tips (Classical)/Terminal Tips/GUIs/Tools/Webscraping.md",
"file": "Coding Tips (Classical)/Terminal Tips/GUIs/Internet/Websites/Robots.txt Files.md",
"collapseAll": false,
"extraContext": false,
"sortOrder": "alphabetical",
@@ -124,7 +124,7 @@
"state": {
"type": "outgoing-link",
"state": {
"file": "Coding Tips (Classical)/Terminal Tips/GUIs/Tools/Webscraping.md",
"file": "Coding Tips (Classical)/Terminal Tips/GUIs/Internet/Websites/Robots.txt Files.md",
"linksCollapsed": false,
"unlinkedCollapsed": true
}
@@ -147,7 +147,7 @@
"state": {
"type": "outline",
"state": {
"file": "Coding Tips (Classical)/Terminal Tips/GUIs/Tools/Webscraping.md"
"file": "Coding Tips (Classical)/Terminal Tips/GUIs/Internet/Websites/Robots.txt Files.md"
}
}
}
@@ -174,9 +174,10 @@
"obsidian-excalidraw-plugin:Create new drawing": false
}
},
"active": "dbad7b010371d947",
"active": "0a0de85a51848b9d",
"lastOpenFiles": [
"Robots.txt Files.md",
"Coding Tips (Classical)/Terminal Tips/GUIs/Tools/Webscraping.md",
"Coding Tips (Classical)/Terminal Tips/GUIs/Internet/Websites/Robots.txt Files.md",
"Excalidraw/Drawing 2023-10-16 12.13.42.excalidraw.md",
"Machine Tips (Quantum)/Physics/Hardware/Potentiometers & Analog SerialReader.md",
"Excalidraw",
@@ -206,7 +207,6 @@
"Untitled.canvas",
"Coding Tips (Classical)/Project Vault/Current Occupations/Manhattan Youth",
"Coding Tips (Classical)/Project Vault/Current Occupations/Website Projects/My Domain Names.md",
"Coding Tips (Classical)/Project Vault/Current Occupations/Potential and Future/Career Tips.md",
"Coding Tips (Classical)/Project Vault/About Obsidian/imgFiles/Pasted image 20231011091043.png",
"Coding Tips (Classical)/Project Vault/About Obsidian/Slides & Tools/export/Slides/plugin/chalkboard/_style.css",
"Coding Tips (Classical)/Project Vault/About Obsidian/Slides & Tools/export/Slides/plugin/chalkboard/img/blackboard.png",

View File

@@ -0,0 +1,9 @@
+Robots.txt is an increasingly important file found on websites that determines whether you permit a website crawler to index your pages for search engine optimization. As web-scraping is largely legal in the US, this is the wild west of scraping, and thus I want to keep my brain and information safe from scraping.
+Fun Fact: Google [open-sourced](https://opensource.googleblog.com/2019/07/googles-robotstxt-parser-is-now-open.html) their [robots.txt parser](https://github.com/google/robotstxt) in 2019, if you want to see how their crawler actually interprets robots.txt files for search indexing.
+*Resources*:
+- [Robots.txt file examples](https://blog.hubspot.com/marketing/robots-txt-file)
+- Robots.txt [generator tool](https://www.internetmarketingninjas.com/tools/robots-txt-generator/)
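As a rough sketch of the mechanism the new note describes: a polite crawler fetches `/robots.txt` and honors its `Disallow` rules before scraping. Python's standard `urllib.robotparser` can check a URL against such rules; the rules and URLs below are made-up examples, not taken from any real site.

```python
# Minimal sketch (hypothetical site and paths) of checking a robots.txt
# policy before scraping, using Python's standard urllib.robotparser.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# Parse rules from a string here; for a live site you would use
# parser.set_url("https://example.com/robots.txt") followed by parser.read().
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("MyScraper", "https://example.com/public/page.html"))    # True
print(parser.can_fetch("MyScraper", "https://example.com/private/notes.html"))  # False
```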

View File

@@ -1,9 +1,10 @@
-# Webscraping
+# Web-scraping
-Webscraping is a common task in the CS world that makes it easy and efficient to extract large amounts of data. It is part of a larger topic of data mining which allows for the human understandable analysis of all the data that is out there.
+Web-scraping is a common task in the CS world that makes it easy and efficient to extract large amounts of data. It is part of the larger topic of data mining, which allows for human-understandable analysis of all the data that is out there.
-You will often use requests and beautifulsoup libraries. To prevent webscraping on your own sites, refer to the rob
+You will often use the `requests` and `beautifulsoup` libraries.
+To prevent web-scraping on your own sites, refer to the [robots.txt](obsidian://open?vault=enter&file=Robots.txt%20Files) information.
---
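Since the updated note names `requests` and `beautifulsoup` without showing them in use, here is a minimal sketch of the usual pattern. The URL and markup assumptions are illustrative placeholders, and `bs4` must be installed separately (`pip install requests beautifulsoup4`).

```python
# Minimal scraping sketch using requests + BeautifulSoup (bs4).
# The URL and page structure are hypothetical examples, not a real target.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/articles"
response = requests.get(url, headers={"User-Agent": "MyScraper/0.1"}, timeout=10)
response.raise_for_status()  # fail loudly on HTTP errors

soup = BeautifulSoup(response.text, "html.parser")

# Collect every link's text and destination from the page.
for link in soup.find_all("a"):
    print(link.get_text(strip=True), "->", link.get("href"))
```

Checking the site's robots.txt first, as in the earlier sketch, keeps this kind of scraper on the polite side.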

View File

@@ -1,6 +0,0 @@
-Robots.txt is an increasingly important file found on websites that determine whether you permit a website crawler to index your page for search engine optimization. As webscraping is entirely legal in the US, this is the wild west of scraping and thus I want to keep mu brain and information safe from scraping.
-*Resources*:
-- [Robots.txt file examples](https://blog.hubspot.com/marketing/robots-txt-file)