# Webscraping

Web scraping is a common programming task: it extracts large amounts of data from web pages easily and efficiently. It is part of the larger topic of data mining, which turns the data that is out there into analysis humans can understand. In Python you will often use the `requests` and `beautifulsoup` libraries.

---

#### Comparing web scraping libraries:
![[Pasted image 20220730121832.png]]

## Sample scraper

```python
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('/nix/path/to/webdriver/executable'))
driver.get('https://your.url/here?yes=brilliant')
content = driver.page_source
driver.quit()  # close the browser once the HTML is captured

results = []
other_results = []
soup = BeautifulSoup(content, 'html.parser')

# Collect unique link texts from elements with the first class
for a in soup.find_all(attrs={'class': 'class'}):
    name = a.find('a')
    if name is not None and name.text not in results:
        results.append(name.text)

# Collect span texts from elements with the second class
for b in soup.find_all(attrs={'class': 'otherclass'}):
    name2 = b.find('span')
    if name2 is not None:
        other_results.append(name2.text)

# Series align on index, so lists of different lengths are padded with NaN
series1 = pd.Series(results, name='Names')
series2 = pd.Series(other_results, name='Categories')
df = pd.DataFrame({'Names': series1, 'Categories': series2})
df.to_csv('names.csv', index=False, encoding='utf-8')
```

You can also use asyncio or multithreading to make web scraping even [faster](https://oxylabs.io/blog/how-to-make-web-scraping-faster).
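As a rough illustration of the multithreading idea, here is a minimal sketch that downloads several pages in parallel with a thread pool. The names `fetch_html` and `fetch_all` are made up for this note, and the standard library's `urllib` is used only to keep the sketch dependency-free; `requests.get(url).text` would fit in the same place.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download one page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all(urls, fetch=fetch_html, max_workers=8):
    """Fetch every URL in a thread pool and return {url: html or None}."""
    urls = list(urls)  # allow generators; the list is iterated twice below

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:  # one failed page should not abort the whole batch
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() preserves input order, so zip pairs each URL with its result
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Calling `fetch_all(['https://your.url/here?page=1', 'https://your.url/here?page=2'])` would return a dictionary of page sources that can each be passed to `BeautifulSoup`. Threads help here because most of the time is spent waiting on the network; keep `max_workers` modest so the target site is not flooded with requests.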

To see the requests a page makes, open the browser's developer tools: Right click > Inspect > Network.

##### More helpful tutorials
- [How To Scraper Yelp Review For Free [No Coding Required] | ProWebScraper](https://medium.com/prowebscraper/how-to-scraper-yelp-reviews-899b7480eb8d)
- [How to Build a Web Scraper With Python [Step-by-Step Guide] | HackerNoon](https://hackernoon.com/how-to-build-a-web-scraper-with-python-step-by-step-guide-jxkp3yum)
- [Python Web Scraping Tutorial: Step-By-Step [2022 Guide] | Oxylabs](https://oxylabs.io/blog/python-web-scraping)
- [Intro to Yelp Scraping using Python](https://towardsdatascience.com/intro-to-yelp-web-scraping-using-python-78252318d832)

## Alternative tools:
- [Octoparse](https://developer.chrome.com/docs/devtools/workspaces/?utm_source=devtools) is a good option that is free for 14 days.