# Beautiful Soup 

Beautiful Soup is a popular Python library commonly used for web scraping, the automated extraction of large amounts of public data from target websites within seconds.

Often used alongside [requests](obsidian://open?vault=Coding%20Tips&file=Requests), it is a parser that extracts data from HTML and can turn even invalid markup into a parse tree. It cannot request data itself; it is designed only for parsing.
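To illustrate that parsing is independent of fetching, here is a minimal sketch that feeds a hard-coded string to Beautiful Soup. The markup is a made-up example with deliberately unclosed tags; no network request is involved.

```python
from bs4 import BeautifulSoup

# Hypothetical, deliberately invalid markup: neither <p> nor <b> is closed.
broken_html = "<p>Unclosed paragraph with <b>bold text"

# Beautiful Soup still builds a parse tree from it.
soup = BeautifulSoup(broken_html, "html.parser")

print(soup.b.get_text())  # bold text
print(soup.p.get_text())  # Unclosed paragraph with bold text
```

Different parsers (`html.parser`, `lxml`, `html5lib`) may repair broken markup in slightly different ways, so the resulting tree can vary with the parser chosen.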

**Part 1: Get HTML using Requests**


```python
import requests

url = 'https://oxylabs.io/blog'
response = requests.get(url)
```
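A slightly more defensive version of this request is sketched below. The helper name `fetch_html` and the 10-second default timeout are assumptions made for illustration and are not part of the original example; `timeout` and `raise_for_status()` are standard Requests features.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the body of `url` as text, raising on HTTP errors.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html('https://oxylabs.io/blog')` would return the page text that Part 2 hands to Beautiful Soup.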

**Part 2: Find Element**

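```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, 'html.parser')
print(soup.title)  # the page's <title> element
print(soup.h1)     # the first <h1> element on the page
```

Note that `soup.title` returns the `<title>` element from the page head. The page heading comes from `soup.h1`, whose output will be:
```html
<h1 class="blog-header">Oxylabs Blog</h1>
```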
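Attribute-style access such as `soup.title` can be tried offline as well. The sketch below uses a small made-up page (the HTML string and its contents are assumptions, not the real Oxylabs markup) to show the difference between a tag, its text, and its attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page, so the example runs offline.
html = """
<html>
  <head><title>Example Blog</title></head>
  <body><h1 class="blog-header">Example Blog Heading</h1></body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.title)          # <title>Example Blog</title>
print(soup.title.string)   # Example Blog
print(soup.h1["class"])    # ['blog-header'] (class is multi-valued, so a list)
print(soup.h1.get_text())  # Example Blog Heading
```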

Because navigating, searching, and modifying the parse tree is simple, Beautiful Soup suits even beginners and can save developers hours of work. For example, to print all the blog titles from this page, use the **find_all()** method (**findAll()** is its older camelCase alias). On this page, all the blog titles are in `h2` elements whose `class` attribute is set to `blog-card__content-title`. This information can be passed to the method as follows:

```python
# find_all is the current name for the older findAll alias
blog_titles = soup.find_all('h2', attrs={"class": "blog-card__content-title"})
for title in blog_titles:
    print(title.text)
# Output:
# Prints all blog titles on the page
```
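The same search can be reproduced without a network connection. The following sketch assumes a small made-up HTML snippet that reuses the `blog-card__content-title` class; the `class_` keyword shown here is an equivalent shorthand for the `attrs` dictionary.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the blog page, so the example runs offline.
html = """
<div class="blog-card"><h2 class="blog-card__content-title"> First post </h2></div>
<div class="blog-card"><h2 class="blog-card__content-title">Second post</h2></div>
<h2 class="sidebar-title">Not a blog title</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# Two equivalent ways to filter on the class attribute.
by_attrs = soup.find_all("h2", attrs={"class": "blog-card__content-title"})
by_keyword = soup.find_all("h2", class_="blog-card__content-title")

# get_text(strip=True) trims surrounding whitespace from each title.
titles = [tag.get_text(strip=True) for tag in by_attrs]
print(titles)  # ['First post', 'Second post']
```

The third `h2` is skipped because its class does not match, which is the point of passing the attribute filter.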

Beautiful Soup also works easily with CSS selectors through **select()**, so the explicit element-and-attribute search above is not even needed.

```python
blog_titles = soup.select('h2.blog-card__content-title')
for title in blog_titles:
    print(title.text)
```
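As an offline sketch of the selector approach, the example below runs `select()` and its single-result counterpart `select_one()` against a made-up snippet. The HTML and the link paths are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the blog page, so the example runs offline.
html = """
<div class="blog-card">
  <h2 class="blog-card__content-title"><a href="/blog/first">First post</a></h2>
</div>
<div class="blog-card">
  <h2 class="blog-card__content-title"><a href="/blog/second">Second post</a></h2>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns every match as a list.
titles = [h2.get_text(strip=True) for h2 in soup.select("h2.blog-card__content-title")]

# select_one() returns only the first match, or None when nothing matches.
first = soup.select_one("h2.blog-card__content-title")

# Selectors can combine structure and attributes: links directly inside a title.
links = [a["href"] for a in soup.select("h2.blog-card__content-title > a[href]")]

print(titles)                      # ['First post', 'Second post']
print(first.get_text(strip=True))  # First post
print(links)                       # ['/blog/first', '/blog/second']
```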