# Pandas tips 

Pandas comes up constantly when working with datasets in Jupyter notebooks, so it's worth getting familiar with it early.

- It helps to learn the [common ways](obsidian://open?vault=Coding%20Tips&file=Python%2Fcodes%2Fcommon%20pandas%20commands) it is used first.
- Merging cells where appropriate is also helpful.
- [Geocoding](https://towardsdatascience.com/six-python-tips-for-geospatial-data-science-4438a531b0bf) spatial data 


### read_csv()
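
A minimal sketch of `read_csv()`, assuming a small in-memory CSV in place of a real file. The column names `callsign` and `altitude_1` are borrowed from the `value_counts()` examples in this note, and the values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path instead.
csv_text = "callsign,altitude_1\nABC123,10000\nXYZ789,12000\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
print(df.dtypes)
```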

### copy()

```
df2 = df1.copy()            # independent copy of df1
df2['b'] = df2['b'] + 100   # modify the copy only
df2
```
![[Pasted image 20220708093923.png]]

```
df1
```
![[Pasted image 20220708093939.png]]
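
The two outputs above show that changing `df2` leaves `df1` untouched. Here is a small sketch of why `copy()` matters, assuming a made-up `df1` with columns `a` and `b` (the original frame is only visible in the screenshots). Plain assignment just creates a second name for the same object, while `copy()` creates an independent one.

```python
import pandas as pd

# Made-up data for illustration
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

alias = df1        # no copy: both names refer to the same DataFrame
df2 = df1.copy()   # independent copy (deep by default)

df2["b"] = df2["b"] + 100   # only df2 changes

print(alias is df1)
print(df1["b"].tolist())
print(df2["b"].tolist())
```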

### concat()

```
df3 = pd.concat([df1, df2])   # stack df1 and df2 row-wise
df3
```

![[Pasted image 20220708094039.png]]
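
By default `pd.concat` stacks the frames row-wise and keeps each frame's original index, so index labels can repeat. A sketch with made-up frames, showing the `ignore_index=True` option to renumber the rows:

```python
import pandas as pd

# Made-up data for illustration
df1 = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
df2 = df1.copy()
df2["b"] = df2["b"] + 100

stacked = pd.concat([df1, df2])                         # keeps the original index labels
renumbered = pd.concat([df1, df2], ignore_index=True)   # fresh 0..n-1 index

print(stacked.index.tolist())
print(renumbered.index.tolist())
```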


If you have multiple files to deal with, you can also combine `pd.concat` and `pd.read_csv`.

```
# path_data is a pathlib.Path pointing at the folder with the CSV files
for i in path_data.glob("*.csv"):
    print(i)
```

![[Pasted image 20220708094155.png]]

```
flightlist = pd.concat(pd.read_csv(file) for file in path_data.glob("*.csv"))
```
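
A self-contained sketch of the same pattern, assuming `path_data` is a `pathlib.Path`. It writes two tiny made-up CSV files (`day1.csv` and `day2.csv`, hypothetical names) to a temporary folder and reads them back into one frame. Sorting the file list and passing `ignore_index=True` are optional additions that make the row order and index predictable.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path_data = Path(tmp)

    # Made-up sample files for illustration
    pd.DataFrame({"callsign": ["ABC123"], "altitude_1": [10000]}).to_csv(
        path_data / "day1.csv", index=False
    )
    pd.DataFrame({"callsign": ["XYZ789", "ABC123"], "altitude_1": [12000, 9000]}).to_csv(
        path_data / "day2.csv", index=False
    )

    # Read every CSV in the folder and stack them into one DataFrame
    files = sorted(path_data.glob("*.csv"))
    flightlist = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

print(len(flightlist))
```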

### value_counts()

Used to count unique values. 

```
df['callsign'].value_counts()
```
![[Pasted image 20220708094708.png]]

The counts can also be normalized to relative frequencies by setting `normalize=True`.

```
df['callsign'].value_counts(normalize=True)
```
![[Pasted image 20220708094651.png]]

It can also be used for continuous data by grouping the values into discrete intervals with `bins`.

```
df['altitude_1'].value_counts(bins=10)
```

![[Pasted image 20220708094636.png]]
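
The three variants above can be tried on a small made-up frame. This is only a sketch: the column names match the examples, but the values are invented.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "callsign": ["ABC123", "ABC123", "XYZ789", "ABC123"],
    "altitude_1": [1000.0, 2000.0, 9000.0, 10000.0],
})

counts = df["callsign"].value_counts()                 # absolute counts per unique value
shares = df["callsign"].value_counts(normalize=True)   # fractions that sum to 1
binned = df["altitude_1"].value_counts(bins=2)         # counts per equal-width interval

print(counts)
print(shares)
print(binned)
```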

- More tips for data analysis can be found [here](https://towardsdatascience.com/5-useful-tips-for-exploratory-data-analysis-using-pandas-in-python-7c05808c9408)
	- including the percentage of missing data, the rows with maximum values, aggregation across columns, and more