# Pandas tips

Pandas comes up constantly when working with Jupyter notebooks and datasets, so it is worth getting familiar with it early.

- It helps to first learn the [common ways](obsidian://open?vault=Coding%20Tips&file=Python%2Fcodes%2Fcommon%20pandas%20commands) it is used.
- Merging cells where appropriate is also helpful.
- [Geocoding](https://towardsdatascience.com/six-python-tips-for-geospatial-data-science-4438a531b0bf) spatial data

### read_csv()
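
As a minimal sketch of what a `read_csv()` call can look like: the CSV text, the `day` column, and the variable name `flights` below are made up for illustration, and `io.StringIO` stands in for a real file path so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """callsign,altitude_1,day
ABC123,10500.0,2022-07-01
XYZ789,,2022-07-02
ABC123,9800.5,2022-07-03
"""

# read_csv accepts a file path or any file-like object.
flights = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["callsign", "altitude_1", "day"],  # load only these columns
    parse_dates=["day"],                        # parse this column as datetimes
)

print(flights.dtypes)
print(flights.head())
```

The empty `altitude_1` field is read as `NaN`, which is worth checking for right after loading.
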
### copy()
`copy()` returns an independent copy of a DataFrame, so modifying the copy leaves the original untouched.

```
df2 = df1.copy()
df2['b'] = df2['b'] + 100
df2
```

![[Pasted image 20220708093923.png]]

```
df1
```

![[Pasted image 20220708093939.png]]

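
To make the difference between copying and plain assignment concrete, here is a small self-contained sketch. The contents of `df1` and the name `alias` are assumptions chosen for illustration, since the original data is only visible in the screenshots.

```python
import pandas as pd

# Made-up data; the real df1 is only shown as an image in this note.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

alias = df1       # no copy: both names refer to the same object
df2 = df1.copy()  # independent copy (deep=True is the default)

df2["b"] = df2["b"] + 100  # only changes the copy

print(alias is df1)       # True
print(df1["b"].tolist())  # [10, 20, 30]
print(df2["b"].tolist())  # [110, 120, 130]
```
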
### concat()
`pd.concat()` stacks DataFrames on top of each other (row-wise by default).

```
df3 = pd.concat([df1, df2])
df3
```

![[Pasted image 20220708094039.png]]

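
One detail worth knowing is that `pd.concat` keeps each frame's original index labels unless told otherwise. The sketch below uses made-up frames to show the effect of `ignore_index=True`; the names `stacked` and `renumbered` are illustrative.

```python
import pandas as pd

# Made-up frames for illustration.
df1 = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
df2 = pd.DataFrame({"a": [3, 4], "b": [30, 40]})

stacked = pd.concat([df1, df2])                         # keeps original index labels
renumbered = pd.concat([df1, df2], ignore_index=True)   # fresh 0..n-1 index

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Duplicate index labels are legal but can surprise you later with `.loc`, so renumbering is often the safer default.
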
If you have multiple files to deal with, you can also combine `pd.concat` and `pd.read_csv`. First, list the files to check what will be read:

```
for i in path_data.glob("*.csv"):
    print(i)
```

![[Pasted image 20220708094155.png]]

Then read and concatenate them in one step:

```
flightlist = pd.concat(pd.read_csv(file) for file in path_data.glob("*.csv"))
```

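
The same pattern can be tried end to end with throwaway files. In this sketch the temporary directory, the file names `day1.csv` and `day2.csv`, and the extra `source` column are assumptions added for illustration, not part of the original workflow.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path_data = Path(tmp)

    # Write two small, made-up CSV files to stand in for real data.
    pd.DataFrame({"callsign": ["ABC123", "XYZ789"]}).to_csv(path_data / "day1.csv", index=False)
    pd.DataFrame({"callsign": ["ABC123"]}).to_csv(path_data / "day2.csv", index=False)

    files = sorted(path_data.glob("*.csv"))  # sorted for a reproducible order
    flightlist = pd.concat(
        # Tag each row with the file it came from before stacking.
        (pd.read_csv(file).assign(source=file.name) for file in files),
        ignore_index=True,
    )

print(flightlist)
```

Recording the source file name makes it easier to trace a suspicious row back to where it came from.
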
### value_counts()
Counts how often each unique value occurs in a column.

```
df['callsign'].value_counts()
```

![[Pasted image 20220708094708.png]]

The counts can also be returned as relative frequencies by setting `normalize=True`:

```
df['callsign'].value_counts(normalize=True)
```

![[Pasted image 20220708094651.png]]

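
For a version that runs without the flight data, here is a small sketch comparing raw counts with normalized ones. The callsigns are made up, and `counts` and `shares` are illustrative names.

```python
import pandas as pd

# Made-up callsigns for illustration.
df = pd.DataFrame({"callsign": ["ABC123", "XYZ789", "ABC123", "ABC123"]})

counts = df["callsign"].value_counts()                # absolute counts
shares = df["callsign"].value_counts(normalize=True)  # fractions that sum to 1

print(counts["ABC123"], counts["XYZ789"])  # 3 1
print(shares["ABC123"], shares["XYZ789"])  # 0.75 0.25
```
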
It can also be used on continuous data by grouping the values into discrete intervals with `bins`:

```
df['altitude_1'].value_counts(bins=10)
```

![[Pasted image 20220708094636.png]]

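
A tiny example makes the binning easier to follow. The altitude values below are invented, two bins are used instead of ten to keep the output short, and `sort=False` is an optional choice that keeps the intervals in ascending order rather than sorting by count.

```python
import pandas as pd

# Made-up altitudes for illustration.
altitude = pd.Series([100, 200, 300, 900, 1000])

# Split the value range into 2 equal-width intervals and count values in each.
binned = altitude.value_counts(bins=2, sort=False)

print(binned)
print(binned.tolist())  # [3, 2]
```

The index of the result is made of intervals, so each row reads as "how many values fell in this range".
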
- More tips for data analysis can be found [here](https://towardsdatascience.com/5-useful-tips-for-exploratory-data-analysis-using-pandas-in-python-7c05808c9408)
	- including the percentage of missing data, rows with maximum values, aggregating across columns, and more
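
The three ideas named in the list above can be sketched roughly as follows. This is not taken from the linked article; the data and the names `missing_pct`, `highest`, and `altitude_mean` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data with a few missing values.
df = pd.DataFrame({
    "callsign": ["ABC123", "XYZ789", "DEF456", "GHI000"],
    "altitude_1": [10000.0, np.nan, 12000.0, 8000.0],
    "altitude_2": [10500.0, 9000.0, np.nan, np.nan],
})

# Percentage of missing values per column.
missing_pct = df.isna().mean() * 100

# Row holding the maximum value of a column (idxmax skips NaN by default).
highest = df.loc[df["altitude_1"].idxmax()]

# Aggregate across columns: row-wise mean of the two altitude columns,
# ignoring NaN within each row.
df["altitude_mean"] = df[["altitude_1", "altitude_2"]].mean(axis=1)

print(missing_pct)
print(highest["callsign"])  # DEF456
print(df)
```
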