2023-07-05 03:05:42 +00:00
# Pandas tips
Pandas comes up constantly when working with datasets in Jupyter notebooks, so it's best to get familiar with it early.
- It is useful to know the [common ways](obsidian://open?vault=Coding%20Tips&file=Python%2Fcodes%2Fcommon%20pandas%20commands) it is used first.
- Merging cells where appropriate is also helpful.
- [Geocoding](https://towardsdatascience.com/six-python-tips-for-geospatial-data-science-4438a531b0bf) spatial data
### read_csv()
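
A minimal sketch of `pd.read_csv`, assuming a small made-up CSV held in memory instead of a real file. The column names mirror the ones used later in this note; the values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "callsign,altitude_1\nAAL123,10000\nDAL456,12000\nAAL123,11000\n"

# read_csv accepts a path or any file-like object; StringIO keeps this self-contained.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # column types inferred by pandas
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the file path.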
### copy()
```
df2 = df1.copy()  # independent copy; changing df2 leaves df1 untouched
df2['b'] = df2['b'] + 100
df2
```
![[Pasted image 20220708093923.png]]
```
df1
```
![[Pasted image 20220708093939.png]]
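
Since the data behind the screenshots is not available, here is a self-contained sketch of the same idea with a made-up `df1`. It also contrasts `copy()` with plain assignment; the `alias` name is only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df1; the note's real data is only shown in screenshots.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

df2 = df1.copy()          # independent copy (deep by default)
df2["b"] = df2["b"] + 100

alias = df1               # plain assignment: just another name for the same object
alias.loc[0, "a"] = 99    # this change is visible through df1 as well

print(df1)  # column b unchanged, a[0] changed via alias
print(df2)  # column b increased by 100, a[0] still the original value
```

The takeaway is that `copy()` is what protects the original; assigning a DataFrame to a new name does not.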
### concat()
```
df3 = pd.concat([df1, df2])  # stack the rows of df2 below those of df1
df3
```
![[Pasted image 20220708094039.png]]
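
One thing worth knowing about `pd.concat` is that it keeps each frame's original row labels by default. The sketch below uses two small made-up frames (not the note's data) to show the difference `ignore_index=True` makes.

```python
import pandas as pd

# Hypothetical frames standing in for df1 and df2.
df1 = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
df2 = pd.DataFrame({"a": [3, 4], "b": [30, 40]})

stacked = pd.concat([df1, df2])                        # keeps original labels: 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)  # fresh labels: 0, 1, 2, 3

print(stacked.index.tolist())
print(renumbered.index.tolist())
```

Duplicate labels are harmless for many operations, but `ignore_index=True` is usually the safer choice when the old labels carry no meaning.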
If you have multiple files to deal with, you can combine `pd.concat` and `pd.read_csv` to load them all into a single DataFrame.
```
# path_data is assumed to be a pathlib.Path pointing at the data folder
for i in path_data.glob("*.csv"):
    print(i)
```
![[Pasted image 20220708094155.png]]
```
flightlist = pd.concat(pd.read_csv(file) for file in path_data.glob("*.csv"))
```
### value_counts()
Counts how many times each unique value occurs in a column.
```
df['callsign'].value_counts()
```
![[Pasted image 20220708094708.png]]
The counts can also be returned as relative frequencies by setting `normalize=True`.
```
df['callsign'].value_counts(normalize=True)
```
![[Pasted image 20220708094651.png]]
It also works for continuous data: `bins` groups the values into discrete intervals before counting.
```
df['altitude_1'].value_counts(bins=10)
```
![[Pasted image 20220708094636.png]]
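
The three variants above can be tried on a small made-up frame. The column names follow the examples in this section, while the values are hypothetical.

```python
import pandas as pd

# Hypothetical flight data; column names follow the examples above.
df = pd.DataFrame({
    "callsign": ["AAL123", "AAL123", "DAL456", "AAL123"],
    "altitude_1": [1000, 2000, 9000, 10000],
})

counts = df["callsign"].value_counts()                # absolute counts, most frequent first
shares = df["callsign"].value_counts(normalize=True)  # fractions that sum to 1
binned = df["altitude_1"].value_counts(bins=2)        # two equal-width altitude intervals

print(counts)
print(shares)
print(binned)
```

With `bins`, the result is indexed by intervals instead of raw values, which makes it a quick substitute for a histogram.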
- More tips for data analysis can be found [here](https://towardsdatascience.com/5-useful-tips-for-exploratory-data-analysis-using-pandas-in-python-7c05808c9408)
  - including the percentage of missing data, rows with maximum values, aggregating across columns, and more
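
As a rough sketch of the three ideas named in the list above (missing-data percentage, rows with maximum values, aggregating across columns), here is one possible way to do each in pandas. This is not taken from the linked article; the data and the `altitude_2` column are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; altitude_2 is an invented second column for the row-wise aggregate.
df = pd.DataFrame({
    "callsign": ["AAL123", "DAL456", "UAL789"],
    "altitude_1": [1000.0, np.nan, 3000.0],
    "altitude_2": [1500.0, 2500.0, np.nan],
})

missing_pct = df.isna().mean() * 100                      # percent of missing values per column
top_row = df.loc[df["altitude_1"].idxmax()]               # row holding the largest altitude_1
row_mean = df[["altitude_1", "altitude_2"]].mean(axis=1)  # aggregate across columns, per row

print(missing_pct)
print(top_row)
print(row_mean)
```

Both `idxmax` and `mean` skip missing values by default, so the `NaN` entries do not break either calculation.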