# Pandas tips

Pandas is used constantly in Jupyter notebooks and when working with datasets, so it's best to get familiar with it while you can.

- It is useful to learn the [common ways](obsidian://open?vault=Coding%20Tips&file=Python%2Fcodes%2Fcommon%20pandas%20commands) it is used first.
- Merging cells where it makes sense is also helpful.
- [Geocoding](https://towardsdatascience.com/six-python-tips-for-geospatial-data-science-4438a531b0bf) spatial data

### read_csv()

### copy()

```
# copy() makes an independent copy, so editing df2 does not change df1
df2 = df1.copy()
df2['b'] = df2['b'] + 100
df2
```

![[Pasted image 20220708093923.png]]

```
df1
```

![[Pasted image 20220708093939.png]]

### concat()

```
# Stack the two frames row-wise
df3 = pd.concat([df1, df2])
df3
```

![[Pasted image 20220708094039.png]]

If you have multiple files to deal with, you can also combine `pd.concat` and `pd.read_csv`.

```
# List every CSV file in the data folder
for i in path_data.glob("*.csv"):
    print(i)
```

![[Pasted image 20220708094155.png]]

```
# Read each CSV and concatenate them into one DataFrame
flightlist = pd.concat(pd.read_csv(file) for file in path_data.glob("*.csv"))
```

### value_counts()

Counts how many times each unique value occurs.

```
df['callsign'].value_counts()
```

![[Pasted image 20220708094708.png]]

The counts can also be returned as fractions of the total by setting `normalize=True`.

```
df['callsign'].value_counts(normalize=True)
```

![[Pasted image 20220708094651.png]]

It also works on continuous data: `bins` groups the values into discrete intervals before counting.

```
df['altitude_1'].value_counts(bins=10)
```

![[Pasted image 20220708094636.png]]

- More tips for data analysis can be found [here](https://towardsdatascience.com/5-useful-tips-for-exploratory-data-analysis-using-pandas-in-python-7c05808c9408)
	- including percentage of missing data, rows with max values, aggregating across columns, and more
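
The `copy()`, `concat()`, `read_csv()` and `value_counts()` calls above can be tried end to end without the original flight data. The sketch below is only an illustration: the toy frames, the callsign and altitude values, the file names `part1.csv` and `part2.csv`, and the temporary folder standing in for `path_data` are all made up for the example. It also passes `ignore_index=True` to `pd.concat`, which the snippets above do not, so that the combined rows get fresh row numbers.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical toy data standing in for the frames used in the notes.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# copy() returns an independent object: changing df2 leaves df1 untouched.
df2 = df1.copy()
df2["b"] = df2["b"] + 100

# concat() stacks the frames row-wise; ignore_index=True renumbers the rows.
df3 = pd.concat([df1, df2], ignore_index=True)

# Multi-file pattern: write two CSVs to a temporary folder (a stand-in for
# path_data), then read them all back and concatenate in one expression.
with tempfile.TemporaryDirectory() as tmp:
    path_data = Path(tmp)
    df1.to_csv(path_data / "part1.csv", index=False)
    df2.to_csv(path_data / "part2.csv", index=False)
    flightlist = pd.concat(
        (pd.read_csv(file) for file in sorted(path_data.glob("*.csv"))),
        ignore_index=True,
    )

# value_counts() tallies unique values; normalize=True returns fractions.
callsigns = pd.Series(["AAL1", "AAL1", "DAL2", "UAL3"], name="callsign")
counts = callsigns.value_counts()
shares = callsigns.value_counts(normalize=True)

# bins=N groups continuous values into N equal-width intervals first.
altitude = pd.Series([1000.0, 1500.0, 9000.0, 9500.0, 10000.0])
binned = altitude.value_counts(bins=3)

print(df1["b"].tolist())   # [10, 20, 30]
print(df3.shape)           # (6, 2)
print(flightlist.shape)    # (6, 2)
print(counts["AAL1"], shares["AAL1"])  # 2 0.5
```

Sorting the glob results is a choice made here so the row order is predictable; `glob` itself does not guarantee any order.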