tobiaswillmann.de

Python Pandas most used code snippets and use cases for SEO

2018-07-20

some examples

Merge

Merge two dataframes on a specific column. In this case “adress” in df1 with “Adresse” in df2.

merged = pd.merge(df1, df2, left_on='adress', right_on='Adresse', how='outer')

Its also possible to merge on multiple fields. This merges two dataframes with total pageviews and entrances on url and date

merged = pd.merge(dfTotal, dfOrganic,  how='left', left_on=['ga:landingPagePath','ga:year','ga:month','ga:day'], right_on = ['ga:landingPagePath','ga:year','ga:month','ga:day'])

Filter

You want to filter for example in a crawl status code == 200.

filter = df['Status Code'] == 200
df = df[filter]

Or pageviews > 1000

filter = df['pageviews'].astype('int', copy=False) > 1000
df = df[filter]

Reverse filters work like this. This is all medium possibilities without “organic”

filter = df['ga:medium'] == "organic"
df = df[~filter]

Filter e.g. for > 0.7 organic_entrance_share

filter = df['organic_entrance_share'] > 0.7
df = df[filter]

Filter for NaN / notnull in a Pandas dataframe column is also needed a lot.

filter = df['ga:entrances_y'].notnull()
df = df[filter]

Pivot

You want for example to sum up for example all pageviews and entrances per day and ga:landingPagePath

dfpivot = pd.pivot_table(data,index=['ga:landingPagePath','ga:year','ga:month','ga:day'], values=['ga:entrances','ga:pageviews'], aggfunc='sum')
dfpivot.fillna(0, inplace=True)
dfpivot.reset_index(inplace=True)

Extract

This can be uses to extract e.g. an article id from a url. If your URL looks like this https://www.blick.ch/gugus-id12345.html

df['articleid'] = df['ga:landingPagePath'].str.extract(r'-id(d+)', expand=True)

Drop a column

Drop the column _id

df = df.drop("_id",1)

Drop duplicates

df = df.drop_duplicates()

Work with dates

df["realdate"] = pd.to_datetime(df["date"], format='%Y-%m-%d', errors='ignore')

Import

Import for example a Screaming Frog crawl, which is too big for Excel to handle.

df = pd.DataFrame(pd.read_csv('/Users/...wurstbrot.csv', low_memory=False, header=0, delimiter=";"))

Export

At some point you maybe prefer to go back to Excel

df.to_csv('export1.csv', sep='t')

This was useful when I had some encoding problems

df.to_csv('export3.csv', sep=b't', encoding='utf-8')

… to be continued …

© 2020 Tobias Willmann