If your work is even slightly related to online marketing, you have probably already heard about the new version of Google Analytics: Google Analytics 4 (GA4) properties. Everybody is talking about it because Google Analytics is the most popular web analytics tool, and a lot of businesses worldwide depend on it. One of the cool new features of GA4 is data export to Google BigQuery, which was previously available only in the expensive Google Analytics 360. But BigQuery costs money, and it is not cheap, especially for large amounts of data. Still, sometimes we need raw data from Google Analytics for automation or data modeling. I prefer using Python and Pandas for this kind of task, and I hope you do too. 🙂
In the previous version of Google Analytics (Universal Analytics), Google offered us the great Analytics Reporting API v4, but what about the new GA4? Its API is called the Google Analytics Data API:
https://developers.google.com/analytics/devguides/reporting/data/v1
Getting access to your account with this API is quite similar to the previous one. In my example, I am going to use a Google service account created in Google Cloud Platform (GCP). To get access to the API with a service account, follow these steps:
- Go to the Credentials section of your GCP project, create a service account (if you don't already have one) and download the secret JSON key:
https://console.cloud.google.com/apis/credentials
- Enable the Google Analytics Data API for your project:
https://console.cloud.google.com/marketplace/product/google/analyticsdata.googleapis.com
- Go to your Google Analytics 4 property and add your service account email in the Account User Management section, like any other user:
https://analytics.google.com/analytics/web/
- Now go to Property Settings and save your property ID somewhere; you will need it later in the Python script.
- Install the Python library google-analytics-data using pip, or add it to the requirements.txt of your Python project:
https://pypi.org/project/google-analytics-data/
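Before moving on, it can be handy to confirm which email address you need to add in Google Analytics. The downloaded key file is plain JSON and includes a client_email field, so a minimal sketch like the one below can read it. The function name read_service_account_email is just an illustrative choice, not part of any Google library.

```python
import json


def read_service_account_email(key_path):
    """Return the client_email stored in a service account key file."""
    with open(key_path, encoding="utf-8") as key_file:
        key_data = json.load(key_file)
    # A service account key is expected to declare its type explicitly.
    if key_data.get("type") != "service_account":
        raise ValueError(f"{key_path} does not look like a service account key")
    return key_data["client_email"]
```

Calling read_service_account_email('path_to_your_google_secrets.json') should give you the address to add as a user in your GA4 property.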
Python example script
Now you should be able to run the Python script below. All you need to change in this example is the path to your service account secret JSON and your property_id from step 4 (Property Settings):
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import DateRange
from google.analytics.data_v1beta.types import Dimension
from google.analytics.data_v1beta.types import Metric
from google.analytics.data_v1beta.types import RunReportRequest
import pandas as pd
import os

# The client library reads the service account key from this environment variable.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path_to_your_google_secrets.json'


def ga4_response_to_df(response):
    """Convert a report response into a DataFrame, one row per report row."""
    dim_names = [header.name for header in response.dimension_headers]
    metric_names = [header.name for header in response.metric_headers]
    all_data = []
    for row in response.rows:
        row_data = {}
        # Values come back in the same order as their headers.
        for name, dim_value in zip(dim_names, row.dimension_values):
            row_data[name] = dim_value.value
        for name, metric_value in zip(metric_names, row.metric_values):
            row_data[name] = metric_value.value
        all_data.append(row_data)
    return pd.DataFrame(all_data)


def get_ga4_report_df(property_id, dimensions, metrics, start_date, end_date):
    """Run a GA4 report and return the result as a DataFrame."""
    dimensions_ga4 = [Dimension(name=dimension) for dimension in dimensions]
    metrics_ga4 = [Metric(name=metric) for metric in metrics]
    client = BetaAnalyticsDataClient()
    # In the v1beta API the property is addressed as "properties/<id>".
    request = RunReportRequest(property=f'properties/{property_id}',
                               dimensions=dimensions_ga4,
                               metrics=metrics_ga4,
                               date_ranges=[DateRange(start_date=start_date,
                                                      end_date=end_date)])
    response = client.run_report(request)
    return ga4_response_to_df(response)


property_id = 'your_property_id'
dimensions = ['source', 'medium']
metrics = ['sessions']
start_date = '2021-01-01'
end_date = 'today'
df = get_ga4_report_df(property_id, dimensions, metrics, start_date, end_date)
print(df)
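One thing to keep in mind: the Data API returns every value as a string, so the sessions column in the DataFrame holds text rather than numbers. Here is a small sketch of one way to convert it. The helper name convert_metrics_to_numeric is hypothetical, and the sample rows are made up for illustration, not real GA4 output.

```python
import pandas as pd


def convert_metrics_to_numeric(df, metrics):
    """Return a copy of df with the listed metric columns cast to numbers."""
    converted = df.copy()
    for metric in metrics:
        # errors="coerce" turns anything unparseable into NaN instead of raising.
        converted[metric] = pd.to_numeric(converted[metric], errors="coerce")
    return converted


# Hand-written sample shaped like the script's output (not real GA4 data).
sample = pd.DataFrame({
    "source": ["google", "(direct)"],
    "medium": ["organic", "(none)"],
    "sessions": ["120", "45"],
})
numeric = convert_metrics_to_numeric(sample, ["sessions"])
print(numeric["sessions"].sum())  # 165
```

With the script above, you could call convert_metrics_to_numeric(df, metrics) right after get_ga4_report_df so that sums and averages behave as expected.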
The base for this script was taken from Google’s documentation:
https://developers.google.com/analytics/devguides/reporting/data/v1/quickstart-client-libraries
I just added a function that transforms the API response into a Pandas DataFrame, plus a few other small changes to make it more convenient. I hope it helps you get your data out of GA4 more easily.