Data Collection Methods – How GA4 Does It and What’s Missing in Your Configuration
Internet users access the web through various devices and browsers, each of which is assigned identifiers through cookies and similar technologies. These unique IDs make it possible to recognize a specific device, browser, or application. Google Analytics 4 (GA4) relies primarily on these identifiers to distinguish new users from returning ones.
With this method of data collection, a single user who interacts with your website on three different devices (e.g., a computer, smartphone, and tablet) will be counted by GA4 as three separate users. However, Google provides a way to reduce such inaccuracies by integrating two additional data sources into its methodology: User-ID, an identifier you assign to your own logged-in users and send to GA4, and Google Signals, described below. Both require configuration on your end.
Google Signals – A feature you can enable in the GA4 admin panel. It links visits to your site with Google's data on signed-in users (only those who have previously consented to personalized ads). In short, when a user signed in to Google visits your website on two different devices, they can be identified as a single user.
By configuring these options, you gain:
- More accurate user counts in Google Analytics and a fuller view of the customer journey with your brand.
- Expanded reach for remarketing campaigns, meaning you can display ads to users across multiple devices.
- Additional demographic data and insights into user interests.
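The counting problem described above can be sketched in a few lines of Python. This is only an illustration, not GA4's actual implementation: the hit data, the identifier values, and both function names are hypothetical. It shows how the same visits yield different user counts depending on whether a cross-device identifier (for example a login ID) is available.

```python
from typing import Optional

# Hypothetical hits as (device_id, user_id) pairs. user_id is None when
# the visitor could not be recognized across devices.
hits: list[tuple[str, Optional[str]]] = [
    ("laptop-cookie-1", "customer-42"),
    ("phone-cookie-7", "customer-42"),
    ("tablet-cookie-3", "customer-42"),
    ("phone-cookie-9", None),
]


def count_users_by_device(hits: list[tuple[str, Optional[str]]]) -> int:
    """Count users the device-only way: one user per distinct device ID."""
    return len({device_id for device_id, _ in hits})


def count_users_cross_device(hits: list[tuple[str, Optional[str]]]) -> int:
    """Prefer the cross-device ID and fall back to the device ID."""
    keys = set()
    for device_id, user_id in hits:
        if user_id is not None:
            keys.add(("user", user_id))
        else:
            keys.add(("device", device_id))
    return len(keys)


print(count_users_by_device(hits))     # 4
print(count_users_cross_device(hits))  # 2
```

In this made-up sample, one person on three devices inflates the device-only count from 2 to 4.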
Data Retention – What Settings to Adjust?
The default settings in this section of the GA4 panel limit the use of collected data in Explorations to only two months. What does this mean in practice? When building custom reports in Explorations, you won't be able to analyze data year-over-year or even look back at data from three months ago. To change this, set the Data Retention period to 14 months and keep the Reset User Data on New Activity option enabled. This extended retention period will also apply to the data already collected.
Note that this change does not apply to age, gender, and interest data, which GA4 always retains for two months, nor to services that process data on a very large scale.
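To make the effect of the retention setting concrete, here is a rough sketch that approximates the oldest date still usable in Explorations. It assumes a simple "today minus N calendar months" rule; GA4's actual deletion schedule may differ, and the function names and sample dates are hypothetical.

```python
import calendar
from datetime import date


def earliest_available(today: date, retention_months: int) -> date:
    """Approximate the oldest date still covered by the retention period."""
    months = today.year * 12 + (today.month - 1) - retention_months
    year, month_index = divmod(months, 12)
    month = month_index + 1
    # Clamp the day so that e.g. 31 March minus one month stays a valid date.
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_available(day: date, today: date, retention_months: int) -> bool:
    """Check whether data from `day` would still be usable in Explorations."""
    return day >= earliest_available(today, retention_months)


today = date(2025, 6, 15)            # assumed "current" date for the example
three_months_ago = date(2025, 3, 15)
one_year_ago = date(2024, 6, 15)

for months in (2, 14):
    print(
        months,
        is_available(three_months_ago, today, months),
        is_available(one_year_ago, today, months),
    )
```

Under these assumptions, the two-month default makes both comparison dates unavailable, while the 14-month setting keeps both in range.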
Reporting Identity – Access Modeled Data!
GA4 provides three identity options for reporting (outlined below). While this may sound complex, “identity” here refers to the methodology used to present data in reports. By consciously selecting the appropriate option, you can unlock more valuable data.
- Device-Based – Only the device ID is considered. If the service records other identifiers, they are ignored.
- Observed – User-ID takes priority, followed by Google Signals, and then the user's device ID. If you haven't implemented User-ID on your site, GA4 recognizes users through Google Signals. If that is unavailable too, GA4 falls back to the device ID.
- Blended – Works like the Observed option, but in addition to User-ID, Google Signals, and the device ID, it also incorporates data modeling.
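The priority order of the three reporting identities can be sketched as a small lookup function. This is a simplified model of the rules described above, not Google's code: the `Identifiers` class, its field names (including `google_signals_id`, which GA4 does not expose as a value), and the option strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Identifiers:
    """Identifiers that may be known for a single visit (hypothetical model)."""
    user_id: Optional[str] = None
    google_signals_id: Optional[str] = None
    device_id: Optional[str] = None


def resolve_identity(
    ids: Identifiers, reporting_identity: str
) -> Optional[tuple[str, Optional[str]]]:
    """Return (source, value) for the identifier a report would rely on."""
    if reporting_identity == "device_based":
        # Every other identifier is ignored.
        return ("device_id", ids.device_id) if ids.device_id else None
    if reporting_identity in ("observed", "blended"):
        # Priority: User-ID, then Google Signals, then device ID.
        for source in ("user_id", "google_signals_id", "device_id"):
            value = getattr(ids, source)
            if value:
                return (source, value)
        # Only the blended option falls back to modeled data.
        return ("modeled", None) if reporting_identity == "blended" else None
    raise ValueError(f"unknown reporting identity: {reporting_identity}")


visit = Identifiers(user_id="customer-42", device_id="phone-cookie-7")
print(resolve_identity(visit, "device_based"))  # ('device_id', 'phone-cookie-7')
print(resolve_identity(visit, "observed"))      # ('user_id', 'customer-42')
print(resolve_identity(Identifiers(), "blended"))  # ('modeled', None)
```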
Choosing blended identity reporting is especially beneficial when you implement or plan to implement a consent management tool on your website. Consent management systems ensure privacy compliance by triggering tracking scripts only when users have consented to the use of cookies. Consequently, data gaps may occur when some users do not agree to cookie tracking. You can address this issue by using modeled data. How does it work?
Behavioral modeling relies on machine learning systems to predict the behavior of users who reject cookies by analyzing the behavior of similar users who consent to tracking. However, accessing this data requires more than just selecting the blended reporting identity in the GA4 admin panel. You must also integrate your consent management system with Google Consent Mode. This solution provides GA4 with information about users’ consent status and enables the transmission of cookie-less pings related to visits where consent was not given. These pings form the basis of the machine learning systems used to model data.
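The data flow described above can be illustrated with a deliberately simplified model. It is not the Consent Mode API and does not reproduce Google's modeling: the `Ping` class and both functions are hypothetical. It only shows the idea that consented visits carry a cookie-based identifier, non-consented visits arrive as cookieless pings, and the two groups are kept apart so the first can inform estimates about the second.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ping:
    """One hit received by analytics (hypothetical, simplified shape)."""
    event_name: str
    consent_granted: bool
    cookie_id: Optional[str]  # always None for cookieless pings


def build_ping(event_name: str, consent_granted: bool, cookie_id: str) -> Ping:
    """Attach the cookie identifier only when the user has consented."""
    return Ping(event_name, consent_granted, cookie_id if consent_granted else None)


def split_for_modeling(pings: list[Ping]) -> tuple[list[Ping], list[Ping]]:
    """Separate observed (consented) hits from cookieless pings."""
    observed = [p for p in pings if p.consent_granted]
    cookieless = [p for p in pings if not p.consent_granted]
    return observed, cookieless


pings = [
    build_ping("page_view", True, "cookie-1"),
    build_ping("page_view", False, "cookie-2"),
    build_ping("purchase", True, "cookie-1"),
]
observed, cookieless = split_for_modeling(pings)
print(len(observed), len(cookieless))  # 2 1
print(cookieless[0].cookie_id)         # None
```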
Key Takeaways and Expert Advice
Setting up GA4 may seem quick and easy, but a careless setup is easy to get wrong. Don't rely solely on the default settings, and make sure to:
- Implement and transmit User-ID,
- Enable Google Signals,
- Extend the data retention period,
- Implement or integrate a consent management system with Google Consent Mode,
- Choose the blended reporting identity.
These steps will allow you to fully utilize GA4’s potential and collect as much accurate data as possible about your audience. It’s important to note that the appearance of modeled data and information from Google Signals in reports depends not only on additional configurations outside the GA4 panel but also on meeting certain criteria.
For example, to include data from Google Signals in reports, you need an average of 500 users signed in to Google per day among your audience. For behavioral modeling to activate, your property must collect at least 1,000 events per day from users who declined analytics cookies for at least seven days, and record at least 1,000 daily users who accepted them on at least seven of the previous 28 days.
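As a rough self-check, the behavioral modeling thresholds mentioned above can be expressed as a small function. This is a simplified reading that assumes the 1,000 figure is a per-day count and that seven qualifying days are needed on each side; the exact criteria are defined by Google and may change. The function name and sample numbers are hypothetical.

```python
def modeling_eligible(
    daily_granted: list[int],
    daily_denied: list[int],
    threshold: int = 1000,
    min_days: int = 7,
) -> bool:
    """Check whether enough days meet the threshold for both consent states."""
    granted_days = sum(1 for count in daily_granted if count >= threshold)
    denied_days = sum(1 for count in daily_denied if count >= threshold)
    return granted_days >= min_days and denied_days >= min_days


# Hypothetical daily counts for one week.
print(modeling_eligible([1200] * 7, [1100] * 7))  # True
print(modeling_eligible([1200] * 7, [900] * 7))   # False
```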
If your website has lower traffic, don't be discouraged. It's still worthwhile to implement an advanced GA4 setup. You're likely thinking long-term and aiming to grow your brand, which involves continuously expanding your reach. Don't wait: configure GA4 correctly today! If you need support, we're here to help.
ABOUT THE AUTHOR
Katarzyna Góraj
Senior Digital Analyst
She took her first professional steps in social listening, then fell head over heels for the world of research. At Yetiz she handles analytics as well as running PPC campaigns. Outside of work she is hooked on mountain hiking and cannot imagine life without Freddie Mercury's vocals.