Hi,
tl;dr
I (and many others) need to archive legacy Google Analytics Universal data because it's shutting down on July 1, 2024. But doing so is a nightmare.
I've been wanting to switch to a locally hosted access_log analyzer tool anyway.
I realized that I could maybe skip the whole "exporting data from GA and then figuring out a way to visualize it" nightmare and instead use a locally hosted stats tool that would both use the current access_logs and import the legacy access_log data that I have.
Can anyone help?
Overview
I started using Google Analytics because it was easy to set up and provided a lot of information. But I never loved using it for all of the obvious reasons (Google controls it, somewhat locked-in data, opaque, etc.). It was helpful to have an off-site tool during the transition from WebFaction to Opalstack, because it continued to provide analytics without any extra work (since there aren't yet any built-in log analyzers at OS).
Starting on July 1, 2024, legacy Google Analytics Universal data (IOW: pre-GA4) will become inaccessible and then be purged.
Apparently there's no straightforward way of exporting ALL of the GA data, let alone visualizing that exported information in any meaningful way. (You can export PDFs of the Dashboard visualizations, but you can only do that one view at a time.)
There are lots of blog posts about this, and various companies are selling services to facilitate different styles of exports, imports, and visualizations of this data. I can post some of those links in a follow-up post if necessary.
Some methods and services seem better than others and most lock you into their own monthly subscription fees*. But I'm not confident in any of them without doing a lot of research into each.
And then I realized:
I think one of the factors that a lot of these services are relying on is that many (most?) people who use GA do not have access to their raw server access_logs* (error_logs would be good too).
But we (Opalstack customers) do!
By default we have the server logs for the past year and I have a script running that archives each year's data. It even stretches back to logs generated at WebFaction.
(*I realize that the server's access_logs do not capture some of the data that GA does, namely all of the frontend stuff that their JavaScript tracker can.)
So my questions are:
Does that sound correct?
- That I can essentially forgo all of this craziness of trying to download historical GA data because I/we already have it (for the most part) via the server's access_logs.
- Or am I missing something very important? (No, I don't care much about Google AdSense and how it correlates with Google Analytics.)
Does anyone here have experience with any of the various server log analysis tools?
But mostly: importing legacy apache_access.log-style files into them, in order to create a local GA replacement that would include both the historical AND current data?
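To make the "historical AND current" idea concrete, here is a rough sketch of what I imagine any of these tools does under the hood: parse each access_log line and roll it up into something like monthly hit counts. It assumes the logs are in Apache's standard "combined" format and that month names are in English; the names parse_line and monthly_hits and the sample lines are made up for illustration and are not taken from any real tool or from my actual logs.

```python
import re
from collections import Counter
from datetime import datetime

# Apache "combined" log format (an assumption; adjust if your LogFormat differs).
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Timestamp looks like 10/Oct/2019:13:55:36 -0700
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec

def monthly_hits(lines):
    """Count requests with a status below 400, grouped by YYYY-MM."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["status"] < 400:
            counts[rec["time"].strftime("%Y-%m")] += 1
    return counts

# Hypothetical sample lines, not real traffic.
sample = [
    '203.0.113.7 - - [10/Oct/2019:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '203.0.113.8 - - [11/Oct/2019:08:01:02 -0700] "GET /about HTTP/1.1" 200 512 "https://example.com/" "Mozilla/5.0"',
    '203.0.113.9 - - [02/Nov/2019:09:00:00 -0700] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]
print(monthly_hits(sample))
```

Since the archived and current logs share the same format, the same function could be fed old and new files alike. Bot filtering, unique visitors, and sessions are deliberately left out here; that is the part I'd hope a proper tool handles.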
Some of the tools:
Thanks!!
These OS Community posts were helpful in my research: