DeGoogling my Life Part II: Exploring Mountains of Extracted Data and Revealing How Much Information I Handed Over

After deciding to dump Google (or as much as I could, at least) I figured I’d go through the export of my main personal account that I have had for many years.

I’m by no means an online privacy connoisseur. As stated in the first part of my DeGoogling journal, I opted for Outlook as an alternative to Gmail, and I am still predominantly a Windows user with multiple Android devices in my office. Whilst I maintain the view that Google has many great features that are hard to match, I know that they’re not the only company out there using our information for their gain.

Here’s a view of the archive structure:

It’s no surprise that Google had loads of information stored on their servers about my life and day-to-day activities. After all, I uploaded the entries and enabled them to do so by not taking better care of which activity settings I had enabled on the account. It’s still quite overwhelming to go through roughly 15 years of data and see what types of information have been kept all this time, even though most of it has since been removed, whether through expiry, data loss or otherwise. As far as I could research, there is no automatic deletion process unless you opt in to one (which I describe further in the post).

Every voice input into Google Assistant since 2017, both intentional and accidental, is set out beautifully across multiple folders. Some of these recordings begin BEFORE I am heard saying “Hey Google!”

[Screenshot: the “My Activity > Voice and Audio” folder of the export dated 26 Feb 2020, listing .mp3 recordings named by UTC timestamp, from 2018-10-24 through 2019-06-29. Properties dialogs for the “Assistant” and “Voice and Audio” folders report 344 files (7.16 MB) and 1,463 files (23.6 MB).]
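If you want to see when your own recordings were made without listening to each one, the timestamps in the filenames are enough. The following is only a rough Python sketch: the filename pattern (YYYY-MM-DD_HH_MM_SS_mmm_UTC.mp3) is my reading of the listing in the screenshot, and the function names are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed filename pattern, inferred from the folder listing:
# YYYY-MM-DD_HH_MM_SS_mmm_UTC.mp3
PATTERN = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})_(\d{2})_(\d{2})_(\d{2})_(\d{3})_UTC\.mp3$"
)


def parse_recording_time(filename):
    """Return the UTC datetime encoded in a recording's filename, or None."""
    match = PATTERN.search(filename)
    if match is None:
        return None
    year, month, day, hour, minute, second, millis = map(int, match.groups())
    try:
        return datetime(year, month, day, hour, minute, second,
                        millis * 1000, tzinfo=timezone.utc)
    except ValueError:
        # Digits matched the pattern but do not form a valid date or time.
        return None


def recordings_per_month(filenames):
    """Count recordings per (year, month), skipping names that do not match."""
    counts = Counter()
    for name in filenames:
        stamp = parse_recording_time(name)
        if stamp is not None:
            counts[(stamp.year, stamp.month)] += 1
    return counts
```

Feeding it the output of os.listdir() for the “Voice and Audio” folder gives a quick per-month tally of how often the microphone was recording.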

One only needs to look through WikiLeaks’ Vault7 and Edward Snowden’s NSA leaks to realise the extent of the issue. But don’t listen to silly old me banging on about intelligence alliances like Five Eyes, for political leaders across the world assure me that, as then Foreign Minister Julie Bishop put it, it saves lives!

Some of the data is simply anonymised rather than deleted. As per Google’s privacy policy under “How Google retains data…”:

In some cases, rather than provide a way to delete data, we store it for a predetermined period of time. For each type of data, we set retention time frames based on the reason for its collection. For example, to ensure that our services display properly on many different types of devices, we may retain browser width and height for up to nine months. We also take steps to make certain data anonymous within set time periods. For example, we make advertising data in server logs anonymous by removing part of the IP address after nine months, and cookie information after 18 months.

And:

For example, after you delete a specific Google search from My Activity, we might keep information about how often you search for things, but not what you searched for.

To be fair, you can disable the activity tracking, or set the activity to be deleted automatically after 3 months, but the defaults are something people often overlook. Google doesn’t make it any easier with its consistent warnings that by deleting or disabling the activity history you’ll essentially break every service you use. According to Google’s help pages on the subject, your data is removed from your view but may be retained for as long as they see fit.

Google has a cleverly worded video that makes all this seem a little less creepy – though liking/disliking the video or commenting is disabled.

How your activity is deleted

When you use Google sites, apps, and services, some of your activity is saved in your Google Account. Most of this data is kept until you delete it, like when you manually delete or set time periods to automatically delete your data in My Activity. Some data may expire sooner.

When you delete data, we follow a policy to safely and completely remove it from your account. First, deleted activity is immediately removed from view and no longer used to personalize your Google experience. Then, we begin a process designed to safely and completely delete the data from our storage systems.

Even when activity is deleted, some data about your use of Google services may be kept for the life of your Google Account. For example, after you delete a search from My Activity, your account will store the fact that you searched for something, but not what you searched for.

Sometimes we retain certain information for an extended period of time to meet specific business needs or legal requirements. When you delete your Google Account, much of this information is also removed.

Learn more about the data we retain and why.

Google News, Shopping and Applications:

Every article I have ever read via Google News and, more concerning still, every article I didn’t want to read is recorded. If I didn’t know the user of the account, I’d be able to handcraft a near-perfect description of his or her political associations and beliefs, taste in music (even though I didn’t use Google Play Music), preferred applications and even the applications he or she had searched for or viewed in the store. I had also searched for medication prices using Google Search, which would help paint a picture of the medication I may be on or have been on previously and the conditions I may have or have had. Suddenly the idea of the masses consulting Dr Google becomes another worrying prospect.

[Screenshot of My Activity entries for play.google.com, under “Yesterday”:]
Google Play Store: Visited Microsoft News (9:13 PM)
Google Play Store: Visited Microsoft News (9:12 PM)
Google Play Store: Visited Google Play Store (9:12 PM)
Not looking for an alternative now are we? Tsk. Tsk.

Further inspection into my infrequent use of the Google News aggregator shows they at least managed to discover what games, software and music I liked to know about:

[Screenshot of My Activity entries from Google News:]
Xbox Series X (12:43 PM)
Read “PS5 leak hints at exciting new features the Xbox Series X won't get”
Read “Buy Cyberpunk 2077 on Xbox One, get Xbox Series X upgrade free”
Australian Broadcasting Corporation (12:42 PM)
Read “Children among dozens injured after car drives into German carnival parade”
Ah yes, so clearly I am interested in knowing what Microsoft are up to these days and if their news application is any good. Add it to the record please, Google!

I never really used most of Google’s features, and yet the profile built around my usage of the few I did has created some interesting connections within the Google Takeout archive. Kudos to Google for allowing me to see it all, even though it’s my basic right as an account holder, but it’s what they don’t reveal (if anything) that has me wondering now. Google’s “My Activity” dashboard likes to remind the user that “Only you can see this data”.

I have never used Google Fit, yet there is plenty of data stored suggesting that an application I used must have been linked to it from May to September of 2019. I use a Fitbit watch (which will be dismissed in favour of something a bit more private, which seems like a bit of an oxymoron). As far as I can tell, the Google Fit data isn’t consistent with the information that Fitbit collected during the same period, so I am unable to connect it to Google’s buyout of Fitbit, Inc., which was announced several months after said data collection had stopped. Mysteriously though, I have never used any fitness tracking application or device other than the official Fitbit application and the Fitbit Versa, respectively. The .tcx files (which are essentially XML files) contain suggestions that Garmin is related to the data. Some basic research suggests Pokémon Go and its “Adventure Sync” could be the culprit, but I ceased playing that several years ago. Your guess is as good as mine, but it demonstrates the reach of Google, their affiliates and any applications you use that could be sharing data with Google.
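For anyone wanting to chase the same mystery in their own export, here is a minimal Python sketch that pulls the origin hints out of a .tcx file. TCX is Garmin’s Training Center XML format, so its XML namespace mentions garmin.com no matter which application wrote the file, which may be all that the Garmin references amount to. The function names are hypothetical, and I am assuming the Creator and Author elements (each with a Name child) are where a file would name its source.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def tcx_origin_hints(tcx_text):
    """Return (sorted namespaces, [(element, name), ...]) for a TCX document.

    Assumption: the recording device or application is named in a <Name>
    child of a <Creator> or <Author> element.
    """
    root = ET.fromstring(tcx_text)
    namespaces = set()
    names = []
    for element in root.iter():
        if element.tag.startswith("{"):
            namespaces.add(element.tag[1:].split("}", 1)[0])
        if local_name(element.tag) in ("Creator", "Author"):
            for child in element:
                if local_name(child.tag) == "Name" and child.text:
                    names.append((local_name(element.tag), child.text.strip()))
    return sorted(namespaces), names
```

If the only Garmin mention turns out to be the namespace, the file says nothing about which application actually recorded the activity.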

Google Play Console:

My Google Play developer console folder has several applications and versions that I removed when I ceased using my developer account. Whilst they haven’t stored (or returned) the application files and source code, I see a history of application updates and application build attempts. Change logs and other submitted information are retained from 2015, the date that I removed all the applications from my account and from download and relocated them to my Google business account.

Visiting the Developer Console reveals that there is absolutely nothing available to touch and that I am only able to upload a new application. There is nothing to delete.

I contacted Google and was informed I would have to contact their privacy team to request removal of the data that had appeared in my Takeout. Since doing this, it seems I am entirely unable to even visit my developer console. I am now redirected to a page that suggests I was banned from the service:

[Screenshot: the page I am now redirected to when visiting the developer console]

It doesn’t stop there. I did another Google Takeout, and the data is still there. I contacted Google’s privacy team again and received a canned response a few hours later simply stating that the data cannot be removed and that it is secure and anonymised as per their policies. From that, I assume that deleting my account completely will not remove the information stored. I suspect that intentionally getting myself banned from Google by breaking their rules will not help either. Luckily, I have nothing to hide and all the information is useless to me and to everyone outside of my life, although things such as my email address and phone number are present within the files. But that doesn’t mean it is okay!

Some of the data Google have provided was not actually decrypted upon extraction. For example, my various preferences and settings .json files are filled with unusable information:

"Encrypted" : "Your data is encrypted and cannot be exported."
"Encrypted" : "Your data is encrypted and cannot be exported."
"Encrypted" : "Your data is encrypted and cannot be exported."
"Encrypted" : "Your data is encrypted and cannot be exported."
"Encrypted" : "Your data is encrypted and cannot be exported."
"Encrypted" : "Your data is encrypted and cannot be exported."
"Encrypted" : "Your data is encrypted and cannot be exported."
"Encrypted" : "Your data is encrypted and cannot be exported."
"Encrypted" : "Your data is encrypted and cannot be exported."
"Encrypted" : "Your data is encrypted and cannot be exported."
"Encrypted" : "Your data is encrypted and cannot be exported."

I contacted Google and authorised them to provide me with a decrypted copy of that data. They can’t help with that, apparently.
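To get a feel for how much of a given settings file is just that placeholder, something like the following Python sketch works. It assumes the file parses as ordinary JSON and that the placeholder wording matches what I saw in my own export; the function name is invented for this example.

```python
import json

# Wording as it appears in my export; the trailing full stop is optional.
PLACEHOLDER = "Your data is encrypted and cannot be exported"


def count_placeholders(node):
    """Return (placeholder_count, total_string_values) for parsed JSON."""
    if isinstance(node, str):
        is_placeholder = node.strip().rstrip(".") == PLACEHOLDER
        return (1 if is_placeholder else 0, 1)
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        # Numbers, booleans and null carry no text to inspect.
        return (0, 0)
    hidden = total = 0
    for child in children:
        child_hidden, child_total = count_placeholders(child)
        hidden += child_hidden
        total += child_total
    return (hidden, total)
```

Calling count_placeholders(json.load(handle)) on an open settings file returns how many of its text values were withheld out of how many it contains.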

Furthermore, I noticed that some contact information was available in the “team_members.json” – this included a couple of people I had been developing applications with who had access to the account. It seems their data will remain too.

DEVICES:

Despite my having removed the retired devices from the multitude of Google services over the years, it appears that some, but not all, devices are contained within the archive. From an old smart TV I logged into last year and subsequently removed from my account after realising I didn’t need the features Android would offer me, to a Samsung SII I had back in 2011, it’s all still here, complete with IMEI numbers, Android IDs, device serial numbers and even the IP address from the date of its last connection whilst associated with my account.

Information including first data connection times, SIM operators and applications installed is all featured within an extensive HTML document. If you visit Google’s “Your Devices” dashboard, you’ll see every device you currently have connected to your account. From there, you can locate a device (which is a great feature) and remove any old devices you are signed in to. You can also see the devices you have signed in or registered to your account on the Google Play settings page (though you cannot remove them).

But what about the devices you have signed out from or formatted for resale or recycling? Well, on that same page you will find any device you have signed out of… in the last 28 days.

There is absolutely no mention of any previously linked device for which Google still holds information, such as those I found in my data export.

I also found that I had a device that was replaced under warranty a few days after I activated it. The device isn’t shown anywhere in my account, but the long page of information Google has on it remains.

As of writing, it appears that there is no method to remove those old unused devices from your account. If you’re concerned about your security, change your password. If you’re concerned about the fact that Google has records of these devices on their servers, complete with information such as the applications you installed, networks used and so on, then you can wait for the dormant device to be removed from your account automatically… but, as in my case, this doesn’t seem to apply to all of them. Once again, it seems that Google will let you hide something from view but ultimately continue storing any data collected. Google’s own support page seemingly glosses over that little fact.

Navigating to the “My Activity > Android” folder gave me an insight into how much application usage is stored. From opening the file manager and deleting some junk downloads, to opening an image application and viewing some viral videos; it’s all here and clear.

ADVERTISEMENTS:

This was always going to be an unsurprising discovery but as somebody who doesn’t exactly click on every advertisement they see, it was amazing to see the thousands of entries over the past decade that told me what I clicked, the URL it led to and the date and time I clicked on it. Even the advertisements I wouldn’t have visited intentionally – such as those that were thrown at me by an Android app – are included.

LOCATION HISTORY:

My location history contained at least 300 different locations and routes that I had used through the Google Maps application on my phone and any searches I had done on my desktop.

There is a pile of files in the “Semantic Location History” that give me insight into what was collected. Opening these .json files up in Visual Studio (code editor – you can use Notepad++ or the free Visual Studio Code if you want to dig through your own .json extractions) shows me the calculations (or “probability”) of my movements:

Also, take note of the “CATCHING_POKEMON” estimation!

[Screenshot of a Semantic Location History .json file. Digits that are illegible in the screenshot are shown as “…”, as is one activity type.]

"endTimestampMs" : "1491875661…",
"distance" : 95136,
"activityType" : "IN_PASSENGER_VEHICLE",
"confidence" : "HIGH",
"activities" : [
  { "activityType" : "IN_PASSENGER_VEHICLE", "probability" : 97.9662884926075 },
  { "activityType" : "MOTORCYCLING", "probability" : 1.2… },
  { "activityType" : "WALKING", "probability" : 0.3132954679592573 },
  { "activityType" : "IN_BUS", "probability" : 0.3078… },
  { "activityType" : "FLYING", "probability" : 0.04… },
  { "activityType" : "IN_FERRY", "probability" : 0.035434712858… },
  { "activityType" : "RUNNING", "probability" : 0.022… },
  { "activityType" : "BOATING", "probability" : 0.01… },
  { "activityType" : "CYCLING", "probability" : 0.0132803… },
  { "activityType" : "IN_WHEELCHAIR", "probability" : 0.006647288545427814 },
  { "activityType" : "HORSEBACK_RIDING", "probability" : 0.0062178617816081375 },
  { "activityType" : "IN_TRAIN", "probability" : 0.00516034653205296 },
  { "activityType" : "IN_GONDOLA_LIFT", "probability" : 0.0032654619353888878 },
  { "activityType" : "SNOWMOBILE", "probability" : 0.002619711820728552 },
  { "activityType" : …, "probability" : 0.0017… },
  { "activityType" : "SWIMMING", "probability" : 0.00170… },
  { "activityType" : "SAILING", "probability" : 9.875131292191347E-4 },
  { "activityType" : "KAYAKING", "probability" : 9.58474281871501E-4 },
  { "activityType" : "ROWING", "probability" : 7.234494378827893E-4 },
  { "activityType" : "IN_CABLECAR", "probability" : 4.8158776187366414E-4 },
  { "activityType" : "IN_FUNICULAR", "probability" : 4.1456343439149197E-4 },
  { "activityType" : "WALKING_NORDIC", "probability" : 3.9048354997378E-4 },
  { "activityType" : "SNOWSHOEING", "probability" : 3.778723779745218E-4 },
  { "activityType" : "SNOWBOARDING", "probability" : 3.5… },
  { "activityType" : "SKATEBOARDING", "probability" : 2.8481858728559216E-4 },
  { "activityType" : "IN_TRAM", "probability" : 2.14674722881397E-4 },
  { "activityType" : "SKATING", "probability" : 1.5678673917836E-4 },
  { "activityType" : "KITESURFING", "probability" : 1.26441577133064E-4 },
  { "activityType" : "IN_SUBWAY", "probability" : 1.1258999264873218E-4 },
  { "activityType" : "CATCHING_POKEMON", "probability" : 3.9748912619895684E-42 }

The specifics of the vehicles I was potentially driving or a passenger in are even noted. For example: IN_ROAD_VEHICLE and IN_FOUR_WHEELER_VEHICLE.
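Scrolling through these guesses by hand gets old quickly, so here is a small Python sketch that lists the most probable activities for each segment. The “activities”, “activityType” and “probability” keys are the ones visible in my files; the surrounding “timelineObjects” and “activitySegment” layout is my assumption about how the file is organised, and both function names are made up for illustration.

```python
import json


def iter_activity_segments(document):
    """Yield each activity segment from a parsed Semantic Location History file.

    Assumed layout: {"timelineObjects": [{"activitySegment": {...}}, ...]}.
    """
    for entry in document.get("timelineObjects", []):
        segment = entry.get("activitySegment")
        if segment is not None:
            yield segment


def top_activities(segment, limit=3):
    """Return the `limit` most probable (activityType, probability) pairs."""
    candidates = segment.get("activities", [])
    ranked = sorted(candidates,
                    key=lambda item: item.get("probability", 0.0),
                    reverse=True)
    return [(item.get("activityType"), item.get("probability"))
            for item in ranked[:limit]]
```

Loading a monthly file with json.load() and looping over iter_activity_segments() then shows, segment by segment, what Google thought you were most likely doing.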

One of these location files was a whopping 281MB. If you’re doing this yourself and have huge files of mass text too then you’ll want to be using 64-bit software with a decent enough computer. Even then, expect some crashing!

That’s a lot of text. With word-wrapping enabled in Visual Studio, that’s 12,517,808 lines of text. To spell that out in words, that’s twelve million, five hundred and seventeen thousand, eight hundred and eight lines. All up, that’s about 17 million words, and I’d estimate that about 12-14 million of those are personal location data once I take all the random accuracy and prediction lines into consideration.
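If your editor chokes on a file this size, you can get the same kind of figures without opening it at all. This is a minimal Python sketch (the function name is my own) that reads the file one line at a time, so memory use stays small. Note that it counts actual line breaks, so the result will differ from an editor’s line count when word-wrapping is switched on.

```python
def count_lines_and_words(path, encoding="utf-8"):
    """Count newline-delimited lines and whitespace-separated words in a file."""
    lines = words = 0
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        for line in handle:  # one line at a time, never the whole file
            lines += 1
            words += len(line.split())
    return lines, words
```

This relies on the export being pretty-printed across many lines, as mine was; a file stored as one enormous line would still be read in one go.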

Again, in total acknowledgement, I could have stopped Google from keeping all my location history by taking a proper look at what data collection I had set.

Google Photos:

I never used Google’s services to upload personal photos, although I was, for a time, a big contributor of professional photography to Panoramio, even prior to Google’s purchase and later discontinuation of the product. Instead, it’s things like profile pictures, YouTube thumbnail uploads and similar that make up a folder filled with metadata, even if you deleted the images. Thankfully, photos dating as far back as 2004 that have long since been deleted are not present, but their metadata is. Within each of the 750 directories is the information for images I have emailed or uploaded to my account. Even figures for images within email signatures are included.

I don’t tend to upload photographs with geolocation data embedded, but for those that were uploaded and later deleted, the .json files reveal the coordinates that were attached. As of writing, Google Photos gives users unlimited storage for high quality photos and videos at no [financial] cost. I’ll let you conclude why.
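To find out which of your own metadata files still carry coordinates, a short script can do the digging. The Python sketch below is hedged accordingly: it assumes each .json file holds a “geoData” object with “latitude” and “longitude” numbers, which is what I saw in my export, and that a pair of zeros means no location was recorded. The function name is hypothetical.

```python
import json
from pathlib import Path


def find_geotagged(metadata_dir):
    """Yield (file name, latitude, longitude) for metadata files with coordinates."""
    for path in sorted(Path(metadata_dir).rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or not valid JSON; skip it
        if not isinstance(data, dict):
            continue
        geo = data.get("geoData")  # assumed key name
        if not isinstance(geo, dict):
            continue
        lat, lon = geo.get("latitude"), geo.get("longitude")
        has_numbers = isinstance(lat, (int, float)) and isinstance(lon, (int, float))
        # Assumption: (0, 0) is a placeholder for "no location".
        if has_numbers and (lat, lon) != (0, 0):
            yield path.name, lat, lon
```

Pointing it at the Google Photos folder of an extracted archive lists every file whose metadata still places you somewhere on the map.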

With all these files and their information available to me in my extracted archive, I will show you what is contained in my online Google Photos account:

[Screenshot: my online Google Photos account]

That’s right. Absolutely nothing. Or well so I thought and so it would seem to most…

Investigation into the “My Activity” folder showed me the likely overlooked fact that closing a suggestion, swiping away an article on your connected device, accessing a website or application, or viewing content anywhere via one of Google’s many services creates a log that tells Google what you are and aren’t interested in seeing. From a usefulness point of view, I entirely understand that. But when that information can be used to suggest, for example, that you don’t care about what Meghan Markle has to say about the Royal Family, it paints a picture that is incredibly valuable to Google and their associates. It surprises me that there wasn’t a log of every event in my calendar that I had cancelled or postponed. Combine that with an entry in my location history and you could almost frame me for something in which I was not involved. So that leads me to my next paragraph:

This is pure speculation, but should that information be accessed by a hacker, law enforcement agency or other interested party, will they be able to label you a racist for not caring about what a politician had to say about neo-Nazism or discrimination based on one’s race? Will you be labelled heartless, whether by person or machine, for swiping away that advertisement about the dying children in Africa needing your help? And will you find yourself facing the consequences of watching legal pornography or of spending a few minutes on a page where the author has expressed their opinion on something considered ‘dangerous’ to society?

Google’s Privacy Principles seem great. According to their Safety Centre, they vow to respect their users and their privacy, never sell personal information (the claim is that their free services are funded by relevant advertising) and empower people to remove their data should they wish. If “leading by example” is their wish, then we are in trouble. I am still in the early stages and I doubt I’ll ever be truly rid of Google, short of becoming a hermit who hides in the wilderness, but I have taken the first steps and it’s quite an interesting and redeeming experience. Dare I say it’s my first digital awakening in a journey of self-exploration?

As a business owner, it will be a little more involved to transition to other monetisation networks and discover the best of the many alternative services available, but I know it will benefit both myself and any customers or visitors. I had previously moved away from Google’s Analytics system (which I used to understand where my visitors were coming from and how often), and I removed many of my published works from Google’s stores due to concerns over customer information and misinformation. I am hoping to continue down that path.

I encourage you to take the first leap by removing yourself from one of your most-used Google services. I’m not suggesting you completely rip Google out of your life right now but consider the reality that giving Google less of your information will give you more control over your private life.

I think the first baby step is to download an archive of your information and see for yourself what the company is storing:

If you want to go through the data that Google has so considerately maintained in a massive database, then log in and head to https://takeout.google.com/

I recommend setting your export to split files, as I’ve found that the downloads like to time out and there’s a limit to how many retries you get before having to request another export. A lot of the files may be empty or contain no actual information. For example, if you have never used Google Chrome, you’re still likely to be given a bunch of files that don’t have anything to say, or you’ll get a lovely set of documents that feature multiple lines saying “Your data is encrypted and cannot be exported”.
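Once the archive is extracted, a quick overview helps you decide where to start reading. The following Python sketch (the function name and return shape are my own choices, not anything Google provides) walks the extracted folder, totals the file count and size for each top-level folder, and lists the files that are completely empty.

```python
import os
from collections import defaultdict


def summarise_export(root):
    """Return ({top-level folder: [file count, bytes]}, [empty file paths])."""
    totals = defaultdict(lambda: [0, 0])
    empty = []
    for folder, _subdirs, files in os.walk(root):
        relative = os.path.relpath(folder, root)
        # Files sitting directly in the root are grouped under ".".
        top = "." if relative == "." else relative.split(os.sep)[0]
        for name in files:
            path = os.path.join(folder, name)
            size = os.path.getsize(path)
            totals[top][0] += 1
            totals[top][1] += size
            if size == 0:
                empty.append(os.path.relpath(path, root))
    return dict(totals), sorted(empty)
```

Sorting the totals by size is a reasonable way to spot which service has been hoarding the most about you.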

Thanks for reading. I hope my experience offers an insight into the company and why I decided to DeGoogle.