Aren’t mobile phones using mac randomization these days? When they send out probe requests, surely they are not broadcasting their real mac address, that globally unique identifier, or are they?
I have been asked that a few times recently. Below is a summary of how things look from my perspective.
I live in the suburbs of Bangkok and I have 7 days’ worth of mac addresses in my database. The mac addresses were collected with the help of my laptop’s wi-fi adapter in monitor mode. No special antenna or other dedicated hardware, just some freely available software.
Our house is not too far from a street with light car traffic. Most of the mac addresses are probably from devices in passing cars.
So, after one week, this is what has been collected:
A little more than half of the mac addresses collected are randomized mac addresses.
I group them by looking at the second-least-significant bit of the first byte of the mac address (the second last bit when the byte is written out in binary). If it’s a ‘0’ I count the mac address as a hardware mac; if it is a ‘1’ I count it as a randomized mac address, which sounds better than ‘locally administered mac address’, as they are referred to in IEEE documents.
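The bit test described above can be sketched in a few lines of Python. This is only an illustration, not the software I used for collecting. The function name `is_randomized` and the two sample addresses are made up for the example.

```python
def is_randomized(mac: str) -> bool:
    """Return True if the locally administered bit of the mac address is set.

    The bit in question is the second-least-significant bit (mask 0x02)
    of the first byte. Accepts 'aa:bb:..', 'aa-bb-..' or 'aabb..' notation.
    """
    hex_digits = mac.replace(":", "").replace("-", "")
    first_byte = int(hex_digits[:2], 16)
    return bool(first_byte & 0x02)


# Hypothetical sample addresses, for illustration only.
print(is_randomized("da:a1:19:00:00:01"))  # True:  0xda = 11011010
print(is_randomized("00:11:22:33:44:55"))  # False: 0x00 = 00000000
```

Note that this only tells locally administered from globally administered addresses. Counting every locally administered address as ‘randomized’ is my own simplification.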
Recent reports have shown that the randomization schemes applied by Android and Apple phones have significant weaknesses, but let’s assume these weaknesses will eventually be fixed and in a purely passive traffic monitoring exercise one can only collect properly randomized mac addresses.
What can still be learned from wifi packets with a randomized mac address?
The below is an example from my data set:
Well, for one, we know that a device was observed. We also know how long this device was communicating with the same randomized mac address, and that the total traffic observed was 71,817 bytes.
Most Google devices use ‘daa119’ as the first half of randomized mac addresses. Google actually bought that address space for that purpose. Since the first half of the mac address above is different, we also know that it is most likely not one of the great many Android devices that use this prefix.
In my data set of 5,896 randomized mac addresses, 823 (about 14%) start with ‘daa119’.
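Counting how many addresses share a given first half is simple enough. Below is a minimal sketch, assuming the addresses are available as plain strings. The helper names and the sample list are hypothetical; only the ‘daa119’ prefix and the two totals come from my data.

```python
from collections import Counter

GOOGLE_PREFIX = "daa119"  # first half of the mac address, as discussed above


def normalize(mac: str) -> str:
    """Lower-case the address and strip ':' and '-' separators."""
    return mac.lower().replace(":", "").replace("-", "")


def prefix_share(macs, prefix=GOOGLE_PREFIX):
    """Return (matching, total, fraction) for addresses starting with prefix."""
    prefixes = Counter(normalize(m)[:6] for m in macs)
    total = sum(prefixes.values())
    matching = prefixes[prefix]
    return matching, total, (matching / total if total else 0.0)


# Made-up sample, not taken from the real data set.
sample = [
    "da:a1:19:12:34:56",
    "DA-A1-19-AB-CD-EF",
    "02:11:22:33:44:55",
    "5a:00:00:00:00:01",
]
matching, total, share = prefix_share(sample)
print(f"{matching} of {total} ({share:.1%})")  # 2 of 4 (50.0%)

# The same arithmetic on the totals quoted in the text.
print(f"{823 / 5896:.1%}")  # 14.0%
```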
The main thing one cannot know with perfectly implemented randomization is whether the same device stays around for longer or comes back after a while. Without randomization, this looks like the below:
I don’t know what device this is, but I have a lot more information to go on. I see presence patterns. I also see that the device is an Apple device. If this specific device appears again sometime in the future, I will recognize it as the same device.
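To make ‘presence patterns’ a bit more concrete, here is a rough sketch of how sightings of a stable hardware mac could be grouped into visits. It is an illustration under assumptions, not the way my database does it. The `(mac, unix_time)` input format, the function name `visits` and the 300-second gap are all invented for the example.

```python
from collections import defaultdict


def visits(sightings, max_gap=300):
    """Group (mac, unix_time) sightings into visits per mac address.

    A new visit starts whenever a device has been silent for more than
    max_gap seconds (an arbitrary threshold chosen for this sketch).
    Returns {mac: [(first_seen, last_seen), ...]}.
    """
    by_mac = defaultdict(list)
    for mac, ts in sightings:
        by_mac[mac.lower()].append(ts)

    result = {}
    for mac, times in by_mac.items():
        times.sort()
        spans = [[times[0], times[0]]]
        for ts in times[1:]:
            if ts - spans[-1][1] > max_gap:
                spans.append([ts, ts])  # long silence: device came back
            else:
                spans[-1][1] = ts       # still the same visit
        result[mac] = [tuple(span) for span in spans]
    return result


# Made-up sightings: one device seen twice, another seen once.
sightings = [
    ("00:11:22:33:44:55", 0),
    ("00:11:22:33:44:55", 100),
    ("00:11:22:33:44:55", 1000),
    ("00:11:22:33:44:55", 1100),
    ("3c:00:00:00:00:01", 50),
]
for mac, spans in visits(sightings).items():
    print(mac, spans)
```

With a properly randomized mac address, the second visit would show up under a different address, and the two visits could not be tied together this way.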
With more effort it is currently possible to link multiple randomized mac addresses generated by the same device. According to the report referenced above, it is even possible to extract the hardware mac in many cases, even with strictly passive monitoring.
If one were willing to go active and actually send out reply packets to probe requests from devices with randomized mac addresses, the devices would readily reply with their real hardware mac, the globally unique identifier, and the hiding game is over.
I have not done that, but it could be done easily. So easily that it is not even a challenge for a hobbyist like me.
The conclusion for me is that yes, mac address randomization, if done correctly, reduces the information content of packets anybody can pull out of the air, but it cannot be relied upon, because it is still too easy to get the real hardware mac from any device.
This does not make it useless. I look at it like a bicycle lock. With the right tools any bicycle lock can be broken, but still, it feels safer to have one.