Even if you have encrypted your traffic with a VPN (or the Tor Network), advanced traffic analysis is a growing threat against your privacy. Therefore, we now introduce DAITA.
Through constant packet sizes, random background traffic and data pattern distortion we are taking the first step in our battle against sophisticated traffic analysis.
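To make "constant packet sizes" and "random background traffic" concrete, here is a minimal sketch of the two ideas. It is a toy illustration, not Mullvad's DAITA implementation. The cell size `CELL_SIZE`, the probability `dummy_prob` and both helper functions are hypothetical. Encryption is assumed to happen afterwards, so that dummy and real cells become indistinguishable. Timing distortion is not modelled.

```python
import random

# Hypothetical fixed cell size in bytes; NOT a documented DAITA parameter.
CELL_SIZE = 1280


def pad_to_constant(payloads, cell_size=CELL_SIZE):
    """Split every payload into fixed-size cells, zero-padding the last one,
    so an observer sees only a count of equal-sized cells."""
    cells = []
    for payload in payloads:
        # max(..., 1) makes even an empty payload occupy one full cell.
        for start in range(0, max(len(payload), 1), cell_size):
            chunk = payload[start:start + cell_size]
            cells.append(chunk + b"\x00" * (cell_size - len(chunk)))
    return cells


def add_cover_traffic(cells, dummy_prob=0.3, rng=None):
    """Before each real cell, insert a dummy cell with probability dummy_prob.
    Assumes everything is encrypted afterwards, so dummies look like real cells."""
    rng = rng or random.Random()
    out = []
    for cell in cells:
        if rng.random() < dummy_prob:
            out.append(bytes(len(cell)))  # all-zero dummy of the same size
        out.append(cell)
    return out


if __name__ == "__main__":
    real = [b"GET / HTTP/1.1", b"x" * 3000, b"ack"]
    cells = pad_to_constant(real)
    wire = add_cover_traffic(cells, dummy_prob=0.3, rng=random.Random(42))
    # Every cell has the same size; only the (dummy-inflated) count leaks.
    print(len(cells), len(wire), {len(c) for c in wire})
```

Padding alone hides sizes but not counts or timing. The dummy cells blur the count, which is why the two techniques are combined.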
The point is that a state actor might be able to watch both ends (or at least buy detailed traffic data for them). Suppose it sees a particular pattern of packets flowing from your side to a known Tor entry node. Suppose it also sees exactly the same pattern between a specific watched server and a known Tor exit node on the other side. That indicates it’s your machine connecting to that server, and the more such matching patterns are spotted, the higher the confidence.
This is quite independent of how much your data is mixed with other data inside the Tor network and how many nodes it has been routed through. This kind of analysis doesn’t care about the IP address your machine sends requests to, or the IP address the watched server receives requests from. It only cares whether your pattern of requests and responses matches that server’s pattern of received requests and returned responses.
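A small simulation shows why IP addresses don’t matter to this attack. It compares only per-time-bin packet counts on both sides. All traffic here is synthetic, and the burst sizes, bin count, latency `lag` and helper names are hypothetical. Pearson correlation over a few candidate delays is one simple scoring choice, not necessarily what any real adversary uses.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def best_lag_correlation(entry, exit_side, max_lag=5):
    """Best correlation over small delays: the exit side sees each burst later."""
    best = -1.0
    for lag in range(max_lag + 1):
        a, b = entry[:len(entry) - lag], exit_side[lag:]
        n = min(len(a), len(b))
        best = max(best, pearson(a[:n], b[:n]))
    return best


def simulate(bins=200, lag=3, n_decoys=5, seed=7):
    """Score one truly linked flow and several unrelated decoys against the entry.
    Only packet counts per time bin are used; no addresses appear anywhere."""
    rng = random.Random(seed)

    def burst():
        return rng.choice([0, 0, 0, 5, 20, 60])  # packets in one time bin

    user = [burst() for _ in range(bins)]
    # Entry side: what an observer between you and the Tor entry node counts.
    entry = [c + rng.randint(0, 2) for c in user]
    # Watched server: the same bursts 'lag' bins later, plus a little jitter.
    linked = [0] * lag + [c + rng.randint(0, 2) for c in user[:bins - lag]]
    flows = {"linked": linked}
    for i in range(n_decoys):
        flows[f"decoy{i}"] = [burst() for _ in range(bins)]
    return {name: best_lag_correlation(entry, f) for name, f in flows.items()}


if __name__ == "__main__":
    for name, score in sorted(simulate().items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

The linked flow scores close to 1 while the decoys stay near 0. Observing more bins pushes the decoys’ chance scores even lower, which is the "more patterns, more confidence" effect.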
Whatever protocol sits in the middle is wholly irrelevant. At best, if the website is heavily used and you’re lucky, other users may also be sending requests to that server through the same end node (be it the router on the other side of your VPN connection or the exit node of your Tor connection). You would then all disguise each other’s patterns. But that is down to the popularity of the service, not to the protocol itself being good at defeating this kind of analysis.
Edit
This is not entirely true: if the protocol changes the exit node between requests to the server, it can disguise your pattern. However, changing the IP address the requests come from breaks the keep-alive performance optimizations HTTP has had since v1.1. Performance would therefore be horrible, at least for browsing modern websites, where a typical page pulls in tons of additional content.
/Edit
It’s all there in the Mullvad post (so you actually need to read it). It helps to have a background in IT security and cryptography, since attacks built on similar mathematical principles exist in other areas. One is statistical analysis of unchained symmetric encryption schemes, which recovers the plaintext from the ciphertext based on how likely words and letters are to occur in particular patterns. Another is power-consumption analysis of cryptographic chips such as those in smartcards, which derives the encryption keys from the way the ALU draws power during encryption and decryption. Funnily enough, that weakness was also defeated by adding noise, in the form of junk operations.
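The letter-frequency attack is easiest to see on the simplest cipher where every symbol is encrypted independently, with no chaining: a Caesar shift. This is a toy stand-in. Real statistical attacks target far more complex schemes. The frequency table holds approximate, commonly cited English letter percentages, and the helper names are hypothetical.

```python
import string

# Approximate English letter frequencies in percent (commonly cited values).
ENGLISH_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 0.98, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.074,
}


def shift_text(text, k):
    """Shift each lowercase letter by k positions; leave everything else alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + k) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def chi_squared(text):
    """How badly the text's letter counts deviate from English (lower is better)."""
    letters = [c for c in text if c in ENGLISH_FREQ]
    n = len(letters)
    if n == 0:
        return float("inf")
    score = 0.0
    for letter, pct in ENGLISH_FREQ.items():
        expected = n * pct / 100
        score += (letters.count(letter) - expected) ** 2 / expected
    return score


def crack_caesar(ciphertext):
    """Try all 26 shifts and keep the one whose output looks most like English."""
    ciphertext = ciphertext.lower()
    best = min(range(26), key=lambda k: chi_squared(shift_text(ciphertext, -k)))
    return best, shift_text(ciphertext, -best)


if __name__ == "__main__":
    secret = shift_text("a known pattern of letters gives the whole message away", 3)
    print(crack_caesar(secret))
```

The key is never attacked directly. The statistics of the plaintext leak straight through the encryption, just as packet patterns leak through a VPN or Tor.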
It’s all pretty obvious, really ;)