Can someone explain Differential Privacy to me?
I have recently been reading more carefully into differential privacy. There are some things that I have been very confused about.
My initial understanding is that the purpose of differential privacy is to be able to publish statistics safely. A randomized algorithm M is defined to be differentially private if, for any two datasets D1 and D2 that differ by a single unit (the original text uses the l1 distance to define "differing by one unit"), its output distributions satisfy the inequality below.
The expression for differential privacy is given as the following
P( M(D1) ∈ S ) <= e^ε * P( M(D2) ∈ S ), for every set of outputs S
So for this purpose, the function M, defined on the data and outputting the statistic, would be the function involved in the differential privacy inequality. This means we would be comparing the probabilities assigned to each possible value in the image of M under the two neighboring datasets. For computation's sake, we can say the reachable rationals are bounded by the set of representable floating-point numbers. I hope everything is correct so far.
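To make the inequality concrete for myself, here is a minimal toy sketch (my own example, not from any text I've read) using randomized response on a single bit. Here M is the whole randomized reporting procedure, the neighboring "datasets" are the two possible values of one person's bit, and the ratio of output probabilities can be computed exactly:

```python
import math

# Toy example (my own, for illustration): randomized response on one bit.
# With probability e^eps / (1 + e^eps) we report the true bit, otherwise
# we report the flipped bit. This mechanism is eps-differentially private.
eps = math.log(3)  # chosen so that e^eps = 3
p_truth = math.exp(eps) / (1 + math.exp(eps))

# Output distribution of M for the event S = {report 1},
# under the two neighboring inputs (true bit = 1 vs true bit = 0):
p_report1_given1 = p_truth       # P(M(D1) in S)
p_report1_given0 = 1 - p_truth   # P(M(D2) in S)

# The ratio of the two probabilities is exactly e^eps here,
# so the DP bound holds with equality for this event.
ratio = p_report1_given1 / p_report1_given0
print(ratio, math.exp(eps))
```

The point of the sketch, as I understand it, is that M includes the randomness, so the inequality compares two well-defined probability distributions, not two deterministic values.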
However, my confusion comes from some works I have read about applying it, especially to machine learning algorithms. The definition of the function M is very unclear in these cases, so can someone explain what M actually is? My impression is that M is not a noise mechanism such as the Laplace mechanism, but a function that needs to be evaluated.
For example, if we want to release the average age of individuals, the image would be the rationals in [0, 100], and we would need to evaluate the difference between the distributions of obtaining specific values. These distributions would be created through the sampling of datasets neighboring the original. This makes sense to me, but the function M becomes very unclear for ML algorithms.
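Here is how I currently picture M for the average-age example, as a sketch in Python. Everything here is my own illustrative construction: M is the whole randomized procedure (clip the ages, take the mean, add Laplace noise), and the sensitivity 100/n comes from assuming ages are clipped to [0, 100], so changing one record moves the mean by at most 100/n:

```python
import random

def laplace_mean(ages, eps, lo=0.0, hi=100.0):
    """Illustrative sketch of M: noisy mean of ages clipped to [lo, hi]."""
    n = len(ages)
    clipped = [min(max(a, lo), hi) for a in ages]
    # Changing one record changes the clipped mean by at most (hi - lo) / n,
    # so that is the query's l1 sensitivity, and the Laplace noise scale
    # is sensitivity / eps.
    sensitivity = (hi - lo) / n
    scale = sensitivity / eps
    # Sample Laplace(0, scale) as a random sign times an exponential draw.
    sign = 1 if random.random() < 0.5 else -1
    noise = sign * random.expovariate(1.0 / scale)
    return sum(clipped) / n + noise

# Each call is one draw from M's output distribution; calling it many
# times on D1 and on a neighboring D2 would trace out the two
# distributions the DP inequality compares.
print(laplace_mean([30, 40, 50], eps=1.0))
```

If this picture is right, then M is neither "just the statistic" nor "just the noise": it is their composition, and that composition is what the inequality is stated about.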
I feel like I am misunderstanding things because I'm not very good at real analysis.