Differential Privacy
@differentialprivacy.org
470 followers 0 following 860 posts
🤖 new arXiv preprints mentioning "differential privacy" or "differentially private" in the title/abstract/metadata + updates from https://differentialprivacy.org [Under construction.]
differentialprivacy.org
Evaluation of Differential Privacy Mechanisms on Federated Learning

Tejash Varsani

http://arxiv.org/abs/2510.09691

Federated learning enables distributed model training across several clients
without disclosing raw data. Despite advancements in data privacy, risks still
remain. Differential Privacy (DP) is a technique to protect sensitive data by
adding noise to model updates, usually controlled by a fixed privacy budget.
However, this approach can introduce excessive noise, particularly when the
model converges, which compromises performance. To address this problem,
adaptive privacy budgets have been investigated as a potential solution. This
work implements DP methods using Laplace and Gaussian mechanisms with an
adaptive privacy budget, extending the SelecEval simulator. We introduce an
adaptive clipping approach in the Gaussian mechanism, ensuring that gradients
of the model are dynamically updated rather than using a fixed sensitivity. We
conduct extensive experiments with various privacy budgets, IID and non-IID
datasets, and different numbers of selected clients per round. While our
experiments were limited to 200 training rounds, the results suggest that
adaptive privacy budgets and adaptive clipping can help maintain model accuracy
while preserving privacy.
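
As a rough illustration of the building blocks mentioned above (not the paper's SelecEval-based implementation), the sketch below clips a client update and adds Gaussian noise, with the clipping threshold adapted from recently observed update norms; in a real system the threshold would itself be estimated privately and folded into the privacy accounting.

```python
import numpy as np

def gaussian_mechanism_update(update, clip_norm, noise_multiplier, rng):
    """Clip a client update to clip_norm and add Gaussian noise
    calibrated to that sensitivity (a standard DP-FL building block)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

def adaptive_clip_norm(recent_norms, quantile=0.5):
    """Illustrative adaptive clipping: track recent update norms
    (non-privately here) and set the clip to a target quantile."""
    return float(np.quantile(recent_norms, quantile))

rng = np.random.default_rng(0)
norms_history = []
for rnd in range(5):
    update = rng.normal(size=10) * (1.0 / (rnd + 1))  # updates shrink as training converges
    norms_history.append(np.linalg.norm(update))
    clip = adaptive_clip_norm(norms_history)          # sensitivity tracks the gradients
    private_update = gaussian_mechanism_update(update, clip, noise_multiplier=1.0, rng=rng)
    print(rnd, round(clip, 3), round(float(np.linalg.norm(private_update)), 3))
```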
differentialprivacy.org
An information theorist's tour of differential privacy

Anand D. Sarwate, Flavio P. Calmon, Oliver Kosut, Lalitha Sankar

http://arxiv.org/abs/2510.10316

Since being proposed in 2006, differential privacy has become a standard
method for quantifying certain risks in publishing or sharing analyses of
sensitive data. At its heart, differential privacy measures risk in terms of
the differences between probability distributions, which is a central topic in
information theory. A differentially private algorithm is a channel between the
underlying data and the output of the analysis. Seen in this way, the
guarantees made by differential privacy can be understood in terms of
properties of this channel. In this article we examine a few of the key
connections between information theory and the formulation/application of
differential privacy, giving an "operational significance" for relevant
information measures.
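
A minimal illustration of the channel view, using binary randomized response as the channel: the epsilon-DP guarantee is exactly a bound on the log-likelihood ratio between the two output distributions (the rows of the channel matrix) induced by neighboring inputs.

```python
import numpy as np

# Randomized response as a channel: rows = true bit, columns = reported bit.
eps = 1.0
p = np.exp(eps) / (np.exp(eps) + 1)
channel = np.array([[p, 1 - p],
                    [1 - p, p]])

# The DP guarantee bounds the log-likelihood ratio between the two rows
# of the channel (the output distributions on neighboring inputs).
privacy_loss = np.max(np.abs(np.log(channel[0] / channel[1])))
print(privacy_loss, eps)  # both equal 1.0 up to floating point
```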
differentialprivacy.org
Secret-Protected Evolution for Differentially Private Synthetic Text Generation

Tianze Wang, Zhaoyu Chen, Jian Du, Yingtai Xiao, Linjun Zhang, Qiang Yan

http://arxiv.org/abs/2510.10990

Text data has become extremely valuable for large language models (LLMs) and
may even lead toward artificial general intelligence (AGI). Much high-quality
text in the real world is private and cannot be freely used due to privacy
concerns. Therefore, differentially private (DP) synthetic text generation has
been proposed, aiming to produce high-utility synthetic data while protecting
sensitive information. However, existing DP synthetic text generation methods
impose uniform guarantees that often overprotect non-sensitive content,
resulting in substantial utility loss and computational overhead. To address
this, we propose
Secret-Protected Evolution (SecPE), a novel framework that extends private
evolution with secret-aware protection. Theoretically, we show that SecPE
satisfies $(\mathrm{p}, \mathrm{r})$-secret protection, constituting a
relaxation of Gaussian DP that enables tighter utility-privacy trade-offs,
while also substantially reducing computational complexity relative to baseline
methods. Empirically, across the OpenReview, PubMed, and Yelp benchmarks, SecPE
consistently achieves lower Fréchet Inception Distance (FID) and higher
downstream task accuracy than GDP-based Aug-PE baselines, while requiring less
noise to attain the same level of protection. Our results highlight that
secret-aware guarantees can unlock more practical and effective
privacy-preserving synthetic text generation.
differentialprivacy.org
N-output Mechanism: Estimating Statistical Information from Numerical Data under Local Differential Privacy

Incheol Baek, Yon Dohn Chung

http://arxiv.org/abs/2510.11116

Local Differential Privacy (LDP) addresses significant privacy concerns in
sensitive data collection. In this work, we focus on numerical data collection
under LDP, targeting a significant gap in the literature: existing LDP
mechanisms are optimized for either very small ($|\Omega| \in \{2, 3\}$) or
infinite output spaces, and no generalized method exists for constructing an
optimal mechanism for an arbitrary output size $N$. To fill this gap, we
propose the N-output mechanism, a generalized framework that maps numerical
data to one of $N$ discrete outputs.
  We formulate the mechanism's design as an optimization problem to minimize
estimation variance for any given $N \geq 2$ and develop both numerical and
analytical solutions. This results in a mechanism that is highly accurate and
adaptive, as its design is determined by solving an optimization problem for
any chosen $N$. Furthermore, we extend our framework and existing mechanisms to
the task of distribution estimation. Empirical evaluations show that the
N-output mechanism achieves state-of-the-art accuracy for mean, variance, and
distribution estimation with small communication costs.
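
For intuition, here is a simple baseline construction of an N-output LDP mechanism for data in [0, 1]: stochastic rounding onto an N-point grid followed by generalized randomized response, with a debiased mean estimator. This is only an illustrative baseline, not the variance-optimized mechanism proposed in the paper.

```python
import numpy as np

def n_output_ldp(x, eps, N, rng):
    """eps-LDP report of x in [0, 1] over a grid of N outputs:
    stochastic rounding to the grid, then generalized randomized response."""
    grid = np.linspace(0.0, 1.0, N)
    idx = int(np.clip(np.searchsorted(grid, x) - 1, 0, N - 2))
    lo, hi = grid[idx], grid[idx + 1]
    k = idx + 1 if rng.random() < (x - lo) / (hi - lo) else idx  # unbiased rounding
    p = np.exp(eps) / (np.exp(eps) + N - 1)                      # keep probability
    if rng.random() < p:
        out = k
    else:
        out = rng.choice([j for j in range(N) if j != k])
    return grid[out]

def debiased_mean(reports, eps, N):
    """Unbiased mean estimate from the randomized-response reports."""
    grid = np.linspace(0.0, 1.0, N)
    p = np.exp(eps) / (np.exp(eps) + N - 1)
    q = 1.0 / (np.exp(eps) + N - 1)
    return (np.mean(reports) - q * grid.sum()) / (p - q)

rng = np.random.default_rng(0)
data = rng.beta(2, 5, size=20000)
reports = [n_output_ldp(x, eps=2.0, N=5, rng=rng) for x in data]
print(data.mean(), debiased_mean(reports, eps=2.0, N=5))  # close on average
```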
differentialprivacy.org
How to Get Actual Privacy and Utility from Privacy Models: the k-Anonymity and Differential Privacy Families

Josep Domingo-Ferrer, David Sánchez

http://arxiv.org/abs/2510.11299

Privacy models were introduced in privacy-preserving data publishing and
statistical disclosure control with the promise to end the need for costly
empirical assessment of disclosure risk. We examine how well this promise is
kept by the main privacy models. We find they may fail to provide adequate
protection guarantees because of problems in their definition or incur
unacceptable trade-offs between privacy protection and utility preservation.
Specifically, k-anonymity may not entirely exclude disclosure if enforced with
deterministic mechanisms or without constraints on the confidential values. On
the other hand, differential privacy (DP) incurs unacceptable utility loss for
small budgets and its privacy guarantee becomes meaningless for large budgets.
In the latter case, an ex post empirical assessment of disclosure risk becomes
necessary, undermining the main appeal of privacy models. Whereas the utility
preservation of DP can only be improved by relaxing its privacy guarantees, we
argue that a semantic reformulation of k-anonymity can offer more robust
privacy without losing utility with respect to traditional syntactic
k-anonymity.
differentialprivacy.org
Continual Release of Densest Subgraphs: Privacy Amplification & Sublinear Space via Subsampling

Felix Zhou

http://arxiv.org/abs/2510.11640

We study the sublinear space continual release model for edge-differentially
private (DP) graph algorithms, with a focus on the densest subgraph problem
(DSG) in the insertion-only setting. Our main result is the first continual
release DSG algorithm that matches the additive error of the best static DP
algorithms and the space complexity of the best non-private streaming
algorithms, up to constants. The key idea is a refined use of subsampling that
simultaneously achieves privacy amplification and sparsification, a connection
not previously formalized in graph DP. Via a simple black-box reduction to the
static setting, we obtain both pure and approximate-DP algorithms with $O(\log
n)$ additive error and $O(n\log n)$ space, improving both accuracy and space
complexity over the previous state of the art. Along the way, we introduce
graph densification in the graph DP setting, adding edges to trigger earlier
subsampling, which removes the extra logarithmic factors in error and space
incurred by prior work [ELMZ25]. We believe this simple idea may be of
independent interest.
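
The amplification-by-subsampling idea can be illustrated with the standard Poisson-subsampling bound for pure DP (a generic fact, not the paper's refined graph-specific analysis): keeping each edge with probability p both sparsifies the input and shrinks the effective epsilon.

```python
import math, random

def subsample_edges(edges, p, rng):
    """Keep each edge independently with probability p (Poisson subsampling):
    this both sparsifies the graph and amplifies privacy."""
    return [e for e in edges if rng.random() < p]

def amplified_epsilon(eps, p):
    """Standard amplification-by-subsampling bound for pure DP:
    an eps-DP mechanism run on a p-subsample is log(1 + p*(e^eps - 1))-DP."""
    return math.log1p(p * math.expm1(eps))

rng = random.Random(0)
edges = [(i, (i * 7 + 3) % 100) for i in range(1000)]   # toy edge list
sparse = subsample_edges(edges, p=0.1, rng=rng)
print(len(sparse), amplified_epsilon(1.0, 0.1))          # ~100 edges, eps' ~ 0.158
```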
differentialprivacy.org
Measuring the Hidden Cost of Data Valuation through Collective Disclosure

Patrick Mesana, Gilles Caporossi, Sebastien Gambs

http://arxiv.org/abs/2510.08869

Data valuation methods assign marginal utility to each data point that has
contributed to the training of a machine learning model. If used directly as a
payout mechanism, this creates a hidden cost of valuation, in which
contributors with near-zero marginal value would receive nothing, even though
their data had to be collected and assessed. To better formalize this cost, we
introduce a conceptual and game-theoretic model, the Information Disclosure
Game, played between a Data Union (DU, sometimes also called a data trust), a
member-run agent representing contributors, and a Data Consumer (e.g., a
platform). After
first aggregating members' data, the DU releases information progressively by
adding Laplacian noise under a differentially-private mechanism. Through
simulations with strategies guided by data Shapley values and multi-armed
bandit exploration, we demonstrate on a Yelp review helpfulness prediction task
that data valuation inherently incurs an explicit acquisition cost and that the
DU's collective disclosure policy changes how this cost is distributed across
members.
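
As a generic sketch of progressive disclosure under the Laplace mechanism (the budget schedule and sensitivity below are placeholders, not the paper's protocol), the data union could re-release an aggregate with a growing per-round budget, with total leakage bounded by basic composition.

```python
import numpy as np

def laplace_release(value, sensitivity, eps, rng):
    """Laplace mechanism: add Laplace(sensitivity/eps) noise to a statistic."""
    return value + rng.laplace(0.0, sensitivity / eps)

rng = np.random.default_rng(0)
members_mean = 0.42              # aggregated statistic held by the data union
budgets = [0.1, 0.2, 0.4]        # hypothetical per-round epsilons
for eps in budgets:
    # Each round reveals a less noisy view of the same aggregate.
    print(eps, laplace_release(members_mean, sensitivity=1.0, eps=eps, rng=rng))
print("total epsilon spent:", sum(budgets))  # basic composition
```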
differentialprivacy.org
On the Fairness of Privacy Protection: Measuring and Mitigating the Disparity of Group Privacy Risks for Differentially Private Machine Learning

Zhi Yang, Changwu Huang, Ke Tang, Xin Yao

http://arxiv.org/abs/2510.09114

While significant progress has been made in conventional fairness-aware
machine learning (ML) and differentially private ML (DPML), the fairness of
privacy protection across groups remains underexplored. Existing studies have
proposed methods to assess group privacy risks, but these are based on the
average-case privacy risks of data records. Such approaches may underestimate
the group privacy risks, thereby potentially underestimating the disparity
across group privacy risks. Moreover, the current method for assessing the
worst-case privacy risks of data records is time-consuming, limiting its
practical applicability. To address these limitations, we introduce a novel
membership inference game that can efficiently audit the approximate worst-case
privacy risks of data records. Experimental results demonstrate that our method
provides a more stringent measurement of group privacy risks, yielding a
reliable assessment of the disparity in group privacy risks. Furthermore, to
promote privacy protection fairness in DPML, we enhance the standard DP-SGD
algorithm with an adaptive group-specific gradient clipping strategy, inspired
by the design of canaries in differential privacy auditing studies. Extensive
experiments confirm that our algorithm effectively reduces the disparity in
group privacy risks, thereby enhancing the fairness of privacy protection in
DPML.
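
A minimal sketch of group-specific gradient clipping inside a DP-SGD-style step (the adaptive, canary-guided way the paper sets these thresholds is not reproduced here): each example is clipped to its group's threshold, and the noise is calibrated to the largest threshold so the usual sensitivity argument still applies.

```python
import numpy as np

def dp_sgd_step_groupwise(per_example_grads, groups, clip_by_group,
                          noise_multiplier, rng):
    """One DP-SGD-style aggregation with group-specific clipping thresholds.
    Noise is calibrated to the largest clip so the usual sensitivity bound
    (one example changes the sum by at most the max clip) still applies."""
    clipped = []
    for g, grad in zip(groups, per_example_grads):
        c = clip_by_group[g]
        norm = np.linalg.norm(grad)
        clipped.append(grad * min(1.0, c / (norm + 1e-12)))
    sensitivity = max(clip_by_group.values())
    noise = rng.normal(0.0, noise_multiplier * sensitivity,
                       size=per_example_grads[0].shape)
    return (np.sum(clipped, axis=0) + noise) / len(per_example_grads)

rng = np.random.default_rng(0)
grads = [rng.normal(size=4) for _ in range(8)]
groups = [0, 0, 0, 1, 1, 1, 1, 0]
clip_by_group = {0: 1.0, 1: 0.5}   # hypothetical group-specific thresholds
print(dp_sgd_step_groupwise(grads, groups, clip_by_group,
                            noise_multiplier=1.0, rng=rng))
```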
differentialprivacy.org
Locally Optimal Private Sampling: Beyond the Global Minimax

Hrad Ghoukasian, Bonwoo Lee, Shahab Asoodeh

http://arxiv.org/abs/2510.09485

We study the problem of sampling from a distribution under local differential
privacy (LDP). Given a private distribution $P \in \mathcal{P}$, the goal is to
generate a single sample from a distribution that remains close to $P$ in
$f$-divergence while satisfying the constraints of LDP. This task captures the
fundamental challenge of producing realistic-looking data under strong privacy
guarantees. While prior work by Park et al. (NeurIPS'24) focuses on global
minimax-optimality across a class of distributions, we take a local
perspective. Specifically, we examine the minimax risk in a neighborhood around
a fixed distribution $P_0$, and characterize its exact value, which depends on
both $P_0$ and the privacy level. Our main result shows that the local minimax
risk is determined by the global minimax risk when the distribution class
$\mathcal{P}$ is restricted to a neighborhood around $P_0$. To establish this,
we (1) extend previous work from pure LDP to the more general functional LDP
framework, and (2) prove that the globally optimal functional LDP sampler
yields the optimal local sampler when constrained to distributions near $P_0$.
Building on this, we also derive a simple closed-form expression for the
locally minimax-optimal samplers which does not depend on the choice of
$f$-divergence. We further argue that this local framework naturally models
private sampling with public data, where the public data distribution is
represented by $P_0$. In this setting, we empirically compare our locally
optimal sampler to existing global methods, and demonstrate that it
consistently outperforms global minimax samplers.
differentialprivacy.org
PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing

Anthony Hughes, Vasisht Duddu, N. Asokan, Nikolaos Aletras, Ning Ma

http://arxiv.org/abs/2510.07452

Language models (LMs) may memorize personally identifiable information (PII)
from training data, enabling adversaries to extract it during inference.
Existing defense mechanisms such as differential privacy (DP) reduce this
leakage, but incur large drops in utility. Based on a comprehensive study using
circuit discovery to identify the computational circuits responsible for PII
leakage in LMs, we hypothesize that specific PII leakage circuits in LMs should
be responsible for this behavior. Therefore, we propose PATCH (Privacy-Aware
Targeted Circuit PatcHing), a novel approach that first identifies and
subsequently directly edits PII circuits to reduce leakage. PATCH achieves
better privacy-utility trade-off than existing defenses, e.g., reducing recall
of PII leakage from LMs by up to 65%. Finally, PATCH can be combined with DP to
reduce recall of residual leakage of an LM to as low as 0.01%. Our analysis
shows that PII leakage circuits persist even after the application of existing
defense mechanisms. In contrast, PATCH can effectively mitigate their impact.
differentialprivacy.org
Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)

Junki Mori, Kazuya Kakizaki, Taiki Miyagawa, Jun Sakuma

http://arxiv.org/abs/2510.06719

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by
grounding them in external knowledge. However, its application in sensitive
domains is limited by privacy risks. Existing private RAG methods typically
rely on query-time differential privacy (DP), which requires repeated noise
injection and leads to accumulated privacy loss. To address this issue, we
propose DP-SynRAG, a framework that uses LLMs to generate differentially
private synthetic RAG databases. Unlike prior methods, the synthetic text can
be reused once created, thereby avoiding repeated noise injection and
additional privacy costs. To preserve essential information for downstream RAG
tasks, DP-SynRAG extends private prediction, which instructs LLMs to generate
text that mimics subsampled database records in a DP manner. Experiments show
that DP-SynRAG achieves superior performance to the state-of-the-art private
RAG systems while maintaining a fixed privacy budget, offering a scalable
solution for privacy-preserving RAG.
differentialprivacy.org
Spectral Graph Clustering under Differential Privacy: Balancing Privacy, Accuracy, and Efficiency

Mohamed Seif, Antti Koskela, H. Vincent Poor, Andrea J. Goldsmith

http://arxiv.org/abs/2510.07136

We study the problem of spectral graph clustering under edge differential
privacy (DP). Specifically, we develop three mechanisms: (i) graph perturbation
via randomized edge flipping combined with adjacency matrix shuffling, which
enforces edge privacy while preserving key spectral properties of the graph.
Importantly, shuffling considerably amplifies the guarantees: whereas flipping
edges with a fixed probability alone provides only a constant epsilon edge DP
guarantee as the number of nodes grows, the shuffled mechanism achieves
(epsilon, delta) edge DP with parameters that tend to zero as the number of
nodes increases; (ii) private graph projection with additive Gaussian noise in a
lower-dimensional space to reduce dimensionality and computational complexity;
and (iii) a noisy power iteration method that distributes Gaussian noise across
iterations to ensure edge DP while maintaining convergence. Our analysis
provides rigorous privacy guarantees and a precise characterization of the
misclassification error rate. Experiments on synthetic and real-world networks
validate our theoretical analysis and illustrate the practical privacy-utility
trade-offs.
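
Mechanism (i) can be sketched as randomized response on the adjacency matrix: each potential edge is kept with probability e^eps / (1 + e^eps) and flipped otherwise, which gives a per-edge eps guarantee; the shuffling step that amplifies this to vanishing (epsilon, delta) parameters is not shown here.

```python
import numpy as np

def flip_edges(adj, eps, rng):
    """Randomized response on the adjacency matrix: keep each upper-triangular
    entry with probability e^eps / (1 + e^eps), flip it otherwise (eps per edge).
    The paper's shuffling-based amplification is not reproduced here."""
    n = adj.shape[0]
    keep_prob = np.exp(eps) / (1.0 + np.exp(eps))
    keep = rng.random((n, n)) < keep_prob
    noisy = np.where(keep, adj, 1 - adj)
    upper = np.triu(noisy, k=1)          # symmetrize, drop self-loops
    return upper + upper.T

rng = np.random.default_rng(0)
n = 6
adj = (rng.random((n, n)) < 0.3).astype(int)
adj = np.triu(adj, 1); adj = adj + adj.T
print(flip_edges(adj, eps=1.0, rng=rng))
```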
differentialprivacy.org
Cocoon: A System Architecture for Differentially Private Training with Correlated Noises

Donghwan Kim, Xin Gu, Jinho Baek, Timothy Lo, Younghoon Min, Kwangsik Shin, Jongryool Kim, Jongse Park, Kiwan Maeng

http://arxiv.org/abs/2510.07304

Machine learning (ML) models memorize and leak training data, causing serious
privacy issues to data owners. Training algorithms with differential privacy
(DP), such as DP-SGD, have been gaining attention as a solution. However,
DP-SGD adds noise at each training iteration, which degrades the accuracy of
the trained model. To improve accuracy, a new family of approaches adds
carefully designed correlated noises, so that the noises cancel each other out
across iterations. We performed an extensive characterization study of these
new mechanisms, for the first time to the best of our knowledge, and show they
incur non-negligible overheads when the model is large or uses large embedding
tables. Motivated by the analysis, we propose Cocoon, a hardware-software
co-designed framework for efficient training with correlated noises. Cocoon
accelerates models with embedding tables through pre-computing and storing
correlated noises in a coalesced format (Cocoon-Emb), and supports large models
through a custom near-memory processing device (Cocoon-NMP). On a real system
with an FPGA-based NMP device prototype, Cocoon improves the performance by
2.33-10.82x (Cocoon-Emb) and 1.55-3.06x (Cocoon-NMP).
differentialprivacy.org
DP-Adam-AC: Privacy-preserving Fine-Tuning of Localizable Language Models Using Adam Optimization with Adaptive Clipping

Ruoxing Yang

http://arxiv.org/abs/2510.05288

Large language models (LLMs) such as ChatGPT have evolved into powerful and
ubiquitous tools. Fine-tuning on small datasets allows LLMs to acquire
specialized skills for specific tasks efficiently. Although LLMs provide great
utility in both general and task-specific use cases, they are limited by two
security-related concerns. First, traditional LLM hardware requirements make
them infeasible to run locally on consumer-grade devices. A remote network
connection with the LLM provider's server is usually required, making the
system vulnerable to network attacks. Second, fine-tuning an LLM for a
sensitive task may involve sensitive data. Non-private fine-tuning algorithms
produce models vulnerable to training data reproduction attacks. Our work
addresses these security concerns by enhancing differentially private
optimization algorithms and applying them to fine-tune localizable language
models. We introduce adaptable gradient clipping along with other engineering
enhancements to the standard DP-Adam optimizer to create DP-Adam-AC. We use our
optimizer to fine-tune examples of two localizable LLM designs: a small
language model (Qwen2.5-0.5B) and a 1.58-bit quantized model (Bitnet-b1.58-2B). We
demonstrate promising improvements in loss through experimentation with two
synthetic datasets.
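
A rough sketch of how per-example clipping and Gaussian noise can be combined with Adam moment updates (a generic DP-Adam recipe under illustrative hyperparameters, not the DP-Adam-AC optimizer with its adaptive clipping and engineering enhancements):

```python
import numpy as np

def dp_adam_step(params, per_example_grads, state, clip, noise_mult, lr, rng,
                 beta1=0.9, beta2=0.999, eps_adam=1e-8):
    """One DP-Adam-style step: per-example clipping plus Gaussian noise on the
    summed gradient, then ordinary Adam moment updates on the noisy gradient."""
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy_grad = (np.sum(clipped, axis=0)
                  + rng.normal(0.0, noise_mult * clip, size=params.shape))
    noisy_grad /= len(per_example_grads)
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * noisy_grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * noisy_grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return params - lr * m_hat / (np.sqrt(v_hat) + eps_adam)

rng = np.random.default_rng(0)
params = np.zeros(3)
state = {"m": np.zeros(3), "v": np.zeros(3), "t": 0}
grads = [rng.normal(size=3) for _ in range(16)]
params = dp_adam_step(params, grads, state, clip=1.0, noise_mult=1.0,
                      lr=0.01, rng=rng)
print(params)
```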
differentialprivacy.org
Correlating Cross-Iteration Noise for DP-SGD using Model Curvature

Xin Gu, Yingtai Xiao, Guanlin He, Jiamu Bai, Daniel Kifer, Kiwan Maeng

http://arxiv.org/abs/2510.05416

Differentially private stochastic gradient descent (DP-SGD) offers the
promise of training deep learning models while mitigating many privacy risks.
However, there is currently a large accuracy gap between DP-SGD and normal SGD
training. This has resulted in different lines of research investigating
orthogonal ways of improving privacy-preserving training. One such line of
work, known as DP-MF, correlates the privacy noise across different iterations
of stochastic gradient descent -- allowing later iterations to cancel out some
of the noise added to earlier iterations. In this paper, we study how to
improve this noise correlation. We propose a technique called NoiseCurve that
uses model curvature, estimated from public unlabeled data, to improve the
quality of this cross-iteration noise correlation. Our experiments on various
datasets, models, and privacy parameters show that the noise correlations
computed by NoiseCurve offer consistent and significant improvements in
accuracy over the correlation scheme used by DP-MF.
differentialprivacy.org
Power Mechanism: Private Tabular Representation Release for Model Agnostic Consumption

Praneeth Vepakomma, Kaustubh Ponkshe

http://arxiv.org/abs/2510.05581

Traditional collaborative learning approaches are based on sharing model
weights between clients and a server. However, schemes based on sharing
embeddings (activations) created from the data offer advantages in resource
efficiency. Several differentially private methods have been developed for
sharing weights, but no such mechanisms exist so far for sharing embeddings.
We propose the Power Mechanism to learn a privacy encoding network in conjunction
with a small utility generation network such that the final embeddings
generated from it are equipped with formal differential privacy guarantees.
These privatized embeddings are then shared with a more powerful server, that
learns a post-processing that results in a higher accuracy for machine learning
tasks. We show that our co-design of collaborative and private learning
requires only one round of privatized communication and less compute on the
client than traditional methods. The privatized embeddings that we share
from the client are agnostic to the type of model (deep learning, random
forests or XGBoost) used on the server in order to process these activations to
complete a task.
differentialprivacy.org
DP-SNP-TIHMM: Differentially Private, Time-Inhomogeneous Hidden Markov Models for Synthesizing Genome-Wide Association Datasets

Shadi Rahimian, Mario Fritz

http://arxiv.org/abs/2510.05777

Single nucleotide polymorphism (SNP) datasets are fundamental to genetic
studies but pose significant privacy risks when shared. The correlation of SNPs
with each other makes strong adversarial attacks possible, such as masked-value
reconstruction, kin, and membership inference attacks. Existing
privacy-preserving approaches either apply differential privacy to statistical
summaries of these datasets or offer complex methods that require
post-processing and the usage of a publicly available dataset to suppress or
selectively share SNPs.
  In this study, we introduce an innovative framework for generating synthetic
SNP sequence datasets using samples derived from time-inhomogeneous hidden
Markov models (TIHMMs). To preserve the privacy of the training data, we ensure
that each SNP sequence contributes only a bounded influence during training,
enabling strong differential privacy guarantees. Crucially, by operating on
full SNP sequences and bounding their gradient contributions, our method
directly addresses the privacy risks introduced by their inherent correlations.
  Through experiments conducted on the real-world 1000 Genomes dataset, we
demonstrate the efficacy of our method using privacy budgets of $\varepsilon
\in [1, 10]$ at $\delta=10^{-4}$. Notably, by allowing the transition models of
the HMM to be dependent on the location in the sequence, we significantly
enhance performance, enabling the synthetic datasets to closely replicate the
statistical properties of non-private datasets. This framework facilitates the
private sharing of genomic data while offering researchers exceptional
flexibility and utility.
differentialprivacy.org
The Five Safes as a Privacy Context

James Bailie, Ruobin Gong

http://arxiv.org/abs/2510.05803

The Five Safes is a framework used by national statistical offices (NSOs) for
assessing and managing the disclosure risk of data sharing. This paper makes
two points. Firstly, the Five Safes can be understood as a specialization of a
broader concept, contextual integrity, to the situation of statistical
dissemination by an NSO. We demonstrate this by
mapping the five parameters of contextual integrity onto the five dimensions of
the Five Safes. Secondly, the Five Safes contextualizes narrow, technical
notions of privacy within a holistic risk assessment. We demonstrate this with
the example of differential privacy (DP). This contextualization allows NSOs to
place DP within their Five Safes toolkit while also guiding the design of DP
implementations within the broader privacy context, as delineated by both their
regulation and the relevant social norms.
differentialprivacy.org
Distributed Platoon Control Under Quantization: Stability Analysis and Privacy Preservation

Kaixiang Zhang, Zhaojian Li, Wei Lin

http://arxiv.org/abs/2510.05959

Distributed control of connected and automated vehicles has attracted
considerable interest for its potential to improve traffic efficiency and
safety. However, such control schemes require sharing privacy-sensitive vehicle
data, which introduces risks of information leakage and potential malicious
activities. This paper investigates the stability and privacy-preserving
properties of distributed platoon control under two types of quantizers:
deterministic and probabilistic. For deterministic quantization, we show that
the resulting control strategy ensures the system errors remain uniformly
ultimately bounded. Moreover, in the absence of auxiliary information, an
eavesdropper cannot uniquely infer sensitive vehicle states. In contrast, the
use of probabilistic quantization enables asymptotic convergence of the vehicle
platoon in expectation with bounded variance. Importantly, probabilistic
quantizers can satisfy differential privacy guarantees, thereby preserving
privacy even when the eavesdropper possesses arbitrary auxiliary information.
We further analyze the trade-off between control performance and privacy by
formulating an optimization problem that characterizes the impact of the
quantization step on both metrics. Numerical simulations are provided to
illustrate the performance differences between the two quantization strategies.
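
A minimal sketch of a probabilistic quantizer of the kind discussed above: stochastic rounding to a grid makes the quantized value unbiased, and with suitably chosen parameters such randomized quantizers can be shown to satisfy differential privacy (the paper's specific conditions and proofs are not reproduced here).

```python
import numpy as np

def probabilistic_quantize(x, step, rng):
    """Probabilistic (stochastic-rounding) quantizer with grid spacing `step`:
    round down or up to the nearest grid point with probabilities chosen so
    the output is unbiased. Only the randomization is shown; DP guarantees
    additionally depend on the parameter choices analyzed in the paper."""
    low = np.floor(x / step) * step
    frac = (x - low) / step
    return np.where(rng.random(np.shape(x)) < frac, low + step, low)

rng = np.random.default_rng(0)
state = np.array([3.14, -1.27, 0.05])
samples = np.stack([probabilistic_quantize(state, 0.5, rng) for _ in range(10000)])
print(samples.mean(axis=0))   # close to the true state: unbiased in expectation
```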
differentialprivacy.org
Privacy Enhancement in Over-the-Air Federated Learning via Adaptive Receive Scaling

Faeze Moradi Kalarde, Ben Liang, Min Dong, Yahia A. Eldemerdash Ahmed, Ho Ting Cheng

http://arxiv.org/abs/2510.03860

In Federated Learning (FL) with over-the-air aggregation, the quality of the
signal received at the server critically depends on the receive scaling
factors. While a larger scaling factor can reduce the effective noise power and
improve training performance, it also compromises the privacy of devices by
reducing uncertainty. In this work, we aim to adaptively design the receive
scaling factors across training rounds to balance the trade-off between
training convergence and privacy in an FL system under dynamic channel
conditions. We formulate a stochastic optimization problem that minimizes the
overall Rényi differential privacy (RDP) leakage over the entire training
process, subject to a long-term constraint that ensures convergence of the
global loss function. Our problem depends on unknown future information, and we
observe that standard Lyapunov optimization is not applicable. Thus, we develop
a new online algorithm, termed AdaScale, based on a sequence of novel per-round
problems that can be solved efficiently. We further derive upper bounds on the
dynamic regret and constraint violation of AdaScale, establishing that it
achieves diminishing dynamic regret in terms of time-averaged RDP leakage while
ensuring convergence of FL training to a stationary point. Numerical
experiments on canonical classification tasks show that our approach
effectively reduces RDP and DP leakages compared with state-of-the-art
benchmarks without compromising learning performance.
differentialprivacy.org
Multi-Class Support Vector Machine with Differential Privacy

Jinseong Park, Yujin Choi, Jaewook Lee

http://arxiv.org/abs/2510.04027

With the increasing need to safeguard data privacy in machine learning
models, differential privacy (DP) is one of the major frameworks to build
privacy-preserving models. Support Vector Machines (SVMs) are widely used
traditional machine learning models due to their robust margin guarantees and
strong empirical performance in binary classification. However, standard ways
of applying DP to multi-class SVMs are inadequate: the one-versus-rest (OvR) and
one-versus-one (OvO) approaches repeatedly query each data sample when building
multiple binary classifiers, thus consuming the privacy budget proportionally
to the number of classes. To overcome this limitation, we explore all-in-one
SVM approaches for DP, which access each data sample only once to construct
multi-class SVM boundaries with margin maximization properties. We propose a
novel differentially Private Multi-class SVM (PMSVM) with weight and gradient
perturbation methods, providing rigorous sensitivity and convergence analyses
to ensure DP in all-in-one SVMs. Empirical results demonstrate that our
approach surpasses existing DP-SVM methods in multi-class scenarios.
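
For context, a classic way to obtain a DP SVM via weight (output) perturbation in the binary case is sketched below: the L2-sensitivity of the regularized ERM minimizer with a 1-Lipschitz loss and unit-norm features is 2/(n*lambda), and Gaussian noise is calibrated to it. This is the standard DP-ERM recipe shown only to illustrate weight perturbation; the paper's all-in-one multi-class construction and analysis are different.

```python
import numpy as np

def train_svm(X, y, lam, iters=2000, lr=0.5):
    """Regularized linear SVM via subgradient descent on
    f(w) = mean(max(0, 1 - y x.w)) + (lam/2)||w||^2."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, iters + 1):
        margins = y * (X @ w)
        active = margins < 1
        grad = -(X[active] * y[active, None]).sum(axis=0) / n + lam * w
        w -= (lr / t) * grad
    return w

def private_svm_output_perturbation(X, y, lam, eps, delta, rng):
    """Weight perturbation: with a 1-Lipschitz loss and ||x|| <= 1, the
    regularized ERM minimizer has L2-sensitivity 2/(n*lam); add Gaussian
    noise calibrated to it (classic DP-ERM recipe, binary case only)."""
    w = train_svm(X, y, lam)
    sensitivity = 2.0 / (len(y) * lam)
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return w + rng.normal(0.0, sigma, size=w.shape)

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # enforce ||x|| <= 1
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=n)); y[y == 0] = 1
w_priv = private_svm_output_perturbation(X, y, lam=0.1, eps=1.0, delta=1e-5, rng=rng)
print((np.sign(X @ w_priv) == y).mean())   # accuracy with the private weights
```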
differentialprivacy.org
DP-HYPE: Distributed Differentially Private Hyperparameter Search

Johannes Liebenow, Thorsten Peinemann, Esfandiar Mohammadi

http://arxiv.org/abs/2510.04902

The tuning of hyperparameters in distributed machine learning can
substantially impact model performance. When the hyperparameters are tuned on
sensitive data, privacy becomes an important challenge, and differential
privacy has emerged as the de facto standard for provable privacy.
A standard setting when performing distributed learning tasks is that clients
agree on a shared setup, i.e., find a compromise from a set of hyperparameters,
like the learning rate of the model to be trained. Yet, prior work on
differentially private hyperparameter tuning either uses computationally
expensive cryptographic protocols, determines hyperparameters separately for
each client, or applies differential privacy locally, which can lead to
undesirable utility-privacy trade-offs.
  In this work, we present our algorithm DP-HYPE, which performs a distributed
and privacy-preserving hyperparameter search by conducting a distributed voting
based on local hyperparameter evaluations of clients. In this way, DP-HYPE
selects hyperparameters that lead to a compromise supported by the majority of
clients, while maintaining scalability and independence from specific learning
tasks. We prove that DP-HYPE preserves the strong notion of differential
privacy called client-level differential privacy and, importantly, show that
its privacy guarantees do not depend on the number of hyperparameters. We also
provide bounds on its utility guarantees, that is, the probability of reaching
a compromise, and implement DP-HYPE as a submodule in the popular Flower
framework for distributed machine learning. In addition, we evaluate
performance on multiple benchmark data sets in iid as well as multiple non-iid
settings and demonstrate high utility of DP-HYPE even under small privacy
budgets.
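
A toy sketch of the voting idea (report-noisy-max over one vote per client, which has client-level sensitivity 1; the actual DP-HYPE protocol, its distribution across clients, and its utility analysis are more involved):

```python
import numpy as np

def dp_noisy_vote(client_votes, candidates, eps, rng):
    """Each client casts one vote for its preferred hyperparameter; the server
    adds Laplace(1/eps) noise to each count and returns the argmax
    (report-noisy-max). One vote per client gives client-level sensitivity 1."""
    counts = np.array([sum(v == c for v in client_votes) for c in candidates],
                      dtype=float)
    noisy = counts + rng.laplace(0.0, 1.0 / eps, size=len(candidates))
    return candidates[int(np.argmax(noisy))]

rng = np.random.default_rng(0)
learning_rates = [0.001, 0.01, 0.1]            # shared candidate grid
votes = [0.01] * 12 + [0.1] * 5 + [0.001] * 3  # locally chosen best per client
print(dp_noisy_vote(votes, learning_rates, eps=1.0, rng=rng))
```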
differentialprivacy.org
Federated Computation of ROC and PR Curves

Xuefeng Xu, Graham Cormode

http://arxiv.org/abs/2510.04979

Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves are
fundamental tools for evaluating machine learning classifiers, offering
detailed insights into the trade-offs between true positive rate and false
positive rate (ROC) or between precision and recall (PR). However, in Federated
Learning (FL) scenarios, where data is distributed across multiple clients,
computing these curves is challenging due to privacy and communication
constraints. Specifically, the server cannot access raw prediction scores and
class labels, which are used to compute the ROC and PR curves in a centralized
setting. In this paper, we propose a novel method for approximating ROC and PR
curves in a federated setting by estimating quantiles of the prediction score
distribution under distributed differential privacy. We provide theoretical
bounds on the Area Error (AE) between the true and estimated curves,
demonstrating the trade-offs between approximation accuracy, privacy, and
communication cost. Empirical results on real-world datasets demonstrate that
our method achieves high approximation accuracy with minimal communication and
strong privacy guarantees, making it practical for privacy-preserving model
evaluation in federated systems.
differentialprivacy.org
Differentially Private Wasserstein Barycenters

Anming Gu, Sasidhar Kunapuli, Mark Bun, Edward Chien, Kristjan Greenewald

http://arxiv.org/abs/2510.03021

The Wasserstein barycenter is defined as the mean of a set of probability
measures under the optimal transport metric, and has numerous applications
spanning machine learning, statistics, and computer graphics. In practice these
input measures are empirical distributions built from sensitive datasets,
motivating a differentially private (DP) treatment. We present, to our
knowledge, the first algorithms for computing Wasserstein barycenters under
differential privacy. Empirically, on synthetic data, MNIST, and large-scale
U.S. population datasets, our methods produce high-quality private barycenters
with strong accuracy-privacy tradeoffs.
differentialprivacy.org
Private Learning of Littlestone Classes, Revisited

Xin Lyu

http://arxiv.org/abs/2510.00076

We consider online and PAC learning of Littlestone classes subject to the
constraint of approximate differential privacy. Our main result is a private
learner to online-learn a Littlestone class with a mistake bound of
$\tilde{O}(d^{9.5}\cdot \log(T))$ in the realizable case, where $d$ denotes the
Littlestone dimension and $T$ the time horizon. This is a doubly-exponential
improvement over the state-of-the-art [GL'21] and comes polynomially close to
the lower bound for this task.
  The advancement is made possible by a couple of ingredients. The first is a
clean and refined interpretation of the ``irreducibility'' technique from the
state-of-the-art private PAC-learner for Littlestone classes [GGKM'21]. Our new
perspective also allows us to improve the PAC-learner of [GGKM'21] and give a
sample complexity upper bound of $\widetilde{O}(\frac{d^5
\log(1/\delta\beta)}{\varepsilon \alpha})$ where $\alpha$ and $\beta$ denote
the accuracy and confidence of the PAC learner, respectively. This improves
over [GGKM'21] by factors of $\frac{d}{\alpha}$ and attains an optimal
dependence on $\alpha$.
  Our algorithm uses a private sparse selection algorithm to \emph{sample} from
a pool of strongly input-dependent candidates. However, unlike most previous
uses of sparse selection algorithms, where one only cares about the utility of
the output, our algorithm requires understanding and manipulating the actual
distribution from which an output is drawn. In the proof, we use a sparse
version of the Exponential Mechanism from [GKM'21] which behaves nicely under
our framework and is amenable to a very easy utility proof.