Differential Privacy
@differentialprivacy.org
470 followers 0 following 860 posts
🤖 new arXiv preprints mentioning "differential privacy" or "differentially private" in the title/abstract/metadata + updates from https://differentialprivacy.org [Under construction.]
differentialprivacy.org
Evaluation of Differential Privacy Mechanisms on Federated Learning

Tejash Varsani

http://arxiv.org/abs/2510.09691

Federated learning enables distributed model training across several clients
without disclosing raw data. Despite advancements in data privacy, risks still
remain. Differential Privacy (DP) is a technique to protect sensitive data by
adding noise to model updates, usually controlled by a fixed privacy budget.
However, this approach can introduce excessive noise, particularly when the
model converges, which compromises performance. To address this problem,
adaptive privacy budgets have been investigated as a potential solution. This
work implements DP methods using Laplace and Gaussian mechanisms with an
adaptive privacy budget, extending the SelecEval simulator. We introduce an
adaptive clipping approach in the Gaussian mechanism, ensuring that gradients
of the model are dynamically updated rather than using a fixed sensitivity. We
conduct extensive experiments with various privacy budgets, IID and non-IID
datasets, and different numbers of selected clients per round. While our
experiments were limited to 200 training rounds, the results suggest that
adaptive privacy budgets and adaptive clipping can help maintain model accuracy
while preserving privacy.
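
As a rough illustration of the building blocks mentioned above (not the paper's SelecEval-based implementation), the sketch below clips a client update and adds Gaussian noise, with the clipping threshold adapted from recently observed update norms; in a real system the threshold would itself be estimated privately and folded into the privacy accounting.

```python
import numpy as np

def gaussian_mechanism_update(update, clip_norm, noise_multiplier, rng):
    """Clip a client update to clip_norm and add Gaussian noise
    calibrated to that sensitivity (a standard DP-FL building block)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

def adaptive_clip_norm(recent_norms, quantile=0.5):
    """Illustrative adaptive clipping: track recent update norms
    (non-privately here) and set the clip to a target quantile."""
    return float(np.quantile(recent_norms, quantile))

rng = np.random.default_rng(0)
norms_history = []
for rnd in range(5):
    update = rng.normal(size=10) * (1.0 / (rnd + 1))  # updates shrink as training converges
    norms_history.append(np.linalg.norm(update))
    clip = adaptive_clip_norm(norms_history)          # sensitivity tracks the gradients
    private_update = gaussian_mechanism_update(update, clip, noise_multiplier=1.0, rng=rng)
    print(rnd, round(clip, 3), round(float(np.linalg.norm(private_update)), 3))
```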
differentialprivacy.org
An information theorist's tour of differential privacy

Anand D. Sarwate, Flavio P. Calmon, Oliver Kosut, Lalitha Sankar

http://arxiv.org/abs/2510.10316

Since being proposed in 2006, differential privacy has become a standard
method for quantifying certain risks in publishing or sharing analyses of
sensitive data. At its heart, differential privacy measures risk in terms of
the differences between probability distributions, which is a central topic in
information theory. A differentially private algorithm is a channel between the
underlying data and the output of the analysis. Seen in this way, the
guarantees made by differential privacy can be understood in terms of
properties of this channel. In this article we examine a few of the key
connections between information theory and the formulation/application of
differential privacy, giving an "operational significance" for relevant
information measures.
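
A minimal illustration of the channel view, using binary randomized response as the channel: the epsilon-DP guarantee is exactly a bound on the log-likelihood ratio between the two output distributions (the rows of the channel matrix) induced by neighboring inputs.

```python
import numpy as np

# Randomized response as a channel: rows = true bit, columns = reported bit.
eps = 1.0
p = np.exp(eps) / (np.exp(eps) + 1)
channel = np.array([[p, 1 - p],
                    [1 - p, p]])

# The DP guarantee bounds the log-likelihood ratio between the two rows
# of the channel (the output distributions on neighboring inputs).
privacy_loss = np.max(np.abs(np.log(channel[0] / channel[1])))
print(privacy_loss, eps)  # both equal 1.0 up to floating point
```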
differentialprivacy.org
Secret-Protected Evolution for Differentially Private Synthetic Text Generation

Tianze Wang, Zhaoyu Chen, Jian Du, Yingtai Xiao, Linjun Zhang, Qiang Yan

http://arxiv.org/abs/2510.10990

Text data has become extremely valuable for large language models (LLMs) and
may even lead toward artificial general intelligence (AGI). Much high-quality
text in the real world is private and cannot be freely used due to privacy
concerns. Therefore, differentially private (DP) synthetic text generation has
been proposed, aiming to produce high-utility synthetic data while protecting
sensitive information. However, existing DP synthetic text generation methods
impose uniform guarantees that often overprotect non-sensitive content,
resulting in substantial utility loss and computational overhead. To address
this, we propose
Secret-Protected Evolution (SecPE), a novel framework that extends private
evolution with secret-aware protection. Theoretically, we show that SecPE
satisfies $(\mathrm{p}, \mathrm{r})$-secret protection, constituting a
relaxation of Gaussian DP that enables tighter utility-privacy trade-offs,
while also substantially reducing computational complexity relative to baseline
methods. Empirically, across the OpenReview, PubMed, and Yelp benchmarks, SecPE
consistently achieves lower Fréchet Inception Distance (FID) and higher
downstream task accuracy than GDP-based Aug-PE baselines, while requiring less
noise to attain the same level of protection. Our results highlight that
secret-aware guarantees can unlock more practical and effective
privacy-preserving synthetic text generation.
differentialprivacy.org
N-output Mechanism: Estimating Statistical Information from Numerical Data under Local Differential Privacy

Incheol Baek, Yon Dohn Chung

http://arxiv.org/abs/2510.11116

Local Differential Privacy (LDP) addresses significant privacy concerns in
sensitive data collection. In this work, we focus on numerical data collection
under LDP, targeting a significant gap in the literature: existing LDP
mechanisms are optimized for either very small ($|\Omega| \in \{2, 3\}$) or
infinite output spaces, and no generalized method exists for constructing an
optimal mechanism for an arbitrary output size $N$. To fill this gap, we
propose the N-output mechanism, a generalized framework that maps numerical
data to one of $N$ discrete outputs.
  We formulate the mechanism's design as an optimization problem to minimize
estimation variance for any given $N \geq 2$ and develop both numerical and
analytical solutions. This results in a mechanism that is highly accurate and
adaptive, as its design is determined by solving an optimization problem for
any chosen $N$. Furthermore, we extend our framework and existing mechanisms to
the task of distribution estimation. Empirical evaluations show that the
N-output mechanism achieves state-of-the-art accuracy for mean, variance, and
distribution estimation with small communication costs.
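
For intuition, here is a simple baseline construction of an N-output LDP mechanism for data in [0, 1]: stochastic rounding onto an N-point grid followed by generalized randomized response, with a debiased mean estimator. This is only an illustrative baseline, not the variance-optimized mechanism proposed in the paper.

```python
import numpy as np

def n_output_ldp(x, eps, N, rng):
    """eps-LDP report of x in [0, 1] over a grid of N outputs:
    stochastic rounding to the grid, then generalized randomized response."""
    grid = np.linspace(0.0, 1.0, N)
    idx = int(np.clip(np.searchsorted(grid, x) - 1, 0, N - 2))
    lo, hi = grid[idx], grid[idx + 1]
    k = idx + 1 if rng.random() < (x - lo) / (hi - lo) else idx  # unbiased rounding
    p = np.exp(eps) / (np.exp(eps) + N - 1)                      # keep probability
    if rng.random() < p:
        out = k
    else:
        out = rng.choice([j for j in range(N) if j != k])
    return grid[out]

def debiased_mean(reports, eps, N):
    """Unbiased mean estimate from the randomized-response reports."""
    grid = np.linspace(0.0, 1.0, N)
    p = np.exp(eps) / (np.exp(eps) + N - 1)
    q = 1.0 / (np.exp(eps) + N - 1)
    return (np.mean(reports) - q * grid.sum()) / (p - q)

rng = np.random.default_rng(0)
data = rng.beta(2, 5, size=20000)
reports = [n_output_ldp(x, eps=2.0, N=5, rng=rng) for x in data]
print(data.mean(), debiased_mean(reports, eps=2.0, N=5))  # close on average
```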
differentialprivacy.org
How to Get Actual Privacy and Utility from Privacy Models: the k-Anonymity and Differential Privacy Families

Josep Domingo-Ferrer, David Sánchez

http://arxiv.org/abs/2510.11299

Privacy models were introduced in privacy-preserving data publishing and
statistical disclosure control with the promise to end the need for costly
empirical assessment of disclosure risk. We examine how well this promise is
kept by the main privacy models. We find they may fail to provide adequate
protection guarantees because of problems in their definition or incur
unacceptable trade-offs between privacy protection and utility preservation.
Specifically, k-anonymity may not entirely exclude disclosure if enforced with
deterministic mechanisms or without constraints on the confidential values. On
the other hand, differential privacy (DP) incurs unacceptable utility loss for
small budgets and its privacy guarantee becomes meaningless for large budgets.
In the latter case, an ex post empirical assessment of disclosure risk becomes
necessary, undermining the main appeal of privacy models. Whereas the utility
preservation of DP can only be improved by relaxing its privacy guarantees, we
argue that a semantic reformulation of k-anonymity can offer more robust
privacy without losing utility with respect to traditional syntactic
k-anonymity.
differentialprivacy.org
Continual Release of Densest Subgraphs: Privacy Amplification & Sublinear Space via Subsampling

Felix Zhou

http://arxiv.org/abs/2510.11640

We study the sublinear space continual release model for edge-differentially
private (DP) graph algorithms, with a focus on the densest subgraph problem
(DSG) in the insertion-only setting. Our main result is the first continual
release DSG algorithm that matches the additive error of the best static DP
algorithms and the space complexity of the best non-private streaming
algorithms, up to constants. The key idea is a refined use of subsampling that
simultaneously achieves privacy amplification and sparsification, a connection
not previously formalized in graph DP. Via a simple black-box reduction to the
static setting, we obtain both pure and approximate-DP algorithms with $O(\log
n)$ additive error and $O(n\log n)$ space, improving both accuracy and space
complexity over the previous state of the art. Along the way, we introduce
graph densification in the graph DP setting, adding edges to trigger earlier
subsampling, which removes the extra logarithmic factors in error and space
incurred by prior work [ELMZ25]. We believe this simple idea may be of
independent interest.
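
The amplification-by-subsampling idea can be illustrated with the standard Poisson-subsampling bound for pure DP (a generic fact, not the paper's refined graph-specific analysis): keeping each edge with probability p both sparsifies the input and shrinks the effective epsilon.

```python
import math, random

def subsample_edges(edges, p, rng):
    """Keep each edge independently with probability p (Poisson subsampling):
    this both sparsifies the graph and amplifies privacy."""
    return [e for e in edges if rng.random() < p]

def amplified_epsilon(eps, p):
    """Standard amplification-by-subsampling bound for pure DP:
    an eps-DP mechanism run on a p-subsample is log(1 + p*(e^eps - 1))-DP."""
    return math.log1p(p * math.expm1(eps))

rng = random.Random(0)
edges = [(i, (i * 7 + 3) % 100) for i in range(1000)]   # toy edge list
sparse = subsample_edges(edges, p=0.1, rng=rng)
print(len(sparse), amplified_epsilon(1.0, 0.1))          # ~100 edges, eps' ~ 0.158
```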
differentialprivacy.org
Measuring the Hidden Cost of Data Valuation through Collective Disclosure

Patrick Mesana, Gilles Caporossi, Sebastien Gambs

http://arxiv.org/abs/2510.08869

Data valuation methods assign marginal utility to each data point that has
contributed to the training of a machine learning model. If used directly as a
payout mechanism, this creates a hidden cost of valuation, in which
contributors with near-zero marginal value would receive nothing, even though
their data had to be collected and assessed. To better formalize this cost, we
introduce a conceptual and game-theoretic model, the Information Disclosure
Game, played between a Data Union (DU, sometimes also called a data trust), a
member-run agent representing contributors, and a Data Consumer (e.g., a
platform). After
first aggregating members' data, the DU releases information progressively by
adding Laplacian noise under a differentially-private mechanism. Through
simulations with strategies guided by data Shapley values and multi-armed
bandit exploration, we demonstrate on a Yelp review helpfulness prediction task
that data valuation inherently incurs an explicit acquisition cost and that the
DU's collective disclosure policy changes how this cost is distributed across
members.
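
As a generic sketch of progressive disclosure under the Laplace mechanism (the budget schedule and sensitivity below are placeholders, not the paper's protocol), the data union could re-release an aggregate with a growing per-round budget, with total leakage bounded by basic composition.

```python
import numpy as np

def laplace_release(value, sensitivity, eps, rng):
    """Laplace mechanism: add Laplace(sensitivity/eps) noise to a statistic."""
    return value + rng.laplace(0.0, sensitivity / eps)

rng = np.random.default_rng(0)
members_mean = 0.42              # aggregated statistic held by the data union
budgets = [0.1, 0.2, 0.4]        # hypothetical per-round epsilons
for eps in budgets:
    # Each round reveals a less noisy view of the same aggregate.
    print(eps, laplace_release(members_mean, sensitivity=1.0, eps=eps, rng=rng))
print("total epsilon spent:", sum(budgets))  # basic composition
```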
differentialprivacy.org
On the Fairness of Privacy Protection: Measuring and Mitigating the Disparity of Group Privacy Risks for Differentially Private Machine Learning

Zhi Yang, Changwu Huang, Ke Tang, Xin Yao

http://arxiv.org/abs/2510.09114

While significant progress has been made in conventional fairness-aware
machine learning (ML) and differentially private ML (DPML), the fairness of
privacy protection across groups remains underexplored. Existing studies have
proposed methods to assess group privacy risks, but these are based on the
average-case privacy risks of data records. Such approaches may underestimate
the group privacy risks, thereby potentially underestimating the disparity
across group privacy risks. Moreover, the current method for assessing the
worst-case privacy risks of data records is time-consuming, limiting its
practical applicability. To address these limitations, we introduce a novel
membership inference game that can efficiently audit the approximate worst-case
privacy risks of data records. Experimental results demonstrate that our method
provides a more stringent measurement of group privacy risks, yielding a
reliable assessment of the disparity in group privacy risks. Furthermore, to
promote privacy protection fairness in DPML, we enhance the standard DP-SGD
algorithm with an adaptive group-specific gradient clipping strategy, inspired
by the design of canaries in differential privacy auditing studies. Extensive
experiments confirm that our algorithm effectively reduces the disparity in
group privacy risks, thereby enhancing the fairness of privacy protection in
DPML.
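
A minimal sketch of group-specific gradient clipping inside a DP-SGD-style step (the adaptive, canary-guided way the paper sets these thresholds is not reproduced here): each example is clipped to its group's threshold, and the noise is calibrated to the largest threshold so the usual sensitivity argument still applies.

```python
import numpy as np

def dp_sgd_step_groupwise(per_example_grads, groups, clip_by_group,
                          noise_multiplier, rng):
    """One DP-SGD-style aggregation with group-specific clipping thresholds.
    Noise is calibrated to the largest clip so the usual sensitivity bound
    (one example changes the sum by at most the max clip) still applies."""
    clipped = []
    for g, grad in zip(groups, per_example_grads):
        c = clip_by_group[g]
        norm = np.linalg.norm(grad)
        clipped.append(grad * min(1.0, c / (norm + 1e-12)))
    sensitivity = max(clip_by_group.values())
    noise = rng.normal(0.0, noise_multiplier * sensitivity,
                       size=per_example_grads[0].shape)
    return (np.sum(clipped, axis=0) + noise) / len(per_example_grads)

rng = np.random.default_rng(0)
grads = [rng.normal(size=4) for _ in range(8)]
groups = [0, 0, 0, 1, 1, 1, 1, 0]
clip_by_group = {0: 1.0, 1: 0.5}   # hypothetical group-specific thresholds
print(dp_sgd_step_groupwise(grads, groups, clip_by_group,
                            noise_multiplier=1.0, rng=rng))
```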
differentialprivacy.org
Locally Optimal Private Sampling: Beyond the Global Minimax

Hrad Ghoukasian, Bonwoo Lee, Shahab Asoodeh

http://arxiv.org/abs/2510.09485

We study the problem of sampling from a distribution under local differential
privacy (LDP). Given a private distribution $P \in \mathcal{P}$, the goal is to
generate a single sample from a distribution that remains close to $P$ in
$f$-divergence while satisfying the constraints of LDP. This task captures the
fundamental challenge of producing realistic-looking data under strong privacy
guarantees. While prior work by Park et al. (NeurIPS'24) focuses on global
minimax-optimality across a class of distributions, we take a local
perspective. Specifically, we examine the minimax risk in a neighborhood around
a fixed distribution $P_0$, and characterize its exact value, which depends on
both $P_0$ and the privacy level. Our main result shows that the local minimax
risk is determined by the global minimax risk when the distribution class
$\mathcal{P}$ is restricted to a neighborhood around $P_0$. To establish this,
we (1) extend previous work from pure LDP to the more general functional LDP
framework, and (2) prove that the globally optimal functional LDP sampler
yields the optimal local sampler when constrained to distributions near $P_0$.
Building on this, we also derive a simple closed-form expression for the
locally minimax-optimal samplers which does not depend on the choice of
$f$-divergence. We further argue that this local framework naturally models
private sampling with public data, where the public data distribution is
represented by $P_0$. In this setting, we empirically compare our locally
optimal sampler to existing global methods, and demonstrate that it
consistently outperforms global minimax samplers.
differentialprivacy.org
PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing

Anthony Hughes, Vasisht Duddu, N. Asokan, Nikolaos Aletras, Ning Ma

http://arxiv.org/abs/2510.07452

Language models (LMs) may memorize personally identifiable information (PII)
from training data, enabling adversaries to extract it during inference.
Existing defense mechanisms such as differential privacy (DP) reduce this
leakage, but incur large drops in utility. Based on a comprehensive study using
circuit discovery to identify the computational circuits responsible for PII
leakage in LMs, we hypothesize that specific PII leakage circuits in LMs should
be responsible for this behavior. Therefore, we propose PATCH (Privacy-Aware
Targeted Circuit PatcHing), a novel approach that first identifies and
subsequently directly edits PII circuits to reduce leakage. PATCH achieves
better privacy-utility trade-off than existing defenses, e.g., reducing recall
of PII leakage from LMs by up to 65%. Finally, PATCH can be combined with DP to
reduce recall of residual leakage of an LM to as low as 0.01%. Our analysis
shows that PII leakage circuits persist even after the application of existing
defense mechanisms. In contrast, PATCH can effectively mitigate their impact.
differentialprivacy.org
Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)

Junki Mori, Kazuya Kakizaki, Taiki Miyagawa, Jun Sakuma

http://arxiv.org/abs/2510.06719

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by
grounding them in external knowledge. However, its application in sensitive
domains is limited by privacy risks. Existing private RAG methods typically
rely on query-time differential privacy (DP), which requires repeated noise
injection and leads to accumulated privacy loss. To address this issue, we
propose DP-SynRAG, a framework that uses LLMs to generate differentially
private synthetic RAG databases. Unlike prior methods, the synthetic text can
be reused once created, thereby avoiding repeated noise injection and
additional privacy costs. To preserve essential information for downstream RAG
tasks, DP-SynRAG extends private prediction, which instructs LLMs to generate
text that mimics subsampled database records in a DP manner. Experiments show
that DP-SynRAG achieves superior performance to the state-of-the-art private
RAG systems while maintaining a fixed privacy budget, offering a scalable
solution for privacy-preserving RAG.
differentialprivacy.org
Spectral Graph Clustering under Differential Privacy: Balancing Privacy, Accuracy, and Efficiency

Mohamed Seif, Antti Koskela, H. Vincent Poor, Andrea J. Goldsmith

http://arxiv.org/abs/2510.07136

We study the problem of spectral graph clustering under edge differential
privacy (DP). Specifically, we develop three mechanisms: (i) graph perturbation
via randomized edge flipping combined with adjacency matrix shuffling, which
enforces edge privacy while preserving key spectral properties of the graph.
Importantly, shuffling considerably amplifies the guarantees: whereas flipping
edges with a fixed probability alone provides only a constant epsilon edge DP
guarantee as the number of nodes grows, the shuffled mechanism achieves
(epsilon, delta) edge DP with parameters that tend to zero as the number of
nodes increases; (ii) private graph projection with additive Gaussian noise in a
lower-dimensional space to reduce dimensionality and computational complexity;
and (iii) a noisy power iteration method that distributes Gaussian noise across
iterations to ensure edge DP while maintaining convergence. Our analysis
provides rigorous privacy guarantees and a precise characterization of the
misclassification error rate. Experiments on synthetic and real-world networks
validate our theoretical analysis and illustrate the practical privacy-utility
trade-offs.
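
Mechanism (i) can be sketched as randomized response on the adjacency matrix: each potential edge is kept with probability e^eps / (1 + e^eps) and flipped otherwise, which gives a per-edge eps guarantee; the shuffling step that amplifies this to vanishing (epsilon, delta) parameters is not shown here.

```python
import numpy as np

def flip_edges(adj, eps, rng):
    """Randomized response on the adjacency matrix: keep each upper-triangular
    entry with probability e^eps / (1 + e^eps), flip it otherwise (eps per edge).
    The paper's shuffling-based amplification is not reproduced here."""
    n = adj.shape[0]
    keep_prob = np.exp(eps) / (1.0 + np.exp(eps))
    keep = rng.random((n, n)) < keep_prob
    noisy = np.where(keep, adj, 1 - adj)
    upper = np.triu(noisy, k=1)          # symmetrize, drop self-loops
    return upper + upper.T

rng = np.random.default_rng(0)
n = 6
adj = (rng.random((n, n)) < 0.3).astype(int)
adj = np.triu(adj, 1); adj = adj + adj.T
print(flip_edges(adj, eps=1.0, rng=rng))
```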
differentialprivacy.org
Cocoon: A System Architecture for Differentially Private Training with Correlated Noises

Donghwan Kim, Xin Gu, Jinho Baek, Timothy Lo, Younghoon Min, Kwangsik Shin, Jongryool Kim, Jongse Park, Kiwan Maeng

http://arxiv.org/abs/2510.07304

Machine learning (ML) models memorize and leak training data, causing serious
privacy issues to data owners. Training algorithms with differential privacy
(DP), such as DP-SGD, have been gaining attention as a solution. However,
DP-SGD adds noise at each training iteration, which degrades the accuracy of
the trained model. To improve accuracy, a new family of approaches adds
carefully designed correlated noises, so that the noises cancel each other out
across iterations. We performed an extensive characterization study of these
new mechanisms, for the first time to the best of our knowledge, and show they
incur non-negligible overheads when the model is large or uses large embedding
tables. Motivated by the analysis, we propose Cocoon, a hardware-software
co-designed framework for efficient training with correlated noises. Cocoon
accelerates models with embedding tables through pre-computing and storing
correlated noises in a coalesced format (Cocoon-Emb), and supports large models
through a custom near-memory processing device (Cocoon-NMP). On a real system
with an FPGA-based NMP device prototype, Cocoon improves the performance by
2.33-10.82x (Cocoon-Emb) and 1.55-3.06x (Cocoon-NMP).
differentialprivacy.org
DP-Adam-AC: Privacy-preserving Fine-Tuning of Localizable Language Models Using Adam Optimization with Adaptive Clipping

Ruoxing Yang

http://arxiv.org/abs/2510.05288

Large language models (LLMs) such as ChatGPT have evolved into powerful and
ubiquitous tools. Fine-tuning on small datasets allows LLMs to acquire
specialized skills for specific tasks efficiently. Although LLMs provide great
utility in both general and task-specific use cases, they are limited by two
security-related concerns. First, traditional LLM hardware requirements make
them infeasible to run locally on consumer-grade devices. A remote network
connection with the LLM provider's server is usually required, making the
system vulnerable to network attacks. Second, fine-tuning an LLM for a
sensitive task may involve sensitive data. Non-private fine-tuning algorithms
produce models vulnerable to training data reproduction attacks. Our work
addresses these security concerns by enhancing differentially private
optimization algorithms and applying them to fine-tune localizable language
models. We introduce adaptable gradient clipping along with other engineering
enhancements to the standard DP-Adam optimizer to create DP-Adam-AC. We use our
optimizer to fine-tune examples of two localizable LLM designs: a small
language model (Qwen2.5-0.5B) and a 1.58-bit quantized model (Bitnet-b1.58-2B). We
demonstrate promising improvements in loss through experimentation with two
synthetic datasets.
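
A rough sketch of how per-example clipping and Gaussian noise can be combined with Adam moment updates (a generic DP-Adam recipe under illustrative hyperparameters, not the DP-Adam-AC optimizer with its adaptive clipping and engineering enhancements):

```python
import numpy as np

def dp_adam_step(params, per_example_grads, state, clip, noise_mult, lr, rng,
                 beta1=0.9, beta2=0.999, eps_adam=1e-8):
    """One DP-Adam-style step: per-example clipping plus Gaussian noise on the
    summed gradient, then ordinary Adam moment updates on the noisy gradient."""
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy_grad = (np.sum(clipped, axis=0)
                  + rng.normal(0.0, noise_mult * clip, size=params.shape))
    noisy_grad /= len(per_example_grads)
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * noisy_grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * noisy_grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return params - lr * m_hat / (np.sqrt(v_hat) + eps_adam)

rng = np.random.default_rng(0)
params = np.zeros(3)
state = {"m": np.zeros(3), "v": np.zeros(3), "t": 0}
grads = [rng.normal(size=3) for _ in range(16)]
params = dp_adam_step(params, grads, state, clip=1.0, noise_mult=1.0,
                      lr=0.01, rng=rng)
print(params)
```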
differentialprivacy.org
Correlating Cross-Iteration Noise for DP-SGD using Model Curvature

Xin Gu, Yingtai Xiao, Guanlin He, Jiamu Bai, Daniel Kifer, Kiwan Maeng

http://arxiv.org/abs/2510.05416

Differentially private stochastic gradient descent (DP-SGD) offers the
promise of training deep learning models while mitigating many privacy risks.
However, there is currently a large accuracy gap between DP-SGD and normal SGD
training. This has resulted in different lines of research investigating
orthogonal ways of improving privacy-preserving training. One such line of
work, known as DP-MF, correlates the privacy noise across different iterations
of stochastic gradient descent -- allowing later iterations to cancel out some
of the noise added to earlier iterations. In this paper, we study how to
improve this noise correlation. We propose a technique called NoiseCurve that
uses model curvature, estimated from public unlabeled data, to improve the
quality of this cross-iteration noise correlation. Our experiments on various
datasets, models, and privacy parameters show that the noise correlations
computed by NoiseCurve offer consistent and significant improvements in
accuracy over the correlation scheme used by DP-MF.
differentialprivacy.org
Power Mechanism: Private Tabular Representation Release for Model Agnostic Consumption

Praneeth Vepakomma, Kaustubh Ponkshe

http://arxiv.org/abs/2510.05581

Traditional collaborative learning approaches are based on sharing model
weights between clients and a server. However, schemes based on sharing
embeddings (activations) created from the data offer advantages in resource
efficiency. Several differentially private methods have been developed for
sharing weights, but no such mechanisms exist so far for sharing embeddings.
We propose the Power Mechanism to learn a privacy encoding network in conjunction
with a small utility generation network such that the final embeddings
generated from it are equipped with formal differential privacy guarantees.
These privatized embeddings are then shared with a more powerful server, that
learns a post-processing that results in a higher accuracy for machine learning
tasks. We show that our co-design of collaborative and private learning
requires only one round of privatized communication and less compute on the
client than traditional methods. The privatized embeddings that we share
from the client are agnostic to the type of model (deep learning, random
forests or XGBoost) used on the server in order to process these activations to
complete a task.
differentialprivacy.org
DP-SNP-TIHMM: Differentially Private, Time-Inhomogeneous Hidden Markov Models for Synthesizing Genome-Wide Association Datasets

Shadi Rahimian, Mario Fritz

http://arxiv.org/abs/2510.05777

Single nucleotide polymorphism (SNP) datasets are fundamental to genetic
studies but pose significant privacy risks when shared. The correlation of SNPs
with each other makes strong adversarial attacks possible, such as masked-value
reconstruction, kin, and membership inference attacks. Existing
privacy-preserving approaches either apply differential privacy to statistical
summaries of these datasets or offer complex methods that require
post-processing and the usage of a publicly available dataset to suppress or
selectively share SNPs.
  In this study, we introduce an innovative framework for generating synthetic
SNP sequence datasets using samples derived from time-inhomogeneous hidden
Markov models (TIHMMs). To preserve the privacy of the training data, we ensure
that each SNP sequence contributes only a bounded influence during training,
enabling strong differential privacy guarantees. Crucially, by operating on
full SNP sequences and bounding their gradient contributions, our method
directly addresses the privacy risks introduced by their inherent correlations.
  Through experiments conducted on the real-world 1000 Genomes dataset, we
demonstrate the efficacy of our method using privacy budgets of $\varepsilon
\in [1, 10]$ at $\delta=10^{-4}$. Notably, by allowing the transition models of
the HMM to be dependent on the location in the sequence, we significantly
enhance performance, enabling the synthetic datasets to closely replicate the
statistical properties of non-private datasets. This framework facilitates the
private sharing of genomic data while offering researchers exceptional
flexibility and utility.
differentialprivacy.org
The Five Safes as a Privacy Context

James Bailie, Ruobin Gong

http://arxiv.org/abs/2510.05803

The Five Safes is a framework used by national statistical offices (NSOs) for
assessing and managing the disclosure risk of data sharing. This paper makes
two points. Firstly, the Five Safes can be understood as a specialization of a
broader concept, contextual integrity, to the situation of statistical
dissemination by an NSO. We demonstrate this by
mapping the five parameters of contextual integrity onto the five dimensions of
the Five Safes. Secondly, the Five Safes contextualizes narrow, technical
notions of privacy within a holistic risk assessment. We demonstrate this with
the example of differential privacy (DP). This contextualization allows NSOs to
place DP within their Five Safes toolkit while also guiding the design of DP
implementations within the broader privacy context, as delineated by both their
regulation and the relevant social norms.
differentialprivacy.org
Distributed Platoon Control Under Quantization: Stability Analysis and Privacy Preservation

Kaixiang Zhang, Zhaojian Li, Wei Lin

http://arxiv.org/abs/2510.05959

Distributed control of connected and automated vehicles has attracted
considerable interest for its potential to improve traffic efficiency and
safety. However, such control schemes require sharing privacy-sensitive vehicle
data, which introduces risks of information leakage and potential malicious
activities. This paper investigates the stability and privacy-preserving
properties of distributed platoon control under two types of quantizers:
deterministic and probabilistic. For deterministic quantization, we show that
the resulting control strategy ensures the system errors remain uniformly
ultimately bounded. Moreover, in the absence of auxiliary information, an
eavesdropper cannot uniquely infer sensitive vehicle states. In contrast, the
use of probabilistic quantization enables asymptotic convergence of the vehicle
platoon in expectation with bounded variance. Importantly, probabilistic
quantizers can satisfy differential privacy guarantees, thereby preserving
privacy even when the eavesdropper possesses arbitrary auxiliary information.
We further analyze the trade-off between control performance and privacy by
formulating an optimization problem that characterizes the impact of the
quantization step on both metrics. Numerical simulations are provided to
illustrate the performance differences between the two quantization strategies.
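
A minimal sketch of a probabilistic quantizer of the kind discussed above: stochastic rounding to a grid makes the quantized value unbiased, and with suitably chosen parameters such randomized quantizers can be shown to satisfy differential privacy (the paper's specific conditions and proofs are not reproduced here).

```python
import numpy as np

def probabilistic_quantize(x, step, rng):
    """Probabilistic (stochastic-rounding) quantizer with grid spacing `step`:
    round down or up to the nearest grid point with probabilities chosen so
    the output is unbiased. Only the randomization is shown; DP guarantees
    additionally depend on the parameter choices analyzed in the paper."""
    low = np.floor(x / step) * step
    frac = (x - low) / step
    return np.where(rng.random(np.shape(x)) < frac, low + step, low)

rng = np.random.default_rng(0)
state = np.array([3.14, -1.27, 0.05])
samples = np.stack([probabilistic_quantize(state, 0.5, rng) for _ in range(10000)])
print(samples.mean(axis=0))   # close to the true state: unbiased in expectation
```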
differentialprivacy.org
Privacy Enhancement in Over-the-Air Federated Learning via Adaptive Receive Scaling

Faeze Moradi Kalarde, Ben Liang, Min Dong, Yahia A. Eldemerdash Ahmed, Ho Ting Cheng

http://arxiv.org/abs/2510.03860

In Federated Learning (FL) with over-the-air aggregation, the quality of the
signal received at the server critically depends on the receive scaling
factors. While a larger scaling factor can reduce the effective noise power and
improve training performance, it also compromises the privacy of devices by
reducing uncertainty. In this work, we aim to adaptively design the receive
scaling factors across training rounds to balance the trade-off between
training convergence and privacy in an FL system under dynamic channel
conditions. We formulate a stochastic optimization problem that minimizes the
overall Rényi differential privacy (RDP) leakage over the entire training
process, subject to a long-term constraint that ensures convergence of the
global loss function. Our problem depends on unknown future information, and we
observe that standard Lyapunov optimization is not applicable. Thus, we develop
a new online algorithm, termed AdaScale, based on a sequence of novel per-round
problems that can be solved efficiently. We further derive upper bounds on the
dynamic regret and constraint violation of AdaScale, establishing that it
achieves diminishing dynamic regret in terms of time-averaged RDP leakage while
ensuring convergence of FL training to a stationary point. Numerical
experiments on canonical classification tasks show that our approach
effectively reduces RDP and DP leakages compared with state-of-the-art
benchmarks without compromising learning performance.
differentialprivacy.org
Multi-Class Support Vector Machine with Differential Privacy

Jinseong Park, Yujin Choi, Jaewook Lee

http://arxiv.org/abs/2510.04027

With the increasing need to safeguard data privacy in machine learning
models, differential privacy (DP) is one of the major frameworks to build
privacy-preserving models. Support Vector Machines (SVMs) are widely used
traditional machine learning models due to their robust margin guarantees and
strong empirical performance in binary classification. However, standard ways
of applying DP to multi-class SVMs are inadequate: the one-versus-rest (OvR) and
one-versus-one (OvO) approaches repeatedly query each data sample when building
multiple binary classifiers, thus consuming the privacy budget proportionally
to the number of classes. To overcome this limitation, we explore all-in-one
SVM approaches for DP, which access each data sample only once to construct
multi-class SVM boundaries with margin maximization properties. We propose a
novel differentially Private Multi-class SVM (PMSVM) with weight and gradient
perturbation methods, providing rigorous sensitivity and convergence analyses
to ensure DP in all-in-one SVMs. Empirical results demonstrate that our
approach surpasses existing DP-SVM methods in multi-class scenarios.
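
For context, a classic way to obtain a DP SVM via weight (output) perturbation in the binary case is sketched below: the L2-sensitivity of the regularized ERM minimizer with a 1-Lipschitz loss and unit-norm features is 2/(n*lambda), and Gaussian noise is calibrated to it. This is the standard DP-ERM recipe shown only to illustrate weight perturbation; the paper's all-in-one multi-class construction and analysis are different.

```python
import numpy as np

def train_svm(X, y, lam, iters=2000, lr=0.5):
    """Regularized linear SVM via subgradient descent on
    f(w) = mean(max(0, 1 - y x.w)) + (lam/2)||w||^2."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, iters + 1):
        margins = y * (X @ w)
        active = margins < 1
        grad = -(X[active] * y[active, None]).sum(axis=0) / n + lam * w
        w -= (lr / t) * grad
    return w

def private_svm_output_perturbation(X, y, lam, eps, delta, rng):
    """Weight perturbation: with a 1-Lipschitz loss and ||x|| <= 1, the
    regularized ERM minimizer has L2-sensitivity 2/(n*lam); add Gaussian
    noise calibrated to it (classic DP-ERM recipe, binary case only)."""
    w = train_svm(X, y, lam)
    sensitivity = 2.0 / (len(y) * lam)
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return w + rng.normal(0.0, sigma, size=w.shape)

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # enforce ||x|| <= 1
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=n)); y[y == 0] = 1
w_priv = private_svm_output_perturbation(X, y, lam=0.1, eps=1.0, delta=1e-5, rng=rng)
print((np.sign(X @ w_priv) == y).mean())   # accuracy with the private weights
```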
differentialprivacy.org
DP-HYPE: Distributed Differentially Private Hyperparameter Search

Johannes Liebenow, Thorsten Peinemann, Esfandiar Mohammadi

http://arxiv.org/abs/2510.04902

The tuning of hyperparameters in distributed machine learning can
substantially impact model performance. When the hyperparameters are tuned on
sensitive data, privacy becomes an important challenge, and differential
privacy has emerged as the de facto standard for provable privacy.
A standard setting when performing distributed learning tasks is that clients
agree on a shared setup, i.e., find a compromise from a set of hyperparameters,
like the learning rate of the model to be trained. Yet, prior work on
differentially private hyperparameter tuning either uses computationally
expensive cryptographic protocols, determines hyperparameters separately for
each client, or applies differential privacy locally, which can lead to
undesirable utility-privacy trade-offs.
  In this work, we present our algorithm DP-HYPE, which performs a distributed
and privacy-preserving hyperparameter search by conducting a distributed voting
based on local hyperparameter evaluations of clients. In this way, DP-HYPE
selects hyperparameters that lead to a compromise supported by the majority of
clients, while maintaining scalability and independence from specific learning
tasks. We prove that DP-HYPE preserves the strong notion of differential
privacy called client-level differential privacy and, importantly, show that
its privacy guarantees do not depend on the number of hyperparameters. We also
provide bounds on its utility guarantees, that is, the probability of reaching
a compromise, and implement DP-HYPE as a submodule in the popular Flower
framework for distributed machine learning. In addition, we evaluate
performance on multiple benchmark data sets in iid as well as multiple non-iid
settings and demonstrate high utility of DP-HYPE even under small privacy
budgets.
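
A toy sketch of the voting idea (report-noisy-max over one vote per client, which has client-level sensitivity 1; the actual DP-HYPE protocol, its distribution across clients, and its utility analysis are more involved):

```python
import numpy as np

def dp_noisy_vote(client_votes, candidates, eps, rng):
    """Each client casts one vote for its preferred hyperparameter; the server
    adds Laplace(1/eps) noise to each count and returns the argmax
    (report-noisy-max). One vote per client gives client-level sensitivity 1."""
    counts = np.array([sum(v == c for v in client_votes) for c in candidates],
                      dtype=float)
    noisy = counts + rng.laplace(0.0, 1.0 / eps, size=len(candidates))
    return candidates[int(np.argmax(noisy))]

rng = np.random.default_rng(0)
learning_rates = [0.001, 0.01, 0.1]            # shared candidate grid
votes = [0.01] * 12 + [0.1] * 5 + [0.001] * 3  # locally chosen best per client
print(dp_noisy_vote(votes, learning_rates, eps=1.0, rng=rng))
```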
differentialprivacy.org
Federated Computation of ROC and PR Curves

Xuefeng Xu, Graham Cormode

http://arxiv.org/abs/2510.04979

Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves are
fundamental tools for evaluating machine learning classifiers, offering
detailed insights into the trade-offs between true positive rate and false
positive rate (ROC) or between precision and recall (PR). However, in Federated
Learning (FL) scenarios, where data is distributed across multiple clients,
computing these curves is challenging due to privacy and communication
constraints. Specifically, the server cannot access raw prediction scores and
class labels, which are used to compute the ROC and PR curves in a centralized
setting. In this paper, we propose a novel method for approximating ROC and PR
curves in a federated setting by estimating quantiles of the prediction score
distribution under distributed differential privacy. We provide theoretical
bounds on the Area Error (AE) between the true and estimated curves,
demonstrating the trade-offs between approximation accuracy, privacy, and
communication cost. Empirical results on real-world datasets demonstrate that
our method achieves high approximation accuracy with minimal communication and
strong privacy guarantees, making it practical for privacy-preserving model
evaluation in federated systems.
differentialprivacy.org
Differentially Private Wasserstein Barycenters

Anming Gu, Sasidhar Kunapuli, Mark Bun, Edward Chien, Kristjan Greenewald

http://arxiv.org/abs/2510.03021

The Wasserstein barycenter is defined as the mean of a set of probability
measures under the optimal transport metric, and has numerous applications
spanning machine learning, statistics, and computer graphics. In practice these
input measures are empirical distributions built from sensitive datasets,
motivating a differentially private (DP) treatment. We present, to our
knowledge, the first algorithms for computing Wasserstein barycenters under
differential privacy. Empirically, on synthetic data, MNIST, and large-scale
U.S. population datasets, our methods produce high-quality private barycenters
with strong accuracy-privacy tradeoffs.
differentialprivacy.org
Private Learning of Littlestone Classes, Revisited

Xin Lyu

http://arxiv.org/abs/2510.00076

We consider online and PAC learning of Littlestone classes subject to the
constraint of approximate differential privacy. Our main result is a private
learner to online-learn a Littlestone class with a mistake bound of
$\tilde{O}(d^{9.5}\cdot \log(T))$ in the realizable case, where $d$ denotes the
Littlestone dimension and $T$ the time horizon. This is a doubly-exponential
improvement over the state-of-the-art [GL'21] and comes polynomially close to
the lower bound for this task.
  The advancement is made possible by a couple of ingredients. The first is a
clean and refined interpretation of the ``irreducibility'' technique from the
state-of-the-art private PAC-learner for Littlestone classes [GGKM'21]. Our new
perspective also allows us to improve the PAC-learner of [GGKM'21] and give a
sample complexity upper bound of $\widetilde{O}(\frac{d^5
\log(1/\delta\beta)}{\varepsilon \alpha})$ where $\alpha$ and $\beta$ denote
the accuracy and confidence of the PAC learner, respectively. This improves
over [GGKM'21] by factors of $\frac{d}{\alpha}$ and attains an optimal
dependence on $\alpha$.
  Our algorithm uses a private sparse selection algorithm to \emph{sample} from
a pool of strongly input-dependent candidates. However, unlike most previous
uses of sparse selection algorithms, where one only cares about the utility of
the output, our algorithm requires understanding and manipulating the actual
distribution from which an output is drawn. In the proof, we use a sparse
version of the Exponential Mechanism from [GKM'21] which behaves nicely under
our framework and is amenable to a very easy utility proof.