Talks

Paper Deep Dive – FedRecovery

August 20, 2025

Summary:
For the lab meeting, I prepared a review of FedRecovery: Differentially Private Machine Unlearning for Federated Learning Frameworks. Machine unlearning aims to make models “forget” specific client data upon deletion requests. Retraining from scratch is often infeasible or risky in federated learning, so FedRecovery instead erases a client’s influence from the global model efficiently: it subtracts a weighted sum of gradient residuals and adds differential-privacy noise, without assuming convexity.

🔑 Research Question:

  • Can we efficiently find a model that performs similarly to the retrained one?

⚙️ Key Mechanism:

  • Removes client contributions via weighted gradient residual subtraction.
  • Adds carefully tailored Gaussian noise to guarantee indistinguishability between unlearned and retrained models.
  • Does not require retraining-based calibration or convexity assumptions.
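
To make the mechanism above concrete, here is a minimal NumPy sketch of the subtract-residuals-then-add-noise idea. All names (`global_model`, `client_updates`, `weights`, `sigma`) are illustrative, and the per-round weights and noise scale are placeholders for the paper's actual calibration rather than a reproduction of it.

```python
import numpy as np

def fedrecovery_unlearn(global_model, client_updates, target_id, weights, sigma):
    """Erase a target client's influence from the final global model.

    global_model   : np.ndarray, final aggregated parameters
    client_updates : list of dicts, one per round, client_id -> update vector
    target_id      : client requesting deletion
    weights        : per-round residual weights (placeholders for the
                     paper's weighting scheme)
    sigma          : std. dev. of the calibrated Gaussian noise
    """
    corrected = global_model.copy()
    for w_t, round_updates in zip(weights, client_updates):
        # Gradient residual: how much the target's update deviates
        # from the round's average update.
        avg_update = np.mean(list(round_updates.values()), axis=0)
        residual = round_updates[target_id] - avg_update
        corrected -= w_t * residual

    # Tailored Gaussian noise makes the unlearned model statistically
    # indistinguishable from one retrained without the target client.
    corrected += np.random.normal(0.0, sigma, size=corrected.shape)
    return corrected
```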

📊 Main Results:

  • Achieves statistical indistinguishability between unlearned and retrained models.
  • Experimental results on real-world datasets show comparable accuracy to retrained models.
  • Significantly more efficient than retraining-based approaches.

⚠️ Limitations / Open Questions:

  • Trade-offs in noise calibration vs. model utility.
  • Applicability to very large-scale, complex neural networks not fully explored.

Data Privacy Problem

  • Paper Assumption: Unlearning requires the server to identify which client’s updates to remove. Local DP still permits this because noisy updates remain attributable to their senders, but homomorphic encryption makes it infeasible: encrypted updates are indistinguishable to the server, so deletion requests lose their meaning.
  • Naive Idea: Instead of abandoning encryption, the deletion-requesting client could send its past updates multiplied by -1, encrypted under homomorphic encryption. This would effectively cancel its previous contribution without revealing raw gradients, offering a potential direction for client-assisted unlearning under encryption.
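
As a toy illustration of this cancellation idea, the sketch below uses the python-paillier (`phe`) library, which provides additively homomorphic encryption. Encrypting one scalar per weight and the hard-coded example vectors are purely illustrative; a real HE-based FL system would use ciphertext packing (e.g., CKKS) and proper fixed-point encoding.

```python
import numpy as np
from phe import paillier  # pip install phe

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

def encrypt_update(update):
    # One ciphertext per weight -- illustrative only.
    return [public_key.encrypt(float(w)) for w in update]

# Hypothetical example: the target client's old update and two other clients'.
past_update = np.array([0.12, -0.05, 0.30])
other_updates = [np.array([0.02, 0.10, -0.20]),
                 np.array([-0.07, 0.04, 0.11])]

# Server-side encrypted aggregate of everything submitted so far.
aggregate = encrypt_update(past_update)
for u in other_updates:
    aggregate = [a + c for a, c in zip(aggregate, encrypt_update(u))]

# Deletion request: the client re-sends its old update multiplied by -1,
# encrypted. Homomorphic addition cancels its contribution without the
# server ever seeing plaintext gradients.
cancellation = encrypt_update(-past_update)
aggregate = [a + c for a, c in zip(aggregate, cancellation)]

print([round(private_key.decrypt(a), 6) for a in aggregate])
# ~ sum of the remaining clients' updates only
```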

Slides:
PDF (Korean) Download

Federated Unlearning: Concept & Challenges

August 13, 2025

Summary:
In preparation for a lab meeting, I studied the concept of Federated Unlearning, which extends the idea of “machine unlearning” to federated learning environments. While federated learning protects raw data by keeping it on clients, requests such as the “Right to be Forgotten” raise a crucial question: How can we safely remove the influence of specific data or clients from a trained federated model? This summary is based on the PEPR ’24 talk Learning and Unlearning Your Data in Federated Settings (USENIX).

🔑 Research Question:

  • Can federated learning systems support safe and efficient data deletion without full retraining?

⚙️ Conceptual Approaches:

  • Passive Unlearning:
    • Server-only (leveraging stored updates)
    • Client-aided (clients assist with gradient/history)
  • Active Unlearning:
    • Server and clients collaboratively remove the influence of target data.
  • Level of Unlearning:
    • Record-level, class-level, or client-level unlearning
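
To make the server-only (passive) case concrete, here is a minimal sketch assuming the server has logged each client's weighted contribution per round; the simple subtraction shown is exactly the kind of approximate unlearning whose guarantees are questioned below.

```python
import numpy as np

def server_only_unlearn(global_model, contribution_log, target_id):
    """Naive server-only (passive) client-level unlearning.

    contribution_log : list of dicts, one per round, mapping
                       client_id -> the weighted update that was actually
                       added into the global model that round.
    """
    model = global_model.copy()
    for round_log in contribution_log:
        if target_id in round_log:
            # Roll back the target's logged contribution. This ignores how
            # later rounds depended on earlier ones, which is why approximate
            # unlearning can weaken guarantees.
            model -= round_log[target_id]
    return model
```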

📊 Key Insights:

  • Retraining is reliable but computationally prohibitive.
  • Approximate unlearning can preserve accuracy but may weaken guarantees.
  • Privacy, consistency, and efficiency must be carefully balanced.
  • Lack of formal proof of unlearning remains a major challenge.

⚠️ Limitations & Open Challenges:

  • Verifiability: proving that unlearning actually occurred.
  • Dynamic participation: handling clients joining or leaving.
  • Fairness and explainability remain underexplored.
  • New privacy risks may arise during the unlearning process itself.

🎥 Reference:

  • PEPR ‘24 - Learning and Unlearning Your Data in Federated Settings

Slides:
PDF (Korean) Download

Paper Deep Dive – MaskCRYPT

August 06, 2025

Summary:
In this lab meeting, I reviewed MASKCRYPT: Federated Learning With Selective Homomorphic Encryption. While federated learning protects data from direct leakage, exposing model weights can still lead to serious privacy risks such as membership inference attacks. MASKCRYPT addresses this challenge by selectively encrypting only a small fraction of model updates, striking a balance between security and efficiency under homomorphic encryption.

🔑 Research Question:

  • Do we have to encrypt all the model weights?

⚙️ Key Mechanism: Selective Homomorphic Encryption

  • Gradient-guided priority list to identify which weights to encrypt.
  • Clients generate individual masks, which are then aggregated on the server to form a final Mask Consensus shared with all clients.
  • Encrypt only selected weights, while the rest are transmitted as plaintext averages.
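
The selection step can be sketched as follows. The gradient-magnitude ranking and the "encrypt if any client marked it" consensus rule are simplifications of my own; the paper's actual mask-aggregation and HE details may differ.

```python
import numpy as np

def local_mask(gradients, encrypt_ratio=0.01):
    """Gradient-guided mask: mark the top `encrypt_ratio` fraction of
    weights (by gradient magnitude) as sensitive, i.e. to be encrypted."""
    flat = np.abs(gradients).ravel()
    k = max(1, int(len(flat) * encrypt_ratio))
    threshold = np.partition(flat, -k)[-k]
    return np.abs(gradients) >= threshold

def mask_consensus(client_masks):
    """Illustrative consensus rule: a weight is encrypted if any client
    marked it sensitive (the paper's aggregation rule may differ)."""
    consensus = np.zeros_like(client_masks[0], dtype=bool)
    for m in client_masks:
        consensus |= m
    return consensus

def split_update(update, consensus):
    """Clients send the masked entries encrypted; the rest go as plaintext."""
    to_encrypt = update[consensus]    # -> HE ciphertexts in a real system
    plaintext = update[~consensus]    # -> averaged in the clear
    return to_encrypt, plaintext
```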

📊 Main Results:

  • Encrypting as little as 1% of weights effectively defends against membership inference and reconstruction attacks.
  • Reduced communication overhead by up to 4.15× compared with encrypting all model updates.
  • Improved wall-clock training time.
  • Maintained accuracy comparable to non-encrypted training.

⚠️ Limitations:

  • Clients must exchange local priority lists, introducing overhead.
  • Correctness and fairness of the Mask Consensus mechanism must be guaranteed.
  • Evaluated only on moderate-sized models/datasets.

Slides:
PDF (Korean) Download

Paper Deep Dive – HETAL

July 16, 2025

Summary:
In this lab meeting, I reviewed HETAL: Efficient Privacy-preserving Transfer Learning with Homomorphic Encryption. Transfer learning is widely used for data-scarce problems by fine-tuning pre-trained models. While previous studies focused mainly on encrypted inference, HETAL is the first practical scheme to enable encrypted training (fine-tuning) under homomorphic encryption.

🔑 Research Question:

  • How can transfer learning be made both privacy-preserving and efficient when client data must remain encrypted?

⚙️ Key Mechanism:

  • Encrypted Softmax Approximation: Designed a highly precise softmax approximation algorithm compatible with HE constraints.
  • Efficient Matrix Multiplication: Introduced an encrypted matrix multiplication algorithm, 1.8×–323× faster than prior methods.
  • End-to-end Encrypted Training: Adopted validation-based early stopping, achieving accuracy comparable to plaintext training.
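
To illustrate what "compatible with HE constraints" means: CKKS-style HE offers only additions and multiplications, so exp and division must be replaced by polynomial arithmetic. The sketch below is a generic polynomial softmax built from a truncated Taylor series and Newton's iteration for the inverse; it is not HETAL's actual algorithm, which uses a more careful normalization and domain-extension step for precision.

```python
import numpy as np

def poly_exp(x, terms=8):
    """Truncated Taylor series for exp(x): additions and multiplications only
    (division by the public constant k is a plaintext scalar multiply),
    so it maps onto HE operations, valid on a bounded input range."""
    result = np.ones_like(x)
    term = np.ones_like(x)
    for k in range(1, terms):
        term = term * x / k
        result = result + term
    return result

def poly_inverse(y, iterations=12, init=0.01):
    """Newton iteration for 1/y using only mult/add: z <- z * (2 - y*z).
    Converges when init lies in (0, 2/y), which holds for this example."""
    z = np.full_like(y, init)
    for _ in range(iterations):
        z = z * (2.0 - y * z)
    return z

def he_friendly_softmax(logits):
    """Generic polynomial softmax sketch, not HETAL's exact algorithm."""
    e = poly_exp(logits)
    denom = np.sum(e, axis=-1, keepdims=True)
    return e * poly_inverse(denom)

logits = np.array([[1.2, 0.3, -0.8]])
print(he_friendly_softmax(logits))  # close to the exact softmax of these logits
```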

📊 Main Results:

  • Encrypted fine-tuning succeeded (a new milestone compared with prior work, which focused on encrypted inference).
  • Training times ranged from 567 to 3442 seconds (under one hour) across five benchmark datasets.
  • Accuracy comparable to non-encrypted training was achieved.

⚠️ Limitations:

  • Accuracy degradation in certain tasks due to approximation constraints.
  • Evaluation limited to moderate-sized models/datasets.

Slides:
PDF (Korean) Download