-
SOSP'25 COpter: Efficient Large-Scale Resource-Allocation via Continual Optimization
Suhas Jayaram Subramanya, Don Kurian Dennis, Gregory R. Ganger, Virginia Smith
Keywords: Resource Allocation, MILP
Motivation: 1) Existing solvers do not scale: they are extremely slow on large problem instances. 2) Existing acceleration techniques cannot preserve both efficiency and optimality. 3) Goal: accelerate solving while preserving optimality.
Design: 1) Solve a standard LP: use a standard LP formulation to unify different allocation problems. 2) System-level manipulation of problem parameters for acceleration. 3) Replace slow post-processing (converting LP-relaxation solutions into MILP solutions) with simple rounding.
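To make design point 3 concrete, here is a minimal sketch of rounding a fractional (LP-relaxation) job-to-resource assignment into an integral one. It assumes a hypothetical input format, `frac[j][r]` = fraction of job `j` placed on resource `r`, plus per-resource `capacity` and per-job `demand`; the greedy "most confident job first" order is an illustrative choice, not necessarily COpter's actual rounding rule.

```python
from typing import Dict, List


def round_lp_assignment(
    frac: List[List[float]], capacity: List[float], demand: List[float]
) -> Dict[int, int]:
    """Round a fractional job->resource assignment into an integral one.

    frac[j][r] is the (hypothetical) fraction of job j placed on resource r
    by an LP relaxation; capacity[r] and demand[j] share the same units.
    """
    load = [0.0] * len(capacity)
    assignment: Dict[int, int] = {}
    # Handle the jobs the LP is most "certain" about first.
    order = sorted(range(len(frac)), key=lambda j: -max(frac[j]))
    for j in order:
        # Try resources in decreasing order of the LP's fractional preference.
        for r in sorted(range(len(capacity)), key=lambda k: -frac[j][k]):
            if frac[j][r] > 0 and load[r] + demand[j] <= capacity[r]:
                assignment[j] = r
                load[r] += demand[j]
                break
        # A job whose preferred resources are all full stays unassigned.
    return assignment


frac = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
print(round_lp_assignment(frac, capacity=[2.0, 2.0], demand=[1.0, 1.0, 1.0]))
```

Rounding like this runs in roughly O(jobs × resources × log resources) time, which is why it can stand in for a much slower exact MILP post-processing step, at the cost of possibly leaving some jobs unplaced.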
-
SOSP'24 Tiered Memory Management: Access Latency is the Key!
Midhul Vuppalapati, Rachit Agarwal
Keywords: Measurement, Page placement
Motivation: 1) Placing the hottest pages in the fast tier may not yield the expected benefit, as demonstrated by experiments. 2) Placement should minimize expected access latency.
Design: 1) Estimate queueing (and thus expected access latency) with low overhead from the CPU-to-memory datapath. 2) Algorithm (or principle): move pages between tiers (default or alternate) based on the measured latencies.
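A minimal sketch of the latency-driven principle, assuming queue occupancy and arrival rate can be read per tier (the field names, units, unloaded latencies and 5% tolerance are hypothetical). It converts queue measurements into latency with Little's law (occupancy = arrival rate × latency) and then decides which way hot pages should move so that the two tiers' latencies balance.

```python
from dataclasses import dataclass


@dataclass
class TierStats:
    occupancy: float      # average outstanding requests queued for this tier
    arrival_rate: float   # requests per nanosecond entering that queue
    unloaded_ns: float    # latency when the tier is idle (hypothetical input)


def access_latency_ns(t: TierStats) -> float:
    """Little's law: occupancy = arrival_rate * latency, so latency = occupancy / rate."""
    if t.arrival_rate <= 0:
        return t.unloaded_ns
    return max(t.unloaded_ns, t.occupancy / t.arrival_rate)


def migration_direction(default: TierStats, alternate: TierStats, tol: float = 0.05) -> str:
    """Decide which way hot pages should move to equalize tier latencies."""
    ld, la = access_latency_ns(default), access_latency_ns(alternate)
    if ld > la * (1 + tol):
        return "demote"   # default tier is currently slower: shift hot pages out
    if la > ld * (1 + tol):
        return "promote"  # alternate tier is slower: pull hot pages into default
    return "stay"         # latencies are balanced within tolerance


busy_default = TierStats(occupancy=40.0, arrival_rate=0.2, unloaded_ns=80.0)
alternate = TierStats(occupancy=15.0, arrival_rate=0.1, unloaded_ns=130.0)
print(migration_direction(busy_default, alternate))
```

In this example the nominally fast default tier is so contended (200 ns) that it is slower than the alternate tier (150 ns), which is exactly the case where "hottest pages in the fast tier" stops being the right policy.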
-
SIGCOMM'25 SaTE: Low-Latency Traffic Engineering for Satellite Networks
Hao Wu, Yizhan Han, Mohit Rajpal, Qizhen Zhang, Jingxian Wang
Keywords: Satellite Network, Traffic Engineering, ML Generalization
Motivation: 1) Dynamics: satellite network topologies change frequently. 2) Generalization: neural-network-based approaches generalize poorly to unseen topologies. 3) Large scale: numerous nodes and links.
Design: 1) GNN-only: drop the DNN component to improve generalizability. 2) Graph pruning based on topology similarity (to an existing baseline topology) to further improve generalizability. 3) Supervised learning with Gurobi's solutions as labels.
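Why a GNN-only model can generalize across changing topologies can be seen from a single message-passing layer: its weights are shared by every node, so the same parameters apply to graphs of any size or shape. The sketch below is a generic mean-aggregation layer with scalar node features (e.g., a hypothetical per-satellite load value) and two scalar weights; it is not SaTE's actual architecture.

```python
from typing import Dict, List, Tuple


def gnn_layer(
    features: Dict[str, float],
    links: List[Tuple[str, str]],
    w_self: float,
    w_neigh: float,
) -> Dict[str, float]:
    """One mean-aggregation message-passing layer with shared scalar weights."""
    neighbors: Dict[str, List[str]] = {n: [] for n in features}
    for u, v in links:  # inter-satellite links are treated as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)
    out = {}
    for n, x in features.items():
        nbrs = neighbors[n]
        agg = sum(features[m] for m in nbrs) / len(nbrs) if nbrs else 0.0
        out[n] = max(0.0, w_self * x + w_neigh * agg)  # ReLU
    return out


# The same two weights apply unchanged to a 3-node line and a 4-node ring.
line = gnn_layer({"s1": 1.0, "s2": 2.0, "s3": 3.0}, [("s1", "s2"), ("s2", "s3")], 0.5, 0.5)
ring = gnn_layer(
    {"a": 1.0, "b": 0.0, "c": 4.0, "d": 2.0},
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")],
    0.5, 0.5,
)
print(line, ring)
```

A dense DNN head, by contrast, typically has an input size tied to the number of nodes or links, which is one reason it transfers poorly when the topology changes.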
-
SIGCOMM'25 Centralium: A Hybrid Route-Planning Framework for Large-Scale Data Center Network Migrations
Yikai Lin, Mohab Gawish (Meta)
Keywords: Centralized Routing, Distributed Routing, Network Migration, Route Planning, BGP
Motivation: 1) Data centers frequently undergo large-scale network migrations. 2) BGP cannot encode the sequential and conditional routing behaviors required during transitional migration phases.
Design: 1) Route Planning Abstraction (RPA). 2) Centralium architecture. 3) Two protection mechanisms.
-
SIGCOMM'25 From ATOP to ZCube: Automated Topology Optimization Pipeline and a Highly Cost-Effective Network Topology for Large Model Training
Zihan Yan, Dan Li (Tsinghua University)
Keywords: Data center networks, Network topology, AI infrastructure
Motivation: 1) The explosive growth in LLM training scale requires new large-scale network topology designs. 2) Expert-designed topologies overlook potential asymmetric structures and struggle to balance multiple performance objectives; existing automated approaches are not yet mature enough.
Design: 1) Insight-Driven Hyperparameterization. 2) Multi-Objective Optimization Engine. 3) High-Performance Evaluation Pipeline.
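One common building block of any multi-objective search is a Pareto filter that keeps only candidates not dominated on every objective. The sketch below assumes two hypothetical objectives per candidate topology, cost (lower is better) and bandwidth (higher is better); ATOP's actual objectives and engine may differ.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    cost: float        # hypothetical objective: lower is better
    bandwidth: float   # hypothetical objective: higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.cost <= b.cost and a.bandwidth >= b.bandwidth
    strictly_better = a.cost < b.cost or a.bandwidth > b.bandwidth
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


cands = [
    Candidate("A", cost=10.0, bandwidth=100.0),
    Candidate("B", cost=12.0, bandwidth=90.0),   # dominated by A
    Candidate("C", cost=8.0, bandwidth=80.0),
    Candidate("D", cost=15.0, bandwidth=120.0),
]
print([c.name for c in pareto_front(cands)])
```

The filter does not pick a single winner; it narrows the design space to the trade-off curve, from which an operator (or a weighted score) chooses.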
-
DAC'21 NVCell: Standard Cell Layout in Advanced Technology Nodes with Reinforcement Learning
Haoxing Ren, Matthew Fojtik
Keywords: Standard Cell Layout, RL, DRC, Placement and Routing
Motivation: 1) Advanced technology nodes face a DRC explosion (2000+ rules) with conditional and multi-pattern correlations that are hard to model analytically. 2) Traditional methods (e.g., simulated annealing) suffer from long runtime, variable explosion, and poor scalability. 3) Need automated layout generation with competitive area and DRC compliance.
Design: 1) Two-stage framework: Placement (simulated annealing + RL + ML routability predictor) + Routing (genetic algorithm + RL DRC fixer). 2) RL for placement: Pre-trained with simulated annealing samples, learns device pairing/ordering to speed up runtime. 3) RL for DRC fixing: Trained on one cell, transferable to all cells by identifying local DRC patterns. 4) ML routability predictor: Two-step (simple + precise) to optimize placement for routing.
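As a reference point for the simulated-annealing part of placement (which the RL agent is pre-trained on), here is a toy 1D version: devices are ordered in a single row and random swaps are accepted by the Metropolis rule to reduce total net span. The device names, nets, cost function and schedule are hypothetical; NVCell's real placer works on a 2D cell grid with device pairing and ordering.

```python
import math
import random
from typing import List, Sequence, Tuple


def wirelength(order: Sequence[str], nets: List[Tuple[str, ...]]) -> int:
    """Sum over nets of the span between their leftmost and rightmost device."""
    pos = {dev: i for i, dev in enumerate(order)}
    return sum(max(pos[d] for d in net) - min(pos[d] for d in net) for net in nets)


def anneal(devices, nets, steps=2000, t0=2.0, alpha=0.995, seed=0):
    rng = random.Random(seed)
    cur = list(devices)
    cur_cost = wirelength(cur, nets)
    best, best_cost = list(cur), cur_cost
    t = t0
    for _ in range(steps):
        i, j = rng.sample(range(len(cur)), 2)  # propose swapping two devices
        cand = list(cur)
        cand[i], cand[j] = cand[j], cand[i]
        cost = wirelength(cand, nets)
        delta = cost - cur_cost
        # Always accept improvements; accept worse moves with prob exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur, cur_cost = cand, cost
            if cur_cost < best_cost:
                best, best_cost = list(cur), cur_cost
        t = max(t * alpha, 1e-6)  # geometric cooling with a floor
    return best, best_cost


devices = ["M1", "M2", "M3", "M4"]
nets = [("M1", "M4"), ("M2", "M3"), ("M1", "M3")]
print(anneal(devices, nets))
```

Every proposal requires a full cost evaluation, which hints at why runtime grows quickly with cell size and why learning good orderings (the RL part) can cut the number of moves needed.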
-
DAC'20 GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning
Hanrui Wang, Kuan Wang, Jiacheng Yang, et al.
Keywords: Transistor Sizing, Graph Convolutional Network (GCN), RL, Transfer Learning
Motivation: 1) Analog circuit sizing relies on human experts and is time-consuming because of complex performance tradeoffs. 2) Traditional black-box methods (BO, ES) ignore circuit topology and cannot transfer knowledge across technology nodes or topologies. 3) Need transferable, automated sizing with superior performance.
Design: 1) Graph representation: Circuit modeled as graph (nodes=components, edges=wires) to capture topology. 2) GCN-RL agent: 7-layer GCN aggregates neighbor features, Actor-Critic architecture with DDPG for continuous action space. 3) Action space: Component-specific continuous parameters to avoid discrete space explosion.
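The continuous, component-specific action space can be illustrated by how a bounded actor output is turned into physical parameters. The sketch assumes a tanh-bounded actor emitting values in [-1, 1] (a common DDPG choice) and uses hypothetical components and ranges; it shows a linear rescale per component, not necessarily the paper's exact mapping.

```python
from typing import Dict, Tuple

# Hypothetical per-component parameter ranges (not taken from the paper).
BOUNDS: Dict[str, Tuple[float, float]] = {
    "M1.width_um": (0.5, 50.0),
    "R1.ohms": (100.0, 10000.0),
    "C1.pf": (0.1, 10.0),
}


def to_parameters(actions: Dict[str, float]) -> Dict[str, float]:
    """Map actor outputs in [-1, 1] to each component's own continuous range."""
    params = {}
    for name, a in actions.items():
        low, high = BOUNDS[name]
        a = max(-1.0, min(1.0, a))  # guard against exploration noise overshooting
        params[name] = low + (a + 1.0) / 2.0 * (high - low)
    return params


print(to_parameters({"M1.width_um": 0.0, "R1.ohms": 1.0, "C1.pf": -1.0}))
```

Because each component gets one real-valued action rescaled to its own range, the action space grows linearly with the number of components instead of exponentially, as a grid of discrete values per parameter would.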