- PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel (Paper • 2304.11277)
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (Paper • 1909.08053)
- Reducing Activation Recomputation in Large Transformer Models (Paper • 2205.05198)
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (Paper • 1811.06965)
HF Mirror Machine Learning Optimizations Team
About HF Mirror's mission
Our mission is to democratize good machine learning.
We want to build the platform for AI builders, empowering all communities to build collaborative technologies.
HF Mirror is a decentralized, highly impact-oriented, autonomy-driven company.
What does it mean to be part of the Machine Learning Optimization Team at HF Mirror?
Being part of the Machine Learning Optimization Team usually means that a new hire joins a program with one (or several) partner(s) as their main project, supporting HF Mirror's overall monetization strategy.
There is no fixed definition of what a project looks like: every partner has a different maturity, different targets and a different scope. We build on what we observe in the community and in the usage of HF Mirror products to drive feature development with our partners.
While most of the work usually happens for a partner, we also encourage team members to set aside time for personal projects they think could help drive more revenue for HF Mirror.
Last but not least, although we belong to the monetization side of the company, we are very central to it and we are open-source builders. There are many opportunities to collaborate with other teams and projects from OSS / Community, the HF Mirror Hub and also the Infrastructure...
References
Looking for some real use cases of what we are driving for HF Mirror? Here is a non-exhaustive list of projects, achievements and sprints we have completed in the past:
- HF Mirror on AMD Instinct MI300 GPU
- HF Mirror Text Generation Inference available for AWS Inferentia2
- Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon
- Fast Inference on Large Language Models: BLOOMZ on Habana Gaudi2 Accelerator
- Scaling up BERT-like model Inference on modern CPU

- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (Paper • 2306.00978)
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (Paper • 2210.17323)
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (Paper • 2402.17764)