Why AMD's CEO Says Prompt Engineering is Dead: The Shift to AI Strategy

I was setting up my environment last week and honestly, it was a headache. Not the fun kind of headache where you learn something. The kind where you waste three hours on a config file that turned out to have a typo.

The Rise of Prompt Engineering

Prompt engineering started as a hobby for data scientists in 2020, when OpenAI released GPT‑3. They found that tweaking the wording of a prompt gave noticeably better results, so they began treating prompts like code snippets that needed debugging. The idea spread fast: by 2022, many AI products were built around crafting the right prompt. Prompt engineering became a new skill for ML engineers, and companies hired “prompt designers” to squeeze more performance out of large language models.

A good example is a fintech startup in Bangalore called FinWise that launched in January 2023. They used the OpenAI API to generate personalized investment advice. Instead of building a custom model, they invested heavily in prompt tuning and got responses that matched their compliance rules. The startup claimed a 15% lift in user engagement after refining prompts for different financial products.
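To make the idea of per-product prompt refinement concrete, here is a minimal sketch. It is not FinWise's actual setup: the template wording, the banned phrases, and the names `TEMPLATES`, `build_prompt`, and `violates_rules` are all hypothetical, and no API call is made.

```python
# Hypothetical per-product prompt templates; wording is illustrative only.
TEMPLATES = {
    "mutual_fund": (
        "Explain {product} to a {risk} investor in plain language. "
        "Do not promise any specific return."
    ),
    "fixed_deposit": (
        "Summarize the terms of {product} for a {risk} investor. "
        "Mention that early withdrawal may carry a penalty."
    ),
}

# Hypothetical compliance rule: phrases a response must never contain.
BANNED_PHRASES = ("guaranteed returns", "risk-free")


def build_prompt(product_type, product, risk):
    """Fill the template for one product type."""
    return TEMPLATES[product_type].format(product=product, risk=risk)


def violates_rules(response):
    """Return the banned phrases found in a model response (case-insensitive)."""
    lowered = response.lower()
    return [phrase for phrase in BANNED_PHRASES if phrase in lowered]


prompt = build_prompt("mutual_fund", "an index fund", "low-risk")
print(prompt)
print(violates_rules("This fund offers Guaranteed Returns every year."))
```

Keeping the rule check outside the prompt means a refined prompt can be retested against the same rules every time its wording changes.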

I tried to fine‑tune a similar model for my own project, but the dataset had missing values and the training script kept freezing at epoch 3. I ended up writing a small cleanup utility that filled NaNs with zeros, which only helped a little. It was frustrating because it felt like chasing a moving target: every time I tweaked the prompt, new problems popped up.
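A cleanup step of that kind can be sketched in a few lines. This is an illustration under assumptions, not the original script: it assumes rows arrive as lists of numbers, and `is_missing`, `fill_missing`, and `missing_fraction` are hypothetical names.

```python
import math


def is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_missing(rows, fill_value=0.0):
    """Return a copy of rows with missing cells replaced by fill_value."""
    return [
        [fill_value if is_missing(value) else value for value in row]
        for row in rows
    ]


def missing_fraction(rows):
    """Fraction of cells that are missing, worth checking before filling."""
    total = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row if is_missing(value))
    return missing / total if total else 0.0


sample = [[1.0, float("nan"), 3.0], [None, 5.0, 6.0]]
print(f"missing: {missing_fraction(sample):.0%}")
print(fill_missing(sample))
```

Zero-filling is the bluntest option. When a large share of cells is missing, the zeros themselves can distort the data, which may be one reason a fix like this helps only a little.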

We need better tools.

AMD's CEO Perspective: Why He Calls It Dead

In June 2024, Lisa Su spoke at the AMD Summit in San Jose. She said that prompt engineering is dead because AI workloads are moving toward model training and inference on custom hardware. She highlighted how AMD’s EPYC 7763, a 64‑core server CPU launched in March 2021 with a 2.45 GHz base clock, can now train large models locally, reducing dependency on cloud prompts.

The CEO referenced a case where a research lab in Pune used the EPYC 7763 to train an internal LLM from scratch, cutting training time by 30% compared to using AWS EC2 instances. She also pointed out that the cost of running those cloud instances is rising faster than the price of on‑prem hardware.

I attended the session and the slides had broken links; I couldn't see the data chart for latency improvements. I tried to ask a question, but the mic was muted so I left the room without knowing the exact numbers. The experience showed me how even top executives can stumble over presentation glitches.

And the CEO added that the future of AI is about building hardware that speaks directly to the model, not just the prompt.

Real‑World Shift: From Prompts to Model Training at NVIDIA

NVIDIA’s A100 GPU launched in 2020 and quickly became the go‑to card for large‑scale training. The same move toward training from scratch shows up on other accelerators too: DeepMind, for instance, trained AlphaFold 2, its protein‑folding model, on Google TPU v3 hardware rather than GPUs. The shift was clear: researchers moved from tweaking prompts on pre‑trained models to building and fine‑tuning their own weights.

Companies now use NVIDIA’s TensorRT and Triton Inference Server to deploy these custom models at scale. Tesla shows how training and deployment split: it trains its self‑driving networks on large A100 clusters and runs inference on its own in‑vehicle FSD computer, so critical maneuvers never depend on a cloud round trip.

I tried to run a training job on an older GTX 1080 and it ran out of memory after just two epochs. The error logs were unreadable and I had to swap GPUs in the middle of the night. It was a pain that made me realize how much more hardware matters than prompt tricks.
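A rough way to see why an 8 GB card like the GTX 1080 hits its limit quickly is to count the bytes training needs per parameter. The sketch below assumes FP32 values and an Adam-style optimizer that keeps two extra tensors per parameter. It ignores activations, so real usage is higher, and it does not diagnose any particular failure. `training_memory_gb` is a hypothetical helper name.

```python
def training_memory_gb(num_params, bytes_per_param=4, optimizer_states=2):
    """Lower bound on memory for weights, gradients and optimizer state.

    Assumes FP32 (4 bytes) and an Adam-style optimizer with two state
    tensors per parameter. Activations and framework overhead are excluded.
    """
    copies = 1 + 1 + optimizer_states  # weights + gradients + optimizer state
    return num_params * bytes_per_param * copies / 1024**3


GTX_1080_VRAM_GB = 8  # the GTX 1080 ships with 8 GB of memory

for params in (100e6, 350e6, 1e9):
    need = training_memory_gb(params)
    verdict = "fits" if need < GTX_1080_VRAM_GB else "does not fit"
    print(f"{params / 1e6:.0f}M parameters: ~{need:.1f} GB before activations, {verdict}")
```

Under these assumptions each parameter costs 16 bytes, so a model with a billion parameters needs roughly 15 GB before a single activation is stored.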

But the real benefit of this shift is that developers can now control every layer of the model, from architecture to hyperparameters, instead of being limited by what a pre‑trained model allows.

Practical Implications for Developers and Businesses

For developers, moving away from prompt engineering means learning new skills. You need to understand GPU programming, distributed training frameworks such as PyTorch Distributed or Horovod, and how to tune batch sizes to fit in GPU memory. Hardware cost is a factor too: by my rough estimate, an AMD EPYC 7763 server with 8 A100s can cost around $50,000 upfront but save up to $20,000 per month in cloud compute if you run heavy training jobs on‑prem. Keeping data local also cuts network transfer time and improves data privacy. Data volume matters as well: terabytes of training examples call for fast NVMe storage, and moving them between CPU, memory, and accelerators depends on interconnect bandwidth such as AMD Infinity Fabric. Finally, the skill set shifts toward system architecture: knowing how to balance CPU, GPU, and memory bandwidth is now critical.
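The cost figures quoted above imply a simple break-even calculation. The sketch below uses only those two numbers and deliberately leaves out power, cooling, staffing, and depreciation, so it shows an optimistic case. `payback_months` and `net_savings` are hypothetical helper names.

```python
def payback_months(upfront_cost, monthly_savings):
    """Months until cumulative savings cover the upfront hardware cost."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return upfront_cost / monthly_savings


def net_savings(upfront_cost, monthly_savings, months):
    """Cumulative savings after a number of months, minus the upfront cost."""
    return monthly_savings * months - upfront_cost


UPFRONT = 50_000          # upfront server cost quoted above, in USD
MONTHLY_SAVINGS = 20_000  # best-case monthly cloud savings quoted above, in USD

print(f"Break-even after {payback_months(UPFRONT, MONTHLY_SAVINGS):.1f} months")
print(f"Net after 12 months: ${net_savings(UPFRONT, MONTHLY_SAVINGS, 12):,}")
```

Because $20,000 is an "up to" figure, it is worth rerunning the calculation with a lower monthly saving to see how sensitive the break-even point is.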

A retail chain in Mumbai, RetailX, switched from using OpenAI prompts for product recommendations to training its own recommendation model on a 64‑core EPYC 7763 server with 2.5 TB of RAM. The team had to redesign their data pipeline to feed the GPU efficiently, but after three months they reported a 12% boost in conversion rate.

The migration was rough.

And the ROI measured
