# Table of Contents

_[PDF version](ToC.pdf)_

| Section                                                                                     |  Page |
|---------------------------------------------------------------------------------------------|------:|
| **Preface**                                                                                 |    ix |
| **1. Introduction to Building AI Applications with Foundation Models**                     |     1 |
| The Rise of AI Engineering                                                                  |     2 |
| - From Language Models to Large Language Models                                               |     2 |
| - From Large Language Models to Foundation Models                                             |     8 |
| - From Foundation Models to AI Engineering                                                   |    12 |
| Foundation Model Use Cases                                                                  |    16 |
| - Coding                                                                                    |    20 |
| - Image and Video Production                                                                |    22 |
| - Writing                                                                                   |    22 |
| - Education                                                                                 |    24 |
| - Conversational Bots                                                                       |    26 |
| - Information Aggregation                                                                   |    26 |
| - Data Organization                                                                         |    27 |
| - Workflow Automation                                                                       |    28 |
| Planning AI Applications                                                                    |    28 |
| - Use Case Evaluation                                                                       |    29 |
| - Setting Expectations                                                                      |    32 |
| - Milestone Planning                                                                        |    33 |
| - Maintenance                                                                               |    34 |
| The AI Engineering Stack                                                                    |    35 |
| - Three Layers of the AI Stack                                                              |    37 |
| - AI Engineering Versus ML Engineering                                                     |    39 |
| - AI Engineering Versus Full-Stack Engineering                                             |    46 |
| Summary                                                                                     |    47 |
| **2. Understanding Foundation Models**                                                     |    49 |
| Training Data                                                                               |    50 |
| - Multilingual Models                                                                       |    51 |
| - Domain-Specific Models                                                                    |    56 |
| Modeling                                                                                    |    58 |
| - Model Architecture                                                                        |    58 |
| - Model Size                                                                                |    67 |
| Post-Training                                                                               |    78 |
| - Supervised Finetuning                                                                     |    80 |
| - Preference Finetuning                                                                     |    83 |
| Sampling                                                                                    |    88 |
| - Sampling Fundamentals                                                                     |    88 |
| - Sampling Strategies                                                                       |    90 |
| - Test Time Compute                                                                         |    96 |
| - Structured Outputs                                                                        |    99 |
| - The Probabilistic Nature of AI                                                           |   105 |
| Summary                                                                                     |   111 |
| **3. Evaluation Methodology**                                                              |   113 |
| Challenges of Evaluating Foundation Models                                                 |   114 |
| Understanding Language Modeling Metrics                                                    |   118 |
| - Entropy                                                                                   |   119 |
| - Cross Entropy                                                                             |   120 |
| - Bits-per-Character and Bits-per-Byte                                                     |   121 |
| - Perplexity                                                                                |   121 |
| - Perplexity Interpretation and Use Cases                                                  |   122 |
| Exact Evaluation                                                                           |   125 |
| - Functional Correctness                                                                    |   126 |
| - Similarity Measurements Against Reference Data                                           |   127 |
| - Introduction to Embedding                                                                |   134 |
| AI as a Judge                                                                              |   136 |
| - Why AI as a Judge?                                                                        |   137 |
| - How to Use AI as a Judge                                                                  |   138 |
| - Limitations of AI as a Judge                                                              |   141 |
| - What Models Can Act as Judges?                                                           |   145 |
| Ranking Models with Comparative Evaluation                                                 |   148 |
| - Challenges of Comparative Evaluation                                                     |   152 |
| - The Future of Comparative Evaluation                                                     |   155 |
| Summary                                                                                     |   156 |
| **4. Evaluate AI Systems**                                                                 |   159 |
| Evaluation Criteria                                                                         |   160 |
| - Domain-Specific Capability                                                                |   161 |
| - Generation Capability                                                                     |   163 |
| - Instruction-Following Capability                                                         |   172 |
| - Cost and Latency                                                                          |   177 |
| Model Selection                                                                            |   179 |
| - Model Selection Workflow                                                                  |   179 |
| - Model Build Versus Buy                                                                    |   181 |
| - Navigate Public Benchmarks                                                               |   191 |
| Design Your Evaluation Pipeline                                                            |   200 |
| - Step 1. Evaluate All Components in a System                                              |   200 |
| - Step 2. Create an Evaluation Guideline                                                   |   202 |
| - Step 3. Define Evaluation Methods and Data                                               |   204 |
| Summary                                                                                     |   208 |
| **5. Prompt Engineering**                                                                  |   211 |
| Introduction to Prompting                                                                  |   212 |
| - In-Context Learning: Zero-Shot and Few-Shot                                              |   213 |
| - System Prompt and User Prompt                                                            |   215 |
| - Context Length and Context Efficiency                                                    |   218 |
| Prompt Engineering Best Practices                                                          |   220 |
| - Write Clear and Explicit Instructions                                                    |   220 |
| - Provide Sufficient Context                                                               |   223 |
| - Break Complex Tasks into Simpler Subtasks                                                |   224 |
| - Give the Model Time to Think                                                             |   227 |
| - Iterate on Your Prompts                                                                  |   229 |
| - Evaluate Prompt Engineering Tools                                                        |   230 |
| - Organize and Version Prompts                                                             |   233 |
| Defensive Prompt Engineering                                                               |   235 |
| - Proprietary Prompts and Reverse Prompt Engineering                                       |   236 |
| - Jailbreaking and Prompt Injection                                                        |   238 |
| - Information Extraction                                                                    |   243 |
| - Defenses Against Prompt Attacks                                                          |   248 |
| Summary                                                                                     |   251 |
| **6. RAG and Agents**                                                                      |   253 |
| RAG                                                                                         |   253 |
| - RAG Architecture                                                                         |   256 |
| - Retrieval Algorithms                                                                     |   257 |
| - Retrieval Optimization                                                                   |   268 |
| - RAG Beyond Texts                                                                         |   273 |
| Agents                                                                                     |   275 |
| - Agent Overview                                                                           |   276 |
| - Tools                                                                                    |   278 |
| - Planning                                                                                 |   281 |
| - Agent Failure Modes and Evaluation                                                       |   298 |
| Memory                                                                                     |   300 |
| Summary                                                                                     |   305 |
| **7. Finetuning**                                                                          |   307 |
| Finetuning Overview                                                                        |   308 |
| When to Finetune                                                                         |   311 |
| - Reasons to Finetune                                                                      |   311 |
| - Reasons Not to Finetune                                                                  |   312 |
| - Finetuning and RAG                                                                       |   316 |
| Memory Bottlenecks                                                                         |   319 |
| - Backpropagation and Trainable Parameters                                                 |   320 |
| - Memory Math                                                                              |   322 |
| - Numerical Representations                                                                |   325 |
| - Quantization                                                                             |   328 |
| Finetuning Techniques                                                                      |   332 |
| - Parameter-Efficient Finetuning                                                           |   333 |
| - Model Merging and Multi-Task Finetuning                                                  |   347 |
| - Finetuning Tactics                                                                       |   357 |
| Summary                                                                                     |   361 |
| **8. Dataset Engineering**                                                                 |   363 |
| Data Curation                                                                              |   365 |
| - Data Quality                                                                             |   368 |
| - Data Coverage                                                                            |   370 |
| - Data Quantity                                                                            |   372 |
| - Data Acquisition and Annotation                                                          |   377 |
| Data Augmentation and Synthesis                                                            |   380 |
| - Why Data Synthesis                                                                       |   381 |
| - Traditional Data Synthesis Techniques                                                   |   383 |
| - AI-Powered Data Synthesis                                                                |   386 |
| - Model Distillation                                                                       |   395 |
| Data Processing                                                                            |   396 |
| - Inspect Data                                                                             |   397 |
| - Deduplicate Data                                                                         |   399 |
| - Clean and Filter Data                                                                    |   401 |
| - Format Data                                                                                |   401 |
| Summary                                                                                     |   403 |
| **9. Inference Optimization**                                                              |   405 |
| Understanding Inference Optimization                                                       |   406 |
| - Inference Overview                                                                       |   406 |
| - Inference Performance Metrics                                                            |   412 |
| - AI Accelerators                                                                          |   419 |
| Inference Optimization                                                                     |   426 |
| - Model Optimization                                                                       |   426 |
| - Inference Service Optimization                                                           |   440 |
| Summary                                                                                     |   447 |
| **10. AI Engineering Architecture and User Feedback**                                      |   449 |
| AI Engineering Architecture                                                                |   449 |
| - Step 1. Enhance Context                                                                  |   450 |
| - Step 2. Put in Guardrails                                                                |   451 |
| - Step 3. Add Model Router and Gateway                                                    |   456 |
| - Step 4. Reduce Latency with Caches                                                      |   460 |
| - Step 5. Add Agent Patterns                                                               |   463 |
| - Monitoring and Observability                                                             |   465 |
| - AI Pipeline Orchestration                                                                |   472 |
| User Feedback                                                                              |   474 |
| - Extracting Conversational Feedback                                                      |   475 |
| - Feedback Design                                                                          |   480 |
| - Feedback Limitations                                                                     |   490 |
| Summary                                                                                     |   492 |
| **Epilogue**                                                                               |   495 |
| **Index**                                                                                  |   497 |