Rajiv Shah | data science & AI


Explaining data science 🚀 🤣 "criminally underrated" 🤗 works @huggingface




Making efficient use of GPU memory when training transformer models. This video covers the kernel overhead, optimizer states, activation memory, and gradient memory. Efficient Training on a Single GPU: https://huggingface.co/docs/transformers/perf_train_gpu_one
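
As a rough sketch of why optimizer states and gradients take up so much room, the static training memory (weights + optimizer states + gradients) can be estimated per parameter. The byte counts below are assumptions based on my reading of the Hugging Face training docs (6 bytes of weights for mixed precision, 8 bytes for AdamW's two states, 4 bytes for fp32 gradients). Activation memory, temporary buffers, and kernel overhead are deliberately left out, so real usage will be higher.

```python
# Approximate bytes per parameter (assumed values, see lead-in).
# Activations, temporary buffers and CUDA kernel overhead are NOT included.
WEIGHT_BYTES = {"fp32": 4, "mixed": 6}  # mixed = fp32 master copy + fp16 copy
OPTIMIZER_BYTES = {"adamw": 8, "adamw_8bit": 2, "sgd_momentum": 4}
GRADIENT_BYTES = 4  # gradients assumed to be kept in fp32


def static_training_memory_gb(n_params: int, precision: str = "mixed",
                              optimizer: str = "adamw") -> float:
    """Weights + optimizer states + gradients, in GB (1e9 bytes)."""
    per_param = WEIGHT_BYTES[precision] + OPTIMIZER_BYTES[optimizer] + GRADIENT_BYTES
    return n_params * per_param / 1e9


print(static_training_memory_gb(1_000_000_000))  # 1B params, mixed precision, AdamW
```

By this arithmetic a 1B-parameter model trained with mixed precision and AdamW needs about 18 GB before any activations, and swapping in an 8-bit optimizer brings that down to about 12 GB.
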
Let's dig into the details of building your own large language model for a custom domain. The LLaVA-Med paper gives a great breakdown of how the authors built their model, and the video walks through their data preparation, training, and evaluation.

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day: https://arxiv.org/pdf/2306.00890.pdf

Open-source LLMs, while popular, are not easy to get running in production settings. The current open-source LLMs are getting better but still lag behind the commercial APIs in many areas. This video highlights a few of them.

Japan said it is acceptable to use copyrighted material such as text and images to train AI. This matches the approach of the United States, and other countries such as Israel have also followed the US. All of this makes it much easier for people to train AI models within these countries.

QLoRA is an efficient fine-tuning approach that works with a base model quantized to 4 bits. This lets people fine-tune models on a single GPU: it's now possible to fine-tune a 33B-parameter model in less than 24 GB of GPU memory.
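
Some back-of-the-envelope arithmetic shows why 4-bit weights are what make a 33B model fit. This sketch counts only the frozen base weights at a given bit width; it ignores quantization constants, the LoRA adapter weights and their optimizer states, and activations, all of which have to fit in whatever memory is left.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the model weights alone, in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 33e9  # a 33B-parameter base model

# Compare the footprint of the same weights stored at different bit widths.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>6}: {weight_memory_gb(N_PARAMS, bits):5.1f} GB")
```

At 4 bits the 33B base weights need roughly 16.5 GB versus about 66 GB in 16-bit, which leaves headroom on a 24 GB card for the small trainable adapters.
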

Uncensored models are here. Eric Hartford has been building uncensored versions of the WizardLM models and sharing how he trains them. His training data drops many instructions that are perceived to carry certain values. One consequence is that less-aligned models may actually perform better.
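
One way such examples could be removed, sketched under assumptions: the dataset is a list of dicts with hypothetical `instruction` and `response` keys, and an example is dropped when its response contains any phrase from a hand-picked marker list (also hypothetical). This only illustrates simple phrase-based filtering; it is not the actual script behind these models.

```python
from typing import Dict, List

# Hypothetical marker phrases for illustration only; not the list used
# to build any released uncensored model.
VALUE_LADEN_MARKERS = (
    "as an ai language model",
    "i'm sorry, but i cannot",
    "it is not appropriate",
)


def filter_instructions(examples: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop instruction/response pairs whose response contains a marker phrase."""
    kept = []
    for ex in examples:
        response = ex["response"].lower()  # case-insensitive matching
        if not any(marker in response for marker in VALUE_LADEN_MARKERS):
            kept.append(ex)
    return kept
```
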

There is an emerging trend of using large language models like GPT-4, instead of human annotators, to label data.
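
A minimal sketch of LLM-based labeling, assuming a sentiment task with a hypothetical label set and a `call_llm` function you supply (any client for GPT-4 or another model that takes a prompt string and returns text). The prompt constrains the answer to a fixed label set, replies that don't match are rejected rather than guessed, and several samples are combined by majority vote. Voting only helps if the model is sampled with nonzero temperature.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical label set for a sentiment-labeling task.
LABELS: Sequence[str] = ("positive", "negative", "neutral")


def build_prompt(text: str, labels: Sequence[str] = LABELS) -> str:
    """Ask the model to answer with exactly one allowed label."""
    return (
        "Classify the sentiment of the text below. "
        f"Answer with exactly one of: {', '.join(labels)}.\n\n"
        f"Text: {text}\nLabel:"
    )


def parse_label(reply: str, labels: Sequence[str] = LABELS) -> Optional[str]:
    """Normalize the reply; return None if it is not an allowed label."""
    cleaned = reply.strip().lower().rstrip(".")
    return cleaned if cleaned in labels else None


def llm_label(text: str, call_llm: Callable[[str], str], n_votes: int = 3) -> Optional[str]:
    """Query the model n_votes times and return the majority valid label."""
    prompt = build_prompt(text)
    votes = [parse_label(call_llm(prompt)) for _ in range(n_votes)]
    valid = [v for v in votes if v is not None]
    if not valid:
        return None  # e.g. route this example to a human annotator instead
    return Counter(valid).most_common(1)[0][0]
```

Returning `None` for unusable replies keeps the model's failures visible, so those examples can still go to a human.
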

Thinking about how many bits are used to represent numbers becomes important when working with neural networks. This video touches on different techniques, like using bfloat16 and quantization.
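
To make bfloat16 and quantization concrete, here is a small pure-Python sketch. `to_bfloat16` rounds a float32 value to bfloat16 by keeping its top 16 bits (the same 8-bit exponent as float32, so the same range, but only 7 mantissa bits). `absmax_quantize` is plain symmetric absmax int8 quantization, a simplified stand-in rather than the exact scheme any particular library uses. NaN inputs are not handled.

```python
import struct
from typing import List, Tuple


def to_bfloat16(x: float) -> float:
    """Round x (as float32) to the nearest bfloat16 value, ties to even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps float32's top 16 bits: 1 sign, 8 exponent, 7 mantissa bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bf16_bits = ((bits + rounding_bias) >> 16) & 0xFFFF
    return struct.unpack("<f", struct.pack("<I", bf16_bits << 16))[0]


def absmax_quantize(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization: scale so the largest |value| maps to 127."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    scale = 127 / max_abs
    return [round(v * scale) for v in values], scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [v / scale for v in q]
```

`to_bfloat16(3.14159)` returns `3.140625`, showing the lost precision, while a value like `1e38` stays finite even though it is far beyond float16's range.
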

LangChain added a new agent, Plan and Execute. I'm looking forward to the more advanced use cases people will build with it. It was inspired by BabyAGI and the "Plan-and-Solve" paper.
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models: https://arxiv.org/pdf/2305.04091.pdf
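
The core idea behind the Plan and Execute agent can be sketched in a few lines: a planner turns the objective into a list of steps up front, and an executor works through them one at a time, seeing the results of earlier steps. This is a hypothetical, framework-free illustration with `planner` and `executor` as plain callables (in practice, LLM calls); it is not LangChain's API, and it omits re-planning.

```python
from typing import Callable, List, Tuple

Step = str
History = List[Tuple[Step, str]]


def plan_and_execute(
    objective: str,
    planner: Callable[[str], List[Step]],
    executor: Callable[[Step, History], str],
) -> History:
    """Plan all steps up front, then execute them in order.

    Each executor call receives the (step, result) pairs completed so far,
    so later steps can build on earlier results.
    """
    history: History = []
    for step in planner(objective):
        result = executor(step, history)
        history.append((step, result))
    return history
```

Separating planning from execution means the expensive "think about the whole task" call happens once, and each execution step can stay small and focused.
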

