
© 2023 TikTok


Rajiv Shah | data science & AI


Explaining data science 🚀 🤣 “criminally underrated” 🤗 works @huggingface




Skits - LLMs

10 posts

Skits - News

6 posts

LLM Deep Dives

2 posts

Skits ML

20 posts


11 posts


18 posts


Making efficient use of GPU memory when training transformer models. This video covers kernel overhead, optimizer states, activation memory, and gradient memory.

#machinelearning #transformers #datascience #deeplearning #nvidia #huggingface

Efficient Training on a Single GPU: https://huggingface.co/docs/transformers/perf_train_gpu_one
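
As a rough back-of-the-envelope sketch of the static part of this memory, the snippet below uses commonly cited per-parameter byte counts (assumed here, following the rule of thumb in the Hugging Face training-memory docs): weights at 4 bytes in fp32 or 6 bytes in mixed precision, AdamW states at 8 bytes (2 bytes for an 8-bit AdamW), and fp32 gradients at 4 bytes. The function and dictionary names are hypothetical. Activation memory and kernel overhead depend on batch size, sequence length, and the CUDA runtime, so they are left out.

```python
# Rule-of-thumb bytes per parameter (assumed, not measured).
WEIGHT_BYTES = {
    "fp32": 4,   # one fp32 copy of the weights
    "mixed": 6,  # fp32 master copy + 16-bit copy
}
OPTIMIZER_BYTES = {
    "adamw": 8,         # two fp32 states (first and second moments)
    "adamw_8bit": 2,    # two 8-bit states, e.g. a bitsandbytes 8-bit AdamW
    "sgd_momentum": 4,  # one fp32 momentum buffer
}
GRADIENT_BYTES = 4  # gradients kept in fp32


def static_training_memory_gb(n_params: int, precision: str = "mixed",
                              optimizer: str = "adamw") -> float:
    """Estimate weights + optimizer states + gradients in GB (1 GB = 1e9 bytes).

    Activation memory, temporary buffers, and CUDA kernel overhead are
    deliberately excluded because they depend on batch size, sequence
    length, and the runtime.
    """
    bytes_per_param = (WEIGHT_BYTES[precision]
                       + OPTIMIZER_BYTES[optimizer]
                       + GRADIENT_BYTES)
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    n = 1_000_000_000  # a hypothetical 1B-parameter model
    print(static_training_memory_gb(n))                          # 18.0
    print(static_training_memory_gb(n, optimizer="adamw_8bit"))  # 12.0
```

Under these assumptions, a 1B-parameter model trained with mixed precision and standard AdamW already needs about 18 GB before any activations, which is why the optimizer choice matters so much on a single GPU.
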
Let's dig into the details of building your own large language model for a custom domain. The LLaVA-Med paper gives a great breakdown of how the authors built their model. The video walks through their data preparation, training, and evaluation.

#datascience #machinelearning #largelanguagemodel #vicuna #llava-med

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day: https://arxiv.org/pdf/2306.00890.pdf

Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis: https://arxiv.org/pdf/2305.13230.pdf
Annotated graph by Sebastian Raschka

Background by R O: https://unsplash.com/photos/FFA8yd4OynY
Open source LLMs, while popular, are not easy to get running in production settings. The current open source LLMs are getting better but still lag behind the commercial APIs in many areas. This video highlights a few of them.

#datascience #machinelearning #largelanguagemodels #openai #anthropic #flant5

MMLU Leaderboard: https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu

The False Promise of Imitating Proprietary LLMs: https://arxiv.org/abs/2305.15717

Background by Qiming Chen: https://unsplash.com/photos/lzCH2_8qRH8
Japan said it is acceptable to use copyrighted material, such as text and images, to train AI. This follows the approach of the United States, and other countries like Israel have also followed the US. All of this makes it much easier for people to train AI models within these countries.

#datascience #machinelearning #copyright #fairuse

Israel: https://www.project-disco.org/intellectual-property/011823-israel-ministry-of-justice-issues-opinion-supporting-the-use-of-copyrighted-works-for-machine-learning/

Japan: https://technomancers.ai/japan-goes-all-in-copyright-doesnt-apply-to-ai-training/

Background by Dario Seretin: https://unsplash.com/photos/AGgOAqqGlT4
QLoRA is an efficient finetuning approach that uses 4-bit quantization. This allows people to finetune models on a single GPU. It's now possible to finetune a 33B parameter model in less than 24 GB of GPU memory.
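
A quick way to see why 4-bit weights matter is to compute the storage for the frozen base-model weights alone. The sketch below uses a hypothetical helper, `weight_memory_gb`, that counts only `n_params * bits / 8` bytes; it ignores quantization constants, the LoRA adapter weights and their optimizer states, and activations, all of which add to the real footprint.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Storage for the model weights only, in GB (1 GB = 1e9 bytes).

    Quantization constants, adapters, optimizer states, and activations
    are NOT included.
    """
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    n = 33e9  # a 33B-parameter model
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: {weight_memory_gb(n, bits):.1f} GB")
    # 16-bit: 66.0 GB
    # 8-bit: 33.0 GB
    # 4-bit: 16.5 GB
```

Of these three, only the 4-bit version leaves room inside 24 GB for the adapters, activations, and other overhead that a finetuning run also needs.
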

#datascience #machinelearning #lora #peft #qlora #finetuning #largelanguagemodels

Paper: https://arxiv.org/abs/2305.14314
Code+Demo: https://github.com/artidoro/qlora
Samples: https://colab.research.google.com/drive/1kK6xasHiav9nhiRUJjPMZb4fAED4qRHb?usp=sharing
Colab: https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing

Background by Vishnu Mohanan: https://unsplash.com/collections/1779288/lb---brain-dump
Uncensored models are here. Eric Hartford has been building uncensored versions of the WizardLM models and sharing how he trains them. The training data for these models removes many instructions that are perceived to carry certain values. One consequence is that models that are less aligned may actually perform better.

#datascience #machinelearning #wizardlm #uncensoredmodels

Uncensored Models: https://erichartford.com/uncensored-models

WizardLM: https://huggingface.co/ehartford/WizardLM-7B-Uncensored

Vicuna Unfiltered: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered

Sparks of AGI: https://www.youtube.com/watch?v=qbIk7-JPB2c

Background by Jean Carlo Emer: https://unsplash.com/photos/5o1YssX5naM
An emerging trend is using large language models like GPT-4 to label data instead of having humans annotate it.

#datascience #machinelearning #gpt4 #alpaca #labelingdata #annotatingdata

Background by Erol Ahmed: https://unsplash.com/photos/Y3KEBQlB1Zk

ChatDoctor: https://github.com/Kent0n-Li/ChatDoctor

GPT-4 Labeling: https://www.artisana.ai/articles/gpt-4-outperforms-elite-crowdworkers-saving-researchers-usd500-000-and-20
Thinking about how numbers are represented, and how many bits each one takes, becomes important when working with neural networks. This video touches on different techniques, such as using bfloat16 and quantization.
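
To make the trade-offs concrete, here is a small pure-Python sketch (helper names are hypothetical). bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits, so it preserves range while losing precision; float16 has 5 exponent and 10 mantissa bits, so it is more precise but overflows above about 65504. The bfloat16 helper simply truncates the float32 encoding, whereas hardware usually rounds to nearest even. The last helper shows symmetric absmax int8 quantization, one of the schemes described in the Hugging Face 8-bit blog post linked below.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round-trip x through bfloat16 by keeping the top 16 bits of its
    float32 encoding (1 sign, 8 exponent, 7 mantissa bits).

    This truncates; real conversions usually round to nearest even.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return y


def to_float16(x: float) -> float:
    """Round-trip x through IEEE float16 (1 sign, 5 exponent, 10 mantissa bits)."""
    (y,) = struct.unpack("<e", struct.pack("<e", x))
    return y


def absmax_quantize(values):
    """Symmetric int8 absmax quantization: scale so that max |v| maps to 127."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = 127 / max_abs if max_abs > 0 else 1.0
    return [round(v * scale) for v in values], scale


def dequantize(q, scale):
    """Map int8 codes back to approximate floats."""
    return [v / scale for v in q]


if __name__ == "__main__":
    # Range: bfloat16 keeps float32's exponent; float16 overflows.
    print(to_bfloat16(1e38))
    try:
        to_float16(1e38)
    except OverflowError:
        print("float16 overflow")
    # Precision: float16 keeps more mantissa bits than bfloat16.
    x = 1.0 + 2**-8
    print(to_bfloat16(x), to_float16(x))  # 1.0 1.00390625
    q, scale = absmax_quantize([1.0, -0.5, 0.25])
    print(q, dequantize(q, scale))
```
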

#datascience #machinelearning #bfloat16 #quantization #largelanguagemodels

Accelerating Large Language Models with Mixed-Precision Techniques: https://lightning.ai/pages/community/tutorial/accelerating-large-language-models-with-mixed-precision-techniques/

BFloat16: The secret to high performance on Cloud TPUs: https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus

Llama.cpp: https://github.com/ggerganov/llama.cpp/

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes: https://huggingface.co/blog/hf-bitsandbytes-integration

Background by Umberto: https://unsplash.com/photos/jXd2FSvcRr8
LangChain added a new agent, Plan and Execute. Looking forward to the more advanced use cases people will build with it. This was inspired by BabyAGI and the "Plan and Solve" paper.
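
The core idea behind a plan-and-execute agent can be sketched without any library: one call produces the full list of steps up front, and then each step is executed in order with access to the earlier results. This is a minimal illustration of that general pattern, not LangChain's implementation; `plan_and_execute`, `plan`, and `execute` are hypothetical names, and in practice both callables would wrap LLM calls.

```python
from typing import Callable, List, Tuple

# A step paired with the result produced for it.
Record = Tuple[str, str]


def plan_and_execute(
    objective: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str, List[Record]], str],
) -> List[Record]:
    """Plan once up front, then run each step in order.

    `plan` and `execute` stand in for LLM calls (hypothetical here). The
    executor sees the results of earlier steps so later steps can build on them.
    """
    steps = plan(objective)
    history: List[Record] = []
    for step in steps:
        result = execute(step, history)
        history.append((step, result))
    return history


if __name__ == "__main__":
    # Toy stand-ins: split the objective into steps, then "execute" by echoing.
    def toy_plan(objective: str) -> List[str]:
        return [s.strip() for s in objective.split(" then ")]

    def toy_execute(step: str, history: List[Record]) -> str:
        return f"done({step}) after {len(history)} prior steps"

    for step, result in plan_and_execute(
        "search for the paper then summarize it", toy_plan, toy_execute
    ):
        print(step, "->", result)
```

Separating planning from execution means the planner is called once, while the executor can be a cheaper model or a tool-using agent that handles one step at a time.
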
#datascience #machinelearning #largelanguagemodels #langchain

LangChain Agent: https://python.langchain.com/en/latest/modules/agents/plan_and_execute.html

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models: https://arxiv.org/pdf/2305.04091.pdf

Background by charlesdeluvio: https://unsplash.com/photos/OWkXt1ikC5g