
rajistics
Rajiv Shah | data science & AI
133 Following · 18.9K Followers · 210.2K Likes
Explaining data science · "criminally underrated" · works @huggingface
Videos
Making efficient use of GPU memory when training transformer models. This video covers the kernel overhead, optimizer states, activation memory, and gradient memory (a rough memory estimate is sketched after the links below).
#machinelearning #transformers #datascience #deeplearning #nvidia #huggingface
Efficient Training on a Single GPU: https://huggingface.co/docs/transformers/perf_train_gpu_one
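As a rough back-of-the-envelope sketch (my own numbers, using commonly cited approximations for mixed-precision training with AdamW, not figures from the video), the fixed per-parameter costs add up fast:

```python
# Approximate fixed memory per parameter for mixed-precision training with AdamW.
# Activation memory and kernel/temporary buffers come on top and depend on
# batch size and sequence length, so they are left out of this estimate.
BYTES_PER_PARAM = {
    "weights (fp32 master + fp16/bf16 copy)": 6,
    "gradients": 4,
    "AdamW optimizer states (momentum + variance)": 8,
}

def training_memory_gb(n_params: float) -> float:
    """Estimate GPU memory (GB) for weights, gradients, and optimizer states."""
    return n_params * sum(BYTES_PER_PARAM.values()) / 1e9

print(f"7B parameters -> ~{training_memory_gb(7e9):.0f} GB before activations and overhead")
```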
Let's dig into the details of building your own large language model for a custom domain. The LLaVA-Med paper gives a great breakdown of how they built their model. The video goes through their data preparation, training, and evaluation of the model.
#datascience #machinelearning #largelanguagemodel #vicuna #llavamed
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day: https://arxiv.org/pdf/2306.00890.pdf
Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis: https://arxiv.org/pdf/2305.13230.pdf
Annotated graph by Sebastian Raschka
Background by R O: https://unsplash.com/photos/FFA8yd4OynY
Open source LLMs, while they seem popular, are not easy to get running in production settings. The current open source LLMs, while getting better, still lag behind the commercial APIs in many areas. This video highlights a few of them.
#datascience #machinelearning #largelanguagemodels #openai #anthropic #flant5
MMLU Leaderboard: https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu
The False Promise of Imitating Proprietary LLMs: https://arxiv.org/abs/2305.15717
Background by Qiming Chen: https://unsplash.com/photos/lzCH2_8qRH8
Deciding whether to use a Large Language Model or a smaller model? This video explores the tradeoffs between both approaches based on the latest research (May 2023) on the performance of these models. The video covers the effectiveness of LLMs, where smaller models best LLMs, and criteria for deciding between the two.
#machinelearning #datascience #largelanguagemodels
Japan said it was acceptable to use copyrighted material such as text and images to train AI. This matches the approach of the United States, and other countries like Israel have also followed the US. All of this makes it much easier for people to train AI models within these countries.
#datascience #machinelearning #copyright #fairuse
Israel: https://www.project-disco.org/intellectual-property/011823-israel-ministry-of-justice-issues-opinion-supporting-the-use-of-copyrighted-works-for-machine-learning/
Japan: https://technomancers.ai/japan-goes-all-in-copyright-doesnt-apply-to-ai-training/
Background by Dario Seretin: https://unsplash.com/photos/AGgOAqqGlT4
GPUs power a lot of deep learning and large language models. A key is the use of linear algebra operations, like matrix multiplication, that can be parallelized across all the cores in a GPU (a tiny PyTorch illustration follows the links below).
#datascience #machinelearning #deeplearning #nvidia #matrixmultiplication
Pie example from: https://www.mathsisfun.com/algebra/matrix-multiplying.html
Bruna Branco background: https://unsplash.com/photos/FWaV69D5b8k
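A tiny illustration of the point, using PyTorch as an assumed example (any array library would do): every output element of a matrix multiplication is an independent dot product, so the GPU can compute them in parallel.

```python
# C[i, j] = sum_k A[i, k] * B[k, j] -- each output entry is independent,
# which is what lets a GPU spread the work across thousands of cores.
import torch

A = torch.randn(2048, 2048)
B = torch.randn(2048, 2048)

C_cpu = A @ B  # same math on the CPU, largely sequential

if torch.cuda.is_available():
    C_gpu = A.cuda() @ B.cuda()  # one parallel kernel launch on the GPU
    torch.cuda.synchronize()     # wait for the asynchronous kernel to finish
    print(torch.allclose(C_cpu, C_gpu.cpu(), atol=1e-1))  # same result, within float error
```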
DeepMind and OpenAI want everyone to focus on the extreme risks of AI. This helps them hype up AI and make themselves more attractive. The reality is there are far greater and more mundane risks occurring today. Let's talk about the data these models are trained on, the biases in these models, how the models are being used, and the social and economic implications of these models.
#datascience #machinelearning #modelbias #modelrisk
Model evaluation for extreme risks: https://arxiv.org/pdf/2305.15324.pdf
Github Copilot Litigation: https://githubcopilotlitigation.com/
Stable Diffusion Lawsuit: https://stablediffusionlitigation.com/
QLoRA is an efficient finetuning approach that quantizes the base model to 4 bits. This lets people fine-tune models on a single GPU; it's now possible to fine-tune a 33B parameter model in less than 24 GB of GPU memory (a minimal setup sketch follows the links below).
#datascience #machinelearning #lora #peft #qlora #finetuning #largelanguagemodels
Paper: https://arxiv.org/abs/2305.14314
Code+Demo: https://github.com/artidoro/qlora
Samples: https://colab.research.google.com/drive/1kK6xasHiav9nhiRUJjPMZb4fAED4qRHb?usp=sharing
Colab: https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing
Background by Vishnu Mohanan: https://unsplash.com/collections/1779288/lb---brain-dump
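A minimal sketch of the setup with Hugging Face Transformers, bitsandbytes, and PEFT (the base model name, LoRA hyperparameters, and target modules below are illustrative placeholders, not the paper's exact configuration):

```python
# QLoRA-style setup: 4-bit quantized base model, trainable LoRA adapters on top.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "huggyllama/llama-7b"  # placeholder base model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as proposed in the QLoRA paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; adjust per architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable
```

The frozen base weights stay in 4-bit NF4, so only the small adapter matrices need higher-precision gradients and optimizer states, which is where the memory savings come from.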
Uncensored models are here. Eric Hartford has been building uncensored versions of the WizardLM series of models and sharing how he trains them. These models remove a lot of instructions that are perceived to carry certain values. One consequence is that models that are less aligned may actually perform better.
#datascience #machinelearning #wizardlm #uncensoredmodels
Uncensored Models: https://erichartford.com/uncensored-models
WizardLM: https://huggingface.co/ehartford/WizardLM-7B-Uncensored
Vicuna Unfiltered: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered
Sparks of AGI: https://www.youtube.com/watch?v=qbIk7-JPB2c
Background by Jean Carlo Emer: https://unsplash.com/photos/5o1YssX5naM
Andrew Ng recently wrote about the no-test-set approach he is seeing when people use prompt engineering. This is very different from traditional machine learning approaches that rely on a test set. The video reviews some of the tradeoffs of this approach.
#datascience #machinelearning #promptengineering #validation
Andrew Ng Batch: https://www.deeplearning.ai/the-batch/issue-197/
An emerging trend: using large language models like GPT-4 to label data instead of having humans annotate it (a minimal sketch of the pattern appears after the links below).
#datascience #machinelearning #gpt4 #alpaca #labelingdata #annotatingdata
Background by Erol Ahmed: https://unsplash.com/photos/Y3KEBQlB1Zk
ChatDoctor: https://github.com/Kent0n-Li/ChatDoctor
GPT-4 Labeling: https://www.artisana.ai/articles/gpt-4-outperforms-elite-crowdworkers-saving-researchers-usd500-000-and-20
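A minimal sketch of the labeling pattern (the label set, prompt wording, and model name are illustrative assumptions; the call uses the OpenAI Python client's chat completions API):

```python
# Hypothetical example: using an LLM as an annotator for sentiment labels.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def label_text(text: str) -> str:
    """Ask the model to pick exactly one label for a piece of text."""
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[
            {"role": "system",
             "content": "Label the user's text as positive, negative, or neutral. Reply with one word."},
            {"role": "user", "content": text},
        ],
        temperature=0,  # deterministic labels are easier to audit
    )
    return response.choices[0].message.content.strip().lower()

print(label_text("The checkout flow kept crashing on my phone."))  # -> "negative"
```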
Bias in generative AI. This video is based on a blog post by Textio on bias in generative AI, using job postings as an example. A great reminder that it's very easy for generative models to introduce bias and problematic outputs.
#datascience #machinelearning #bias #generativeai #openai
Textio blog post: https://textio.com/blog/mindful-ai-crafting-prompts-to-mitigate-the-bias-in-generative-ai/115959775665
Background by Manuel: https://unsplash.com/photos/CANL3bzp6wU
Active learning uses an algorithm to help select what data to label. Ideally, using this approach, people can get comparable model results using less labeled data.
#datascience #machinelearning #activelearning #datalabeling
Active Learning Strategies from Neptune.ai: https://neptune.ai/blog/active-learning-strategies-tools-use-cases
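A minimal uncertainty-sampling sketch with scikit-learn, one of the standard active learning strategies (the synthetic data and model here are placeholders):

```python
# Uncertainty sampling: send the pool examples the current model is least sure about
# to human annotators, retrain, and repeat.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled, y_labeled = rng.normal(size=(20, 5)), rng.integers(0, 2, 20)  # tiny seed set
X_pool = rng.normal(size=(1000, 5))                                      # unlabeled pool

model = LogisticRegression().fit(X_labeled, y_labeled)

proba = model.predict_proba(X_pool)
uncertainty = 1.0 - proba.max(axis=1)       # low max-probability means the model is unsure
query_idx = np.argsort(uncertainty)[-10:]   # the 10 examples most worth labeling next

print("Indices to label next:", query_idx)
```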
Thinking about the size of numbers becomes important when working with neural networks. This video touches on different techniques like using bfloat16 and quantization (a small code sketch appears after the links below).
#datascience #machinelearning #bfloat16 #quantization #largelanguagemodels
Links:
Accelerating Large Language Models with Mixed-Precision Techniques: https://lightning.ai/pages/community/tutorial/accelerating-large-language-models-with-mixed-precision-techniques/
BFloat16: The secret to high performance on Cloud TPUs: https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus
Llama.cpp: https://github.com/ggerganov/llama.cpp/
A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes: https://huggingface.co/blog/hf-bitsandbytes-integration
Background by Umberto: https://unsplash.com/photos/jXd2FSvcRr8
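A small sketch of two of these techniques with Hugging Face Transformers (the model name is a placeholder; 8-bit loading additionally requires the bitsandbytes package and a CUDA GPU):

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "bigscience/bloom-560m"  # placeholder model

# bfloat16: same exponent range as fp32 but fewer mantissa bits, so half the memory.
model_bf16 = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
print(next(model_bf16.parameters()).dtype)  # torch.bfloat16

# int8: quantize the weights to 8 bits with bitsandbytes, roughly a quarter of fp32 memory.
model_int8 = AutoModelForCausalLM.from_pretrained(
    model_id, load_in_8bit=True, device_map="auto"
)
```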
LangChain added a new agent, Plan and Execute. Looking forward to the more advanced use cases people will build with it. This was inspired by BabyAGI and the "Plan and Solve" paper.
#datascience #machinelearning #largelanguagemodels #langchain
LangChain Agent: https://python.langchain.com/en/latest/modules/agents/plan_and_execute.html
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models: https://arxiv.org/pdf/2305.04091.pdf
Background by charlesdeluvio: https://unsplash.com/photos/OWkXt1ikC5g