#python #ai #llm_evaluation #llm_security #security_scanners #vulnerability_assessment
`garak` is a free, open-source vulnerability scanner for large language models (LLMs): think of it as `nmap`, but for LLMs. It probes models for failure modes such as hallucination, data leakage, prompt injection, misinformation, and more. You install it with `pip`, point it at the model you want to test, and it runs a battery of probes and produces a detailed report on any weaknesses it finds, helping you confirm your LLMs are safe and reliable. To get started, follow the user guide or join the project's Discord community for support.
https://github.com/NVIDIA/garak
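A minimal sketch of a scan from the command line, assuming the Hugging Face `gpt2` target and the `promptinject` probe used in the project's examples (run `--list_probes` to see what your installed version actually ships):

```bash
# Install the scanner, ideally in a dedicated virtualenv
python -m pip install -U garak

# See which probes are available in this version
garak --list_probes

# Probe a Hugging Face model for prompt-injection weaknesses;
# garak writes a detailed report of the run to its output directory
garak --model_type huggingface --model_name gpt2 --probes promptinject
```

Other backends (for example hosted APIs) can be targeted the same way by changing `--model_type` and `--model_name`.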
#typescript #agent_monitoring #analytics #evaluation #gpt #langchain #large_language_models #llama_index #llm #llm_cost #llm_evaluation #llm_observability #llmops #monitoring #open_source #openai #playground #prompt_engineering #prompt_management #ycombinator
Helicone is an all-in-one, open-source observability platform for applications built on Large Language Models (LLMs). It integrates with providers such as OpenAI and Anthropic with a single line of code, so you can monitor and debug your model's performance, analyze metrics such as cost and latency, and fine-tune models. It also offers a playground for testing and iterating on prompts and sessions, along with prompt management and automatic evaluations. Helicone is enterprise-ready, compliant with SOC 2 and GDPR, and has a generous free tier of 100k requests per month, which makes it easier to manage and optimize your LLM projects.
https://github.com/Helicone/helicone
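A minimal sketch of the "one line" proxy-style integration with the OpenAI Python SDK: the change is pointing `base_url` at Helicone's gateway and adding a `Helicone-Auth` header. The endpoint and header name follow Helicone's public docs, but verify them against the current documentation before relying on this.

```python
import os

from openai import OpenAI

# Route OpenAI traffic through Helicone's gateway so every request is
# logged with cost, latency, and usage metrics in the Helicone dashboard.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://oai.helicone.ai/v1",  # Helicone proxy instead of api.openai.com
    default_headers={
        "Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}",
    },
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about observability."}],
)
print(response.choices[0].message.content)
```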
#typescript #ci #ci_cd #cicd #evaluation #evaluation_framework #llm #llm_eval #llm_evaluation #llm_evaluation_framework #llmops #pentesting #prompt_engineering #prompt_testing #prompts #rag #red_teaming #testing #vulnerability_scanners
Promptfoo is a tool that helps developers test and improve AI applications built on Large Language Models (LLMs). It lets you **test prompts and models** automatically, **secure your apps** by red-teaming them and scanning for vulnerabilities, and **compare different models** side by side. You can run it locally from the command line or integrate it into your CI/CD pipeline, so you can confirm your AI apps work well and are secure before release, using data instead of guesswork.
https://github.com/promptfoo/promptfoo
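A minimal sketch of the declarative config the project is built around, here as a hypothetical `promptfooconfig.yaml` comparing two providers on one prompt; the provider IDs and assertion types are assumptions to check against the promptfoo docs for your version:

```yaml
# promptfooconfig.yaml: run with `npx promptfoo@latest eval`,
# then `npx promptfoo@latest view` to browse results side by side.
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-haiku-latest

tests:
  - vars:
      text: "Promptfoo runs prompts against multiple models and scores the outputs."
    assert:
      - type: contains
        value: "models"
      - type: llm-rubric
        value: "Is a single, accurate sentence"
```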
#python #evaluation_framework #evaluation_metrics #llm_evaluation #llm_evaluation_framework #llm_evaluation_metrics
DeepEval is an open-source framework that makes it easy to test and improve large language model (LLM) applications, much like Pytest does for regular software, but focused on LLM outputs. It offers over 30 ready-to-use metrics, such as answer relevancy, faithfulness, and hallucination, to check whether your LLM is accurate, safe, and reliable. You can test your whole application or just parts of it, and even generate synthetic data for better test coverage. DeepEval works locally or in the cloud, letting you compare results, share reports, and keep improving your models, so you can build better, safer, and more trustworthy LLM apps with less effort.
https://github.com/confident-ai/deepeval
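A minimal pytest-style sketch using DeepEval's answer relevancy metric; the hard-coded `actual_output` stands in for a call into your real LLM application, and the 0.7 threshold is an arbitrary example value:

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_answer_relevancy():
    # In a real suite, actual_output would come from your LLM app.
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        actual_output="You can return them within 30 days for a full refund.",
    )
    # Fails the test if the relevancy score falls below the threshold.
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

Run it with `deepeval test run test_shoes.py` (or plain `pytest`) to get per-metric scores alongside the usual pass/fail output.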