#python #datasets #evaluation #metrics #natural_language_processing #nlp #numpy #pandas #pytorch #tensorflow
https://github.com/huggingface/nlp
GitHub - huggingface/datasets (formerly huggingface/nlp): 🤗 The largest hub of ready-to-use datasets for ML models, with fast, easy-to-use and efficient data manipulation tools.
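The core workflow is a one-call loading API that downloads, caches, and memory-maps a dataset. A minimal sketch (the `imdb` dataset name is just an example from the public hub):

```python
# Minimal sketch: load a public dataset from the hub and inspect it.
from datasets import load_dataset

dataset = load_dataset("imdb")      # downloads once, then reuses the local cache
train = dataset["train"]

print(train.features)               # column names and types
print(train[0]["text"][:200])       # first training example (truncated)
print(train[0]["label"])
```

Splits behave like columnar tables backed by Apache Arrow, so slicing and `map`-style transforms stay fast even on large datasets.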
#typescript #agent_monitoring #analytics #evaluation #gpt #langchain #large_language_models #llama_index #llm #llm_cost #llm_evaluation #llm_observability #llmops #monitoring #open_source #openai #playground #prompt_engineering #prompt_management #ycombinator
Helicone is an all-in-one, open-source observability platform for applications built on Large Language Models (LLMs). It integrates with providers such as OpenAI and Anthropic with a single line of code, letting you observe and debug your application's behaviour, analyze metrics such as cost and latency, and fine-tune your models. The platform also offers a playground for iterating on prompts and sessions, plus prompt management and automatic evaluations. Helicone is enterprise-ready (SOC 2 and GDPR compliant) and has a generous free tier of 100k requests per month.
https://github.com/Helicone/helicone
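The "one line of code" integration is proxy-based: you point an existing OpenAI client at Helicone's gateway and add an auth header, and every request is then logged with cost and latency. A minimal Python sketch, assuming a `HELICONE_API_KEY` environment variable (the base URL and header follow Helicone's documented OpenAI proxy setup; the model name is only an example):

```python
# Minimal sketch: route OpenAI traffic through Helicone's proxy for observability.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://oai.helicone.ai/v1",  # Helicone gateway instead of api.openai.com
    default_headers={"Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}"},
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)  # the request now appears in the Helicone dashboard
```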
#python #agent #agentops #agents_sdk #ai #anthropic #autogen #cost_estimation #crewai #evals #evaluation_metrics #groq #langchain #llm #mistral #ollama #openai #openai_agents
AgentOps is a Python SDK that helps developers monitor, debug, and improve AI agents. It provides session replays, LLM cost tracking, benchmarking, and security checks to guard against data leaks, so you can see how your agents perform, how they interact with users, and which external tools they call. This makes it quick to spot problems, optimize agent behaviour, and stay compliant with safety standards. It integrates with most LLMs and agent frameworks, including OpenAI (and the OpenAI Agents SDK), LangChain, CrewAI, and AutoGen, making it easy to set up and use.
https://github.com/AgentOps-AI/agentops
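Instrumentation is typically a couple of lines around existing agent code. A minimal sketch, assuming an `AGENTOPS_API_KEY` environment variable (the `end_session` call reflects older SDK versions; exact names may differ between releases):

```python
# Minimal sketch: record an agent run as an AgentOps session.
import agentops
from openai import OpenAI

agentops.init()  # starts a session; supported LLM SDK calls are auto-instrumented

client = OpenAI()
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Plan a three-step research task"}],
)

agentops.end_session("Success")  # records the session outcome for the dashboard
```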
#typescript #ai #analytics #datasets #dspy #evaluation #gpt #llm #llmops #low_code #observability #openai #prompt_engineering
LangWatch is an open LLM Ops platform that helps you monitor, test, and improve AI applications: it traces requests, tracks performance, lets you compare different setups side by side, and can optimize prompts automatically. It works with any AI tool or framework, keeps your data secure, and makes it easy to collaborate with domain experts to find and fix issues quickly, so your AI stays reliable and efficient.
https://github.com/langwatch/langwatch
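Tracing with the Python SDK is decorator-based. A rough sketch, assuming a `LANGWATCH_API_KEY` environment variable (the `@langwatch.trace()` decorator and the `autotrack_openai_calls` helper are recalled from the SDK docs and should be treated as assumptions, not a verified API):

```python
# Rough sketch: capture an LLM call as a LangWatch trace.
import langwatch
from openai import OpenAI

client = OpenAI()

@langwatch.trace()  # assumed decorator: opens a trace around this function
def answer(question: str) -> str:
    # assumed helper: links this OpenAI client's calls to the current trace
    langwatch.get_current_trace().autotrack_openai_calls(client)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

print(answer("What does LangWatch monitor?"))
```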
#typescript #ci #ci_cd #cicd #evaluation #evaluation_framework #llm #llm_eval #llm_evaluation #llm_evaluation_framework #llmops #pentesting #prompt_engineering #prompt_testing #prompts #rag #red_teaming #testing #vulnerability_scanners
Promptfoo is a tool that helps developers test and improve AI applications built on Large Language Models (LLMs). It lets you **test prompts and models** automatically, **secure your apps** through red teaming and vulnerability scanning, and **compare different models** (GPT, Claude, Gemini, Llama, and more) side by side using simple declarative configs. You can run it locally from the command line or wire it into CI/CD, so you know your AI apps work well and are secure before you release them, with decisions backed by measured results instead of guesswork.
https://github.com/promptfoo/promptfoo
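Evaluations are driven by a declarative config plus a CLI run. A minimal sketch of a `promptfooconfig.yaml` (provider IDs and assertion types follow promptfoo's documented config style, but treat the exact values as illustrative):

```yaml
# promptfooconfig.yaml — minimal sketch: one prompt, two providers, two assertions
prompts:
  - "Summarize in one sentence: {{article}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      article: "Large language models can be evaluated automatically with declarative test suites."
    assert:
      - type: contains          # plain string check on the output
        value: "language models"
      - type: llm-rubric        # model-graded check against a rubric
        value: "Is a single, faithful sentence"
```

Running `npx promptfoo@latest eval` then executes the matrix of prompts × providers × tests and renders a side-by-side comparison.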
#python #agents #document_search #evaluation #guardrails #llms #optimization #prompts #rag #vector_stores
Ragbits provides building blocks for rapidly developing and deploying GenAI applications. It lets you swap between language models, add guardrails for safer interactions with those models, and connect to a variety of vector stores and data sources. It also includes tooling for managing data and testing prompts, which makes it easier to build reliable, accurate retrieval-augmented (RAG) applications that stay grounded in up-to-date data.
https://github.com/deepsense-ai/ragbits
#python #evaluation_framework #evaluation_metrics #llm_evaluation #llm_evaluation_framework #llm_evaluation_metrics
DeepEval is an open-source tool that makes it easy to test and improve large language model (LLM) applications, much like how Pytest works for regular software, but focused on LLM outputs. It offers over 30 ready-to-use metrics—such as answer relevancy, faithfulness, and hallucination—to check whether your LLM is accurate, safe, and reliable. You can test your whole application or just parts of it, and even generate synthetic data for better testing. DeepEval works locally or in the cloud, letting you compare results, share reports, and keep improving your models. This helps you build better, safer, and more trustworthy LLM apps with less effort.
https://github.com/confident-ai/deepeval
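The Pytest-style workflow from DeepEval's README looks roughly like this; the threshold and example strings are illustrative, and the relevancy metric uses an LLM judge, so an OpenAI API key (or a configured custom judge model) is assumed. Running `deepeval test run test_example.py` scores the case with the chosen metric:

```python
# test_example.py — minimal sketch of a Pytest-style DeepEval check
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    metric = AnswerRelevancyMetric(threshold=0.7)      # passes if relevancy >= 0.7
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        # Replace this with the actual output of your LLM application:
        actual_output="We offer a 30-day full refund at no extra cost.",
    )
    assert_test(test_case, [metric])
```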
#go #agent #agentic #ai #chatbot #chatbots #embeddings #evaluation #generative_ai #golang #knowledge_base #llm #multi_tenant #multimodel #ollama #openai #question_answering #rag #reranking #semantic_search #vector_search
WeKnora is an LLM-powered framework from Tencent for understanding and answering questions over complex documents such as PDFs and Word files. It uses a retrieval-augmented (RAG) pipeline to parse documents, retrieve the relevant passages semantically, and answer your questions in plain language. It is useful for businesses and researchers because it can quickly surface information from large document collections, making knowledge management and decision-making easier. It also supports multiple languages and can be deployed privately (including with local models via Ollama), so your data stays under your control.
https://github.com/Tencent/WeKnora