#cplusplus #cuda #deep_learning #deep_neural_networks #distributed #machine_learning #ml #neural_network
https://github.com/Oneflow-Inc/oneflow
GitHub
GitHub - Oneflow-Inc/oneflow: OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. - Oneflow-Inc/oneflow
#cplusplus #cuda #deep_learning #gpu #mlp #nerf #neural_network #real_time #rendering
https://github.com/NVlabs/tiny-cuda-nn
GitHub
GitHub - NVlabs/tiny-cuda-nn: Lightning fast C++/CUDA neural network framework
Lightning fast C++/CUDA neural network framework. Contribute to NVlabs/tiny-cuda-nn development by creating an account on GitHub.
#cuda #3d_reconstruction #computer_graphics #computer_vision #function_approximation #machine_learning #nerf #neural_network #real_time #real_time_rendering #realtime #signed_distance_functions
https://github.com/NVlabs/instant-ngp
GitHub
GitHub - NVlabs/instant-ngp: Instant neural graphics primitives: lightning fast NeRF and more
Instant neural graphics primitives: lightning fast NeRF and more - NVlabs/instant-ngp
#python #cublas #cuda #cudnn #cupy #curand #cusolver #cusparse #cusparselt #cutensor #gpu #nccl #numpy #nvrtc #nvtx #rocm #scipy #tensor
https://github.com/cupy/cupy
GitHub
GitHub - cupy/cupy: NumPy & SciPy for GPU
NumPy & SciPy for GPU. Contribute to cupy/cupy development by creating an account on GitHub.
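Since CuPy mirrors the NumPy/SciPy API, moving array code to the GPU is often just an import swap; a minimal sketch:
```python
import cupy as cp

x = cp.random.random((1024, 1024))  # array allocated on the GPU
y = cp.fft.fft2(x)                  # FFT executed on the GPU (cuFFT under the hood)
host = cp.asnumpy(y)                # copy the result back to a NumPy array on the host
```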
#jupyter_notebook #3d_reconstruction #cuda #instant_ngp #nerf #pytorch #pytorch_lightning
https://github.com/kwea123/ngp_pl
GitHub
GitHub - kwea123/ngp_pl: Instant-ngp in pytorch+cuda trained with pytorch-lightning (high quality with high speed, with only few…
Instant-ngp in pytorch+cuda trained with pytorch-lightning (high quality with high speed, with only few lines of legible code) - kwea123/ngp_pl
#python #command_line_tool #console #cuda #curses #gpu #gpu_monitoring #htop #monitoring #monitoring_tool #nvidia #nvidia_smi #nvml #process_monitoring #resource_monitor #top
https://github.com/XuehaiPan/nvitop
GitHub
GitHub - XuehaiPan/nvitop: An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management. - XuehaiPan/nvitop
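Besides the interactive CLI, nvitop exposes a Python API for programmatic monitoring; a minimal sketch following the pattern in the project README:
```python
from nvitop import Device

# enumerate all visible NVIDIA GPUs and print a one-line status for each
for device in Device.all():
    print(device.name(), device.memory_used_human(), f"{device.gpu_utilization()}%")
```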
#cplusplus #compiler #cuda #jax #machine_learning #mlir #pytorch #runtime #spirv #tensorflow #vulkan
IREE is an MLIR-based compiler and runtime toolkit for running machine learning (ML) models on a wide range of devices, from datacenter servers to mobile and edge hardware. It lowers models into a portable intermediate representation, so one compilation pipeline can target many backends and the resulting artifacts can be deployed almost anywhere. The project is still maturing but is actively developed. IREE is useful for developers who need to deploy the same models efficiently across very different environments.
https://github.com/iree-org/iree
GitHub
GitHub - iree-org/iree: A retargetable MLIR-based machine learning compiler and runtime toolkit.
A retargetable MLIR-based machine learning compiler and runtime toolkit. - iree-org/iree
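A minimal sketch of the compile step using IREE's Python bindings; the `compile_str` helper and `target_backends` argument follow the project's documented API, but treat the exact names as assumptions to verify against the current docs:
```python
import iree.compiler  # from the iree-compiler Python package (assumed)

MLIR_SOURCE = """
func.func @double(%arg0: tensor<4xf32>) -> tensor<4xf32> {
  %0 = arith.addf %arg0, %arg0 : tensor<4xf32>
  return %0 : tensor<4xf32>
}
"""

# Compile the MLIR module into IREE's portable .vmfb format for a CPU target;
# changing target_backends retargets the same module to Vulkan, CUDA, etc.
vmfb = iree.compiler.compile_str(MLIR_SOURCE, target_backends=["llvm-cpu"])
with open("double.vmfb", "wb") as f:
    f.write(vmfb)
```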
#python #amd #cuda #gpt #inference #inferentia #llama #llm #llm_serving #llmops #mlops #model_serving #pytorch #rocm #tpu #trainium #transformer #xpu
vLLM is a library for fast, easy, and cost-efficient inference and serving of large language models (LLMs). Its speed comes from efficient memory management with PagedAttention, continuous batching, and optimized CUDA kernels. vLLM supports most popular open models and runs on a range of hardware, including NVIDIA GPUs, AMD CPUs and GPUs, TPUs, and AWS Inferentia/Trainium. It integrates seamlessly with Hugging Face models and offers several decoding algorithms, making it flexible for research and production serving alike. Install it with `pip install vllm`; detailed documentation is available on the project website.
https://github.com/vllm-project/vllm
GitHub
GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
A high-throughput and memory-efficient inference and serving engine for LLMs - vllm-project/vllm
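A minimal offline-inference example in the style of vLLM's quickstart; the model name here is just an illustration:
```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small model chosen for illustration
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```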
#cplusplus #cuda #d3d12 #glsl #hlsl #shaders #vulkan
Slang is a shading language designed for building and maintaining large shader codebases. You write shaders once and run them on D3D12, Vulkan, Metal, and other platforms without rewriting the code, while still being able to use the latest GPU features. Slang supports automatic differentiation for neural graphics and machine-learning workloads, provides a module system for organizing code and generics for specializing shaders, and integrates easily with existing HLSL and GLSL codebases. Comprehensive tooling, including IntelliSense and debugging support, rounds out the development experience across platforms.
https://github.com/shader-slang/slang
GitHub
GitHub - shader-slang/slang: Making it easier to work with shaders
Making it easier to work with shaders. Contribute to shader-slang/slang development by creating an account on GitHub.
#cplusplus #cublas #cuda #cudnn #gpu #mlops #networking #nvml #remote_access
SCUDA is a GPU-over-IP bridge: it lets machines without a GPU use GPUs attached to remote machines over the network. CUDA calls made locally are forwarded to the remote device, so you can develop and test GPU applications, train or fine-tune models, and run heavy data-processing jobs from a laptop without owning a powerful GPU, saving time and hardware costs.
https://github.com/kevmo314/scuda
GitHub
GitHub - kevmo314/scuda: SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.
SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. - kevmo314/scuda
#python #cuda #deepseek #deepseek_llm #deepseek_v3 #inference #llama #llama2 #llama3 #llama3_1 #llava #llm #llm_serving #moe #pytorch #transformer #vlm
SGLang is a fast serving framework for large language models and vision language models. Its backend runtime optimizes performance with features like RadixAttention prefix caching, continuous batching, and quantization, while its frontend language makes complex workflows such as chained generation calls and multi-modal inputs easy to express. SGLang supports many popular models and has extensive documentation and an active community, so you can get models running quickly and resolve issues as they come up.
https://github.com/sgl-project/sglang
GitHub
GitHub - sgl-project/sglang: SGLang is a fast serving framework for large language models and vision language models.
SGLang is a fast serving framework for large language models and vision language models. - sgl-project/sglang
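A sketch of SGLang's frontend language for a chained generation call, following the decorator style in the project docs; it assumes an SGLang server already running on the endpoint shown:
```python
import sglang as sgl

@sgl.function
def qa(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

# assumes a launched SGLang server, e.g. on localhost:30000
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = qa.run(question="What is continuous batching?")
print(state["answer"])
```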
#cplusplus #cpp #cuda #deep_learning #deep_learning_library #gpu #nvidia
CUTLASS is a collection of CUDA C++ templates and Python DSLs for high-performance linear algebra on NVIDIA GPUs. It decomposes matrix multiplication (GEMM) and related kernels into reusable, composable components, making it easier to build custom high-performance applications. CUTLASS supports a wide range of data types and architectures, including the new Blackwell SM100 generation, and exploits advanced hardware features like Tensor Cores, delivering large speedups for workloads in AI and scientific computing.
https://github.com/NVIDIA/cutlass
GitHub
GitHub - NVIDIA/cutlass: CUDA Templates and Python DSLs for High-Performance Linear Algebra
CUDA Templates and Python DSLs for High-Performance Linear Algebra - NVIDIA/cutlass
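Not CUTLASS itself, which is CUDA C++, but a NumPy toy illustrating the hierarchical tiling idea its templates apply at the threadblock, warp, and instruction levels:
```python
import numpy as np

def tiled_matmul(A: np.ndarray, B: np.ndarray, tile: int = 64) -> np.ndarray:
    """Block-tiled GEMM: each tile is the unit a GPU threadblock would own."""
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                # accumulate one tile-sized partial product (slices clamp at edges)
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A, B = np.random.rand(256, 128), np.random.rand(128, 192)
assert np.allclose(tiled_matmul(A, B), A @ B)
```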
#cplusplus #cuda #cutlass #gpu #pytorch
Flux is a communication-overlapping library that speeds up multi-GPU machine learning by running communication and computation at the same time. It supports tensor and expert parallelism for both training and inference, and is compatible with PyTorch across several NVIDIA GPU architectures. By overlapping the steps of sending data between GPUs with the calculations performed on each GPU, Flux hides communication latency and reduces overall step time, which matters most for large or communication-bound models.
https://github.com/bytedance/flux
GitHub
GitHub - bytedance/flux: A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A fast communication-overlapping library for tensor/expert parallelism on GPUs. - bytedance/flux
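Not Flux's API, but a generic PyTorch sketch of the overlap idea Flux automates and fuses into kernels: communication launched on a side CUDA stream while the default stream keeps computing:
```python
import torch

x = torch.randn(4096, 4096, device="cuda")
w = torch.randn(4096, 4096, device="cuda")

comm_stream = torch.cuda.Stream()
comm_stream.wait_stream(torch.cuda.current_stream())  # ensure x is ready before reading it

with torch.cuda.stream(comm_stream):
    gathered = x.clone()  # stand-in for an asynchronous all-gather / all-to-all

y = x @ w  # the matmul runs concurrently on the default stream

torch.cuda.current_stream().wait_stream(comm_stream)  # sync before consuming `gathered`
z = y + gathered
```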
#cplusplus #cuda #gpu #machine_learning #machine_learning_algorithms #nvidia
cuML - RAPIDS Machine Learning Library
https://github.com/rapidsai/cuml
GitHub
GitHub - rapidsai/cuml: cuML - RAPIDS Machine Learning Library
cuML - RAPIDS Machine Learning Library. Contribute to rapidsai/cuml development by creating an account on GitHub.
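cuML mirrors scikit-learn's estimator API, so porting existing code is often just an import change; a minimal sketch:
```python
import cupy as cp
from cuml.cluster import KMeans

X = cp.random.random((10_000, 16), dtype=cp.float32)  # data resident on the GPU
kmeans = KMeans(n_clusters=8).fit(X)
print(kmeans.cluster_centers_.shape)  # (8, 16)
```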
#cplusplus #assembly #assembly_language #avx512 #benchmark #coroutines #cpp #cpp_programming #cpp17 #cpp20 #cuda #gcc #google_benchmark #hpc #io_uring #linux_kernel #llvm #ptx #ranges #tutorial #tutorials
This repository teaches performance-oriented coding through worked examples in C++ 20, CUDA, PTX, and Assembly. It demonstrates optimization techniques, highlights common performance pitfalls, and ships Google Benchmark-based comparisons of alternative implementations, so you can measure which approach is fastest for a given task. Working through it can lead to significant speedups and better use of hardware resources.
https://github.com/ashvardanian/less_slow.cpp
GitHub
GitHub - ashvardanian/less_slow.cpp: Playing around "Less Slow" coding practices in C++ 20, C, CUDA, PTX, & Assembly, from numerics…
Playing around "Less Slow" coding practices in C++ 20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and use...
#cuda
DeepEP is an efficient expert-parallel communication library for Mixture-of-Experts (MoE) models. It provides high-throughput, low-latency all-to-all GPU kernels for MoE dispatch and combine, supports low-precision operations such as FP8, and moves data efficiently across both NVLink and RDMA interconnects. That makes it useful for both training and inference, where faster expert communication translates directly into faster end-to-end processing.
https://github.com/deepseek-ai/DeepEP
GitHub
GitHub - deepseek-ai/DeepEP: DeepEP: an efficient expert-parallel communication library
DeepEP: an efficient expert-parallel communication library - deepseek-ai/DeepEP
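Not DeepEP's API, but a generic torch.distributed sketch of the all-to-all "dispatch" step that expert-parallel libraries like DeepEP implement with optimized NVLink/RDMA kernels:
```python
import torch
import torch.distributed as dist

def dispatch_tokens(tokens: torch.Tensor) -> torch.Tensor:
    """Each rank exchanges equal-sized slices of its tokens with every other rank.

    Requires an initialized process group (e.g. launched via torchrun).
    """
    routed = torch.empty_like(tokens)
    dist.all_to_all_single(routed, tokens)
    return routed
```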
#rust #cuda
ZLUDA lets you run unmodified CUDA programs, originally built for NVIDIA GPUs, on AMD Radeon RX 5000-series and newer GPUs, aiming for near-native performance on non-NVIDIA hardware. The project is still under development and coverage is limited; Geekbench is among the few workloads known to work well, so many applications may not run correctly yet. It supports Windows and Linux but not macOS. If you have an AMD GPU and want to run CUDA applications without an NVIDIA card, ZLUDA opens up more hardware options for existing CUDA software.
https://github.com/vosen/ZLUDA
GitHub
GitHub - vosen/ZLUDA: CUDA on non-NVIDIA GPUs
CUDA on non-NVIDIA GPUs. Contribute to vosen/ZLUDA development by creating an account on GitHub.
#c_lang #cuda #cuda_driver_api #cuda_kernels #cuda_opengl
The CUDA Samples from NVIDIA demonstrate the features of CUDA Toolkit 12.9; you can get them from GitHub or as a ZIP download. They cover utilities, core concepts, CUDA libraries, and performance-optimization techniques. The samples are built with CMake on Linux, Windows, or Tegra devices, and a provided Python script runs them as an automated test suite. They are a practical way to learn CUDA programming, debug GPU code, and optimize applications for better performance on NVIDIA GPUs.
https://github.com/NVIDIA/cuda-samples
GitHub
GitHub - NVIDIA/cuda-samples: Samples for CUDA Developers which demonstrates features in CUDA Toolkit
Samples for CUDA Developers which demonstrates features in CUDA Toolkit - NVIDIA/cuda-samples