#other #analytics #api #artificial_intelligence #aws #cheatsheet #data_science #data_wrangling #database #deep_learning #linux #machine_learning #neural_network #online_course #python #r #reinforcement_learning #scikit_learn #sql #statistics #visualization
https://github.com/tirthajyoti/Data-science-best-resources
GitHub - tirthajyoti/Data-science-best-resources: Carefully curated resource links for data science in one place
#java #data_analysis #data_science #data_wrangling #datacleaning #datacleansing #datajournalism #datamining #journalism #opendata #reconciliation #wikidata
https://github.com/OpenRefine/OpenRefine
GitHub - OpenRefine/OpenRefine: OpenRefine is a free, open source power tool for working with messy data and improving it
#rust #ckan #cli #csv #data_engineering #data_wrangling #datapackage #excel #geocode #luau #opendata #parquet #polars #postgresql #snappy #sql #sqlite #tsv
https://github.com/jqnatividad/qsv
GitHub - dathere/qsv: Blazing-fast Data-Wrangling toolkit
#python #ai #cv #data_analytics #data_wrangling #embeddings #llm #llm_eval #machine_learning #mlops #multimodal
DataChain is a Python tool for managing and processing large volumes of data, aimed mainly at AI workloads. It organizes unstructured data from sources such as cloud storage or local files into structured datasets. Processing is written in plain Python, with no SQL or Spark required, and the data can be enriched with local AI models or API calls. Key features include parallel processing, out-of-memory computing, and optimized vector search. DataChain also integrates with PyTorch and TensorFlow, so datasets can be exported for further analysis or model training.
https://github.com/iterative/datachain
GitHub - datachain-ai/datachain: Analytics, Versioning and ETL for multimodal data: video, audio, PDFs, images