
Probably the best curated list of data science software in Python

Machine Learning

General Purpose Machine Learning

  • scikit-learn - Machine learning in Python. sklearn
  • Shogun - Machine learning toolbox.
  • xLearn - High Performance, Easy-to-use, and Scalable Machine Learning Package.
  • cuML - RAPIDS Machine Learning Library. sklearn GPU accelerated
  • modAL - Modular active learning framework for Python3. sklearn
  • Sparkit-learn - PySpark + scikit-learn = Sparkit-learn. sklearn Apache Spark based
  • mlpack - A scalable C++ machine learning library (Python bindings).
  • dlib - Toolkit for making real world machine learning and data analysis applications in C++ (Python bindings).
  • MLxtend - Extension and helper modules for Python's data analysis and machine learning libraries. sklearn
  • hyperlearn - 50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn, Statsmodels. sklearn PyTorch based/compatible
  • Reproducible Experiment Platform (REP) - Machine Learning toolbox for Humans. sklearn
  • scikit-multilearn - Multi-label classification for python. sklearn
  • seqlearn - Sequence classification toolkit for Python. sklearn
  • pystruct - Simple structured learning framework for Python. sklearn
  • sklearn-expertsys - Highly interpretable classifiers for scikit learn. sklearn
  • RuleFit - Implementation of the rulefit. sklearn
  • metric-learn - Metric learning algorithms in Python. sklearn
  • pyGAM - Generalized Additive Models in Python.
  • Karate Club - An unsupervised machine learning library for graph structured data.
  • Little Ball of Fur - A library for sampling graph structured data.
  • causalml - Uplift modeling and causal inference with machine learning algorithms. sklearn
  • Deepchecks - Validation & testing of ML models and data during model development, deployment, and production. sklearn

Automated Machine Learning

  • TPOT - Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. sklearn
  • auto-sklearn - An automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. sklearn
  • MLBox - A powerful Automated Machine Learning python library.

Ensemble Methods

  • ML-Ensemble - High performance ensemble learning. sklearn
  • Stacking - Simple and useful stacking library, written in Python. sklearn
  • stacked_generalization - Library for machine learning stacking generalization. sklearn
  • vecstack - Python package for stacking (machine learning technique). sklearn

Imbalanced Datasets

  • imbalanced-learn - Module to perform under sampling and over sampling with various techniques. sklearn
  • imbalanced-algorithms - Python-based implementations of algorithms for learning on imbalanced data. sklearn
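
As an illustration of the over-sampling idea these libraries implement, here is a minimal standard-library sketch of random over-sampling. The `random_oversample` function and the toy data are hypothetical and are not the API of imbalanced-learn or imbalanced-algorithms.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_samples = list(samples)
    out_labels = list(labels)
    for label, count in counts.items():
        # Indices of the original samples that belong to this class.
        pool = [i for i, y in enumerate(labels) if y == label]
        for _ in range(target - count):
            i = rng.choice(pool)
            out_samples.append(samples[i])
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: four "neg" samples and a single "pos" sample.
X = [[0.1], [0.2], [0.3], [0.4], [5.0]]
y = ["neg", "neg", "neg", "neg", "pos"]

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have the same number of samples
```

Plain duplication like this is only the simplest variant. The listed packages offer more elaborate strategies, so treat the sketch as a conceptual starting point.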

Random Forests

Extreme Learning Machine

  • Python-ELM - Extreme Learning Machine implementation in Python. sklearn
  • Python Extreme Learning Machine (ELM) - A machine learning technique used for classification/regression tasks.
  • hpelm - High performance implementation of Extreme Learning Machines (fast randomized neural networks). GPU accelerated

Kernel Methods

  • pyFM - Factorization machines in python. sklearn
  • fastFM - A library for Factorization Machines. sklearn
  • tffm - TensorFlow implementation of an arbitrary order Factorization Machine. sklearn
  • liquidSVM - An implementation of SVMs.
  • scikit-rvm - Relevance Vector Machine implementation using the scikit-learn API. sklearn
  • ThunderSVM - A fast SVM Library on GPUs and CPUs. sklearn GPU accelerated

Gradient Boosting

  • XGBoost - Scalable, Portable and Distributed Gradient Boosting. sklearn GPU accelerated
  • LightGBM - A fast, distributed, high performance gradient boosting. sklearn GPU accelerated
  • CatBoost - An open-source gradient boosting on decision trees library. sklearn GPU accelerated
  • ThunderGBM - Fast GBDTs and Random Forests on GPUs. sklearn GPU accelerated

Deep Learning

PyTorch

  • PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch based/compatible
  • torchvision - Datasets, Transforms and Models specific to Computer Vision. PyTorch based/compatible
  • torchtext - Data loaders and abstractions for text and NLP. PyTorch based/compatible
  • torchaudio - An audio library for PyTorch. PyTorch based/compatible
  • ignite - High-level library to help with training neural networks in PyTorch. PyTorch based/compatible
  • PyToune - A Keras-like framework and utilities for PyTorch.
  • skorch - A scikit-learn compatible neural network library that wraps pytorch. sklearn PyTorch based/compatible
  • PyTorchNet - An abstraction to train neural networks. PyTorch based/compatible
  • pytorch_geometric - Geometric Deep Learning Extension Library for PyTorch. PyTorch based/compatible
  • Catalyst - High-level utils for PyTorch DL & RL research. PyTorch based/compatible
  • pytorch_geometric_temporal - Temporal Extension Library for PyTorch Geometric. PyTorch based/compatible

TensorFlow

  • TensorFlow - Computation using data flow graphs for scalable machine learning by Google.
  • TensorLayer - Deep Learning and Reinforcement Learning Library for Researchers and Engineers.
  • TFLearn - Deep learning library featuring a higher-level API for TensorFlow.
  • Sonnet - TensorFlow-based neural network library.
  • tensorpack - A Neural Net Training Interface on TensorFlow.
  • Polyaxon - A platform that helps you build, manage and monitor deep learning models.
  • NeuPy - A Python library for Artificial Neural Networks and Deep Learning (previously: Theano compatible).
  • tfdeploy - Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy.
  • tensorflow-upstream - TensorFlow ROCm port. Possible to run on AMD GPU
  • TensorFlow Fold - Deep learning with dynamic computation graphs in TensorFlow.
  • tensorlm - Wrapper library for text generation / language models at char and word level with RNN.
  • TensorLight - A high-level framework for TensorFlow.
  • Mesh TensorFlow - Model Parallelism Made Easier.
  • Ludwig - A toolbox that lets you train and test deep learning models without writing code.
  • Keras - A high-level neural networks API running on top of TensorFlow. Keras compatible
  • keras-contrib - Keras community contributions. Keras compatible
  • Hyperas - Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization. Keras compatible
  • Elephas - Distributed Deep learning with Keras & Spark. Keras compatible
  • Hera - Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser. Keras compatible
  • Spektral - Deep learning on graphs. Keras compatible
  • qkeras - A quantization deep learning library. Keras compatible

MXNet

  • MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler. MXNet based
  • Gluon - A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet). MXNet based
  • MXbox - Simple, efficient and flexible vision toolbox for mxnet framework. MXNet based
  • gluon-cv - Provides implementations of the state-of-the-art deep learning models in computer vision. MXNet based
  • gluon-nlp - NLP made easy. MXNet based
  • Xfer - Transfer Learning library for Deep Neural Networks. MXNet based
  • MXNet - HIP Port of MXNet. MXNet based Possible to run on AMD GPU

Others

  • Tangent - Source-to-Source Debuggable Derivatives in Pure Python.
  • autograd - Efficiently computes derivatives of numpy code.
  • Myia - Deep Learning framework (pre-alpha).
  • nnabla - Neural Network Libraries by Sony.
  • Caffe - A fast open framework for deep learning.
  • hipCaffe - The HIP port of Caffe. Possible to run on AMD GPU

Web Scraping

  • BeautifulSoup - The easiest library for beginners to scrape static websites.
  • Scrapy - Fast and extensible scraping library. You can write rules and build a customized scraper without touching the core.
  • Selenium - Use the Selenium Python API to access all functionalities of Selenium WebDriver in an intuitive way, like a real user.
  • Pattern - High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization.
  • twitterscraper - Efficient library to scrape Twitter.

Data Manipulation

Data Containers

  • pandas - Powerful Python data analysis toolkit.
  • pandas_profiling - Create HTML profiling reports from pandas DataFrame objects
  • cuDF - GPU DataFrame Library. pandas compatible GPU accelerated
  • blaze - NumPy and pandas interface to Big Data. pandas compatible
  • pandasql - Allows you to query pandas DataFrames using SQL syntax. pandas compatible
  • pandas-gbq - pandas Google Big Query. pandas compatible
  • xpandas - Universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute.
  • pysparkling - A pure Python implementation of Apache Spark's RDD and DStream interfaces. Apache Spark based
  • Arctic - High performance datastore for time series and tick data.
  • datatable - Data.table for Python. R inspired/ported lib
  • koalas - pandas API on Apache Spark. pandas compatible
  • modin - Speed up your pandas workflows by changing a single line of code. pandas compatible
  • swifter - A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner.
  • pandas_flavor - A package which allows you to write your own flavor of pandas easily.
  • pandas-log - A package which provides feedback about basic pandas operations and helps find both business logic and performance issues.
  • vaex - Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second.

Pipelines

  • pdpipe - Easy pipelines for pandas DataFrames.
  • SSPipe - Python pipe (|) operator with support for DataFrames and Numpy and Pytorch.
  • pandas-ply - Functional data manipulation for pandas. pandas compatible
  • Dplython - Dplyr for Python. R inspired/ported lib
  • sklearn-pandas - pandas integration with sklearn. sklearn pandas compatible
  • Dataset - Helps you conveniently work with random or sequential batches of your data and define data processing.
  • pyjanitor - Clean APIs for data cleaning. pandas compatible
  • meza - A Python toolkit for processing tabular data.
  • Prodmodel - Build system for data science pipelines.
  • dopanda - Hints and tips for using pandas in an analysis environment. pandas compatible
  • CircleCI - Automates your software builds, tests, and deployments.
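
To show what a Python pipe (`|`) operator for data processing can look like, here is a minimal sketch built on the `__ror__` hook of Python's data model. The `Pipe` class and the sample rows are hypothetical; this is not how SSPipe or any other listed package is implemented.

```python
class Pipe:
    """Wrap a one-argument function so it can be applied with `value | Pipe(fn)`.

    Hypothetical helper for illustration only.
    """

    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, value):
        # Python calls this for `value | self` when the left operand does not
        # implement `|` for this type (for example, a plain list).
        return self.fn(value)


rows = [
    {"name": "a", "score": 3},
    {"name": "b", "score": None},
    {"name": "c", "score": 7},
]

# Each step takes the output of the previous one.
drop_missing = Pipe(lambda rs: [r for r in rs if r["score"] is not None])
total_score = Pipe(lambda rs: sum(r["score"] for r in rs))

result = rows | drop_missing | total_score
print(result)
```

Objects that define their own `|` behaviour for arbitrary operands (NumPy arrays, for instance) would take precedence over `__ror__`, which is one reason dedicated pipe libraries need extra handling.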

Feature Engineering

General

  • Featuretools - Automated feature engineering.
  • skl-groups - A scikit-learn addon to operate on set/"group"-based features. sklearn
  • Feature Forge - A set of tools for creating and testing machine learning features. sklearn
  • few - A feature engineering wrapper for sklearn. sklearn
  • scikit-mdr - A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction. sklearn
  • tsfresh - Automatic extraction of relevant features from time series. sklearn

Feature Selection

  • scikit-feature - Feature selection repository in python.
  • boruta_py - Implementations of the Boruta all-relevant feature selection method. sklearn
  • BoostARoota - A fast xgboost feature selection algorithm. sklearn
  • scikit-rebate - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning. sklearn

Visualization

General Purposes

  • Matplotlib - Plotting with Python.
  • seaborn - Statistical data visualization using matplotlib.
  • prettyplotlib - Painlessly create beautiful matplotlib plots.
  • python-ternary - Ternary plotting library for python with matplotlib.
  • missingno - Missing data visualization module for Python.
  • chartify - Python library that makes it easy for data scientists to create charts.
  • physt - Improved histograms.

Interactive plots

  • animatplot - A python package for animating plots built on matplotlib.
  • plotly - A Python library that makes interactive and publication-quality graphs.
  • Bokeh - Interactive Web Plotting for Python.
  • Altair - Declarative statistical visualization library for Python. Many data transformations can be done directly in the code that creates the graph.
  • bqplot - Plotting library for IPython/Jupyter notebooks.
  • pyecharts - Python interactive charting library, ported from ECharts, a charting and visualization library.

Map

  • folium - Makes it easy to visualize data on an interactive open street map
  • geemap - Python package for interactive mapping with Google Earth Engine (GEE)

Automatic Plotting

  • HoloViews - Stop plotting your data - annotate your data and let it visualize itself.
  • AutoViz - Visualize data automatically with one line of code (ideal for machine learning).
  • SweetViz - Visualize and compare datasets, target values and associations, with one line of code.

NLP

  • pyLDAvis - Interactive topic model visualization.

Deployment

  • datapane - A collection of APIs to turn scripts and notebooks into interactive reports.
  • binder - Enables sharing and executing Jupyter Notebooks.
  • fastapi - Modern, fast (high-performance) web framework for building APIs with Python.
  • streamlit - Makes it easy to deploy machine learning models.

Model Explanation

  • Shapley - A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
  • Alibi - Algorithms for monitoring and explaining machine learning models.
  • anchor - Code for "High-Precision Model-Agnostic Explanations" paper.
  • aequitas - Bias and Fairness Audit Toolkit.
  • Contrastive Explanation - Contrastive Explanation (Foil Trees). sklearn
  • yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection. sklearn
  • scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects. sklearn
  • shap - A unified approach to explain the output of any machine learning model. sklearn
  • ELI5 - A library for debugging/inspecting machine learning classifiers and explaining their predictions.
  • Lime - Explaining the predictions of any machine learning classifier. sklearn
  • FairML - FairML is a python toolbox auditing the machine learning models for bias. sklearn
  • L2X - Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation.
  • PDPbox - Partial dependence plot toolbox.
  • pyBreakDown - Python implementation of R package breakDown. sklearn R inspired/ported lib
  • PyCEbox - Python Individual Conditional Expectation Plot Toolbox.
  • Skater - Python Library for Model Interpretation.
  • model-analysis - Model analysis tools for TensorFlow.
  • themis-ml - A library that implements fairness-aware machine learning algorithms. sklearn
  • treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions. sklearn
  • AI Explainability 360 - Interpretability and explainability of data and machine learning models.
  • Auralisation - Auralisation of learned features in CNN (for audio).
  • CapsNet-Visualization - A visualization of the CapsNet layers to better understand how it works.
  • lucid - A collection of infrastructure and tools for research in neural network interpretability.
  • Netron - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).
  • FlashLight - Visualization Tool for your NeuralNetwork.
  • tensorboard-pytorch - Tensorboard for pytorch (and chainer, mxnet, numpy, ...).
  • mxboard - Logging MXNet data for visualization in TensorBoard. MXNet based

Reinforcement Learning

  • OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms.
  • Coach - Easy experimentation with state of the art Reinforcement Learning algorithms.
  • garage - A toolkit for reproducible reinforcement learning research.
  • OpenAI Baselines - High-quality implementations of reinforcement learning algorithms.
  • Stable Baselines - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
  • RLlib - Scalable Reinforcement Learning.
  • Horizon - A platform for Applied Reinforcement Learning.
  • TF-Agents - A library for Reinforcement Learning in TensorFlow.
  • TensorForce - A TensorFlow library for applied reinforcement learning.
  • TRFL - TensorFlow Reinforcement Learning.
  • Dopamine - A research framework for fast prototyping of reinforcement learning algorithms.
  • keras-rl - Deep Reinforcement Learning for Keras. Keras compatible
  • ChainerRL - A deep reinforcement learning library built on top of Chainer.

Probabilistic Methods

  • pomegranate - Probabilistic and graphical models for Python. GPU accelerated
  • pyro - A flexible, scalable deep probabilistic programming library built on PyTorch. PyTorch based/compatible
  • ZhuSuan - Bayesian Deep Learning. sklearn
  • PyMC - Bayesian Stochastic Modelling in Python.
  • PyMC3 - Python package for Bayesian statistical modeling and Probabilistic Machine Learning. Theano compatible
  • sampled - Decorator for reusable models in PyMC3.
  • Edward - A library for probabilistic modeling, inference, and criticism. sklearn
  • InferPy - Deep Probabilistic Modelling Made Easy. sklearn
  • GPflow - Gaussian processes in TensorFlow.
  • PyStan - Bayesian inference using the No-U-Turn sampler (Python interface).
  • sklearn-bayes - Python package for Bayesian Machine Learning with scikit-learn API. sklearn
  • skggm - Estimation of general graphical models. sklearn
  • pgmpy - A python library for working with Probabilistic Graphical Models.
  • skpro - Supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute. sklearn
  • Aboleth - A bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation.
  • PtStat - Probabilistic Programming and Statistical Inference in PyTorch. PyTorch based/compatible
  • PyVarInf - Bayesian Deep Learning methods with Variational Inference for PyTorch. PyTorch based/compatible
  • emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
  • hsmmlearn - A library for hidden semi-Markov models with explicit durations.
  • pyhsmm - Bayesian inference in HSMMs and HMMs.
  • GPyTorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch. PyTorch based/compatible
  • MXFusion - Modular Probabilistic Programming on MXNet. MXNet based
  • sklearn-crfsuite - A scikit-learn inspired API for CRFsuite. sklearn

Genetic Programming

  • gplearn - Genetic Programming in Python. sklearn
  • DEAP - Distributed Evolutionary Algorithms in Python.
  • karoo_gp - A Genetic Programming platform for Python with GPU support. sklearn
  • monkeys - A strongly-typed genetic programming framework for Python.
  • sklearn-genetic - Genetic feature selection module for scikit-learn. sklearn

Optimization

  • Spearmint - Bayesian optimization.
  • BoTorch - Bayesian optimization in PyTorch. PyTorch based/compatible
  • scikit-opt - Heuristic Algorithms for optimization.
  • SMAC3 - Sequential Model-based Algorithm Configuration.
  • Optunity - A library containing various optimizers for hyperparameter tuning.
  • hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
  • hyperopt-sklearn - Hyper-parameter optimization for sklearn. sklearn
  • sklearn-deap - Use evolutionary algorithms instead of gridsearch in scikit-learn. sklearn
  • sigopt_sklearn - SigOpt wrappers for scikit-learn methods. sklearn
  • Bayesian Optimization - A Python implementation of global optimization with gaussian processes.
  • SafeOpt - Safe Bayesian Optimization.
  • scikit-optimize - Sequential model-based optimization with a scipy.optimize interface.
  • Solid - A comprehensive gradient-free optimization framework written in Python.
  • PySwarms - A research toolkit for particle swarm optimization in Python.
  • Platypus - A Free and Open Source Python Library for Multiobjective Optimization.
  • GPflowOpt - Bayesian Optimization using GPflow. sklearn
  • POT - Python Optimal Transport library.
  • Talos - Hyperparameter Optimization for Keras Models.
  • nlopt - Library for nonlinear optimization (global and local, constrained or unconstrained).

Time Series

  • sktime - A unified framework for machine learning with time series. sklearn
  • tslearn - Machine learning toolkit dedicated to time-series data. sklearn
  • tick - Module for statistical learning, with a particular emphasis on time-dependent modelling. sklearn
  • Prophet - Automatic Forecasting Procedure.
  • PyFlux - Open source time series library for Python.
  • bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
  • luminol - Anomaly Detection and Correlation library.
  • dateutil - Powerful extensions to the standard datetime module.
  • maya - Makes it very easy to parse a string and change time zones.
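
For comparison with the date-handling helpers above, here is a minimal sketch of parsing a timestamp string and converting its time zone using only the standard `datetime` module. It does not use the dateutil or maya APIs, and the timestamp and the UTC+09:00 offset are arbitrary illustrative values.

```python
from datetime import datetime, timedelta, timezone

# Parse an ISO 8601 string that carries an explicit UTC offset.
utc_time = datetime.fromisoformat("2021-03-01T12:00:00+00:00")

# Convert to a fixed-offset zone (UTC+09:00, chosen arbitrarily for illustration).
utc_plus_9 = timezone(timedelta(hours=9))
local_time = utc_time.astimezone(utc_plus_9)

print(local_time.isoformat())  # 2021-03-01T21:00:00+09:00
```

The standard library covers well-formed ISO strings and fixed offsets. Libraries such as the two listed above are aimed at the looser, human-written formats this sketch does not handle.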

Natural Language Processing

  • NLTK - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
  • CLTK - The Classical Language Toolkit.
  • gensim - Topic Modelling for Humans.
  • PSI-Toolkit - A natural language processing toolkit.
  • pyMorfologik - Python binding for Morfologik.
  • skift - Scikit-learn wrappers for Python fastText. sklearn
  • Phonemizer - Simple text to phonemes converter for multiple languages.
  • flair - Very simple framework for state-of-the-art NLP.
  • spaCy - Industrial-Strength Natural Language Processing.

Computer Audition

  • librosa - Python library for audio and music analysis.
  • Yaafe - Audio features extraction.
  • aubio - A library for audio and music analysis.
  • Essentia - Library for audio and music analysis, description and synthesis.
  • LibXtract - A simple, portable, lightweight library of audio feature extraction functions.
  • Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals.
  • muda - A library for augmenting annotated audio data.
  • madmom - Python audio and music signal processing library.

Computer Vision

  • OpenCV - Open Source Computer Vision Library.
  • scikit-image - Image Processing SciKit (Toolbox for SciPy).
  • imgaug - Image augmentation for machine learning experiments.
  • imgaug_extension - Additional augmentations for imgaug.
  • Augmentor - Image augmentation library in Python for machine learning.
  • albumentations - Fast image augmentation library and easy to use wrapper around other libraries.

Statistics

  • pandas_summary - Extension to pandas dataframes describe function. pandas compatible
  • Pandas Profiling - Create HTML profiling reports from pandas DataFrame objects. pandas compatible
  • statsmodels - Statistical modeling and econometrics in Python.
  • stockstats - Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline stock statistics/indicators support.
  • weightedcalcs - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
  • scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests.
  • Alphalens - Performance analysis of predictive (alpha) stock factors.

Distributed Computing

  • Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. sklearn
  • PySpark - Exposes the Spark programming model to Python. Apache Spark based
  • Veles - Distributed machine learning platform.
  • Jubatus - Framework and Library for Distributed Online Machine Learning.
  • DMTK - Microsoft Distributed Machine Learning Toolkit.
  • PaddlePaddle - PArallel Distributed Deep LEarning.
  • dask-ml - Distributed and parallel machine learning. sklearn
  • Distributed - Distributed computation in Python.

Experimentation

  • Sacred - A tool to help you configure, organize, log and reproduce experiments.
  • Xcessiv - A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling.
  • Persimmon - A visual dataflow programming language for sklearn.
  • Ax - Adaptive Experimentation Platform. sklearn
  • Neptune - A lightweight ML experiment tracking, results visualization and management tool.

Evaluation

  • recmetrics - Library of useful metrics and plots for evaluating recommender systems.
  • Metrics - Machine learning evaluation metrics.
  • sklearn-evaluation - Model evaluation made easy: plots, tables and markdown reports. sklearn
  • AI Fairness 360 - Fairness metrics for datasets and ML models, explanations and algorithms to mitigate bias in datasets and models.

Computations

  • numpy - The fundamental package needed for scientific computing with Python.
  • Dask - Parallel computing with task scheduling. pandas compatible
  • bottleneck - Fast NumPy array functions written in C.
  • CuPy - NumPy-like API accelerated with CUDA.
  • scikit-tensor - Python library for multilinear algebra and tensor factorizations.
  • numdifftools - Solve automatic numerical differentiation problems in one or more variables.
  • quaternion - Add built-in support for quaternions to numpy.
  • adaptive - Tools for adaptive and parallel sampling of mathematical functions.

Spatial Analysis

  • GeoPandas - Python tools for geographic data. pandas compatible
  • PySal - Python Spatial Analysis Library.

Quantum Computing

  • PennyLane - Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.
  • QML - A Python Toolkit for Quantum Machine Learning.

Conversion

  • sklearn-porter - Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
  • ONNX - Open Neural Network Exchange.
  • MMdnn - A set of tools to help users inter-operate among different deep learning frameworks.

Source: https://github.com/krzjoa/awesome-python-data-science