🧰️ PRO
Introducing DAT Linux PRO tools. Enhance your DAT Linux with extra power tools, including backup/restore, app update notifications, app monitoring, a custom links tab, a dark theme, and more. One payment, perpetual license. Get PRO now!

Introduction


DAT Linux is a Linux distribution for data science. It brings together your favourite open-source data science tools and apps in a ready-to-run desktop environment. It’s based on Ubuntu 22.04, so it’s easy to install and use. The custom DAT Linux Control Panel provides a centralised one-stop shop for running and managing dozens of data science programs. Read the FAQ.

📚️ Check out the DAT Linux curated list of free online data science e-books!

DAT Linux is perfect for students, professionals, academics, or anyone interested in data science who doesn’t want to spend endless hours downloading, installing, configuring, and maintaining applications from a range of sources, each with different technical requirements and set-up challenges.

👍 Recommend DAT Linux on DistroWatch

Get started:



Need a customised DAT Linux ISO for your school/college/university? Get your own branded, customised data science distro here.

List of supported data science apps:


💳️ Please subscribe/donate to help support DAT Linux development

BIRT: Eclipse BIRT™ is an open-source reporting system for producing compelling BI reports.
ClickHouse: ClickHouse is an open-source, column-oriented DBMS for online analytical processing (OLAP).
DataCleaner: A data quality toolkit that lets you profile, correct, and enrich your data.
Datasette: Datasette is a tool for exploring and publishing data, both visually and with SQL.
DB Browser: DB Browser for SQLite is a visual, open-source tool to create, design, and edit database files compatible with SQLite.
DBeaver: A free, multi-platform database tool for developers, database administrators, analysts, and anyone who needs to work with databases.
Druid: Apache Druid is a real-time database for powering modern analytics applications.
D-Search: A convenient interface to the “webtools” R package for searching for datasets in all CRAN packages.
DuckDB: DuckDB is an in-process SQL OLAP database management system.
EGit: EGit is an Eclipse-based GUI for the Git version control system.
Emacs+ESS: Emacs Speaks Statistics (ESS) is an add-on package for GNU Emacs for interacting with statistical analysis programs such as R, S-Plus, SAS, Stata, and OpenBUGS/JAGS.
Gephi: Gephi is the leading visualization and exploration software for all kinds of graphs and networks.
Glue-viz: Glue is a UI and Python library for exploring relationships within and among related datasets.
Gnumeric: Gnumeric is a spreadsheet program that is part of the GNOME Free Software Desktop Project.
gnuplot: gnuplot is a command-line and GUI program that can generate two- and three-dimensional plots of functions, data, and data fits.
Grafana: Grafana is a popular open-source platform for data visualization and monitoring.
gVim: A GUI wrapper for the Vim screen-based text editor, with R plugins installed.
IPython: A command shell for interactive computing, with a convenient console launcher.
Julia: Julia is a high-level, high-performance, dynamic programming language.
Jupyter Notebook: The Jupyter Notebook is a web-based, interactive scientific computing platform.
JupyterLab: JupyterLab is the latest web-based interactive development environment for notebooks, code, and data.
KNIME: KNIME Analytics Platform is open-source software for data science.
LabPlot: Free, open-source, cross-platform data visualization and analysis software accessible to everyone.
LibreOffice Calc: LibreOffice Calc is the spreadsheet component of the LibreOffice software package.
Luigi: Luigi provides a framework for developing and managing data processing pipelines.
Meld: Meld is a visual file diff and merge tool.
Metabase: Metabase is an open-source business intelligence tool.
MOA: MOA is an open-source framework for big data stream mining. It includes a collection of machine learning algorithms.
OpenRefine: OpenRefine is an open-source desktop application for cleaning up data and transforming it into other formats.
Orange: Orange is a powerful platform for data analysis and visualization.
ParaView: ParaView is an open-source, multi-platform data analysis and visualization application.
Pluto: A Pluto notebook is made up of small blocks of Julia code (cells) that together form a reactive notebook.
PSPP: GNU PSPP is a program for statistical analysis of sampled data. It is a free-as-in-freedom replacement for the proprietary program SPSS.
QGIS: QGIS is a free and open-source Geographic Information System.
Quarto: Quarto® is an open-source scientific and technical publishing system built on Pandoc.
R: R is a free software environment for statistical computing and graphics.
RStudio: RStudio is an Integrated Development Environment (IDE) for R.
Scilab: Scilab is a free, open-source, cross-platform numerical computation package and a high-level, numerically oriented programming language.
Spyder: Spyder is a free and open-source scientific environment written in Python, for Python, and designed by and for scientists, engineers, and data analysts.
Superset: Apache Superset is a modern, enterprise-ready business intelligence web application.
Tabula: Tabula is a free tool for extracting data from PDF files into CSV and Excel files.
Veusz: Veusz is a scientific plotting and graphing program with a graphical user interface, designed to produce publication-ready 2D and 3D plots.
VisiData: VisiData is an interactive multitool for tabular data. It combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python, and it can handle millions of rows with ease.
VSCodium: VSCodium is a community-driven, freely licensed binary distribution of Microsoft’s VS Code editor (ready with plugins for R/RMarkdown, Python/Jupyter, and Julia).
Weka: Weka is a GUI and collection of machine learning algorithms for data mining tasks.
wxMaxima: wxMaxima is a document-based interface for the computer algebra system Maxima.
Zeppelin: A web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, Python, R, and more.

NUMPY BY EXAMPLE: A Beginner's Guide to Learning NumPy, by the DAT Linux team.
📖️ Read it free online, or 🛒️ BUY the PDF or EPUB e-book from Leanpub.