Tags
-
plotting
Pandas allows us to create a number of different plot types, using an API that wraps around Matplotlib. We can create line, bar, pie, scatter, and box plots, among others.
241 issues
-
grouping
Using Pandas groupby to split data into groups, apply aggregate functions, and combine results for powerful data summarization.
167 issues
-
datetime
Pandas offers a wide range of date and time functionality, from extracting components and calculating differences between dates to working with time series and resampling.
139 issues
-
csv
Reading from and writing to CSV files with Pandas, including options for delimiters, encoding, headers, and handling large files.
107 issues
-
pivot-table
A pivot table redisplays information from a data frame in a new way: It uses the unique values from one column as its index, the unique values from a second column for its columns, and then applies an aggregation function to all of the values in each row-column intersection.
103 issues
-
excel
Reading from and writing to Excel spreadsheets with Pandas, including working with multiple sheets, formatting, and large workbooks.
101 issues
-
window-functions
Window functions perform aggregate operations across either a growing number of rows (an "expanding" window) or a "sliding window" of rows (a "rolling" window).
82 issues
-
joins
Joining data frames together with "join" and "merge".
77 issues
-
strings
Working with text data in Pandas using the str accessor for operations like splitting, stripping, case conversion, and pattern matching.
71 issues
-
cleaning
Techniques for cleaning messy data in Pandas, including handling duplicates, fixing data types, correcting errors, and standardizing values.
68 issues
-
multi-index
In Pandas, rows and columns are each labeled with an index. Either index can have more than one level, in which case it is known as a "multi-index."
65 issues
-
multiple-files
Loading and combining data from multiple files into a single Pandas data frame using concat, glob patterns, and loops.
58 issues
-
plotly
Creating interactive, web-based visualizations from Pandas data using Plotly and Plotly Express.
57 issues
-
regular-expressions
Using regular expressions with Pandas string methods to search, extract, replace, and validate text patterns in data.
57 issues
-
filtering
Selecting subsets of data in Pandas using boolean indexing, loc, iloc, query, and other filtering techniques.
54 issues
-
web-scraping
Extracting data from websites and HTML pages into Pandas data frames using tools like read_html, Beautiful Soup, and requests.
35 issues
-
correlations
Calculating and interpreting correlations between columns in Pandas, using methods like Pearson, Spearman, and Kendall.
30 issues
-
sorting
Sorting data frames and series in Pandas by values or index, including multi-column sorts and custom sort orders.
27 issues
-
api
Retrieving data from web APIs and loading it into Pandas data frames for analysis, including REST APIs and public data sources.
25 issues
-
pyarrow
Using PyArrow as a backend for Pandas to improve performance, memory efficiency, and support for additional data types.
24 issues
-
memory-optimization
Techniques for reducing memory usage in Pandas, including choosing efficient dtypes, chunked reading, and categorical data.
24 issues
-
office-hours
Questions and answers from Bamboo Weekly office hours sessions, covering a variety of Pandas topics raised by subscribers.
23 issues
-
stack-unstack
Reshaping data frames using Pandas stack and unstack to pivot between long and wide formats via multi-level indexes.
21 issues
-
pipe
Using the Pandas pipe method to chain custom functions into data processing pipelines for cleaner, more modular code.
20 issues
-
formatting
Controlling how Pandas displays data, including number formatting, column widths, decimal places, and string representation of data frames.
18 issues
-
comprehensions
Using Python list, dict, and set comprehensions to create and transform data efficiently alongside Pandas.
18 issues
-
apply-function
Using Pandas' apply method to run custom functions across rows, columns, or entire data frames for flexible data transformations.
16 issues
-
missing-data
Detecting, handling, and filling missing (NaN/None) values in Pandas using methods like fillna, dropna, and isna.
16 issues
-
json
Reading, writing, and normalizing JSON data with Pandas, including handling nested structures and JSON lines format.
12 issues
-
index-operations
Working with Pandas indexes, including setting, resetting, reindexing, and manipulating row and column labels.
12 issues
-
speed-optimization
Techniques for making Pandas code run faster, including vectorization, avoiding loops, and leveraging optimized operations.
12 issues
-
seaborn
Creating statistical visualizations from Pandas data using Seaborn, a high-level plotting library built on Matplotlib.
11 issues
-
interpolation
Filling in missing values in Pandas using interpolation methods such as linear, polynomial, and time-based techniques.
8 issues
-
geopandas
Working with geospatial data using GeoPandas, which extends Pandas with support for geographic operations, shapefiles, and spatial joins.
8 issues
-
styling
Applying visual styles to Pandas data frames using the Styler API, including conditional formatting, color maps, and bar charts in cells.
8 issues
-
text
Reading and processing plain text files and text data with Pandas, including parsing, tokenizing, and text analysis.
6 issues
-
polars
Using Polars, a fast DataFrame library written in Rust, as an alternative or complement to Pandas for high-performance data analysis.
6 issues
-
renaming-columns
Renaming columns in Pandas data frames using rename, set_axis, and direct assignment for clearer, more consistent column names.
4 issues
-
pdf-extraction
Extracting tables and data from PDF files and loading them into Pandas data frames using libraries like tabula-py and camelot.
4 issues
-
marimo
Using Marimo, a reactive Python notebook, as an alternative to Jupyter for interactive data exploration with Pandas.
4 issues
-
where
Using the Pandas where and mask methods to conditionally replace values in a data frame based on boolean conditions.
2 issues
-
fireducks
Using FireDucks, a high-performance drop-in replacement for Pandas that accelerates data frame operations.
2 issues
-
xarray
Using xarray for labeled, multi-dimensional arrays that extend Pandas-like functionality to higher-dimensional data.
2 issues
-
parquet
Reading and writing Parquet files with Pandas. Parquet is a columnar storage format that offers fast I/O and efficient compression for large datasets.
2 issues
-
ai-queries
Using AI and large language models to query, analyze, and transform data in Pandas data frames.
2 issues
-
duckdb
Using DuckDB, an in-process analytical database, alongside Pandas for fast SQL-based querying of data frames and files.
2 issues
-
maps
Creating geographic maps and visualizations from Pandas data using libraries like Folium, Plotly, and GeoPandas.
2 issues
-
yaml
Reading and parsing YAML configuration and data files into Python structures for use with Pandas.
2 issues
-
agentic-coding
2 issues
-
cutting
Using Pandas cut and qcut to bin continuous data into discrete intervals or quantile-based buckets.
2 issues
-
outlier-detection
Identifying and handling outliers in Pandas data using statistical methods like z-scores, IQR, and visualization techniques.
2 issues
-
functions
Writing and using Python functions to organize and streamline data analysis workflows in Pandas.
2 issues
-
fixed-width-fields
Reading and parsing fixed-width text files with Pandas using read_fwf, where columns are defined by character positions rather than delimiters.
2 issues
-
chatbot
Building chatbots and conversational interfaces that use Pandas for data retrieval and analysis.
2 issues
-
dummy-values
Creating dummy (indicator) variables from categorical data using Pandas get_dummies for use in statistical modeling and machine learning.
2 issues
-
regression
Performing linear and other regression analyses on Pandas data to model relationships between variables and make predictions.
2 issues
-
encoding
Handling text encoding issues when reading and writing data in Pandas, including UTF-8, Latin-1, and other character encodings.
2 issues
-
sas
Reading SAS data files (.sas7bdat, .xpt) into Pandas and transitioning from SAS workflows to Python-based data analysis.
2 issues
-
method-chaining
Chaining multiple Pandas methods together in a single expression for cleaner, more readable data transformation pipelines.
2 issues
-
crosstabs
Using Pandas crosstab to compute cross-tabulations of two or more factors, summarizing relationships between categorical variables.
2 issues
-
great-tables
Using the Great Tables library to create beautiful, publication-quality tables from Pandas data frames.
2 issues
-
streamlit
Building interactive data apps and dashboards with Streamlit, powered by Pandas data frames.
2 issues
-
timing
Measuring and benchmarking the execution time of Pandas operations to identify bottlenecks and compare approaches.
1 issue
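
The pivot-table description above (one column's unique values become the index, a second column's unique values become the columns, and an aggregation function is applied at each row-column intersection) can be sketched as follows. This is a minimal illustration, not taken from any issue: the data frame and its column names (region, product, sales) are hypothetical, and "sum" is just one possible aggregation.

```python
import pandas as pd

# Hypothetical sales data, invented purely for illustration.
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "product": ["apples", "pears", "apples", "pears", "apples"],
    "sales": [10, 20, 30, 40, 50],
})

# Unique values of "region" become the index, unique values of
# "product" become the columns, and the "sales" values at each
# region/product intersection are aggregated with sum.
table = df.pivot_table(
    index="region",
    columns="product",
    values="sales",
    aggfunc="sum",
)

print(table)
```

Here the two North/apples rows (10 and 50) land in the same cell and are summed. Passing a different aggfunc, such as "mean", changes how the values in each cell are combined.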