Friday, 9 August 2024

5 Hidden Gems in Pandas You Should Start Using Today

Python Coding August 09, 2024 Data Science, Python

1. query() Method for Filtering Data

What it is: The query() method allows you to filter data in a DataFrame using a more readable and concise string-based expression.

Why it's useful: It avoids the repeated df[...] references and extra parentheses of boolean-mask indexing, which keeps complex conditions readable.

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [10, 20, 30, 40]})

# Keep rows where A is greater than 2 and B is less than 40
result = df.query('A > 2 and B < 40')
print(result)

Output:

   A   B
2  3  30
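Conditions rarely use hard-coded numbers in practice. As a minimal sketch, the example below shows how query() can read a Python variable by prefixing it with @, and compares the result with the equivalent boolean-mask filter. The variable name min_a is hypothetical and only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [10, 20, 30, 40]})

# Hypothetical threshold defined outside the DataFrame
min_a = 2

# '@' tells query() to look up a Python variable instead of a column
filtered = df.query('A > @min_a and B < 40')

# Equivalent boolean-mask version, for comparison
masked = df[(df['A'] > min_a) & (df['B'] < 40)]

print(filtered)
print(filtered.equals(masked))
```

Both versions select the same row; the query() form simply reads closer to the condition as you would say it out loud.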

2. eval() Method for Efficient Calculations

What it is: The eval() method evaluates a string expression within the context of a DataFrame, allowing for efficient computation.

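Why it's useful: On large DataFrames it can speed up arithmetic and boolean operations on columns, because pandas can hand the whole expression to the numexpr engine when it is installed. On small DataFrames the parsing overhead usually makes it slower than plain column arithmetic, so the main benefit there is readability.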

# Add column C as the row-wise sum of A and B
df['C'] = df.eval('A + B')
print(df)

Output:

   A   B   C
0  1  10  11
1  2  20  22
2  3  30  33
3  4  40  44
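eval() can also do the assignment itself. The sketch below assumes a fresh copy of the same small DataFrame and shows two variations: an expression of the form 'C = A + B', which returns a new DataFrame with the extra column, and a multi-line string that creates several columns at once. The column name D is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [10, 20, 30, 40]})

# Assignment inside the expression returns a new DataFrame;
# the original df is left unchanged (inplace defaults to False)
with_sum = df.eval('C = A + B')

# A multi-line string evaluates one assignment per line
with_both = df.eval('''
C = A + B
D = B - A
''')

print(with_sum)
print(with_both)
```

Returning a new DataFrame by default makes this form easy to drop into a method chain.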

3. at and iat for Fast Access

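What it is: at and iat are accessors optimized for reading or writing a single scalar value in a DataFrame. at looks the cell up by row and column label, while iat uses integer positions.

Why it's useful: These accessors skip much of the general-purpose indexing logic in .loc[] and .iloc[], so they are typically faster for individual cell access, which matters in performance-critical code such as tight loops.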

# Read the cell at row label 2, column 'B'
value = df.at[2, 'B']
print(value)

Output:

30
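To make the difference between the two accessors concrete, here is a small sketch that reads the same cell by label and by position, then writes a value with at. It rebuilds the A/B DataFrame so it runs on its own; the value 99 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [10, 20, 30, 40]})

# Label-based: row label 2, column label 'B'
by_label = df.at[2, 'B']

# Position-based: third row, second column (both zero-indexed)
by_position = df.iat[2, 1]

# at and iat can also set a single cell in place
df.at[0, 'B'] = 99

print(by_label, by_position)
print(df)
```

With the default integer index, the row label and row position happen to match; with a custom index (dates or strings, for example) only at accepts the label.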

4. pipe() Method for Method Chaining

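What it is: The pipe() method passes the DataFrame as the first argument to a function you supply, along with any extra arguments, so custom functions can sit inside a method chain. Several pipe() calls can be chained to apply a sequence of functions.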

Why it's useful: It improves code readability by keeping the DataFrame operations within a single fluent chain.

def add_constant(df, value):
    # Add the same constant to every cell
    return df + value

df = df.pipe(add_constant, 10)
print(df)

Output:

    A   B   C
0  11  20  21
1  12  30  32
2  13  40  43
3  14  50  54
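The readability benefit shows up once there is more than one step. The sketch below chains two pipe() calls and compares the result with the nested-call version. The scale helper and its factor parameter are hypothetical, added only for this illustration.

```python
import pandas as pd

def add_constant(df, value):
    # Add the same constant to every cell
    return df + value

def scale(df, factor=1):
    # Hypothetical helper: multiply every cell by a factor
    return df * factor

df = pd.DataFrame({'A': [1, 2],
                   'B': [10, 20]})

# Steps read left to right, in the order they run
chained = df.pipe(add_constant, 10).pipe(scale, factor=2)

# Same computation with nested calls reads inside out
nested = scale(add_constant(df, 10), factor=2)

print(chained)
print(chained.equals(nested))
```

Extra positional and keyword arguments given to pipe() are forwarded to the function after the DataFrame itself.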

5. explode() for Expanding Lists in Cells

What it is: The explode() method expands a list-like column into separate rows.

Why it's useful: This is particularly useful when working with data that has embedded lists within cells and you need to analyze or visualize each item individually.

df = pd.DataFrame({'A': [1, 2],
                   'B': [[10, 20, 30], [40, 50]]})

# Each list element in B gets its own row; A is repeated
df_exploded = df.explode('B')
print(df_exploded)

Output:

   A   B
0  1  10
0  1  20
0  1  30
1  2  40
1  2  50
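Notice that the exploded rows keep their original index labels (0, 0, 0, 1, 1), and the exploded column comes back with object dtype. As a minimal sketch, assuming pandas 1.1 or later for the ignore_index parameter, the follow-up below renumbers the rows and converts the column to integers.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2],
                   'B': [[10, 20, 30], [40, 50]]})

# ignore_index=True renumbers the result 0..n-1;
# astype converts the object-dtype column to integers
tidy = df.explode('B', ignore_index=True).astype({'B': 'int64'})

print(tidy)
print(tidy.dtypes)
```

A unique index and a numeric dtype tend to make later steps such as groupby or plotting behave more predictably.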