Friday, 9 August 2024

5 Hidden Gems in Pandas You Should Start Using Today

1. query() Method for Filtering Data
What it is: The query() method allows you to filter data in a DataFrame using a more readable and concise string-based expression.

Why it's useful: It avoids the verbosity of standard indexing and makes the code more readable, especially for complex conditions.

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 
                   'B': [10, 20, 30, 40]})
result = df.query('A > 2 & B < 40')
print(result)

#clcoding.com
   A   B
2  3  30
2. eval() Method for Efficient Calculations
What it is: The eval() method evaluates a string expression within the context of a DataFrame, allowing for efficient computation.

Why it's useful: It can speed up operations involving arithmetic or logical operations on DataFrame columns, especially with large datasets.

df['C'] = df.eval('A + B')
print(df)

#clcoding.com
   A   B   C
0  1  10  11
1  2  20  22
2  3  30  33
3  4  40  44


3. at and iat for Fast Access
What it is: at and iat are optimized methods for accessing scalar values in a DataFrame.

Why it's useful: These methods are much faster than using .loc[] or .iloc[] for individual cell access, making them ideal for performance-critical code.

value = df.at[2, 'B']  
print(value)
#clcoding.com
30

4. pipe() Method for Method Chaining
What it is: The pipe() method allows you to apply a function or sequence of functions to a DataFrame within a method chain.

Why it's useful: It improves code readability by keeping the DataFrame operations within a single fluent chain.

def add_constant(df, value):
    return df + value

df = df.pipe(add_constant, 10)
print(df)

#clcoding.com
    A   B   C
0  11  20  21
1  12  30  32
2  13  40  43
3  14  50  54
5. explode() for Expanding Lists in Cells
What it is: The explode() method expands a list-like column into separate rows.

Why it's useful: This is particularly useful when working with data that has embedded lists within cells and you need to analyze or visualize each item individually.

df = pd.DataFrame({'A': [1, 2], 
                   'B': [[10, 20, 30], [40, 50]]})
df_exploded = df.explode('B')
print(df_exploded)

#clcoding.com
   A   B
0  1  10
0  1  20
0  1  30
1  2  40
1  2  50



0 Comments:

Post a Comment

Popular Posts

Categories

100 Python Programs for Beginner (3) AI (33) Android (24) AngularJS (1) Assembly Language (2) aws (17) Azure (7) BI (10) book (4) Books (146) C (77) C# (12) C++ (82) Course (67) Coursera (205) Cybersecurity (24) data management (11) Data Science (109) Data Strucures (8) Deep Learning (13) Django (14) Downloads (3) edx (2) Engineering (14) Excel (13) Factorial (1) Finance (6) flask (3) flutter (1) FPL (17) Google (27) Hadoop (3) HTML&CSS (47) IBM (25) IoT (1) IS (25) Java (93) Leet Code (4) Machine Learning (50) Meta (18) MICHIGAN (5) microsoft (4) Nvidia (1) Pandas (3) PHP (20) Projects (29) Python (898) Python Coding Challenge (285) Questions (2) R (70) React (6) Scripting (1) security (3) Selenium Webdriver (2) Software (17) SQL (42) UX Research (1) web application (8)

Followers

Person climbing a staircase. Learn Data Science from Scratch: online program with 21 courses