Friday, 20 September 2024

Convert PDF files to Excel files using Python

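This script uses pdfplumber to extract the tables on each page of a PDF, and pandas (with the openpyxl engine) to write every table to its own sheet in an Excel workbook. Install the libraries first:

pip install pdfplumber pandas openpyxl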


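import pdfplumber
import pandas as pd


def pdf_to_excel(pdf_file, excel_file):
    """Extract every table from a PDF and write each one to its own Excel sheet."""
    all_tables = []
    with pdfplumber.open(pdf_file) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables; each table is a list of rows
            for table in page.extract_tables():
                if table:  # skip empty tables
                    all_tables.append(pd.DataFrame(table))

    # Write a placeholder sheet so the workbook is never empty
    if not all_tables:
        all_tables.append(pd.DataFrame([["No tables found"]]))

    with pd.ExcelWriter(excel_file, engine='openpyxl') as writer:
        for idx, df in enumerate(all_tables):
            # header=False writes the raw rows without an extra 0, 1, 2... header row
            df.to_excel(writer, sheet_name=f'Sheet{idx + 1}', index=False, header=False)


if __name__ == '__main__':
    pdf_to_excel('clcodingpdff.pdf', 'clcoding.xlsx')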
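The script above writes each table exactly as extracted, so the table's own header ends up as an ordinary first row. If your tables have a header row, a small helper can promote it to real column names. The sketch below assumes the first row of each table is the header; the function name `table_to_dataframe` and the `column_N` fallback names are hypothetical choices for illustration. It also replaces `None` cells, which pdfplumber can return for some empty cells, with empty strings.

```python
import pandas as pd


def table_to_dataframe(table):
    """Hypothetical helper: turn one extracted table (a list of rows) into a DataFrame.

    Assumes the first row holds the column headers, which is common for
    simple PDF tables but not guaranteed.
    """
    # Replace None cells with empty strings
    rows = [["" if cell is None else cell for cell in row] for row in table]
    header, body = rows[0], rows[1:]

    # Give blank header cells a fallback name and make repeated names unique
    columns, used = [], set()
    for i, cell in enumerate(header):
        base = cell.strip() or f"column_{i + 1}"
        name, n = base, 1
        while name in used:
            n += 1
            name = f"{base}_{n}"
        used.add(name)
        columns.append(name)

    return pd.DataFrame(body, columns=columns)


sample = [["Name", "Score", None], ["Ann", "90", None], ["Bob", None, "late"]]
df = table_to_dataframe(sample)
print(df.columns.tolist())  # ['Name', 'Score', 'column_3']
```

To use it in `pdf_to_excel`, replace `pd.DataFrame(table)` with `table_to_dataframe(table)` and drop `header=False`, so the promoted column names are written as the sheet's header row.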
