Friday, 20 September 2024

Convert PDF files to Excel files using Python


pip install pdfplumber pandas openpyxl


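import pdfplumber
import pandas as pd


def pdf_to_excel(pdf_file, excel_file):
    """Extract every table in a PDF and write each one to its own Excel sheet."""
    all_tables = []

    with pdfplumber.open(pdf_file) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables for this page;
            # each table is a list of rows, and each row is a list of cell values.
            for table in page.extract_tables():
                if table:  # skip empty results
                    all_tables.append(pd.DataFrame(table))

    # Write a placeholder sheet so the workbook is never empty.
    if not all_tables:
        all_tables.append(pd.DataFrame([["No tables found"]]))

    with pd.ExcelWriter(excel_file, engine='openpyxl') as writer:
        for idx, df in enumerate(all_tables):
            # The DataFrame's column labels are just 0, 1, 2, ..., so header=False
            # keeps them out of the sheet; the first extracted row (usually the
            # table's own header) lands in row 1 instead.
            df.to_excel(writer, sheet_name=f'Sheet{idx+1}', index=False, header=False)


# Replace these with your own input PDF and output workbook paths.
pdf_to_excel('clcodingpdff.pdf', 'clcoding.xlsx')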
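The function above turns each raw table into a DataFrame whose columns are simply numbered. pdfplumber hands back each table as a list of rows, and cells can come back as `None` (for example where cells are merged). If you would rather use the first row as real column names, a small cleanup step can run before building the DataFrame. This is a sketch under the assumption that the first row of each table really is a header; `promote_header` is a hypothetical helper (not part of pdfplumber or pandas) and uses only plain Python, so its output can be passed to `pd.DataFrame(body, columns=header)`.

```python
def promote_header(table):
    """Split a raw extracted table (a list of rows) into (header, body).

    Hypothetical helper: assumes the first row is the header row.
    None cells become empty strings, and blank or repeated header names
    are renamed so every column name is unique.
    """
    if not table:
        return [], []
    clean = [["" if cell is None else str(cell).strip() for cell in row] for row in table]
    header, body = clean[0], clean[1:]

    used = set()
    unique = []
    for i, name in enumerate(header):
        base = name or f"column_{i + 1}"  # fill in blank header cells
        candidate, n = base, 2
        while candidate in used:  # add a numeric suffix until the name is unique
            candidate = f"{base}_{n}"
            n += 1
        used.add(candidate)
        unique.append(candidate)
    return unique, body


raw = [["Name", "Score", None], ["Ann", "91", "x"], ["Bob", None, "y"]]
header, rows = promote_header(raw)
print(header)  # ['Name', 'Score', 'column_3']
print(rows)    # [['Ann', '91', 'x'], ['Bob', '', 'y']]
```

Inside `pdf_to_excel` you could then build each frame with `header, body = promote_header(table)` and `df = pd.DataFrame(body, columns=header)`, and call `to_excel` with `header=True` (the default) so the names appear in row 1.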

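Because `pdf_to_excel` works page by page, a table that continues onto the next page ends up split across two sheets. One common workaround, sketched here under the assumption that the PDF repeats the same header row at the top of each continuation, is to merge consecutive tables whose first rows match. `merge_continued_tables` is a hypothetical helper that works on the raw row lists before they become DataFrames; if your PDF does not repeat the header, this heuristic will not join the pieces.

```python
def merge_continued_tables(tables):
    """Join consecutive raw tables that share an identical first (header) row.

    Hypothetical helper: assumes a table split across pages repeats its
    header on every page. Repeated header rows are dropped from the result.
    """
    merged = []
    for table in tables:
        if not table:
            continue
        if merged and table[0] == merged[-1][0]:
            merged[-1].extend(table[1:])  # append body rows, skip repeated header
        else:
            # Copy the rows so extending later doesn't modify the caller's lists.
            merged.append([list(row) for row in table])
    return merged


page1 = [["Item", "Qty"], ["Pen", "3"]]
page2 = [["Item", "Qty"], ["Ink", "5"]]
page3 = [["Code", "Desc"], ["A1", "Alpha"]]
for t in merge_continued_tables([page1, page2, page3]):
    print(t)
# [['Item', 'Qty'], ['Pen', '3'], ['Ink', '5']]
# [['Code', 'Desc'], ['A1', 'Alpha']]
```

To use it, collect the `page.extract_tables()` results from every page into one list first, pass that list through `merge_continued_tables`, and then build one DataFrame (and one sheet) per merged table.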