Unanswered 'beautifulsoup+html-parsing' Questions

2 votes

0 answers

628 views

Read local html files and convert to dataframe with python

I have a local directory on my machine with multiple HTML files, all with the following naming format: XXXXXXXX_XXXX-XX-XX.html, with the X representing numeric characters (the number of numeric ...

Simon

21

asked Jul 1, 2020 at 22:18
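
For the file-naming question above, a minimal sketch of how such names could be split into fields before building a dataframe. The excerpt is truncated, so the pattern is an assumption: one or more digits, an underscore, then a YYYY-MM-DD style date. The names `FILENAME_RE`, `parse_name`, `file_id`, and `date` are hypothetical and not taken from the question.

```python
import re
from pathlib import Path

# Assumed pattern: <digits>_<4 digits>-<2 digits>-<2 digits>.html
# The digit count of the first field is left open because the excerpt is truncated.
FILENAME_RE = re.compile(r"^(?P<file_id>\d+)_(?P<date>\d{4}-\d{2}-\d{2})\.html$")


def parse_name(path):
    """Return the fields encoded in a file name, or None if it does not match."""
    match = FILENAME_RE.match(Path(path).name)
    if match is None:
        return None
    return {"file_id": match["file_id"], "date": match["date"]}


# Illustrative names only; real code might iterate over Path(directory).glob("*.html").
candidates = ["12345678_2020-07-01.html", "notes.txt"]
rows = [row for row in map(parse_name, candidates) if row is not None]
```

A list of dictionaries like `rows` can be passed directly to `pandas.DataFrame(rows)` if a dataframe is the goal.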

2 votes

2 answers

189 views

How can I extract the links from HTML?

I'm trying to get the link to every article in this category on the SF Chronicle, but I'm not sure where to begin extracting the URLs. Here is my progress so far: from urllib.request ...

Andy

303

asked Feb 17, 2019 at 5:55

2 votes

1 answer

2k views

Beautiful Soup can not find all image tags in html (stops exactly at 5)

I am trying to use BeautifulSoup to get all the images of a site with a certain class. My issue is that when I run the code just to see if it can find each image, it only gets images 1-5. I ...

Mr.Magik

21

asked Apr 25, 2018 at 5:13

2 votes

2 answers

966 views

Web parsing with python beautifulsoup producing inconsistent result

I am trying to parse the table of this site. I am using Python's Beautiful Soup to do that. While it produces correct output on my Ubuntu 14.04 machine, it produces wrong output on my friend's ...

Shafaet

435

asked Mar 31, 2017 at 10:07

2 votes

1 answer

96 views

How to click one of the href links from output that doesn't have a particular word in it?

I've parsed a list of href links and their titles from a webpage. I want to click all the links that don't contain "[$]". Here is my code. from selenium.webdriver.common.keys import Keys from ...

Vasanth Prabakar

441

asked Mar 29, 2016 at 13:00

2 votes

1 answer

928 views

How to parse a web page containing CSS and HTML using python

I am trying to parse and extract some information from a web page that contains CSS and, of course, HTML. I am using cssutils and beautifulsoup for this. Let's say I want to find out the font size used for ...

R11

415

asked Jul 3, 2013 at 21:56

2 votes

1 answer

2k views

using lxml with beautiful soup

I'm having trouble making lxml work with Beautiful Soup. Running on OS X 10.8.4. To install lxml, I did port install py25-lxml and it installed fine. Now I'm getting this error when I try to use lxml ...

shivsta

145

asked Jun 21, 2013 at 21:24

1 vote

1 answer

47 views

Beautiful Soup only gets header of table

I am trying to import the data from a table on this website to a CSV: http://www.ameren.com/illinois/residential/supply-choice/renewables/interconnection-queue. I have tried many different solutions, ...

user22062084

11

asked Jun 12, 2023 at 16:40

1 vote

1 answer

32 views

Python: How can I get a list of li tags in BeautifulSoup4

I'm trying to scrape a Persian webpage and I want to get 3 li tags from a ul containing 6 of them. My problem is that every li has nested li tags in it, and when I use soup.find_all('li'), it finds ...

Seyedmahdi moosavyan

107

asked Jun 12, 2023 at 15:42
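
For the nested li question above, one possible approach is to search only the direct children of the outer ul by passing `recursive=False` to `find_all`. This is a sketch under assumptions, not the asker's markup: the HTML sample, the `id="menu"` attribute, and the variable names are all made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three top-level items, some with nested lists (6 li tags in total).
html = """
<ul id="menu">
  <li>One<ul><li>One-a</li></ul></li>
  <li>Two<ul><li>Two-a</li><li>Two-b</li></ul></li>
  <li>Three</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
outer = soup.find("ul", id="menu")

# recursive=False restricts the search to direct children of `outer`,
# so li tags inside the nested ul elements are skipped.
top_level = outer.find_all("li", recursive=False)

# Take only each item's own leading text, ignoring text in nested lists.
labels = [li.find(string=True, recursive=False).strip() for li in top_level]
```

Here `soup.find_all("li")` would still return every li in the document, while `top_level` holds only the outer items.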

1 vote

1 answer

45 views

Why is my code giving me an AttributeError?

I am trying to iterate through a couple levels of html to retrieve links associated with legislation. However, once I reach the 2nd level of links, instead of retrieving a list of links associated ...

justjanga

11

asked Apr 22, 2023 at 3:14

1 vote

1 answer

360 views

Trying to use pd.read_html to extract information and export data to a Pandas dataframe

I am trying to extract the information from the table on this Wikipedia page to automate data collection. Link to webpage: https://en.wikipedia.org/wiki/List_of_members_of_the_17th_Lok_Sabha I am ...

edgestorm517

11

asked Dec 31, 2022 at 1:44

1 vote

0 answers

284 views

Word count of text extracted from URL in Python

I am working on an NLP project that takes a URL as input and summarizes it using the gensim library. But as for metrics of the summary that comes as output, I want to calculate the word count of the ...

Param Dhingana

63

asked Nov 17, 2022 at 21:16

1 vote

0 answers

40 views

segmenting bs4.element.Tag

Is it possible to segment a bs4.element.Tag into several bs4.element.Tag? You can think of an application as the following: 1- The original bs4.element.Tag contains a paragraph. 2- We want to segment ...

A.M.

1,797

asked Feb 8, 2022 at 21:32

1 vote

1 answer

487 views

How to parse HTML with source mapping?

I want to use Python to parse HTML markup, and given one of the resultant DOM tree elements, get the start and end offsets of that element within the original, unmodified markup. For example, given ...

midrare

2,684

asked Sep 26, 2021 at 4:23

1 vote

0 answers

144 views

Beautifulsoup only returning metadata

Can someone help me understand why beautifulsoup seems to only be returning metadata? Here's my code: import requests from bs4 import BeautifulSoup #create a session client = requests.Session() #...

max

683

asked Jun 25, 2020 at 22:05
