All Questions

161 questions with no upvoted or accepted answers
2 votes
0 answers
628 views

Read local html files and convert to dataframe with python

I have a local directory on my machine with multiple html files, all with the following naming format > XXXXXXXX_XXXX-XX-XX.html with the X representing numeric characters (the number of numeric ...
Simon's user avatar
  • 21
2 votes
2 answers
189 views

How can I extract the links from HTML?

I'm trying to get a link of every article in this category on the SF chronicle but I'm not sure as to where I should begin on extracting the URLs. Here is my progress so far: from urllib.request ...
Andy's user avatar
  • 303
2 votes
1 answer
2k views

Beautiful Soup can not find all image tags in html (stops exactly at 5)

I am trying to use BeautifulSoup to get all the images of a site with a certain class. My issue is that when I run the code just to see whether it can find each image, it only gets images 1-5. I ...
Mr.Magik's user avatar
2 votes
2 answers
966 views

Web parsing with python beautifulsoup producing inconsistent result

I am trying to parse the table of this site. I am using Python Beautiful Soup to do that. While it's producing correct output on my Ubuntu 14.04 machine, it's producing wrong output on my friend's ...
Shafaet's user avatar
  • 435
2 votes
1 answer
96 views

How to click one of the href links from output that doesn't have a particular word in it?

I've parsed a list of href links and their titles from a webpage. I want to click all the links that don't have the word "[$]". Here is my code. from selenium.webdriver.common.keys import Keys from ...
Vasanth Prabakar's user avatar
2 votes
1 answer
928 views

How to parse a web page containing CSS and HTML using python

I am trying to parse and extract some information from a web page that contains CSS and, of course, HTML. I am using cssutils and BeautifulSoup for this. Let's say I want to find out the font size used for ...
R11's user avatar
  • 415
2 votes
1 answer
2k views

using lxml with beautiful soup

I'm having trouble making lxml work with Beautiful Soup. Running on OS X 10.8.4. To install lxml, I ran port install py25-lxml and it installed fine. Now I'm getting this error when I try to use lxml ...
shivsta's user avatar
  • 145
1 vote
1 answer
47 views

Beautiful Soup only gets header of table

I am trying to import the data from a table on this website to a CSV: http://www.ameren.com/illinois/residential/supply-choice/renewables/interconnection-queue. I have tried many different solutions, ...
user22062084's user avatar
1 vote
1 answer
32 views

Python: How can I get a list of li tags in BeautifulSoup4

I'm trying to scrape a Persian webpage and I want to get 3 li tags from a ul containing 6 of them. My problem is that every li has nested li tags in it, and when I use soup.find_all('li'), it finds ...
Seyedmahdi moosavyan's user avatar
1 vote
1 answer
45 views

Why is my code giving me an AttributeError?

I am trying to iterate through a couple levels of html to retrieve links associated with legislation. However, once I reach the 2nd level of links, instead of retrieving a list of links associated ...
justjanga's user avatar
1 vote
1 answer
360 views

Trying to use pd.read_html to extract information and export data to a Pandas dataframe

I am trying to extract the information from the table on this Wikipedia page to automate data collection. Link to webpage: https://en.wikipedia.org/wiki/List_of_members_of_the_17th_Lok_Sabha I am ...
edgestorm517's user avatar
1 vote
0 answers
284 views

Word count of text extracted from URL in Python

I am working on an NLP project that takes a URL as input and summarizes it using the gensim library. As for metrics of the summary that comes as output, I want to calculate the word count of the ...
Param Dhingana's user avatar
1 vote
0 answers
40 views

segmenting bs4.element.Tag

Is it possible to segment a bs4.element.Tag into several bs4.element.Tag? You can think of an application as the following: 1- The original bs4.element.Tag contains a paragraph. 2- We want to segment ...
A.M.'s user avatar
  • 1,797
1 vote
1 answer
487 views

How to parse HTML with source mapping?

I want to use Python to parse HTML markup, and given one of the resultant DOM tree elements, get the start and end offsets of that element within the original, unmodified markup. For example, given ...
midrare's user avatar
  • 2,684
1 vote
0 answers
144 views

Beautifulsoup only returning metadata

Can someone help me understand why beautifulsoup seems to only be returning metadata? Here's my code: import requests from bs4 import BeautifulSoup #create a session client = requests.Session() #...
max's user avatar
  • 683
