Highest scored 'beautifulsoup+python' questions

1508 votes

35 answers

2.3m views

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I'm having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup. The problem is that the error is not always ...

Homunculus Reticulli

67.7k

asked Mar 30, 2012 at 12:06

684 votes

20 answers

1.2m views

How to find elements by class

I'm having trouble parsing HTML elements with a "class" attribute using BeautifulSoup. The code looks like this soup = BeautifulSoup(sdata) mydivs = soup.findAll('div') for div in mydivs: if (div["...

Neo

13.8k

asked Feb 18, 2011 at 11:58

560 votes

13 answers

1.2m views

UnicodeEncodeError: 'charmap' codec can't encode characters

I'm trying to scrape a website, but it gives me an error. I'm using the following code: import urllib.request from bs4 import BeautifulSoup get = urllib.request.urlopen("https://www.website.com/&...

SstrykerR

8,642

asked Nov 23, 2014 at 18:47

443 votes

22 answers

788k views

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

... soup = BeautifulSoup(html, "lxml") File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__ % ",".join(features)) bs4.FeatureNotFound: Couldn't ...

user3773048

6,159

asked Jun 25, 2014 at 0:12

379 votes

16 answers

522k views

How to remove \xa0 from string in Python?

I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but it seems like I'm being left with a lot of \xa0 Unicode representing spaces. Is there an efficient way to remove ...

zhuyxn

7,031

asked Jun 12, 2012 at 9:12
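
A minimal sketch of one way to handle the \xa0 characters this question describes, assuming the text is already a Python 3 str (for example, the result of get_text()). The sample string and the helper name strip_nbsp are hypothetical and do not come from the question.

```python
import unicodedata


def strip_nbsp(text: str) -> str:
    """Replace each non-breaking space (U+00A0) with an ordinary space."""
    return text.replace("\xa0", " ")


# Hypothetical sample standing in for text extracted from a page.
sample = "Dear\xa0Parent,\xa0hello"

# Targeted fix: only U+00A0 is touched.
cleaned = strip_nbsp(sample)

# Broader alternative: NFKC normalization also maps U+00A0 to a plain space,
# but it rewrites other compatibility characters (ligatures, full-width
# forms) as well, so it changes more than just the spaces.
normalized = unicodedata.normalize("NFKC", sample)
```

The replace() form is the narrower choice when only the spaces are a problem; normalization may be preferable when the text should be canonicalized anyway.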

358 votes

1 answer

715k views

BeautifulSoup getting href [duplicate]

I have the following soup: <a href="some_url">next</a> <span class="class">...</span> From this I want to extract the href, "some_url" I can do ...

dkgirl

4,729

asked Apr 28, 2011 at 8:25

308 votes

27 answers

535k views

Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org [duplicate]

I'm practicing the code from 'Web Scraping with Python', and I keep having this certificate problem: from urllib.request import urlopen from bs4 import BeautifulSoup import re pages = set() def ...

Catherine4j

3,120

asked May 8, 2018 at 14:32

229 votes

5 answers

273k views

TypeError: a bytes-like object is required, not 'str' in python and CSV

TypeError: a bytes-like object is required, not 'str' I'm getting the above error while executing the below python code to save the HTML table data in a CSV file. How do I get rid of that error? ...

ShivaGuntuku

5,434

asked Dec 15, 2015 at 7:20
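
The excerpt is truncated, so the exact cause in the asker's code is not shown. One common cause of this error, assumed here, is handing csv.writer a target opened in binary mode under Python 3. The sketch below reproduces that with in-memory streams; the rows are hypothetical table data.

```python
import csv
import io

# Hypothetical rows standing in for scraped HTML table data.
rows = [["name", "price"], ["widget", "9.99"]]

# csv.writer emits str in Python 3, so a binary target rejects it.
binary_target = io.BytesIO()
try:
    csv.writer(binary_target).writerow(rows[0])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# A text-mode target accepts the same rows. With a real file this
# corresponds to open(path, "w", newline="") instead of open(path, "wb").
text_target = io.StringIO(newline="")
csv.writer(text_target).writerows(rows)
output = text_target.getvalue()
```

Passing newline="" leaves line endings to the csv module, which is what the csv documentation recommends for files handed to csv.writer.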

221 votes

11 answers

464k views

Extracting an attribute value with beautifulsoup

I am trying to extract the content of a single "value" attribute in a specific "input" tag on a webpage. I use the following code: import urllib f = urllib.urlopen("http://58....

Barnabe

2,335

asked Apr 10, 2010 at 6:53

219 votes

13 answers

560k views

Beautiful Soup and extracting a div and its contents by ID

soup.find("tagName", { "id" : "articlebody" }) Why does this NOT return the <div id="articlebody"> ... </div> tags and stuff in between? It returns nothing. And I know for a fact it ...

Tony Stark

25.3k

asked Jan 25, 2010 at 22:46

193 votes

26 answers

565k views

ImportError: No Module Named bs4 (BeautifulSoup) [duplicate]

I'm working in Python and using Flask. When I run my main Python file on my computer, it works perfectly, but when I activate venv and run the Flask Python file in the terminal, it says that my main ...

harryt

2,093

asked Aug 2, 2012 at 18:47

191 votes

16 answers

354k views

retrieve links from web page using python and BeautifulSoup [closed]

How can I retrieve the links of a webpage and copy the url address of the links using Python?

NepUS

1,979

asked Jul 3, 2009 at 18:29

185 votes

7 answers

356k views

How to find children of nodes using BeautifulSoup

I want to get all the <a> tags which are children of <li>: <div> <li class="test"> <a>link1</a> <ul> <li> <a>link2<...

tej.tan

4,137

asked Jun 9, 2011 at 2:40

167 votes

9 answers

96k views

Difference between BeautifulSoup and Scrapy crawler?

I want to make a website that shows a comparison between Amazon and eBay product prices. Which of these will work better and why? I am somewhat familiar with BeautifulSoup but not so much with ...

Nishant Bhakta

2,967

asked Oct 30, 2013 at 15:43

166 votes

10 answers

332k views

can we use XPath with BeautifulSoup?

I am using BeautifulSoup to scrape an URL and I had the following code, to find the td tag whose class is 'empformbody': import urllib import urllib2 from BeautifulSoup import BeautifulSoup url = &...

Shiva Krishna Bavandla

26.5k

asked Jul 13, 2012 at 6:55
