All Questions
Tagged with beautifulsoup python
28,252 questions
1508 votes, 35 answers, 2.3m views
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)
I'm having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup.
The problem is that the error is not always ...
684 votes, 20 answers, 1.2m views
How to find elements by class
I'm having trouble parsing HTML elements by their "class" attribute using BeautifulSoup. The code looks like this:
soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs:
    if (div["...
560 votes, 13 answers, 1.2m views
UnicodeEncodeError: 'charmap' codec can't encode characters
I'm trying to scrape a website, but it gives me an error.
I'm using the following code:
import urllib.request
from bs4 import BeautifulSoup
get = urllib.request.urlopen("https://www.website.com/&...
443 votes, 22 answers, 788k views
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
...
soup = BeautifulSoup(html, "lxml")
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't ...
379 votes, 16 answers, 522k views
How to remove \xa0 from string in Python?
I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but I'm being left with a lot of \xa0 Unicode characters representing spaces. Is there an efficient way to remove ...
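A minimal sketch of two common approaches, assuming the goal is to turn each non-breaking space (U+00A0) into a regular space. The sample string is hypothetical and merely stands in for the output of get_text():

```python
import unicodedata

# Hypothetical sample text standing in for the result of get_text().
text = "Price:\xa0100\xa0USD"

# Option 1: replace each non-breaking space with a regular space.
replaced = text.replace("\xa0", " ")

# Option 2: NFKC normalization maps U+00A0 to a regular space, but it also
# rewrites other compatibility characters, so it changes more than Option 1.
normalized = unicodedata.normalize("NFKC", text)

print(replaced)
print(normalized)
```

Option 1 is the narrower change. Option 2 may be preferable only if the other normalization effects are also wanted.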
358 votes, 1 answer, 715k views
BeautifulSoup getting href [duplicate]
I have the following soup:
<a href="some_url">next</a>
<span class="class">...</span>
From this I want to extract the href, "some_url"
I can do ...
308 votes, 27 answers, 535k views
Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org [duplicate]
I'm practicing the code from 'Web Scraping with Python', and I keep having this certificate problem:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
pages = set()
def ...
229 votes, 5 answers, 273k views
TypeError: a bytes-like object is required, not 'str' in python and CSV
TypeError: a bytes-like object is required, not 'str'
I'm getting the above error while executing the below python code to save the HTML table data in a CSV file. How do I get rid of that error?
...
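The excerpt does not show the failing code, so the cause here is an assumption: in Python 3 this error often appears when the CSV file is opened in binary mode ("wb"), a habit carried over from Python 2. A minimal sketch of the text-mode alternative, with hypothetical rows and a temporary file path:

```python
import csv
import os
import tempfile

# Hypothetical rows standing in for scraped HTML table data.
rows = [["name", "price"], ["widget", "9.99"]]

path = os.path.join(tempfile.mkdtemp(), "table.csv")

# In Python 3 the csv module writes str, so the file is opened in text mode
# with newline="". Opening it with "wb" instead would make the write raise
# "TypeError: a bytes-like object is required, not 'str'".
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read the file back to check that the rows round-trip unchanged.
with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back)
```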
221 votes, 11 answers, 464k views
Extracting an attribute value with beautifulsoup
I am trying to extract the content of a single "value" attribute in a specific "input" tag on a webpage. I use the following code:
import urllib
f = urllib.urlopen("http://58....
219 votes, 13 answers, 560k views
Beautiful Soup and extracting a div and its contents by ID
soup.find("tagName", { "id" : "articlebody" })
Why does this NOT return the <div id="articlebody"> ... </div> tags and stuff in between? It returns nothing. And I know for a fact it ...
193 votes, 26 answers, 565k views
ImportError: No Module Named bs4 (BeautifulSoup) [duplicate]
I'm working in Python and using Flask. When I run my main Python file on my computer, it works perfectly, but when I activate venv and run the Flask Python file in the terminal, it says that my main ...
191 votes, 16 answers, 354k views
Retrieve links from a web page using Python and BeautifulSoup [closed]
How can I retrieve the links of a webpage and copy the url address of the links using Python?
185 votes, 7 answers, 356k views
How to find children of nodes using BeautifulSoup
I want to get all the <a> tags which are children of <li>:
<div>
  <li class="test">
    <a>link1</a>
    <ul>
      <li>
        <a>link2<...
167 votes, 9 answers, 96k views
Difference between BeautifulSoup and Scrapy crawler?
I want to build a website that compares product prices between Amazon and eBay.
Which of these will work better and why? I am somewhat familiar with BeautifulSoup but not so much with ...
166 votes, 10 answers, 332k views
Can we use XPath with BeautifulSoup?
I am using BeautifulSoup to scrape a URL, and I have the following code to find the td tag whose class is 'empformbody':
import urllib
import urllib2
from BeautifulSoup import BeautifulSoup
url = &...