All Questions
Tagged with beautifulsoup regex
694
questions
1
vote
1
answer
54
views
How to handle regex in BeautifulSoup / CSS selector?
I'm looking for a way to use a regex in BeautifulSoup to find elements that contain the text HO #, allowing for optional spaces and ignoring case.
check_ho_number3 = soup.select_one('td:-...
0
votes
2
answers
511
views
Extract Text from Unstructured HTML using Python and Beautiful Soup
For the HTML code below, how do I extract the content below aaa, bbb after the tag using regular expressions and Beautiful Soup with the Python Requests library?
<html>
<head></head>
...
0
votes
2
answers
114
views
How to find main price and discounted price in a webpage using selenium and python?
I am trying to find both the main price and the discounted price on a webpage, but I can get only one of them. I need a good pattern or method to extract all prices and discounted prices from ...
0
votes
1
answer
51
views
regex code to find email address within HTML script webscraping
I am trying to extract the phone number, address and email from a couple of corporate websites through web scraping.
My code for that is as follows:
l = 'https://www.zimmermanfinancialgroup.com/about'
address_t = []
...
-3
votes
1
answer
43
views
Manipulate string in python
I am scraping web content with BeautifulSoup in Python, and I would like to manipulate the following strings:
'Induktora 28" 36V/14 Ah | 16.5" Bordo'
'Induktora 28" 36V/14 Ah | 18" ...
-1
votes
2
answers
464
views
BeautifulSoup: Search and replace in the text parts of HTML
I want to do a search and replace on the textual part of the content of the HTML elements.
E.g., replacing foo with <b>bar</b> in
<div id="foo">foo <i>foo</i> ...
0
votes
1
answer
63
views
Python - Beautifulsoup - parse multiple span elements
I am trying to extract the title from 'span' elements.
Using the code below as an example, the output I am looking for is 6536 and 9319, which are part of the 'title', as seen below:
span aria-label="6536 users ...
0
votes
1
answer
70
views
What's the proper way to exclude uppercase word/s in regex python
Let's say I've scraped this from a website.
PARIS - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (2015). Ut enim ad minim ...
0
votes
1
answer
411
views
Regex pattern with accented characters
I am trying to get the words that start with a capital letter, regardless of whether the word contains a special character. Currently, my pattern only matches capital letters without accents.
I ...
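One possible approach to the accented-capitals problem described above, as a minimal sketch: in Python 3, `\w` in a `str` pattern is Unicode-aware, so `[^\W\d_]+` matches runs of letters including accented ones, and `str.isupper()` can then test the first character. The function name `capitalized_words` and the sample sentence are hypothetical, since the asker's own pattern and data are not shown in the excerpt.

```python
import re

def capitalized_words(text):
    # [^\W\d_]+ matches runs of Unicode letters (word characters
    # minus digits and underscore), so accented letters are included.
    words = re.findall(r"[^\W\d_]+", text)
    # Keep only words whose first letter is uppercase, accented or not.
    return [w for w in words if w[0].isupper()]

# Hypothetical sample text, not taken from the question.
print(capitalized_words("Ángel y Émile viven en París, no en oslo."))
```

This assumes the text uses precomposed characters; text with combining accents may need `unicodedata.normalize("NFC", text)` first.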
1
vote
2
answers
58
views
How to extract key info from <script> tag
I'm trying to extract the user id from this link
https://www.instagram.com/design.kaf/
using bs4 and regex.
I found a JSON key called "profile_id" inside a script tag,
but I can't even search that ...
1
vote
1
answer
35
views
Find string between two sets of characters or 3rd and 4th quotation marks
I have been playing with BeautifulSoup and re to collect only the links I need from a webpage.
I was able to cut the page content down to a <class 'bs4.element.ResultSet'>.
This dataset contains the ...
0
votes
1
answer
478
views
api request parameters are ignored
This code works as expected and shows 3 recent Wikipedia editors.
If I uncomment the second URL line, I should get Urmi27 three times, or None if the user is not listed.
But I get ...
1
vote
2
answers
51
views
Extracting RegEx pattern across list excluding other html code
I've written a script to pull a list of the report URL extensions available for text extraction.
I've used parsing and BeautifulSoup to extract the reference area for the latest report ...
-2
votes
2
answers
51
views
How to extract text from html in Python with BeautifulSoup4
I am trying to extract only the text, i.e. the filename, from the HTML tags below.
In the end I would like the output to be as below:
BeforeStructure.PNG
AfterStructure.PNG
Can you please guide me on how I ...
1
vote
2
answers
687
views
Adding line breaks after times in parentheses
I'm trying to clean up some data from web scraping.
This is an example of the information I'm working with:
Best Time
Adam Jones (w/ help) (6:34)Best Time
Kenny Gobbin (a) (2:38)Personal Best
Matt ...
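
A minimal sketch of one way to do this, assuming the goal is to start a new line whenever a time in parentheses such as (6:34) is followed directly by more text. The desired output is cut off in the excerpt, so that goal is an assumption, and the helper name `break_after_times` is hypothetical.

```python
import re

def break_after_times(text):
    # Match a parenthesised time like "(6:34)" and add a newline after it,
    # but only when non-whitespace text follows immediately.
    # Other parentheses such as "(w/ help)" or "(a)" do not match
    # because they do not contain digits:digits.
    return re.sub(r"(\(\d{1,2}:\d{2}\))(?=\S)", r"\1\n", text)

sample = (
    "Best Time\n"
    "Adam Jones (w/ help) (6:34)Best Time\n"
    "Kenny Gobbin (a) (2:38)Personal Best"
)
print(break_after_times(sample))
```

The lookahead `(?=\S)` avoids adding an extra blank line where a time is already followed by a line break.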