
R - Checking HTML for Formatting Tags (Bold, Italics, etc.)

I am using edgarWebR to parse 10-K (SEC EDGAR) filings. I am trying to write an algorithm to deduce whether each HTML element is normal text, a subheading, or a heading by checking h

Solution 1:

I think all you're looking for is whether a particular string contains HTML markup indicating that its text should be bold and/or italic.

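S <- '<p style="margin-top:18px;margin-bottom:0px"><font style="font-family:ARIAL" size="2"><b><i>Our quarterly operating results have fluctuated in the past and might continue to fluctuate, causing the value of our common stock to decline substantially. </i></b></font></p>'
# Bold: a <b> or <strong> tag, or a CSS "font-weight: bold" declaration in a style attribute.
# The (\\s|>) after the tag name keeps <b> from matching <br> or <body>.
grepl("<(b|strong)(\\s|>)|font-weight\\s*:\\s*bold", S, ignore.case = TRUE)
# [1] TRUE
# Italic: an <i> or <em> tag, or a CSS "font-style: italic" declaration.
grepl("<(i|em)(\\s|>)|font-style\\s*:\\s*italic", S, ignore.case = TRUE)
# [1] TRUE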
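
Since a filing contains many elements, the same check can be run over a whole character vector at once. The following is only a rough sketch in base R: `detect_formatting`, `bold_pattern`, `italic_pattern` and the sample `elements` are hypothetical names and data made up for illustration, not part of edgarWebR, and a regex check like this is a heuristic rather than a real HTML parse.

```r
# Hypothetical helper names and sample data, for illustration only.
# Patterns: a bold/italic tag, or the matching CSS declaration in a style attribute.
bold_pattern   <- "<(b|strong)(\\s|>)|font-weight\\s*:\\s*bold"
italic_pattern <- "<(i|em)(\\s|>)|font-style\\s*:\\s*italic"

# grepl() is vectorized, so `html` can hold one string per HTML element.
detect_formatting <- function(html) {
  data.frame(
    bold   = grepl(bold_pattern, html, ignore.case = TRUE),
    italic = grepl(italic_pattern, html, ignore.case = TRUE)
  )
}

elements <- c(
  '<p><b><i>Risk factor heading</i></b></p>',
  '<p style="font-weight: bold">Item 1A. Risk Factors</p>',
  '<p>Plain paragraph with a line break<br>and an <img src="x.png"> tag.</p>'
)

detect_formatting(elements)
```

The result has one row per element with logical `bold` and `italic` columns, which could then feed whatever rule you use to separate headings, subheadings and normal text. Note that the third element is not flagged even though it contains `<br>` and `<img>`, because the patterns require whitespace or `>` right after the tag name.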
