- Method resolution order:
- HTMLParser
- sgmllib.SGMLParser
- markupbase.ParserBase
Methods defined here:
- __init__(self, formatter, verbose=0)
- Creates an instance of the HTMLParser class.
The formatter parameter is the formatter instance associated with
the parser; the verbose flag is passed on to sgmllib.SGMLParser.
- anchor_bgn(self, href, name, type)
- This method is called at the start of an anchor region.
The arguments correspond to the attributes of the <A> tag with
the same names. The default implementation maintains a list of
hyperlinks (defined by the HREF attribute for <A> tags) within
the document. The list of hyperlinks is available as the data
attribute anchorlist.
- anchor_end(self)
- This method is called at the end of an anchor region.
The default implementation adds a textual footnote marker, such as [1],
using an index into the list of hyperlinks created by the anchor_bgn() method.
- ddpop(self, bl=0)
- do_base(self, attrs)
- do_br(self, attrs)
- do_dd(self, attrs)
- do_dt(self, attrs)
- do_hr(self, attrs)
- do_img(self, attrs)
- do_isindex(self, attrs)
- do_li(self, attrs)
- do_link(self, attrs)
- do_meta(self, attrs)
- do_nextid(self, attrs)
- do_p(self, attrs)
- do_plaintext(self, attrs)
- end_a(self)
- end_address(self)
- end_b(self)
- end_blockquote(self)
- end_body(self)
- end_cite(self)
- end_code(self)
- end_dir(self)
- end_dl(self)
- end_em(self)
- end_h1(self)
- end_h2(self)
- end_h3(self)
- end_h4(self)
- end_h5(self)
- end_h6(self)
- end_head(self)
- end_html(self)
- end_i(self)
- end_kbd(self)
- end_listing(self)
- end_menu(self)
- end_ol(self)
- end_pre(self)
- end_samp(self)
- end_strong(self)
- end_title(self)
- end_tt(self)
- end_ul(self)
- end_var(self)
- end_xmp(self)
- error(self, message)
- handle_data(self, data)
- handle_image(self, src, alt, *args)
- This method is called to handle images.
The default implementation simply passes the alt value to the
handle_data() method.
- reset(self)
- save_bgn(self)
- Begins saving character data in a buffer instead of sending it
to the formatter object.
Retrieve the stored data via the save_end() method. Calls to the
save_bgn() / save_end() pair may not be nested.
- save_end(self)
- Ends buffering character data and returns all data saved since
the preceding call to the save_bgn() method.
If the nofill flag is false, whitespace is collapsed to single
spaces. A call to this method without a preceding call to the
save_bgn() method will raise a TypeError exception.
- start_a(self, attrs)
- start_address(self, attrs)
- start_b(self, attrs)
- start_blockquote(self, attrs)
- start_body(self, attrs)
- start_cite(self, attrs)
- start_code(self, attrs)
- start_dir(self, attrs)
- start_dl(self, attrs)
- start_em(self, attrs)
- start_h1(self, attrs)
- start_h2(self, attrs)
- start_h3(self, attrs)
- start_h4(self, attrs)
- start_h5(self, attrs)
- start_h6(self, attrs)
- start_head(self, attrs)
- start_html(self, attrs)
- start_i(self, attrs)
- start_kbd(self, attrs)
- start_listing(self, attrs)
- start_menu(self, attrs)
- start_ol(self, attrs)
- start_pre(self, attrs)
- start_samp(self, attrs)
- start_strong(self, attrs)
- start_title(self, attrs)
- start_tt(self, attrs)
- start_ul(self, attrs)
- start_var(self, attrs)
- start_xmp(self, attrs)
- unknown_endtag(self, tag)
- unknown_starttag(self, tag, attrs)
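htmllib and sgmllib were removed in Python 3, so the following is only a rough analogue built on the standard html.parser.HTMLParser class, not the htmllib API. It imitates the default behaviour described above: anchor_bgn() records each HREF in anchorlist, anchor_end() emits a footnote marker such as [1], handle_image() passes the ALT text to handle_data(), and save_bgn()/save_end() buffer the text of <title>. The class name AnchorCollector and the out list, which stands in for the formatter object, are hypothetical.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Rough Python 3 analogue of htmllib.HTMLParser's anchor, image and
    save_bgn()/save_end() behaviour. Text goes to a list, not a formatter."""

    def __init__(self):
        super().__init__()
        self.out = []         # hypothetical stand-in for the formatter
        self.anchorlist = []  # HREF values, in document order
        self.anchor = None    # HREF of the currently open anchor, if any
        self.savedata = None  # buffer used by save_bgn()/save_end()
        self.nofill = False   # False: save_end() collapses whitespace
        self.title = None

    # Buffering: divert character data until save_end() is called.
    def save_bgn(self):
        self.savedata = ""

    def save_end(self):
        data, self.savedata = self.savedata, None
        if not self.nofill:
            data = " ".join(data.split())
        return data

    def handle_data(self, data):
        if self.savedata is not None:
            self.savedata += data
        else:
            self.out.append(data)

    # Anchors: remember the HREF, add a "[n]" marker at the closing tag.
    def anchor_bgn(self, href, name, type_):
        self.anchor = href
        if href:
            self.anchorlist.append(href)

    def anchor_end(self):
        if self.anchor:
            self.handle_data("[%d]" % len(self.anchorlist))
            self.anchor = None

    # Images: treat the ALT text as ordinary character data.
    def handle_image(self, src, alt):
        self.handle_data(alt)

    # html.parser callbacks, routed to the htmllib-style methods above.
    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a":
            self.anchor_bgn(a.get("href") or "", a.get("name") or "",
                            a.get("type") or "")
        elif tag == "img":
            self.handle_image(a.get("src") or "", a.get("alt") or "")
        elif tag == "title":
            self.save_bgn()

    def handle_endtag(self, tag):
        if tag == "a":
            self.anchor_end()
        elif tag == "title":
            self.title = self.save_end()


parser = AnchorCollector()
parser.feed('<html><head><title>  My\n  Page </title></head><body>'
            'See <a href="http://a.example/">this</a> and '
            '<a href="http://b.example/">that</a> '
            '<img src="x.png" alt="(logo)"></body></html>')
parser.close()
print(parser.title)         # My Page
print(parser.anchorlist)    # ['http://a.example/', 'http://b.example/']
print("".join(parser.out))  # See this[1] and that[2] (logo)
```

Because anchor_end() and handle_image() both go through handle_data() in this sketch, their output would also be captured by the save buffer if they occurred while buffering was active.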
Data and other attributes defined here:
- entitydefs = {'AElig': '\xc6', 'Aacute': '\xc1', 'Acirc': '\xc2', 'Agrave': '\xc0', 'Alpha': '&#913;', 'Aring': '\xc5', 'Atilde': '\xc3', 'Auml': '\xc4', 'Beta': '&#914;', 'Ccedil': '\xc7', ...}
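The convert_entityref() docstring, inherited from sgmllib.SGMLParser, suggests tailoring entity handling through the entitydefs mapping instead of overriding methods. The following is a rough Python 3 analogue, since htmllib is not available there. With convert_charrefs=False, html.parser.HTMLParser reports each &name; reference in text to handle_entityref(), and the sketch resolves the name through its own small entitydefs dictionary. The class name, the mapping, and the choice to re-emit unknown references verbatim are assumptions made for illustration.

```python
from html.parser import HTMLParser


class EntityTailoredParser(HTMLParser):
    """Sketch: resolve &name; references through a custom mapping."""

    # Hypothetical mapping; only these entity names are replaced.
    entitydefs = {"amp": "&", "copy": "(c)"}

    def __init__(self):
        # convert_charrefs=False makes html.parser call handle_entityref()
        # instead of decoding entity references itself.
        super().__init__(convert_charrefs=False)
        self.out = []

    def convert_entityref(self, name):
        # Look the name up in the mapping; None means "unknown".
        return self.entitydefs.get(name)

    def handle_entityref(self, name):
        replacement = self.convert_entityref(name)
        if replacement is None:
            self.unknown_entityref(name)
        else:
            self.handle_data(replacement)

    def unknown_entityref(self, name):
        # Assumption for this sketch: keep unknown references as written.
        self.handle_data("&%s;" % name)

    def handle_data(self, data):
        self.out.append(data)


ep = EntityTailoredParser()
ep.feed("<p>Tom &amp; Jerry &copy; 1940 &bogus; end</p>")
ep.close()
print("".join(ep.out))  # Tom & Jerry (c) 1940 &bogus; end
```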
Methods inherited from sgmllib.SGMLParser:
- close(self)
- Handle the remaining data.
- convert_charref(self, name)
- Convert a character reference; may be overridden.
- convert_codepoint(self, codepoint)
- convert_entityref(self, name)
- Convert entity references.
As an alternative to overriding this method, one can tailor the
results by setting up the self.entitydefs mapping appropriately.
- feed(self, data)
- Feed some data to the parser.
Call this as often as you want, with as little or as much text
as you want (may include '\n'). (This just saves the text;
all the processing is done by goahead().)
- finish_endtag(self, tag)
- # Internal -- finish processing of end tag
- finish_shorttag(self, tag, data)
- # Internal -- finish parsing of <tag/data/ (same as <tag>data</tag>)
- finish_starttag(self, tag, attrs)
- # Internal -- finish processing of start tag
# Return -1 for unknown tag, 0 for open-only tag, 1 for balanced tag
- get_starttag_text(self)
- goahead(self, end)
- # Internal -- handle data as far as reasonable. May leave state
# and data to be processed by a subsequent call. If 'end' is
# true, force handling all data as if followed by EOF marker.
- handle_charref(self, name)
- Handle a character reference; there is no need to override this method.
- handle_comment(self, data)
- # Example -- handle comment, could be overridden
- handle_decl(self, decl)
- # Example -- handle declaration, could be overridden
- handle_endtag(self, tag, method)
- # Overridable -- handle end tag
- handle_entityref(self, name)
- Handle entity references; there is no need to override this method.
- handle_pi(self, data)
- # Example -- handle processing instruction, could be overridden
- handle_starttag(self, tag, method, attrs)
- # Overridable -- handle start tag
- parse_endtag(self, i)
- # Internal -- parse endtag
- parse_pi(self, i)
- # Internal -- parse processing instruction, return length or -1 if not terminated
- parse_starttag(self, i)
- # Internal -- handle starttag, return length or -1 if not terminated
- report_unbalanced(self, tag)
- # Example -- report an unbalanced </...> tag.
- setliteral(self, *args)
- Enter literal mode (CDATA).
Intended for derived classes only.
- setnomoretags(self)
- Enter literal mode (CDATA) till EOF.
Intended for derived classes only.
- unknown_charref(self, ref)
- unknown_entityref(self, ref)
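The start_*, end_* and do_* methods listed for HTMLParser are not called directly; sgmllib.SGMLParser finds them by name. finish_starttag() first looks for start_<tag>. That is a balanced tag: it is pushed on a stack so that a matching end_<tag> is expected, and the return value is 1. Failing that, it looks for do_<tag>, an open-only tag such as <br> or <img>, and returns 0. Otherwise it calls unknown_starttag() and returns -1. The standalone sketch below imitates that lookup with getattr(). The class name TagDispatcher is hypothetical, the sketch calls handlers directly instead of going through handle_starttag()/handle_endtag(), and its stray-end-tag handling is simplified. It is not sgmllib's code.

```python
class TagDispatcher:
    """Hypothetical sketch of sgmllib-style dispatch by method name."""

    def __init__(self):
        self.stack = []  # open balanced tags (those with a start_* method)
        self.log = []

    def finish_starttag(self, tag, attrs):
        method = getattr(self, "start_" + tag, None)
        if method is not None:
            self.stack.append(tag)  # balanced: expect a matching end tag
            method(attrs)
            return 1
        method = getattr(self, "do_" + tag, None)
        if method is not None:
            method(attrs)           # open-only: nothing to close later
            return 0
        self.unknown_starttag(tag, attrs)
        return -1

    def finish_endtag(self, tag):
        if tag not in self.stack:
            # Stray end tag: simplified compared with sgmllib.
            self.log.append("unbalanced </%s>" % tag)
            return
        # Close every tag opened after `tag`, then `tag` itself.
        while self.stack:
            top = self.stack.pop()
            method = getattr(self, "end_" + top, None)
            if method is not None:
                method()
            if top == tag:
                break

    def unknown_starttag(self, tag, attrs):
        self.log.append("unknown <%s>" % tag)

    # Example handlers, named the way HTMLParser names them.
    def start_b(self, attrs):
        self.log.append("start_b")

    def end_b(self):
        self.log.append("end_b")

    def start_i(self, attrs):
        self.log.append("start_i")

    def end_i(self):
        self.log.append("end_i")

    def do_br(self, attrs):
        self.log.append("do_br")


d = TagDispatcher()
print(d.finish_starttag("b", []))      # 1
print(d.finish_starttag("i", []))      # 1
print(d.finish_starttag("br", []))     # 0
print(d.finish_starttag("blink", []))  # -1
d.finish_endtag("b")                   # closes <i> first, then <b>
print(d.log)
# ['start_b', 'start_i', 'do_br', 'unknown <blink>', 'end_i', 'end_b']
```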
Data and other attributes inherited from sgmllib.SGMLParser:
- entity_or_charref = <_sre.SRE_Pattern object>
Methods inherited from markupbase.ParserBase:
- getpos(self)
- Return current line number and offset.
- parse_comment(self, i, report=1)
- # Internal -- parse comment, return length or -1 if not terminated
- parse_declaration(self, i)
- # Internal -- parse declaration (for use by subclasses).
- parse_marked_section(self, i, report=1)
- # Internal -- parse a marked section
# Override this to handle MS-word extension syntax <![if word]>content<![endif]>
- unknown_decl(self, data)
- # To be overridden -- handlers for unknown objects
- updatepos(self, i, j)
- # Internal -- update line number and offset. This should be
# called for each piece of data exactly once, in order -- in other
# words the concatenation of all the input strings to this
# function should be exactly the entire input.
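getpos() reports the (line number, offset) pair that updatepos() maintains. The standalone sketch below follows the rule given in the updatepos() comment. Because each slice rawdata[i:j] is accounted for exactly once and in order, counting the newlines in the slice is enough to advance the line number, and the offset is measured from the last newline. The class name PositionTracker is hypothetical, and the sketch assumes that line numbers start at 1 and offsets at 0.

```python
class PositionTracker:
    """Hypothetical sketch of markupbase-style line/offset bookkeeping."""

    def __init__(self, rawdata):
        self.rawdata = rawdata
        self.lineno = 1  # assumption: lines are counted from 1
        self.offset = 0  # assumption: offsets are counted from 0

    def getpos(self):
        return self.lineno, self.offset

    def updatepos(self, i, j):
        """Account for rawdata[i:j]; call once per slice, in order."""
        if i >= j:
            return j
        nlines = self.rawdata.count("\n", i, j)
        if nlines:
            self.lineno += nlines
            last_nl = self.rawdata.rindex("\n", i, j)
            self.offset = j - (last_nl + 1)
        else:
            self.offset += j - i
        return j


t = PositionTracker("<p>\nHello,\nworld</p>")
t.updatepos(0, 3)    # "<p>"
print(t.getpos())    # (1, 3)
t.updatepos(3, 16)   # "\nHello,\nworld"
print(t.getpos())    # (3, 5)
t.updatepos(16, 20)  # "</p>"
print(t.getpos())    # (3, 9)
```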