core.html
=========

.. py:module:: core.html

Attributes
----------

.. autoapisummary::

   core.html._StrT
   core.html.SANE_HTML_TAGS
   core.html.SANE_HTML_ATTRS
   core.html.VALID_PLAINTEXT_CHARACTERS
   core.html.EMPTY_LINK
   core.html.cleaner

Functions
---------

.. autoapisummary::

   core.html.sanitize_html
   core.html.sanitize_svg
   core.html.html_to_text

Module Contents
---------------

.. py:data:: _StrT

.. py:data:: SANE_HTML_TAGS
   :value: ['a', 'abbr', 'b', 'br', 'blockquote', 'code', 'del', 'div', 'em', 'i', 'img', 'hr', 'li', 'ol',...

.. py:data:: SANE_HTML_ATTRS

.. py:data:: VALID_PLAINTEXT_CHARACTERS

.. py:data:: EMPTY_LINK

.. py:data:: cleaner

.. py:function:: sanitize_html(html: str | None) -> markupsafe.Markup

   Takes the given HTML and strips every tag that is not on a whitelist,
   returning the result as ``markupsafe.Markup``.

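
The whitelist idea behind ``sanitize_html`` can be illustrated with a minimal, standard-library-only sketch. It is not the module's implementation: ``sanitize_html_sketch``, ``ALLOWED_TAGS``, ``ALLOWED_ATTRS`` and ``ALLOWED_PROTOCOLS`` are hypothetical names, the whitelists are a small made-up subset rather than the real ``SANE_HTML_TAGS``/``SANE_HTML_ATTRS``, and the sketch returns a plain ``str`` instead of ``markupsafe.Markup``.

.. code-block:: python

   from __future__ import annotations

   from html import escape
   from html.parser import HTMLParser

   # Hypothetical whitelists for illustration; NOT the real SANE_HTML_TAGS/ATTRS.
   ALLOWED_TAGS = {"a", "b", "br", "code", "em", "i", "li", "ol", "p", "ul"}
   ALLOWED_ATTRS = {"a": {"href", "title"}}
   ALLOWED_PROTOCOLS = ("http://", "https://", "mailto:")
   VOID_TAGS = {"br"}                  # tags that never get a closing tag
   DROP_CONTENT = {"script", "style"}  # tags whose text content is discarded too


   def _attr_ok(tag: str, name: str, value: str) -> bool:
       if name not in ALLOWED_ATTRS.get(tag, set()):
           return False
       if name == "href":
           # Simplified: only absolute URLs with a safe scheme survive.
           return value.strip().lower().startswith(ALLOWED_PROTOCOLS)
       return True


   class _WhitelistParser(HTMLParser):
       def __init__(self) -> None:
           super().__init__(convert_charrefs=True)
           self.parts: list[str] = []
           self._drop_depth = 0  # > 0 while inside <script> or <style>

       def handle_starttag(self, tag, attrs):
           if tag in DROP_CONTENT:
               self._drop_depth += 1
           elif tag in ALLOWED_TAGS and not self._drop_depth:
               kept = "".join(
                   f' {name}="{escape(value or "")}"'
                   for name, value in attrs
                   if _attr_ok(tag, name, value or "")
               )
               self.parts.append(f"<{tag}{kept}>")

       def handle_endtag(self, tag):
           if tag in DROP_CONTENT:
               self._drop_depth = max(0, self._drop_depth - 1)
           elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and not self._drop_depth:
               self.parts.append(f"</{tag}>")

       def handle_data(self, data):
           if not self._drop_depth:
               self.parts.append(escape(data))  # text is always escaped


   def sanitize_html_sketch(html: str | None) -> str:
       """Keep whitelisted tags/attributes, drop all other markup."""
       if not html:
           return ""
       parser = _WhitelistParser()
       parser.feed(html)
       parser.close()
       return "".join(parser.parts)

Text nodes are always escaped and the contents of ``<script>``/``<style>`` are dropped entirely, so disallowed markup disappears while its visible text survives; ``href`` values are additionally restricted to a few safe URL schemes.
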
.. py:function:: sanitize_svg(svg: _StrT) -> _StrT

   I couldn't find a good SVG sanitiser yet, so for now this function is
   a no-op, though it does try to detect harmful SVG files.

   I tried bleach/html5lib, but their lack of XML namespace support makes
   them a no-go.

   In the future we want a proper SVG sanitiser here!

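
The detection half of ``sanitize_svg`` could look roughly like the heuristic below. This is a hypothetical sketch (``looks_harmful`` is not part of ``core.html``, and the real checks may differ): it uses ``xml.etree.ElementTree``, which, unlike html5lib, keeps XML namespaces as ``{uri}local`` names, and flags ``<script>``/``<foreignObject>`` elements, ``on*`` event-handler attributes and ``javascript:`` links. Note that ``xml.etree`` is not hardened against maliciously constructed XML; the Python documentation points to ``defusedxml`` for untrusted input.

.. code-block:: python

   import xml.etree.ElementTree as ET

   DANGEROUS_ELEMENTS = {"script", "foreignobject"}


   def _local(name: str) -> str:
       # ElementTree encodes namespaced names as "{uri}local"; keep only "local".
       return name.rsplit("}", 1)[-1]


   def looks_harmful(svg: str) -> bool:
       """Hypothetical heuristic: True if the SVG contains obviously risky content."""
       try:
           root = ET.fromstring(svg)
       except ET.ParseError:
           return True  # treat unparsable input as suspicious
       for element in root.iter():
           if _local(element.tag).lower() in DANGEROUS_ELEMENTS:
               return True
           for name, value in element.attrib.items():
               attr = _local(name).lower()
               if attr.startswith("on"):  # onload, onclick, ...
                   return True
               # Matches both plain href and xlink:href.
               if attr == "href" and value.strip().lower().startswith("javascript:"):
                   return True
       return False

Because attribute and element names are compared by their local part, ``href`` and ``xlink:href`` are caught by the same check, which is exactly where namespace-unaware parsers fall short.
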
.. py:function:: html_to_text(html: str, *, unicode_snob: bool = True, body_width: int = 0, ignore_images: bool = True, single_line_break: bool = True, **config: Any) -> str

   Takes the given HTML and extracts its text as Markdown, using html2text
   under the hood. For all available options, see
   https://github.com/Alir3z4/html2text/blob/master/html2text/__init__.py
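
A minimal sketch of how the documented keyword arguments could map onto html2text's ``HTML2Text`` converter. It assumes ``html2text`` is installed and that extra ``**config`` entries are set as converter attributes; both the forwarding mechanism and the name ``html_to_text_sketch`` are assumptions for illustration, not the module's code.

.. code-block:: python

   from typing import Any

   import html2text  # third-party: pip install html2text


   def html_to_text_sketch(
       html: str,
       *,
       unicode_snob: bool = True,
       body_width: int = 0,
       ignore_images: bool = True,
       single_line_break: bool = True,
       **config: Any,
   ) -> str:
       converter = html2text.HTML2Text()
       converter.unicode_snob = unicode_snob            # Unicode instead of ASCII stand-ins
       converter.body_width = body_width                # 0 disables hard line wrapping
       converter.ignore_images = ignore_images          # drop images from the output
       converter.single_line_break = single_line_break  # one newline after block elements
       for option, value in config.items():
           # Assumption: any other html2text option is passed straight through.
           setattr(converter, option, value)
       return converter.handle(html)

For example, ``html_to_text_sketch(html, ignore_links=True)`` would set html2text's ``ignore_links`` option so that links come out as plain text.
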