core.html
=========

.. py:module:: core.html


Attributes
----------

.. autoapisummary::

   core.html._StrT
   core.html.SANE_HTML_TAGS
   core.html.SANE_HTML_ATTRS
   core.html.VALID_PLAINTEXT_CHARACTERS
   core.html.EMPTY_LINK
   core.html.cleaner


Functions
---------

.. autoapisummary::

   core.html.sanitize_html
   core.html.sanitize_svg
   core.html.html_to_text


Module Contents
---------------

.. py:data:: _StrT

.. py:data:: SANE_HTML_TAGS
   :value: ['a', 'abbr', 'b', 'br', 'blockquote', 'code', 'del', 'div', 'em', 'i', 'img', 'hr', 'li', 'ol',...


.. py:data:: SANE_HTML_ATTRS

.. py:data:: VALID_PLAINTEXT_CHARACTERS

.. py:data:: EMPTY_LINK

.. py:data:: cleaner

.. py:function:: sanitize_html(html: str | None) -> markupsafe.Markup

   Takes the given HTML and strips every tag that is not on the
   allow-list.
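
To illustrate the allow-list idea, here is a minimal sketch that uses only the standard library. It is not the implementation behind ``sanitize_html`` (which is not shown on this page). The names ``ALLOWED_TAGS``, ``ALLOWED_ATTRS``, ``AllowListStripper`` and ``strip_disallowed_tags`` are hypothetical, and the tag set is a shortened stand-in for ``SANE_HTML_TAGS``. The sketch does not filter URL schemes such as ``javascript:``, so it should not be used for real sanitising.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, shortened allow-lists used only for this illustration.
ALLOWED_TAGS = {'a', 'b', 'br', 'em', 'i', 'p'}
ALLOWED_ATTRS = {'a': {'href'}}
VOID_TAGS = {'br'}  # tags that have no closing counterpart


class AllowListStripper(HTMLParser):
    """Keeps allow-listed tags; drops other tags but keeps their text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # the tag itself is dropped, its text content survives
        allowed = ALLOWED_ATTRS.get(tag, set())
        rendered = ''
        for name, value in attrs:
            if name in allowed:
                safe_value = escape(value or '', quote=True)
                rendered += f' {name}="{safe_value}"'
        self.parts.append(f'<{tag}{rendered}>')

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append(f'</{tag}>')

    def handle_data(self, data):
        # Text is re-escaped so it cannot reintroduce markup.
        self.parts.append(escape(data, quote=False))


def strip_disallowed_tags(html):
    """Returns ``html`` with every tag outside the allow-list removed."""
    if not html:
        return ''
    parser = AllowListStripper()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)
```

In this sketch, ``strip_disallowed_tags('<a href="/x" onclick="evil()">go</a>')`` keeps the link and its ``href`` but drops the ``onclick`` attribute.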


.. py:function:: sanitize_svg(svg: _StrT) -> _StrT

   I couldn't find a good SVG sanitiser yet, so for now this function
   returns its input unchanged, though it will try to detect SVG files
   that are harmful.

   I tried to go with bleach/html5lib, but the lack of XML namespace support
   rules those options out.

   In the future we want a proper SVG sanitiser here!
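
The docstring does not say how harmful files are detected. As a rough sketch of what a namespace-aware check could look like, the following uses ``xml.etree.ElementTree`` to flag ``script`` elements and ``on...`` event-handler attributes, and otherwise returns the input unchanged. The function names ``looks_harmful`` and ``passthrough_svg``, the heuristics, and the choice to raise ``ValueError`` are all assumptions made for illustration, not the behaviour of ``sanitize_svg``.

```python
import xml.etree.ElementTree as ET


def local_name(qualified_name):
    """Strips the '{namespace}' prefix ElementTree puts on names."""
    return qualified_name.rsplit('}', 1)[-1].lower()


def looks_harmful(svg):
    """Heuristic: True if the SVG has scripts or event-handler attributes."""
    root = ET.fromstring(svg)
    for element in root.iter():
        if local_name(element.tag) == 'script':
            return True
        for attribute in element.attrib:
            # Assumption: any attribute starting with "on" is an event handler.
            if local_name(attribute).startswith('on'):
                return True
    return False


def passthrough_svg(svg):
    """Returns the SVG unchanged, rejecting input flagged by the heuristic."""
    if looks_harmful(svg):
        raise ValueError('SVG contains active content')
    return svg
```

A check like this is no substitute for a real sanitiser. It misses other vectors such as ``javascript:`` links, and ``xml.etree.ElementTree`` itself is not hardened against maliciously constructed XML.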


.. py:function:: html_to_text(html: str, *, unicode_snob: bool = True, body_width: int = 0, ignore_images: bool = True, single_line_break: bool = True, **config: Any) -> str

   Takes the given HTML and extracts the text from it.

   The result is Markdown, produced by html2text. Have a look at
   https://github.com/Alir3z4/html2text/blob/master/html2text/__init__.py
   to see all options.