How do you use PHP's strip_tags function to remove HTML tags?

Published: 2023-05-31 23:33:46

PHP's strip_tags function is a very convenient way to remove HTML tags. As its name says, it strips the HTML (and PHP) tags out of a string.

Below we walk through how to use strip_tags in practice.

1. Basic usage:

$str_with_tags = "<p>This is a <b>paragraph</b> with <strong>some text</strong></p>";
$str_without_tags = strip_tags($str_with_tags);
echo $str_without_tags;

Running the code above outputs:

This is a paragraph with some text

As you can see, strip_tags removes all HTML tags and keeps only the plain text.
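Keep in mind that strip_tags removes the tags themselves, not the content between them. For a script element this means the JavaScript code is left behind as ordinary text, as this minimal check shows:

```php
<?php
// strip_tags() removes the <script> and </script> tags,
// but the JavaScript between them is kept as ordinary text.
$input = "<p>Hello</p><script>alert('x');</script>";
$output = strip_tags($input);

echo $output, PHP_EOL; // Helloalert('x');
```
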

2. Keeping specific tags:

$str_with_tags = "<p>This is a <b>paragraph</b> with <strong>some text</strong></p>";
$str_with_specific_tags = strip_tags($str_with_tags, "<strong>");
echo $str_with_specific_tags;

Running the code above outputs:

This is a paragraph with <strong>some text</strong>

As you can see, when a tag is listed in the second parameter, strip_tags keeps that tag and removes all the others.
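Since PHP 7.4, the second parameter (allowed_tags) may also be passed as an array of bare tag names instead of a string. The sketch below assumes PHP 7.4 or later and shows that both forms give the same result for the example above:

```php
<?php
$str_with_tags = "<p>This is a <b>paragraph</b> with <strong>some text</strong></p>";

// String form: each allowed tag written as "<tag>"
$as_string = strip_tags($str_with_tags, "<strong>");

// Array form (PHP 7.4+): tag names without angle brackets
$as_array = strip_tags($str_with_tags, ["strong"]);

echo $as_array, PHP_EOL;            // This is a paragraph with <strong>some text</strong>
var_dump($as_string === $as_array); // bool(true)
```
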

3. Attributes of allowed tags:

$str_with_tags_and_attributes = '<p class="foo">This is a <b>paragraph</b> with <strong style="color: red;">some text</strong></p>';
$str_with_allowed_tags = strip_tags($str_with_tags_and_attributes, '<p><strong>');
echo $str_with_allowed_tags;

Running the code above outputs:

<p class="foo">This is a paragraph with <strong style="color: red;">some text</strong></p>

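As you can see, strip_tags does not remove the attributes of the tags listed in the second parameter: class="foo" and style="color: red;" are copied through unchanged. The function filters by tag name only. There is no parameter for allowing or removing individual attributes, so every attribute on an allowed tag survives, including event handlers such as onclick.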
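The PHP manual warns that strip_tags passes the attributes of allowed tags through untouched, including style and event-handler attributes. A minimal check of that behaviour:

```php
<?php
// strip_tags() filters by tag name only; attributes of allowed tags
// are copied through unchanged, including event handlers.
$html = '<p onclick="alert(1)" style="color: red;">Click me</p><b>bold</b>';
$result = strip_tags($html, '<p>');

echo $result, PHP_EOL; // <p onclick="alert(1)" style="color: red;">Click me</p>bold
```
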

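4. Allowing a whitelist:

To further guard against possible cross-site scripting (XSS) attacks, we can filter the incoming HTML string against a whitelist of allowed tags and attributes.

The second parameter of strip_tags defines only the tag whitelist. Attributes cannot be whitelisted this way, so they have to be handled in a separate step; in the example below a regular expression simply removes all of them.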

Here is an example:

$dirty_html = "<div><p>Some text</p><script>console.log('malicious script!');</script><h1>Header</h1><img src='http://example.com/malicious_code.png'></div>";

// Tag whitelist (strip_tags cannot filter attributes by itself)
$allowed_tags = "<p><b><i><img>";

// Step 1: remove every tag that is not in the whitelist
$clean_html = strip_tags($dirty_html, $allowed_tags);

// Step 2: strip all attributes from the remaining opening tags, e.g. <img src='...'> becomes <img>
$clean_html = preg_replace('/<([a-z][a-z0-9]*)\b[^>]*>/i', '<$1>', $clean_html);

echo $clean_html;

Running the code above outputs:

<p>Some text</p>console.log('malicious script!');Header<img>

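The regular expression in preg_replace matches each remaining opening tag and keeps only its name, so all attributes are dropped (for example, <img src='...'> becomes <img>); closing tags such as </p> are not affected. Notice also that <h1> is removed because it is not in the whitelist, and that strip_tags removes only the <script> tags, not the JavaScript between them, which remains in the output as plain text. Regex-based filtering like this is easy to get wrong, and the PHP manual explicitly advises against relying on strip_tags to prevent XSS attacks; for that, escape output with htmlspecialchars or use a dedicated HTML sanitizer.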
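Instead of regular expressions, the same whitelist idea can be applied to a parsed DOM tree. The sketch below assumes the ext-dom extension is available; sanitize_html and filter_children are hypothetical helper names, not PHP built-ins, and the code is an illustration rather than a replacement for a maintained sanitizer library. Unlike strip_tags, it drops script and style elements together with their content and removes every attribute that is not whitelisted.

```php
<?php
// Hypothetical helpers (not PHP built-ins): whitelist filtering on a DOM tree.
// Assumes the ext-dom extension is installed.

function filter_children(DOMNode $node, array $allowedTags, array $allowedAttrs): void
{
    // Copy the child list first, because the tree is modified while walking it.
    $children = [];
    foreach ($node->childNodes as $child) {
        $children[] = $child;
    }

    foreach ($children as $child) {
        if ($child instanceof DOMComment) {
            $node->removeChild($child); // drop HTML comments
            continue;
        }
        if (!($child instanceof DOMElement)) {
            continue; // text nodes are kept as they are
        }

        $tag = strtolower($child->nodeName);
        if (in_array($tag, ['script', 'style'], true)) {
            // Unlike strip_tags(), remove these elements together with their content.
            $node->removeChild($child);
            continue;
        }

        // Filter the subtree first, then decide what to do with this element.
        filter_children($child, $allowedTags, $allowedAttrs);

        if (!in_array($tag, $allowedTags, true)) {
            // Disallowed tag: move its children up one level, then remove the tag itself.
            while ($child->firstChild !== null) {
                $node->insertBefore($child->firstChild, $child);
            }
            $node->removeChild($child);
            continue;
        }

        // Allowed tag: remove every attribute that is not whitelisted.
        $names = [];
        foreach ($child->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array(strtolower($name), $allowedAttrs, true)) {
                $child->removeAttribute($name);
            }
        }
    }
}

function sanitize_html(string $html, array $allowedTags, array $allowedAttrs): string
{
    $doc = new DOMDocument();
    // The meta tag declares the input as UTF-8; the flags silence libxml
    // warnings about unknown or malformed markup.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $html . '</body></html>',
        LIBXML_NOERROR | LIBXML_NOWARNING
    );

    $body = $doc->getElementsByTagName('body')->item(0);
    filter_children($body, $allowedTags, $allowedAttrs);

    // Serialize only the (filtered) contents of <body>.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$dirty_html = "<div><p>Some text</p><script>console.log('malicious script!');</script><h1>Header</h1><img src='http://example.com/malicious_code.png'></div>";
$clean_html = sanitize_html($dirty_html, ['p', 'b', 'i', 'img'], ['alt']);
echo $clean_html, PHP_EOL;
```

With the sample input from this section, the script element and its JavaScript are removed entirely, the div and h1 wrappers are unwrapped so that "Header" remains as text, and the img tag loses its src attribute because only alt is whitelisted.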

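5. Browser output:

strip_tags only removes tags; it does not touch HTML entities such as &lt;, &gt; and &amp;. Entity-encoded text therefore stays encoded after strip_tags, and the browser displays it as the literal characters <, > and &.

If you need the actual characters, for example in a plain-text context, you can decode the entities with html_entity_decode (htmlspecialchars_decode does the same, but only for the special HTML characters such as &lt;, &gt;, &amp; and quotes). Note that decoding makes the text less safe, not more: once &lt;i&gt; has become <i>, a browser treats it as a real tag, so decoded text must be escaped again (for example with htmlspecialchars) before it is written into an HTML page.

Here is an example: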

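$dirty_html = "<p>Some <b>text</b> with HTML entities: &lt;br&gt; &amp; &lt;i&gt;</p>";

$clean_html = strip_tags($dirty_html);

// strip_tags() leaves the entities untouched
echo $clean_html, PHP_EOL; // outputs: Some text with HTML entities: &lt;br&gt; &amp; &lt;i&gt;

// html_entity_decode() turns the entities back into the characters they stand for
echo html_entity_decode($clean_html, ENT_QUOTES | ENT_HTML5), PHP_EOL; // outputs: Some text with HTML entities: <br> & <i>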

Running the code above outputs:

Some text with HTML entities: &lt;br&gt; &amp; &lt;i&gt;

and

Some text with HTML entities: <br> & <i>

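As you can see, the first line still contains the HTML entities: strip_tags left them unchanged, and a browser would display them as the literal text <br> & <i>. The second line contains the decoded characters; if it were written into an HTML page as-is, the browser would interpret <br> and <i> as real tags.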
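If the decoded text has to be shown in a web page again, it should be escaped at output time. The following sketch strips the tags, decodes the entities to get plain text, and then re-escapes the result with htmlspecialchars; the flags and the explicit 'UTF-8' charset are choices made for this example, not requirements:

```php
<?php
$dirty_html = "<p>Some <b>text</b> with HTML entities: &lt;br&gt; &amp; &lt;i&gt;</p>";

// Plain text: tags removed, entities decoded (for non-HTML contexts)
$plain = html_entity_decode(strip_tags($dirty_html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Escape again before inserting the text into an HTML page
$safe = htmlspecialchars($plain, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $plain, PHP_EOL; // Some text with HTML entities: <br> & <i>
echo $safe, PHP_EOL;  // Some text with HTML entities: &lt;br&gt; &amp; &lt;i&gt;
```

Escaping at the point of output keeps the stored text readable while making sure a browser shows <br> and <i> as text instead of parsing them as markup.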

Summary

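PHP's strip_tags function is a very convenient way to remove HTML tags and can be used in many different situations. Through its second parameter we can define a whitelist of tags to keep; the attributes of those tags are kept unchanged and have to be filtered separately if needed.

Keep in mind, however, that strip_tags neither decodes HTML entities nor removes the content of elements such as <script>, so on its own it is not a protection against XSS. Whenever text is written into an HTML page, escape it with htmlspecialchars.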