util: xml: Don't conflict with other libxml2 user callbacks

Message ID 5f7378fda41cb0f60ce1a0c59fb3717d04556d45.1519428075.git.crobinso@redhat.com
State New
Series
  • util: xml: Don't conflict with other libxml2 user callbacks

Commit Message

Cole Robinson Feb. 23, 2018, 11:21 p.m.
lxml is a popular Python XML processing library. It uses libxml2
behind the scenes, and registers custom callbacks via
xmlSetExternalEntityLoader. However, this can cause crashes if
an app uses both lxml and libxml2 together in the same process.

This is a known limitation of lxml and libxml2 generally. It also
prevents us from using lxml in virt-manager:

https://bugzilla.redhat.com/show_bug.cgi?id=1544019

However, it's easy enough to work around in libvirt by resetting the
EntityLoader callback to a known state before we ask libxml2 to
parse a file from disk, and restoring the original callback afterwards.

Signed-off-by: Cole Robinson <crobinso@redhat.com>

---
 src/util/virxml.c | 5 +++++
 1 file changed, 5 insertions(+)

-- 
2.14.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Comments

Ján Tomko Feb. 26, 2018, 8:48 a.m. | #1
On Fri, Feb 23, 2018 at 06:21:15PM -0500, Cole Robinson wrote:
>lxml is a popular python XML processing library. It uses libxml2
>behind the scenes, and registers custom callbacks via
>xmlSetExternalEntityLoader. However this can cause crashes if
>an app uses both lxml and libxml2 together in the same process.
>
>This is a known limitation of lxml and libxml2 generally. It also
>prevents us from using lxml in virt-manager:
>
>https://bugzilla.redhat.com/show_bug.cgi?id=1544019
>
>However it's easy enough to work around in libvirt, by unsetting the
>EntityLoader callback to a known state before we ask libxml2 to
>parse a file from disk.
>
>Signed-off-by: Cole Robinson <crobinso@redhat.com>
>---
> src/util/virxml.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
>diff --git a/src/util/virxml.c b/src/util/virxml.c
>index 6e87605ea..3e01794f9 100644
>--- a/src/util/virxml.c
>+++ b/src/util/virxml.c
>@@ -810,9 +810,14 @@ virXMLParseHelper(int domcode,
>     pctxt->sax->error = catchXMLError;
>
>     if (filename) {
>+        /* Reset any libxml2 file callbacks, other libs (like python lxml)
>+         * may have set their own which can get crashy */
>+        xmlExternalEntityLoader origloader = xmlGetExternalEntityLoader();
>+        xmlSetExternalEntityLoader(xmlNoNetExternalEntityLoader);
>         xml = xmlCtxtReadFile(pctxt, filename, NULL,
>                               XML_PARSE_NONET |
>                               XML_PARSE_NOWARNING);
>+        xmlSetExternalEntityLoader(origloader);


This does not look thread-safe at all - what if two libvirt threads
try to parse some XML at the same time?

Jan

>     } else {
>         xml = xmlCtxtReadDoc(pctxt, BAD_CAST xmlStr, url, NULL,
>                              XML_PARSE_NONET |
>-- 
>2.14.3
>
>--
>libvir-list mailing list
>libvir-list@redhat.com
>https://www.redhat.com/mailman/listinfo/libvir-list
Daniel P. Berrangé Feb. 26, 2018, 9:29 a.m. | #2
On Mon, Feb 26, 2018 at 09:48:39AM +0100, Ján Tomko wrote:
> On Fri, Feb 23, 2018 at 06:21:15PM -0500, Cole Robinson wrote:
> > lxml is a popular python XML processing library. It uses libxml2
> > behind the scenes, and registers custom callbacks via
> > xmlSetExternalEntityLoader. However this can cause crashes if
> > an app uses both lxml and libxml2 together in the same process.
> > 
> > This is a known limitation of lxml and libxml2 generally. It also
> > prevents us from using lxml in virt-manager:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1544019
> > 
> > However it's easy enough to work around in libvirt, by unsetting the
> > EntityLoader callback to a known state before we ask libxml2 to
> > parse a file from disk.
> > 
> > Signed-off-by: Cole Robinson <crobinso@redhat.com>
> > ---
> > src/util/virxml.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> > 
> > diff --git a/src/util/virxml.c b/src/util/virxml.c
> > index 6e87605ea..3e01794f9 100644
> > --- a/src/util/virxml.c
> > +++ b/src/util/virxml.c
> > @@ -810,9 +810,14 @@ virXMLParseHelper(int domcode,
> >     pctxt->sax->error = catchXMLError;
> > 
> >     if (filename) {
> > +        /* Reset any libxml2 file callbacks, other libs (like python lxml)
> > +         * may have set their own which can get crashy */
> > +        xmlExternalEntityLoader origloader = xmlGetExternalEntityLoader();
> > +        xmlSetExternalEntityLoader(xmlNoNetExternalEntityLoader);
> >         xml = xmlCtxtReadFile(pctxt, filename, NULL,
> >                               XML_PARSE_NONET |
> >                               XML_PARSE_NOWARNING);
> > +        xmlSetExternalEntityLoader(origloader);
> 
> This does not look thread-safe at all - what if two libvirt threads
> try to parse some XML at the same time?

Indeed, that is not thread-safe. I checked the libxml2 code, and
xmlSetExternalEntityLoader just sets a static variable, not a
thread-local one.

Regards,
Daniel
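
To illustrate the concern raised above, here is a minimal C sketch. It is
not libvirt or libxml2 code: it models the process-wide loader as a static
function pointer and serializes the save/replace/restore sequence with a
mutex. All names (entity_loader, current_loader, foreign_loader,
safe_loader, parse_with_safe_loader, run_demo) are hypothetical stand-ins.
It also assumes that every writer of the slot goes through the helper. A
lock taken inside libvirt would not stop another library, such as lxml,
from changing the real loader concurrently, so this shows the shape of the
problem rather than a complete fix.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for libxml2's xmlExternalEntityLoader type. */
typedef const char *(*entity_loader)(const char *url);

/* Loader registered by "another library" (plays the role of lxml). */
static const char *foreign_loader(const char *url) { (void)url; return "foreign"; }
/* Known-good loader (plays the role of xmlNoNetExternalEntityLoader). */
static const char *safe_loader(const char *url) { (void)url; return "safe"; }

/* One process-wide slot shared by all threads, modelling the static variable. */
static entity_loader current_loader = foreign_loader;

/* Serializes every save/replace/restore sequence done through the helper. */
static pthread_mutex_t loader_lock = PTHREAD_MUTEX_INITIALIZER;

static const char *parse_with_safe_loader(const char *url)
{
    pthread_mutex_lock(&loader_lock);
    entity_loader orig = current_loader;      /* save whatever was registered */
    current_loader = safe_loader;             /* reset to a known state */
    const char *result = current_loader(url); /* stands in for the parse call */
    current_loader = orig;                    /* restore the saved loader */
    pthread_mutex_unlock(&loader_lock);
    return result;
}

static void *worker(void *arg)
{
    long *mismatches = arg;
    for (int i = 0; i < 10000; i++) {
        /* Count any "parse" that did not run with the safe loader. */
        if (strcmp(parse_with_safe_loader("domain.xml"), "safe") != 0)
            (*mismatches)++;
    }
    return NULL;
}

/* Returns the number of problems seen; 0 means every parse used the safe
 * loader and the foreign loader was still registered at the end. */
static long run_demo(void)
{
    pthread_t threads[4];
    long mismatches[4] = {0};
    long total = 0;

    for (int i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, worker, &mismatches[i]);
    for (int i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);
        total += mismatches[i];
    }
    if (current_loader != foreign_loader)
        total++;
    return total;
}
```

Without the mutex, two threads can interleave so that one saves the
other's temporary safe_loader as its "original" and restores that, leaving
the foreign loader permanently unregistered. Calling run_demo() from a
main() built with -pthread exercises the locked version.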

Patch

diff --git a/src/util/virxml.c b/src/util/virxml.c
index 6e87605ea..3e01794f9 100644
--- a/src/util/virxml.c
+++ b/src/util/virxml.c
@@ -810,9 +810,14 @@  virXMLParseHelper(int domcode,
     pctxt->sax->error = catchXMLError;
 
     if (filename) {
+        /* Reset any libxml2 file callbacks, other libs (like python lxml)
+         * may have set their own which can get crashy */
+        xmlExternalEntityLoader origloader = xmlGetExternalEntityLoader();
+        xmlSetExternalEntityLoader(xmlNoNetExternalEntityLoader);
         xml = xmlCtxtReadFile(pctxt, filename, NULL,
                               XML_PARSE_NONET |
                               XML_PARSE_NOWARNING);
+        xmlSetExternalEntityLoader(origloader);
     } else {
         xml = xmlCtxtReadDoc(pctxt, BAD_CAST xmlStr, url, NULL,
                              XML_PARSE_NONET |