
Custom File-Extension Path-Based Importers in Python

Python 3.1 introduced importlib, which can be used to modify the behavior of Python’s import system. In fact, since Python 3.3 the standard import statement itself has been implemented on top of this library.

As stated in the documentation, one of the reasons this library exists is to allow programmers to write their own Importer modules. As one can imagine, however, this functionality is not widely used, because most people have no desire to alter how standard Python importing works.

However, there are definitely use cases. One blog post describes using the system to block certain modules from being imported. Further, Python itself provides an Importer, zipimport, that allows one to import modules from zip files. There is also a great USENIX article that describes much of the functionality covered in this post.

In this post, I’d like to describe how one can use pre-existing machinery, namely importlib.machinery.FileFinder, to quickly write a path-based Importer module that handles custom imports.

First, some background. Importing in Python is actually pretty straightforward (and pretty elegant). During each import statement, a list of known Importers is consulted. Each Importer reports whether it can handle the module name provided, and the first one that can is used to load the module.

Naturally, then, each Importer has two components, a finder and a loader:

find_loader(fullname) (called find_module in PEP 302) indicates whether a module can be loaded based on its name. If the module can be loaded, the loader is returned. If not, None is returned.

load_module(fullname) loads the module, returning the actual module and doing some additional bookkeeping, such as placing it in sys.modules. The full set of responsibilities is described in PEP 302.
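The same finder/loader split survives in today’s importlib under newer names: since Python 3.4 (PEP 451), finders return a module spec from find_spec, and loaders implement exec_module. A minimal sketch that performs both halves by hand with importlib.util, using the standard-library colorsys module purely as an arbitrary example:

```python
import importlib.util

# Finder half: ask the import system whether (and how) a name can be found.
spec = importlib.util.find_spec('colorsys')
missing = importlib.util.find_spec('no_such_module_here')  # None: no finder claimed it

# Loader half: create an empty module object and let the chosen loader fill it in.
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

print(type(spec.loader).__name__)        # the loader class the finder picked
print(module.rgb_to_hsv(1.0, 0.0, 0.0))  # the freshly executed module works
```

This is roughly what a plain import does, except that a real import also registers the module in sys.modules.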

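Importers are looked up in two lists, sys.meta_path and sys.path_hooks. Importers on the former are consulted first and decide based on the module’s name alone, while each hook on the latter is handed an entry of sys.path (such as a directory) and returns an Importer that looks up module names within that path.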
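To make sys.meta_path concrete, here is a minimal sketch of the module-blocking idea mentioned above: a finder inserted at the front of sys.meta_path that raises ImportError for blocked names and returns None for everything else, so the regular finders still run. BlockingFinder and the choice of colorsys are illustrative, and the sketch uses the find_spec method of Python 3.4+ rather than the older method names described here.

```python
import sys


class BlockingFinder(object):
    """Hypothetical meta path finder that refuses specific top-level names."""

    def __init__(self, blocked):
        self.blocked = set(blocked)

    def find_spec(self, fullname, path=None, target=None):
        if fullname.partition('.')[0] in self.blocked:
            raise ImportError(f'import of {fullname!r} is blocked', name=fullname)
        return None  # not our business: let the next finder on sys.meta_path try


blocker = BlockingFinder(['colorsys'])
sys.meta_path.insert(0, blocker)

# Only imports not already satisfied by sys.modules reach the finders.
sys.modules.pop('colorsys', None)
try:
    import colorsys
except ImportError as exc:
    print(exc)

sys.meta_path.remove(blocker)
import colorsys  # works again once the finder is removed
```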

Using this knowledge, our goal is to allow something like this:

In the directory of our project, there is a JSON file, records.json, which contains customer records indexed by full name. We want to seamlessly import this file and use it as if it were a dictionary. If the file doesn’t exist, naturally, the import should raise an ImportError.

import records
print(records['Jane Doe'])

This seems pretty simple knowing what we know about how the Python import system works:

  1. Since we are operating on the filesystem, we need information about paths. Therefore, we’d like to create an Importer that can be added to sys.path_hooks.
  2. Our find_loader implementation should take the module name (records, in this case), append the “.json” extension to it, and check whether that file exists on the filesystem. If it does, it should return the loader described in step 3.
  3. Our load_module implementation should take the module name, append “.json” to it, read the contents from the filesystem, and parse the JSON using Python’s json module.
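To see what FileFinder will save us, here is a hand-written sketch of the three steps; JsonFileLoader, JsonPathFinder, and make_json_hook are hypothetical names. It assumes Python 3.4+ for find_spec and relies on the import system falling back to a loader’s load_module when the loader has no exec_module, which CPython still does (emitting an ImportWarning in recent versions). The hook deliberately claims only one data directory, because each sys.path entry gets exactly one finder and the first hook that accepts an entry wins.

```python
import json
import os
import sys
from importlib.util import spec_from_file_location


class JsonFileLoader(object):
    """Step 3: parse <name>.json and hand back the result as the 'module'."""

    def __init__(self, path):
        self.path = path

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        with open(self.path) as f:
            data = json.load(f)
        sys.modules[fullname] = data  # the import system looks for it here
        return data


class JsonPathFinder(object):
    """Step 2: look for <name>.json inside a single directory."""

    def __init__(self, directory):
        self.directory = directory

    def find_spec(self, fullname, target=None):
        tail = fullname.rpartition('.')[2]
        candidate = os.path.join(self.directory, tail + '.json')
        if not os.path.isfile(candidate):
            return None  # let the remaining sys.path entries try
        return spec_from_file_location(fullname, candidate,
                                       loader=JsonFileLoader(candidate))


def make_json_hook(data_dir):
    """Step 1: a path hook that claims only one (hypothetical) data directory."""
    data_dir = os.path.abspath(data_dir)

    def hook(path_entry):
        if os.path.abspath(path_entry) != data_dir:
            # Any other sys.path entry falls through to the default hooks.
            raise ImportError('not the JSON data directory', path=path_entry)
        return JsonPathFinder(data_dir)

    return hook
```

Usage would be sys.path.insert(0, data_dir) and sys.path_hooks.insert(0, make_json_hook(data_dir)), followed by import records. Note that .py files inside data_dir are no longer importable with this finder; handling several file types per directory is exactly the kind of detail FileFinder takes care of.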

As you might notice, steps 1 and 2 are not JSON-specific; they’re filesystem-specific. Luckily, Python already provides both in the form of importlib.machinery.FileFinder, which we can use to write our JSON Importer.

FileFinder also has a nice helper, FileFinder.path_hook, which takes a series of (loader, extensions) pairs and returns a callable suitable for inserting into sys.path_hooks. We then only need to write the loader. The loader in each pair is a callable that FileFinder invokes with the module’s full name and the path of the matching file, and it must return a Loader, which has a load_module(fullname) method. In our implementation, we use a class’s constructor as this callable (as suggested in PEP 302). We write our loader:

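import json
import sys

class JsonLoader(object):
    def __init__(self, name, path):
        # FileFinder calls JsonLoader(fullname, path) with the path of the
        # matching .json file; only the path is needed later
        self.path = path

    def load_module(self, fullname):
        # Re-imports return the already-loaded object
        if fullname in sys.modules:
            return sys.modules[fullname]

        with open(self.path, 'r') as f:
            module = json.load(f)  # json.load takes the file object itself

        # The import system expects the loader to register the result here
        sys.modules[fullname] = module
        return module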
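FileFinder only needs a callable it can invoke as loader(fullname, path). A quick way to check that calling convention is to use FileFinder directly, without sys.path_hooks, together with a throwaway loader that just records its arguments. RecordingLoader is a hypothetical name, and the temporary directory stands in for the project directory.

```python
import os
import tempfile
from importlib.machinery import FileFinder


class RecordingLoader(object):
    """Hypothetical loader that only records how FileFinder constructs it."""

    def __init__(self, fullname, path):
        self.fullname = fullname
        self.path = path


with tempfile.TemporaryDirectory() as project_dir:
    # An empty placeholder is enough: FileFinder only checks that the file exists.
    open(os.path.join(project_dir, 'records.json'), 'w').close()

    finder = FileFinder(project_dir, (RecordingLoader, ['.json']))
    spec = finder.find_spec('records')
    missing = finder.find_spec('customers')  # no customers.json here

    print(spec.loader.fullname)                # records
    print(os.path.basename(spec.loader.path))  # records.json
    print(missing)                             # None
```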

Now we can use the already existing machinery to add this loader into our import system:

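import sys
from importlib.machinery import (FileFinder, ExtensionFileLoader,
                                 SourceFileLoader, SourcelessFileLoader,
                                 EXTENSION_SUFFIXES, SOURCE_SUFFIXES,
                                 BYTECODE_SUFFIXES)

# This hook ends up serving every directory on sys.path, so it keeps the
# standard loaders next to the JSON one; otherwise .py files stop importing
json_hook = FileFinder.path_hook(
    (ExtensionFileLoader, EXTENSION_SUFFIXES),
    (SourceFileLoader, SOURCE_SUFFIXES),
    (SourcelessFileLoader, BYTECODE_SUFFIXES),
    (JsonLoader, ['.json']))
sys.path_hooks.insert(0, json_hook)

# Invalidate the cached finders so every sys.path entry is re-examined
sys.path_importer_cache.clear()

import records
print(records['Jane Doe'])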

And voila! We have added our new JSON importing functionality. The most important part of the above code block is sys.path_importer_cache.clear(). By the time your code begins running, every path in sys.path that has been checked for imports already has its finder cached in sys.path_importer_cache. Therefore, to ensure that the newly added JSON hook is processed, we simply invalidate the cache, forcing each path’s finder to be rebuilt from sys.path_hooks.
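The effect of the cache can be observed directly. The sketch below, assuming a throwaway directory and a hypothetical cache_demo.json, tries the import once before the hook exists, once after inserting it, and once after clearing sys.path_importer_cache; only the last attempt should succeed. It keeps the standard loaders in the hook so that .py files in the same directories remain importable.

```python
import importlib
import json
import os
import sys
import tempfile
from importlib.machinery import (BYTECODE_SUFFIXES, EXTENSION_SUFFIXES,
                                 SOURCE_SUFFIXES, ExtensionFileLoader,
                                 FileFinder, SourceFileLoader,
                                 SourcelessFileLoader)


class JsonLoader(object):
    def __init__(self, name, path):
        self.path = path

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        with open(self.path) as f:
            module = json.load(f)
        sys.modules[fullname] = module
        return module


def try_import(name):
    """Return the imported object, or None if the import system cannot find it."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Illustrative data: a fresh directory holding cache_demo.json, put on sys.path.
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, 'cache_demo.json'), 'w') as f:
    json.dump({'Jane Doe': 42}, f)
sys.path.insert(0, data_dir)

# 1. The default hook builds (and caches) a finder for data_dir without .json.
before_hook = try_import('cache_demo')

# 2. Install the JSON-aware hook; the old cached finder is still used.
sys.path_hooks.insert(0, FileFinder.path_hook(
    (ExtensionFileLoader, EXTENSION_SUFFIXES),
    (SourceFileLoader, SOURCE_SUFFIXES),
    (SourcelessFileLoader, BYTECODE_SUFFIXES),
    (JsonLoader, ['.json'])))
without_clear = try_import('cache_demo')

# 3. Drop the cached finders; the next lookup re-runs sys.path_hooks.
sys.path_importer_cache.clear()
after_clear = try_import('cache_demo')

print(before_hook, without_clear, after_clear)  # None None {'Jane Doe': 42}
```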

The great thing about this code is that FileFinder’s path_hook handles all of the filesystem operations for you. It automatically handles directories that appear in the import statement and verifies file extensions. All you have to do is worry about the loading logic!

Of course, a one-off solution is rarely a good solution. It’s also possible to generalize what we’ve done.

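from importlib.machinery import (FileFinder, ExtensionFileLoader,
                                 SourceFileLoader, SourcelessFileLoader,
                                 EXTENSION_SUFFIXES, SOURCE_SUFFIXES,
                                 BYTECODE_SUFFIXES)
import json
import sys

class ExtensionImporter(object):
    def __init__(self, extension_list):
        self.extensions = extension_list

    def find_loader(self, name, path):
        # Called by FileFinder as loader(fullname, path). The importer acts as
        # its own loader; note that self.path is shared state, overwritten on
        # every lookup
        self.path = path
        return self

    def load_module(self, fullname):
        # Shared behaviour: reuse a module that is already loaded
        if fullname in sys.modules:
            return sys.modules[fullname]

        return None

class JsonImporter(ExtensionImporter):
    def __init__(self):
        super(JsonImporter, self).__init__(['.json'])

    def load_module(self, fullname):
        premodule = super(JsonImporter, self).load_module(fullname)
        if premodule is not None:
            return premodule

        try:
            with open(self.path, 'r') as f:
                module = json.load(f)
        except OSError:
            raise ImportError("Couldn't open path " + self.path, name=fullname)

        sys.modules[fullname] = module
        return module

# Standard loaders first, so .py files keep importing from every directory
hook_list = [(ExtensionFileLoader, EXTENSION_SUFFIXES),
             (SourceFileLoader, SOURCE_SUFFIXES),
             (SourcelessFileLoader, BYTECODE_SUFFIXES)]

extension_importers = [JsonImporter()]
for importer in extension_importers:
    hook_list.append( (importer.find_loader, importer.extensions) )

sys.path_hooks.insert(0, FileFinder.path_hook(*hook_list))
sys.path_importer_cache.clear()

import records
print(records['Jane Doe'])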

Now there’s no need to worry about any filesystem details. If we want a new importer based on a file extension, we simply extend the ExtensionImporter class, implement its load_module method, and add an instance to the extension_importers list.
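As an illustration of extending the generalized importer, here is a sketch of a hypothetical IniImporter that imports app_settings.ini as a configparser.ConfigParser object. ExtensionImporter is repeated so the block runs on its own, the standard loaders are kept in the hook so ordinary .py imports keep working, and the file name and contents are made up for the example.

```python
import configparser
import os
import sys
import tempfile
from importlib.machinery import (BYTECODE_SUFFIXES, EXTENSION_SUFFIXES,
                                 SOURCE_SUFFIXES, ExtensionFileLoader,
                                 FileFinder, SourceFileLoader,
                                 SourcelessFileLoader)


class ExtensionImporter(object):
    # Same base class as in the post, repeated so this sketch runs on its own.
    def __init__(self, extension_list):
        self.extensions = extension_list

    def find_loader(self, name, path):
        self.path = path
        return self

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        return None


class IniImporter(ExtensionImporter):
    """Hypothetical importer: exposes an .ini file as a ConfigParser object."""

    def __init__(self):
        super(IniImporter, self).__init__(['.ini'])

    def load_module(self, fullname):
        premodule = super(IniImporter, self).load_module(fullname)
        if premodule is not None:
            return premodule
        config = configparser.ConfigParser()
        with open(self.path) as f:
            config.read_file(f)
        sys.modules[fullname] = config
        return config


# Standard loaders first, so .py files keep importing from every directory.
hook_list = [(ExtensionFileLoader, EXTENSION_SUFFIXES),
             (SourceFileLoader, SOURCE_SUFFIXES),
             (SourcelessFileLoader, BYTECODE_SUFFIXES)]
extension_importers = [IniImporter()]
for importer in extension_importers:
    hook_list.append((importer.find_loader, importer.extensions))

sys.path_hooks.insert(0, FileFinder.path_hook(*hook_list))
sys.path_importer_cache.clear()

# Illustrative data: a temporary directory holding app_settings.ini.
settings_dir = tempfile.mkdtemp()
with open(os.path.join(settings_dir, 'app_settings.ini'), 'w') as f:
    f.write('[server]\nport = 8080\n')
sys.path.insert(0, settings_dir)

import app_settings
print(app_settings['server']['port'])  # 8080 (configparser values are strings)
```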

And thus, we have a complete solution for creating custom file-extension-based path importers. Two lessons I’ve learned while writing this post:

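  1. Don’t forget to call sys.path_importer_cache.clear().
  2. Appending the hook to the end of sys.path_hooks doesn’t work: the default FileFinder hook that is already there accepts any directory, so a hook placed after it is never consulted for directories. Inserting it at the beginning, however, does work.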

unittest.mock in Python

Python is a language that allows for fast iteration. In fact, it is this speed
that makes prototyping one of its primary use cases. It also means I don’t feel
as guilty about rewriting programs in Python as I would about rewriting programs
in C or Rust.

This also makes Python both easy and satisfying to learn. It is easy in the sense
that any new concepts can be quickly incorporated into existing projects. It is
satisfying in the same sense, where newly learned concepts can be quickly put
into practice, producing immediate results.

Thus, the Adventures with Python project is a perfect way to reinforce my
passion for coding and learn more about the language itself.

Mocking

Testing applications is something that fascinates me. A test suite that exposes
stupid mistakes and easy-to-fix errors is like a golden ticket to a successful
application. Constructing good unit tests has always been challenging for me,
however. For instance, in one of my recent applications I had code to test
whether or not a file-reading function succeeded. The module looked a little
something like:

# util/filereader.py
import filemanager  # A 3rd-party library

def read(filePath):
    # Return the file's contents without the first 5 characters
    return filemanager.read(filePath)[5:]

The test looked a little something like:

import unittest
from util import filereader

class TestFilereader(unittest.TestCase):
    def test_read(self):
        # Create a real file on disk for filereader.read to consume
        with open('filename', 'w+') as openFile:
            openFile.write('fakercontents')
        self.assertEqual(filereader.read('filename'), 'contents')

Although there is nothing immediately wrong with this code, it has some issues.
For instance, the test relies on file creation, meaning that if it is run with
insufficient privileges, it will fail. Further, it has the nasty side effect of
leaving a stray file behind on the file system.

That’s when I discovered unittest.mock (part of the standard library since
Python 3.3), Python’s way of solving this exact issue. With mock, I could replace
and alter anything my module depends on to make my tests better. If I wanted to
improve the above test using mock, for instance, I could do the following:

import unittest
from unittest import mock
from util import filereader

class TestFilereader(unittest.TestCase):
    # Patch read() on the filemanager module that util.filereader uses
    @mock.patch('util.filereader.filemanager.read')
    def test_read(self, mocked_filemanager_read):
        mocked_filemanager_read.return_value = 'fakercontents'
        self.assertEqual(filereader.read('filename'), 'contents')
        mocked_filemanager_read.assert_called_with('filename')

This makes the test much simpler and much more robust. The test no longer relies
on creating files or on Python’s file-handling libraries. It also doesn’t have
any nasty side effects.
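Because filemanager is a third-party library, the tests above cannot run on their own. Below is a self-contained sketch of the same idea; FileManager is a hypothetical stand-in, and mock.patch.object replaces its read method only for the duration of each test. The second test uses side_effect, which makes the mock raise instead of return, to cover the failure path as well.

```python
import unittest
from unittest import mock


class FileManager(object):
    """Hypothetical stand-in for the third-party filemanager library."""

    def read(self, path):
        raise RuntimeError('the real read() would touch the filesystem')


filemanager = FileManager()


def read(file_path):
    # Code under test: drop the first 5 characters of the file's contents.
    return filemanager.read(file_path)[5:]


class TestRead(unittest.TestCase):
    @mock.patch.object(filemanager, 'read', return_value='fakercontents')
    def test_read(self, mocked_read):
        self.assertEqual(read('filename'), 'contents')
        mocked_read.assert_called_once_with('filename')

    @mock.patch.object(filemanager, 'read', side_effect=OSError('disk error'))
    def test_read_failure(self, mocked_read):
        # side_effect makes the mock raise, so the error path is testable too.
        with self.assertRaises(OSError):
            read('filename')
        mocked_read.assert_called_once_with('filename')
```

Running the file with python -m unittest should report two passing tests without ever creating a file.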

Most of my work with Mocks has been done in a professional setting, so I cannot
share any real-world code; however, I plan to use Mocks in the testing
of PyCFramework, so be on the
lookout for that if you’re looking for real-world applications.

An Update on Python

Upon looking at the title of this post, you may be under the impression that
I am starting to learn Python from scratch. Although that is most definitely
what the title implies, I am doing something a little bit different.

For the past 6 months I have been using Python. I first started with Recipr, a
hackathon project where I implemented a server backend using Flask. This gave me
the confidence I needed to start using Python for other projects.

By its own standards, Python is a language that allows you to quickly prototype
ideas. With that being said, it’s perfect for homework assignments where all
that matters is the final answer. (I don’t think many students are very proud
of the actual code that is submitted along with these assignments.) Thus,
I also used Python to implement several machine learning algorithms for a course
I was taking at the time.

Since then, I wrote a working replacement for my Programming Competition
framework, PCFramework, in Python. I uncreatively called it PyCFramework.

I also used Python for a fairly large professional project during an internship
I held this summer.

With that being said, I clearly have a decent amount of experience with Python.
However, there is a clearly defined difference between knowing a language and
really knowing a language. To simply know a language is to code in that
language. To really know a language is to understand its principles, design
paradigms, and hidden features that make the language great.

Over the course of the next few months, I intend to really know and understand
Python. Here is what to expect:

  1. Progression guided by Intermediate Python
  2. Refactoring and recreation of old projects, such as machine learning algorithms
  3. Creation of new projects, such as a new version of PyCFramework
  4. Discussion of the topics via this blog