Reading Python File-Like Objects from C | Python
Last Updated : 07 Jun, 2019
The goal is to write C extension code that consumes data from any Python file-like object (e.g., normal files, StringIO objects, etc.). To consume data from a file-like object, its read() method has to be invoked repeatedly, and steps have to be taken to handle the encoding of the resulting data properly.
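The C code only looks up an attribute named read, so any object with a compatible read(size) method counts as "file-like". The sketch below illustrates that protocol. It uses a hypothetical class, ChunkedSource, which is not part of any library. The protocol assumed here is the one the C code depends on: read(size) returns at most size characters, and it returns an empty string at end-of-file.

```python
class ChunkedSource:
    """Hypothetical file-like object that only provides read(size)."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def read(self, size=-1):
        # Return at most `size` characters; '' signals end-of-file
        if size is None or size < 0:
            size = len(self._text) - self._pos
        chunk = self._text[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


src = ChunkedSource('Hello\nWorld\n')
chunks = []
while True:
    data = src.read(4)      # same pattern as calling read(CHUNK_SIZE) from C
    if len(data) == 0:      # a zero-length result means EOF
        break
    chunks.append(data)

print(chunks)  # ['Hell', 'o\nWo', 'rld\n']
```

Nothing here depends on io or on a file descriptor. Only the read method matters, which is why the C function accepts such an object as readily as a real file.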
Below is a C extension function that simply consumes all of the data from a file-like object and dumps it to standard output.
Code #1 :
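#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <unistd.h>  /* write() */

#define CHUNK_SIZE 8192

/* Consume all data from a file-like object and write it to stdout */
static PyObject* py_consume_file(PyObject* self, PyObject* args)
{
    PyObject* obj;
    PyObject* read_meth;
    PyObject* result = NULL;
    PyObject* read_args;

    if (!PyArg_ParseTuple(args, "O", &obj)) {
        return NULL;
    }

    /* Look up the object's read() method */
    if ((read_meth = PyObject_GetAttrString(obj, "read")) == NULL) {
        return NULL;
    }

    /* Build the argument tuple (CHUNK_SIZE,) once and reuse it */
    if ((read_args = Py_BuildValue("(i)", CHUNK_SIZE)) == NULL) {
        Py_DECREF(read_meth);
        return NULL;
    }

    while (1) {
        PyObject* data;
        PyObject* enc_data;
        char* buf;
        Py_ssize_t len;
        Py_ssize_t n;

        /* Call read(CHUNK_SIZE) */
        if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
            goto final;
        }

        /* A zero-length result signals EOF; -1 signals an error */
        n = PySequence_Length(data);
        if (n <= 0) {
            Py_DECREF(data);
            if (n < 0) {
                goto final;
            }
            break;
        }

        /* Encode the text chunk to UTF-8 bytes */
        if ((enc_data = PyUnicode_AsEncodedString(data, "utf-8", "strict")) == NULL) {
            Py_DECREF(data);
            goto final;
        }

        /* Get a pointer to the bytes and write them to fd 1 (stdout) */
        PyBytes_AsStringAndSize(enc_data, &buf, &len);
        write(1, buf, len);

        Py_DECREF(enc_data);
        Py_DECREF(data);
    }

    /* Success: return None */
    result = Py_BuildValue("");

final:
    Py_DECREF(read_meth);
    Py_DECREF(read_args);
    return result;
}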
To test the code, prepare a file-like object such as a StringIO instance and pass it in:
Code #2 :
import io
import sample

f = io.StringIO('Hello\nWorld\n')
sample.consume_file(f)
Output :
Hello
World
Unlike a normal system file, a file-like object is not necessarily built around a low-level file descriptor. Thus, normal C library functions can't be used to access it. Instead, Python's C API is used to manipulate the file-like object, much as you would in Python.
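This is easy to see from Python. An io.StringIO object keeps its data in memory, so asking it for a descriptor with fileno() raises io.UnsupportedOperation. A real file returns an integer instead. The snippet is a small illustration that uses only the standard library, with tempfile.TemporaryFile() standing in for "a normal file".

```python
import io
import tempfile

# In-memory text buffer: there is no OS-level file descriptor behind it
s = io.StringIO('Hello\n')
try:
    s.fileno()
    has_fd = True
except io.UnsupportedOperation:
    has_fd = False
print(has_fd)  # False

# A real (temporary) file is backed by an integer descriptor
with tempfile.TemporaryFile() as real:
    real_fd_is_int = isinstance(real.fileno(), int)
print(real_fd_is_int)  # True
```

A C routine that expects a descriptor or a FILE* therefore has nothing to work with when given a StringIO. Going through the object's read() method works for both kinds of object.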
The read() method is first obtained from the passed object with PyObject_GetAttrString(). An argument tuple, (CHUNK_SIZE,), is built once with Py_BuildValue() and then passed to PyObject_Call() on every iteration to invoke the method. To detect end-of-file (EOF), PySequence_Length() is used to check whether the returned result has zero length.
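For comparison, here is a rough pure-Python equivalent of the C loop. It is a sketch, not the extension itself. The function name consume_file and the out parameter are hypothetical, and out is added only so the bytes can be captured instead of going straight to file descriptor 1. The comments map each step to the C API call it mirrors.

```python
import io
import sys

CHUNK_SIZE = 8192

def consume_file(f, out=None):
    """Sketch of the C loop: read chunks until EOF, write UTF-8 bytes."""
    if out is None:
        out = sys.stdout.buffer            # raw byte stream, like fd 1 in C
    read = f.read                          # like PyObject_GetAttrString(obj, "read")
    while True:
        data = read(CHUNK_SIZE)            # like PyObject_Call(read_meth, read_args, NULL)
        if len(data) == 0:                 # like PySequence_Length(data) == 0
            break
        out.write(data.encode('utf-8', 'strict'))  # like PyUnicode_AsEncodedString()
    return None                            # like Py_BuildValue("")

sink = io.BytesIO()
consume_file(io.StringIO('Hello\nWorld\n'), sink)
print(sink.getvalue())  # b'Hello\nWorld\n'
```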
For all I/O operations, the main concerns are the underlying encoding and the distinction between bytes and Unicode. This recipe shows how to read a file in text mode and encode the resulting text into bytes (UTF-8) that C can use. If the file is opened in binary mode, only minor changes are needed, as shown in the code below.
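The text/binary distinction shows up in the type that read() returns. A text-mode object returns str, which the C code must encode. A binary-mode object returns bytes, which can be used directly. In text mode, read(CHUNK_SIZE) counts characters rather than bytes, so an encoded chunk may be longer than CHUNK_SIZE bytes. The sample string below, with one non-ASCII character, is an arbitrary choice made to show this.

```python
import io

s = 'h\u00e9llo'                      # 'héllo': 5 characters, one of them non-ASCII

text = io.StringIO(s)                 # text mode: read() returns str
chunk = text.read(8192)
encoded = chunk.encode('utf-8')       # what the text-mode C code passes to write()
print(type(chunk).__name__, len(chunk))      # str 5
print(type(encoded).__name__, len(encoded))  # bytes 6

binary = io.BytesIO(s.encode('utf-8'))  # binary mode: read() returns bytes
raw = binary.read(8192)
print(type(raw).__name__, len(raw))          # bytes 6
```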
Code #3 :
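/* Inside the while loop: call read(CHUNK_SIZE) */
if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
    goto final;
}
/* A zero-length result signals EOF */
if (PySequence_Length(data) == 0) {
    Py_DECREF(data);
    break;
}
/* In binary mode read() must return bytes; reject anything else */
if (!PyBytes_Check(data)) {
    Py_DECREF(data);
    PyErr_SetString(PyExc_IOError, "File must be in binary mode");
    goto final;
}
/* No encoding step: use the bytes object directly */
PyBytes_AsStringAndSize(data, &buf, &len);
write(1, buf, len);
Py_DECREF(data);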
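The binary-mode variant can be sketched in Python the same way. The function name consume_binary and its out parameter are hypothetical. OSError is used because IOError is an alias of OSError in Python 3, matching PyExc_IOError in the C code. The isinstance(data, bytes) test plays the role of PyBytes_Check(), which also accepts bytes subclasses.

```python
import io

CHUNK_SIZE = 8192

def consume_binary(f, out):
    """Sketch of the binary-mode loop: reject text-mode files, copy bytes as-is."""
    while True:
        data = f.read(CHUNK_SIZE)
        if len(data) == 0:                  # EOF
            break
        if not isinstance(data, bytes):     # like !PyBytes_Check(data)
            raise OSError('File must be in binary mode')
        out.write(data)                     # no encoding step needed

sink = io.BytesIO()
consume_binary(io.BytesIO(b'Hello\nWorld\n'), sink)
print(sink.getvalue())  # b'Hello\nWorld\n'

try:
    consume_binary(io.StringIO('Hello\n'), io.BytesIO())
    message = None
except OSError as exc:
    message = str(exc)
print(message)  # File must be in binary mode
```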