Details about the psycopg porting

On Mon, Jan 24, 2011 at 01:33, Daniele Varrazzo

Post by Daniele Varrazzo
Hello,
I've written to the Psycopg mailing list about the details in the
<http://initd.org/psycopg/articles/2011/01/24/psycopg2-porting-python-3-report/>.
There are a couple of points still open, so if you want to take a look
at them I'd be happy to receive comments before releasing the code.

"Is there an interface in Python 3 to know if a file is binary or text?"

You can check if it inherits from io.TextIOBase or not. I think that's
the official way. Correct me if I'm wrong.
For the other issues I guess I would have to know psycopg2 to be able
to help. :-)

//Lennart

Antoine Pitrou

2011-01-24 15:20:48 UTC

Hello,

Post by Daniele Varrazzo
I've written to the Psycopg mailing list about the details in the
<http://initd.org/psycopg/articles/2011/01/24/psycopg2-porting-python-3-report/>.
There are a couple of points still open, so if you want to take a look
at them I'd be happy to receive comments before releasing the code.

Post by Daniele Varrazzo
"the data (bytes) from the libpq are passed to file.write() using
PyObject_CallFunction(func, "s#", buffer, len)"

You shouldn't use "s#" as it will implicitly decode the buffer to unicode.
Instead, use "y#" to write bytes.
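
Why the format code matters can be seen from the Python side. The following is a minimal sketch, assuming in-memory io.StringIO and io.BytesIO objects as stand-ins for the files passed to the copy functions, and a made-up payload: a text file accepts only str and a binary file accepts only bytes-like objects, so the object built by the format code has to match the file.

```python
import io


def try_write(fileobj, data):
    """Return True if fileobj.write(data) succeeds, False on TypeError."""
    try:
        fileobj.write(data)
        return True
    except TypeError:
        return False


text_file = io.StringIO()    # behaves like a file opened in text mode
binary_file = io.BytesIO()   # behaves like a file opened in binary mode

chunk = b"1\tfoo\n"          # hypothetical payload, for illustration only

results = {
    "bytes -> binary": try_write(binary_file, chunk),
    "bytes -> text": try_write(text_file, chunk),
    "str -> text": try_write(text_file, chunk.decode("utf-8")),
    "str -> binary": try_write(binary_file, chunk.decode("utf-8")),
}
```

Only the matching pairs (bytes to the binary file, str to the text file) are accepted; the other two writes raise TypeError.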

Post by Daniele Varrazzo
Is there an interface in Python 3 to know if a file is binary or text?

`isinstance(myfile, io.TextIOBase)` should do the trick. Or the corresponding C
call, using PyObject_IsInstance().
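
As a small illustration of the check suggested above (a sketch only; the helper name is_text_file and the temporary file name are made up for the example):

```python
import io
import os
import tempfile


def is_text_file(fileobj):
    """True if fileobj is a text stream, i.e. its write() expects str."""
    return isinstance(fileobj, io.TextIOBase)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.dat")

    # open() defaults to text mode and returns a TextIOWrapper.
    with open(path, "w") as f_text:
        default_mode_is_text = is_text_file(f_text)

    # Binary mode returns a buffered object that is not a TextIOBase.
    with open(path, "rb") as f_bin:
        binary_mode_is_text = is_text_file(f_bin)

in_memory_text = is_text_file(io.StringIO())
in_memory_binary = is_text_file(io.BytesIO())
```

One caveat: a duck-typed file-like object that does not inherit from io.TextIOBase would be classified as binary by this test, even if its write() expects str.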

Post by Daniele Varrazzo
In binary mode the file always returns bytes (str in py2, unicode in py3)

I suppose you mean "str in py2, bytes in py3".

Post by Daniele Varrazzo
bytea fields are returned as MemoryView, from which is easy to get bytes

Is this because it is easier for you to return a memoryview? Otherwise it would
make more sense to return a bytes object.

Regards

Antoine.

Daniele Varrazzo

2011-01-24 16:10:30 UTC

Post by Daniele Varrazzo
Hello,

You shouldn't use "s#" as it will implicitly decode the buffer to unicode.
Instead, use "y#" to write bytes.

Yes, the "s#" is a leftover from before the conversion. I just have to
decide whether it's better to always emit bytes and break on text
files, or to check the file's capability. Because text mode is the
default for open(), I think the former would be surprising, so I'll go
for the second option if it is not overly complex (it seems trivial if
PyTextIOBase_Type is available in C without the need to import
anything from Python, annoying otherwise).
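
The second option could look roughly like this at the Python level. This is a sketch under assumptions, not the psycopg2 code: write_chunk and its encoding parameter are hypothetical names, and which encoding to use for decoding is left as an open choice.

```python
import io


def write_chunk(fileobj, data, encoding="utf-8"):
    """Write a bytes chunk to fileobj, decoding it first for text files.

    `encoding` is an assumption of this sketch; the caller has to know
    how the bytes should be decoded.
    """
    if isinstance(fileobj, io.TextIOBase):
        fileobj.write(data.decode(encoding))
    else:
        fileobj.write(data)


text_target = io.StringIO()
binary_target = io.BytesIO()

payload = "1\tcaffè\n".encode("utf-8")  # made-up data for the example
write_chunk(text_target, payload)
write_chunk(binary_target, payload)
```

With this approach a file opened with the default open() mode receives str, while a file opened in binary mode receives the bytes untouched.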

Post by Daniele Varrazzo
In binary mode the file always returns bytes (str in py2, unicode in py3)

I suppose you mean "str in py2, bytes in py3".

Yes: fixed, thanks.

Post by Daniele Varrazzo
bytea fields are returned as MemoryView, from which is easy to get bytes

Is this because it is easier for you to return a memoryview? Otherwise it would
make more sense to return a bytes object.

In Py2 bytea is converted to buffer objects, passing through a "chunk"
object implementing the buffer interface. So yes, MemoryView is a more
direct port.
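
For reference, getting bytes out of a memoryview takes a single call. A minimal sketch, with a made-up value standing in for a bytea field:

```python
# Hypothetical stand-in for a bytea value returned as a memoryview.
raw = memoryview(b"\x00\x01binary\xff")

as_bytes = bytes(raw)         # copy the contents into a bytes object
also_bytes = raw.tobytes()    # equivalent method on the memoryview itself

length = len(raw)             # a memoryview supports len() and indexing
first_byte = raw[0]           # indexing yields an int in Python 3
```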

-- Daniele

Antoine Pitrou

2011-01-24 16:21:03 UTC

Post by Daniele Varrazzo
"the data (bytes) from the libpq are passed to file.write() using
PyObject_CallFunction(func, "s#", buffer, len)"

You shouldn't use "s#" as it will implicitly decode the buffer to unicode.
Instead, use "y#" to write bytes.

No, you'll have to import. The actual TextIOBase ABC is declared in Python.
(see Lib/io.py if you are curious)
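
In Python terms, the lookup amounts to the following sketch. The comments name the C API calls that would plausibly play the same role (PyImport_ImportModule and PyObject_GetAttrString); the helper name load_text_io_base is made up for the example.

```python
import abc
import importlib


def load_text_io_base():
    """Import the io module at runtime and fetch TextIOBase from it."""
    io_module = importlib.import_module("io")   # C: PyImport_ImportModule("io")
    return getattr(io_module, "TextIOBase")     # C: PyObject_GetAttrString(mod, "TextIOBase")


TextIOBase = load_text_io_base()

# The class is an ABC defined in Lib/io.py, not a static C type.
is_abc = isinstance(TextIOBase, abc.ABCMeta)
defining_module = TextIOBase.__module__

io_module = importlib.import_module("io")
string_io_is_text = isinstance(io_module.StringIO(), TextIOBase)
bytes_io_is_text = isinstance(io_module.BytesIO(), TextIOBase)
```

In an extension module the result of the lookup would normally be fetched once and cached, rather than imported on every call.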

Post by Daniele Varrazzo
bytea fields are returned as MemoryView, from which is easy to get bytes

Is this because it is easier for you to return a memoryview? Otherwise it would
make more sense to return a bytes object.

In Py2 bytea is converted to buffer objects, passing through a "chunk"
object implementing the buffer interface. So yes, MemoryView is a more
direct port.

Well, does it point to some external memory managed by pgsql itself? Otherwise
bytes or bytearray would still be a better choice IMO (as in better-known and
more practical). In 3.x there's no confusion between 8-bit strings and unicode
strings, so use of an obscure type such as buffer() shouldn't be necessary.
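
The practical difference behind the question can be sketched as follows: a memoryview refers to memory owned by another object, while bytes holds its own copy. A bytearray is used here as a stand-in for the externally managed buffer; this is an assumption of the example, not a description of how psycopg2 stores the data.

```python
backing = bytearray(b"abc")   # stand-in for memory owned by something else
view = memoryview(backing)    # no copy: the view refers to `backing`
copy = bytes(view)            # independent, immutable copy

backing[0] = ord("x")         # modify the underlying buffer in place

view_after = view.tobytes()   # the view reflects the change
copy_after = copy             # the copy does not
```

If the data does not live in externally managed memory, the view offers no saving, and a bytes object is the simpler thing to hand back.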

Regards

Antoine.

Daniele Varrazzo

2011-01-25 00:24:11 UTC

Post by Daniele Varrazzo
"the data (bytes) from the libpq are passed to file.write() using
PyObject_CallFunction(func, "s#", buffer, len)"

You shouldn't use "s#" as it will implicitly decode the buffer to unicode.
Instead, use "y#" to write bytes.

No, you'll have to import. The actual TextIOBase ABC is declared in Python.
(see Lib/io.py if you are curious)

Annoying, then :) Will give it a try.

Post by Daniele Varrazzo
bytea fields are returned as MemoryView, from which is easy to get bytes

Is this because it is easier for you to return a memoryview? Otherwise it would