Discussion:
Details about the psycopg porting
Daniele Varrazzo
2011-01-24 00:33:28 UTC
Hello,

I've written to the Psycopg mailing list about the details in the
psycopg2 porting to Python 3. You can also read everything here:
<http://initd.org/psycopg/articles/2011/01/24/psycopg2-porting-python-3-report/>.

There are a couple of points still open, so if you want to take a look
at them I'd be happy to receive comments before releasing the code.

Regards,

-- Daniele
Lennart Regebro
2011-01-24 07:21:02 UTC
On Mon, Jan 24, 2011 at 01:33, Daniele Varrazzo
Post by Daniele Varrazzo
Hello,
I've written to the Psycopg mailing list about the details in the
<http://initd.org/psycopg/articles/2011/01/24/psycopg2-porting-python-3-report/>.
There is a couple of points still open, so if you want to take a look
at them I'd be happy to receive comments before releasing the code.
"Is there an interface in Python 3 to know if a file is binary or text?"

You can check if it inherits from io.TextIOBase or not. I think that's
the official way. Correct me if I'm wrong.
For the other issues I guess I would have to know psycopg2 to be able
to help. :-)

//Lennart
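
A minimal sketch of the check suggested above, using only the standard io module. The helper name is_text_file is hypothetical and is not part of psycopg2:

```python
import io


def is_text_file(f):
    """Return True if f is a text-mode stream, i.e. its write() expects str."""
    return isinstance(f, io.TextIOBase)


# In-memory streams stand in for files opened in text or binary mode.
text_stream = io.StringIO()
binary_stream = io.BytesIO()

print(is_text_file(text_stream))
print(is_text_file(binary_stream))
```

Files returned by open() in the default text mode are io.TextIOWrapper instances, which also derive from io.TextIOBase, so the same test should apply to them.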
Antoine Pitrou
2011-01-24 15:20:48 UTC
Hello,
Post by Daniele Varrazzo
I've written to the Psycopg mailing list about the details in the
<http://initd.org/psycopg/articles/2011/01/24/psycopg2-porting-python-3-report/>.
There is a couple of points still open, so if you want to take a look
at them I'd be happy to receive comments before releasing the code.
“the data (bytes) from the libpq are passed to file.write() using
PyObject_CallFunction(func, "s#", buffer, len)”
You shouldn't use "s#" as it will implicitly decode the buffer to unicode.
Instead, use "y#" to write bytes.
Post by Daniele Varrazzo
Is there an interface in Python 3 to know if a file is binary or text?
`isinstance(myfile, io.TextIOBase)` should do the trick. Or the corresponding C
call, using PyObject_IsInstance().
Post by Daniele Varrazzo
In binary mode the file always returns bytes (str in py2, unicode in py3)
I suppose you mean "str in py2, bytes in py3".
Post by Daniele Varrazzo
bytea fields are returned as MemoryView, from which is easy to get bytes
Is this because it is easier for you to return a memoryview? Otherwise it would
make more sense to return a bytes object.

Regards

Antoine.
Daniele Varrazzo
2011-01-24 16:10:30 UTC
Post by Daniele Varrazzo
Hello,
Post by Daniele Varrazzo
I've written to the Psycopg mailing list about the details in the
<http://initd.org/psycopg/articles/2011/01/24/psycopg2-porting-python-3-report/>.
There is a couple of points still open, so if you want to take a look
at them I'd be happy to receive comments before releasing the code.
“the data (bytes) from the libpq are passed to file.write() using
PyObject_CallFunction(func, "s#", buffer, len)”
You shouldn't use "s#" as it will implicitly decode the buffer to unicode.
Instead, use "y#" to write bytes.
Yes, the "s#" is a leftover from before the conversion: I just have to
decide whether it's better to always emit bytes and break on text
files, or to check whether the file accepts text or bytes. Because text
mode is the default for open(), I think the former would be surprising:
I'll go for the second option if it is not overly complex (it seems
trivial if PyTextIOBase_Type is available in C without the need to
import anything from Python, annoying otherwise).
Post by Daniele Varrazzo
Post by Daniele Varrazzo
In binary mode the file always returns bytes (str in py2, unicode in py3)
I suppose you mean "str in py2, bytes in py3".
Yes: fixed, thanks.
Post by Daniele Varrazzo
Post by Daniele Varrazzo
bytea fields are returned as MemoryView, from which is easy to get bytes
Is this because it is easier for you to return a memoryview? Otherwise it would
make more sense to return a bytes object.
In Py2 bytea is converted to buffer objects, passing through a "chunk"
object implementing the buffer interface. So yes, MemoryView is a more
direct port.


-- Daniele
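
The "check the file capability" option discussed above could look roughly like this at the Python level. This is only a sketch: write_chunk and its encoding parameter are hypothetical names, and a real driver would presumably take the encoding from the connection rather than defaulting to UTF-8:

```python
import io


def write_chunk(f, data, encoding="utf-8"):
    """Write a bytes chunk to f, decoding it first if f is a text stream.

    Hypothetical helper for illustration; not psycopg2 API.
    """
    if isinstance(f, io.TextIOBase):
        f.write(data.decode(encoding))
    else:
        f.write(data)


chunk = "1\tcaf\u00e9\n".encode("utf-8")  # bytes as they might arrive from libpq

text_out = io.StringIO()
binary_out = io.BytesIO()
write_chunk(text_out, chunk)
write_chunk(binary_out, chunk)

# Without the check, a text stream refuses bytes with a TypeError.
try:
    io.StringIO().write(chunk)
except TypeError as exc:
    print("text stream refused bytes:", exc)
```

The alternative of always emitting bytes would hit that TypeError on any file opened in the default text mode, which is the surprise mentioned above.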
Antoine Pitrou
2011-01-24 16:21:03 UTC
Post by Daniele Varrazzo
Post by Antoine Pitrou
Post by Daniele Varrazzo
“the data (bytes) from the libpq are passed to file.write() using
PyObject_CallFunction(func, "s#", buffer, len)”
You shouldn't use "s#" as it will implicitly decode the buffer to unicode.
Instead, use "y#" to write bytes.
Yes, the #s is a leftover from before the conversion: I just have to
decide whether it's better to always emit bytes and break on text
files or if to check for the file capability. Because text mode is the
default for open() I think the former would be surprising: I'll go for
the second option if not overly complex (seems trivial if
PyTextIOBase_Type is available in C without the need of importing
anything from Python, annoying otherwise).
No, you'll have to import. The actual TextIOBase ABC is declared in Python.
(see Lib/io.py if you are curious)
Post by Daniele Varrazzo
Post by Antoine Pitrou
Post by Daniele Varrazzo
bytea fields are returned as MemoryView, from which is easy to get bytes
Is this because it is easier for you to return a memoryview? Otherwise it would
make more sense to return a bytes object.
In Py2 bytea is converted to buffer objects, passing through a "chunk"
object implementing the buffer interface. so yes, MemoryView is a more
direct port.
Well, does it point to some external memory managed by pgsql itself? Otherwise
bytes or bytearray would still be a better choice IMO (as in better-known and
more practical). In 3.x there's no confusion between 8-bit strings and unicode
strings, so use of an obscure type such as buffer() shouldn't be necessary.

Regards

Antoine.
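
To illustrate the trade-off raised above: a memoryview shares memory with the object it wraps, while bytes is an independent copy. In this sketch a bytearray stands in for driver-owned memory, which is an assumption for illustration only and does not reflect how psycopg2 stores bytea data:

```python
# A bytearray plays the role of memory owned by someone else.
backing = bytearray(b"\x00\x01\x02\x03")

view = memoryview(backing)  # no copy: refers to the same memory
copy = bytes(view)          # independent, immutable copy

backing[0] = 0xFF           # change the underlying memory

print(view[0])  # the view reflects the change
print(copy[0])  # the copy keeps the old value
```

Getting bytes out of a memoryview is a single call, either bytes(view) or view.tobytes(), so the practical difference is mostly about who owns the memory and whether a copy is made.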
Daniele Varrazzo
2011-01-25 00:24:11 UTC
Post by Antoine Pitrou
Post by Daniele Varrazzo
Post by Antoine Pitrou
Post by Daniele Varrazzo
“the data (bytes) from the libpq are passed to file.write() using
PyObject_CallFunction(func, "s#", buffer, len)”
You shouldn't use "s#" as it will implicitly decode the buffer to unicode.
Instead, use "y#" to write bytes.
Yes, the #s is a leftover from before the conversion: I just have to
decide whether it's better to always emit bytes and break on text
files or if to check for the file capability. Because text mode is the
default for open() I think the former would be surprising: I'll go for
the second option if not overly complex (seems trivial if
PyTextIOBase_Type is available in C without the need of importing
anything from Python, annoying otherwise).
No, you'll have to import. The actual TextIOBase ABC is declared in Python.
(see Lib/io.py if you are curious)
Annoying, then :) Will give it a try.
Post by Antoine Pitrou
Post by Daniele Varrazzo
Post by Antoine Pitrou
Post by Daniele Varrazzo
bytea fields are returned as MemoryView, from which is easy to get bytes
Is this because it is easier for you to return a memoryview? Otherwise it would
make more sense to return a bytes object.
In Py2 bytea is converted to buffer objects, passing through a "chunk"
object implementing the buffer interface. so yes, MemoryView is a more
direct port.
Well, does it point to some external memory managed by pgsql itself? Otherwise
bytes or bytearray would still be a better choice IMO (as in better-known and
more practical). In 3.x there's no confusion between 8-bit strings and unicode
strings, so use of an obscure type such as buffer() shouldn't be necessary.
Reviewing the code, the buffer object was probably used initially
because the memory is handled by the libpq. I will have a talk with
some heavy users of the bytea type (I am not one, but people such as
the gnumed developers are) about what would be the best choice for the
library users.

I want to avoid introducing unnecessary changes for Py2 users, so the
buffer should stay unless we decide there are better options and it's
time for an incompatible change. I fear that having a radically
different interface for Py3 would be a problem for people migrating
from Py2.

Thank you very much.

-- Daniele
