Tag Archives: Glom

www.glom.org

Going to FOSDEM

I booked my flights and hotel, so I’m finally going to my first FOSDEM conference, in Brussels, Belgium on the weekend of the 24th and 25th February 2007.

Technically, I am incredibly unproductive at conferences. I much prefer email for that kind of thing. For instance, I hate feeling forced to have instant opinions without first doing research and weighing pros and cons. Sometimes I want to put in-person conversations on hold while I do some fact-checking. However, actually meeting people is very good for development momentum, and it’s a chance to meet new European open source people and maybe drum up some more work for Openismus.

There are some GNOME activities at FOSDEM, but I’m not really sure what to expect. Dodji wants me to do a gtkmm talk, but I don’t really see the point. If you like gtkmm and C++ then there’s a lot of information about it out there already. I think conference talks should be for new things. So Dodji should tell us about Nemiver and what it needs from gdb, and I should maybe do a Glom status talk.

Self-hosting Glom

Glom 1.3.5 has experimental support for self-hosting of its databases, so you should never again need to configure PostgreSQL.

It does this by starting its own PostgreSQL instance, supplying its own PostgreSQL configuration and data files, and connecting to it. Those files are stored in one directory, though I’d like to improve that directory structure. This vastly improves the user experience, so I expect this to bring a lot more users once I’ve shaken out the new bugs.
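As a rough illustration of the approach described above (not Glom's actual code, which is C++), here is how a program might assemble the commands to create and start a private PostgreSQL instance. The helper name, the directory name, and the port are hypothetical; `initdb -D` and the `postgres` options `-D`, `-p` and `-k` are standard PostgreSQL command-line options.

```python
import os


def self_hosting_commands(document_dir, port=5433):
    """Return (initdb_argv, server_argv) for a private PostgreSQL instance.

    Hypothetical helper: the directory name and the default port are
    assumptions for illustration, not what Glom itself uses.
    """
    data_dir = os.path.join(document_dir, "glom_postgres_data")
    # initdb creates a fresh cluster (data files plus default config) in data_dir.
    initdb_argv = ["initdb", "-D", data_dir]
    # Run the server on that cluster, on its own port, with its Unix socket
    # kept in the document directory so it does not clash with a system server.
    server_argv = ["postgres", "-D", data_dir, "-p", str(port), "-k", document_dir]
    return initdb_argv, server_argv


if __name__ == "__main__":
    initdb_argv, server_argv = self_hosting_commands("/tmp/my_glom_document")
    print(" ".join(initdb_argv))
    print(" ".join(server_argv))
```

The two argument lists could then be handed to something like `subprocess.Popen`, after which the application connects to the chosen port as it would to any other server.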

This should satisfy most people who were demanding support for SQLite instead of PostgreSQL. Unlike SQLite, this still allows you to share your database across the network with multiple users. Support for non-PostgreSQL external database servers is still possible, but that work is really not my priority.

I really do need to combine some of the dialogs. At the moment you see several dialogs, one after the other, to save a new file, choose a database name (and choose self-hosting or external hosting), then to connect or provide initial connection details.

[Screenshot: glom_new_database_with_self_hosting]

Glom: Showing related related records

The feature I mentioned in “SQL: joins and duplicates” is now implemented in Glom 1.3.3. It’s just a matter of choosing relationships from a tree rather than just a list, though it’s only 2 levels deep for now to keep it simple.

So if there are, for instance, Invoices with related Invoice Lines records, which refer to Products, then you could look at a Product details screen and see all the Invoices that use the Product (via their Invoice Lines records). If I added a 3rd level of child relationships then you could even see all the Customers (used by the Invoice table) that had ever been invoiced for the product.

Here’s a screenshot of the UI for the Licenses, Packages, Package Scans example:

[Screenshot: glom_screenshot_related_related]

The UI isn’t perfect. I don’t like that it’s enabled via a checkbox, but I think the tree would be confusing if it were the default. Trees in GtkComboBox widgets are also rendered as these confusing menus, but it could be replaced by a popup GtkTreeView some day. But I feel very satisfied that I’ve made it easy to do something that’s usually difficult, with only minor UI changes. Many thanks to Jerry Haltom for showing me how this could be done.

The SQL that’s generated is much the same as for regular related records (SELECT related_table.field1, related_table.field2 FROM related_table WHERE related_table.field3 = 123), but with an extra JOIN … AS … ON clause to link to the intermediate table, a slightly changed WHERE clause (to refer to that intermediate join), and a GROUP BY on the related table’s primary key to ensure that we get only one row for each related record. A sub-select query might be more efficient, but this allows me to reuse the existing code, and lets the user think in terms of the target related table rather than an intermediate one.

So using Glom’s --debug_sql now shows yet more complex SQL that you wouldn’t want to write yourself.
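To make the shape of such a query concrete, here is a small sketch using the Invoices and Invoice Lines example from above. It is not the SQL that Glom generates: the table and column names are made up for illustration, and it runs against an in-memory SQLite database (via Python's sqlite3 module) purely for convenience, whereas Glom talks to PostgreSQL.

```python
import sqlite3


def make_example_db():
    """Create a tiny in-memory database with invoices and their lines."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE invoice_lines (line_id INTEGER PRIMARY KEY,
                                    invoice_id INTEGER, product_id INTEGER);
        INSERT INTO invoices VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        -- Invoice 1 mentions product 123 twice, which is what would
        -- produce a duplicate row without the GROUP BY.
        INSERT INTO invoice_lines VALUES
            (1, 1, 123), (2, 1, 123), (3, 2, 123), (4, 3, 456);
    """)
    return conn


def invoices_for_product(conn, product_id):
    """Return one row per invoice that has at least one line for product_id."""
    sql = """
        SELECT invoices.invoice_id, invoices.customer
        FROM invoices
        JOIN invoice_lines AS relationship_lines
          ON (relationship_lines.invoice_id = invoices.invoice_id)
        WHERE relationship_lines.product_id = ?
        GROUP BY invoices.invoice_id
        ORDER BY invoices.invoice_id
    """
    return conn.execute(sql, (product_id,)).fetchall()


if __name__ == "__main__":
    print(invoices_for_product(make_example_db(), 123))
```

The join goes through the intermediate table, the WHERE clause tests a field on that intermediate table, and the GROUP BY on the primary key collapses the repeated invoice into a single row.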

SQL: joins and duplicates.

Here’s a little SQL problem for the lazy web. It’s something that I’m trying to implement in Glom for the Repository Analyzer:

Let’s say we have these tables:

Packages:

package_name    package_description
something       something description
somethingelse   something else description
somethingmore   something more description

and Package Scans:

package_name    version   license_id
something       0.1       43
something       0.2       43
something       0.3       44
somethingelse   1.5       43
somethingmore   0.9       40

Now, I want to get the packages.package_description for all packages that appear in package_scans with license_id 43, which would look like this:

‘something description’
‘something else description’

The best I can do so far is a SELECT on package_scans, doing a LEFT OUTER JOIN:

SELECT "relationship_package"."package_description"
FROM "package_scans"
LEFT OUTER JOIN "packages" AS "relationship_package" ON ("package_scans"."package_name" = "relationship_package"."package_name")
WHERE "package_scans"."license_id" = 43;

which gives me duplicates, like so:

‘something description’
‘something description’
‘something else description’

If possible I’d like to do this without GROUP BY. I feel there must be a simpler way to say “give me a row for each record in packages for which the (indirect) relationship is true”. If the main FROM table could somehow be packages, instead of package_scans, then the LEFT OUTER JOIN would cause me to have only one row for each relevant packages record. In general, Glom never gives you repeat rows because that’s confusing.

If I can figure out what SQL should be generated, I could imagine that I might get that result in Glom by defining a relationship in terms of a doubly-related field. So the user could say “Show me records from the packages_with_package_scan_license_id relationship”. That relationship would be defined as something like

records from packages where licenses.license_id == packages::package_scans::license_id

But maybe a GROUP BY option really is the clearest.

Update: I feel like the sub-select idea might be what I want, if I can figure the syntax out. I like the idea of showing records from a relationship that is itself defined by a link between a value and a field in another related set of records.
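For comparison, here is one way the sub-select idea could look, next to the LEFT OUTER JOIN that produces duplicates. This is only a sketch: it uses an in-memory SQLite database through Python's sqlite3 module for convenience (not PostgreSQL), with the example tables from above, and the helper names are made up.

```python
import sqlite3

# The join from the post: one output row per matching package_scans row,
# so a package scanned twice under the same license appears twice.
JOIN_SQL = """
    SELECT relationship_package.package_description
    FROM package_scans
    LEFT OUTER JOIN packages AS relationship_package
      ON (package_scans.package_name = relationship_package.package_name)
    WHERE package_scans.license_id = ?
"""

# The sub-select: the main FROM table is packages, so each package
# can appear at most once, with no GROUP BY needed.
SUBSELECT_SQL = """
    SELECT packages.package_description
    FROM packages
    WHERE packages.package_name IN
        (SELECT package_scans.package_name
         FROM package_scans
         WHERE package_scans.license_id = ?)
    ORDER BY packages.package_name
"""


def make_packages_db():
    """Create the example packages and package_scans tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE packages (package_name TEXT PRIMARY KEY,
                               package_description TEXT);
        CREATE TABLE package_scans (package_name TEXT, version TEXT,
                                    license_id INTEGER);
        INSERT INTO packages VALUES
            ('something', 'something description'),
            ('somethingelse', 'something else description'),
            ('somethingmore', 'something more description');
        INSERT INTO package_scans VALUES
            ('something', '0.1', 43), ('something', '0.2', 43),
            ('something', '0.3', 44), ('somethingelse', '1.5', 43),
            ('somethingmore', '0.9', 40);
    """)
    return conn


def descriptions(conn, sql, license_id):
    """Run one of the queries above and return the descriptions as a list."""
    return [row[0] for row in conn.execute(sql, (license_id,))]


if __name__ == "__main__":
    conn = make_packages_db()
    print(descriptions(conn, JOIN_SQL, 43))
    print(descriptions(conn, SUBSELECT_SQL, 43))
```

An EXISTS sub-select or a SELECT DISTINCT would be other ways to get the same de-duplicated result; which one a given database plans best is not something this sketch tries to answer.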

Embedding Python and importing from memory

Time to ask the web again:

In Glom, I use PyRun_String() to execute python scripts from memory (the scripts are never on disk), and get the result. But I’d like those scripts to be able to import Python modules that are also in memory (in a virtual library of reusable code). Python’s import statement usually just looks for files in the Python import path. I’ve looked through the Python/C API reference, but I can’t see anything suitable. Unlike C, you can’t just paste the code into the start of the script, because being in a module affects the syntax.

I have a vague idea that PyImport_ExecCodeModule might make the Python code importable under the provided name, but it seems to require a compiled object, for which I need to supply a filename to PyNode_Compile().

I’d rather not write all the files to a temporary directory just so they can be imported. That seems like a fragile hack.

Update:

object = Py_CompileString(script_text, module_name, Py_file_input) followed by PyImport_ExecCodeModule(module_name, object) seems to work. Thanks commenters. But now I wonder how to remove modules when they have been removed from the virtual library. Maybe I could do PyImport_Cleanup() each time, but that is documented as for internal use only.
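The same idea can be sketched in pure Python, which may make it easier to see what the two C calls are doing: compile the source text, create a module object, register it in sys.modules, and execute the code in the module's namespace. This is an analogue written for illustration, not Glom's code; the function and module names are hypothetical. It also shows one possible answer to the removal question, which is simply to drop the entry from sys.modules.

```python
import sys
import types


def import_from_memory(module_name, source_text):
    """Compile source_text and register it so 'import module_name' works."""
    code = compile(source_text, "<memory:%s>" % module_name, "exec")
    module = types.ModuleType(module_name)
    # Register the module before running its code, as normal imports do.
    sys.modules[module_name] = module
    try:
        exec(code, module.__dict__)
    except Exception:
        # Do not leave a half-initialized module behind.
        del sys.modules[module_name]
        raise
    return module


def forget_module(module_name):
    """Drop the module so that later import statements no longer find it.

    Scripts that already imported it keep their own reference to the
    old module object; this only affects future imports.
    """
    sys.modules.pop(module_name, None)


if __name__ == "__main__":
    import_from_memory("virtual_lib", "def double(x):\n    return x * 2\n")
    namespace = {}
    exec("import virtual_lib\nresult = virtual_lib.double(21)", namespace)
    print(namespace["result"])
    forget_module("virtual_lib")
```

From C, the equivalent of forget_module would presumably be to delete the key from the dictionary returned by PyImport_GetModuleDict(), which avoids PyImport_Cleanup() altogether.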

What I’m doing now

OK, so I can’t tell you much about what I’m doing, but I can tell you that I’m busy.

Tomorrow I fly to San Francisco to attend the Ubuntu Developer Summit, at the Googleplex for the week. This is unrelated to everything else that I’m doing, but should be fun.

Since the start of September I’ve been working on a small experimental project for an Openismus GmbH client. It’s secret but not currently part of any actual product or strategy for the client. It’s using the C-based GObject API, so I’m playing lots with GObject properties, interfaces, vfuncs, inits and constructors, etc. This is full time for me until approximately early January 2007.

I’m also steering some work that Daniel and Johannes are doing for a client, which is mostly about creating documentation at the moment. That’s secret for now, though not for any particularly good reason, and will definitely be public and generally useful eventually, at a secret time in the future. They can’t say, so don’t ask.

I’m also doing bits and pieces and general support here and there for another Openismus GmbH client, and working on the repository analyzer for them.

I’ll likely have some work to do soon for yet another locally-based (more or less) client who have some long-term plans to use gtkmm on embedded devices for a specialized market. They’ve already started and are doing well, but would like some advice and manpower.

In general, I’m busier than I’d like to be. I’d like to always have some excess capacity so that it’s easier to accept work for new customers, with the added advantage of being less stressed and therefore more productive. An extra full time employee would make that happen, but I’m not quite ready to do that yet, given the lack of guarantees of enough work in the longer term. Possibly I’m being too cautious.

Repository Analyzer improvements

I’ve done some more work on the debian Repository Analyzer that I mentioned before, which can help companies with license compliance by helping them to discover the licenses of their debian-based product’s software, and to navigate around the dependencies with that information.

Now it identifies some standard free-software and open-source licenses (such as GPL, LGPL, MIT, X11, Boost, etc), though you’ll have to sanity-check the results, because there are several ways that it can automatically guess wrong when looking at the tarballs and diffs. There are Open Tarball and Open Diff buttons so you can take a quick look at the files for yourself.

It still ends up with lots of apparently-unique licenses; for instance, there seem to be quite a few minor variations of the BSD, MIT, and X11 licenses. It needs a human to decide whether they are really equivalent, but I added a Remove As Duplicate button to let a human do that. That was a nice exercise of Glom’s pygda and pygtk support (this requires Glom 1.2).

There are a lot of bugfixes too. In particular, it now does a better job of handling license files in various text encodings. This is still non-expert Python code, so I’d welcome any cleanup patches.

Glom 1.2

Glom 1.2 is now out (see the announcement). It has a few new features and fewer bugs than the 1.0 branch. It should be available in Ubuntu Edgy soon. Glom 1.0 is apparently now available in Fedora 5 and 6, and I expect that to be updated to 1.2 soon.

For Glom 1.4, I hope to finally port to the latest libgda API, and add a dependency on gtksourceviewmm, to do source-code highlighting of Python code in calculated fields and button scripts. And hopefully Rasmus Toftdahl Olesen’s Relationships Overview feature will also be added.

Updates of non-essential stuff in Ubuntu

Ubuntu 6.06 (Dapper) will be the recommended stable and supported Ubuntu release for a while yet. It only has Glom 1.0.3, though a 1.0.8 version exists with lots of bugfixes. I can understand that a stable version of a distro doesn’t want to add features by jumping to Glom 1.2, but I don’t understand why bugfix updates aren’t allowed. Generally it seems that updates are only allowed for security bugfixes.

Sure, you have to be careful. If Ubuntu updated OpenOffice in Ubuntu Dapper to fix a crasher then they’d have to be very careful that they weren’t introducing new bugs. But nobody is using Glom for anything critical yet, and it’s not in the officially supported repository. Just because an X update broke things once, it doesn’t seem necessary to freeze the Glom version. Of course, I want people to get my bugfixes so I can get more feedback, so I can make Glom good enough to be relied upon.

Glom 1.1/1.2

I’ve been working on a Glom 1.1/1.2 branch recently, trying to quickly add some features in time to release a stable version for the Ubuntu Edgy release. The Glom 1.1.6 announcement has a list of what’s new. My favourite new features are:

Add Related Table

This is a time-saver that lets you quickly add a table and a relationship that uses it. So, for instance, if you have

Album
Nebraska
Asbury Park
Closing Time

you can quickly add an ID field, and a related table, so you have:

Album          Artist ID
Nebraska       1
Asbury Park    1
Closing Time   2

with

Artist ID   Artist Name
1           Bruce Springsteen
2           Tom Waits

In future, I’d like it to take existing data from existing fields and put that into the related record, so that in one step we could get those two tables from this:

Album          Artist Name
Nebraska       Bruce Springsteen
Asbury Park    Bruce Springsteen
Closing Time   Tom Waits
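That one-step split is not something Glom does yet, but the transformation itself is simple to sketch. The following is an illustration only, working on plain Python tuples rather than a database; the function name is hypothetical and IDs are assigned in order of first appearance.

```python
def split_into_related_table(rows):
    """Split (album, artist_name) rows into two related tables.

    Returns (albums, artists), where albums holds (album, artist_id)
    rows and artists holds (artist_id, artist_name) rows.
    """
    artist_ids = {}  # artist name -> generated ID, in first-seen order
    albums = []
    for album, artist_name in rows:
        if artist_name not in artist_ids:
            artist_ids[artist_name] = len(artist_ids) + 1
        albums.append((album, artist_ids[artist_name]))
    artists = [(artist_id, name) for name, artist_id in artist_ids.items()]
    return albums, artists


if __name__ == "__main__":
    albums, artists = split_into_related_table([
        ("Nebraska", "Bruce Springsteen"),
        ("Asbury Park", "Bruce Springsteen"),
        ("Closing Time", "Tom Waits"),
    ])
    print(albums)
    print(artists)
```

A real implementation would also have to decide what counts as the same value (case, whitespace, misspellings), which is where a human would probably still need to step in.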

Use of the pygda API

Via the records.connection object you can now use the whole pygda API (The libgda Python API) from your button scripts and calculated fields, so there is no limit to what you can do with the underlying database. Be careful with this great power. More details on the Glom API page.

This feature was easy to implement thanks to the wonderful pygobject_new() function from pygtk, which is a lot like gtkmm’s Glib::wrap().

Debian Repository Analyzer for license compliance

Over the last few weeks I’ve been using spare hours to create a utility to help companies discover the licenses of their software and to help them decide what licenses they should use for software that uses open source dependencies, and what they should do with their sources and modifications, assuming that they are using a debian apt dependency system.

It’s a python script that uses the python-apt API to discover all the packages in a debian repository, and then looks for licenses in the source tarball or debian diff. It then tries to figure out what license texts are actually the same, using Python’s SequenceMatcher, because many license files contain short but irrelevant bits of unique text at the start. That can take an hour or so.
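For readers who have not used it, SequenceMatcher lives in Python's difflib module, and a comparison along these lines might look like the sketch below. This is not the script's actual code: the function name and the 0.9 threshold are assumptions, and autojunk is switched off here so that long texts with many repeated characters are compared in full.

```python
from difflib import SequenceMatcher


def licenses_match(text_a, text_b, threshold=0.9):
    """Guess whether two license texts are the same apart from small details.

    ratio() is 1.0 for identical texts and falls towards 0.0 as they
    diverge, so a short unique header only lowers it slightly.
    """
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    return matcher.ratio() >= threshold


if __name__ == "__main__":
    body = ("Permission is hereby granted, free of charge, to any person "
            "obtaining a copy of this software. ") * 5
    license_a = "Copyright (c) 2005 Foo\n" + body
    license_b = "Copyright (c) 2006 Bar Baz\n" + body
    print(licenses_match(license_a, license_b))
```

Comparing every license text against every other one this way is quadratic in the number of texts, which fits with the run time mentioned above.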

Then it creates a .glom file that you can open in Glom (>= 1.1.3). That creates a database and fills it, then lets you explore the data. At this point a human can give the licenses names such as “GPL”, “LGPL”, etc, which will then show up against all the packages that had that license text. And then you can see all the licenses of each package’s dependencies.

Here is version 0.1.4, under the GPL, as I believe is required by the python-apt license. Update: Here is a screenshot.

For now, I had to hardcode the base URL of the repository. I haven’t actually tested it completely with the example repository that’s in the sources.list, but it does work with the (secret) repository for which it was written.

Tips on using python-apt

Michael Vogt has been very helpful with my questions about python-apt, but he’s not to blame for my python coding. Remember, I’m a C++ coder. Python-apt is very useful, but the API is currently rather obscure and confused by the presence of two similar APIs alongside each other. Michael is working on making it sane.

I was also frustrated by the general lack of documentation of Python APIs. It’s often very difficult to know what type of object is likely to be returned by a method and to know what methods an object supports.

So here are some of the things that I learnt from Michael Vogt, so that they don’t just sit in my Inbox where nobody else can see them.

Using a local sources.list and cache instead of your system’s

You need to set a bunch of config variables. Here’s my full list so far:

apt_pkg.Config.Set("Dir::Etc::sourcelist", "./sources.list")
apt_pkg.Config.Set("Dir::Cache::archives", "./tmp_apt_archives")
apt_pkg.Config.Set("Dir::State", "./tmp_apt_varlibapt")  #usually /var/lib/apt
apt_pkg.Config.Set("Dir::State::Lists",  "./tmp_apt_varlibdpkg") #usually /var/lib/apt/lists/
apt_pkg.Config.Set("Dir::State::status", "./tmp_apt_varlibdpkg/status") #If we don't set this then we will pick up packages from the local system, from the default status file.

You will need to make sure that those directories exist, along with some sub-directories.
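A small helper along these lines could create them. Treat it as a sketch: the directory names simply mirror the config values above, while the "partial" sub-directories and the empty status file are assumptions about what apt expects to find, so adjust them if apt complains about something else being missing.

```python
import os
import tempfile

# Relative paths matching the config values above. The "partial"
# sub-directories are an assumption about what apt wants to exist.
APT_DIRS = [
    "tmp_apt_archives/partial",
    "tmp_apt_varlibapt",
    "tmp_apt_varlibdpkg/partial",
]


def prepare_apt_dirs(base_dir="."):
    """Create the local apt directories and an empty status file."""
    for relative in APT_DIRS:
        os.makedirs(os.path.join(base_dir, relative), exist_ok=True)
    # An empty status file should mean that no packages count as installed.
    status_path = os.path.join(base_dir, "tmp_apt_varlibdpkg", "status")
    if not os.path.exists(status_path):
        open(status_path, "a").close()
    return status_path


if __name__ == "__main__":
    # Try it out in a scratch directory rather than the current one.
    scratch = tempfile.mkdtemp()
    print(prepare_apt_dirs(scratch))
```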

After calling cache.update(), remember to do this, otherwise no packages will be found:

cache.open(apt.progress.OpProgress())

Getting the name of a package:

When iterating over a cache, you can do this:

for pkg in cache:
    candver = cache._depcache.GetCandidateVer(pkg._pkg)
    name = candver.ParentPkg.Name

Getting the full URI of a tarball:

You can get a tarball URI like so,

srcrec = srcrecords.Lookup(source_package_name)
if srcrec:
    for (the_md5hash, the_size, the_path, the_type) in srcrecords.Files:
        if the_type == "tar":
            tarball_uri = the_path

but it only gives you the second half of the URI. To get the whole thing, if you have the latest version of python-apt from Ubuntu Edgy, do

full_uri = srcrecords.Index.ArchiveURI(tarball_uri)