GeoNames Forum
How to process the GeoNames RDF file?
Forum Index -> General
wouter



Joined: 25/03/2020 16:54:32
Messages: 2
Offline

I tried to use the GeoNames RDF file, but it does not seem to be valid RDF.

This can be tested with the following command:

```
$ curl 'http://download.geonames.org/all-geonames-rdf.zip' | gunzip | head
```

This shows that the GeoNames RDF file contains snippets of RDF/XML interspersed with loose URLs (see below). Since this is a non-standard format, I assume that there is a common procedure or script to transform this file into a valid RDF file.

```
https://sws.geonames.org/3/
<?xml version="1.0" encoding="UTF-8" standalone="no"?><rdf:RDF xmlns:cc="http://creativecommons.org/ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:gn="http://www.geonames.org/ontology#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos#"> <gn:Feature rdf:about="https://sws.geonames.org/3/"> <rdfs:isDefinedBy rdf:resource="https://sws.geonames.org/3/about.rdf"/> <gn:name>Zamīn Sūkhteh</gn:name> <gn:alternateName xml:lang="fa">زمين سوخته</gn:alternateName> <gn:alternateName xml:lang="fa">Zamīn Sūkhteh</gn:alternateName> <gn:featureClass rdf:resource="https://www.geonames.org/ontology#S"/> <gn:featureCode rdf:resource="https://www.geonames.org/ontology#S.CRRL"/> <gn:countryCode>IR</gn:countryCode> <wgs84_pos:lat>32.45831</wgs84_pos:lat> <wgs84_pos:long>48.96335</wgs84_pos:long> <gn:parentFeature rdf:resource="https://sws.geonames.org/3202991/"/> <gn:parentCountry rdf:resource="https://sws.geonames.org/130758/"/> <gn:parentADM1 rdf:resource="https://sws.geonames.org/127082/"/> <gn:nearbyFeatures rdf:resource="https://sws.geonames.org/3/nearby.rdf"/> <gn:locationMap rdf:resource="https://www.geonames.org/3/zamin-sukhteh.html"/> </gn:Feature></rdf:RDF>
https://sws.geonames.org/4/
<?xml version="1.0" encoding="UTF-8" standalone="no"?><rdf:RDF xmlns:cc="http://creativecommons.org/ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:gn="http://www.geonames.org/ontology#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos#"> <gn:Feature rdf:about="https://sws.geonames.org/4/"> <rdfs:isDefinedBy rdf:resource="https://sws.geonames.org/4/about.rdf"/> <gn:name>Rūdkhāneh-ye Āb-e Zālek</gn:name> <gn:alternateName xml:lang="fa">رودخانه آب زالک</gn:alternateName> <gn:alternateName xml:lang="fa">Āb-e Zālakī</gn:alternateName> <gn:alternateName>Rūdkhāneh-ye Āb-e Zālek</gn:alternateName> <gn:alternateName xml:lang="fa">Rūdkhāneh-ye Zākalī</gn:alternateName> <gn:alternateName>Rūdkhāneh-ye Āb-e Zālekī</gn:alternateName> <gn:alternateName xml:lang="fa">رودخانه زاکلی</gn:alternateName> <gn:alternateName xml:lang="fa">رودخانه آب زالکی</gn:alternateName> <gn:featureClass rdf:resource="https://www.geonames.org/ontology#H"/> <gn:featureCode rdf:resource="https://www.geonames.org/ontology#H.STM"/> <gn:countryCode>IR</gn:countryCode> <wgs84_pos:lat>32.93273</wgs84_pos:lat> <wgs84_pos:long>48.76505</wgs84_pos:long> <gn:parentFeature rdf:resource="https://sws.geonames.org/127082/"/> <gn:parentCountry rdf:resource="https://sws.geonames.org/130758/"/> <gn:parentADM1 rdf:resource="https://sws.geonames.org/127082/"/> <gn:nearbyFeatures rdf:resource="https://sws.geonames.org/4/nearby.rdf"/> <gn:locationMap rdf:resource="https://www.geonames.org/4/rudkhaneh-ye-ab-e-zalek.html"/> </gn:Feature></rdf:RDF>
```
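
To confirm that the whole file follows this layout, and not just the first few lines, a minimal sketch along these lines could be used. It assumes that every odd-numbered line is a URL and every even-numbered line is a complete RDF/XML document beginning with an XML declaration, as in the sample above. The function name `check_dump` is made up for this example.

```bash
#!/bin/bash
# check_dump (hypothetical helper): read a dump on stdin and verify that it
# alternates between a URL line and a one-line XML document.
# Assumption: the layout matches the sample shown in this thread.
check_dump() {
  awk '
    # Odd lines should be the resource URL.
    NR % 2 == 1 && $0 !~ /^https?:\/\// {
      printf "line %d: expected a URL\n", NR; bad = 1; exit
    }
    # Even lines should be an XML document starting with its declaration.
    NR % 2 == 0 && $0 !~ /^<\?xml / {
      printf "line %d: expected an XML document\n", NR; bad = 1; exit
    }
    END {
      if (bad) exit 1
      if (NR % 2 == 1) { printf "line %d: URL without a document\n", NR; exit 1 }
      printf "ok: %d records\n", NR / 2
    }
  '
}
```

With the archive downloaded locally, something like `unzip -p all-geonames-rdf.zip | check_dump` should either report the number of records or name the first line that breaks the pattern.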
zcw100



Joined: 28/09/2019 15:28:16
Messages: 18
Offline

It's a somewhat strange format: each URL is followed by the RDF/XML for that URL. I put together a short bash script that outputs it as a single N-Triples file; I'll post it when I find it. It takes a while, maybe a day, and results in a file of about 600 MB compressed.

I've asked several times for the mappings used to generate the RDF, but for some inexplicable reason they won't share them.
zcw100



Joined: 28/09/2019 15:28:16
Messages: 18
Offline

Here's that script. You'll need the raptor2 library for the rapper parser.

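#!/bin/bash
# Convert every RDF/XML record (the even-numbered lines) to N-Triples on stdout.
# Pass the unzipped dump file(s) as arguments.

: > errors  # start with an empty error log
while IFS= read -r record; do
  # rapper reads RDF/XML and writes N-Triples by default
  rapper --quiet <(printf '%s\n' "$record") 2>> errors
done < <(awk 'NR % 2 == 0' "$@")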
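
The script above starts one rapper process per record, which is probably a large part of why it takes so long. One possible alternative, sketched below, is to merge the records into a single RDF/XML document first so the parser only has to start once. This is an untested idea for the full dump and rests on an assumption: every record declares the same namespaces on its `rdf:RDF` element, as the two sample records do. The function name `merge_records` is made up for this example.

```bash
#!/bin/bash
# merge_records (hypothetical helper): read the dump on stdin and write one
# RDF/XML document on stdout.
# Assumption: all records carry identical namespace declarations, so the
# rdf:RDF opening tag of the first record can serve for all of them.
merge_records() {
  awk '
    NR % 2 == 1 { next }                     # skip the URL lines
    {
      start = index($0, "<rdf:RDF")
      if (start == 0) next                   # not an RDF/XML record, ignore
      rest = substr($0, start)               # drops the XML declaration
      tag_end = index(rest, ">")             # end of the rdf:RDF opening tag
      if (!header_done) {
        print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        print substr(rest, 1, tag_end)
        header_done = 1
      }
      body = substr(rest, tag_end + 1)
      sub(/<\/rdf:RDF>[ \t\r]*$/, "", body)  # drop the per-record closing tag
      print body
    }
    END { if (header_done) print "</rdf:RDF>" }
  '
}
```

The merged stream could then be piped into a single parser run, for example `unzip -p all-geonames-rdf.zip | merge_records | rapper --quiet --input rdfxml --output ntriples - 'https://sws.geonames.org/' > geonames.nt`, where the last argument is the base URI rapper requires when reading from stdin. If one huge document turns out to be too much, the same function can be applied to smaller chunks of the dump.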
wouter



Joined: 25/03/2020 16:54:32
Messages: 2
Offline

Thank you zcw100!

I guess we will have to add a feature to our LOD Cloud download script that allows custom code to be applied to the downloaded source data file, prior to uploading it to our triple store. This will allow non-RDF to still be uploaded as RDF.
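
As a rough illustration of what such a feature might look like, here is a minimal sketch of a pre-processing step. Everything in it is hypothetical: the `preprocess` function, the `HOOK_DIR` variable and the one-script-per-dataset layout are assumptions made for this example, not a description of the actual LOD Cloud download script.

```bash
#!/bin/bash
# Hypothetical layout: one optional bash script per dataset, e.g. hooks/geonames.sh,
# that reads the downloaded file on stdin and writes cleaned-up data on stdout.
HOOK_DIR="${HOOK_DIR:-./hooks}"

# preprocess DATASET INPUT OUTPUT
# Run the dataset's hook if one exists; otherwise pass the file through unchanged.
preprocess() {
  local dataset="$1" input="$2" output="$3"
  local hook="$HOOK_DIR/$dataset.sh"
  if [ -r "$hook" ]; then
    bash "$hook" < "$input" > "$output"
  else
    cp -- "$input" "$output"
  fi
}
```

For GeoNames the hook could hold the conversion discussed in this thread, while datasets that already ship valid RDF would need no hook and would be copied as they are.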

I'm not sure what the benefit is of publishing data in formats that are not standardized, but at the same time GeoNames is still a great resource, so I guess we must invest this extra development effort in order to be able to include it.
 