@@ -1,4 +1,58 @@
package org.jetbrains.kotlinx.dataframe.annotations

import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.convertTo

/**
Collaborator

I'd start a bit more generally and introductorily before jumping into exactly what it does.
So: "This annotation marks an interface or data class as 'data schema'" (link to https://kotlin.github.io/dataframe/schemas.html). Then continue with "It's used to generate extension properties, etc.". Gives a bit more context to this key DataFrame component :)

* Annotation to generate extension properties API for a given declaration, according to its properties.
Collaborator

each line in this KDoc starts with two spaces instead of one

* Annotated declaration should be non-local and non-private interface or a class.
Collaborator

*Annotated declaration should be a non-local and non-private interface or class.

* The aim here is to provide convenient syntax for working with a dataframe instance right after reading from it CSV, JSON, Databases, Arrow, etc.
Collaborator

*a convenient syntax

Collaborator

*databases

* After `val df = DataFrame.read*` operation, `df` is a source of truth for the DataSchema.
Collaborator

*After the

Collaborator

*the source of truth

Collaborator

*data schema (the concept)

* One way to look at it, DataSchema "tells" the compiler what's already there. It doesn't affect reading.
Collaborator

*One way to look at it is that the data schema...

* See the list below of code generation methods to simplify the process of getting what we call an initial dataschema.
Collaborator

Hmm, maybe "Check out the 'See Also' section?" Depending on how it's rendered, it's not really a list

* Given the initial schema of the data you read, the compiler plugin will provide a typed result for most operations.
Collaborator

link to compiler plugin

*
* Example:
* ```
Collaborator

please annotate any code samples with the right file format to get highlights, anywhere you use Markdown ;P

* @DataSchema
* data class Group(
* val id: String,
* val participants: List<Person>
Collaborator

inconsistent trailing commas

* )
*
* @DataSchema
* data class Person(
* val name: Name,
* val age: Int,
* val city: String?
* )
*
* @DataSchema
* data class Name(
* val firstName: String,
* val lastName: String,
* )
*
* fun main() {
* val url = "https://raw.githubusercontent.com/Kotlin/dataframe/refs/heads/master/data/participants.json"
Collaborator

Inconsistent spacing. I might recommend using the @sample KoDEx tag :) Similar to Korro, we can include code between // SampleStart and // SampleEnd comments, making sure it compiles and is formatted correctly.

* val df = DataFrame.readJson(url).cast<Group>()
* val i: Int = df.id[0] // properties style access to columns and values
Collaborator

You can come up with better names than i and l, right? ;P I know it's not relevant to the example itself, but I feel we should still set a good example with expressive variable names.

*
* val df1 = df.asGroupBy { participants }.aggregate {
* count() into "groupSize"
* distinct { city } into "cities"
* }
*
* // now compiler plugin uses previous knowledge of `Group` combined with its understanding of aggregate operation
Collaborator

*the compiler plugin

* // to help you access new columns
* val l: List<String> = df1.cities[0]
* }
* ```
*
* @see [org.jetbrains.kotlinx.dataframe.api.generateDataClasses]
* @see [org.jetbrains.kotlinx.dataframe.api.generateInterfaces]
* @see [org.jetbrains.kotlinx.dataframe.DataFrame.cast]
* @see [org.jetbrains.kotlinx.dataframe.DataFrame.convertTo]
*/
@Target(AnnotationTarget.CLASS)
public annotation class DataSchema(val isOpen: Boolean = true)
Collaborator

We should probably explain what isOpen does

@@ -1,9 +1,24 @@
package org.jetbrains.kotlinx.dataframe.api

import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema

/**
* Marker interface that's automatically added to classes annotated with [DataSchema]
Collaborator

... why? (because it can help with .append() etc.) Also, we should specify how this is added (with the compiler plugin) and that it's added only to data classes, not to interfaces (right?) and why.

"Added" -> "Added as supertype"

*/
public interface DataRowSchema
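
One possible reading of the "why" asked about in the comment above: a marker supertype gives the row-based `vararg` functions an upper bound, so they only accept schema classes and can coexist with other `vararg` overloads of the same name. The sketch below illustrates that idea with hypothetical stand-ins (`RowSchemaMarker`, `Table`, `tableOf`); none of these are the real DataFrame types, and the library's actual motivation may differ.

```kotlin
// Hypothetical stand-in for a marker interface; NOT the real DataRowSchema.
interface RowSchemaMarker

// Hypothetical stand-in for a typed table; NOT the real DataFrame.
class Table<T>(val rows: List<T>) {
    val size: Int get() = rows.size
}

// Overload 1: takes column names only (kept trivial for illustration).
fun tableOf(vararg header: String): List<String> = header.toList()

// Overload 2: takes typed rows. The marker bound means String arguments
// can never select this overload, so the two do not compete.
fun <T : RowSchemaMarker> tableOf(vararg rows: T): Table<T> = Table(rows.toList())

// The same bound lets an append-style helper accept only rows of the table's schema type.
fun <T : RowSchemaMarker> Table<T>.append(vararg rows: T): Table<T> =
    Table(this.rows + rows)

data class Person(val name: String, val age: Int) : RowSchemaMarker

fun main() {
    val people = tableOf(Person("Alice", 30), Person("Bob", 25))
    val extended = people.append(Person("Carol", 41))
    println(extended.size)
}
```

In this sketch `Person` implements the marker by hand; per the KDoc under review, the real supertype is added automatically to annotated classes.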

/**
* Example:
* ```
* @DataSchema
* data class Person(val name: String, val age: Int)
*
* fun main() {
* val df = dataFrameOf(Person("Alice", 30), Person("Bob", 25))
Collaborator

@Jolanrensen Mar 30, 2026

single space indent ;P Same story, try @sample :) You could even @sample a piece of code and exclude that from sources in this file. For instance:

@ExcludeFromSources
private interface Sample {

    // to make it compile without the compiler plugin
    fun <T> dataFrameOf(vararg rows: T): DataFrame<T> = TODO()

    // SampleStart
    @DataSchema
    data class Person(val name: String, val age: Int)

    fun main() {
        val df: DataFrame<Person> = dataFrameOf(Person("Alice", 30), Person("Bob", 25))
    }
    // SampleEnd
}

/**
 * Example:
 * @sample [Sample]
 */
public inline fun <reified T : DataRowSchema> dataFrameOf(vararg rows: T): DataFrame<T> =
    rows.asIterable().toDataFrame()
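
To make the `// SampleStart` / `// SampleEnd` convention concrete, here is a minimal sketch of how a tool could pull out the marked region of a source file. The function name `extractSample` is hypothetical, and this is not how KoDEx or Korro are actually implemented; it only illustrates the marker idea under the assumption that each marker sits on its own line.

```kotlin
// Hypothetical helper: returns the lines strictly between the two markers,
// with their common indentation removed.
fun extractSample(source: String): String {
    val lines = source.lines()
    val start = lines.indexOfFirst { it.trim() == "// SampleStart" }
    val end = lines.indexOfFirst { it.trim() == "// SampleEnd" }
    require(start >= 0 && end > start) { "Expected // SampleStart before // SampleEnd" }
    return lines.subList(start + 1, end).joinToString("\n").trimIndent()
}

fun main() {
    val source = """
        private interface Sample {
            // SampleStart
            val answer: Int
                get() = 42
            // SampleEnd
        }
    """.trimIndent()

    // Prints only the two lines between the markers.
    println(extractSample(source))
}
```

Because the sample lives in ordinary compiled code, anything between the markers is type-checked and formatted along with the rest of the file, which is the benefit the comment above points to.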

Collaborator

append could also do with a small sample like this :)

* }
* ```
*/
public inline fun <reified T : DataRowSchema> dataFrameOf(vararg rows: T): DataFrame<T> =
rows.asIterable().toDataFrame()
